Discussion:
[Bug-apl] Regex support
Elias Mårtenson
2017-09-20 03:59:47 UTC
On several occasions, I have felt that built-in regex support in GNU APL
would be very helpful.

Implementing it should be rather simple, but I'd like to discuss how such
an API should look in order for it to be as useful as possible.

I was thinking of the following form:

regex ⎕Regex string

The way I envision this working is for the function to return ⍬ if there
is no match, or a string containing the match if there is one:

      'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
      'f..' ⎕Regex 'xfooy'
'foo'

If the regex has subexpressions, those matches should be returned as
individual strings:

      '([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛

This would be a very useful API, and reasonably easy to implement by simply
calling the standard regcomp() function:
http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
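
As a rough illustration of the semantics proposed above, here is a minimal C++ sketch built on POSIX regcomp()/regexec(). The helper name regex_match_posix, and the choice to treat an invalid pattern the same as "no match", are assumptions made for this example only; they are not part of the proposal or of GNU APL.

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regfree
#include <string>
#include <vector>

// Hypothetical helper mirroring the proposed result shape:
//   no match              -> empty vector (the ⍬ case)
//   no subexpressions     -> one element, the whole match
//   with subexpressions   -> one element per subexpression
std::vector<std::string> regex_match_posix(const std::string& pattern,
                                           const std::string& subject)
{
    std::vector<std::string> result;
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED) != 0)
        return result;   // assumption: an invalid pattern is reported as "no match"

    // re_nsub is the number of parenthesised subexpressions;
    // slot 0 of the match array always describes the whole match.
    std::vector<regmatch_t> m(re.re_nsub + 1);
    if (regexec(&re, subject.c_str(), m.size(), m.data(), 0) == 0) {
        const size_t first = (re.re_nsub == 0) ? 0 : 1;
        for (size_t i = first; i < m.size(); ++i) {
            if (m[i].rm_so < 0) {            // group did not take part in the match
                result.push_back("");
                continue;
            }
            result.push_back(subject.substr(m[i].rm_so, m[i].rm_eo - m[i].rm_so));
        }
    }
    regfree(&re);
    return result;
}
```

Calling regex_match_posix("f..", "xfooy") yields the single string "foo", matching the second example above. The offsets returned by regexec() are byte offsets into the C string.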

What do you think? Is this a reasonable way to implement it? Any
suggestions about alternative APIs?

Regards,
Elias
Giuseppe Cocomazzi
2017-09-20 10:27:47 UTC
Hi,
I also think that adding the support would be very useful. However, I
would definitely avoid PCRE and backreference support. I think the
best solution would be to just add a basic and efficient NFA-based
implementation (the de facto original implementation for Unix). For
more information about the correct way to implement RE:
https://swtch.com/~rsc/regexp/

As for the API itself, I agree with Elias that maybe a simple
interface is the way to go. I would also prefer not to have any
support for modifiers (not even IGNORECASE) and definitely avoid the
MULTILINE horror. If we opt for the NFA implementation, then the
built-in ⎕Regex (or ⎕RE) could be universally used not only for strings
but for numeric data as well. That, in conjunction with APL arrays,
would ultimately be a killer feature (I am not aware of such a feature
in other languages).
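
To make the idea of an NFA that is not tied to characters more concrete, here is a small C++ sketch of Thompson-style state-set simulation over an arbitrary element type. It is only an illustration: the NFA is built by hand (there is no pattern parser), and every name in it (State, add_state, nfa_accepts, one_twos_three) is hypothetical.

```cpp
#include <vector>

// A hand-built Thompson-style NFA over an arbitrary element type T.
enum class Kind { Literal, Any, Split, Accept };

template <typename T>
struct State {
    Kind kind;
    T    value;   // used by Literal only
    int  out;     // next state (Literal/Any) or first branch (Split)
    int  out1;    // second branch (Split), otherwise -1
};

// Add state s to the active list, following Split states without
// consuming input (the epsilon closure).
template <typename T>
void add_state(const std::vector<State<T>>& nfa, int s,
               std::vector<bool>& on, std::vector<int>& list)
{
    if (s < 0 || on[s]) return;
    on[s] = true;
    if (nfa[s].kind == Kind::Split) {
        add_state(nfa, nfa[s].out,  on, list);
        add_state(nfa, nfa[s].out1, on, list);
    } else {
        list.push_back(s);
    }
}

// True if the NFA accepts exactly the whole input sequence.
// Each input element is examined once; there is no backtracking.
template <typename T>
bool nfa_accepts(const std::vector<State<T>>& nfa, int start,
                 const std::vector<T>& input)
{
    std::vector<int>  current, next;
    std::vector<bool> on(nfa.size(), false);
    add_state(nfa, start, on, current);
    for (const T& x : input) {
        on.assign(nfa.size(), false);
        next.clear();
        for (int s : current) {
            const State<T>& st = nfa[s];
            if (st.kind == Kind::Any ||
                (st.kind == Kind::Literal && st.value == x))
                add_state(nfa, st.out, on, next);
        }
        current.swap(next);
    }
    for (int s : current)
        if (nfa[s].kind == Kind::Accept) return true;
    return false;
}

// The numeric "pattern" 1 2* 3:
//   0: Literal 1 -> 1      1: Split -> 2 | 3      2: Literal 2 -> 1
//   3: Literal 3 -> 4      4: Accept
std::vector<State<int>> one_twos_three()
{
    return {
        {Kind::Literal, 1,  1, -1},
        {Kind::Split,   0,  2,  3},
        {Kind::Literal, 2,  1, -1},
        {Kind::Literal, 3,  4, -1},
        {Kind::Accept,  0, -1, -1},
    };
}
```

With this NFA, the integer vector 1 2 2 3 is accepted and 1 2 is rejected. The same simulation would work for any element type that supports ==.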

Best,

Giuseppe Cocomazzi
http://sbudella.altervista.org
Elias Mårtenson
2017-09-20 10:40:14 UTC
Regardless of whether things like casing should be supported, the problem is
that full Unicode support is required. APL is one of those languages where
you just can't get away with not supporting it. PCRE does support it, and
unfortunately I don't think POSIX regexp does.

Are there any alternatives?

Regards,
Elias
Juergen Sauermann
2017-09-20 19:47:29 UTC
Hi Elias,

I am generally in favour of supporting regular expressions in GNU APL.

We should do that in a way that is compatible with the way in which the most
commonly used libraries do that (even if they are lacking some features that
more exotic libraries may have). Unfortunately I do not have a full overview
of all (or even any) existing libraries. I personally love grep and hate perl
(the latter not only because of their regexes).

I would like to avoid constructs like s/aaa/bbb/ where operations are kind of
text-encoded into strings. That is, IMHO, a hack-ish programming style and
should be replaced by a more APL-like syntax such as 'aaa' ⎕REX['s'] 'bbb' or
maybe 's' ⎕REX 'aaa' 'bbb'.

Or, if the number of operations is small (perl seems to have only 2, not
counting the translate which is already covered by other APL functions), then
we could also have different ⎕-functions for them, thus avoiding a third
argument.

Everybody else, please feel invited to join the discussion.

Best Regards,
Jürgen Sauermann
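
One possible reading of this suggestion, sketched in C++ with std::regex: the operation is passed as its own typed argument instead of being parsed out of a string such as s/aaa/bbb/. The enum RegexOp, the function regex_apply and its argument order are all hypothetical; the thread does not settle where the subject string would go in the ⎕REX forms above.

```cpp
#include <regex>
#include <string>

// The operation is an explicit argument, not text embedded in the pattern.
enum class RegexOp { Match, Substitute };

// Hypothetical helper:
//   Match      -> returns the first match, or an empty string if none
//   Substitute -> returns subject with every match replaced by replacement
std::string regex_apply(RegexOp op,
                        const std::string& pattern,
                        const std::string& subject,
                        const std::string& replacement = "")
{
    const std::regex re(pattern);
    switch (op) {
    case RegexOp::Match: {
        std::smatch m;
        return std::regex_search(subject, m, re) ? m.str() : std::string();
    }
    case RegexOp::Substitute:
        return std::regex_replace(subject, re, replacement);
    }
    return std::string();   // not reached for valid RegexOp values
}
```

For example, regex_apply(RegexOp::Substitute, "aaa", "xaaay", "bbb") gives "xbbby". The alternative mentioned above, one ⎕-function per operation, would correspond to two separate functions with no op argument.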
Xiao-Yong Jin
2017-09-20 20:12:40 UTC
An APL wrapper (⎕regexp[OP]) of a simple API like this would be great (rune means Unicode):

https://9fans.github.io/plan9port/man/man3/regexp.html

One can build more APL functions out of these without much performance penalty.

On the other hand, if there were a DFA implementation provided by APL (cf. J's dyadic ;:)

http://www.jsoftware.com/help/dictionary/d332.htm

one could probably write the regular expression engine within an APL function with minimal performance loss.
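
For readers unfamiliar with table-driven machines, here is a tiny C++ sketch of the general idea: the machine is nothing but a state-by-character-class transition table, so a generic primitive could run any such table. This is not J's ;: interface; the table, the state names and is_decimal_number are made up for illustration.

```cpp
#include <string>

// Character classes: the columns of the transition table.
enum CharClass { C_DIGIT = 0, C_DOT = 1, C_OTHER = 2 };

static CharClass classify(char c)
{
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.')             return C_DOT;
    return C_OTHER;
}

// States: the rows of the transition table.
// S_INT and S_FRAC are the accepting states.
enum DfaState { S_START, S_INT, S_POINT, S_FRAC, S_DEAD };

static const DfaState TRANSITIONS[5][3] = {
    /*             digit    dot      other  */
    /* S_START */ { S_INT,  S_DEAD,  S_DEAD },
    /* S_INT   */ { S_INT,  S_POINT, S_DEAD },
    /* S_POINT */ { S_FRAC, S_DEAD,  S_DEAD },
    /* S_FRAC  */ { S_FRAC, S_DEAD,  S_DEAD },
    /* S_DEAD  */ { S_DEAD, S_DEAD,  S_DEAD },
};

// True if s consists of digits, optionally followed by '.' and more digits.
// One table lookup per input character.
bool is_decimal_number(const std::string& s)
{
    DfaState state = S_START;
    for (char c : s)
        state = TRANSITIONS[state][classify(c)];
    return state == S_INT || state == S_FRAC;
}
```

A regular expression engine written on top of such a primitive would compile the pattern into the table and leave the inner loop to the built-in.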
e***@gmx.com
2017-09-20 20:30:25 UTC
<mumble> anyone who loves grep and hates perl (and i hope java too) can't be all bad </mumble>

using apl-like syntax is good: 'aaa' ⎕REX['s'] 'bbb'. what would monadic ⎕REX['s'] 'bbb' return?

Peter Teeson
2017-09-21 02:19:22 UTC
It so happens that 2 of my former colleagues from I.P.Sharp came visiting today and we were chatting about this.
Ken was not in favour of making APL complicated. When I worked at IPSA my office was next to Ken’s
and when someone suggested some form of addition to the language he would usually ask
why we could not do it with an APL function. (These days performance can hardly be a compelling argument
with multiple many-core CPU chips.)

Right now we already have a proliferation of Quad functions not to mention lambdas and native functions.
We also have divergent APLs such as Dyalog (good as it is) and so on.

Complex numbers, rationals and file systems are good additions.
But IMHO we should have one simple mechanism - i.e. the libapl APL API
and all the rest go through that as native functions.

Jürgen’s guiding light was to make GNU APL an implementation that meets the ISO and APL2 definitions.
We have already wandered away from that. Pity. When will it stop?

Just my 2¢

respect

Peter
Xiao-Yong Jin
2017-09-21 03:38:15 UTC
Post by Peter Teeson
(These days performance can hardly be a compelling argument
with multiple many-core CPU chips.)
This kind of argument for APL is exactly why Fortran is still alive and well.
Elias Mårtenson
2017-09-21 10:09:19 UTC
I've implemented the bare minimum needed to get regexes working through
a ⎕RE function. I've attached the diff.

I really need Jürgen to take a look at this, since my code that constructs
the return value cannot possibly be correct. There must be a better way to
handle this which does not involve converting back and forth to
std::string.

Also, I have the result in a UTF-8-encoded C string, and I try to create
a UTF8_string from it like this:

Value_P field_value(UTF8_string(field.c_str()), LOC);

However, when I test this in APL I get the following result:

'(..)..(..)$' ⎕RE 'sdklfjfj⍉'
┏→━━━━━━━━━━┓
┃"lf" "jâ\215\211"┃
┗∊━━━━━━━━━━┛

It seems the UTF-8 conversion is not done correctly by the UTF8_string
constructor. What did I do wrong?

Regards,
Elias
Juergen Sauermann
2017-09-21 11:39:21 UTC
Hi Elias,

the UTF8_constructors look OK, but it can be tricky to properly interpret
indices (the elements of sub in your code) of UTF8-encoded strings (i.e. whether
they mean code points or byte offsets).

My feeling is that you should avoid UTF8_strings completely and go for the
UTF32 option of the library (assuming that UTF32 are codepoints encoded as
32 bit integers). APL character strings are almost UTF32 strings (except for
gaps between the codepoints) and they avoid all the bit shifting needed for
UTF8 strings.

Best Regards,
/// Jürgen
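
The difference between byte offsets and code points can be seen with the last two characters of the test string, j⍉. The C++ sketch below decodes UTF-8 into 32-bit code points. It assumes well-formed input and does no validation, and utf8_to_utf32 is a hypothetical helper, not a GNU APL function.

```cpp
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points.
// Minimal sketch: assumes well-formed input, performs no validation.
std::vector<char32_t> utf8_to_utf32(const std::string& s)
{
    std::vector<char32_t> out;
    for (size_t i = 0; i < s.size(); ) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        int extra;        // number of continuation bytes that follow
        char32_t cp;      // code point being assembled
        if      (lead < 0x80) { extra = 0; cp = lead; }
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }
        else                  { extra = 3; cp = lead & 0x07; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

The string "j\xE2\x8D\x89" (j followed by ⍉, U+2349) is 4 bytes long but decodes to 2 code points. The bytes E2 8D 89 are exactly the â\215\211 seen in the garbled output, so a match offset counted in bytes does not index the same element as one counted in characters.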
Elias Mårtenson
2017-09-22 10:28:24 UTC
I made the changes needed to use UTF-32 instead. It turned out that the
PCRE version 1 API I was using does not properly support UTF-32 patterns
(only match data). Thus, I changed the code to use version 2 instead.

I have attached the two files that I changed. It works, as can be seen in
the example below, but it's nowhere near complete.

      '(..)..(..)⍱$' ⎕RE "footesting⌜⍱"
┏→━━━━━━━━┓
┃"st" "g⌜"┃
┗∊━━━━━━━━┛

Now, there are two changes I would like to see:

- If the right-hand argument is an array of strings, the pattern should
be applied to all strings, collecting the results into a 2D array. This
will be quite efficient, since the pattern only needs to be compiled once.
- I'd like an axis-argument with options. One of those options should be
a flag that causes a mismatch to yield an error instead of ⍬. This would be
useful when the regex check is used to extract data out of data which is
expected to follow a given pattern (think one-liners in interactive mode).
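
The two wishes above can be sketched in C++ with std::regex as follows. This is only an illustration under assumptions: the "2D array" is taken to be one row per input string and one column per subexpression, a non-matching string gives an empty row unless the flag is set, and the names match_each and error_on_mismatch are hypothetical.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Apply one pattern to many strings; one row of subexpression matches per string.
std::vector<Row> match_each(const std::string& pattern,
                            const std::vector<std::string>& subjects,
                            bool error_on_mismatch = false)
{
    const std::regex re(pattern);   // compiled once, reused for every string
    std::vector<Row> table;
    for (const std::string& s : subjects) {
        std::smatch m;
        Row row;
        if (std::regex_search(s, m, re)) {
            for (size_t i = 1; i < m.size(); ++i)   // m[0] is the whole match
                row.push_back(m[i].str());
        } else if (error_on_mismatch) {
            throw std::runtime_error("no match: " + s);
        }
        table.push_back(row);       // empty row stands in for ⍬
    }
    return table;
}
```

With the flag left off, a bad input line simply produces an empty row. With it on, the first mismatch aborts the whole call, which is the behaviour asked for when the data is expected to follow the pattern.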

The reason I haven't implemented these myself is because I find the current
code to be absolutely awful, especially with all the duplicated code to
deallocate PCRE structures. In Lisp I'd use an UNWIND-PROTECT (or
try/finally in Java), but in C++ I think I have to declare a new class with
a destructor to handle this, correct? Is there anyone who would like to
clean this up?

Regards,
Elias
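
On the cleanup question: a small class whose destructor releases the resource (RAII) is the usual C++ counterpart to UNWIND-PROTECT or try/finally. The sketch below uses the POSIX regex_t and regfree() only so that it is self-contained; with PCRE2 the same shape would presumably hold the pcre2_code and pcre2_match_data pointers and call pcre2_code_free() and pcre2_match_data_free() in the destructor. The class name CompiledRegex is hypothetical.

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>

// Owns a compiled regex. regfree() runs on every exit path
// (normal return, early return, exception) once construction succeeded.
class CompiledRegex {
public:
    explicit CompiledRegex(const std::string& pattern)
    {
        if (regcomp(&re_, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
            throw std::invalid_argument("bad regex: " + pattern);
    }

    ~CompiledRegex() { regfree(&re_); }

    // Copying would free the same regex_t twice, so it is disabled.
    CompiledRegex(const CompiledRegex&) = delete;
    CompiledRegex& operator=(const CompiledRegex&) = delete;

    bool matches(const std::string& subject) const
    {
        return regexec(&re_, subject.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
};
```

Code that uses such an object needs no explicit deallocation calls, so the duplicated cleanup branches disappear. A std::unique_ptr with a custom deleter would be another way to get the same effect for pointer-based APIs.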

Blake McBride
2017-09-22 13:27:36 UTC
+1
In reply to Peter Teeson's post of 2017-09-21 02:19:22 UTC above.
Juergen Sauermann
2017-09-22 15:48:41 UTC
Hi Peter,

I mostly agree with your concerns. As you may have noticed, I already regretted
some of the things that I implemented earlier in GNU APL. On the other hand, you
also see on the GNU APL mailing list the proposals of other GNU APL users to
implement certain things. I haven't really found a way out of this dilemma.

My current thinking is this:

1. If a feature affects the APL language itself then it is probably a bad thing
   to do. Examples for this are, IMHO, changing the scoping of variables, lexical
   binding and stuff like that. As useful as these may be in other languages, my
   feeling is that they would turn GNU APL into something else which is no longer
   APL. For example, I am a big fan of the powerful matching capabilities in
   Erlang, but I believe that, as useful as they may be, they simply do not belong
   in GNU APL (or any APL for that matter). Those who really need that (as
   opposed to only believing it would improve GNU APL) might be better off with
   one of the successors of APL.

2. Some areas, most notably FILE I/O, have traditionally not been part of the
   APL language itself, but are unfortunately needed in the real world. I am
   equally concerned about a proliferation of quad functions (and most other APLs
   are more keen than GNU APL to move in that direction). However, regular
   expressions are a more fundamental concept than other "nice to have but never
   used" features, so that adding them as a ⎕-function should not do too much
   harm. Nobody is forced to use a ⎕-function that he or she does not know or
   like. And the only thing that gets more complicated when a ⎕-function is added
   is the implementation and not the language.

Rational numbers, BTW, have to be explicitly ./configured and are not present in
the default GNU APL. Same for parallel APL. I have seen that some users are
experimenting with these features and I believe we should allow that because
chances are that these experiments result in something valuable some day. Who
knows?

Best Regards,
/// Jürgen
Peter Teeson
2017-09-22 21:55:53 UTC
Hi Jürgen:
Thanks for your usual gracious reply. I understand the points you present.

Perhaps my perspective is too narrow? The way I see it the key “module” is the interpreter of the language.
IMHO display of the results, means to enter and store data of various types, providing an environment where the interpreter executes
are really separate, but necessary, components.

You mentioned that rationals need to be explicitly configured. Personally I would prefer that approach rather than encrusting the interpreter.
Each capability added to the interpreter just complicates it - of course not for you as the author but for us lesser mortals.

As you may recall I am on a Macintosh. One project I pick up and work on from time to time is to try and
extract only the interpreter and then use the Mac OS facilities for the rest. Of course that is only of use to other Mac users (if at all).
Separating the interpreter from the rest allows for different “models” - OS’s.

What we have right now is a monolithic code base which becomes more fragile with each added feature, version of GCC, or HW box
- desirable as that might be.

I suppose what I am suggesting is that perhaps it’s time to take a fresh look at the project architecture and ask ourselves if we can improve.

FWIW

respect….

Peter
Post by Juergen Sauermann
Hi Peter,
I mostly agree with your concerns. As you may have noticed, I already regretted some of the things that I implemented earlier
in GNU APL. On the other hand, you also see on the GNU APL mailing list the proposals of other GNU APL users to implement
certain things. I haven't really found a way out of this dilemma.
1. If a feature affects the APL language itself then it is probably a bad thing to do. Examples for this are, IMHO, changing the scoping
of variables, lexical binding and stuff like that. As useful as these may be in other languages, my feeling is that they would turn GNU
APL into something else which is no longer APL. For example, I am a big fan of the powerful matching capabilities in Erlang but I
believe as useful as they may be, they simply do not belong into GNU APL (or any APL for that matter). Those who really need that (as
opposed to only believing it would improve GNU APL) might be better off with one of the successors of APL.
2. Some areas, most notably FILE I/O have traditionally not been part of the APL language itself, but are unfortunately needed in the
real world. I am equally concerned about a proliferation of quad functions (and most other APLs are more keen than GNU APL to
move in that direction). However, regular expressions are a more fundamental concept than other "nice to have but never used"
features, so that adding them as a ⎕-function should not do too much harm. Nobody is forced to use a ⎕-function that he or she
does not know or like. And the only thing that gets more complicated when a ⎕ function is added is the implementation and not
the language.
Rational numbers, BTW, have to be explicitly ./configured and are not present in the default GNU APL. Same for parallel APL. I have
seen that some users are experimenting with these features and I believe we should allow that because chances are that these
experiments result in something valuable some day. Who knows?
Best Regards,
/// Jürgen
Post by Peter Teeson
It so happens that 2 of my former colleagues from I.P.Sharp came visiting today and we were chatting about this.
Ken was not in favour of making APL complicated. When I worked at IPSA my office was next to Ken’s
and when someone suggested some form of addition to the language he would usually ask
why we could not do it with an APL function. (These days performance can hardly be a compelling argument
with multiple many-core CPU chips.)
Right now we already have a proliferation of Quad functions not to mention lambdas and native functions.
We also have divergent APLs such as Dyalog (good as it is) and so on.
Complex numbers, rationals and file systems are good additions.
But IMHO we should have one simple mechanism - i.e. the libapl APL API
and all the rest go through that as native functions.
Jurgen’s guiding light is to make GNUAPL an implementation that met the ISO and APL2 definitions.
We have already wandered away from that. Pity. When will it stop?
Just my 02¢
respect
Peter
Post by e***@gmx.com
<mumble> anyone who loves grep and hates perl (and i hope java too) can't be all bad </mumble>
using apl like syntax is good 'aaa' ⎕REX['s'] 'bbb' what would monadic ⎕REX['s'] 'bbb' return?
On Wed, 20 Sep 2017 21:47:29 +0200
Post by Juergen Sauermann
Hi Elias,
I am generally in favour of supporting regular expressions in GNU APL.
We should do that in a way that is compatible with the way in which the most commonly used libraries
do that (even if they are lacking some features that more exotic libraries may have). Unfortunately I do not
have a full overview of all (or even any) existing libraries. I personally love grep and hate perl (the latter not
only because of their regexes).
I would like to avoid constructs like s/aaa/bbb/ where operations are kind of text-encoded into strings.
That is, IMHO, a hack-ish programming style and should be replaced by a more APL-alike syntax such as
'aaa' ⎕REX['s'] 'bbb' or maybe 's' ⎕REX 'aaa' 'bbb'.
Or, if the number of operations is small (perl seems to have only 2, not counting the translate which is already
covered by other APL functions), then we could also have different ⎕-functions for them and thus avoiding a
third argument.
Everybody else, please feel invited to join the discussion.
Best Regards,
Jürgen Sauermann
On several occasions, I have felt that built-in regex support in GNU APL would be very helpful.
Implementing it should be rather simple, but I'd like to discuss how such an API should look in order for it to be as useful as possible.
regex ⎕Regex string
'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
'f..' ⎕Regex 'xfooy'
'foo'
'([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛
This would be a very useful API, and reasonably easy to implement by simply calling the standard regcomp() function: http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
What do you think? Is this a reasonable way to implement it? Any suggestions about alternative APIs?
Regards,
Elias
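
As a rough illustration of the regcomp() route proposed above, here is a minimal C sketch of the matching behaviour Elias describes: nothing on no match, the whole match when the pattern has no subexpressions, and one string per subexpression otherwise. The helper name regex_match, the fixed 64-byte output slots, and the MAX_GROUPS limit are assumptions made for this example only; this is not GNU APL code.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10

/* Hypothetical helper (not part of GNU APL): match `pattern` against
 * `subject` using POSIX extended regular expressions.
 * Returns -1 if the pattern does not compile, 0 if there is no match
 * (the empty-vector case in the proposal), otherwise the number of
 * strings written to `out`: the capture groups if the pattern has any,
 * else the whole match. Each string is truncated to 63 characters. */
int regex_match(const char *pattern, const char *subject,
                char out[][64], int max_out)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, MAX_GROUPS, m, 0) == 0) {
        /* re.re_nsub is the number of parenthesized subexpressions.
         * m[0] is the whole match, m[1..re_nsub] are the groups. */
        size_t first = re.re_nsub > 0 ? 1 : 0;
        size_t last = re.re_nsub;
        for (size_t i = first; i <= last && i < MAX_GROUPS && count < max_out; i++) {
            size_t len = 0;
            if (m[i].rm_so >= 0) {          /* -1 means the group did not participate */
                len = (size_t)(m[i].rm_eo - m[i].rm_so);
                if (len > 63)
                    len = 63;
                memcpy(out[count], subject + m[i].rm_so, len);
            }
            out[count][len] = '\0';
            count++;
        }
    }

    regfree(&re);
    return count;
}
```

With this sketch, regex_match("f..", "xfooy", out, 4) returns 1 with out[0] holding "foo", and the date pattern applied to "2017-01-02" returns 3 strings. Note that regcomp() only supports POSIX basic and extended syntax, so Perl-style extensions would not be available through this route.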
Giuseppe Cocomazzi
2017-09-24 17:23:45 UTC
Permalink
Hi list,
Post by Elias Mårtenson
The way I envision this to work, is to have the function return ⍬ if there is no match, or a string containing the match, if there is one:
'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
'f..' ⎕Regex 'xfooy'
'foo'
'([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛
All other operations on the matching groups or on the matching string
(substitution, removal) should be performed with APL's already
powerful set of functions. Regex operations are just one example of
the many subtle popular DSLs whose "patterns" we should resist the
temptation of capturing into quads, in my humble opinion.

Best,

--
Giuseppe Cocomazzi
http://sbudella.altervista.org
Hans-Peter Sorge
2017-09-29 09:41:25 UTC
Permalink
Hi Jürgen,

The construct regex ⎕Regex string looks OK to me.

However having the following regex patterns

match: 'regexm' ['modifier'] ⎕Regex string and
substitute: 'regexs' 'regexr' ['modifier'] ⎕Regex string

the patterns
'regexm' 'modifier' ⎕Regex string and
'regexs' 'regexr' ⎕Regex string
are contradictory.

Either
'm' 'regexm' ['modifier'] ⎕Regex string and
's' 'regexs' 'regexr' ['modifier'] ⎕Regex string

or
'regexm' '' ⎕Regex string and
'regexs' 'regexr' '' ⎕Regex string
would solve this syntactical problem. But typing is a bit tedious.


So I would rather go with regex =^= 'm/.../mod' and 's/..../..../mod'

which makes expressions like
(⊂'s/..../..../mod') ⎕Regex ¨ string string string
easier to read.

(⊂'m/..../mod') ⎕Regex ¨ string string string
should return 1 for match and 0 for non match to be used in a subsequent
scan.

...... (⊂'m/..../mod') ⎕Regexi ¨ string string string
could return the indexes as vector of vectors using selective
specification: (matching_index non_matching_index) ← .......

....... (⊂'m/..../mod') ⎕Regexc ¨ string string string
should return the content as vector of vectors using selective
specification:
(matching_content non_matching_content) ← .......

and further:
dates ← '2017-01-02' '2017-01-03'
(⊂'s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/') ⎕Regex ¨ dates
results in
('2017' '01' '02') ('2017' '01' '03')

and
dates ← ⊃ '2017-01-02' '2017-01-03'
's/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/' ⎕Regex dates
results in
'2017' '01' '02'
'2017' '01' '03'


Maybe I prefer ⎕Regex['i'] over ⎕Regexi ->> ⎕Regex['option' 'option']
to handle various transform alternatives from regex results to apl.

FWIW

Hans-Peter Sorge
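
To make the substitute case above more concrete, here is a minimal C sketch of how a replacement containing \1 to \9 could be expanded from the offsets that regexec() reports. The helper names regex_subst and append are hypothetical, only the first match is replaced, and modifiers are ignored. Unlike the nested result shown in the post, the sketch produces a single flat string; splitting it into separate strings would be an additional step on the APL side.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy up to `len` bytes of `src` into `out` at `pos`, always leaving
 * room for the terminating NUL. Returns the new write position. */
static size_t append(char *out, size_t pos, size_t out_size,
                     const char *src, size_t len)
{
    for (size_t i = 0; i < len && pos + 1 < out_size; i++)
        out[pos++] = src[i];
    return pos;
}

/* Hypothetical helper (not part of GNU APL): replace the first match of
 * `pattern` in `subject` with `repl`, expanding \1..\9 to the matching
 * capture group. Returns 1 if a substitution was made, 0 if there was
 * no match (subject is copied unchanged), -1 on a bad pattern or an
 * empty output buffer. Output is truncated to out_size - 1 bytes. */
int regex_subst(const char *pattern, const char *repl, const char *subject,
                char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[10];
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, 10, m, 0) != 0) {
        pos = append(out, pos, out_size, subject, strlen(subject));
        out[pos] = '\0';
        regfree(&re);
        return 0;
    }

    /* Text before the match is kept as-is. */
    pos = append(out, pos, out_size, subject, (size_t)m[0].rm_so);

    for (const char *p = repl; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] >= '1' && p[1] <= '9') {
            int g = p[1] - '0';
            /* Unused or non-participating groups have rm_so == -1. */
            if (m[g].rm_so >= 0)
                pos = append(out, pos, out_size, subject + m[g].rm_so,
                             (size_t)(m[g].rm_eo - m[g].rm_so));
            p++;                            /* skip the digit */
        } else {
            pos = append(out, pos, out_size, p, 1);
        }
    }

    /* Text after the match is kept as-is. */
    pos = append(out, pos, out_size, subject + m[0].rm_eo,
                 strlen(subject + m[0].rm_eo));
    out[pos] = '\0';

    regfree(&re);
    return 1;
}
```

Keeping the pattern and the replacement as separate arguments, as here, corresponds to the 'regexs' 'regexr' form; the 's/..../..../mod' form would need an extra parsing step to split the string at unescaped slashes.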
Juergen Sauermann
2017-10-10 17:29:36 UTC
Permalink
Hi Peter,

the current syntax is A ⎕RE [X] B, where A is the matching RE, B is the subject
(the string being matched), and X is the matching flags.

I never liked it when programs lumped these strings together into a single string (or argument).

What hasn't been addressed yet is substitution as opposed to matching. I tend to believe
that APL2 selective specification of some kind would be an elegant solution, but details
have not yet been worked out.

Best Regards,
/// Jürgen

Elias Mårtenson
2017-09-21 02:44:06 UTC
Permalink
On 21 September 2017 at 04:30, <***@gmx.com> wrote:

Post by e***@gmx.com
<mumble> anyone who loves grep and hates perl (and i hope java too) can't be all bad </mumble>
using apl like syntax is good 'aaa' ⎕REX['s'] 'bbb' what would monadic ⎕REX['s'] 'bbb' return?
I don't think there is any reasonable monadic interpretation of a regex. So
the answer to your question is that it should return an error, IMHO.

Regards,
Elias
Juergen Sauermann
2017-09-22 16:31:22 UTC
Permalink
Hi,

I can't speak about java. In my (previous) work life I have used about 15
different languages at different times and for different purposes. I have not
seen a single language that is suitable for all purposes, so I normally try to
understand a problem first and then pick a language in which I can solve the
problem (after all, I am a lazy guy).

The way I teach myself a new language is by taking a small sample problem and
implementing it in the new language. That tells me quickly how to judge the new
language and what it might be good for.

I did the same with java, but failed big time to solve even a simple problem.
If I remember correctly, it was something like printing "hello world" onto a
serial interface. After a week or so I decided that this problem was too big
for java and myself to solve and never touched java again. Java was the only
language in my life that refused to work for me, and my opinion of it reflects
that. But this opinion is, as explained above, not based on any profound
knowledge of the language itself. If you asked me for a single adjective to
characterize java, then "useless" would be the first that came to my mind.

/// Jürgen

On 09/20/2017 10:30 PM, ***@gmx.com wrote:
<mumble> anyone who loves grep and hates perl (and i hope java too) can't be all bad </mumble>

using apl like syntax is good 'aaa' ⎕REX['s'] 'bbb' what would monadic ⎕REX['s'] 'bbb' return?

On Wed, 20 Sep 2017 21:47:29 +0200
Juergen Sauermann <***@t-online.de> wrote:

Hi Elias,

I am generally in favour of supporting regular expressions in GNU APL.

We should do that in a way that is compatible with the way in which the most
commonly used libraries do that (even if they are lacking some features that
more exotic libraries may have). Unfortunately I do not have a full overview
of all (or even any) existing libraries. I personally love grep and hate perl
(the latter not only because of their regexes).

I would like to avoid constructs like s/aaa/bbb/ where operations are kind of
text-encoded into strings. That is, IMHO, a hack-ish programming style and
should be replaced by a more APL-like syntax such as
'aaa' ⎕REX['s'] 'bbb' or maybe 's' ⎕REX 'aaa' 'bbb'.

Or, if the number of operations is small (perl seems to have only 2, not
counting the translate, which is already covered by other APL functions), then
we could also have different ⎕-functions for them and thus avoid a third
argument.

Everybody else, please feel invited to join the discussion.

Best Regards,
Jürgen Sauermann

On 09/20/2017 05:59 AM, Elias Mårtenson wrote:
On several occasions, I have felt that built-in regex support in GNU APL would be very helpful.

Implementing it should be rather simple, but I'd like to discuss how such an API should look in order for it to be as useful as possible.

I was thinking of the following form:

      regex ⎕Regex string

The way I envision this to work, is to have the function return ⍬ if there is no match, or a string containing the match, if there is one:

      'f..' ⎕Regex 'xzooy'
┏⊖┓
┃0┃
┗━┛
      'f..' ⎕Regex 'xfooy'
'foo'

If the regex has subexpressions, those matches should be returned as individual strings:

      '([0-9]+)-([0-9]+)-([0-9]+)' ⎕Regex '2017-01-02'
┏→━━━━━━━━━━━━━━━┓
┃"2017" "01" "02"┃
┗∊━━━━━━━━━━━━━━━┛

This would be a very useful API, and reasonably easy to implement by simply calling into the standard regcomp() call: http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html

What do you think? Is this a reasonable way to implement it? Any suggestions about alternative API's?

Regards,
Elias
Jay Foad
2017-09-22 12:21:41 UTC
Permalink
FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which are implemented
with PCRE:

      ('[Aa]..'⎕S'&')'Dyalog APL'
┌───┬───┐
│alo│APL│
└───┴───┘
      ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
green orange yellow blue blue

http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm

Jay.
Elias Mårtenson
2017-09-22 14:59:41 UTC
Permalink
I did not know this. I took a look at Dyalog's API and it's not possible to
implement it fully, as it relies on their object oriented features.
However, the basic functionality wouldn't be hard to replicate, if that is
something that is desired.

Jürgen, what is your opinion on this?
Post by Jay Foad
FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which are implemented
with PCRE:
      ('[Aa]..'⎕S'&')'Dyalog APL'
┌───┬───┐
│alo│APL│
└───┴───┘
      ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
green orange yellow blue blue
http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm
Jay.
Juergen Sauermann
2017-09-22 16:08:59 UTC
Permalink
Hi,

I have not looked into Dyalog's implementation myself, but if they have it,
then we should aim to be as compatible as makes sense. It is no problem if
some of their capabilities are not supported (please avoid going over the top
in the GNU APL implementation).

Unfortunately ⎕R is already occupied in GNU APL (inherited from IBM APL2), so
some other name(s) are needed.

Before implementing too much in advance, it would be good to present the
intended syntax and semantics on bug-apl and solicit opinions.

/// Jürgen

On 09/22/2017 04:59 PM, Elias Mårtenson wrote:
I did not know this. I took a look at Dyalog's API and it's not possible to
implement it fully, as it relies on their object oriented features. However,
the basic functionality wouldn't be hard to replicate, if that is something
that is desired.

Jürgen, what is your opinion on this?

On 22 September 2017 at 20:21, Jay Foad <***@gmail.com> wrote:
FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which are implemented
with PCRE:

      ('[Aa]..'⎕S'&')'Dyalog APL'
┌───┬───┐
│alo│APL│
└───┴───┘
      ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
green orange yellow blue blue

http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm

Jay.
Elias Mårtenson
2017-09-25 03:18:51 UTC
Permalink
Dyalog's implementation is much more expressive than what I had proposed.

There are technical reasons why we have no hope of replicating their
functionality (in particular, GNU APL does not have support for namespaces).

Their function takes arguments and returns a matcher function that can be
reused, which is useful since the regexp is then compiled only once. Jürgen,
how can I make a quad-function behave like below? It seems to be similar in
behaviour to ⍤ and ⍣.

      ('.at' ⎕R '\u0') 'The cat sat on the mat'
The CAT SAT on the MAT

It can also accept a function, in which case the function is called for
each match, to return a replacement string. Can you explain how to make a
quad-function an operator?

      ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the mat'
ehT tac tas no eht tam

As you can see, they leverage namespaces in order to pass a lot of
different fields to the replace-function. If we want to do something
similar, ⍵ would probably have to be the match string, and we'll have to
live without the remaining fields.

Regards,
Elias


Juergen Sauermann
2017-09-25 12:10:52 UTC
Permalink
Hi Elias,

making a quad function an operator is simple if the function argument(s)
is/are primitive functions, and a little more complicated if not.

First of all you have to implement (read: override) some of the eval_XXX()
functions that take function arguments. For monadic operators these
eval_XXX() functions are:

   virtual Token eval_ALB(Value_P A, Token & LO, Value_P B)
   virtual Token eval_ALXB(Value_P A, Token & LO, Value_P X, Value_P B)
   virtual Token eval_LB(Token & LO, Value_P B)
   virtual Token eval_LXB(Token & LO, Value_P X, Value_P B)

where L resp. LO stands for the left function argument. For dyadic
operators they are:

   virtual Token eval_ALRB(Value_P A, Token & LO, Token & RO, Value_P B)
   virtual Token eval_ALRXB(Value_P A, Token & LO, Token & RO, Value_P X, Value_P B)
   virtual Token eval_LRB(Token & LO, Token & RO, Value_P B)
   virtual Token eval_LRXB(Token & LO, Token & RO, Value_P X, Value_P B)

where L resp. LO and R resp. RO stand for the left and right function
argument(s), A and B are the value arguments, and X is the axis.

Not all of them need to be implemented, only those whose signatures are
supported by the operator (mainly in terms of allowing an axis argument X
or a left value argument A).

If an operator supports defined functions (as opposed to primitive
functions) then it will typically implement the operator itself as a macro,
which means that the implementation is written in APL rather than in C++
(similar to "magic functions" in NARS). This is needed because primitive
functions are atomic (they either succeed or fail, but cannot be continued
after a failure) while defined functions (and operators) can continue at the
point of interruption after the values that caused the fault have been fixed.

Some of the built-in operators in GNU APL have both a primitive
implementation (which is used when the function arguments are primitive) and
a macro-based implementation (which is used if not). This is for performance
reasons, so that the ability to take defined functions as arguments does not
slow down the cases where the function arguments are primitive.

The macro definitions are contained in Macro.def.

Please note that in GNU APL functions cannot return functions, which may or
may not be a problem in your case, depending on whether the function
argument(s) of the ⎕-operator is/are primitive or not. In standard APL you
cannot assign a function to a name. The usual workaround is to return a
string and ⍎ it.

My gut feeling is that if you need function arguments for implementing
regular expressions, then something has gone in the wrong direction
somewhere else.

Best Regards,
/// Jürgen
Elias Mårtenson
2017-10-02 08:27:03 UTC
Permalink
Some progress:

The behaviour I described earlier still works, but it can now also work on
N-dimensional arrays of strings, compiling the regex only once and then
applying it to all the cells.

In addition to this, I have now also added a flag "B" (meaning "bitmap")
that creates a bitmap of all matches and can be used in conjunction with ⊂
to split strings by regex.

Here's an example:

      " +" ⎕RE["B"] "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

This matches any sequence of spaces, and we can easily use ⊂ to split the
string:

      {⍵ ⊂⍨ 0=" +" ⎕RE["B"] ⍵} "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━┓
┃"this" "is" "a" "test"┃
┗∊━━━━━━━━━━━━━━━━━━━━━┛

However, I'm not sure whether the values returned from the function are
ideal. The idea of the increasing numbers is to be able to differentiate
between the result of:

      " " ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 2 3 4┃
┗━━━━━━━┛

vs:

      " +" ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 1 1 1┃
┗━━━━━━━┛

Should it be left like this, or should it be done in some other way?

Regards,
Elias

Elias Mårtenson
2017-10-02 08:47:28 UTC
Permalink
In playing around with this, I realise that the "B" mode is quite useful.
So much so, in fact, that I'm wondering if it's warranted to have a
dedicated quad-function for this specific behaviour.

Here's an example of extracting sequences of 4 characters:

      {⍵ ⊂⍨ "[a-z]{4}" ⎕RE['B'] ⍵} 'abcdef45abchello9'
┏→━━━━━━━━━━━━━━━━━━━┓
┃"abcd" "abch" "ello"┃
┗∊━━━━━━━━━━━━━━━━━━━┛

Regards,
Elias
Juergen Sauermann
2017-10-02 17:30:57 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Helvetica, Arial, sans-serif">Hi Elias,<br>
<br>
I believe it is better to keep things together, i.e. in a single ⎕
function rather than in several.<br>
<br>
It may be intuitive to use the character ⊂ instead of B in the axis
argument to indicate<br>
that the result is meant for dyadic ⊂.<br>
<br>
/// Jürgen<br>
<br>
</font><font face="Helvetica, Arial, sans-serif"><font
face="Helvetica, Arial, sans-serif"><font face="Courier New,
Courier, monospace"><b><br>
</b></font></font></font>
<div class="moz-cite-prefix">On 10/02/2017 10:47 AM, Elias Mårtenson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADtN0W+pMrpBXuDhimYoJc6VfPr9-***@mail.gmail.com">
<div dir="ltr">In playing around with this, I realise that the "B"
mode is quite useful. So much so, in fact, that I'm wondering if
it's warranted to have a dedicated quad-function for this
specific behaviour.
<div><br>
</div>
<div>Here's an example of extracting sequences of 4 characters:</div>
<div><br>
</div>
<div>
<div><font face="monospace, monospace"><b>      {⍵ ⊂⍨
"[a-z]{4}" ⎕RE['B'] ⍵} 'abcdef45abchello9'</b></font></div>
<div><font face="monospace, monospace">┏→━━━━━━━━━━━━━━━━━━━┓</font></div>
<div><font face="monospace, monospace">┃"abcd" "abch" "ello"┃</font></div>
<div><font face="monospace, monospace">┗∊━━━━━━━━━━━━━━━━━━━┛</font></div>
</div>
<div><br>
</div>
<div>Regards,</div>
<div>Elias</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 2 October 2017 at 16:27, Elias
Mårtenson <span dir="ltr">&lt;<a
href="mailto:***@gmail.com" target="_blank"
moz-do-not-send="true">***@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Some progress:
<div><br>
</div>
<div>The behaviour I described earlier still works, but
now has the ability to work on N-dimensional arrays of
strings, compiling the regex only once and then applying
it on all the cells.</div>
<div><br>
</div>
<div>In addition to this, I have now also added a flag "B"
(meaning "bitmap") that creates a bitmap of all matches
and can be used in conjunction with ⊂ to split strings
by regex.</div>
<div><br>
</div>
<div>Here's an example:</div>
<div>
<div><font face="monospace, monospace"><b><br>
</b></font></div>
<div>
<div><font face="monospace, monospace"><b>      " +"
⎕RE["B"] "this is   a     test"</b></font></div>
<div><font face="monospace, monospace">┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━<wbr>━━━━━━━━━━┓</font></div>
<div><font face="monospace, monospace">┃0 0 0 0 1 0 0
2 2 2 0 3 3 3 3 3 0 0 0 0┃</font></div>
<div><font face="monospace, monospace">┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━<wbr>━━━━━━━━━━┛</font></div>
</div>
</div>
<div><br>
</div>
<div>This matches any sequence of spaces, and we can
easily use ⊂ to split the string:</div>
<div><br>
</div>
<div>
<div><font face="monospace, monospace"><b>      {⍵ ⊂⍨
0=" +" ⎕RE["B"] ⍵} "this is   a     test"</b></font></div>
<div><font face="monospace, monospace">┏→━━━━━━━━━━━━━━━━━━━━━┓</font></div>
<div><font face="monospace, monospace">┃"this" "is" "a"
"test"┃</font></div>
<div><font face="monospace, monospace">┗∊━━━━━━━━━━━━━━━━━━━━━┛</font></div>
</div>
<div><br>
</div>
<div>However, I'm not sure if the value returned from the
function is ideal. The idea of the increasing numbers
is to be able to differentiate between the result of:</div>
<div><br>
</div>
<div>
<div><font face="monospace, monospace"><b>      " "
⎕RE["B"] "    "</b></font></div>
<div><font face="monospace, monospace">┏→━━━━━━┓</font></div>
<div><font face="monospace, monospace">┃1 2 3 4┃</font></div>
<div><font face="monospace, monospace">┗━━━━━━━┛</font></div>
<div><br>
</div>
<div>vs:</div>
<div><br>
</div>
<div><font face="monospace, monospace"><b>      " +"
⎕RE["B"] "    "</b></font></div>
<div><font face="monospace, monospace">┏→━━━━━━┓</font></div>
<div><font face="monospace, monospace">┃1 1 1 1┃</font></div>
<div><font face="monospace, monospace">┗━━━━━━━┛</font></div>
</div>
<div><br>
</div>
<div>Should it be left like this, or should it be done in
some other way?</div>
<div><br>
</div>
<div>Regards,</div>
<div>Elias</div>
</div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On 25 September 2017 at
20:10, Juergen Sauermann <span dir="ltr">&lt;<a
href="mailto:***@t-online.de"
target="_blank" moz-do-not-send="true">***@t-online.de</a><wbr>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
Elias,<br>
<br>
making a quad function an operator is simple if
the function argument(s) is/are primitive
functions<br>
and a little more complicated if not.<br>
<br>
First of all you have to implement (read:
overload) some of the eval_XXX() functions that
have function<br>
arguments. For monadic operators these eval_XXX()
functions are:<br>
<br>
   virtual Token eval_ALB(Value_P A, Token &amp;
LO, Value_P B)<br>
   virtual Token eval_ALXB(Value_P A, Token &amp;
LO, Value_P X, Value_P B)<br>
   virtual Token eval_LB(Token &amp; LO, Value_P
B)<br>
   virtual Token eval_LXB(Token &amp; LO, Value_P
X, Value_P B)<br>
<br>
where L resp. LO stands for the left function
argument. For dyadic operators they are:<br>
<br>
   virtual Token eval_ALRB(Value_P A, Token &amp;
LO, Token &amp; RO, Value_P B)<br>
   virtual Token eval_ALRXB(Value_P A, Token &amp;
LO, Token &amp; RO, Value_P X, Value_P B)<br>
   virtual Token eval_LRB(Token &amp; LO, Token
&amp; RO, Value_P B)<br>
   virtual Token eval_LRXB(Token &amp; LO, Token
&amp; RO, Value_P X, Value_P B)<br>
<br>
where L resp. LO and R resp. RO stand for the left
and right function argument(s), A and B<br>
are the value arguments, and X the axis.<br>
<br>
Not all of them need to be implemented, only those
that have function signatures that<br>
are supported by the operator (mainly in terms of
allowing an axis argument X or a<br>
left value argument A).<br>
<br>
If an operator supports defined functions (as
opposed to primitive functions) then it will
typically<br>
implement the operator itself as a macro, which
means that the implementation is written in APL<br>
rather than in C++ (similar to "magic functions"
in NARS). This is needed because primitive
functions<br>
are atomic (they either succeed or fail, but
cannot be continued after a failure) while defined
functions<br>
(and operators) can continue at the point of
interruption after having fixed the values that
have caused<br>
the fault.<br>
<br>
Some of the built-in operators in GNU APL have
both a primitive implementation (which is used
when<br>
the function arguments are primitive) and a macro
based implementation if not. This is for
performance<br>
reasons so that the ability to take defined
functions as arguments does not performance-wise
harm the<br>
cases where the function arguments are primitive.<br>
<br>
The Macro definitions are contained in Macro.def<br>
<br>
Please note that in GNU APL functions cannot
return functions, which may or may not be a
problem<br>
in your case, depending on whether the function
argument(s) of the ⎕-operator is/are primitive or
not.<br>
In standard APL you cannot assign a function to a
name. The usual work-around is to return a string and ⍎
it.<br>
<br>
My gut feeling is that if you need function
arguments for implementing regular expressions
then<br>
something has gone in the wrong direction
somewhere else.<br>
<br>
Best Regards,<br>
/// Jürgen<span><br>
<br>
<br>
<br>
On 09/25/2017 05:18 AM, Elias Mårtenson wrote:<br>
</span>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><span>
Dyalog's implementation is much more
expressive than what I had proposed.<br>
<br>
There are technical reasons why we have no
hope of replicating their functionality (in
particular, GNU APL does not have support for
namespaces).<br>
<br>
Their function takes arguments and returns a
function, which is a matcher function that can
be reused, which is useful since you'd only
compile the regexp once. Jürgen, how can I
make a quad-function behave like below? It
seems to be similar in behaviour to ⍤ and ⍣.<br>
<br>
</span>
*      ('.at' ⎕R '\u0') 'The cat sat on the mat'
*<span><br>
The CAT SAT on the MAT<br>
<br>
It can also accept a function, in which case
the function is called for each match, to
return a replacement string. Can you explain
how to make a quad-function an operator?<br>
</span>
*<br>
*<br>
*      ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the
mat'*<span><br>
ehT tac tas no eht tam<br>
<br>
As you can see, they leverage namespaces in
order to pass a lot of different fields to the
replace-function. If we want to do something
similar, ⍵ would probably have to be the match
string, and we'll have to live without the
remaining fields.<br>
<br>
Regards,<br>
Elias<br>
<br>
<br>
</span><span>
On 23 September 2017 at 00:08, Juergen
Sauermann &lt;<a
href="mailto:***@t-online.de"
target="_blank" moz-do-not-send="true">***@t-online.de</a>
&lt;mailto:<a
href="mailto:***@t-online.de"
target="_blank" moz-do-not-send="true">***@t-on<wbr>line.de</a>&gt;&gt;
wrote:<br>
<br>
    Hi,<br>
<br>
    I have not looked into Dyalog's
implementation myself, but if they<br>
    have it then we should aim at being as
compatible as it makes sense.<br>
    No problem if some of their capabilities
are not supported (please<br>
    avoid<br>
    going over the top in the GNU APL
implementation)<br>
<br>
    Unfortunately ⎕R is already occupied in
GNU APL (inherited from<br>
    IBM APL2),<br>
    so some other name(s) are needed.<br>
<br>
    Before implementing too much in advance,
it would be good to<br>
    present the<br>
    intended syntax and semantics on bug-apl
and solicit opinions.<br>
<br>
    /// Jürgen<br>
<br>
<br>
    On 09/22/2017 04:59 PM, Elias Mårtenson
wrote:<br>
</span>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><span>
    I did not know this. I took a look at
Dyalog's API and it's not<br>
    possible to implement it fully, as it
relies on their object<br>
    oriented features. However, the basic
functionality wouldn't be<br>
    hard to replicate, if that is something
that is desired.<br>
<br>
    Jürgen, what is your opinion on this?<br>
<br>
    On 22 September 2017 at 20:21, Jay Foad
&lt;<a href="mailto:***@gmail.com"
target="_blank" moz-do-not-send="true">***@gmail.com</a><br>
</span><span>
    &lt;mailto:<a
href="mailto:***@gmail.com"
target="_blank" moz-do-not-send="true">***@gmail.com</a>&gt;&gt;
wrote:<br>
<br>
        FYI Dyalog has operators ⎕S (search)
and ⎕R (replace) which<br>
        are implemented with PCRE:<br>
<br>
        ('[Aa]..'⎕S'&amp;')'Dyalog APL'<br>
        ┌───┬───┐<br>
        │alo│APL│<br>
        └───┴───┘<br>
        ('red' 'green'⎕R'green' 'blue')'red
orange yellow green blue'<br>
        green orange yellow blue blue<br>
<br>
        <a
href="http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://help.dyalog.com/16.0/Co<wbr>ntent/Language/System%20Functi<wbr>ons/r.htm</a><br>
</span>
        &lt;<a
href="http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://help.dyalog.com/16.0/C<wbr>ontent/Language/System%20Funct<wbr>ions/r.htm</a>&gt;<br>
<br>
        Jay.<br>
<br>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>
Elias Mårtenson
2017-10-03 04:14:09 UTC
Permalink
In the default mode, as I have demonstrated earlier, when the regexp has
parenthesised subexpressions, the strings matching those expressions will
be returned as separate strings. This is logical and in my opinion makes
perfect sense.

When using ⊂-mode, parenthesised expressions don't change the behaviour
at all, as there is no natural behaviour to implement in this case.

However, it would be nice to have a way to use subexpressions to split
strings, so I'm thinking of something like the following:


* "([0-9]{4})-([0-9]{2})-([0-9]{2})" ⎕RE[something] "foo 2010-02-03"*
┏→━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃0 0 0 0 1 1 1 1 0 2 2 0 3 3┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Note that this variation is different from the previous one in that
the ⊂-mode described in my previous email repeatedly calls the matching
function, marking each result in the output bitmap, while the proposed
version above runs the match only once, marking the subexpressions in the
result.
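
To make the difference between the two modes concrete, here is a small Python sketch (not GNU APL code). It uses Python's `re` module in place of whatever regex library sits behind ⎕RE, and the function names `match_bitmap` and `group_bitmap` are made up for this illustration only.

```python
import re

def match_bitmap(pattern, text):
    """Repeated matching: characters of the n-th match are marked with n."""
    bitmap = [0] * len(text)
    for n, m in enumerate(re.finditer(pattern, text), start=1):
        for i in range(m.start(), m.end()):
            bitmap[i] = n
    return bitmap

def group_bitmap(pattern, text):
    """Single match: characters of subexpression k are marked with k."""
    bitmap = [0] * len(text)
    m = re.search(pattern, text)
    if m is None:
        return bitmap
    for k in range(1, m.re.groups + 1):
        start, end = m.span(k)  # (-1, -1) if group k did not take part
        for i in range(start, end):
            bitmap[i] = k
    return bitmap

print(match_bitmap(" +", "this is   a     test"))
print(group_bitmap(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", "foo 2010-02-03"))
```

With the inputs from this thread, the first call yields 0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0 and the second yields 0 0 0 0 1 1 1 1 0 2 2 0 3 3, matching the two outputs shown above.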

I'm starting to think that both are needed, but what symbols should be used
in the axis argument to indicate the desired mode?

An alternative output for the same expression would be something like the
following, which would match pretty much exactly what the underlying PCRE
function returns:

┏→━━━━┓
↓ 4 14┃
┃ 4  8┃
┃10 11┃
┃13 14┃
┃ 4 14┃
┗━━━━━┛

Would this be a useful variation too? And if so, what axis marker should
be used for it?
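
For comparison, this is roughly what a match-offset table looks like when taken straight from a regex engine. The sketch below is Python, not GNU APL, and `match_spans` is a hypothetical helper name. Python reports 0-based, end-exclusive offsets, so the exact numbers depend on the index origin and end convention chosen and need not coincide with the table sketched above.

```python
import re

def match_spans(pattern, text):
    """Return (start, end) for the whole match, then for each subexpression."""
    m = re.search(pattern, text)
    if m is None:
        return []
    return [m.span(k) for k in range(m.re.groups + 1)]

for start, end in match_spans(r"([0-9]{4})-([0-9]{2})-([0-9]{2})",
                              "foo 2010-02-03"):
    print(start, end)
```

The first row is the overall match and the following rows are the subexpressions in order, which is the same information as the bitmap but without losing overlapping or nested groups.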

Regards,
Elias
Post by Juergen Sauermann
Hi Elias,
I believe it is better to keep things together, i.e. in a single ⎕
function than in several.
It may be intuitive to use the character ⊂ instead of B in the axis
argument to indicate
that the result is meant for dyadic ⊂.
/// Jürgen
In playing around with this, I realise that the "B" mode is quite useful.
So much so, in fact, that I'm wondering if it's warranted to have a
dedicated quad-function for this specific behaviour.
* {⍵ ⊂⍨ "[a-z]{4}" ⎕RE['B'] ⍵} 'abcdef45abchello9'*
┏→━━━━━━━━━━━━━━━━━━━┓
┃"abcd" "abch" "ello"┃
┗∊━━━━━━━━━━━━━━━━━━━┛
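
The split above relies on the APL2-style partition rule for dyadic ⊂: a new item starts wherever the left argument increases, and positions holding 0 are dropped. A Python sketch of that rule follows, assuming Python's `re` module as a stand-in for the regex engine; `match_bitmap` and `partition` are illustrative names, not part of GNU APL.

```python
import re

def match_bitmap(pattern, text):
    """Mark the characters of the n-th match with n, everything else with 0."""
    bitmap = [0] * len(text)
    for n, m in enumerate(re.finditer(pattern, text), start=1):
        for i in range(m.start(), m.end()):
            bitmap[i] = n
    return bitmap

def partition(mask, text):
    """APL2-style partition: start a new piece when the mask increases,
    skip characters whose mask value is 0."""
    parts = []
    prev = 0
    for flag, ch in zip(mask, text):
        if flag > prev:
            parts.append("")
        if flag > 0:
            parts[-1] += ch
        prev = flag
    return parts

text = "abcdef45abchello9"
print(partition(match_bitmap("[a-z]{4}", text), text))
```

This gives the three pieces "abcd", "abch" and "ello". If every match were marked with 1 instead of an increasing number, the adjacent matches "abch" and "ello" would merge into a single piece, which is the reason for numbering the matches.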
Regards,
Elias
Post by Elias Mårtenson
The behaviour I described earlier still works, but now has the ability to
work on N-dimensional arrays of strings, compiling the regex only once and
then applying it to all the cells.
In addition to this, I have now also added a flag "B" (meaning "bitmap")
that creates a bitmap of all matches and can be used in conjunction with ⊂
to split strings by regex.
* " +" ⎕RE["B"] "this is   a     test"*
┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
This matches any sequence of spaces, and we can easily use ⊂ to split the
string:
* {⍵ ⊂⍨ 0=" +" ⎕RE["B"] ⍵} "this is   a     test"*
┏→━━━━━━━━━━━━━━━━━━━━━┓
┃"this" "is" "a" "test"┃
┗∊━━━━━━━━━━━━━━━━━━━━━┛
However, I'm not sure if the values returned from the function are ideal.
The idea of the increasing numbers is to be able to differentiate between
the result of:
* " " ⎕RE["B"] "    "*
┏→━━━━━━┓
┃1 2 3 4┃
┗━━━━━━━┛
vs:
* " +" ⎕RE["B"] "    "*
┏→━━━━━━┓
┃1 1 1 1┃
┗━━━━━━━┛
Should it be left like this, or should it be done in some other way?
Regards,
Elias
On 25 September 2017 at 20:10, Juergen Sauermann <
Post by Juergen Sauermann
Hi Elias,
making a quad function an operator is simple if the function argument(s)
is/are primitive functions
and a little more complicated if not.
First of all you have to implement (read: overload) some of the
eval_XXX() functions that have function
arguments. For monadic operators these eval_XXX() functions are:
virtual Token eval_ALB(Value_P A, Token & LO, Value_P B)
virtual Token eval_ALXB(Value_P A, Token & LO, Value_P X, Value_P B)
virtual Token eval_LB(Token & LO, Value_P B)
virtual Token eval_LXB(Token & LO, Value_P X, Value_P B)
where L resp. LO stands for the left function argument. For dyadic
operators they are:
virtual Token eval_ALRB(Value_P A, Token & LO, Token & RO, Value_P B)
virtual Token eval_ALRXB(Value_P A, Token & LO, Token & RO, Value_P
X, Value_P B)
virtual Token eval_LRB(Token & LO, Token & RO, Value_P B)
virtual Token eval_LRXB(Token & LO, Token & RO, Value_P X, Value_P B)
where L resp. LO and R resp. RO stand for the left and right function
argument(s), A and B
are the value arguments, and X the axis.
Not all of them need to be implemented, only those that have function
signatures that
are supported by the operator (mainly in terms of allowing an axis
argument X or a
left value argument A).
If an operator supports defined functions (as opposed to primitive
functions) then it will typically
implement the operator itself as a macro, which means that the
implementation is written in APL
rather than in C++ (similar to "magic functions" in NARS). This is
needed because primitive functions
are atomic (they either succeed or fail, but cannot be continued after a
failure) while defined functions
(and operators) can continue at the point of interruption after having
fixed the values that have caused
the fault.
Some of the built-in operators in GNU APL have both a primitive
implementation (which is used when
the function arguments are primitive) and a macro based implementation
if not. This is for performance
reasons so that the ability to take defined functions as arguments does
not performance-wise harm the
cases where the function arguments are primitive.
The Macro definitions are contained in Macro.def
Please note that in GNU APL functions cannot return functions, which may
or may not be a problem
in your case, depending on whether the function argument(s) of the
⎕-operator is/are primitive or not.
In standard APL you cannot assign a function to a name. The usual
work-around is to return a string and ⍎ it.
My gut feeling is that if you need function arguments for implementing
regular expressions then
something has gone in the wrong direction somewhere else.
Best Regards,
/// Jürgen
Post by Elias Mårtenson
Dyalog's implementation is much more expressive than what I had proposed.
There are technical reasons why we have no hope of replicating their
functionality (in particular, GNU APL does not have support for namespaces).
Their function takes arguments and returns a function, which is a
matcher function that can be reused, which is useful since you'd only
compile the regexp once. Jürgen, how can I make a quad-function behave like
below? It seems to be similar in behaviour to ⍤ and ⍣.
* ('.at' ⎕R '\u0') 'The cat sat on the mat' *
The CAT SAT on the MAT
It can also accept a function, in which case the function is called for
each match, to return a replacement string. Can you explain how to make a
quad-function an operator?
* ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the mat'*
ehT tac tas no eht tam
As you can see, they leverage namespaces in order to pass a lot of
different fields to the replace-function. If we want to do something
similar, ⍵ would probably have to be the match string, and we'll have to
live without the remaining fields.
Regards,
Elias
On 23 September 2017 at 00:08, Juergen Sauermann <
Hi,
I have not looked into Dyalog's implementation myself, but if they
have it then we should aim at being as compatible as it makes sense.
No problem if some of their capabilities are not supported (please avoid
going over the top in the GNU APL implementation)
Unfortunately ⎕R is already occupied in GNU APL (inherited from
IBM APL2),
so some other name(s) are needed.
Before implementing too much in advance, it would be good to present the
intended syntax and semantics on bug-apl and solicit opinions.
/// Jürgen
Post by Elias Mårtenson
I did not know this. I took a look at Dyalog's API and it's not
possible to implement it fully, as it relies on their object
oriented features. However, the basic functionality wouldn't be
hard to replicate, if that is something that is desired.
Jürgen, what is your opinion on this?
FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which
are implemented with PCRE:
('[Aa]..'⎕S'&')'Dyalog APL'
┌───┬───┐
│alo│APL│
└───┴───┘
('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
green orange yellow blue blue
http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm
Jay.