<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Helvetica, Arial, sans-serif">Hi Elias,<br>
<br>
Thanks, fixed in <b>SVN 1013</b>.<br>
<br>
/// Jürgen<br>
</font><br>
<br>
<div class="moz-cite-prefix">On 10/09/2017 10:11 AM, Elias Mårtenson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADtN0WKupkDPzNc+GYDth_sg4KKcvyaVp=S2MLv+***@mail.gmail.com">
<div dir="ltr">One more bug:
<div><br>
</div>
<div>The call to pcre2_compile_32 should be changed from:</div>
<div><font face="monospace, monospace"><br>
</font></div>
<div>
<div><font face="monospace, monospace"> code =
pcre2_compile_32(pattern_ucs, pattern.size(),</font></div>
<div><font face="monospace, monospace">
PCRE2_NO_UTF_CHECK | flags, &error_code,</font></div>
<div><font face="monospace, monospace">
&error_offset, 0);</font></div>
</div>
<div><br>
</div>
<div>To:</div>
<div><br>
</div>
<div>
<div><font face="monospace, monospace"> code =
pcre2_compile_32(pattern_ucs, pattern.size(),</font></div>
<div><font face="monospace, monospace">
<b>PCRE2_UTF | </b></font><span
style="color:rgb(0,0,90)"><b>PCRE2_UCP</b></span><span
style="font-family:monospace,monospace"> | flags,
&error_code,</span></div>
<div><font face="monospace, monospace">
&error_offset, 0);</font></div>
</div>
<div><br>
</div>
<div>Without <b>PCRE2_UTF</b>, proper Unicode semantics are
not applied (for example, case-insensitive matching of
non-ASCII characters).</div>
<div><br>
</div>
<div><b>PCRE2_UCP</b> is a little less obvious. I think it
makes sense to enable it, since we care more about
correctness than performance. Here's what the documentation
has to say about it:</div>
<div><br>
</div>
<blockquote style="margin:0px 0px 0px
40px;border:none;padding:0px">
<div><i>“This option changes the way PCRE2 processes \B, \b,
\D, \d, \S, \s, \W, \w, and some of the POSIX character
classes. By default, only ASCII characters are recognized,
but if PCRE2_UCP is set, Unicode properties are used
instead to classify characters. More details are given in
the section on generic character types in the pcre2pattern
page. If you set PCRE2_UCP, matching one of the items it
affects takes much longer.”</i></div>
</blockquote>
<div><br>
</div>
<div>Finally, I don't think it makes sense to use <span
style="font-family:monospace,monospace"><b>PCRE2_NO_UTF_CHECK</b></span>:
at best it's a no-op (since we're using UTF-32), and at worst
it can cause a crash when trying to match an invalid string.
That's not worth the small performance gain.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Elias</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9 October 2017 at 11:12, Elias
Mårtenson <span dir="ltr"><<a
href="mailto:***@gmail.com" target="_blank"
moz-do-not-send="true">***@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">I found another bug. ↓ is used to indicate
that string indexes are requested, but the error message
when multiple output types are requested is wrong:
<div><br>
</div>
<div>
<div><font face="monospace, monospace"><b> "foo"
⎕RE["⊂↓"] "bar"</b></font></div>
<div><font face="monospace, monospace" color="#ff0000">DOMAIN
ERROR+</font></div>
<div><font face="monospace, monospace" color="#ff0000">
'foo' ⎕RE['⊂↓']'bar'</font></div>
<div><font face="monospace, monospace"><font
color="#ff0000"> ^ </font> ^</font></div>
<div><font face="monospace, monospace"><b> )more</b></font></div>
<div><font face="monospace, monospace" color="#000000">Multiple
⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/</font></div>
<div><br>
</div>
</div>
<div>Note the ⍳ in the error message instead of ↓.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Elias</div>
</div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9 October 2017 at 10:45,
Elias Mårtenson <span dir="ltr"><<a
href="mailto:***@gmail.com" target="_blank"
moz-do-not-send="true">***@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">I fixed the problem by adding a <font
face="monospace, monospace">static_cast<PCRE2_SIZE>(len)</font>,
but I found another issue: The testcases file is
missing.
<div><br>
</div>
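<div>For reference, a minimal sketch of that kind of cast. It
assumes the loop offset is an unsigned type (like PCRE2_SIZE)
and the length is signed. The names <font face="monospace,
monospace">Offset</font>, <font face="monospace,
monospace">Length</font> and <font face="monospace,
monospace">count_steps</font> are hypothetical stand-ins, not
code from Quad_RE.cc:</div>
<pre>
```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the types involved (assumptions, not
// the real declarations from Quad_RE.cc).
typedef std::size_t  Offset;   // unsigned, like PCRE2_SIZE
typedef std::int64_t Length;   // signed length type

// Counts how many times a loop advances from offset 0 to len in
// increments of step, comparing unsigned with unsigned so that
// -Wsign-compare (and therefore -Werror) stays quiet.
int count_steps(Length len, Offset step)
{
    // A negative len would wrap to a huge unsigned value when cast,
    // so reject it (and a zero step) before casting.
    if (len <= 0 || step == 0)
        return 0;

    int steps = 0;
    for (Offset offset = 0; offset < static_cast<Offset>(len); offset += step)
        ++steps;
    return steps;
}
```
</pre>
<div>With these assumptions, <font face="monospace,
monospace">count_steps(10, 3)</font> visits offsets 0, 3, 6 and 9.
The cast is only safe because the negative case is checked
first.</div>
<div><br>
</div>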
<div>Regards,</div>
<div>Elias</div>
</div>
<div class="m_-8255613701604448095HOEnZb">
<div class="m_-8255613701604448095h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9 October 2017
at 10:41, Elias Mårtenson <span dir="ltr"><<a
href="mailto:***@gmail.com"
target="_blank" moz-do-not-send="true">***@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">Thank you.
<div><br>
</div>
<div>There are some errors when
compiling on my Arch system:</div>
<div><br>
</div>
<div>
<div><font face="monospace,
monospace">g++ -DHAVE_CONFIG_H
-I. -I.. -Wall -I sql
-Wold-style-cast -Werror
-I/usr/include -I/usr/include
-rdynamic -g -O2 -MT
apl-Quad_RE.o -MD -MP -MF
.deps/apl-Quad_RE.Tpo -c -o
apl-Quad_RE.o `test -f
'Quad_RE.cc' || echo
'./'`Quad_RE.cc</font></div>
<div><font face="monospace,
monospace">Quad_RE.cc: In static
member function ‘static Value_P
Quad_RE::partition_result(const
Regexp&, const
Quad_RE::Flags&, const
UCS_string&)’:</font></div>
<div><font face="monospace,
monospace">Quad_RE.cc:211:42:
error: comparison between signed
and unsigned integer expressions
[-Werror=sign-compare]</font></div>
<div><font face="monospace,
monospace"> for (ShapeItem
match_id = 1; B_offset < len;
match_id += match_id_inc)</font></div>
<div><font face="monospace,
monospace">
~~~~~~~~~^~~~~</font></div>
<div><font face="monospace,
monospace">cc1plus: all warnings
being treated as errors</font></div>
<div><font face="monospace,
monospace">make[3]: ***
[Makefile:2725: apl-Quad_RE.o]
Error 1</font></div>
<div><font face="monospace,
monospace">make[3]: Leaving
directory
'/home/emartenson/src/apl/src'</font></div>
<div><font face="monospace,
monospace">make[2]: ***
[Makefile:3333: all-recursive]
Error 1</font></div>
<div><font face="monospace,
monospace">make[2]: Leaving
directory
'/home/emartenson/src/apl/src'</font></div>
<div><font face="monospace,
monospace">make[1]: ***
[Makefile:514: all-recursive]
Error 1</font></div>
<div><font face="monospace,
monospace">make[1]: Leaving
directory
'/home/emartenson/src/apl'</font></div>
<div><font face="monospace,
monospace">make: ***
[Makefile:401: all] Error 2</font></div>
</div>
<div><br>
</div>
<div>Regards,</div>
<div>Elias</div>
</div>
<div
class="m_-8255613701604448095m_-6718572597318133427HOEnZb">
<div
class="m_-8255613701604448095m_-6718572597318133427h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On 9
October 2017 at 00:47, Juergen
Sauermann <span dir="ltr"><<a
href="mailto:***@t-online.de" target="_blank"
moz-do-not-send="true">***@t-online.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> <font
face="Helvetica, Arial,
sans-serif">Hi,<br>
<br>
I have merged Elias' <b>⎕RE</b>
implementation into GNU
APL.<br>
Thanks, Elias, for
contributing it. See <b>'info
apl'</b> for a
description<br>
and <b>src/testcases/Quad_RE.tc</b>
for examples of how to use
<b>⎕RE</b>.<br>
<br>
<b>SVN 1012</b>.<br>
<br>
Enjoy,<br>
/// Jürgen<br>
<br>
</font> </div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>