Commit 2a590514 authored by Sergei Golubchik

pcre-8.35

parents c9c9f513 8cc5973f
...@@ -8,7 +8,7 @@ Email domain: cam.ac.uk ...@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
All rights reserved All rights reserved
...@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg ...@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2010-2013 Zoltan Herczeg Copyright(c) 2010-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
...@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg ...@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2009-2013 Zoltan Herczeg Copyright(c) 2009-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
......
ChangeLog for PCRE ChangeLog for PCRE
------------------ ------------------
Version 8.35 04-April-2014
--------------------------
1. A new flag is set, when property checks are present in an XCLASS.
When this flag is not set, PCRE can perform certain optimizations
such as studying these XCLASS-es.
2. The auto-possessification of character sets was improved: a normal
and an extended character set can be compared now. Furthermore,
the JIT compiler optimizes more character set checks.
3. Got rid of some compiler warnings for potentially uninitialized variables
that show up only when compiled with -O2.
4. A pattern such as (?=ab\K) that uses \K in an assertion can set the start
of a match later than the end of the match. The pcretest program was not
handling the case sensibly - it was outputting from the start to the next
binary zero. It now reports this situation in a message, and outputs the
text from the end to the start.
5. Fast forward search is improved in JIT. Instead of the first three
characters, any three characters with fixed position can be searched.
Search order: first, last, middle.
6. Improve character range checks in JIT. Characters are read by an imprecise
function now, which returns with an unknown value if the character code is
above a certain threshold (e.g. 256). The only limitation is that the value
must be bigger than the threshold as well. This function is useful when
the characters above the threshold are handled in the same way.
7. The macros whose names start with RAWUCHAR are placeholders for a future
mode in which only the bottom 21 bits of 32-bit data items are used. To
make this more memorable for those maintaining the code, the names have
been changed to start with UCHAR21, and an extensive comment has been added
to their definition.
8. Add missing (new) files sljitNativeTILEGX.c and sljitNativeTILEGX-encoder.c
to the export list in Makefile.am (they were accidentally omitted from the
8.34 tarball).
9. The informational output from pcretest used the phrase "starting byte set"
which is inappropriate for the 16-bit and 32-bit libraries. As the output
for "first char" and "need char" really means "non-UTF-char", I've changed
"byte" to "char", and slightly reworded the output. The documentation about
these values has also been (I hope) clarified.
10. Another JIT related optimization: use table jumps for selecting the correct
backtracking path, when more than four alternatives are present inside a
bracket.
11. Empty match is not possible, when the minimum length is greater than zero,
and there is no \K in the pattern. JIT should avoid empty match checks in
such cases.
12. In a caseless character class with UCP support, when a character with more
than one alternative case was not the first character of a range, not all
the alternative cases were added to the class. For example, s and \x{17f}
are both alternative cases for S: the class [RST] was handled correctly,
but [R-T] was not.
13. The configure.ac file always checked for pthread support when JIT was
enabled. This is not used in Windows, so I have put this test inside a
check for the presence of windows.h (which was already tested for).
14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT.
The algorithm provides a way to skip certain starting offsets, and is usually
faster than linear prefix searches.
15. Change 13 for 8.20 updated RunTest to check for the 'fr' locale as well
as for 'fr_FR' and 'french'. For some reason, however, it then used the
Windows-specific input and output files, which have 'french' screwed in.
So this could never have worked. One of the problems with locales is that
they aren't always the same. I have now updated RunTest so that it checks
the output of the locale test (test 3) against three different output
files, and it allows the test to pass if any one of them matches. With luck
this should make the test pass on some versions of Solaris where it was
failing. Because of the uncertainty, the script did not previously stop if
test 3 failed; it now does. If further versions of a French locale ever
come to light, they can now easily be added.
16. If --with-pcregrep-bufsize was given a non-integer value such as "50K",
there was a message during ./configure, but it did not stop. This now
provokes an error. The invalid example in README has been corrected.
If a value less than the minimum is given, the minimum value has always
been used, but now a warning is given.
17. If --enable-bsr-anycrlf was set, the special 16/32-bit test failed. This
was a bug in the test system, which is now fixed. Also, the list of various
configurations that are tested for each release did not have one with both
16/32 bits and --enable-bsr-anycrlf. It now does.
18. pcretest was missing "-C bsr" for displaying the \R default setting.
19. Little endian PowerPC systems are supported now by the JIT compiler.
20. The fast forward newline mechanism could enter an infinite loop on
certain invalid UTF-8 input. Although we don't support these cases,
this issue can be fixed by a performance optimization.
21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does
not take account of existing stack usage. There is now a new global
variable called pcre_stack_guard that can be set to point to an external
function to check stack availability. It is called at the start of
processing every parenthesized group.
22. A typo in the code meant that in ungreedy mode the max/min qualifier
behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did
not match "ab".
23. When UTF was disabled, the JIT program reported some incorrect compile
errors. These messages are silenced now.
24. Experimental support for ARM-64 and MIPS-64 has been added to the JIT
compiler.
25. Change all the temporary files used in RunGrepTest to be different to those
used by RunTest so that the tests can be run simultaneously, for example by
"make -j check".
Version 8.34 15-December-2013 Version 8.34 15-December-2013
----------------------------- -----------------------------
......
...@@ -12,8 +12,8 @@ without warranty of any kind. ...@@ -12,8 +12,8 @@ without warranty of any kind.
Basic Installation Basic Installation
================== ==================
Briefly, the shell commands `./configure; make; make install' should Briefly, the shell command `./configure && make && make install'
configure, build, and install this package. The following should configure, build, and install this package. The following
more-detailed instructions are generic; see the `README' file for more-detailed instructions are generic; see the `README' file for
instructions specific to this package. Some packages provide this instructions specific to this package. Some packages provide this
`INSTALL' file but do not implement all of the features documented `INSTALL' file but do not implement all of the features documented
......
...@@ -24,7 +24,7 @@ Email domain: cam.ac.uk ...@@ -24,7 +24,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
All rights reserved. All rights reserved.
...@@ -35,7 +35,7 @@ Written by: Zoltan Herczeg ...@@ -35,7 +35,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2010-2013 Zoltan Herczeg Copyright(c) 2010-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
...@@ -46,7 +46,7 @@ Written by: Zoltan Herczeg ...@@ -46,7 +46,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2009-2013 Zoltan Herczeg Copyright(c) 2009-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
......
News about PCRE releases News about PCRE releases
------------------------ ------------------------
Release 8.35 04-April-2014
--------------------------
There have been performance improvements for classes containing non-ASCII
characters and the "auto-possessification" feature has been extended. Other
minor improvements have been implemented and bugs fixed. There is a new callout
feature to enable applications to do detailed stack checks at compile time, to
avoid running out of stack for deeply nested parentheses. The JIT compiler has
been extended with experimental support for ARM-64, MIPS-64, and PPC-LE.
Release 8.34 15-December-2013 Release 8.34 15-December-2013
----------------------------- -----------------------------
......
...@@ -85,11 +85,12 @@ documentation is supplied in two other forms: ...@@ -85,11 +85,12 @@ documentation is supplied in two other forms:
1. There are files called doc/pcre.txt, doc/pcregrep.txt, and 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
doc/pcretest.txt in the source distribution. The first of these is a doc/pcretest.txt in the source distribution. The first of these is a
concatenation of the text forms of all the section 3 man pages except concatenation of the text forms of all the section 3 man pages except
those that summarize individual functions. The other two are the text the listing of pcredemo.c and those that summarize individual functions.
forms of the section 1 man pages for the pcregrep and pcretest commands. The other two are the text forms of the section 1 man pages for the
These text forms are provided for ease of scanning with text editors or pcregrep and pcretest commands. These text forms are provided for ease of
similar tools. They are installed in <prefix>/share/doc/pcre, where scanning with text editors or similar tools. They are installed in
<prefix> is the installation prefix (defaulting to /usr/local). <prefix>/share/doc/pcre, where <prefix> is the installation prefix
(defaulting to /usr/local).
2. A set of files containing all the documentation in HTML form, hyperlinked 2. A set of files containing all the documentation in HTML form, hyperlinked
in various ways, and rooted in a file called index.html, is distributed in in various ways, and rooted in a file called index.html, is distributed in
...@@ -372,12 +373,12 @@ library. They are also documented in the pcrebuild man page. ...@@ -372,12 +373,12 @@ library. They are also documented in the pcrebuild man page.
Of course, the relevant libraries must be installed on your system. Of course, the relevant libraries must be installed on your system.
. The default size of internal buffer used by pcregrep can be set by, for . The default size (in bytes) of the internal buffer used by pcregrep can be
example: set by, for example:
--with-pcregrep-bufsize=50K --with-pcregrep-bufsize=51200
The default value is 20K. The value must be a plain integer. The default is 20480.
. It is possible to compile pcretest so that it links with the libreadline . It is possible to compile pcretest so that it links with the libreadline
or libedit libraries, by specifying, respectively, or libedit libraries, by specifying, respectively,
...@@ -987,4 +988,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx. ...@@ -987,4 +988,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 05 November 2013 Last updated: 17 January 2014
...@@ -31,6 +31,11 @@ ...@@ -31,6 +31,11 @@
# except test 10. Whatever order the arguments are in, the tests are always run # except test 10. Whatever order the arguments are in, the tests are always run
# in numerical order. # in numerical order.
# #
# The special argument "3S" runs test 3, stopping if it fails. Test 3 is the
# locale test, and failure usually means there's an issue with the locale
# rather than a bug in PCRE, so normally subsequent tests are run. "3S" is
# useful when you want to debug or update the test.
#
# Inappropriate tests are automatically skipped (with a comment to say so): for # Inappropriate tests are automatically skipped (with a comment to say so): for
# example, if JIT support is not compiled, test 12 is skipped, whereas if JIT # example, if JIT support is not compiled, test 12 is skipped, whereas if JIT
# support is compiled, test 13 is skipped. # support is compiled, test 13 is skipped.
...@@ -458,8 +463,9 @@ fi ...@@ -458,8 +463,9 @@ fi
# Locale-specific tests, provided that either the "fr_FR" or the "french" # Locale-specific tests, provided that either the "fr_FR" or the "french"
# locale is available. The former is the Unix-like standard; the latter is # locale is available. The former is the Unix-like standard; the latter is
# for Windows. Another possibility is "fr", which needs to be run against # for Windows. Another possibility is "fr". Unfortunately, different versions
# the Windows-specific input and output files. # of the French locale give different outputs for some items. This test passes
# if the output matches any one of the alternative output files.
if [ $do3 = yes ] ; then if [ $do3 = yes ] ; then
locale -a | grep '^fr_FR$' >/dev/null locale -a | grep '^fr_FR$' >/dev/null
...@@ -467,20 +473,28 @@ if [ $do3 = yes ] ; then ...@@ -467,20 +473,28 @@ if [ $do3 = yes ] ; then
locale=fr_FR locale=fr_FR
infile=$testdata/testinput3 infile=$testdata/testinput3
outfile=$testdata/testoutput3 outfile=$testdata/testoutput3
outfile2=$testdata/testoutput3A
outfile3=$testdata/testoutput3B
else else
infile=test3input infile=test3input
outfile=test3output outfile=test3output
outfile2=test3outputA
outfile3=test3outputB
locale -a | grep '^french$' >/dev/null locale -a | grep '^french$' >/dev/null
if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
locale=french locale=french
sed 's/fr_FR/french/' $testdata/testinput3 >test3input sed 's/fr_FR/french/' $testdata/testinput3 >test3input
sed 's/fr_FR/french/' $testdata/testoutput3 >test3output sed 's/fr_FR/french/' $testdata/testoutput3 >test3output
sed 's/fr_FR/french/' $testdata/testoutput3A >test3outputA
sed 's/fr_FR/french/' $testdata/testoutput3B >test3outputB
else else
locale -a | grep '^fr$' >/dev/null locale -a | grep '^fr$' >/dev/null
if [ $? -eq 0 ] ; then if [ $? -eq 0 ] ; then
locale=fr locale=fr
sed 's/fr_FR/fr/' $testdata/wintestinput3 >test3input sed 's/fr_FR/fr/' $testdata/intestinput3 >test3input
sed 's/fr_FR/fr/' $testdata/wintestoutput3 >test3output sed 's/fr_FR/fr/' $testdata/intestoutput3 >test3output
sed 's/fr_FR/fr/' $testdata/intestoutput3A >test3outputA
sed 's/fr_FR/fr/' $testdata/intestoutput3B >test3outputB
else else
locale= locale=
fi fi
...@@ -492,18 +506,20 @@ if [ $do3 = yes ] ; then ...@@ -492,18 +506,20 @@ if [ $do3 = yes ] ; then
for opt in "" "-s" $jitopt; do for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $infile testtry $sim $valgrind ./pcretest -q $bmode $opt $infile testtry
if [ $? = 0 ] ; then if [ $? = 0 ] ; then
$cf $outfile testtry if $cf $outfile testtry >teststdout || \
if [ $? != 0 ] ; then $cf $outfile2 testtry >teststdout || \
echo " " $cf $outfile3 testtry >teststdout
echo "Locale test did not run entirely successfully." then
echo "This usually means that there is a problem with the locale"
echo "settings rather than a bug in PCRE."
break;
else
if [ "$opt" = "-s" ] ; then echo " OK with study" if [ "$opt" = "-s" ] ; then echo " OK with study"
elif [ "$opt" = "-s+" ] ; then echo " OK with JIT study" elif [ "$opt" = "-s+" ] ; then echo " OK with JIT study"
else echo " OK" else echo " OK"
fi fi
else
echo "** Locale test did not run successfully. The output did not match"
echo " $outfile, $outfile2 or $outfile3."
echo " This may mean that there is a problem with the locale settings rather"
echo " than a bug in PCRE."
exit 1
fi fi
else exit 1 else exit 1
fi fi
...@@ -989,6 +1005,6 @@ fi ...@@ -989,6 +1005,6 @@ fi
done done
# Clean up local working files # Clean up local working files
rm -f test3input test3output testNinput testsaved* teststderr teststdout testtry rm -f test3input test3output test3outputA testNinput testsaved* teststderr teststdout testtry
# End # End
...@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might ...@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty. dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8]) m4_define(pcre_major, [8])
m4_define(pcre_minor, [34]) m4_define(pcre_minor, [35])
m4_define(pcre_prerelease, []) m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2013-12-15]) m4_define(pcre_date, [2014-04-04])
# NOTE: The CMakeLists.txt file searches for the above variables in the first # NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved. # 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age) # Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:2:2]) m4_define(libpcre_version, [3:3:2])
m4_define(libpcre16_version, [2:2:2]) m4_define(libpcre16_version, [2:3:2])
m4_define(libpcre32_version, [0:2:0]) m4_define(libpcre32_version, [0:3:0])
m4_define(libpcreposix_version, [0:2:0]) m4_define(libpcreposix_version, [0:2:0])
m4_define(libpcrecpp_version, [0:0:0]) m4_define(libpcrecpp_version, [0:0:0])
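The hunk above increments only the middle field of three of the libtool version triples (for example 3:2:2 becomes 3:3:2). As a reading aid, here is a minimal sketch of libtool's documented current:revision:age update convention. The type and function names (`lt_version`, `lt_change`, `lt_bump`) are hypothetical and are not part of PCRE or libtool.

```c
#include <assert.h>

/* Hypothetical model of a libtool "current:revision:age" triple. */
struct lt_version { int current, revision, age; };

/* The kinds of change that libtool's convention distinguishes. */
enum lt_change {
    LT_CODE_ONLY,                /* implementation changed, interfaces untouched */
    LT_IFACE_ADDED,              /* interfaces added, none removed or changed */
    LT_IFACE_REMOVED_OR_CHANGED  /* interfaces removed or changed incompatibly */
};

/* Returns the triple that the convention prescribes after one release. */
struct lt_version lt_bump(struct lt_version v, enum lt_change change)
{
    assert(v.current >= 0 && v.revision >= 0);
    assert(v.age >= 0 && v.age <= v.current);

    switch (change) {
    case LT_CODE_ONLY:
        v.revision++;
        break;
    case LT_IFACE_ADDED:
        v.current++;
        v.revision = 0;
        v.age++;
        break;
    case LT_IFACE_REMOVED_OR_CHANGED:
        v.current++;
        v.revision = 0;
        v.age = 0;
        break;
    }
    return v;
}
```

Under this model, applying `LT_CODE_ONLY` to 3:2:2 yields 3:3:2, which is the same shape of change as the libpcre line in the hunk.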
...@@ -248,7 +248,7 @@ AC_ARG_ENABLE(pcregrep-libbz2, ...@@ -248,7 +248,7 @@ AC_ARG_ENABLE(pcregrep-libbz2,
# Handle --with-pcregrep-bufsize=N # Handle --with-pcregrep-bufsize=N
AC_ARG_WITH(pcregrep-bufsize, AC_ARG_WITH(pcregrep-bufsize,
AS_HELP_STRING([--with-pcregrep-bufsize=N], AS_HELP_STRING([--with-pcregrep-bufsize=N],
[pcregrep buffer size (default=20480)]), [pcregrep buffer size (default=20480, minimum=8192)]),
, with_pcregrep_bufsize=20480) , with_pcregrep_bufsize=20480)
# Handle --enable-pcretest-libedit # Handle --enable-pcretest-libedit
...@@ -461,7 +461,8 @@ sure both macros are undefined; an emulation function will then be used. */]) ...@@ -461,7 +461,8 @@ sure both macros are undefined; an emulation function will then be used. */])
# Checks for header files. # Checks for header files.
AC_HEADER_STDC AC_HEADER_STDC
AC_CHECK_HEADERS(limits.h sys/types.h sys/stat.h dirent.h windows.h) AC_CHECK_HEADERS(limits.h sys/types.h sys/stat.h dirent.h)
AC_CHECK_HEADERS([windows.h], [HAVE_WINDOWS_H=1])
# The files below are C++ header files. # The files below are C++ header files.
pcre_have_type_traits="0" pcre_have_type_traits="0"
...@@ -686,11 +687,15 @@ if test "$enable_pcre32" = "yes"; then ...@@ -686,11 +687,15 @@ if test "$enable_pcre32" = "yes"; then
Define to any value to enable the 32 bit PCRE library.]) Define to any value to enable the 32 bit PCRE library.])
fi fi
# Unless running under Windows, JIT support requires pthreads.
if test "$enable_jit" = "yes"; then if test "$enable_jit" = "yes"; then
if test "$HAVE_WINDOWS_H" != "1"; then
AX_PTHREAD([], [AC_MSG_ERROR([JIT support requires pthreads])]) AX_PTHREAD([], [AC_MSG_ERROR([JIT support requires pthreads])])
CC="$PTHREAD_CC" CC="$PTHREAD_CC"
CFLAGS="$PTHREAD_CFLAGS $CFLAGS" CFLAGS="$PTHREAD_CFLAGS $CFLAGS"
LIBS="$PTHREAD_LIBS $LIBS" LIBS="$PTHREAD_LIBS $LIBS"
fi
AC_DEFINE([SUPPORT_JIT], [], [ AC_DEFINE([SUPPORT_JIT], [], [
Define to any value to enable support for Just-In-Time compiling.]) Define to any value to enable support for Just-In-Time compiling.])
else else
...@@ -739,7 +744,12 @@ if test "$enable_pcregrep_libbz2" = "yes"; then ...@@ -739,7 +744,12 @@ if test "$enable_pcregrep_libbz2" = "yes"; then
fi fi
if test $with_pcregrep_bufsize -lt 8192 ; then if test $with_pcregrep_bufsize -lt 8192 ; then
AC_MSG_WARN([$with_pcregrep_bufsize is too small for --with-pcregrep-bufsize; using 8192])
with_pcregrep_bufsize="8192" with_pcregrep_bufsize="8192"
else
if test $? -gt 1 ; then
AC_MSG_ERROR([Bad value for --with-pcregrep-bufsize])
fi
fi fi
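The shell test above has three outcomes: a value below 8192 is raised to 8192 with a warning, a non-integer such as "50K" makes `test` exit with a status greater than 1 and becomes an error, and anything else is accepted. The same decision can be sketched in C for illustration only. The function `demo_check_bufsize` and the macro `DEMO_BUFSIZE_MIN` are hypothetical names; configure itself does this in shell.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_BUFSIZE_MIN 8192L  /* minimum taken from the configure.ac hunk */

/* Returns the buffer size to use, or -1 if text is not a plain integer.
   *clamped is set to 1 when a too-small value was raised to the minimum,
   which is the case where configure now prints a warning. */
long demo_check_bufsize(const char *text, int *clamped)
{
    char *end;
    long value;

    assert(text != NULL && clamped != NULL);
    *clamped = 0;

    errno = 0;
    value = strtol(text, &end, 10);

    /* Reject empty input, trailing junk such as the "K" in "50K",
       and values that do not fit in a long. */
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;

    if (value < DEMO_BUFSIZE_MIN) {
        *clamped = 1;
        return DEMO_BUFSIZE_MIN;
    }
    return value;
}
```

With this sketch, "51200" is accepted unchanged, "100" comes back as 8192 with the clamped flag set, and "50K" is rejected.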
AC_DEFINE_UNQUOTED([PCREGREP_BUFSIZE], [$with_pcregrep_bufsize], [ AC_DEFINE_UNQUOTED([PCREGREP_BUFSIZE], [$with_pcregrep_bufsize], [
......
...@@ -154,8 +154,11 @@ page. ...@@ -154,8 +154,11 @@ page.
The user documentation for PCRE comprises a number of different sections. In The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format, the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format, each is a separate page, linked from the index page. In the plain text format,
all the sections, except the <b>pcredemo</b> section, are concatenated, for ease the descriptions of the <b>pcregrep</b> and <b>pcretest</b> programs are in files
of searching. The sections are as follows: called <b>pcregrep.txt</b> and <b>pcretest.txt</b>, respectively. The remaining
sections, except for the <b>pcredemo</b> section (which is a program listing),
are concatenated in <b>pcre.txt</b>, for ease of searching. The sections are as
follows:
<pre> <pre>
pcre this document pcre this document
pcre-config show PCRE installation configuration information pcre-config show PCRE installation configuration information
...@@ -182,8 +185,8 @@ of searching. The sections are as follows: ...@@ -182,8 +185,8 @@ of searching. The sections are as follows:
pcretest description of the <b>pcretest</b> testing command pcretest description of the <b>pcretest</b> testing command
pcreunicode discussion of Unicode and UTF-8/16/32 support pcreunicode discussion of Unicode and UTF-8/16/32 support
</pre> </pre>
In addition, in the "man" and HTML formats, there is a short page for each In the "man" and HTML formats, there is also a short page for each C library
C library function, listing its arguments and results. function, listing its arguments and results.
</P> </P>
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br> <br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
<P> <P>
...@@ -201,9 +204,9 @@ two digits 10, at the domain cam.ac.uk. ...@@ -201,9 +204,9 @@ two digits 10, at the domain cam.ac.uk.
</P> </P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br> <br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 13 May 2013 Last updated: 08 January 2014
<br> <br>
Copyright &copy; 1997-2013 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
...@@ -166,6 +166,9 @@ man page, in case the conversion went wrong. ...@@ -166,6 +166,9 @@ man page, in case the conversion went wrong.
<br> <br>
<br> <br>
<b>int (*pcre_callout)(pcre_callout_block *);</b> <b>int (*pcre_callout)(pcre_callout_block *);</b>
<br>
<br>
<b>int (*pcre_stack_guard)(void);</b>
</P> </P>
<br><a name="SEC5" href="#TOC1">PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br> <br><a name="SEC5" href="#TOC1">PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
<P> <P>
...@@ -324,6 +327,15 @@ by the caller to a "callout" function, which PCRE will then call at specified ...@@ -324,6 +327,15 @@ by the caller to a "callout" function, which PCRE will then call at specified
points during a matching operation. Details are given in the points during a matching operation. Details are given in the
<a href="pcrecallout.html"><b>pcrecallout</b></a> <a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation. documentation.
</P>
<P>
The global variable <b>pcre_stack_guard</b> initially contains NULL. It can be
set by the caller to a function that is called by PCRE whenever it starts
to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
uses recursive function calls, which use up the system stack. This function is
provided so that applications with restricted stacks can force a compilation
error if the stack runs out. The function should return zero if all is well, or
non-zero to force an error.
<a name="newlines"></a></P> <a name="newlines"></a></P>
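The contract described above (a function pointer that starts as NULL, is called at the start of each parenthesized group, and returns non-zero to force an error) can be illustrated with a small self-contained mock. This is a sketch only and is not PCRE's compiler. Every `demo_` name is hypothetical, the walker ignores escapes and unbalanced parentheses, and the guard counts nesting depth where a real guard would more likely measure actual stack usage. The value 85 is the "stack check" error number listed in this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pcre_stack_guard: NULL means "no check". */
int (*demo_stack_guard)(void) = NULL;

int demo_depth = 0;  /* nesting depth of the group about to be processed */
int demo_limit = 3;  /* deepest nesting that the example guard allows */

/* Example guard: zero if all is well, non-zero to force an error. */
int demo_guard(void)
{
    return demo_depth > demo_limit;
}

/* Walks the pattern recursively, one call per parenthesized group.
   Returns 0 on success, or 85 if the guard refuses a group. */
int demo_walk(const char **pp, int depth)
{
    while (**pp != '\0' && **pp != ')') {
        char c = *(*pp)++;
        if (c == '(') {
            int rc;
            demo_depth = depth + 1;
            /* The guard runs at the start of every group, before recursing. */
            if (demo_stack_guard != NULL && demo_stack_guard() != 0)
                return 85;
            rc = demo_walk(pp, depth + 1);
            if (rc != 0)
                return rc;
            if (**pp == ')')
                (*pp)++;  /* consume the group's closing parenthesis */
        }
    }
    return 0;
}

/* Entry point of the mock "compiler". */
int demo_compile(const char *pattern)
{
    const char *p = pattern;
    assert(pattern != NULL);
    return demo_walk(&p, 0);
}
```

With `demo_stack_guard` left as NULL, any nesting depth is accepted. After `demo_stack_guard = demo_guard;`, a pattern nested four deep such as `((((a))))` is rejected with 85, while `((a)(b))` still succeeds.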
<br><a name="SEC7" href="#TOC1">NEWLINES</a><br> <br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
<P> <P>
...@@ -369,7 +381,8 @@ controlled in a similar way, but by separate options. ...@@ -369,7 +381,8 @@ controlled in a similar way, but by separate options.
The PCRE functions can be used in multi-threading applications, with the The PCRE functions can be used in multi-threading applications, with the
proviso that the memory management functions pointed to by <b>pcre_malloc</b>, proviso that the memory management functions pointed to by <b>pcre_malloc</b>,
<b>pcre_free</b>, <b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the <b>pcre_free</b>, <b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the
callout function pointed to by <b>pcre_callout</b>, are shared by all threads. callout and stack-checking functions pointed to by <b>pcre_callout</b> and
<b>pcre_stack_guard</b>, are shared by all threads.
</P> </P>
<P> <P>
The compiled form of a regular expression is not altered during matching, so The compiled form of a regular expression is not altered during matching, so
...@@ -489,7 +502,10 @@ documentation. ...@@ -489,7 +502,10 @@ documentation.
The output is a long integer that gives the maximum depth of nesting of The output is a long integer that gives the maximum depth of nesting of
parentheses (of any kind) in a pattern. This limit is imposed to cap the amount parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
of system stack used when a pattern is compiled. It is specified when PCRE is of system stack used when a pattern is compiled. It is specified when PCRE is
built; the default is 250. built; the default is 250. This limit does not take into account the stack that
may already be used by the calling application. For finer control over
compilation stack usage, you can set a pointer to an external checking function
in <b>pcre_stack_guard</b>.
<pre> <pre>
PCRE_CONFIG_MATCH_LIMIT PCRE_CONFIG_MATCH_LIMIT
</pre> </pre>
...@@ -1008,6 +1024,8 @@ have fallen out of use. To avoid confusion, they have not been re-used. ...@@ -1008,6 +1024,8 @@ have fallen out of use. To avoid confusion, they have not been re-used.
81 missing opening brace after \o 81 missing opening brace after \o
82 parentheses are too deeply nested 82 parentheses are too deeply nested
83 invalid range in character class 83 invalid range in character class
84 group name must start with a non-digit
85 parentheses are too deeply nested (stack check)
</pre> </pre>
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built. be used if the limits were changed when PCRE was built.
...@@ -1265,12 +1283,15 @@ information call is provided for internal use by the <b>pcre_study()</b> ...@@ -1265,12 +1283,15 @@ information call is provided for internal use by the <b>pcre_study()</b>
function. External callers can cause PCRE to use its internal tables by passing function. External callers can cause PCRE to use its internal tables by passing
a NULL table pointer. a NULL table pointer.
<pre> <pre>
PCRE_INFO_FIRSTBYTE PCRE_INFO_FIRSTBYTE (deprecated)
</pre> </pre>
Return information about the first data unit of any matched string, for a Return information about the first data unit of any matched string, for a
non-anchored pattern. (The name of this option refers to the 8-bit library, non-anchored pattern. The name of this option refers to the 8-bit library,
where data units are bytes.) The fourth argument should point to an <b>int</b> where data units are bytes. The fourth argument should point to an <b>int</b>
variable. variable. Negative values are used for special cases. However, this means that
when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of
characters cannot be returned. For this reason, this value is deprecated; use
PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead.
</P> </P>
<P> <P>
If there is a fixed first value, for example, the letter "c" from a pattern If there is a fixed first value, for example, the letter "c" from a pattern
...@@ -1293,12 +1314,43 @@ starts with "^", or ...@@ -1293,12 +1314,43 @@ starts with "^", or
-1 is returned, indicating that the pattern matches only at the start of a -1 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise -2 is subject string or after any newline within the string. Otherwise -2 is
returned. For anchored patterns, -2 is returned. returned. For anchored patterns, -2 is returned.
<pre>
PCRE_INFO_FIRSTCHARACTER
</pre>
Return the value of the first data unit (non-UTF character) of any matched
string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1;
otherwise return 0. The fourth argument should point to an <b>uint_t</b>
variable.
</P> </P>
<P> <P>
Since for the 32-bit library using the non-UTF-32 mode, this function is unable In the 8-bit library, the value is always less than 256. In the 16-bit library
to return the full 32-bit range of the character, this value is deprecated; the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
instead the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
should be used. <pre>
PCRE_INFO_FIRSTCHARACTERFLAGS
</pre>
Return information about the first data unit of any matched string, for a
non-anchored pattern. The fourth argument should point to an <b>int</b>
variable.
</P>
<P>
If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), 1 is returned, and the character value can be
retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and
if either
<br>
<br>
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
starts with "^", or
<br>
<br>
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
(if it were set, the pattern would be anchored),
<br>
<br>
2 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
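The three result values described above (0, 1, and 2) can be interpreted with a small helper. The following is only a sketch: the enum names and the function `describe_first_char` are hypothetical and not part of PCRE. It assumes the caller has already obtained the two values by calling `pcre_fullinfo()` with PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER.

```c
#include <stdio.h>

/* Meanings of the int stored by PCRE_INFO_FIRSTCHARACTERFLAGS, as
   described in the documentation text. The enum names are hypothetical
   and are not defined by PCRE. */
enum first_char_kind {
    FIRST_NONE = 0,       /* no information, or the pattern is anchored */
    FIRST_FIXED = 1,      /* fixed first data unit is available */
    FIRST_LINE_START = 2  /* matches only at subject start or after a newline */
};

/* Write a one-line description of the flags into buf.
   first_char is only meaningful when flags is FIRST_FIXED.
   Returns whatever snprintf returns. */
int describe_first_char(int flags, unsigned long first_char,
                        char *buf, size_t size)
{
    switch (flags) {
    case FIRST_FIXED:
        return snprintf(buf, size, "fixed first data unit 0x%lx", first_char);
    case FIRST_LINE_START:
        return snprintf(buf, size,
                        "matches at start of subject or after a newline");
    case FIRST_NONE:
        return snprintf(buf, size, "no first-character information");
    default:
        return snprintf(buf, size, "unexpected flags value %d", flags);
    }
}
```

Keeping the flags and the character value separate is what lets the 32-bit library report the full range of data unit values, which the single signed result of PCRE_INFO_FIRSTBYTE could not.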
<pre> <pre>
PCRE_INFO_FIRSTTABLE PCRE_INFO_FIRSTTABLE
</pre> </pre>
...@@ -1508,44 +1560,6 @@ above). The format of the <i>study_data</i> block is private, but its length ...@@ -1508,44 +1560,6 @@ above). The format of the <i>study_data</i> block is private, but its length
is made available via this option so that it can be saved and restored (see the is made available via this option so that it can be saved and restored (see the
<a href="pcreprecompile.html"><b>pcreprecompile</b></a> <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
documentation for details). documentation for details).
<pre>
PCRE_INFO_FIRSTCHARACTERFLAGS
</pre>
Return information about the first data unit of any matched string, for a
non-anchored pattern. The fourth argument should point to an <b>int</b>
variable.
</P>
<P>
If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), 1 is returned, and the character value can be
retrieved using PCRE_INFO_FIRSTCHARACTER.
</P>
<P>
If there is no fixed first value, and if either
<br>
<br>
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
starts with "^", or
<br>
<br>
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
(if it were set, the pattern would be anchored),
<br>
<br>
2 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
<pre>
PCRE_INFO_FIRSTCHARACTER
</pre>
Return the fixed first character value in the situation where
PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
argument should point to an <b>uint_t</b> variable.
</P>
<P>
In the 8-bit library, the value is always less than 256. In the 16-bit library
the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
<pre> <pre>
PCRE_INFO_REQUIREDCHARFLAGS PCRE_INFO_REQUIREDCHARFLAGS
</pre> </pre>
...@@ -2899,9 +2913,9 @@ Cambridge CB2 3QH, England. ...@@ -2899,9 +2913,9 @@ Cambridge CB2 3QH, England.
</P> </P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br> <br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 12 November 2013 Last updated: 09 February 2014
<br> <br>
Copyright &copy; 1997-2013 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
...@@ -37,8 +37,10 @@ man page, in case the conversion went wrong. ...@@ -37,8 +37,10 @@ man page, in case the conversion went wrong.
<b>pcregrep</b> searches files for character patterns, in the same way as other <b>pcregrep</b> searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See patterns that are compatible with the regular expressions of Perl 5. See
<a href="pcresyntax.html"><b>pcresyntax</b>(3)</a>
for a quick-reference summary of pattern syntax, or
<a href="pcrepattern.html"><b>pcrepattern</b>(3)</a> <a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
for a full description of syntax and semantics of the regular expressions for a full description of the syntax and semantics of the regular expressions
that PCRE supports. that PCRE supports.
</P> </P>
<P> <P>
...@@ -748,9 +750,9 @@ Cambridge CB2 3QH, England. ...@@ -748,9 +750,9 @@ Cambridge CB2 3QH, England.
</P> </P>
<br><a name="SEC14" href="#TOC1">REVISION</a><br> <br><a name="SEC14" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 13 September 2012 Last updated: 03 April 2014
<br> <br>
Copyright &copy; 1997-2012 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
...@@ -1003,7 +1003,9 @@ matches "foobar", the first substring is still set to "foo". ...@@ -1003,7 +1003,9 @@ matches "foobar", the first substring is still set to "foo".
<P> <P>
Perl documents that the use of \K within assertions is "not well defined". In Perl documents that the use of \K within assertions is "not well defined". In
PCRE, \K is acted upon when it occurs inside positive assertions, but is PCRE, \K is acted upon when it occurs inside positive assertions, but is
ignored in negative assertions. ignored in negative assertions. Note that when a pattern such as (?=ab\K)
matches, the reported start of the match can be greater than the end of the
match.
<a name="smallassertions"></a></P> <a name="smallassertions"></a></P>
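Code that extracts the matched text from the returned offsets may need to allow for a start that is greater than the end. The helper below is a minimal sketch of one defensive approach; `normalize_match_span` is a hypothetical name, not a PCRE function, and treating the inverted case as an empty span is an assumption made for this illustration rather than documented PCRE behaviour.

```c
/* Hypothetical helper, not part of PCRE.
   start and end are the first two offsets of the output vector (the
   start and end of the whole match). On return, *from and *length
   describe a span that is always safe to copy or print.
   Returns 1 if the start was beyond the end, otherwise 0. */
int normalize_match_span(int start, int end, int *from, int *length)
{
    if (start > end) {
        /* \K inside a positive assertion moved the start past the end.
           Assumption for this sketch: report an empty span at start. */
        *from = start;
        *length = 0;
        return 1;
    }
    *from = start;
    *length = end - start;
    return 0;
}
```

With (?=ab\K) applied to the subject "ab", one would expect a start offset of 2 and an end offset of 0, so the helper returns 1 and a length of 0 instead of a negative length.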
<br><b> <br><b>
Simple assertions Simple assertions
...@@ -2990,19 +2992,22 @@ match does not always guarantee that a match must be at this starting point. ...@@ -2990,19 +2992,22 @@ match does not always guarantee that a match must be at this starting point.
<P> <P>
Note that (*COMMIT) at the start of a pattern is not the same as an anchor, Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
unless PCRE's start-of-match optimizations are turned off, as shown in this unless PCRE's start-of-match optimizations are turned off, as shown in this
<b>pcretest</b> example: output from <b>pcretest</b>:
<pre> <pre>
re&#62; /(*COMMIT)abc/ re&#62; /(*COMMIT)abc/
data&#62; xyzabc data&#62; xyzabc
0: abc 0: abc
xyzabc\Y data&#62; xyzabc\Y
No match No match
</pre> </pre>
PCRE knows that any match must start with "a", so the optimization skips along For this pattern, PCRE knows that any match must start with "a", so the
the subject to "a" before running the first match attempt, which succeeds. When optimization skips along the subject to "a" before applying the pattern to the
the optimization is disabled by the \Y escape in the second subject, the match first set of data. The match attempt then succeeds. In the second set of data,
starts at "x" and so the (*COMMIT) causes it to fail without trying any other the escape sequence \Y is interpreted by the <b>pcretest</b> program. It causes
starting points. the PCRE_NO_START_OPTIMIZE option to be set when <b>pcre_exec()</b> is called.
This disables the optimization that skips along to the first character. The
pattern is now applied starting at "x", and so the (*COMMIT) causes the match
to fail without trying any other starting points.
<pre> <pre>
(*PRUNE) or (*PRUNE:NAME) (*PRUNE) or (*PRUNE:NAME)
</pre> </pre>
...@@ -3221,9 +3226,9 @@ Cambridge CB2 3QH, England. ...@@ -3221,9 +3226,9 @@ Cambridge CB2 3QH, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 03 December 2013 Last updated: 08 January 2014
<br> <br>
Copyright &copy; 1997-2013 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
...@@ -29,13 +29,13 @@ man page, in case the conversion went wrong. ...@@ -29,13 +29,13 @@ man page, in case the conversion went wrong.
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a> <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a> <li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a> <li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a> <li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a> <li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a> <li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a> <li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a> <li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a> <li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a> <li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a> <li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a> <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a name="TOC26" href="#SEC26">AUTHOR</a> <li><a name="TOC26" href="#SEC26">AUTHOR</a>
...@@ -339,7 +339,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use ...@@ -339,7 +339,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
<P> <P>
<pre> <pre>
\K reset start of match \K reset start of match
</PRE> </pre>
\K is honoured in positive assertions, but ignored in negative ones.
</P> </P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br> <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P> <P>
...@@ -382,11 +383,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use ...@@ -382,11 +383,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
(?x) extended (ignore white space) (?x) extended (ignore white space)
(?-...) unset option(s) (?-...) unset option(s)
</pre> </pre>
The following are recognized only at the start of a pattern or after one of the The following are recognized only at the very start of a pattern or after one
newline-setting options with similar syntax: of the newline or \R options with similar syntax. More than one of them may
appear.
<pre> <pre>
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
(*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
...@@ -397,7 +400,28 @@ newline-setting options with similar syntax: ...@@ -397,7 +400,28 @@ newline-setting options with similar syntax:
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
limits set by the caller of pcre_exec(), not increase them. limits set by the caller of pcre_exec(), not increase them.
</P> </P>
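The rule that a pattern can lower a limit but never raise it amounts to taking the smaller of the two values. The sketch below shows that arithmetic only; `effective_limit` is a hypothetical helper and does not reflect how PCRE implements the check internally.

```c
/* Hypothetical helper, not part of PCRE.
   caller_limit is the limit set by the caller of pcre_exec().
   pattern_limit is the value from (*LIMIT_MATCH=d) or
   (*LIMIT_RECURSION=d); it is ignored unless pattern_has_limit is
   non-zero. The pattern can only reduce the limit. */
unsigned long effective_limit(unsigned long caller_limit,
                              unsigned long pattern_limit,
                              int pattern_has_limit)
{
    if (pattern_has_limit && pattern_limit < caller_limit)
        return pattern_limit;
    return caller_limit;
}
```

For example, a caller limit of 10000000 combined with (*LIMIT_MATCH=5000) gives 5000, whereas a caller limit of 1000 with the same pattern setting stays at 1000.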
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after option
setting with a similar syntax.
<pre>
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P> <P>
<pre> <pre>
(?=...) positive look ahead (?=...) positive look ahead
...@@ -407,7 +431,7 @@ limits set by the caller of pcre_exec(), not increase them. ...@@ -407,7 +431,7 @@ limits set by the caller of pcre_exec(), not increase them.
</pre> </pre>
Each top-level branch of a look behind must be of a fixed length. Each top-level branch of a look behind must be of a fixed length.
</P> </P>
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br> <br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
<P> <P>
<pre> <pre>
\n reference by number (can be ambiguous) \n reference by number (can be ambiguous)
...@@ -421,7 +445,7 @@ Each top-level branch of a look behind must be of a fixed length. ...@@ -421,7 +445,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?P=name) reference by name (Python) (?P=name) reference by name (Python)
</PRE> </PRE>
</P> </P>
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br> <br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P> <P>
<pre> <pre>
(?R) recurse whole pattern (?R) recurse whole pattern
...@@ -440,7 +464,7 @@ Each top-level branch of a look behind must be of a fixed length. ...@@ -440,7 +464,7 @@ Each top-level branch of a look behind must be of a fixed length.
\g'-n' call subpattern by relative number (PCRE extension) \g'-n' call subpattern by relative number (PCRE extension)
</PRE> </PRE>
</P> </P>
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br> <br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P> <P>
<pre> <pre>
(?(condition)yes-pattern) (?(condition)yes-pattern)
...@@ -459,7 +483,7 @@ Each top-level branch of a look behind must be of a fixed length. ...@@ -459,7 +483,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?(assert)... assertion condition (?(assert)... assertion condition
</PRE> </PRE>
</P> </P>
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br> <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P> <P>
The following act immediately they are reached: The following act immediately they are reached:
<pre> <pre>
...@@ -482,27 +506,6 @@ pattern is not anchored. ...@@ -482,27 +506,6 @@ pattern is not anchored.
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
</PRE> </PRE>
</P> </P>
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
<pre>
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
<pre>
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br> <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P> <P>
<pre> <pre>
...@@ -526,9 +529,9 @@ Cambridge CB2 3QH, England. ...@@ -526,9 +529,9 @@ Cambridge CB2 3QH, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 12 November 2013 Last updated: 08 January 2014
<br> <br>
Copyright &copy; 1997-2013 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
...@@ -138,6 +138,9 @@ following options output the value and set the exit code as indicated: ...@@ -138,6 +138,9 @@ following options output the value and set the exit code as indicated:
newline the default newline setting: newline the default newline setting:
CR, LF, CRLF, ANYCRLF, or ANY CR, LF, CRLF, ANYCRLF, or ANY
exit code is always 0 exit code is always 0
bsr the default setting for what \R matches:
ANYCRLF or ANY
exit code is always 0
</pre> </pre>
The following options output 1 for true or 0 for false, and set the exit code The following options output 1 for true or 0 for false, and set the exit code
to the same value: to the same value:
...@@ -373,6 +376,7 @@ sections. ...@@ -373,6 +376,7 @@ sections.
<b>/N</b> set PCRE_NO_AUTO_CAPTURE <b>/N</b> set PCRE_NO_AUTO_CAPTURE
<b>/O</b> set PCRE_NO_AUTO_POSSESS <b>/O</b> set PCRE_NO_AUTO_POSSESS
<b>/P</b> use the POSIX wrapper <b>/P</b> use the POSIX wrapper
<b>/Q</b> test external stack check function
<b>/S</b> study the pattern after compilation <b>/S</b> study the pattern after compilation
<b>/s</b> set PCRE_DOTALL <b>/s</b> set PCRE_DOTALL
<b>/T</b> select character tables <b>/T</b> select character tables
...@@ -534,7 +538,10 @@ below. ...@@ -534,7 +538,10 @@ below.
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
compiled pattern (whether it is anchored, has a fixed first character, and compiled pattern (whether it is anchored, has a fixed first character, and
so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a
pattern. If the pattern is studied, the results of that are also output. pattern. If the pattern is studied, the results of that are also output. In
this output, the word "char" means a non-UTF character, that is, the value of a
single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
being tested).
</P> </P>
<P> <P>
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking
...@@ -568,6 +575,14 @@ successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the ...@@ -568,6 +575,14 @@ successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
JIT compiled code is also output. JIT compiled code is also output.
</P> </P>
<P> <P>
The <b>/Q</b> modifier is used to test the use of <b>pcre_stack_guard</b>. It
must be followed by '0' or '1', specifying the return code to be given from an
external function that is passed to PCRE and used for stack checking during
compilation (see the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for details).
</P>
<P>
The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the
expression has been compiled, and the results used when the expression is expression has been compiled, and the results used when the expression is
matched. There are a number of qualifying characters that may follow <b>/S</b>. matched. There are a number of qualifying characters that may follow <b>/S</b>.
...@@ -1134,9 +1149,9 @@ Cambridge CB2 3QH, England. ...@@ -1134,9 +1149,9 @@ Cambridge CB2 3QH, England.
</P> </P>
<br><a name="SEC17" href="#TOC1">REVISION</a><br> <br><a name="SEC17" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 12 November 2013 Last updated: 09 February 2014
<br> <br>
Copyright &copy; 1997-2013 University of Cambridge. Copyright &copy; 1997-2014 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE index page</a>. Return to the <a href="index.html">PCRE index page</a>.
......
.TH PCRE 3 "01 Oct 2013" "PCRE 8.33" .TH PCRE 3 "08 January 2014" "PCRE 8.35"
.SH NAME .SH NAME
PCRE - Perl-compatible regular expressions PCRE - Perl-compatible regular expressions
.SH INTRODUCTION .SH INTRODUCTION
...@@ -158,8 +158,11 @@ page. ...@@ -158,8 +158,11 @@ page.
The user documentation for PCRE comprises a number of different sections. In The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format, the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format, each is a separate page, linked from the index page. In the plain text format,
all the sections, except the \fBpcredemo\fP section, are concatenated, for ease the descriptions of the \fBpcregrep\fP and \fBpcretest\fP programs are in files
of searching. The sections are as follows: called \fBpcregrep.txt\fP and \fBpcretest.txt\fP, respectively. The remaining
sections, except for the \fBpcredemo\fP section (which is a program listing),
are concatenated in \fBpcre.txt\fP, for ease of searching. The sections are as
follows:
.sp .sp
pcre this document pcre this document
pcre-config show PCRE installation configuration information pcre-config show PCRE installation configuration information
...@@ -188,8 +191,8 @@ of searching. The sections are as follows: ...@@ -188,8 +191,8 @@ of searching. The sections are as follows:
pcretest description of the \fBpcretest\fP testing command pcretest description of the \fBpcretest\fP testing command
pcreunicode discussion of Unicode and UTF-8/16/32 support pcreunicode discussion of Unicode and UTF-8/16/32 support
.sp .sp
In addition, in the "man" and HTML formats, there is a short page for each In the "man" and HTML formats, there is also a short page for each C library
C library function, listing its arguments and results. function, listing its arguments and results.
. .
. .
.SH AUTHOR .SH AUTHOR
...@@ -210,6 +213,6 @@ two digits 10, at the domain cam.ac.uk. ...@@ -210,6 +213,6 @@ two digits 10, at the domain cam.ac.uk.
.rs .rs
.sp .sp
.nf .nf
Last updated: 13 May 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
.TH PCREAPI 3 "12 November 2013" "PCRE 8.34" .TH PCREAPI 3 "09 February 2014" "PCRE 8.35"
.SH NAME .SH NAME
PCRE - Perl-compatible regular expressions PCRE - Perl-compatible regular expressions
.sp .sp
...@@ -116,6 +116,8 @@ PCRE - Perl-compatible regular expressions ...@@ -116,6 +116,8 @@ PCRE - Perl-compatible regular expressions
.B void (*pcre_stack_free)(void *); .B void (*pcre_stack_free)(void *);
.sp .sp
.B int (*pcre_callout)(pcre_callout_block *); .B int (*pcre_callout)(pcre_callout_block *);
.sp
.B int (*pcre_stack_guard)(void);
.fi .fi
. .
. .
...@@ -286,6 +288,14 @@ points during a matching operation. Details are given in the ...@@ -286,6 +288,14 @@ points during a matching operation. Details are given in the
\fBpcrecallout\fP \fBpcrecallout\fP
.\" .\"
documentation. documentation.
.P
The global variable \fBpcre_stack_guard\fP initially contains NULL. It can be
set by the caller to a function that is called by PCRE whenever it starts
to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
uses recursive function calls, which use up the system stack. This function is
provided so that applications with restricted stacks can force a compilation
error if the stack runs out. The function should return zero if all is well, or
non-zero to force an error.
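A stack-checking function of this shape could be sketched as follows. This is an illustration under stated assumptions, not the method PCRE prescribes: the names `guard_init`, `stack_exceeded`, and `my_stack_guard` are hypothetical, the byte budget is arbitrary, and estimating stack use from the addresses of local variables relies on implementation-defined behaviour that happens to work on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t guard_base = 0;          /* address recorded by guard_init */
static size_t guard_budget = 64 * 1024;   /* assumed stack budget in bytes */

/* Record a reference address near the caller's stack position and set
   the budget. Call this before compiling a pattern. */
void guard_init(size_t budget)
{
    char marker;
    guard_base = (uintptr_t)&marker;
    guard_budget = budget;
}

/* Pure comparison: has the distance between base and here exceeded the
   budget? Works whichever direction the stack grows. */
int stack_exceeded(uintptr_t base, uintptr_t here, size_t budget)
{
    uintptr_t used = (base > here) ? base - here : here - base;
    return used > budget;
}

/* Matches the documented signature int (*pcre_stack_guard)(void):
   returns zero if all is well, non-zero to force a compile error. */
int my_stack_guard(void)
{
    char marker;
    return stack_exceeded(guard_base, (uintptr_t)&marker, guard_budget);
}

/* In a program that includes pcre.h, the function would be installed
   with:  pcre_stack_guard = my_stack_guard;  */
```

Because the function pointer is a global shared by all threads, a multi-threaded application would need a per-thread reference address rather than the single static variable used here.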
. .
. .
.\" HTML <a name="newlines"></a> .\" HTML <a name="newlines"></a>
...@@ -337,7 +347,8 @@ controlled in a similar way, but by separate options. ...@@ -337,7 +347,8 @@ controlled in a similar way, but by separate options.
The PCRE functions can be used in multi-threading applications, with the The PCRE functions can be used in multi-threading applications, with the
proviso that the memory management functions pointed to by \fBpcre_malloc\fP, proviso that the memory management functions pointed to by \fBpcre_malloc\fP,
\fBpcre_free\fP, \fBpcre_stack_malloc\fP, and \fBpcre_stack_free\fP, and the \fBpcre_free\fP, \fBpcre_stack_malloc\fP, and \fBpcre_stack_free\fP, and the
callout function pointed to by \fBpcre_callout\fP, are shared by all threads. callout and stack-checking functions pointed to by \fBpcre_callout\fP and
\fBpcre_stack_guard\fP, are shared by all threads.
.P .P
The compiled form of a regular expression is not altered during matching, so The compiled form of a regular expression is not altered during matching, so
the same compiled pattern can safely be used by several threads at once. the same compiled pattern can safely be used by several threads at once.
...@@ -465,7 +476,10 @@ documentation. ...@@ -465,7 +476,10 @@ documentation.
The output is a long integer that gives the maximum depth of nesting of The output is a long integer that gives the maximum depth of nesting of
parentheses (of any kind) in a pattern. This limit is imposed to cap the amount parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
of system stack used when a pattern is compiled. It is specified when PCRE is of system stack used when a pattern is compiled. It is specified when PCRE is
built; the default is 250. built; the default is 250. This limit does not take into account the stack that
may already be used by the calling application. For finer control over
compilation stack usage, you can set a pointer to an external checking function
in \fBpcre_stack_guard\fP.
.sp .sp
PCRE_CONFIG_MATCH_LIMIT PCRE_CONFIG_MATCH_LIMIT
.sp .sp
...@@ -991,6 +1005,8 @@ have fallen out of use. To avoid confusion, they have not been re-used. ...@@ -991,6 +1005,8 @@ have fallen out of use. To avoid confusion, they have not been re-used.
81 missing opening brace after \eo 81 missing opening brace after \eo
82 parentheses are too deeply nested 82 parentheses are too deeply nested
83 invalid range in character class 83 invalid range in character class
84 group name must start with a non-digit
85 parentheses are too deeply nested (stack check)
.sp .sp
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built. be used if the limits were changed when PCRE was built.
...@@ -1248,12 +1264,15 @@ information call is provided for internal use by the \fBpcre_study()\fP ...@@ -1248,12 +1264,15 @@ information call is provided for internal use by the \fBpcre_study()\fP
function. External callers can cause PCRE to use its internal tables by passing function. External callers can cause PCRE to use its internal tables by passing
a NULL table pointer. a NULL table pointer.
.sp .sp
PCRE_INFO_FIRSTBYTE PCRE_INFO_FIRSTBYTE (deprecated)
.sp .sp
Return information about the first data unit of any matched string, for a Return information about the first data unit of any matched string, for a
non-anchored pattern. (The name of this option refers to the 8-bit library, non-anchored pattern. The name of this option refers to the 8-bit library,
where data units are bytes.) The fourth argument should point to an \fBint\fP where data units are bytes. The fourth argument should point to an \fBint\fP
variable. variable. Negative values are used for special cases. However, this means that
when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of
characters cannot be returned. For this reason, this value is deprecated; use
PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead.
.P .P
If there is a fixed first value, for example, the letter "c" from a pattern If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), its value is returned. In the 8-bit library, the such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
...@@ -1271,11 +1290,38 @@ starts with "^", or ...@@ -1271,11 +1290,38 @@ starts with "^", or
-1 is returned, indicating that the pattern matches only at the start of a -1 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise -2 is subject string or after any newline within the string. Otherwise -2 is
returned. For anchored patterns, -2 is returned. returned. For anchored patterns, -2 is returned.
.sp
PCRE_INFO_FIRSTCHARACTER
.sp
Return the value of the first data unit (non-UTF character) of any matched
string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1;
otherwise return 0. The fourth argument should point to an \fBuint_t\fP
variable.
.P .P
Since for the 32-bit library using the non-UTF-32 mode, this function is unable In the 8-bit library, the value is always less than 256. In the 16-bit library
to return the full 32-bit range of the character, this value is deprecated; the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
instead the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
should be used. .sp
PCRE_INFO_FIRSTCHARACTERFLAGS
.sp
Return information about the first data unit of any matched string, for a
non-anchored pattern. The fourth argument should point to an \fBint\fP
variable.
.P
If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), 1 is returned, and the character value can be
retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and
if either
.sp
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
starts with "^", or
.sp
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
(if it were set, the pattern would be anchored),
.sp
2 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
.sp .sp
PCRE_INFO_FIRSTTABLE PCRE_INFO_FIRSTTABLE
.sp .sp
...@@ -1498,38 +1544,6 @@ is made available via this option so that it can be saved and restored (see the ...@@ -1498,38 +1544,6 @@ is made available via this option so that it can be saved and restored (see the
\fBpcreprecompile\fP \fBpcreprecompile\fP
.\" .\"
documentation for details). documentation for details).
.sp
PCRE_INFO_FIRSTCHARACTERFLAGS
.sp
Return information about the first data unit of any matched string, for a
non-anchored pattern. The fourth argument should point to an \fBint\fP
variable.
.P
If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), 1 is returned, and the character value can be
retrieved using PCRE_INFO_FIRSTCHARACTER.
.P
If there is no fixed first value, and if either
.sp
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
starts with "^", or
.sp
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
(if it were set, the pattern would be anchored),
.sp
2 is returned, indicating that the pattern matches only at the start of a
subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
.sp
PCRE_INFO_FIRSTCHARACTER
.sp
Return the fixed first character value in the situation where
PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
argument should point to an \fBuint_t\fP variable.
.P
In the 8-bit library, the value is always less than 256. In the 16-bit library
the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
.sp .sp
PCRE_INFO_REQUIREDCHARFLAGS PCRE_INFO_REQUIREDCHARFLAGS
.sp .sp
...@@ -2900,6 +2914,6 @@ Cambridge CB2 3QH, England. ...@@ -2900,6 +2914,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 November 2013 Last updated: 09 February 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
.TH PCREGREP 1 "13 September 2012" "PCRE 8.32" .TH PCREGREP 1 "03 April 2014" "PCRE 8.35"
.SH NAME .SH NAME
pcregrep - a grep with Perl-compatible regular expressions. pcregrep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
...@@ -11,9 +11,13 @@ pcregrep - a grep with Perl-compatible regular expressions. ...@@ -11,9 +11,13 @@ pcregrep - a grep with Perl-compatible regular expressions.
grep commands do, but it uses the PCRE regular expression library to support grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See patterns that are compatible with the regular expressions of Perl 5. See
.\" HREF .\" HREF
\fBpcresyntax\fP(3)
.\"
for a quick-reference summary of pattern syntax, or
.\" HREF
\fBpcrepattern\fP(3) \fBpcrepattern\fP(3)
.\" .\"
for a full description of syntax and semantics of the regular expressions for a full description of the syntax and semantics of the regular expressions
that PCRE supports. that PCRE supports.
.P .P
Patterns, whether supplied on the command line or in a separate file, are given Patterns, whether supplied on the command line or in a separate file, are given
...@@ -674,6 +678,6 @@ Cambridge CB2 3QH, England. ...@@ -674,6 +678,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 13 September 2012 Last updated: 03 April 2014
Copyright (c) 1997-2012 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
...@@ -14,7 +14,8 @@ DESCRIPTION ...@@ -14,7 +14,8 @@ DESCRIPTION
pcregrep searches files for character patterns, in the same way as pcregrep searches files for character patterns, in the same way as
other grep commands do, but it uses the PCRE regular expression library other grep commands do, but it uses the PCRE regular expression library
to support patterns that are compatible with the regular expressions of to support patterns that are compatible with the regular expressions of
Perl 5. See pcrepattern(3) for a full description of syntax and seman- Perl 5. See pcresyntax(3) for a quick-reference summary of pattern syn-
tax, or pcrepattern(3) for a full description of the syntax and seman-
tics of the regular expressions that PCRE supports. tics of the regular expressions that PCRE supports.
Patterns, whether supplied on the command line or in a separate file, Patterns, whether supplied on the command line or in a separate file,
...@@ -736,5 +737,5 @@ AUTHOR ...@@ -736,5 +737,5 @@ AUTHOR
REVISION REVISION
Last updated: 13 September 2012 Last updated: 03 April 2014
Copyright (c) 1997-2012 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.TH PCREPATTERN 3 "03 December 2013" "PCRE 8.34" .TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
.SH NAME .SH NAME
PCRE - Perl-compatible regular expressions PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS" .SH "PCRE REGULAR EXPRESSION DETAILS"
...@@ -1004,7 +1004,9 @@ matches "foobar", the first substring is still set to "foo". ...@@ -1004,7 +1004,9 @@ matches "foobar", the first substring is still set to "foo".
.P .P
Perl documents that the use of \eK within assertions is "not well defined". In Perl documents that the use of \eK within assertions is "not well defined". In
PCRE, \eK is acted upon when it occurs inside positive assertions, but is PCRE, \eK is acted upon when it occurs inside positive assertions, but is
ignored in negative assertions. ignored in negative assertions. Note that when a pattern such as (?=ab\eK)
matches, the reported start of the match can be greater than the end of the
match.
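A caller that slices the subject using the reported offsets may want to guard against this case. The following is a minimal sketch, assuming the usual layout in which the first two entries of the offsets vector hold the start and end of the whole match; the helper name whole_match_length is hypothetical and is not part of PCRE.

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): given the first two entries of
   an offsets vector, return the length of the whole match, or -1 when the
   reported start lies beyond the reported end, as can happen with a pattern
   such as (?=ab\K). */
static int whole_match_length(const int *ovector)
{
  int start = ovector[0];   /* reported start of match */
  int end = ovector[1];     /* reported end of match */
  return (start > end) ? -1 : end - start;
}
```

A negative result tells the caller not to copy end - start units from the subject.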
. .
. .
.\" HTML <a name="smallassertions"></a> .\" HTML <a name="smallassertions"></a>
...@@ -3028,19 +3030,22 @@ match does not always guarantee that a match must be at this starting point. ...@@ -3028,19 +3030,22 @@ match does not always guarantee that a match must be at this starting point.
.P .P
Note that (*COMMIT) at the start of a pattern is not the same as an anchor, Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
unless PCRE's start-of-match optimizations are turned off, as shown in this unless PCRE's start-of-match optimizations are turned off, as shown in this
\fBpcretest\fP example: output from \fBpcretest\fP:
.sp .sp
re> /(*COMMIT)abc/ re> /(*COMMIT)abc/
data> xyzabc data> xyzabc
0: abc 0: abc
xyzabc\eY data> xyzabc\eY
No match No match
.sp .sp
PCRE knows that any match must start with "a", so the optimization skips along For this pattern, PCRE knows that any match must start with "a", so the
the subject to "a" before running the first match attempt, which succeeds. When optimization skips along the subject to "a" before applying the pattern to the
the optimization is disabled by the \eY escape in the second subject, the match first set of data. The match attempt then succeeds. In the second set of data,
starts at "x" and so the (*COMMIT) causes it to fail without trying any other the escape sequence \eY is interpreted by the \fBpcretest\fP program. It causes
starting points. the PCRE_NO_START_OPTIMIZE option to be set when \fBpcre_exec()\fP is called.
This disables the optimization that skips along to the first character. The
pattern is now applied starting at "x", and so the (*COMMIT) causes the match
to fail without trying any other starting points.
.sp .sp
(*PRUNE) or (*PRUNE:NAME) (*PRUNE) or (*PRUNE:NAME)
.sp .sp
...@@ -3255,6 +3260,6 @@ Cambridge CB2 3QH, England. ...@@ -3255,6 +3260,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 03 December 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
.TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34" .TH PCRESYNTAX 3 "08 January 2014" "PCRE 8.35"
.SH NAME .SH NAME
PCRE - Perl-compatible regular expressions PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
...@@ -309,6 +309,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use ...@@ -309,6 +309,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
.rs .rs
.sp .sp
\eK reset start of match \eK reset start of match
.sp
\eK is honoured in positive assertions, but ignored in negative ones.
. .
. .
.SH "ALTERNATION" .SH "ALTERNATION"
...@@ -354,11 +356,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use ...@@ -354,11 +356,13 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
(?x) extended (ignore white space) (?x) extended (ignore white space)
(?-...) unset option(s) (?-...) unset option(s)
.sp .sp
The following are recognized only at the start of a pattern or after one of the The following are recognized only at the very start of a pattern or after one
newline-setting options with similar syntax: of the newline or \eR options with similar syntax. More than one of them may
appear.
.sp .sp
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
(*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
...@@ -370,6 +374,29 @@ Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the ...@@ -370,6 +374,29 @@ Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
limits set by the caller of pcre_exec(), not increase them. limits set by the caller of pcre_exec(), not increase them.
. .
. .
.SH "NEWLINE CONVENTION"
.rs
.sp
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
.sp
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
.sp
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS" .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs .rs
.sp .sp
...@@ -457,29 +484,6 @@ pattern is not anchored. ...@@ -457,29 +484,6 @@ pattern is not anchored.
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
. .
. .
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
.sp
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
.
.
.SH "CALLOUTS" .SH "CALLOUTS"
.rs .rs
.sp .sp
...@@ -508,6 +512,6 @@ Cambridge CB2 3QH, England. ...@@ -508,6 +512,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 November 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
.TH PCRETEST 1 "12 November 2013" "PCRE 8.34" .TH PCRETEST 1 "09 February 2014" "PCRE 8.35"
.SH NAME .SH NAME
pcretest - a program for testing Perl-compatible regular expressions. pcretest - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
...@@ -113,6 +113,9 @@ following options output the value and set the exit code as indicated: ...@@ -113,6 +113,9 @@ following options output the value and set the exit code as indicated:
newline the default newline setting: newline the default newline setting:
CR, LF, CRLF, ANYCRLF, or ANY CR, LF, CRLF, ANYCRLF, or ANY
exit code is always 0 exit code is always 0
bsr the default setting for what \eR matches:
ANYCRLF or ANY
exit code is always 0
.sp .sp
The following options output 1 for true or 0 for false, and set the exit code The following options output 1 for true or 0 for false, and set the exit code
to the same value: to the same value:
...@@ -330,6 +333,7 @@ sections. ...@@ -330,6 +333,7 @@ sections.
\fB/N\fP set PCRE_NO_AUTO_CAPTURE \fB/N\fP set PCRE_NO_AUTO_CAPTURE
\fB/O\fP set PCRE_NO_AUTO_POSSESS \fB/O\fP set PCRE_NO_AUTO_POSSESS
\fB/P\fP use the POSIX wrapper \fB/P\fP use the POSIX wrapper
\fB/Q\fP test external stack check function
\fB/S\fP study the pattern after compilation \fB/S\fP study the pattern after compilation
\fB/s\fP set PCRE_DOTALL \fB/s\fP set PCRE_DOTALL
\fB/T\fP select character tables \fB/T\fP select character tables
...@@ -483,7 +487,10 @@ below. ...@@ -483,7 +487,10 @@ below.
The \fB/I\fP modifier requests that \fBpcretest\fP output information about the The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
compiled pattern (whether it is anchored, has a fixed first character, and compiled pattern (whether it is anchored, has a fixed first character, and
so on). It does this by calling \fBpcre[16|32]_fullinfo()\fP after compiling a so on). It does this by calling \fBpcre[16|32]_fullinfo()\fP after compiling a
pattern. If the pattern is studied, the results of that are also output. pattern. If the pattern is studied, the results of that are also output. In
this output, the word "char" means a non-UTF character, that is, the value of a
single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
being tested).
.P .P
The \fB/K\fP modifier requests \fBpcretest\fP to show names from backtracking The \fB/K\fP modifier requests \fBpcretest\fP to show names from backtracking
control verbs that are returned from calls to \fBpcre[16|32]_exec()\fP. It causes control verbs that are returned from calls to \fBpcre[16|32]_exec()\fP. It causes
...@@ -513,6 +520,15 @@ the compiled pattern to be output. This does not include the size of the ...@@ -513,6 +520,15 @@ the compiled pattern to be output. This does not include the size of the
successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
JIT compiled code is also output. JIT compiled code is also output.
.P .P
The \fB/Q\fP modifier is used to test the use of \fBpcre_stack_guard\fP. It
must be followed by '0' or '1', specifying the return code to be given from an
external function that is passed to PCRE and used for stack checking during
compilation (see the
.\" HREF
\fBpcreapi\fP
.\"
documentation for details).
.P
The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the
expression has been compiled, and the results used when the expression is expression has been compiled, and the results used when the expression is
matched. There are a number of qualifying characters that may follow \fB/S\fP. matched. There are a number of qualifying characters that may follow \fB/S\fP.
...@@ -1135,6 +1151,6 @@ Cambridge CB2 3QH, England. ...@@ -1135,6 +1151,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 November 2013 Last updated: 09 February 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi
...@@ -99,6 +99,9 @@ COMMAND LINE OPTIONS ...@@ -99,6 +99,9 @@ COMMAND LINE OPTIONS
newline the default newline setting: newline the default newline setting:
CR, LF, CRLF, ANYCRLF, or ANY CR, LF, CRLF, ANYCRLF, or ANY
exit code is always 0 exit code is always 0
bsr the default setting for what \R matches:
ANYCRLF or ANY
exit code is always 0
The following options output 1 for true or 0 for false, and The following options output 1 for true or 0 for false, and
set the exit code to the same value: set the exit code to the same value:
...@@ -316,6 +319,7 @@ PATTERN MODIFIERS ...@@ -316,6 +319,7 @@ PATTERN MODIFIERS
/N set PCRE_NO_AUTO_CAPTURE /N set PCRE_NO_AUTO_CAPTURE
/O set PCRE_NO_AUTO_POSSESS /O set PCRE_NO_AUTO_POSSESS
/P use the POSIX wrapper /P use the POSIX wrapper
/Q test external stack check function
/S study the pattern after compilation /S study the pattern after compilation
/s set PCRE_DOTALL /s set PCRE_DOTALL
/T select character tables /T select character tables
...@@ -462,7 +466,9 @@ PATTERN MODIFIERS ...@@ -462,7 +466,9 @@ PATTERN MODIFIERS
compiled pattern (whether it is anchored, has a fixed first character, compiled pattern (whether it is anchored, has a fixed first character,
and so on). It does this by calling pcre[16|32]_fullinfo() after com- and so on). It does this by calling pcre[16|32]_fullinfo() after com-
piling a pattern. If the pattern is studied, the results of that are piling a pattern. If the pattern is studied, the results of that are
also output. also output. In this output, the word "char" means a non-UTF character,
that is, the value of a single data item (8-bit, 16-bit, or 32-bit,
depending on the library that is being tested).
The /K modifier requests pcretest to show names from backtracking con- The /K modifier requests pcretest to show names from backtracking con-
trol verbs that are returned from calls to pcre[16|32]_exec(). It trol verbs that are returned from calls to pcre[16|32]_exec(). It
...@@ -493,6 +499,11 @@ PATTERN MODIFIERS ...@@ -493,6 +499,11 @@ PATTERN MODIFIERS
pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option, pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,
the size of the JIT compiled code is also output. the size of the JIT compiled code is also output.
The /Q modifier is used to test the use of pcre_stack_guard. It must be
followed by '0' or '1', specifying the return code to be given from an
external function that is passed to PCRE and used for stack checking
during compilation (see the pcreapi documentation for details).
The /S modifier causes pcre[16|32]_study() to be called after the The /S modifier causes pcre[16|32]_study() to be called after the
expression has been compiled, and the results used when the expression expression has been compiled, and the results used when the expression
is matched. There are a number of qualifying characters that may follow is matched. There are a number of qualifying characters that may follow
...@@ -1072,5 +1083,5 @@ AUTHOR ...@@ -1072,5 +1083,5 @@ AUTHOR
REVISION REVISION
Last updated: 12 November 2013 Last updated: 09 February 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
=== modified file 'pcre/pcre.h.in'
--- pcre/pcre.h.in 2013-09-26 14:02:17 +0000
+++ pcre/pcre.h.in 2013-10-02 07:58:29 +0000
@@ -486,6 +486,7 @@ PCRE_EXP_DECL void (*pcre_free)(void *)
PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_stack_free)(void *);
PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *);
+PCRE_EXP_DECL int (*pcre_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_free)(void *);
@@ -504,6 +505,7 @@ PCRE_EXP_DECL void pcre_free(void *);
PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
PCRE_EXP_DECL void pcre_stack_free(void *);
PCRE_EXP_DECL int pcre_callout(pcre_callout_block *);
+PCRE_EXP_DECL int pcre_stack_guard(void);
PCRE_EXP_DECL void *pcre16_malloc(size_t);
PCRE_EXP_DECL void pcre16_free(void *);
=== modified file 'pcre/pcre_compile.c'
--- pcre/pcre_compile.c 2013-09-26 14:02:17 +0000
+++ pcre/pcre_compile.c 2013-10-02 07:58:29 +0000
@@ -7107,6 +7107,12 @@ unsigned int orig_bracount;
unsigned int max_bracount;
branch_chain bc;
+if (pcre_stack_guard && pcre_stack_guard())
+{
+ *errorcodeptr= ERR23;
+ return FALSE;
+}
+
bc.outer = bcptr;
bc.current_branch = code;
=== modified file 'pcre/pcre_globals.c'
--- pcre/pcre_globals.c 2013-09-26 14:02:17 +0000
+++ pcre/pcre_globals.c 2013-10-02 07:58:29 +0000
@@ -72,6 +72,7 @@ PCRE_EXP_DATA_DEFN void (*PUBL(free))(v
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
+PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#elif !defined VPCOMPAT
PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc;
@@ -79,6 +80,7 @@ PCRE_EXP_DATA_DEFN void (*PUBL(free))(v
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
+PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#endif
/* End of pcre_globals.c */
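The patch above checks an optional function pointer at the top of a recursive compile step and fails when the function returns non-zero. The sketch below illustrates that pattern in isolation. All names in it (demo_stack_guard, depth_guard, demo_compile_nested, demo_limit) are hypothetical stand-ins, and the depth counter is only one assumed way of deciding when to stop; a real guard might compare stack addresses instead.

```c
#include <stddef.h>

/* Stand-in for the library's global hook; NULL means "no guard installed". */
static int (*demo_stack_guard)(void) = NULL;

static int demo_depth = 0;   /* current nesting depth of the demo recursion */
static int demo_limit = 0;   /* hypothetical limit chosen by the application */

/* Example guard: a non-zero return asks the caller to stop recursing. */
static int depth_guard(void)
{
  return demo_depth > demo_limit;
}

/* Mimics a recursive compile step. Returns 1 on success, or 0 if the guard
   fired at any level, in the same way that the patched function gives up
   with an error when the guard returns non-zero. */
static int demo_compile_nested(int levels)
{
  int ok = 1;
  if (demo_stack_guard != NULL && demo_stack_guard()) return 0;
  if (levels > 0)
    {
    demo_depth++;
    ok = demo_compile_nested(levels - 1);
    demo_depth--;
    }
  return ok;
}
```

With no guard installed the recursion is unrestricted; once depth_guard is installed, nesting deeper than demo_limit makes the whole call fail.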
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, to be #included by /* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions. applications that call the PCRE functions.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -498,12 +498,14 @@ PCRE_EXP_DECL void (*pcre16_free)(void *); ...@@ -498,12 +498,14 @@ PCRE_EXP_DECL void (*pcre16_free)(void *);
PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t); PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_stack_free)(void *); PCRE_EXP_DECL void (*pcre16_stack_free)(void *);
PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *); PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *);
PCRE_EXP_DECL int (*pcre16_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre32_malloc)(size_t); PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_free)(void *); PCRE_EXP_DECL void (*pcre32_free)(void *);
PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t); PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_stack_free)(void *); PCRE_EXP_DECL void (*pcre32_stack_free)(void *);
PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *); PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *);
PCRE_EXP_DECL int (*pcre32_stack_guard)(void);
#else /* VPCOMPAT */ #else /* VPCOMPAT */
PCRE_EXP_DECL void *pcre_malloc(size_t); PCRE_EXP_DECL void *pcre_malloc(size_t);
PCRE_EXP_DECL void pcre_free(void *); PCRE_EXP_DECL void pcre_free(void *);
...@@ -517,12 +519,14 @@ PCRE_EXP_DECL void pcre16_free(void *); ...@@ -517,12 +519,14 @@ PCRE_EXP_DECL void pcre16_free(void *);
PCRE_EXP_DECL void *pcre16_stack_malloc(size_t); PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
PCRE_EXP_DECL void pcre16_stack_free(void *); PCRE_EXP_DECL void pcre16_stack_free(void *);
PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *); PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *);
PCRE_EXP_DECL int pcre16_stack_guard(void);
PCRE_EXP_DECL void *pcre32_malloc(size_t); PCRE_EXP_DECL void *pcre32_malloc(size_t);
PCRE_EXP_DECL void pcre32_free(void *); PCRE_EXP_DECL void pcre32_free(void *);
PCRE_EXP_DECL void *pcre32_stack_malloc(size_t); PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
PCRE_EXP_DECL void pcre32_stack_free(void *); PCRE_EXP_DECL void pcre32_stack_free(void *);
PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *); PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *);
PCRE_EXP_DECL int pcre32_stack_guard(void);
#endif /* VPCOMPAT */ #endif /* VPCOMPAT */
/* User defined callback which provides a stack just before the match starts. */ /* User defined callback which provides a stack just before the match starts. */
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -311,9 +311,9 @@ while(TRUE) ...@@ -311,9 +311,9 @@ while(TRUE)
ptr++; ptr++;
} }
/* Control should never reach here in 16/32 bit mode. */ /* Control should never reach here in 16/32 bit mode. */
#endif /* !COMPILE_PCRE8 */ #else /* In 8-bit mode, the pattern does not need to be processed. */
return 0; return 0;
#endif /* !COMPILE_PCRE8 */
} }
/* End of pcre_byte_order.c */ /* End of pcre_byte_order.c */
...@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language (but see ...@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language (but see
below for why this module is different). below for why this module is different).
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -1473,7 +1473,7 @@ for (;;) ...@@ -1473,7 +1473,7 @@ for (;;)
goto ANYNL01; goto ANYNL01;
case CHAR_CR: case CHAR_CR:
if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1; if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
/* Fall through */ /* Fall through */
ANYNL01: ANYNL01:
...@@ -1742,7 +1742,7 @@ for (;;) ...@@ -1742,7 +1742,7 @@ for (;;)
goto ANYNL02; goto ANYNL02;
case CHAR_CR: case CHAR_CR:
if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1; if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
/* Fall through */ /* Fall through */
ANYNL02: ANYNL02:
...@@ -2012,7 +2012,7 @@ for (;;) ...@@ -2012,7 +2012,7 @@ for (;;)
goto ANYNL03; goto ANYNL03;
case CHAR_CR: case CHAR_CR:
if (ptr + 1 < end_subject && RAWUCHARTEST(ptr + 1) == CHAR_LF) ncount = 1; if (ptr + 1 < end_subject && UCHAR21TEST(ptr + 1) == CHAR_LF) ncount = 1;
/* Fall through */ /* Fall through */
ANYNL03: ANYNL03:
...@@ -2210,7 +2210,7 @@ for (;;) ...@@ -2210,7 +2210,7 @@ for (;;)
if ((md->moptions & PCRE_PARTIAL_HARD) != 0) if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
reset_could_continue = TRUE; reset_could_continue = TRUE;
} }
else if (RAWUCHARTEST(ptr + 1) == CHAR_LF) else if (UCHAR21TEST(ptr + 1) == CHAR_LF)
{ {
ADD_NEW_DATA(-(state_offset + 1), 0, 1); ADD_NEW_DATA(-(state_offset + 1), 0, 1);
} }
...@@ -3466,7 +3466,7 @@ for (;;) ...@@ -3466,7 +3466,7 @@ for (;;)
if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0) if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
{ {
/* Advance to a known first char. */ /* Advance to a known first pcre_uchar (i.e. data item) */
if (has_first_char) if (has_first_char)
{ {
...@@ -3474,12 +3474,12 @@ for (;;) ...@@ -3474,12 +3474,12 @@ for (;;)
{ {
pcre_uchar csc; pcre_uchar csc;
while (current_subject < end_subject && while (current_subject < end_subject &&
(csc = RAWUCHARTEST(current_subject)) != first_char && csc != first_char2) (csc = UCHAR21TEST(current_subject)) != first_char && csc != first_char2)
current_subject++; current_subject++;
} }
else else
while (current_subject < end_subject && while (current_subject < end_subject &&
RAWUCHARTEST(current_subject) != first_char) UCHAR21TEST(current_subject) != first_char)
current_subject++; current_subject++;
} }
...@@ -3509,36 +3509,26 @@ for (;;) ...@@ -3509,36 +3509,26 @@ for (;;)
ANYCRLF, and we are now at a LF, advance the match position by one ANYCRLF, and we are now at a LF, advance the match position by one
more character. */ more character. */
if (RAWUCHARTEST(current_subject - 1) == CHAR_CR && if (UCHAR21TEST(current_subject - 1) == CHAR_CR &&
(md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) && (md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) &&
current_subject < end_subject && current_subject < end_subject &&
RAWUCHARTEST(current_subject) == CHAR_NL) UCHAR21TEST(current_subject) == CHAR_NL)
current_subject++; current_subject++;
} }
} }
/* Or to a non-unique first char after study */ /* Advance to a non-unique first pcre_uchar after study */
else if (start_bits != NULL) else if (start_bits != NULL)
{ {
while (current_subject < end_subject) while (current_subject < end_subject)
{ {
register pcre_uint32 c = RAWUCHARTEST(current_subject); register pcre_uint32 c = UCHAR21TEST(current_subject);
#ifndef COMPILE_PCRE8 #ifndef COMPILE_PCRE8
if (c > 255) c = 255; if (c > 255) c = 255;
#endif #endif
if ((start_bits[c/8] & (1 << (c&7))) == 0) if ((start_bits[c/8] & (1 << (c&7))) != 0) break;
{
current_subject++; current_subject++;
#if defined SUPPORT_UTF && defined COMPILE_PCRE8
/* In non 8-bit mode, the iteration will stop for
characters > 255 at the beginning or not stop at all. */
if (utf)
ACROSSCHAR(current_subject < end_subject, *current_subject,
current_subject++);
#endif
}
else break;
} }
} }
} }
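The simplified loop in this hunk can be sketched on its own as follows. This is an illustration under stated assumptions, not the library code: the subject is taken to be an array of 16-bit data units, the bitmap has 256 bits, and the names set_start_bit and skip_to_start are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mark data unit value c (0..255) as a possible first unit of a match. */
static void set_start_bit(uint8_t bits[32], unsigned c)
{
  bits[c / 8] |= (uint8_t)(1u << (c & 7));
}

/* Return the index of the first data unit whose bit is set in the bitmap,
   or length if there is none. Units above 255 share the bit for 255, which
   mirrors the clamp used in the non-8-bit case. */
static size_t skip_to_start(const uint16_t *subject, size_t length,
  const uint8_t bits[32])
{
  size_t i = 0;
  while (i < length)
    {
    unsigned c = subject[i];
    if (c > 255) c = 255;
    if ((bits[c / 8] & (1u << (c & 7))) != 0) break;
    i++;
    }
  return i;
}
```

Testing for a set bit and breaking keeps the loop body to a single increment, which is the shape the new code adopts.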
...@@ -3557,19 +3547,20 @@ for (;;) ...@@ -3557,19 +3547,20 @@ for (;;)
/* If the pattern was studied, a minimum subject length may be set. This /* If the pattern was studied, a minimum subject length may be set. This
is a lower bound; no actual string of that length may actually match the is a lower bound; no actual string of that length may actually match the
pattern. Although the value is, strictly, in characters, we treat it as pattern. Although the value is, strictly, in characters, we treat it as
bytes to avoid spending too much time in this optimization. */ in pcre_uchar units to avoid spending too much time in this optimization.
*/
if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 && if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 &&
(pcre_uint32)(end_subject - current_subject) < study->minlength) (pcre_uint32)(end_subject - current_subject) < study->minlength)
return PCRE_ERROR_NOMATCH; return PCRE_ERROR_NOMATCH;
/* If req_char is set, we know that that character must appear in the /* If req_char is set, we know that that pcre_uchar must appear in the
subject for the match to succeed. If the first character is set, req_char subject for the match to succeed. If the first pcre_uchar is set,
must be later in the subject; otherwise the test starts at the match req_char must be later in the subject; otherwise the test starts at the
point. This optimization can save a huge amount of work in patterns with match point. This optimization can save a huge amount of work in patterns
nested unlimited repeats that aren't going to match. Writing separate with nested unlimited repeats that aren't going to match. Writing
code for cased/caseless versions makes it go faster, as does using an separate code for cased/caseless versions makes it go faster, as does
autoincrement and backing off on a match. using an autoincrement and backing off on a match.
HOWEVER: when the subject string is very, very long, searching to its end HOWEVER: when the subject string is very, very long, searching to its end
can take a long time, and give bad performance on quite ordinary can take a long time, and give bad performance on quite ordinary
...@@ -3589,7 +3580,7 @@ for (;;) ...@@ -3589,7 +3580,7 @@ for (;;)
{ {
while (p < end_subject) while (p < end_subject)
{ {
register pcre_uint32 pp = RAWUCHARINCTEST(p); register pcre_uint32 pp = UCHAR21INCTEST(p);
if (pp == req_char || pp == req_char2) { p--; break; } if (pp == req_char || pp == req_char2) { p--; break; }
} }
} }
...@@ -3597,18 +3588,18 @@ for (;;) ...@@ -3597,18 +3588,18 @@ for (;;)
{ {
while (p < end_subject) while (p < end_subject)
{ {
if (RAWUCHARINCTEST(p) == req_char) { p--; break; } if (UCHAR21INCTEST(p) == req_char) { p--; break; }
} }
} }
/* If we can't find the required character, break the matching loop, /* If we can't find the required pcre_uchar, break the matching loop,
which will cause a return or PCRE_ERROR_NOMATCH. */ which will cause a return or PCRE_ERROR_NOMATCH. */
if (p >= end_subject) break; if (p >= end_subject) break;
/* If we have found the required character, save the point where we /* If we have found the required pcre_uchar, save the point where we
found it, so that we don't search again next time round the loop if found it, so that we don't search again next time round the loop if
the start hasn't passed this character yet. */ the start hasn't passed this point yet. */
req_char_ptr = p; req_char_ptr = p;
} }
...@@ -3665,9 +3656,9 @@ for (;;) ...@@ -3665,9 +3656,9 @@ for (;;)
not contain any explicit matches for \r or \n, and the newline option is CRLF not contain any explicit matches for \r or \n, and the newline option is CRLF
or ANY or ANYCRLF, advance the match position by one more character. */ or ANY or ANYCRLF, advance the match position by one more character. */
if (RAWUCHARTEST(current_subject - 1) == CHAR_CR && if (UCHAR21TEST(current_subject - 1) == CHAR_CR &&
current_subject < end_subject && current_subject < end_subject &&
RAWUCHARTEST(current_subject) == CHAR_NL && UCHAR21TEST(current_subject) == CHAR_NL &&
(re->flags & PCRE_HASCRORLF) == 0 && (re->flags & PCRE_HASCRORLF) == 0 &&
(md->nltype == NLTYPE_ANY || (md->nltype == NLTYPE_ANY ||
md->nltype == NLTYPE_ANYCRLF || md->nltype == NLTYPE_ANYCRLF ||
......
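The required-code-unit scan in the `for (;;)` hunks above relies on an "autoincrement and back off" idiom: read with a post-increment, then step back by one when the value matches. The following is a minimal standalone sketch of that idiom, not the PCRE code itself. The function name `find_required` and the use of plain `unsigned char` data are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): scan [p, end) for either of two
   code units. Returns a pointer to the first hit, or end when neither
   value occurs. */
static const unsigned char *
find_required(const unsigned char *p, const unsigned char *end,
              unsigned int req, unsigned int req2)
{
  while (p < end)
    {
    unsigned int pp = *p++;            /* load, then advance */
    if (pp == req || pp == req2)
      {
      p--;                             /* back off onto the matching unit */
      break;
      }
    }
  return p;
}
```

A caller can treat a result equal to `end` as "not found", which corresponds to the `if (p >= end_subject) break;` test in the hunk.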
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
......
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -316,8 +316,8 @@ start/end of string field names are. */ ...@@ -316,8 +316,8 @@ start/end of string field names are. */
&(NLBLOCK->nllen), utf)) \ &(NLBLOCK->nllen), utf)) \
: \ : \
((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \ ((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \
RAWUCHARTEST(p) == NLBLOCK->nl[0] && \ UCHAR21TEST(p) == NLBLOCK->nl[0] && \
(NLBLOCK->nllen == 1 || RAWUCHARTEST(p+1) == NLBLOCK->nl[1]) \ (NLBLOCK->nllen == 1 || UCHAR21TEST(p+1) == NLBLOCK->nl[1]) \
) \ ) \
) )
...@@ -330,8 +330,8 @@ start/end of string field names are. */ ...@@ -330,8 +330,8 @@ start/end of string field names are. */
&(NLBLOCK->nllen), utf)) \ &(NLBLOCK->nllen), utf)) \
: \ : \
((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \ ((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \
RAWUCHARTEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \ UCHAR21TEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \
(NLBLOCK->nllen == 1 || RAWUCHARTEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \ (NLBLOCK->nllen == 1 || UCHAR21TEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \
) \ ) \
) )
...@@ -582,12 +582,27 @@ changed in future to be a fixed number of bytes or to depend on LINK_SIZE. */ ...@@ -582,12 +582,27 @@ changed in future to be a fixed number of bytes or to depend on LINK_SIZE. */
#define MAX_MARK ((1u << 8) - 1) #define MAX_MARK ((1u << 8) - 1)
#endif #endif
/* There is a proposed future special "UTF-21" mode, in which only the lowest
21 bits of a 32-bit character are interpreted as UTF, with the remaining 11
high-order bits available to the application for other uses. In preparation for
the future implementation of this mode, there are macros that load a data item
and, if in this special mode, mask it to 21 bits. These macros all have names
starting with UCHAR21. In all other modes, including the normal 32-bit
library, the macros all have the same simple definitions. When the new mode is
implemented, it is expected that these definitions will be varied appropriately
using #ifdef when compiling the library that supports the special mode. */
#define UCHAR21(eptr) (*(eptr))
#define UCHAR21TEST(eptr) (*(eptr))
#define UCHAR21INC(eptr) (*(eptr)++)
#define UCHAR21INCTEST(eptr) (*(eptr)++)
/* When UTF encoding is being used, a character is no longer just a single /* When UTF encoding is being used, a character is no longer just a single
byte. The macros for character handling generate simple sequences when used in byte in 8-bit mode or a single short in 16-bit mode. The macros for character
character-mode, and more complicated ones for UTF characters. GETCHARLENTEST handling generate simple sequences when used in the basic mode, and more
and other macros are not used when UTF is not supported, so they are not complicated ones for UTF characters. GETCHARLENTEST and other macros are not
defined. To make sure they can never even appear when UTF support is omitted, used when UTF is not supported. To make sure they can never even appear when
we don't even define them. */ UTF support is omitted, we don't even define them. */
#ifndef SUPPORT_UTF #ifndef SUPPORT_UTF
...@@ -600,10 +615,6 @@ we don't even define them. */ ...@@ -600,10 +615,6 @@ we don't even define them. */
#define GETCHARINC(c, eptr) c = *eptr++; #define GETCHARINC(c, eptr) c = *eptr++;
#define GETCHARINCTEST(c, eptr) c = *eptr++; #define GETCHARINCTEST(c, eptr) c = *eptr++;
#define GETCHARLEN(c, eptr, len) c = *eptr; #define GETCHARLEN(c, eptr, len) c = *eptr;
#define RAWUCHAR(eptr) (*(eptr))
#define RAWUCHARINC(eptr) (*(eptr)++)
#define RAWUCHARTEST(eptr) (*(eptr))
#define RAWUCHARINCTEST(eptr) (*(eptr)++)
/* #define GETCHARLENTEST(c, eptr, len) */ /* #define GETCHARLENTEST(c, eptr, len) */
/* #define BACKCHAR(eptr) */ /* #define BACKCHAR(eptr) */
/* #define FORWARDCHAR(eptr) */ /* #define FORWARDCHAR(eptr) */
...@@ -776,30 +787,6 @@ do not know if we are in UTF-8 mode. */ ...@@ -776,30 +787,6 @@ do not know if we are in UTF-8 mode. */
c = *eptr; \ c = *eptr; \
if (utf && c >= 0xc0) GETUTF8LEN(c, eptr, len); if (utf && c >= 0xc0) GETUTF8LEN(c, eptr, len);
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-8 mode - we don't put a test within the macro it is. This is called only in UTF-8 mode - we don't put a test within the macro
because almost all calls are already within a block of UTF-8 only code. */ because almost all calls are already within a block of UTF-8 only code. */
...@@ -895,30 +882,6 @@ we do not know if we are in UTF-16 mode. */ ...@@ -895,30 +882,6 @@ we do not know if we are in UTF-16 mode. */
c = *eptr; \ c = *eptr; \
if (utf && (c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len); if (utf && (c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len);
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-16 mode - we don't put a test within the it is. This is called only in UTF-16 mode - we don't put a test within the
macro because almost all calls are already within a block of UTF-16 only macro because almost all calls are already within a block of UTF-16 only
...@@ -980,30 +943,6 @@ This is called when we do not know if we are in UTF-32 mode. */ ...@@ -980,30 +943,6 @@ This is called when we do not know if we are in UTF-32 mode. */
#define GETCHARLENTEST(c, eptr, len) \ #define GETCHARLENTEST(c, eptr, len) \
GETCHARTEST(c, eptr) GETCHARTEST(c, eptr)
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-32 mode - we don't put a test within the it is. This is called only in UTF-32 mode - we don't put a test within the
macro because almost all calls are already within a block of UTF-32 only macro because almost all calls are already within a block of UTF-32 only
...@@ -1876,6 +1815,7 @@ contain characters with values greater than 255. */ ...@@ -1876,6 +1815,7 @@ contain characters with values greater than 255. */
#define XCL_NOT 0x01 /* Flag: this is a negative class */ #define XCL_NOT 0x01 /* Flag: this is a negative class */
#define XCL_MAP 0x02 /* Flag: a 32-byte map is present */ #define XCL_MAP 0x02 /* Flag: a 32-byte map is present */
#define XCL_HASPROP 0x04 /* Flag: property checks are present. */
#define XCL_END 0 /* Marks end of individual items */ #define XCL_END 0 /* Marks end of individual items */
#define XCL_SINGLE 1 /* Single item (one multibyte char) follows */ #define XCL_SINGLE 1 /* Single item (one multibyte char) follows */
...@@ -2341,7 +2281,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, ...@@ -2341,7 +2281,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59, ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
ERR80, ERR81, ERR82, ERR83, ERR84, ERRCOUNT }; ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */ /* JIT compiling modes. The function list is indexed by them. */
......
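The UCHAR21 comment earlier in this file's diff says that a future "UTF-21" mode would mask each loaded 32-bit item to its lowest 21 bits. As a rough illustration only, the sketch below shows what such masking could look like. The `SKETCH_` macro names and the helper function are invented here; the commit itself defines the UCHAR21 macros as plain loads and does not implement this mode.

```c
#include <stdint.h>

/* Assumption: 0x1fffff selects the low 21 bits of a 32-bit code unit.
   These names are hypothetical and do not exist in PCRE. */
#define SKETCH_UCHAR21(eptr)    (*(eptr) & 0x1fffffu)
#define SKETCH_UCHAR21INC(eptr) (*(eptr)++ & 0x1fffffu)

/* Return the character value of the first unit, ignoring the 11 high bits
   that an application might use for its own purposes. */
static uint32_t sketch_first_char(const uint32_t *s)
{
  return SKETCH_UCHAR21(s);
}
```

With this definition a unit such as `0xABE00041` would be read as `0x41`, because every bit from position 21 upward is discarded.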
...@@ -644,7 +644,9 @@ for(;;) ...@@ -644,7 +644,9 @@ for(;;)
int i; int i;
unsigned int min, max; unsigned int min, max;
BOOL printmap; BOOL printmap;
BOOL invertmap = FALSE;
pcre_uint8 *map; pcre_uint8 *map;
pcre_uint8 inverted_map[32];
fprintf(f, " ["); fprintf(f, " [");
...@@ -653,7 +655,12 @@ for(;;) ...@@ -653,7 +655,12 @@ for(;;)
extra = GET(code, 1); extra = GET(code, 1);
ccode = code + LINK_SIZE + 1; ccode = code + LINK_SIZE + 1;
printmap = (*ccode & XCL_MAP) != 0; printmap = (*ccode & XCL_MAP) != 0;
if ((*ccode++ & XCL_NOT) != 0) fprintf(f, "^"); if ((*ccode & XCL_NOT) != 0)
{
invertmap = (*ccode & XCL_HASPROP) == 0;
fprintf(f, "^");
}
ccode++;
} }
else else
{ {
...@@ -666,6 +673,12 @@ for(;;) ...@@ -666,6 +673,12 @@ for(;;)
if (printmap) if (printmap)
{ {
map = (pcre_uint8 *)ccode; map = (pcre_uint8 *)ccode;
if (invertmap)
{
for (i = 0; i < 32; i++) inverted_map[i] = ~map[i];
map = inverted_map;
}
for (i = 0; i < 256; i++) for (i = 0; i < 256; i++)
{ {
if ((map[i/8] & (1 << (i&7))) != 0) if ((map[i/8] & (1 << (i&7))) != 0)
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -91,8 +91,8 @@ pcre_uchar c2; ...@@ -91,8 +91,8 @@ pcre_uchar c2;
while (*str1 != '\0' || *str2 != '\0') while (*str1 != '\0' || *str2 != '\0')
{ {
c1 = RAWUCHARINC(str1); c1 = UCHAR21INC(str1);
c2 = RAWUCHARINC(str2); c2 = UCHAR21INC(str2);
if (c1 != c2) if (c1 != c2)
return ((c1 > c2) << 1) - 1; return ((c1 > c2) << 1) - 1;
} }
...@@ -131,7 +131,7 @@ pcre_uchar c2; ...@@ -131,7 +131,7 @@ pcre_uchar c2;
while (*str1 != '\0' || *ustr2 != '\0') while (*str1 != '\0' || *ustr2 != '\0')
{ {
c1 = RAWUCHARINC(str1); c1 = UCHAR21INC(str1);
c2 = (pcre_uchar)*ustr2++; c2 = (pcre_uchar)*ustr2++;
if (c1 != c2) if (c1 != c2)
return ((c1 > c2) << 1) - 1; return ((c1 > c2) << 1) - 1;
......
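Both comparison loops above return `((c1 > c2) << 1) - 1` once two units differ. This maps "greater" to 1 and "less" to -1 without a branch. The sketch below reproduces the pattern for 16-bit units; the function name `sketch_strcmp16` is an assumption, and the final `return 0` for equal strings is assumed rather than shown in the hunk.

```c
#include <stdint.h>

/* Hypothetical stand-in for the comparison loop in the hunk above.
   Compares two zero-terminated strings of 16-bit units. */
static int sketch_strcmp16(const uint16_t *s1, const uint16_t *s2)
{
  while (*s1 != 0 || *s2 != 0)
    {
    uint16_t c1 = *s1++;
    uint16_t c2 = *s2++;
    if (c1 != c2)
      return ((c1 > c2) << 1) - 1;   /* (1 << 1) - 1 = 1, (0 << 1) - 1 = -1 */
    }
  return 0;                          /* both strings ended together */
}
```

When one string is a prefix of the other, the shorter one supplies a zero unit first, so the loop returns before reading past either terminator.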
...@@ -879,9 +879,6 @@ do ...@@ -879,9 +879,6 @@ do
case OP_SOM: case OP_SOM:
case OP_THEN: case OP_THEN:
case OP_THEN_ARG: case OP_THEN_ARG:
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
case OP_XCLASS:
#endif
return SSB_FAIL; return SSB_FAIL;
/* We can ignore word boundary tests. */ /* We can ignore word boundary tests. */
...@@ -1257,6 +1254,16 @@ do ...@@ -1257,6 +1254,16 @@ do
with a value >= 0xc4 is a potentially valid starter because it starts a with a value >= 0xc4 is a potentially valid starter because it starts a
character with a value > 255. */ character with a value > 255. */
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
case OP_XCLASS:
if ((tcode[1 + LINK_SIZE] & XCL_HASPROP) != 0)
return SSB_FAIL;
/* All bits are set. */
if ((tcode[1 + LINK_SIZE] & XCL_MAP) == 0 && (tcode[1 + LINK_SIZE] & XCL_NOT) != 0)
return SSB_FAIL;
#endif
/* Fall through */
case OP_NCLASS: case OP_NCLASS:
#if defined SUPPORT_UTF && defined COMPILE_PCRE8 #if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf) if (utf)
...@@ -1273,8 +1280,21 @@ do ...@@ -1273,8 +1280,21 @@ do
case OP_CLASS: case OP_CLASS:
{ {
pcre_uint8 *map; pcre_uint8 *map;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
map = NULL;
if (*tcode == OP_XCLASS)
{
if ((tcode[1 + LINK_SIZE] & XCL_MAP) != 0)
map = (pcre_uint8 *)(tcode + 1 + LINK_SIZE + 1);
tcode += GET(tcode, 1);
}
else
#endif
{
tcode++; tcode++;
map = (pcre_uint8 *)tcode; map = (pcre_uint8 *)tcode;
tcode += 32 / sizeof(pcre_uchar);
}
/* In UTF-8 mode, the bits in a bit map correspond to character /* In UTF-8 mode, the bits in a bit map correspond to character
values, not to byte values. However, the bit map we are constructing is values, not to byte values. However, the bit map we are constructing is
...@@ -1282,6 +1302,10 @@ do ...@@ -1282,6 +1302,10 @@ do
value is > 127. In fact, there are only two possible starting bytes for value is > 127. In fact, there are only two possible starting bytes for
characters in the range 128 - 255. */ characters in the range 128 - 255. */
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
if (map != NULL)
#endif
{
#if defined SUPPORT_UTF && defined COMPILE_PCRE8 #if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf) if (utf)
{ {
...@@ -1302,11 +1326,11 @@ do ...@@ -1302,11 +1326,11 @@ do
/* In non-UTF-8 mode, the two bit maps are completely compatible. */ /* In non-UTF-8 mode, the two bit maps are completely compatible. */
for (c = 0; c < 32; c++) start_bits[c] |= map[c]; for (c = 0; c < 32; c++) start_bits[c] |= map[c];
} }
}
/* Advance past the bit map, and act on what follows. For a zero /* Advance past the bit map, and act on what follows. For a zero
minimum repeat, continue; otherwise stop processing. */ minimum repeat, continue; otherwise stop processing. */
tcode += 32 / sizeof(pcre_uchar);
switch (*tcode) switch (*tcode)
{ {
case OP_CRSTAR: case OP_CRSTAR:
......
...@@ -81,6 +81,11 @@ additional data. */ ...@@ -81,6 +81,11 @@ additional data. */
if (c < 256) if (c < 256)
{ {
if ((*data & XCL_HASPROP) == 0)
{
if ((*data & XCL_MAP) == 0) return negated;
return (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0;
}
if ((*data & XCL_MAP) != 0 && if ((*data & XCL_MAP) != 0 &&
(((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0) (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0)
return !negated; /* char found */ return !negated; /* char found */
......
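The lines added to this hunk give characters below 256 a fast path when `XCL_HASPROP` is clear: the result comes from the bit map alone, or from the negation flag when no map is present. The following is a simplified standalone sketch of that decision, assuming the flag values shown in the `pcre_internal.h` hunk. The function names and the -1 "fast path does not apply" return value are inventions for this example.

```c
#include <stdint.h>

/* Flag values as listed in the pcre_internal.h hunk of this commit. */
#define XCL_NOT     0x01
#define XCL_MAP     0x02
#define XCL_HASPROP 0x04

/* Test bit c of a 256-bit map stored as 32 bytes. */
static int bit_is_set(const uint8_t *map, unsigned int c)
{
  return (map[c / 8] & (1u << (c & 7))) != 0;
}

/* Hypothetical helper: returns 1 for a match, 0 for no match, and -1 when
   the fast path does not apply and a full item scan (not shown) is needed. */
static int low_char_fast_path(uint8_t flags, const uint8_t *map, unsigned int c)
{
  if (c >= 256 || (flags & XCL_HASPROP) != 0) return -1;
  if ((flags & XCL_MAP) == 0) return (flags & XCL_NOT) != 0;
  return bit_is_set(map, c);
}
```

Note that the map result is returned without applying the negation flag, which mirrors the added code in the hunk.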
...@@ -12,7 +12,7 @@ distribution because other apparatus is needed to compile pcregrep for z/OS. ...@@ -12,7 +12,7 @@ distribution because other apparatus is needed to compile pcregrep for z/OS.
The header can be found in the special z/OS distribution, which is available The header can be found in the special z/OS distribution, which is available
from www.zaconsultants.net or from www.cbttape.org. from www.zaconsultants.net or from www.cbttape.org.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -1298,7 +1298,7 @@ switch(endlinetype) ...@@ -1298,7 +1298,7 @@ switch(endlinetype)
while (p > startptr && p[-1] != '\n') p--; while (p > startptr && p[-1] != '\n') p--;
if (p <= startptr + 1 || p[-2] == '\r') return p; if (p <= startptr + 1 || p[-2] == '\r') return p;
} }
return p; /* But control should never get here */ /* Control can never get here */
case EL_ANY: case EL_ANY:
case EL_ANYCRLF: case EL_ANYCRLF:
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
...@@ -170,7 +170,9 @@ static const int eint[] = { ...@@ -170,7 +170,9 @@ static const int eint[] = {
REG_BADPAT, /* missing opening brace after \o */ REG_BADPAT, /* missing opening brace after \o */
REG_BADPAT, /* parentheses too deeply nested */ REG_BADPAT, /* parentheses too deeply nested */
REG_BADPAT, /* invalid range in character class */ REG_BADPAT, /* invalid range in character class */
REG_BADPAT /* group name must start with a non-digit */ REG_BADPAT, /* group name must start with a non-digit */
/* 85 */
REG_BADPAT /* parentheses too deeply nested (stack check) */
}; };
/* Table of texts corresponding to POSIX error codes */ /* Table of texts corresponding to POSIX error codes */
......
...@@ -233,6 +233,9 @@ argument, the casting might be incorrectly applied. */ ...@@ -233,6 +233,9 @@ argument, the casting might be incorrectly applied. */
#define SET_PCRE_CALLOUT8(callout) \ #define SET_PCRE_CALLOUT8(callout) \
pcre_callout = callout pcre_callout = callout
#define SET_PCRE_STACK_GUARD8(stack_guard) \
pcre_stack_guard = stack_guard
#define PCRE_ASSIGN_JIT_STACK8(extra, callback, userdata) \ #define PCRE_ASSIGN_JIT_STACK8(extra, callback, userdata) \
pcre_assign_jit_stack(extra, callback, userdata) pcre_assign_jit_stack(extra, callback, userdata)
...@@ -317,6 +320,9 @@ argument, the casting might be incorrectly applied. */ ...@@ -317,6 +320,9 @@ argument, the casting might be incorrectly applied. */
#define SET_PCRE_CALLOUT16(callout) \ #define SET_PCRE_CALLOUT16(callout) \
pcre16_callout = (int (*)(pcre16_callout_block *))callout pcre16_callout = (int (*)(pcre16_callout_block *))callout
#define SET_PCRE_STACK_GUARD16(stack_guard) \
pcre16_stack_guard = (int (*)(void))stack_guard
#define PCRE_ASSIGN_JIT_STACK16(extra, callback, userdata) \ #define PCRE_ASSIGN_JIT_STACK16(extra, callback, userdata) \
pcre16_assign_jit_stack((pcre16_extra *)extra, \ pcre16_assign_jit_stack((pcre16_extra *)extra, \
(pcre16_jit_callback)callback, userdata) (pcre16_jit_callback)callback, userdata)
...@@ -406,6 +412,9 @@ argument, the casting might be incorrectly applied. */ ...@@ -406,6 +412,9 @@ argument, the casting might be incorrectly applied. */
#define SET_PCRE_CALLOUT32(callout) \ #define SET_PCRE_CALLOUT32(callout) \
pcre32_callout = (int (*)(pcre32_callout_block *))callout pcre32_callout = (int (*)(pcre32_callout_block *))callout
#define SET_PCRE_STACK_GUARD32(stack_guard) \
pcre32_stack_guard = (int (*)(void))stack_guard
#define PCRE_ASSIGN_JIT_STACK32(extra, callback, userdata) \ #define PCRE_ASSIGN_JIT_STACK32(extra, callback, userdata) \
pcre32_assign_jit_stack((pcre32_extra *)extra, \ pcre32_assign_jit_stack((pcre32_extra *)extra, \
(pcre32_jit_callback)callback, userdata) (pcre32_jit_callback)callback, userdata)
...@@ -533,6 +542,14 @@ cases separately. */ ...@@ -533,6 +542,14 @@ cases separately. */
else \ else \
SET_PCRE_CALLOUT8(callout) SET_PCRE_CALLOUT8(callout)
#define SET_PCRE_STACK_GUARD(stack_guard) \
if (pcre_mode == PCRE32_MODE) \
SET_PCRE_STACK_GUARD32(stack_guard); \
else if (pcre_mode == PCRE16_MODE) \
SET_PCRE_STACK_GUARD16(stack_guard); \
else \
SET_PCRE_STACK_GUARD8(stack_guard)
#define STRLEN(p) (pcre_mode == PCRE32_MODE ? STRLEN32(p) : pcre_mode == PCRE16_MODE ? STRLEN16(p) : STRLEN8(p)) #define STRLEN(p) (pcre_mode == PCRE32_MODE ? STRLEN32(p) : pcre_mode == PCRE16_MODE ? STRLEN16(p) : STRLEN8(p))
#define PCRE_ASSIGN_JIT_STACK(extra, callback, userdata) \ #define PCRE_ASSIGN_JIT_STACK(extra, callback, userdata) \
...@@ -756,6 +773,12 @@ the three different cases. */ ...@@ -756,6 +773,12 @@ the three different cases. */
else \ else \
G(SET_PCRE_CALLOUT,BITTWO)(callout) G(SET_PCRE_CALLOUT,BITTWO)(callout)
#define SET_PCRE_STACK_GUARD(stack_guard) \
if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \
G(SET_PCRE_STACK_GUARD,BITONE)(stack_guard); \
else \
G(SET_PCRE_STACK_GUARD,BITTWO)(stack_guard)
#define STRLEN(p) ((pcre_mode == G(G(PCRE,BITONE),_MODE)) ? \ #define STRLEN(p) ((pcre_mode == G(G(PCRE,BITONE),_MODE)) ? \
G(STRLEN,BITONE)(p) : G(STRLEN,BITTWO)(p)) G(STRLEN,BITONE)(p) : G(STRLEN,BITTWO)(p))
...@@ -897,6 +920,7 @@ the three different cases. */ ...@@ -897,6 +920,7 @@ the three different cases. */
#define PCHARSV PCHARSV8 #define PCHARSV PCHARSV8
#define READ_CAPTURE_NAME READ_CAPTURE_NAME8 #define READ_CAPTURE_NAME READ_CAPTURE_NAME8
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT8 #define SET_PCRE_CALLOUT SET_PCRE_CALLOUT8
#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD8
#define STRLEN STRLEN8 #define STRLEN STRLEN8
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK8 #define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK8
#define PCRE_COMPILE PCRE_COMPILE8 #define PCRE_COMPILE PCRE_COMPILE8
...@@ -927,6 +951,7 @@ the three different cases. */ ...@@ -927,6 +951,7 @@ the three different cases. */
#define PCHARSV PCHARSV16 #define PCHARSV PCHARSV16
#define READ_CAPTURE_NAME READ_CAPTURE_NAME16 #define READ_CAPTURE_NAME READ_CAPTURE_NAME16
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT16 #define SET_PCRE_CALLOUT SET_PCRE_CALLOUT16
#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD16
#define STRLEN STRLEN16 #define STRLEN STRLEN16
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK16 #define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK16
#define PCRE_COMPILE PCRE_COMPILE16 #define PCRE_COMPILE PCRE_COMPILE16
...@@ -957,6 +982,7 @@ the three different cases. */ ...@@ -957,6 +982,7 @@ the three different cases. */
#define PCHARSV PCHARSV32 #define PCHARSV PCHARSV32
#define READ_CAPTURE_NAME READ_CAPTURE_NAME32 #define READ_CAPTURE_NAME READ_CAPTURE_NAME32
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT32 #define SET_PCRE_CALLOUT SET_PCRE_CALLOUT32
#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD32
#define STRLEN STRLEN32 #define STRLEN STRLEN32
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK32 #define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK32
#define PCRE_COMPILE PCRE_COMPILE32 #define PCRE_COMPILE PCRE_COMPILE32
...@@ -1015,6 +1041,7 @@ static int first_callout; ...@@ -1015,6 +1041,7 @@ static int first_callout;
static int jit_was_used; static int jit_was_used;
static int locale_set = 0; static int locale_set = 0;
static int show_malloc; static int show_malloc;
static int stack_guard_return;
static int use_utf; static int use_utf;
static const unsigned char *last_callout_mark = NULL; static const unsigned char *last_callout_mark = NULL;
...@@ -2200,6 +2227,18 @@ return p; ...@@ -2200,6 +2227,18 @@ return p;
/*************************************************
* Stack guard function *
*************************************************/
/* Called from PCRE when set in pcre_stack_guard. We give an error (non-zero)
return when a count overflows. */
static int stack_guard(void)
{
return stack_guard_return;
}
/************************************************* /*************************************************
* Callout function * * Callout function *
*************************************************/ *************************************************/
...@@ -2883,8 +2922,8 @@ printf(" -32 use the 32-bit library\n"); ...@@ -2883,8 +2922,8 @@ printf(" -32 use the 32-bit library\n");
#endif #endif
printf(" -b show compiled code\n"); printf(" -b show compiled code\n");
printf(" -C show PCRE compile-time options and exit\n"); printf(" -C show PCRE compile-time options and exit\n");
printf(" -C arg show a specific compile-time option\n"); printf(" -C arg show a specific compile-time option and exit\n");
printf(" and exit with its value. The arg can be:\n"); printf(" with its value if numeric (else 0). The arg can be:\n");
printf(" linksize internal link size [2, 3, 4]\n"); printf(" linksize internal link size [2, 3, 4]\n");
printf(" pcre8 8 bit library support enabled [0, 1]\n"); printf(" pcre8 8 bit library support enabled [0, 1]\n");
printf(" pcre16 16 bit library support enabled [0, 1]\n"); printf(" pcre16 16 bit library support enabled [0, 1]\n");
...@@ -2892,7 +2931,8 @@ printf(" pcre32 32 bit library support enabled [0, 1]\n"); ...@@ -2892,7 +2931,8 @@ printf(" pcre32 32 bit library support enabled [0, 1]\n");
printf(" utf Unicode Transformation Format supported [0, 1]\n"); printf(" utf Unicode Transformation Format supported [0, 1]\n");
printf(" ucp Unicode Properties supported [0, 1]\n"); printf(" ucp Unicode Properties supported [0, 1]\n");
printf(" jit Just-in-time compiler supported [0, 1]\n"); printf(" jit Just-in-time compiler supported [0, 1]\n");
printf(" newline Newline type [CR, LF, CRLF, ANYCRLF, ANY, ???]\n"); printf(" newline Newline type [CR, LF, CRLF, ANYCRLF, ANY]\n");
printf(" bsr \\R type [ANYCRLF, ANY]\n");
printf(" -d debug: show compiled code and information (-b and -i)\n"); printf(" -d debug: show compiled code and information (-b and -i)\n");
#if !defined NODFA #if !defined NODFA
printf(" -dfa force DFA matching for all subjects\n"); printf(" -dfa force DFA matching for all subjects\n");
...@@ -3231,6 +3271,11 @@ while (argc > 1 && argv[op][0] == '-') ...@@ -3231,6 +3271,11 @@ while (argc > 1 && argv[op][0] == '-')
(void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc); (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc);
print_newline_config(rc, TRUE); print_newline_config(rc, TRUE);
} }
else if (strcmp(argv[op + 1], "bsr") == 0)
{
(void)PCRE_CONFIG(PCRE_CONFIG_BSR, &rc);
printf("%s\n", rc? "ANYCRLF" : "ANY");
}
else if (strcmp(argv[op + 1], "ebcdic") == 0) else if (strcmp(argv[op + 1], "ebcdic") == 0)
{ {
#ifdef EBCDIC #ifdef EBCDIC
...@@ -3439,6 +3484,7 @@ while (!done) ...@@ -3439,6 +3484,7 @@ while (!done)
use_utf = 0; use_utf = 0;
debug_lengths = 1; debug_lengths = 1;
SET_PCRE_STACK_GUARD(NULL);
if (extend_inputline(infile, buffer, " re> ") == NULL) break; if (extend_inputline(infile, buffer, " re> ") == NULL) break;
if (infile != stdin) fprintf(outfile, "%s", (char *)buffer); if (infile != stdin) fprintf(outfile, "%s", (char *)buffer);
...@@ -3739,6 +3785,21 @@ while (!done) ...@@ -3739,6 +3785,21 @@ while (!done)
case 'P': do_posix = 1; break; case 'P': do_posix = 1; break;
#endif #endif
case 'Q':
switch (*pp)
{
case '0':
case '1':
stack_guard_return = *pp++ - '0';
break;
default:
fprintf(outfile, "** Missing 0 or 1 after /Q\n");
goto SKIP_DATA;
}
SET_PCRE_STACK_GUARD(stack_guard);
break;
case 'S': case 'S':
do_study = 1; do_study = 1;
for (;;) for (;;)
...@@ -4282,12 +4343,12 @@ while (!done) ...@@ -4282,12 +4343,12 @@ while (!done)
if (new_info(re, extra, PCRE_INFO_FIRSTTABLE, &start_bits) == 0) if (new_info(re, extra, PCRE_INFO_FIRSTTABLE, &start_bits) == 0)
{ {
if (start_bits == NULL) if (start_bits == NULL)
fprintf(outfile, "No set of starting bytes\n"); fprintf(outfile, "No starting char list\n");
else else
{ {
int i; int i;
int c = 24; int c = 24;
fprintf(outfile, "Starting byte set: "); fprintf(outfile, "Starting chars: ");
for (i = 0; i < 256; i++) for (i = 0; i < 256; i++)
{ {
if ((start_bits[i/8] & (1<<(i&7))) != 0) if ((start_bits[i/8] & (1<<(i&7))) != 0)
...@@ -5192,7 +5253,8 @@ while (!done) ...@@ -5192,7 +5253,8 @@ while (!done)
if (count * 2 > use_size_offsets) count = use_size_offsets/2; if (count * 2 > use_size_offsets) count = use_size_offsets/2;
} }
/* Output the captured substrings */ /* Output the captured substrings. Note that, for the matched string,
the use of \K in an assertion can make the start later than the end. */
for (i = 0; i < count * 2; i += 2) for (i = 0; i < count * 2; i += 2)
{ {
...@@ -5208,11 +5270,25 @@ while (!done) ...@@ -5208,11 +5270,25 @@ while (!done)
} }
else else
{ {
int start = use_offsets[i];
int end = use_offsets[i+1];
if (start > end)
{
start = use_offsets[i+1];
end = use_offsets[i];
fprintf(outfile, "Start of matched string is beyond its end - "
"displaying from end to start.\n");
}
fprintf(outfile, "%2d: ", i/2); fprintf(outfile, "%2d: ", i/2);
PCHARSV(bptr, use_offsets[i], PCHARSV(bptr, start, end - start, outfile);
use_offsets[i+1] - use_offsets[i], outfile);
if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)"); if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)");
fprintf(outfile, "\n"); fprintf(outfile, "\n");
/* Note: don't use the start/end variables here because we want to
show the text from what is reported as the end. */
if (do_showcaprest || (i == 0 && do_showrest)) if (do_showcaprest || (i == 0 && do_showrest))
{ {
fprintf(outfile, "%2d+ ", i/2); fprintf(outfile, "%2d+ ", i/2);
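The hunk above swaps the two offsets when \K inside an assertion leaves the reported start beyond the reported end. The same normalisation can be sketched in isolation as follows; the `match_span` type and the `normalize_span` function are illustrative assumptions, not code from pcretest.

```c
#include <assert.h>

/* Hypothetical type: a displayable span plus a flag recording whether
   the reported start and end had to be exchanged. */
typedef struct {
  int start;    /* lower of the two offsets */
  int length;   /* number of code units to display, never negative */
  int swapped;  /* 1 if the reported start was beyond the reported end */
} match_span;

/* Orders a (start, end) offset pair so that the length is non-negative,
   mirroring the start > end check in the hunk above. */
match_span normalize_span(int start, int end)
{
  match_span s;
  s.swapped = start > end;
  if (s.swapped)
    {
    int tmp = start;
    start = end;
    end = tmp;
    }
  s.start = start;
  s.length = end - start;
  return s;
}
```

A caller would print a warning when `swapped` is set and then display `length` code units from `start`. As the comment in the hunk notes, the "rest of subject" output deliberately keeps using the originally reported end offset rather than the swapped values.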
......
...@@ -207,7 +207,7 @@ correctly, but that messes up comparisons). --/ ...@@ -207,7 +207,7 @@ correctly, but that messes up comparisons). --/
CDBABC CDBABC
\x{2000}ABC \x{2000}ABC
/\R*A/SI8 /\R*A/SI8<bsr_unicode>
CDBABC CDBABC
\x{2028}A \x{2028}A
......
...@@ -907,6 +907,9 @@ ...@@ -907,6 +907,9 @@
/\U/I /\U/I
/a{1,3}b/U
ab
/[/I /[/I
/[a-/I /[a-/I
...@@ -4045,4 +4048,18 @@ backtracking verbs. --/ ...@@ -4045,4 +4048,18 @@ backtracking verbs. --/
/[a[:<:]] should give error/ /[a[:<:]] should give error/
/(?=ab\K)/+
abcd
/abcd/f<lf>
xx\nxabcd
/ -- Test stack check external calls --/
/(((((a)))))/Q0
/(((((a)))))/Q1
/(((((a)))))/Q
/-- End of testinput2 --/ /-- End of testinput2 --/
/-- Tests for the 32-bit library only */ /-- Tests for the 32-bit library only */
< forbid 8w < forbid 8W
/-- Check maximum character size --/ /-- Check maximum character size --/
......
/-- This set of tests checks local-specific features, using the fr_FR locale. /-- This set of tests checks local-specific features, using the "fr_FR" locale.
It is not Perl-compatible. There is different version called wintestinput3 It is not Perl-compatible. When run via RunTest, the locale is edited to
for use on Windows, where the locale is called "french". --/ be whichever of "fr_FR", "french", or "fr" is found to exist. There is
different version of this file called wintestinput3 for use on Windows,
where the locale is called "french" and the tests are run using
RunTest.bat. --/
< forbid 8W < forbid 8W
......
...@@ -716,4 +716,10 @@ ...@@ -716,4 +716,10 @@
/^a+[a\x{200}]/8 /^a+[a\x{200}]/8
aa aa
/^.\B.\B./8
\x{10123}\x{10124}\x{10125}
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
/-- End of testinput4 --/ /-- End of testinput4 --/
...@@ -788,4 +788,6 @@ ...@@ -788,4 +788,6 @@
/^a+[a\x{200}]/8BZ /^a+[a\x{200}]/8BZ
aa aa
/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ
/-- End of testinput5 --/ /-- End of testinput5 --/
......
...@@ -1484,4 +1484,13 @@ ...@@ -1484,4 +1484,13 @@
\x{a1}\x{a7} \x{a1}\x{a7}
\x{37e} \x{37e}
/[RST]+/8iW
Ss\x{17f}
/[R-T]+/8iW
Ss\x{17f}
/[q-u]+/8iW
Ss\x{17f}
/-- End of testinput6 --/ /-- End of testinput6 --/
...@@ -829,4 +829,10 @@ of case for anything other than the ASCII letters. --/ ...@@ -829,4 +829,10 @@ of case for anything other than the ASCII letters. --/
/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ /\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ
/[RST]+/8iWBZ
/[R-T]+/8iWBZ
/[Q-U]+/8iWBZ
/-- End of testinput7 --/ /-- End of testinput7 --/
...@@ -8,7 +8,7 @@ No options ...@@ -8,7 +8,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
/(?(?C1)(?=a)a)/S+I /(?(?C1)(?=a)a)/S+I
...@@ -27,7 +27,7 @@ No options ...@@ -27,7 +27,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = -1 Subject length lower bound = -1
No set of starting bytes No starting char list
JIT study was not successful JIT study was not successful
/abc/S+I>testsavedregex /abc/S+I>testsavedregex
...@@ -36,7 +36,7 @@ No options ...@@ -36,7 +36,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
...@@ -165,7 +165,7 @@ No options ...@@ -165,7 +165,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'd' Need char = 'd'
Subject length lower bound = 4 Subject length lower bound = 4
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
/(*NO_START_OPT)a(*:m)b/KS++ /(*NO_START_OPT)a(*:m)b/KS++
......
...@@ -8,7 +8,7 @@ No options ...@@ -8,7 +8,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT support is not available in this version of PCRE JIT support is not available in this version of PCRE
/a*/SI /a*/SI
......
...@@ -361,7 +361,7 @@ Options: extended ...@@ -361,7 +361,7 @@ Options: extended
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
...@@ -388,7 +388,7 @@ No options ...@@ -388,7 +388,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 Starting chars: \x09 \x20 \xa0
/\H/SI /\H/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -396,7 +396,7 @@ No options ...@@ -396,7 +396,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\v/SI /\v/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -404,7 +404,7 @@ No options ...@@ -404,7 +404,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/\V/SI /\V/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -412,7 +412,7 @@ No options ...@@ -412,7 +412,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\R/SI /\R/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -420,7 +420,7 @@ No options ...@@ -420,7 +420,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/[\h]/BZ /[\h]/BZ
------------------------------------------------------------------ ------------------------------------------------------------------
......
...@@ -481,7 +481,7 @@ Options: utf ...@@ -481,7 +481,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
...@@ -519,7 +519,7 @@ Options: utf ...@@ -519,7 +519,7 @@ Options: utf
First char = \x{c4} First char = \x{c4}
Need char = \x{80} Need char = \x{80}
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
\x{100}\x{100}\x{100}\x{100\x{100} \x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100} 0: \x{100}\x{100}\x{100}
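In the 8-bit library's UTF output above, "First char = \x{c4}" and "Need char = \x{80}" are the two bytes that encode \x{100} in UTF-8, and the "Starting chars" lists likewise show lead bytes rather than whole characters. A minimal sketch of the standard UTF-8 encoding makes the correspondence explicit. The function name `utf8_encode` is an assumption for illustration and is not a PCRE function; the sketch does not reject surrogate code points.

```c
#include <assert.h>

/* Hypothetical helper: encodes code point cp as UTF-8 into out (at
   least 4 bytes) and returns the number of bytes written, or 0 if cp
   is above 0x10FFFF. Surrogates are not rejected in this sketch. */
int utf8_encode(unsigned long cp, unsigned char *out)
{
  if (cp < 0x80)
    {
    out[0] = (unsigned char)cp;
    return 1;
    }
  if (cp < 0x800)
    {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
    }
  if (cp < 0x10000)
    {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
    }
  if (cp <= 0x10FFFF)
    {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
    }
  return 0;
}
```

Encoding 0x100 gives the bytes c4 80, and encoding 0x1234 gives a sequence beginning with e1, which matches the "Starting chars: \xe1" lines for the /\x{1234}.../ patterns in the same output file.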
...@@ -539,7 +539,7 @@ Options: utf ...@@ -539,7 +539,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xc4 Starting chars: x \xc4
/(\x{100}*a|x)/8SDZ /(\x{100}*a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -558,7 +558,7 @@ Options: utf ...@@ -558,7 +558,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xc4 Starting chars: a x \xc4
/(\x{100}{0,2}a|x)/8SDZ /(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -577,7 +577,7 @@ Options: utf ...@@ -577,7 +577,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xc4 Starting chars: a x \xc4
/(\x{100}{1,2}a|x)/8SDZ /(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -597,7 +597,7 @@ Options: utf ...@@ -597,7 +597,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xc4 Starting chars: x \xc4
/\x{100}/8DZ /\x{100}/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -799,7 +799,7 @@ Options: utf ...@@ -799,7 +799,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xc2 \xe1 \xe2 \xe3 Starting chars: \x09 \x20 \xc2 \xe1 \xe2 \xe3
ABC\x{09} ABC\x{09}
0: \x{09} 0: \x{09}
ABC\x{20} ABC\x{20}
...@@ -825,7 +825,7 @@ Options: utf ...@@ -825,7 +825,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
ABC\x{0a} ABC\x{0a}
0: \x{0a} 0: \x{0a}
ABC\x{0b} ABC\x{0b}
...@@ -845,7 +845,7 @@ Options: utf ...@@ -845,7 +845,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 Starting chars: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
CDBABC CDBABC
0: A 0: A
...@@ -855,7 +855,7 @@ Options: utf ...@@ -855,7 +855,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
/\s?xxx\s/8SI /\s?xxx\s/8SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -863,7 +863,7 @@ Options: utf ...@@ -863,7 +863,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1 /\sxxx\s/I8ST1
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -871,7 +871,7 @@ Options: utf ...@@ -871,7 +871,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: \x09 \x0a \x0c \x0d \x20 \xc2 Starting chars: \x09 \x0a \x0c \x0d \x20 \xc2
AB\x{85}xxx\x{a0}XYZ AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0} 0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ AB\x{a0}xxx\x{85}XYZ
...@@ -883,7 +883,7 @@ Options: utf ...@@ -883,7 +883,7 @@ Options: utf
No first char No first char
Need char = ' ' Need char = ' '
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
...@@ -917,7 +917,7 @@ Options: caseless utf ...@@ -917,7 +917,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}+?/iS8I /\x{1234}+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -925,7 +925,7 @@ Options: caseless utf ...@@ -925,7 +925,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}++/iS8I /\x{1234}++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -933,7 +933,7 @@ Options: caseless utf ...@@ -933,7 +933,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}{2}/iS8I /\x{1234}{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -941,7 +941,7 @@ Options: caseless utf ...@@ -941,7 +941,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \xe1 Starting chars: \xe1
/[^\x{c4}]/8DZ /[^\x{c4}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -974,7 +974,7 @@ Options: utf ...@@ -974,7 +974,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
/\777/8DZ /\777/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
......
...@@ -64,7 +64,7 @@ Options: caseless utf ...@@ -64,7 +64,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 17 Subject length lower bound = 17
Starting byte set: \xd0 \xd1 Starting chars: \xd0 \xd1
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
...@@ -92,7 +92,7 @@ No options ...@@ -92,7 +92,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 Starting chars: \x09 \x20 \xa0
/\v/SI /\v/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -100,7 +100,7 @@ No options ...@@ -100,7 +100,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/\R/SI /\R/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -108,7 +108,7 @@ No options ...@@ -108,7 +108,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/[[:blank:]]/WBZ /[[:blank:]]/WBZ
------------------------------------------------------------------ ------------------------------------------------------------------
......
...@@ -228,7 +228,7 @@ Options: extended ...@@ -228,7 +228,7 @@ Options: extended
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff
...@@ -274,7 +274,7 @@ No options ...@@ -274,7 +274,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 \xff Starting chars: \x09 \x20 \xa0 \xff
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000} 0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000} \x{3001}\x{2fff}\x{200a}\xa0\x{2000}
...@@ -292,7 +292,7 @@ No options ...@@ -292,7 +292,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x09 \x20 \xa0 \xff
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000} 0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000} \x{3001}\x{2fff}\x{200a}\xa0\x{2000}
...@@ -304,7 +304,7 @@ No options ...@@ -304,7 +304,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f} 0: \x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b} \x{2000}\x{200a}\x{1fff}\x{200b}
...@@ -330,7 +330,7 @@ No options ...@@ -330,7 +330,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
...@@ -348,7 +348,7 @@ No options ...@@ -348,7 +348,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
...@@ -360,7 +360,7 @@ No options ...@@ -360,7 +360,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
\x{2028}\x{2029}\x{2027}\x{2030} \x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030} 0: \x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
...@@ -378,7 +378,7 @@ Options: bsr_unicode ...@@ -378,7 +378,7 @@ Options: bsr_unicode
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
...@@ -534,18 +534,18 @@ MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789AB ...@@ -534,18 +534,18 @@ MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789AB
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
a* a*
[b-\x{200}]?+ [b-\xff\x{100}-\x{200}]?+
a# a#
a*+ a*+
[b-\x{200}]? [b-\xff\x{100}-\x{200}]?
b# b#
[a-f]* [a-f]*+
[g-\x{200}]*+ [g-\xff\x{100}-\x{200}]*+
# #
[g-\x{200}]* [g-\xff\x{100}-\x{200}]*+
[a-c]*+ [a-c]*+
# #
[g-\x{200}]* [g-\xff\x{100}-\x{200}]*
[a-h]*+ [a-h]*+
Ket Ket
End End
......
...@@ -339,7 +339,7 @@ Options: utf ...@@ -339,7 +339,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
...@@ -378,7 +378,7 @@ Options: utf ...@@ -378,7 +378,7 @@ Options: utf
First char = \x{100} First char = \x{100}
Need char = \x{100} Need char = \x{100}
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
\x{100}\x{100}\x{100}\x{100\x{100} \x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100} 0: \x{100}\x{100}\x{100}
...@@ -398,7 +398,7 @@ Options: utf ...@@ -398,7 +398,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/(\x{100}*a|x)/8SDZ /(\x{100}*a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -417,7 +417,7 @@ Options: utf ...@@ -417,7 +417,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{0,2}a|x)/8SDZ /(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -436,7 +436,7 @@ Options: utf ...@@ -436,7 +436,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{1,2}a|x)/8SDZ /(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -456,7 +456,7 @@ Options: utf ...@@ -456,7 +456,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/\x{100}/8DZ /\x{100}/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -666,7 +666,7 @@ Options: utf ...@@ -666,7 +666,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 \xff Starting chars: \x09 \x20 \xa0 \xff
ABC\x{09} ABC\x{09}
0: \x{09} 0: \x{09}
ABC\x{20} ABC\x{20}
...@@ -692,7 +692,7 @@ Options: utf ...@@ -692,7 +692,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
ABC\x{0a} ABC\x{0a}
0: \x{0a} 0: \x{0a}
ABC\x{0b} ABC\x{0b}
...@@ -712,19 +712,19 @@ Options: utf ...@@ -712,19 +712,19 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 A \xa0 \xff Starting chars: \x09 \x20 A \xa0 \xff
CDBABC CDBABC
0: A 0: A
\x{2000}ABC \x{2000}ABC
0: \x{2000}A 0: \x{2000}A
/\R*A/SI8 /\R*A/SI8<bsr_unicode>
Capturing subpattern count = 0 Capturing subpattern count = 0
Options: utf Options: bsr_unicode utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d A \x85 \xff Starting chars: \x0a \x0b \x0c \x0d A \x85 \xff
CDBABC CDBABC
0: A 0: A
\x{2028}A \x{2028}A
...@@ -736,7 +736,7 @@ Options: utf ...@@ -736,7 +736,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/\s?xxx\s/8SI /\s?xxx\s/8SI
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -744,7 +744,7 @@ Options: utf ...@@ -744,7 +744,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1 /\sxxx\s/I8ST1
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -752,7 +752,7 @@ Options: utf ...@@ -752,7 +752,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: \x09 \x0a \x0c \x0d \x20 \x85 \xa0 Starting chars: \x09 \x0a \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0} 0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ AB\x{a0}xxx\x{85}XYZ
...@@ -764,7 +764,7 @@ Options: utf ...@@ -764,7 +764,7 @@ Options: utf
No first char No first char
Need char = ' ' Need char = ' '
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
...@@ -803,7 +803,7 @@ Options: caseless utf ...@@ -803,7 +803,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}+?/iS8I /\x{1234}+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -811,7 +811,7 @@ Options: caseless utf ...@@ -811,7 +811,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}++/iS8I /\x{1234}++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -819,7 +819,7 @@ Options: caseless utf ...@@ -819,7 +819,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}{2}/iS8I /\x{1234}{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
...@@ -827,7 +827,7 @@ Options: caseless utf ...@@ -827,7 +827,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
Need char = \x{1234} Need char = \x{1234}
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/[^\x{c4}]/8DZ /[^\x{c4}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
...@@ -860,7 +860,7 @@ Options: utf ...@@ -860,7 +860,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/-- Check bad offset --/ /-- Check bad offset --/
......
...@@ -55,7 +55,7 @@ Options: caseless utf ...@@ -55,7 +55,7 @@ Options: caseless utf
First char = \x{401} (caseless) First char = \x{401} (caseless)
Need char = \x{42f} (caseless) Need char = \x{42f} (caseless)
Subject length lower bound = 17 Subject length lower bound = 17
No set of starting bytes No starting char list
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
......
...@@ -50,7 +50,7 @@ Options: anchored extended ...@@ -50,7 +50,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved16BE-1 <!testsaved16BE-1
Compiled pattern loaded from testsaved16BE-1 Compiled pattern loaded from testsaved16BE-1
...@@ -83,7 +83,7 @@ Options: anchored extended ...@@ -83,7 +83,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved32LE-1 <!testsaved32LE-1
Compiled pattern loaded from testsaved32LE-1 Compiled pattern loaded from testsaved32LE-1
......
...@@ -62,7 +62,7 @@ Options: anchored extended ...@@ -62,7 +62,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved32BE-1 <!testsaved32BE-1
Compiled pattern loaded from testsaved32BE-1 Compiled pattern loaded from testsaved32BE-1
...@@ -95,6 +95,6 @@ Options: anchored extended ...@@ -95,6 +95,6 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
/-- End of testinput21 --/ /-- End of testinput21 --/
...@@ -37,7 +37,7 @@ Options: extended utf ...@@ -37,7 +37,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved16BE-2 <!testsaved16BE-2
Compiled pattern loaded from testsaved16BE-2 Compiled pattern loaded from testsaved16BE-2
...@@ -64,7 +64,7 @@ Options: extended utf ...@@ -64,7 +64,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved32LE-2 <!testsaved32LE-2
Compiled pattern loaded from testsaved32LE-2 Compiled pattern loaded from testsaved32LE-2
......
...@@ -49,7 +49,7 @@ Options: extended utf ...@@ -49,7 +49,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved32BE-2 <!testsaved32BE-2
Compiled pattern loaded from testsaved32BE-2 Compiled pattern loaded from testsaved32BE-2
...@@ -76,6 +76,6 @@ Options: extended utf ...@@ -76,6 +76,6 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/-- End of testinput22 --/ /-- End of testinput22 --/
...@@ -1263,4 +1263,12 @@ No match ...@@ -1263,4 +1263,12 @@ No match
aa aa
0: aa 0: aa
/^.\B.\B./8
\x{10123}\x{10124}\x{10125}
0: \x{10123}\x{10124}\x{10125}
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
0: #\x{10000}#\x{100}#\x{10ffff}#
/-- End of testinput4 --/ /-- End of testinput4 --/
...@@ -7232,7 +7232,7 @@ No options ...@@ -7232,7 +7232,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: a d x Starting chars: a d x
terhjk;abcdaadsfe terhjk;abcdaadsfe
0: abc 0: abc
the quick xyz brown fox the quick xyz brown fox
......