Commit 0b4f5060 authored by Sergei Golubchik

Merge branch 'merge-pcre' into 10.0

parents 6c5ee862 c4cc91cd
......@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2015 University of Cambridge
All rights reserved
......@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2010-2014 Zoltan Herczeg
Copyright(c) 2010-2015 Zoltan Herczeg
All rights reserved.
......@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2009-2014 Zoltan Herczeg
Copyright(c) 2009-2015 Zoltan Herczeg
All rights reserved.
......
ChangeLog for PCRE
------------------
Version 8.37 28-April-2015
--------------------------
1. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges
for those parentheses to be closed with whatever has been captured so far.
However, it was failing to mark any other groups between the highest
capture so far and the current group as "unset". Thus, the ovector for
those groups contained whatever was previously there. An example is the
pattern /(x)|((*ACCEPT))/ when matched against "abcd".
2. If an assertion condition was quantified with a minimum of zero (an odd
thing to do, but it happened), SIGSEGV or other misbehaviour could occur.
3. If a pattern in pcretest input had the P (POSIX) modifier followed by an
unrecognized modifier, a crash could occur.
4. An attempt to do global matching in pcretest with a zero-length ovector
caused a crash.
5. Fixed a memory leak during matching that could occur for a subpattern
subroutine call (recursive or otherwise) if the number of captured groups
that had to be saved was greater than ten.
6. Catch a bad opcode during auto-possessification after compiling a bad UTF
string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad
UTF with NO_UTF_CHECK is documented as having an undefined outcome.
7. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.
8. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a
capture having happened, such as, for example in the pattern
/^(?:(a)|b)(?(1)A|B)/, is another kind of back reference, but it was not
setting the highest backreference number. This mattered only if pcre_exec()
was called with an ovector that was too small to hold the capture, and there
was no other kind of back reference (a situation which is probably quite
rare). The effect of the bug was that the condition was always treated as
FALSE when the capture could not be consulted, leading to incorrect
behaviour by pcre_exec(). This bug has been fixed.
9. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.
10. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at study time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is,
zero).
11. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had
unlimited repetition and could match an empty string, a segfault was
likely. The pattern (?(?=0)?)+ is an example that caused this. Perl allows
assertions to be quantified, but not if they are being used as conditions,
so the above pattern is faulted by Perl. PCRE has now been changed so that
it also rejects such patterns.
12. A possessive capturing group such as (a)*+ with a minimum repeat of zero
failed to allow the zero-repeat case if pcre_exec() was called with an
ovector too small to capture the group.
13. Fixed two bugs in pcretest that were discovered by fuzzing and reported by
Red Hat Product Security:
(a) A crash if /K and /F were both set with the option to save the compiled
pattern.
(b) Another crash if the option to print captured substrings in a callout
was combined with setting a null ovector, for example \O\C+ as a subject
string.
14. A pattern such as "((?2){0,1999}())?", which has a group containing a
forward reference repeated a large (but limited) number of times within a
repeated outer group that has a zero minimum quantifier, caused incorrect
code to be compiled, leading to the error "internal error:
previously-checked referenced subpattern not found" when an incorrect
memory address was read. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs and given the CVE number
CVE-2015-2325.
23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs, and given the CVE
number CVE-2015-2326.
24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix
existing and future issues, size computation is eliminated from the code,
and replaced by on-demand memory allocation.
25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with
more than one other case, caused incorrect behaviour when compiled in UTF
mode. In that example, the range a-j was left out of the class.
26. Fix JIT compilation of conditional blocks whose assertion
is converted to (*FAIL). For example: /(?(?!))/.
27. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.
28. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre_exec() it worked by luck; in pcre_dfa_exec() it gave an incorrect
error about an unsupported item.
29. For some types of pattern, for example /Z*(|d*){216}/, the auto-
possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization.
30. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored
because \S ensures they are all in the class. The code for doing this was
interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered
by the LLVM fuzzer.
31. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.
32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled,
leading to buffer overflow or other errors. This bug was discovered by the
LLVM fuzzer.
33. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after
(?(?< for the ! or = that would indicate a lookbehind assertion. This bug
was discovered by the LLVM fuzzer.
34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was
discovered by the LLVM fuzzer.
35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.
36. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcregrep loop.
37. There was a similar problem to 36 in pcretest for global matches.
38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over
the repeated \X did not stop, but carried on past the start of the subject,
causing reference to random memory and/or a segfault. There were also some
other cases where backtracking after \C could crash. This set of bugs was
discovered by the LLVM fuzzer.
39. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern,
for example, /((?2){73}(?2))((?1))/. A better mutual recursion detection
method has been implemented. This infelicity was discovered by the LLVM
fuzzer.
40. Static linking against the PCRE library using the pkg-config module was
failing on missing pthread symbols.
Version 8.36 26-September-2014
------------------------------
......
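Item 1 of the 8.37 ChangeLog above can be illustrated with a small, self-contained sketch. The helper name close_group and its parameters are assumptions made for illustration and are not PCRE functions. The ovector is modelled as a plain int array in which each group owns two consecutive slots and -1 means "unset".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Records a capture in the pair of
   slots starting at `offset` (an even index) and returns the new high-water
   mark, i.e. the index just past the highest pair that has been set. */
static int close_group(int *ovector, int offset_top, int offset,
                       int start, int end)
{
  assert(ovector != NULL && offset >= 0 && offset % 2 == 0);

  ovector[offset] = start;
  ovector[offset + 1] = end;

  if (offset >= offset_top)
  {
    int i;
    /* Slots between the old mark and this group belong to groups that
       never took part in the match, so mark them unset. */
    for (i = offset_top; i < offset; i++) ovector[i] = -1;
    offset_top = offset + 2;
  }
  return offset_top;
}
```

For a pattern shaped like /(x)|((*ACCEPT))/, group 2 owns slots 4 and 5. Assuming the high-water mark is still 2 when that group is closed, the sketch resets slots 2 and 3 to -1 instead of leaving whatever was previously there.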
......@@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language.
Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.
directory, is distributed under the same terms as the software itself. The data
in the testdata directory is not copyrighted and is in the public domain.
The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions, and a
......@@ -24,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2015 University of Cambridge
All rights reserved.
......@@ -35,7 +36,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2010-2014 Zoltan Herczeg
Copyright(c) 2010-2015 Zoltan Herczeg
All rights reserved.
......@@ -46,7 +47,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2009-2014 Zoltan Herczeg
Copyright(c) 2009-2015 Zoltan Herczeg
All rights reserved.
......
News about PCRE releases
------------------------
Release 8.37 28-April-2015
--------------------------
This is a bug-fix release. Note that this library (now called PCRE1) is now being
maintained for bug fixes only. New projects are advised to use the new PCRE2
libraries.
Release 8.36 26-September-2014
------------------------------
......
Building PCRE without using autotools
-------------------------------------
NOTE: This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the first
release of a new API, known as PCRE2, with release numbers starting at 10.00
and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries
(now called PCRE1) are still being maintained for bug fixes, but there will be
no new development. New projects are advised to use the new PCRE2 libraries.
This document contains the following sections:
General
......@@ -761,4 +769,4 @@ There is also a mirror here:
http://www.vsoft-software.com/downloads.html
==========================
Last Updated: 14 May 2013
Last Updated: 10 February 2015
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
The latest release of PCRE is always available in three alternative formats
NOTE: This set of files relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
The latest release of PCRE1 is always available in three alternative formats
from:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
......@@ -990,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 24 October 2014
Last updated: 10 February 2015
......@@ -506,6 +506,11 @@ echo "---------------------------- Test 106 -----------------------------" >>tes
(cd $srcdir; echo "a" | $valgrind $pcregrep -M "|a" ) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 107 -----------------------------" >>testtrygrep
echo "a" >testtemp1grep
echo "aaaaa" >>testtemp1grep
(cd $srcdir; $valgrind $pcregrep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
# Now compare the results.
......
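Test 107 above exercises a \K inside a lookbehind with --line-offsets. The pcregrep change later in this diff handles that case by remembering the previous pair of match offsets and ignoring a match that repeats them. The following is a minimal sketch of that check only; the function name is_new_match is an assumption for illustration and does not appear in pcregrep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of pcregrep. Returns 1 if the match in
   offsets[0..1] differs from the one previously reported (and remembers
   it), or 0 if it is a repeat that should not be printed again. */
static int is_new_match(int prevoffsets[2], const int offsets[2])
{
  assert(prevoffsets != NULL && offsets != NULL);

  if (prevoffsets[0] == offsets[0] && prevoffsets[1] == offsets[1])
    return 0;

  prevoffsets[0] = offsets[0];
  prevoffsets[1] = offsets[1];
  return 1;
}
```

As in the diff, the previous offsets would be initialised to -1 at the start of each line so that the first match on a line is always reported.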
......@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [36])
m4_define(pcre_minor, [37])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2014-09-26])
m4_define(pcre_date, [2015-04-28])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:4:2])
m4_define(libpcre16_version, [2:4:2])
m4_define(libpcre32_version, [0:4:0])
m4_define(libpcre_version, [3:5:2])
m4_define(libpcre16_version, [2:5:2])
m4_define(libpcre32_version, [0:5:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])
......
Building PCRE without using autotools
-------------------------------------
NOTE: This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the first
release of a new API, known as PCRE2, with release numbers starting at 10.00
and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries
(now called PCRE1) are still being maintained for bug fixes, but there will be
no new development. New projects are advised to use the new PCRE2 libraries.
This document contains the following sections:
General
......@@ -761,4 +769,4 @@ There is also a mirror here:
http://www.vsoft-software.com/downloads.html
==========================
Last Updated: 14 May 2013
Last Updated: 10 February 2015
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
The latest release of PCRE is always available in three alternative formats
NOTE: This set of files relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
The latest release of PCRE1 is always available in three alternative formats
from:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
......@@ -990,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 24 October 2014
Last updated: 10 February 2015
......@@ -13,13 +13,24 @@ from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONSIDERATIONS</a>
<li><a name="TOC3" href="#SEC3">USER DOCUMENTATION</a>
<li><a name="TOC4" href="#SEC4">AUTHOR</a>
<li><a name="TOC5" href="#SEC5">REVISION</a>
<li><a name="TOC1" href="#SEC1">PLEASE TAKE NOTE</a>
<li><a name="TOC2" href="#SEC2">INTRODUCTION</a>
<li><a name="TOC3" href="#SEC3">SECURITY CONSIDERATIONS</a>
<li><a name="TOC4" href="#SEC4">USER DOCUMENTATION</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
<br><a name="SEC1" href="#TOC1">PLEASE TAKE NOTE</a><br>
<P>
This document relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
</P>
<br><a name="SEC2" href="#TOC1">INTRODUCTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
......@@ -115,7 +126,7 @@ clashes. In some environments, it is possible to control which external symbols
are exported when a shared library is built, and in these cases the
undocumented symbols are not exported.
</P>
<br><a name="SEC2" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
<br><a name="SEC3" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
<P>
If you are using PCRE in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
......@@ -149,7 +160,7 @@ against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
</P>
<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
<br><a name="SEC4" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
......@@ -188,7 +199,7 @@ follows:
In the "man" and HTML formats, there is also a short page for each C library
function, listing its arguments and results.
</P>
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
......@@ -202,11 +213,11 @@ Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my two initials, followed by the
two digits 10, at the domain cam.ac.uk.
</P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 January 2014
Last updated: 10 February 2015
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
......
.TH PCRE 3 "08 January 2014" "PCRE 8.35"
.TH PCRE 3 "10 February 2015" "PCRE 8.37"
.SH NAME
PCRE - Perl-compatible regular expressions
PCRE - Perl-compatible regular expressions (original API)
.SH "PLEASE TAKE NOTE"
.rs
.sp
This document relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
.
.
.SH INTRODUCTION
.rs
.sp
......@@ -213,6 +225,6 @@ two digits 10, at the domain cam.ac.uk.
.rs
.sp
.nf
Last updated: 08 January 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 10 February 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi
......@@ -13,7 +13,18 @@ PCRE(3) Library Functions Manual PCRE(3)
NAME
PCRE - Perl-compatible regular expressions
PCRE - Perl-compatible regular expressions (original API)
PLEASE TAKE NOTE
This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers start-
ing at 10.00 and library names libpcre2-8, libpcre2-16, and
libpcre2-32. The old libraries (now called PCRE1) are still being main-
tained for bug fixes, but there will be no new development. New
projects are advised to use the new PCRE2 libraries.
INTRODUCTION
......@@ -179,8 +190,8 @@ AUTHOR
REVISION
Last updated: 08 January 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 10 February 2015
Copyright (c) 1997-2015 University of Cambridge.
------------------------------------------------------------------------------
......
......@@ -2736,9 +2736,10 @@ for (;;)
condcode == OP_DNRREF)
return PCRE_ERROR_DFA_UCOND;
/* The DEFINE condition is always false */
/* The DEFINE condition is always false, and the assertion (?!) is
converted to OP_FAIL. */
if (condcode == OP_DEF)
if (condcode == OP_DEF || condcode == OP_FAIL)
{ ADD_ACTIVE(state_offset + codelink + LINK_SIZE + 1, 0); }
/* The only supported version of OP_RREF is for the value RREF_ANY,
......
......@@ -1136,8 +1136,8 @@ for (;;)
printf("\n");
#endif
if (offset < md->offset_max)
{
if (offset >= md->offset_max) goto POSSESSIVE_NON_CAPTURE;
matched_once = FALSE;
code_offset = (int)(ecode - md->start_code);
......@@ -1211,18 +1211,6 @@ for (;;)
}
RRETURN(MATCH_NOMATCH);
}
/* FALL THROUGH ... Insufficient room for saving captured contents. Treat
as a non-capturing bracket. */
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
DPRINTF(("insufficient capture room: treat as non-capturing\n"));
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
/* Non-capturing possessive bracket with unlimited repeat. We come here
from BRAZERO with allow_zero = TRUE. The code is similar to the above,
......@@ -1388,6 +1376,7 @@ for (;;)
break;
case OP_DEF: /* DEFINE - always false */
case OP_FAIL: /* From optimized (?!) condition */
break;
/* The condition is an assertion. Call match() to evaluate it - setting
......@@ -1404,8 +1393,11 @@ for (;;)
condition = TRUE;
/* Advance ecode past the assertion to the start of the first branch,
but adjust it so that the general choosing code below works. */
but adjust it so that the general choosing code below works. If the
assertion has a quantifier that allows zero repeats we must skip over
the BRAZERO. This is a lunatic thing to do, but somebody did! */
if (*ecode == OP_BRAZERO) ecode++;
ecode += GET(ecode, 1);
while (*ecode == OP_ALT) ecode += GET(ecode, 1);
ecode += 1 + LINK_SIZE - PRIV(OP_lengths)[condcode];
......@@ -1474,7 +1466,18 @@ for (;;)
md->offset_vector[offset] =
md->offset_vector[md->offset_end - number];
md->offset_vector[offset+1] = (int)(eptr - md->start_subject);
if (offset_top <= offset) offset_top = offset + 2;
/* If this group is at or above the current highwater mark, ensure that
any groups between the current high water mark and this group are marked
unset and then update the high water mark. */
if (offset >= offset_top)
{
register int *iptr = md->offset_vector + offset_top;
register int *iend = md->offset_vector + offset;
while (iptr < iend) *iptr++ = -1;
offset_top = offset + 2;
}
}
ecode += 1 + IMM2_SIZE;
break;
......@@ -1826,7 +1829,11 @@ for (;;)
are defined in a range that can be tested for. */
if (rrc >= MATCH_BACKTRACK_MIN && rrc <= MATCH_BACKTRACK_MAX)
{
if (new_recursive.offset_save != stacksave)
(PUBL(free))(new_recursive.offset_save);
RRETURN(MATCH_NOMATCH);
}
/* Any return code other than NOMATCH is an error. */
......@@ -3476,7 +3483,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM23);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
#ifdef SUPPORT_UCP
......@@ -3897,7 +3904,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM30);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
......@@ -4032,7 +4039,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM34);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
......@@ -5603,7 +5610,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM44);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
......@@ -5645,12 +5652,17 @@ for (;;)
if (possessive) continue; /* No backtracking */
/* We use <= pp rather than == pp to detect the start of the run while
backtracking because the use of \C in UTF mode can cause BACKCHAR to
move back past pp. This is just palliative; the use of \C in UTF mode
is fraught with danger. */
for(;;)
{
int lgb, rgb;
PCRE_PUCHAR fptr;
if (eptr == pp) goto TAIL_RECURSE; /* At start of char run */
if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */
RMATCH(eptr, ecode, offset_top, md, eptrb, RM45);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
......@@ -5668,7 +5680,7 @@ for (;;)
for (;;)
{
if (eptr == pp) goto TAIL_RECURSE; /* At start of char run */
if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */
fptr = eptr - 1;
if (!utf) c = *fptr; else
{
......@@ -5918,7 +5930,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
......
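The repeated change from `eptr == pp` to `eptr <= pp` in the hunks above can be seen in a toy model. The sketch below is an assumption-laden illustration, not PCRE code: back_char is a simplified stand-in for the BACKCHAR macro, and backtrack_steps is a hypothetical loop that only counts steps. When pp points into the middle of a UTF-8 character (which \C can cause), stepping back by whole characters jumps over pp, so an equality test would never fire.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for BACKCHAR: move to the start of the UTF-8
   character that ends just before p. Continuation bytes look like
   10xxxxxx. */
static const unsigned char *back_char(const unsigned char *p)
{
  p--;
  while ((*p & 0xc0) == 0x80) p--;
  return p;
}

/* Hypothetical loop: count backtracking steps from eptr down to the start
   of the run at pp. Returns -1 if the pointer ended up before pp, which is
   the case an `eptr == pp` test would have missed. */
static int backtrack_steps(const unsigned char *eptr, const unsigned char *pp)
{
  int steps = 0;
  assert(eptr != NULL && pp != NULL);

  for (;;)
  {
    if (eptr <= pp) break;      /* the guard as written in 8.37 */
    eptr = back_char(eptr);
    steps++;
  }
  return (eptr == pp) ? steps : -1;
}
```

With `==` in place of `<=`, the second situation would keep calling back_char and read before the start of the subject, which matches the "reference to random memory" described in ChangeLog item 38.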
......@@ -2446,6 +2446,7 @@ typedef struct compile_data {
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
BOOL dupnames; /* Duplicate names exist */
BOOL iscondassert; /* Next assert is a condition */
int nltype; /* Newline type */
int nllen; /* Newline string length */
pcre_uchar nl[4]; /* Newline string when fixed length */
......@@ -2459,6 +2460,13 @@ typedef struct branch_chain {
pcre_uchar *current_branch;
} branch_chain;
/* Structure for mutual recursion detection. */
typedef struct recurse_check {
struct recurse_check *prev;
const pcre_uchar *group;
} recurse_check;
/* Structure for items in a linked list that represents an explicit recursive
call within the pattern; used by pcre_exec(). */
......
......@@ -51,8 +51,6 @@ POSSIBILITY OF SUCH DAMAGE.
#include "pcre_internal.h"
#define PCRE_BUG 0x80000000
/*
Letter characters:
\xe6\x92\xad = 0x64ad = 25773 (kanji)
......@@ -69,6 +67,9 @@ POSSIBILITY OF SUCH DAMAGE.
\xc3\x89 = 0xc9 = 201 (E')
\xc3\xa1 = 0xe1 = 225 (a')
\xc3\x81 = 0xc1 = 193 (A')
\x53 = 0x53 = S
\x73 = 0x73 = s
\xc5\xbf = 0x17f = 383 (long S)
\xc8\xba = 0x23a = 570
\xe2\xb1\xa5 = 0x2c65 = 11365
\xe1\xbd\xb8 = 0x1f78 = 8056
......@@ -78,6 +79,10 @@ POSSIBILITY OF SUCH DAMAGE.
\xc7\x84 = 0x1c4 = 452
\xc7\x85 = 0x1c5 = 453
\xc7\x86 = 0x1c6 = 454
Caseless sets:
ucp_Armenian - \x{531}-\x{556} -> \x{561}-\x{586}
ucp_Coptic - \x{2c80}-\x{2ce3} -> caseless: XOR 0x1
ucp_Latin - \x{ff21}-\x{ff3a} -> \x{ff41}-\x{ff5a}
Mark property:
\xcc\x8d = 0x30d = 781
......@@ -626,6 +631,9 @@ static struct regression_test_case regression_test_cases[] = {
{ MUA, 0, "(?P<Name>a)?(?P<Name2>b)?(?(Name)c|d)+?dd", "bcabcacdb bdddd" },
{ MUA, 0, "(?P<Name>a)?(?P<Name2>b)?(?(Name)c|d)+l", "ababccddabdbccd abcccl" },
{ MUA, 0, "((?:a|aa)(?(1)aaa))x", "aax" },
{ MUA, 0, "(?(?!)a|b)", "ab" },
{ MUA, 0, "(?(?!)a)", "ab" },
{ MUA, 0 | F_NOMATCH, "(?(?!)a|b)", "ac" },
/* Set start of match. */
{ MUA, 0, "(?:\\Ka)*aaaab", "aaaaaaaa aaaaaaabb" },
......@@ -944,7 +952,7 @@ static void setstack16(pcre16_extra *extra)
pcre16_assign_jit_stack(extra, callback16, getstack16());
}
#endif /* SUPPORT_PCRE8 */
#endif /* SUPPORT_PCRE16 */
#ifdef SUPPORT_PCRE32
static pcre32_jit_stack *stack32;
......@@ -967,7 +975,7 @@ static void setstack32(pcre32_extra *extra)
pcre32_assign_jit_stack(extra, callback32, getstack32());
}
#endif /* SUPPORT_PCRE8 */
#endif /* SUPPORT_PCRE32 */
#ifdef SUPPORT_PCRE16
......@@ -1177,7 +1185,7 @@ static int regression_tests(void)
#elif defined SUPPORT_PCRE16
pcre16_config(PCRE_CONFIG_UTF16, &utf);
pcre16_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp);
#elif defined SUPPORT_PCRE16
#elif defined SUPPORT_PCRE32
pcre32_config(PCRE_CONFIG_UTF32, &utf);
pcre32_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp);
#endif
......
......@@ -70,7 +70,7 @@ rather than bytes.
code pointer to start of group (the bracket)
startcode pointer to start of the whole pattern's code
options the compiling options
int RECURSE depth
recurses chain of recurse_check to catch mutual recursion
Returns: the minimum length
-1 if \C in UTF-8 mode or (*ACCEPT) was encountered
......@@ -80,12 +80,13 @@ Returns: the minimum length
static int
find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
const pcre_uchar *startcode, int options, int recurse_depth)
const pcre_uchar *startcode, int options, recurse_check *recurses)
{
int length = -1;
/* PCRE_UTF16 has the same value as PCRE_UTF8. */
BOOL utf = (options & PCRE_UTF8) != 0;
BOOL had_recurse = FALSE;
recurse_check this_recurse;
register int branchlength = 0;
register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;
......@@ -130,7 +131,7 @@ for (;;)
case OP_SBRAPOS:
case OP_ONCE:
case OP_ONCE_NC:
d = find_minlength(re, cc, startcode, options, recurse_depth);
d = find_minlength(re, cc, startcode, options, recurses);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
......@@ -393,7 +394,7 @@ for (;;)
ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(slot, 0));
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
if (cc > cs && cc < ce) /* Simple recursion */
{
d = 0;
had_recurse = TRUE;
......@@ -401,9 +402,23 @@ for (;;)
}
else
{
int dd = find_minlength(re, cs, startcode, options, recurse_depth);
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
{
d = 0;
had_recurse = TRUE;
break;
}
else
{
int dd;
this_recurse.prev = recurses;
this_recurse.group = cs;
dd = find_minlength(re, cs, startcode, options, &this_recurse);
if (dd < d) d = dd;
}
}
slot += re->name_entry_size;
}
}
......@@ -418,14 +433,26 @@ for (;;)
ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(cc, 1));
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
if (cc > cs && cc < ce) /* Simple recursion */
{
d = 0;
had_recurse = TRUE;
}
else
{
d = find_minlength(re, cs, startcode, options, recurse_depth);
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
{
d = 0;
had_recurse = TRUE;
}
else
{
this_recurse.prev = recurses;
this_recurse.group = cs;
d = find_minlength(re, cs, startcode, options, &this_recurse);
}
}
}
else d = 0;
......@@ -474,12 +501,21 @@ for (;;)
case OP_RECURSE:
cs = ce = (pcre_uchar *)startcode + GET(cc, 1);
do ce += GET(ce, 1); while (*ce == OP_ALT);
if ((cc > cs && cc < ce) || recurse_depth > 10)
if (cc > cs && cc < ce) /* Simple recursion */
had_recurse = TRUE;
else
{
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
had_recurse = TRUE;
else
{
this_recurse.prev = recurses;
this_recurse.group = cs;
branchlength += find_minlength(re, cs, startcode, options,
recurse_depth + 1);
&this_recurse);
}
}
cc += 1 + LINK_SIZE;
break;
......@@ -1503,7 +1539,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&
/* Find the minimum length of subject string. */
switch(min = find_minlength(re, code, code, re->options, 0))
switch(min = find_minlength(re, code, code, re->options, NULL))
{
case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
......
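The find_minlength() change above replaces a depth counter with a chain of records, one per active call, so that mutual recursion is detected exactly instead of being cut off at an arbitrary depth. A minimal sketch of that idea follows. It is not PCRE code: the names chain_node, in_chain and min_length, and the toy model in which each group has a fixed length and calls at most one other group, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the recurse_check chain: each active call
   records the group it is scanning and links to its caller's record. */
typedef struct chain_node {
  const struct chain_node *prev;  /* record of the enclosing call, or NULL */
  int group;                      /* identifier of the group being scanned */
} chain_node;

/* Return 1 if 'group' is already being scanned somewhere up the chain. */
static int in_chain(const chain_node *chain, int group)
{
  const chain_node *r;
  for (r = chain; r != NULL; r = r->prev)
    if (r->group == group) return 1;
  return 0;
}

/* Toy minimum length: group g matches len[g] characters and then calls
   group calls[g] (-1 means it calls nothing). A call to a group that is
   already on the chain is treated as contributing zero, which is what
   stops simple and mutual recursion from looping forever. */
static int min_length(const int *len, const int *calls, int ngroups,
                      int group, const chain_node *chain)
{
  chain_node this_node;
  int target;

  assert(group >= 0 && group < ngroups);
  this_node.prev = chain;          /* the record lives on this call's stack */
  this_node.group = group;

  target = calls[group];
  if (target < 0) return len[group];                    /* no call at all */
  if (in_chain(&this_node, target)) return len[group];  /* recursion: add 0 */
  return len[group] + min_length(len, calls, ngroups, target, &this_node);
}
```

Because each record sits on the stack of the call that created it, the chain needs no allocation and unwinds automatically as the calls return.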
......@@ -1582,12 +1582,15 @@ while (ptr < endptr)
int endlinelength;
int mrc = 0;
int startoffset = 0;
int prevoffsets[2];
unsigned int options = 0;
BOOL match;
char *matchptr = ptr;
char *t = ptr;
size_t length, linelength;
prevoffsets[0] = prevoffsets[1] = -1;
/* At this point, ptr is at the start of a line. We need to find the length
of the subject string to pass to pcre_exec(). In multiline mode, it is the
length of the remainder of the data in the buffer. Otherwise, it is the length of
......@@ -1729,6 +1732,20 @@ while (ptr < endptr)
{
if (!invert)
{
int oldstartoffset = startoffset;
/* It is possible, when a lookbehind assertion contains \K, for the
same string to be found again. The code below advances startoffset, but
until it is past the "bumpalong" offset that gave the match, the same
substring will be returned. The PCRE1 library does not return the
bumpalong offset, so all we can do is ignore repeated strings. (PCRE2
does this better.) */
if (prevoffsets[0] != offsets[0] || prevoffsets[1] != offsets[1])
{
prevoffsets[0] = offsets[0];
prevoffsets[1] = offsets[1];
if (printname != NULL) fprintf(stdout, "%s:", printname);
if (number) fprintf(stdout, "%d:", linenumber);
......@@ -1771,13 +1788,30 @@ while (ptr < endptr)
if (printed || printname != NULL || number) fprintf(stdout, "\n");
}
}
/* Prepare to repeat to find the next match */
/* Prepare to repeat to find the next match. If the pattern contained
a lookbehind that included \K, it is possible that the end of the match
might be at or before the actual starting offset we have just used. We
need to start one character further on. Unfortunately, for unanchored
patterns, the actual start offset can be greater than the one that was
set as a result of "bumpalong". PCRE1 does not return the actual start
offset, so we have to check against the original start offset. This may
lead to duplicates - we need the fudge above to avoid printing them.
(PCRE2 does this better.) */
match = FALSE;
if (line_buffered) fflush(stdout);
rc = 0; /* Had some success */
startoffset = offsets[1]; /* Restart after the match */
if (startoffset <= oldstartoffset)
{
if ((size_t)startoffset >= length)
goto END_ONE_MATCH; /* We were at the end */
startoffset = oldstartoffset + 1;
if (utf8)
while ((matchptr[startoffset] & 0xc0) == 0x80) startoffset++;
}
goto ONLY_MATCHING_RESTART;
}
}
......@@ -1974,6 +2008,7 @@ while (ptr < endptr)
/* Advance to after the newline and increment the line number. The file
offset to the current line is maintained in filepos. */
END_ONE_MATCH:
ptr += linelength + endlinelength;
filepos += (int)(linelength + endlinelength);
linenumber++;
......
......@@ -2257,6 +2257,8 @@ if (callout_extra)
fprintf(f, "Callout %d: last capture = %d\n",
cb->callout_number, cb->capture_last);
if (cb->offset_vector != NULL)
{
for (i = 0; i < cb->capture_top * 2; i += 2)
{
if (cb->offset_vector[i] < 0)
......@@ -2270,6 +2272,7 @@ if (callout_extra)
}
}
}
}
/* Re-print the subject in canonical form, the first time or if giving full
details. On subsequent calls in the same match, we use pchars just to find the
......@@ -2519,7 +2522,7 @@ re->name_entry_size = swap_uint16(re->name_entry_size);
re->name_count = swap_uint16(re->name_count);
re->ref_count = swap_uint16(re->ref_count);
if (extra != NULL)
if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0)
{
pcre_study_data *rsd = (pcre_study_data *)(extra->study_data);
rsd->size = swap_uint32(rsd->size);
......@@ -2700,7 +2703,7 @@ re->name_entry_size = swap_uint16(re->name_entry_size);
re->name_count = swap_uint16(re->name_count);
re->ref_count = swap_uint16(re->ref_count);
if (extra != NULL)
if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0)
{
pcre_study_data *rsd = (pcre_study_data *)(extra->study_data);
rsd->size = swap_uint32(rsd->size);
......@@ -3453,7 +3456,7 @@ while (!done)
pcre_extra *extra = NULL;
#if !defined NOPOSIX /* There are still compilers that require no indent */
regex_t preg;
regex_t preg = { NULL, 0, 0} ;
int do_posix = 0;
#endif
......@@ -5603,6 +5606,12 @@ while (!done)
if (!do_g && !do_G) break;
if (use_offsets == NULL)
{
fprintf(outfile, "Cannot do global matching without an ovector\n");
break;
}
/* If we have matched an empty string, first check to see if we are at
the end of the subject. If so, the /g loop is over. Otherwise, mimic what
Perl's /g option does. This turns out to be rather cunning. First we set
......@@ -5618,9 +5627,33 @@ while (!done)
g_notempty = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
}
/* For /g, update the start offset, leaving the rest alone */
/* For /g, update the start offset, leaving the rest alone. There is a
tricky case when \K is used in a positive lookbehind assertion. This can
cause the end of the match to be less than or equal to the start offset.
In this case we restart at one past the start offset. This may return the
same match if the original start offset was bumped along during the
match, but eventually the new start offset will hit the actual start
offset. (In PCRE2 the true start offset is available, and this can be
done better. It is not worth doing more than making sure we do not loop
at this stage in the life of PCRE1.) */
if (do_g) start_offset = use_offsets[1];
if (do_g)
{
if (g_notempty == 0 && use_offsets[1] <= start_offset)
{
if (start_offset >= len) break; /* End of subject */
start_offset++;
if (use_utf)
{
while (start_offset < len)
{
if ((bptr[start_offset] & 0xc0) != 0x80) break;
start_offset++;
}
}
}
else start_offset = use_offsets[1];
}
/* For /G, update the pointer and length */
......@@ -5637,7 +5670,7 @@ while (!done)
CONTINUE:
#if !defined NOPOSIX
if (posix || do_posix) regfree(&preg);
if ((posix || do_posix) && preg.re_pcre != 0) regfree(&preg);
#endif
if (re != NULL) new_free(re);
......
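Both the pcregrep and pcretest changes above restart a global match one character past the previous start offset when \K in a lookbehind leaves the end of the match at or before that offset. In UTF-8 mode, one character means skipping any continuation bytes (those of the form 10xxxxxx). The sketch below isolates that step. The helper name next_char_offset and its signature are hypothetical and do not appear in PCRE; the bounds check follows the pcretest variant of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance 'offset' by one character within a subject
   of 'length' bytes. In UTF-8 mode, continuation bytes (top two bits 10)
   are skipped so the result never lands inside a multibyte character. */
static size_t next_char_offset(const unsigned char *subject, size_t length,
                               size_t offset, int utf8)
{
  assert(subject != NULL);
  if (offset >= length) return length;   /* already at the end of subject */
  offset++;
  if (utf8)
    while (offset < length && (subject[offset] & 0xc0) == 0x80) offset++;
  return offset;
}
```

A caller in the style of the /g loop would stop when the offset it passes in is already at the end of the subject, and otherwise retry the match from the returned offset.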
......@@ -5720,4 +5720,14 @@ AbcdCBefgBhiBqz
/[\Q]a\E]+/
aa]]
/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/
1234abcd
/(\2)(\1)/
"Z*(|d*){216}"
"(?1)(?#?'){8}(a)"
baaaaaaaaac
/-- End of testinput1 --/
......@@ -134,4 +134,6 @@ is required for these tests. --/
/(((a\2)|(a*)\g<-1>))*a?/B
/((?+1)(\1))/B
/-- End of testinput11 --/
......@@ -87,4 +87,12 @@ and a couple of things that are different with JIT. --/
/^12345678abcd/mS++
12345678abcd
/-- Test pattern compilation --/
/(?:a|b|c|d|e)(?R)/S++
/(?:a|b|c|d|e)(?R)(?R)/S++
/(a(?:a|b|c|d|e)b){8,16}/S++
/-- End of testinput12 --/
......@@ -1380,6 +1380,8 @@
1X
123456\P
//KF>/dev/null
/abc/IS>testsavedregex
<testsavedregex
abc
......@@ -4078,4 +4080,76 @@ backtracking verbs. --/
/\x{whatever}/
"((?=(?(?=(?(?=(?(?=()))))))))"
a
"(?(?=)==)(((((((((?=)))))))))"
a
/^(?:(a)|b)(?(1)A|B)/I
aA123\O3
aA123\O6
'^(?:(?<AA>a)|b)(?(<AA>)A|B)'
aA123\O3
aA123\O6
'^(?<AA>)(?:(?<AA>a)|b)(?(<AA>)A|B)'J
aA123\O3
aA123\O6
'^(?:(?<AA>X)|)(?:(?<AA>a)|b)\k{AA}'J
aa123\O3
aa123\O6
/(?<N111>(?J)(?<N111>1(111111)11|)1|1|)(?(<N111>)1)/
/(?(?=0)?)+/
/(?(?=0)(?=00)?00765)/
00765
/(?(?=0)(?=00)?00765|(?!3).56)/
00765
456
** Failers
356
'^(a)*+(\w)'
g
g\O3
'^(?:a)*+(\w)'
g
g\O3
//C
\O\C+
"((?2){0,1999}())?"
/((?+1)(\1))/BZ
/(?(?!)a|b)/
bbb
aaa
"((?2)+)((?1))"
"(?(?<E>.*!.*)?)"
"X((?2)()*+){2}+"BZ
"X((?2)()*+){2}"BZ
"(?<=((?2))((?1)))"
/(?<=\Ka)/g+
aaaaa
/(?<=\Ka)/G+
aaaaa
/((?2){73}(?2))((?1))/
/-- End of testinput2 --/
......@@ -722,4 +722,9 @@
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
"[\S\V\H]"8
/\C(\W?ſ)'?{{/8
\\C(\\W?ſ)'?{{
/-- End of testinput4 --/
......@@ -790,4 +790,12 @@
/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ
/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ
/(?<=\K\x{17f})/8g+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
/(?<=\K\x{17f})/8G+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
/-- End of testinput5 --/
......
......@@ -1496,4 +1496,10 @@
/^s?c/mi8
scat
/[A-`]/i8
abcdefghijklmno
/\C\X*QT/8
Ӆ\x0aT
/-- End of testinput6 --/
......@@ -4837,4 +4837,8 @@
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/(?(?!)a|b)/
bbb
aaa
/-- End of testinput8 --/
......@@ -9411,4 +9411,22 @@ No match
aa]]
0: aa]]
/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/
1234abcd
0:
1: <unset>
2: <unset>
3: <unset>
4: <unset>
5:
/(\2)(\1)/
"Z*(|d*){216}"
"(?1)(?#?'){8}(a)"
baaaaaaaaac
0: aaaaaaaaa
1: a
/-- End of testinput1 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 73
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 57
Memory allocation (code space): 61
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
......@@ -733,4 +733,19 @@ Memory allocation (code space): 14
41 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 20 Bra
2 16 Once
4 12 CBra 1
7 9 Recurse
9 5 CBra 2
12 \1
14 5 Ket
16 12 Ket
18 16 Ket
20 20 Ket
22 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 155
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 117
Memory allocation (code space): 125
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
......@@ -733,4 +733,19 @@ Memory allocation (code space): 28
41 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 20 Bra
2 16 Once
4 12 CBra 1
7 9 Recurse
9 5 CBra 2
12 \1
14 5 Ket
16 12 Ket
18 16 Ket
20 20 Ket
22 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -231,7 +231,7 @@ Memory allocation (code space): 45
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 34
Memory allocation (code space): 38
------------------------------------------------------------------
0 30 Bra
3 7 CBra 1
......@@ -733,4 +733,19 @@ Memory allocation (code space): 10
60 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 31 Bra
3 25 Once
6 19 CBra 1
11 14 Recurse
14 8 CBra 2
19 \1
22 8 Ket
25 19 Ket
28 25 Ket
31 31 Ket
34 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -176,4 +176,12 @@ No match, mark = m (JIT)
12345678abcd
0: 12345678abcd (JIT)
/-- Test pattern compilation --/
/(?:a|b|c|d|e)(?R)/S++
/(?:a|b|c|d|e)(?R)(?R)/S++
/(a(?:a|b|c|d|e)b){8,16}/S++
/-- End of testinput12 --/
......@@ -561,7 +561,7 @@ Failed: assertion expected after (?( at offset 3
Failed: reference to non-existent subpattern at offset 7
/(?(?<ab))/
Failed: syntax error in subpattern name (missing terminator) at offset 7
Failed: assertion expected after (?( at offset 3
/((?s)blah)\s+\1/I
Capturing subpattern count = 1
......@@ -1566,30 +1566,35 @@ Need char = 'b'
/a(?(1)b)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)bag|big)*(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)+(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)b..|b..)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'b'
......@@ -3379,24 +3384,28 @@ Need char = 'a'
/(?(1)ab|ac)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/(?(1)abz|acz)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'z'
/(?(1)abz)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
No need char
/(?(1)abz)(1)23/I
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
Need char = '3'
......@@ -5605,6 +5614,10 @@ No match
123456\P
No match
//KF>/dev/null
Compiled pattern written to /dev/null
Study data written to /dev/null
/abc/IS>testsavedregex
Capturing subpattern count = 0
No options
......@@ -6336,6 +6349,7 @@ No need char
/^(?P<A>a)?(?(A)a|b)/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
A 1
Options: anchored
......@@ -6353,6 +6367,7 @@ No match
/(?:(?(ZZ)a|b)(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
......@@ -6370,6 +6385,7 @@ Failed: reference to non-existent subpattern at offset 9
/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
......@@ -6381,6 +6397,7 @@ Need char = 'X'
/(?:(?(ZZ)a|\(b\))\\(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
......@@ -10226,6 +10243,7 @@ No starting char list
(?(1)|.) # check that there was an empty component
/xiIS
Capturing subpattern count = 1
Max back reference = 1
Options: anchored caseless extended
No first char
Need char = ':'
......@@ -10255,6 +10273,7 @@ Failed: different names for subpatterns of the same number are not allowed at of
b(?<quote> (?<apostrophe>')|(?<realquote>")) )
(?('quote')[a-z]+|[0-9]+)/JIx
Capturing subpattern count = 6
Max back reference = 1
Named capturing subpatterns:
apostrophe 2
apostrophe 5
......@@ -10317,6 +10336,7 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 4
Max back reference = 4
Named capturing subpatterns:
D 4
D 1
......@@ -10364,6 +10384,7 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 4
Max back reference = 1
Named capturing subpatterns:
A 1
A 4
......@@ -10486,6 +10507,7 @@ No starting char list
/()i(?(1)a)/SI
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
Need char = 'i'
......@@ -14206,4 +14228,199 @@ Failed: digits missing in \x{} or \o{} at offset 3
/\x{whatever}/
Failed: non-hex character in \x{} (closing brace missing?) at offset 3
"((?=(?(?=(?(?=(?(?=()))))))))"
a
0:
1:
2:
"(?(?=)==)(((((((((?=)))))))))"
a
No match
/^(?:(a)|b)(?(1)A|B)/I
Capturing subpattern count = 1
Max back reference = 1
Options: anchored
No first char
No need char
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
0: aA
1: a
'^(?:(?<AA>a)|b)(?(<AA>)A|B)'
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
0: aA
1: a
'^(?<AA>)(?:(?<AA>a)|b)(?(<AA>)A|B)'J
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
Matched, but too many substrings
0: aA
1:
'^(?:(?<AA>X)|)(?:(?<AA>a)|b)\k{AA}'J
aa123\O3
Matched, but too many substrings
0: aa
aa123\O6
Matched, but too many substrings
0: aa
1: <unset>
/(?<N111>(?J)(?<N111>1(111111)11|)1|1|)(?(<N111>)1)/
/(?(?=0)?)+/
Failed: nothing to repeat at offset 7
/(?(?=0)(?=00)?00765)/
00765
0: 00765
/(?(?=0)(?=00)?00765|(?!3).56)/
00765
0: 00765
456
0: 456
** Failers
No match
356
No match
'^(a)*+(\w)'
g
0: g
1: <unset>
2: g
g\O3
Matched, but too many substrings
0: g
'^(?:a)*+(\w)'
g
0: g
1: g
g\O3
Matched, but too many substrings
0: g
//C
\O\C+
Callout 255: last capture = -1
--->
+0 ^
Matched, but too many substrings
"((?2){0,1999}())?"
/((?+1)(\1))/BZ
------------------------------------------------------------------
Bra
Once
CBra 1
Recurse
CBra 2
\1
Ket
Ket
Ket
Ket
End
------------------------------------------------------------------
/(?(?!)a|b)/
bbb
0: b
aaa
No match
"((?2)+)((?1))"
"(?(?<E>.*!.*)?)"
Failed: assertion expected after (?( at offset 3
"X((?2)()*+){2}+"BZ
------------------------------------------------------------------
Bra
X
Once
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
Ket
Ket
End
------------------------------------------------------------------
"X((?2)()*+){2}"BZ
------------------------------------------------------------------
Bra
X
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
Ket
End
------------------------------------------------------------------
"(?<=((?2))((?1)))"
Failed: lookbehind assertion is not fixed length at offset 17
/(?<=\Ka)/g+
aaaaa
0: a
0+ aaaa
0: a
0+ aaaa
0: a
0+ aaa
0: a
0+ aa
0: a
0+ a
0: a
0+
/(?<=\Ka)/G+
aaaaa
0: a
0+ aaaa
0: a
0+ aaa
0: a
0+ aa
0: a
0+ a
0: a
0+
/((?2){73}(?2))((?1))/
/-- End of testinput2 --/
......@@ -1271,4 +1271,10 @@ No match
#\x{10000}#\x{100}#\x{10ffff}#
0: #\x{10000}#\x{100}#\x{10ffff}#
"[\S\V\H]"8
/\C(\W?ſ)'?{{/8
\\C(\\W?ſ)'?{{
No match
/-- End of testinput4 --/
......@@ -1897,4 +1897,49 @@ Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 5
End
------------------------------------------------------------------
/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ
------------------------------------------------------------------
Bra
[^\x{ff}]*
PRUNE:\x{100}abc
CBra 1
xyz
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
No first char
Need char = 'z'
/(?<=\K\x{17f})/8g+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}
0: \x{17f}
0+
/(?<=\K\x{17f})/8G+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}
0: \x{17f}
0+
/-- End of testinput5 --/
......
......@@ -2461,4 +2461,12 @@ No match
scat
0: sc
/[A-`]/i8
abcdefghijklmno
0: a
/\C\X*QT/8
Ӆ\x0aT
No match
/-- End of testinput6 --/
......@@ -7785,4 +7785,10 @@ Matched, but offsets vector is too small to show all matches
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
/(?(?!)a|b)/
bbb
0: b
aaa
No match
/-- End of testinput8 --/