Commit 1cac281e authored Mar 05, 2017 by Vicențiu Ciorbaru
Merge branch 'merge-pcre' into 10.0
parents 895b2539 dfd77491
Showing 30 changed files with 1449 additions and 1166 deletions
pcre/AUTHORS +3 -3
pcre/CMakeLists.txt +1 -0
pcre/ChangeLog +47 -0
pcre/LICENCE +3 -3
pcre/NEWS +6 -0
pcre/configure.ac +5 -5
pcre/doc/html/pcrecompat.html +1 -1
pcre/doc/html/pcrepattern.html +20 -17
pcre/doc/pcre.txt +1057 -1052
pcre/doc/pcrecompat.3 +1 -1
pcre/doc/pcrepattern.3 +20 -17
pcre/pcre_compile.c +56 -24
pcre/pcre_jit_compile.c +2 -2
pcre/pcre_jit_test.c +1 -0
pcre/pcregrep.c +6 -0
pcre/pcretest.c +15 -5
pcre/testdata/testinput1 +6 -0
pcre/testdata/testinput16 +26 -0
pcre/testdata/testinput19 +17 -0
pcre/testdata/testinput2 +6 -0
pcre/testdata/testinput6 +6 -0
pcre/testdata/testinput7 +0 -9
pcre/testdata/testinput8 +4 -0
pcre/testdata/testoutput1 +8 -0
pcre/testdata/testoutput16 +52 -0
pcre/testdata/testoutput19 +26 -0
pcre/testdata/testoutput2 +36 -1
pcre/testdata/testoutput6 +8 -0
pcre/testdata/testoutput7 +0 -26
pcre/testdata/testoutput8 +10 -0
pcre/AUTHORS
...
...
@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
-Copyright (c) 1997-2016 University of Cambridge
+Copyright (c) 1997-2017 University of Cambridge
All rights reserved
...
...
@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
-Copyright(c) 2010-2016 Zoltan Herczeg
+Copyright(c) 2010-2017 Zoltan Herczeg
All rights reserved.
...
...
@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
-Copyright(c) 2009-2016 Zoltan Herczeg
+Copyright(c) 2009-2017 Zoltan Herczeg
All rights reserved.
...
...
pcre/CMakeLists.txt
...
...
@@ -66,6 +66,7 @@
# 2013-10-08 PH got rid of the "source" command, which is a bash-ism (use ".")
# 2013-11-05 PH added support for PARENS_NEST_LIMIT
# 2016-03-01 PH applied Chris Wilson's patch for MSVC static build
+# 2016-06-24 PH applied Chris Wilson's revised patch (adds a separate option)
PROJECT(PCRE C CXX)
...
...
pcre/ChangeLog
...
...
@@ -4,6 +4,53 @@ ChangeLog for PCRE
Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.
Version 8.40 11-January-2017
----------------------------
1. Using -o with -M in pcregrep could cause unnecessary repeated output when
the match extended over a line boundary.
2. Applied Chris Wilson's second patch (Bugzilla #1681) to CMakeLists.txt for
MSVC static compilation, putting the first patch under a new option.
3. Fix register overwite in JIT when SSE2 acceleration is enabled.
4. Ignore "show all captures" (/=) for DFA matching.
5. Fix JIT unaligned accesses on x86. Patch by Marc Mutz.
6. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode),
without PCRE_UCP set, a negative character type such as \D in a positive
class should cause all characters greater than 255 to match, whatever else
is in the class. There was a bug that caused this not to happen if a
Unicode property item was added to such a class, for example [\D\P{Nd}] or
[\W\pL].
7. When pcretest was outputing information from a callout, the caret indicator
for the current position in the subject line was incorrect if it was after
an escape sequence for a character whose code point was greater than
\x{ff}.
8. A pattern such as (?<RA>abc)(?(R)xyz) was incorrectly compiled such that
the conditional was interpreted as a reference to capturing group 1 instead
of a test for recursion. Any group whose name began with R was
misinterpreted in this way. (The reference interpretation should only
happen if the group's name is precisely "R".)
9. A number of bugs have been mended relating to match start-up optimizations
when the first thing in a pattern is a positive lookahead. These all
applied only when PCRE_NO_START_OPTIMIZE was *not* set:
(a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
both an initial 'X' and a following 'X'.
(b) Some patterns starting with an assertion that started with .* were
incorrectly optimized as having to match at the start of the subject or
after a newline. There are cases where this is not true, for example,
(?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
start with spaces. Starting .* in an assertion is no longer taken as an
indication of matching at the start (or after a newline).
Version 8.39 14-June-2016
-------------------------
...
...
pcre/LICENCE
...
...
@@ -25,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
-Copyright (c) 1997-2016 University of Cambridge
+Copyright (c) 1997-2017 University of Cambridge
All rights reserved.
...
...
@@ -36,7 +36,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
-Copyright(c) 2010-2016 Zoltan Herczeg
+Copyright(c) 2010-2017 Zoltan Herczeg
All rights reserved.
...
...
@@ -47,7 +47,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
-Copyright(c) 2009-2016 Zoltan Herczeg
+Copyright(c) 2009-2017 Zoltan Herczeg
All rights reserved.
...
...
pcre/NEWS
News about PCRE releases
------------------------
Release 8.40 11-January-2017
----------------------------
This is a bug-fix release.
Release 8.39 14-June-2016
-------------------------
...
...
pcre/configure.ac
...
...
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
-m4_define(pcre_minor, [39])
+m4_define(pcre_minor, [40])
m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2016-06-14])
+m4_define(pcre_date, [2017-01-11])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:7:2])
-m4_define(libpcre16_version, [2:7:2])
-m4_define(libpcre32_version, [0:7:0])
+m4_define(libpcre_version, [3:8:2])
+m4_define(libpcre16_version, [2:8:2])
+m4_define(libpcre32_version, [0:8:0])
m4_define(libpcreposix_version, [0:4:0])
m4_define(libpcrecpp_version, [0:1:0])
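Reviewer note: the version triples above use libtool's current:revision:age scheme, and only the revision field moves (7 to 8), which is the usual signal for an implementation change with no interface change. As a rough illustration, assuming the Linux naming convention libtool applies (other platforms differ) and a hypothetical helper name, the file-name suffix can be derived like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE or libtool. It formats the
   shared-library suffix that libtool derives on Linux from
   current:revision:age, namely (current - age).age.revision. */
static int linux_so_suffix(int current, int revision, int age,
                           char *out, size_t outlen)
{
  return snprintf(out, outlen, "%d.%d.%d", current - age, age, revision);
}
```

Under that assumption, 3:8:2 maps to the suffix 1.2.8, so the library major version seen by the dynamic linker is unchanged by this merge.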
...
...
pcre/doc/html/pcrecompat.html
...
...
@@ -128,7 +128,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
names is not as general as Perl's. This is a consequence of the fact the PCRE
works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
+between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
where the two capturing parentheses have the same number but different names,
is not supported, and causes an error at compile time. If it were allowed, it
would not be possible to distinguish which parentheses matched, because both
...
...
pcre/doc/html/pcrepattern.html
...
...
@@ -358,24 +358,24 @@ When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
generate the appropriate EBCDIC code values. The \c escape is processed
as specified for Perl in the <b>perlebcdic</b> document. The only characters
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; after \c the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
</P>
<P>
-Thus, apart from \?, these escapes generate the same character code values as
+Thus, apart from \c?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \G always generates code value 7, which is BEL in ASCII
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
</P>
<P>
-The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE makes \? generate 95; otherwise it generates 255.
+values, PCRE makes \c? generate 95; otherwise it generates 255.
</P>
<P>
After \0 up to two further octal digits are read. If there are fewer than two
...
...
@@ -1512,13 +1512,8 @@ J, U and X respectively.
<P>
When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the <b>pcre_fullinfo()</b> function).
-</P>
-<P>
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
<pre>
(a(?i)b)c
</pre>
...
...
@@ -2160,6 +2155,14 @@ capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
</P>
<P>
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation, and as PCRE1 is now in
maintenance-only status, it is unlikely ever to change.
</P>
<P>
For compatibility with Perl, assertion subpatterns may be repeated; though
it makes no sense to assert the same thing several times, the side effect of
capturing parentheses may occasionally be useful. In practice, there only three
...
...
@@ -3264,9 +3267,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 14 June 2015
+Last updated: 23 October 2016
<br>
-Copyright © 1997-2015 University of Cambridge.
+Copyright © 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
...
...
pcre/doc/pcre.txt
This source diff could not be displayed because it is too large. You can view the blob instead.
pcre/doc/pcrecompat.3
...
...
@@ -113,7 +113,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
names is not as general as Perl's. This is a consequence of the fact the PCRE
works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
+between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
where the two capturing parentheses have the same number but different names,
is not supported, and causes an error at compile time. If it were allowed, it
would not be possible to distinguish which parentheses matched, because both
...
...
pcre/doc/pcrepattern.3
-.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
+.TH PCREPATTERN 3 "23 October 2016" "PCRE 8.40"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
...
...
@@ -336,22 +336,22 @@ When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
generate the appropriate EBCDIC code values. The \ec escape is processed
as specified for Perl in the \fBperlebcdic\fP document. The only characters
that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \e@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\e? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; after \ec the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
.P
-Thus, apart from \e?, these escapes generate the same character code values as
+Thus, apart from \ec?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \eG always generates code value 7, which is BEL in ASCII
+differ. For example, \ecG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
.P
-The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE makes \e? generate 95; otherwise it generates 255.
+values, PCRE makes \ec? generate 95; otherwise it generates 255.
.P
After \e0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \e0\ex\e015
...
...
@@ -1511,12 +1511,8 @@ J, U and X respectively.
.P
When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the \fBpcre_fullinfo()\fP function).
-.P
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
.sp
(a(?i)b)c
.sp
...
...
@@ -2171,6 +2167,13 @@ numbering the capturing subpatterns in the whole pattern. However, substring
capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
.P
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation, and as PCRE1 is now in
maintenance-only status, it is unlikely ever to change.
.P
For compatibility with Perl, assertion subpatterns may be repeated; though
it makes no sense to assert the same thing several times, the side effect of
capturing parentheses may occasionally be useful. In practice, there only three
...
...
@@ -3296,6 +3299,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 14 June 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 23 October 2016
+Copyright (c) 1997-2016 University of Cambridge.
.fi
pcre/pcre_compile.c
...
...
@@ -5579,6 +5579,34 @@ for (;; ptr++)
 #endif
 #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
       {
+      /* For non-UCP wide characters, in a non-negative class containing \S or
+      similar (should_flip_negation is set), all characters greater than 255
+      must be in the class. */
+      if (
+#if defined COMPILE_PCRE8
+           utf &&
+#endif
+           should_flip_negation && !negate_class &&
+           (options & PCRE_UCP) == 0)
+        {
+        *class_uchardata++ = XCL_RANGE;
+        if (utf)   /* Will always be utf in the 8-bit library */
+          {
+          class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
+          class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
+          }
+        else       /* Can only happen for the 16-bit & 32-bit libraries */
+          {
+#if defined COMPILE_PCRE16
+          *class_uchardata++ = 0x100;
+          *class_uchardata++ = 0xffffu;
+#elif defined COMPILE_PCRE32
+          *class_uchardata++ = 0x100;
+          *class_uchardata++ = 0xffffffffu;
+#endif
+          }
+        }
       *class_uchardata++ = XCL_END;    /* Marks the end of extra data */
       *code++ = OP_XCLASS;
       code += LINK_SIZE;
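Reviewer note: this hunk appends an extended-class range that starts at 0x100 whenever a positive class contains a negated type such as \D and PCRE_UCP is off, which is the behaviour ChangeLog item 6 describes. The following is only a sketch of the resulting matching rule, not PCRE's data layout; `simple_class`, `make_not_digit_class` and `class_matches` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified class: a 256-entry table for low code points plus
   one optional range standing in for the XCL_RANGE item added by the hunk. */
typedef struct {
  bool low[256];        /* membership for code points 0..255 */
  bool has_high_range;  /* true when the 0x100..high_max range was appended */
  uint32_t high_max;    /* 0x10ffff in UTF mode */
} simple_class;

/* Build [\D] without UCP: every non-digit below 256 and, as in the hunk,
   every code point from 0x100 upwards. */
static simple_class make_not_digit_class(void)
{
  simple_class cls;
  for (int c = 0; c < 256; c++) cls.low[c] = !(c >= '0' && c <= '9');
  cls.has_high_range = true;
  cls.high_max = 0x10ffff;
  return cls;
}

/* Low code points use the table; higher ones match only via the range. */
static bool class_matches(const simple_class *cls, uint32_t c)
{
  if (c < 256) return cls->low[c];
  return cls->has_high_range && c <= cls->high_max;
}
```

With this rule a code point such as \x{1d7cf} matches [\D], in line with the new cases added to testinput16 and testinput19.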
...
...
@@ -6923,7 +6951,8 @@ for (;; ptr++)
       slot = cd->name_table;
       for (i = 0; i < cd->names_found; i++)
         {
-        if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0) break;
+        if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0 &&
+            slot[IMM2_SIZE+namelen] == 0) break;
         slot += cd->name_entry_size;
         }
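Reviewer note: the old comparison looked only at the first `namelen` units, so a lookup for `R` also hit a table entry such as `RA`; this appears to be the cause of the misinterpretation in ChangeLog item 8. A minimal sketch of the same idea with plain C strings, assuming a hypothetical flat table of NUL-terminated names (the real table stores `pcre_uchar` entries prefixed by a group number of `IMM2_SIZE` units):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compiled name table: plain C strings. */
static const char *const group_names[] = { "RA", "year" };
enum { GROUP_COUNT = sizeof group_names / sizeof group_names[0] };

/* Old behaviour: only the first namelen characters are compared, so a
   lookup for "R" also hits the entry "RA". Returns the index or -1.
   namelen must not exceed strlen(name). */
static int find_name_prefix(const char *name, size_t namelen)
{
  for (int i = 0; i < GROUP_COUNT; i++)
    if (strncmp(name, group_names[i], namelen) == 0) return i;
  return -1;
}

/* Fixed behaviour: the entry must also end right after namelen characters.
   namelen must not exceed strlen(name). */
static int find_name_exact(const char *name, size_t namelen)
{
  for (int i = 0; i < GROUP_COUNT; i++)
    if (strncmp(name, group_names[i], namelen) == 0 &&
        group_names[i][namelen] == '\0') return i;
  return -1;
}
```

The extra terminator check mirrors the added `slot[IMM2_SIZE+namelen] == 0` condition: a prefix match alone is no longer enough.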
...
...
@@ -7889,15 +7918,17 @@ for (;; ptr++)
           }
         }
-      /* For a forward assertion, we take the reqchar, if set. This can be
-      helpful if the pattern that follows the assertion doesn't set a different
-      char. For example, it's useful for /(?=abcde).+/. We can't set firstchar
-      for an assertion, however because it leads to incorrect effect for patterns
-      such as /(?=a)a.+/ when the "real" "a" would then become a reqchar instead
-      of a firstchar. This is overcome by a scan at the end if there's no
-      firstchar, looking for an asserted first char. */
-      else if (bravalue == OP_ASSERT && subreqcharflags >= 0)
+      /* For a forward assertion, we take the reqchar, if set, provided that the
+      group has also set a first char. This can be helpful if the pattern that
+      follows the assertion doesn't set a different char. For example, it's
+      useful for /(?=abcde).+/. We can't set firstchar for an assertion, however
+      because it leads to incorrect effect for patterns such as /(?=a)a.+/ when
+      the "real" "a" would then become a reqchar instead of a firstchar. This is
+      overcome by a scan at the end if there's no firstchar, looking for an
+      asserted first char. */
+      else if (bravalue == OP_ASSERT && subreqcharflags >= 0 &&
+               subfirstcharflags >= 0)
         {
         reqchar = subreqchar;
         reqcharflags = subreqcharflags;
...
...
@@ -8686,8 +8717,8 @@ matching and for non-DOTALL patterns that start with .* (which must start at
the beginning or after \n). As in the case of is_anchored() (see above), we
have to take account of back references to capturing brackets that contain .*
because in that case we can't make the assumption. Also, the appearance of .*
-inside atomic brackets or in a pattern that contains *PRUNE or *SKIP does not
-count, because once again the assumption no longer holds.
+inside atomic brackets or in an assertion, or in a pattern that contains *PRUNE
+or *SKIP does not count, because once again the assumption no longer holds.
Arguments:
code points to start of expression (the bracket)
...
...
@@ -8696,13 +8727,14 @@ count, because once again the assumption no longer holds.
the less precise approach
cd points to the compile data
atomcount atomic group level
+  inassert       TRUE if in an assertion
 Returns:         TRUE or FALSE
 */
 static BOOL
 is_startline(const pcre_uchar *code, unsigned int bracket_map,
-  compile_data *cd, int atomcount)
+  compile_data *cd, int atomcount, BOOL inassert)
 {
 do {
    const pcre_uchar *scode = first_significant_code(
...
@@ -8729,7 +8761,7 @@ do {
     return FALSE;
     default:     /* Assertion */
-    if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+    if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE;
     do scode += GET(scode, 1); while (*scode == OP_ALT);
     scode += 1 + LINK_SIZE;
     break;
...
...
@@ -8743,7 +8775,7 @@ do {
   if (op == OP_BRA || op == OP_BRAPOS ||
       op == OP_SBRA || op == OP_SBRAPOS)
     {
-    if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+    if (!is_startline(scode, bracket_map, cd, atomcount, inassert)) return FALSE;
     }
   /* Capturing brackets */
...
...
@@ -8753,33 +8785,33 @@ do {
     {
     int n = GET2(scode, 1+LINK_SIZE);
     int new_map = bracket_map | ((n < 32)? (1 << n) : 1);
-    if (!is_startline(scode, new_map, cd, atomcount)) return FALSE;
+    if (!is_startline(scode, new_map, cd, atomcount, inassert)) return FALSE;
     }
   /* Positive forward assertions */
   else if (op == OP_ASSERT)
     {
-    if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+    if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE;
     }
   /* Atomic brackets */
   else if (op == OP_ONCE || op == OP_ONCE_NC)
     {
-    if (!is_startline(scode, bracket_map, cd, atomcount + 1)) return FALSE;
+    if (!is_startline(scode, bracket_map, cd, atomcount + 1, inassert)) return FALSE;
     }
   /* .* means "start at start or after \n" if it isn't in atomic brackets or
-  brackets that may be referenced, as long as the pattern does not contain
-  *PRUNE or *SKIP, because these break the feature. Consider, for example,
-  /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e. not at the
-  start of a line. */
+  brackets that may be referenced or an assertion, as long as the pattern does
+  not contain *PRUNE or *SKIP, because these break the feature. Consider, for
+  example, /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e.
+  not at the start of a line. */
   else if (op == OP_TYPESTAR || op == OP_TYPEMINSTAR || op == OP_TYPEPOSSTAR)
     {
     if (scode[1] != OP_ANY || (bracket_map & cd->backref_map) != 0 ||
-        atomcount > 0 || cd->had_pruneorskip)
+        atomcount > 0 || cd->had_pruneorskip || inassert)
       return FALSE;
     }
...
...
@@ -9634,7 +9666,7 @@ if ((re->options & PCRE_ANCHORED) == 0)
       re->flags |= PCRE_FIRSTSET;
       }
-    else if (is_startline(codestart, 0, cd, 0)) re->flags |= PCRE_STARTLINE;
+    else if (is_startline(codestart, 0, cd, 0, FALSE)) re->flags |= PCRE_STARTLINE;
     }
   }
...
...
pcre/pcre_jit_compile.c
...
...
@@ -4004,12 +4004,12 @@ sljit_emit_op_custom(compiler, instruction, 4);
 if (load_twice)
   {
-  OP1(SLJIT_MOV, TMP3, 0, TMP2, 0);
+  OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP2, 0);
   instruction[3] = 0xc0 | (tmp2_ind << 3) | 1;
   sljit_emit_op_custom(compiler, instruction, 4);
   OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0);
-  OP1(SLJIT_MOV, TMP2, 0, TMP3, 0);
+  OP1(SLJIT_MOV, TMP2, 0, RETURN_ADDR, 0);
   }
 OP2(SLJIT_ASHR, TMP1, 0, TMP1, 0, TMP2, 0);
...
...
pcre/pcre_jit_test.c
...
...
@@ -687,6 +687,7 @@ static struct regression_test_case regression_test_cases[] = {
 	{ PCRE_FIRSTLINE | PCRE_NEWLINE_LF | PCRE_DOTALL, 0 | F_NOMATCH, "ab.", "ab" },
 	{ MUA | PCRE_FIRSTLINE, 1 | F_NOMATCH, "^[a-d0-9]", "\nxx\nd" },
 	{ PCRE_NEWLINE_ANY | PCRE_FIRSTLINE | PCRE_DOTALL, 0, "....a", "012\n0a" },
+	{ MUA | PCRE_FIRSTLINE, 0, "[aC]", "a" },
 	/* Recurse. */
 	{ MUA, 0, "(a)(?1)", "aa" },
...
...
pcre/pcregrep.c
...
...
@@ -1803,6 +1803,12 @@ while (ptr < endptr)
       match = FALSE;
       if (line_buffered) fflush(stdout);
       rc = 0;                      /* Had some success */
+      /* If the current match ended past the end of the line (only possible
+      in multiline mode), we are done with this line. */
+      if ((unsigned int)offsets[1] > linelength) goto END_ONE_MATCH;
       startoffset = offsets[1];    /* Restart after the match */
       if (startoffset <= oldstartoffset)
         {
...
...
pcre/pcretest.c
...
...
@@ -1982,6 +1982,7 @@ return(result);
 static int pchar(pcre_uint32 c, FILE *f)
 {
 int n = 0;
+char tempbuffer[16];
 if (PRINTOK(c))
   {
   if (f != NULL) fprintf(f, "%c", c);
...
...
@@ -2003,6 +2004,8 @@ if (c < 0x100)
   }
 if (f != NULL) n = fprintf(f, "\\x{%02x}", c);
+  else n = sprintf(tempbuffer, "\\x{%02x}", c);
 return n >= 0 ? n : 0;
 }
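Reviewer note: with this hunk `pchar()` still reports the output width when `f` is NULL, by formatting into a scratch buffer instead of returning 0. That appears to be the fix for the misplaced caret in ChangeLog item 7. A minimal sketch of the idea, assuming a hypothetical helper name `escaped_width` and using `snprintf` where the hunk uses `sprintf`:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not pcretest's pchar(): prints c as \x{hh...} to f,
   or, when f is NULL, only measures how many characters that escape would
   occupy by formatting it into a scratch buffer. */
static int escaped_width(unsigned int c, FILE *f)
{
  char scratch[16];   /* "\x{ffffffff}" needs 12 chars plus the NUL */
  int n;

  if (f != NULL) n = fprintf(f, "\\x{%02x}", c);
  else n = snprintf(scratch, sizeof scratch, "\\x{%02x}", c);

  return n >= 0 ? n : 0;   /* treat an output error as width 0 */
}
```

A caller that only needs the column position can pass NULL and add the returned width to its running offset.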
...
...
@@ -5042,7 +5045,7 @@ while (!done)
     if ((all_use_dfa || use_dfa) && find_match_limit)
       {
-      printf("**Match limit not relevant for DFA matching: ignored\n");
+      printf("** Match limit not relevant for DFA matching: ignored\n");
       find_match_limit = 0;
       }
...
...
@@ -5255,10 +5258,17 @@ while (!done)
         if (do_allcaps)
           {
-          if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) < 0)
-            goto SKIP_DATA;
-          count++;   /* Allow for full match */
-          if (count * 2 > use_size_offsets) count = use_size_offsets/2;
+          if (all_use_dfa || use_dfa)
+            {
+            fprintf(outfile, "** Show all captures ignored after DFA matching\n");
+            }
+          else
+            {
+            if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) < 0)
+              goto SKIP_DATA;
+            count++;   /* Allow for full match */
+            if (count * 2 > use_size_offsets) count = use_size_offsets/2;
+            }
           }
/* Output the captured substrings. Note that, for the matched string,
...
...
pcre/testdata/testinput1
...
...
@@ -5733,4 +5733,10 @@ AbcdCBefgBhiBqz
"(?|(\k'Pm')|(?'Pm'))"
abcd
/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/
\ Fred:099
/(?=.*X)X$/
\ X
/-- End of testinput1 --/
pcre/testdata/testinput16
...
...
@@ -38,4 +38,30 @@
/s+/i8SI
SSss\x{17f}
/[\W\p{Any}]/BZ
abc
123
/[\W\pL]/BZ
abc
** Failers
123
/[\D]/8
\x{1d7cf}
/[\D\P{Nd}]/8
\x{1d7cf}
/[^\D]/8
a9b
** Failers
\x{1d7cf}
/[^\D\P{Nd}]/8
a9b
\x{1d7cf}
** Failers
\x{10000}
/-- End of testinput16 --/
pcre/testdata/testinput19
...
...
@@ -25,4 +25,21 @@
/s+/i8SI
SSss\x{17f}
/[\D]/8
\x{1d7cf}
/[\D\P{Nd}]/8
\x{1d7cf}
/[^\D]/8
a9b
** Failers
\x{1d7cf}
/[^\D\P{Nd}]/8
a9b
\x{1d7cf}
** Failers
\x{10000}
/-- End of testinput19 --/
pcre/testdata/testinput2
...
...
@@ -4243,4 +4243,10 @@ backtracking verbs. --/
/\N(?(?C)0?!.)*/
/(?<RA>abc)(?(R)xyz)/BZ
/(?<R>abc)(?(R)xyz)/BZ
/(?=.*[A-Z])/I
/-- End of testinput2 --/
pcre/testdata/testinput6
...
...
@@ -1562,4 +1562,10 @@
\x{389}
\x{20ac}
/(?=.*b)\pL/
11bb
/(?(?=.*b)(?=.*b)\pL|.*c)/
11bb
/-- End of testinput6 --/
pcre/testdata/testinput7
...
...
@@ -838,15 +838,6 @@ of case for anything other than the ASCII letters. --/
/^s?c/mi8I
scat
/[\W\p{Any}]/BZ
abc
123
/[\W\pL]/BZ
abc
** Failers
123
/a[[:punct:]b]/WBZ
/a[[:punct:]b]/8WBZ
...
...
pcre/testdata/testinput8
...
...
@@ -4841,4 +4841,8 @@
bbb
aaa
/()()a+/O=
aaa\D
a\D
/-- End of testinput8 --/
pcre/testdata/testoutput1
...
...
@@ -9434,4 +9434,12 @@ No match
0:
1:
/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/
\ Fred:099
0:
/(?=.*X)X$/
\ X
0: X
/-- End of testinput1 --/
pcre/testdata/testoutput16
...
...
@@ -138,4 +138,56 @@ Starting chars: S s \xc5
SSss\x{17f}
0: SSss\x{17f}
/[\W\p{Any}]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}]
Ket
End
------------------------------------------------------------------
abc
0: a
** Failers
0: *
123
No match
/[\D]/8
\x{1d7cf}
0: \x{1d7cf}
/[\D\P{Nd}]/8
\x{1d7cf}
0: \x{1d7cf}
/[^\D]/8
a9b
0: 9
** Failers
No match
\x{1d7cf}
No match
/[^\D\P{Nd}]/8
a9b
0: 9
\x{1d7cf}
0: \x{1d7cf}
** Failers
No match
\x{10000}
No match
/-- End of testinput16 --/
pcre/testdata/testoutput19
...
...
@@ -105,4 +105,30 @@ Starting chars: S s \xff
SSss\x{17f}
0: SSss\x{17f}
/[\D]/8
\x{1d7cf}
0: \x{1d7cf}
/[\D\P{Nd}]/8
\x{1d7cf}
0: \x{1d7cf}
/[^\D]/8
a9b
0: 9
** Failers
No match
\x{1d7cf}
No match
/[^\D\P{Nd}]/8
a9b
0: 9
\x{1d7cf}
0: \x{1d7cf}
** Failers
No match
\x{10000}
No match
/-- End of testinput19 --/
pcre/testdata/testoutput2
...
...
@@ -9380,7 +9380,7 @@ No need char
/(?(?=.*b).*b|^d)/I
Capturing subpattern count = 0
No options
-First char at start or follows newline
+No first char
No need char
/xyz/C
...
...
@@ -14670,4 +14670,39 @@ No match
/\N(?(?C)0?!.)*/
Failed: assertion expected after (?( or (?(?C) at offset 4
/(?<RA>abc)(?(R)xyz)/BZ
------------------------------------------------------------------
Bra
CBra 1
abc
Ket
Cond
Cond recurse any
xyz
Ket
Ket
End
------------------------------------------------------------------
/(?<R>abc)(?(R)xyz)/BZ
------------------------------------------------------------------
Bra
CBra 1
abc
Ket
Cond
1 Cond ref
xyz
Ket
Ket
End
------------------------------------------------------------------
/(?=.*[A-Z])/I
Capturing subpattern count = 0
May match empty string
No options
No first char
No need char
/-- End of testinput2 --/
pcre/testdata/testoutput6
...
...
@@ -2573,4 +2573,12 @@ No match
\x{20ac}
No match
/(?=.*b)\pL/
11bb
0: b
/(?(?=.*b)(?=.*b)\pL|.*c)/
11bb
0: b
/-- End of testinput6 --/
pcre/testdata/testoutput7
...
...
@@ -2295,32 +2295,6 @@ Need char = 'c' (caseless)
scat
0: sc
/[\W\p{Any}]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}]
Ket
End
------------------------------------------------------------------
abc
0: a
** Failers
0: *
123
No match
/a[[:punct:]b]/WBZ
------------------------------------------------------------------
Bra
...
...
pcre/testdata/testoutput8
...
...
@@ -7791,4 +7791,14 @@ Matched, but offsets vector is too small to show all matches
aaa
No match
/()()a+/O=
aaa\D
** Show all captures ignored after DFA matching
0: aaa
1: aa
2: a
a\D
** Show all captures ignored after DFA matching
0: a
/-- End of testinput8 --/