Commit 52eb4f17 in nexedi/MariaDB
Authored Apr 26, 2019 by Sergei Golubchik
Merge branch 'merge-pcre' into 10.1
Parents: 1389c94b, 879f7e85
Showing 16 changed files with 227 additions and 27 deletions.
pcre/AUTHORS                   +3  -3
pcre/ChangeLog                +43  -0
pcre/LICENCE                   +5  -5
pcre/NEWS                     +10  -0
pcre/configure.ac              +5  -5
pcre/pcre_compile.c           +14  -4
pcre/pcre_jit_compile.c        +1  -1
pcre/pcrecpp.cc               +62  -2
pcre/pcrecpp_unittest.cc      +29  -5
pcre/pcregrep.c                +2  -2
pcre/testdata/testinput1      +15  -0
pcre/testdata/testinput2       +3  -0
pcre/testdata/testinput4       +3  -0
pcre/testdata/testoutput1     +24  -0
pcre/testdata/testoutput2      +4  -0
pcre/testdata/testoutput4      +4  -0
pcre/AUTHORS
@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2019 University of Cambridge
 All rights reserved
@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2019 Zoltan Herczeg
 All rights reserved.
@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2019 Zoltan Herczeg
 All rights reserved.
pcre/ChangeLog
@@ -5,6 +5,49 @@ Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.
+Version 8.43 23-February-2019
+-----------------------------
+1. Some time ago the config macro SUPPORT_UTF8 was changed to SUPPORT_UTF
+because it also applies to UTF-16 and UTF-32. However, this change was not made
+in the pcre2cpp files; consequently the C++ wrapper has from then been compiled
+with a bug in it, which would have been picked up by the unit test except that
+it also had its UTF8 code cut out. The bug was in a global replace when moving
+forward after matching an empty string.
+2. The C++ wrapper got broken a long time ago (version 7.3, August 2007) when
+(*CR) was invented (assuming it was the first such start-of-pattern option).
+The wrapper could never handle such patterns because it wraps patterns in
+(?:...)\z in order to support end anchoring. I have hacked in some code to fix
+this, that is, move the wrapping till after any existing start-of-pattern
+special settings.
+3. "pcre2grep" (sic) was accidentally mentioned in an error message (fix was
+ported from PCRE2).
+4. Typo LCC_ALL for LC_ALL fixed in pcregrep.
+5. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
+negative class with no characters less than 0x100 followed by a positive class
+with only characters less than 0x100, the first class was incorrectly being
+auto-possessified, causing incorrect match failures.
+6. If the only branch in a conditional subpattern was anchored, the whole
+subpattern was treated as anchored, when it should not have been, since the
+assumed empty second branch cannot be anchored. Demonstrated by test patterns
+such as /(?(1)^())b/ or /(?(?=^))b/.
+7. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
+a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
+8. If a pattern started with a subroutine call that had a quantifier with a
+minimum of zero, an incorrect "match must start with this character" could be
+recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
+be the first character of a match.
+9. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.
 Version 8.42 20-March-2018
 --------------------------
pcre/LICENCE
@@ -25,7 +25,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
-Copyright (c) 1997-2018 University of Cambridge
+Copyright (c) 1997-2019 University of Cambridge
 All rights reserved.
@@ -34,9 +34,9 @@ PCRE JUST-IN-TIME COMPILATION SUPPORT
 Written by: Zoltan Herczeg
 Email local part: hzmester
-Emain domain: freemail.hu
+Email domain: freemail.hu
-Copyright(c) 2010-2018 Zoltan Herczeg
+Copyright(c) 2010-2019 Zoltan Herczeg
 All rights reserved.
@@ -45,9 +45,9 @@ STACK-LESS JUST-IN-TIME COMPILER
 Written by: Zoltan Herczeg
 Email local part: hzmester
-Emain domain: freemail.hu
+Email domain: freemail.hu
-Copyright(c) 2009-2018 Zoltan Herczeg
+Copyright(c) 2009-2019 Zoltan Herczeg
 All rights reserved.
pcre/NEWS
News about PCRE releases
------------------------
Note that this library (now called PCRE1) is now being maintained for bug fixes
only. New projects are advised to use the new PCRE2 libraries.
Release 8.43 23-February-2019
-----------------------------
This is a bug-fix release.
Release 8.42 20-March-2018
--------------------------
pcre/configure.ac
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.
 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [42])
+m4_define(pcre_minor, [43])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2018-03-20])
+m4_define(pcre_date, [2019-02-23])
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.
 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:10:2])
-m4_define(libpcre16_version, [2:10:2])
-m4_define(libpcre32_version, [0:10:0])
+m4_define(libpcre_version, [3:11:2])
+m4_define(libpcre16_version, [2:11:2])
+m4_define(libpcre32_version, [0:11:0])
 m4_define(libpcreposix_version, [0:6:0])
 m4_define(libpcrecpp_version, [0:1:0])
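The libtool triples above use the current:revision:age scheme, and this commit bumps only the revision. As a sketch, assuming the usual GNU/Linux naming rule (a library built with c:r:a is installed as lib<name>.so.(c-a).a.r), the hypothetical helper below shows that the major soname number c-a does not change, so programs linked against the 8.42 libraries should keep resolving the same soname.

```cpp
#include <string>

// Hypothetical helper, not part of PCRE or libtool: models the file-name
// suffix libtool typically produces on GNU/Linux for current:revision:age.
struct LibtoolVersion {
  int current;
  int revision;
  int age;
};

// Suffix of the installed file: .so.(current - age).age.revision
std::string linux_so_suffix(const LibtoolVersion& v) {
  return ".so." + std::to_string(v.current - v.age) + "." +
         std::to_string(v.age) + "." + std::to_string(v.revision);
}

// Major soname number, the part the dynamic linker records in binaries.
int soname_major(const LibtoolVersion& v) { return v.current - v.age; }
```

Under that rule, libpcre 3:11:2 maps to .so.1.2.11, libpcre16 2:11:2 to .so.0.2.11 and libpcre32 0:11:0 to .so.0.0.11.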
pcre/pcre_compile.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
-Copyright (c) 1997-2016 University of Cambridge
+Copyright (c) 1997-2018 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -3300,7 +3300,7 @@ for(;;)
     if ((*xclass_flags & XCL_MAP) == 0)
       {
       /* No bits are set for characters < 256. */
-      if (list[1] == 0) return TRUE;
+      if (list[1] == 0) return (*xclass_flags & XCL_NOT) == 0;
       /* Might be an empty repeat. */
       continue;
       }
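The patched line changes what is returned when an extended class has no bits set for characters below 256: previously it always returned TRUE (treated as safe to auto-possessify), now only when XCL_NOT is clear. The sketch below models why, using a hypothetical LowClass type that tracks only code points below 256; it is not PCRE's internal representation.

```cpp
#include <bitset>

// Hypothetical model of a class restricted to code points < 256:
// `low` holds the explicitly listed characters < 256 and `negated`
// records a leading '^'.
struct LowClass {
  std::bitset<256> low;
  bool negated;
};

// Characters < 256 the class actually matches. A negated class that lists
// nothing below 256 matches every one of them.
std::bitset<256> effective_low(const LowClass& c) {
  return c.negated ? ~c.low : c.low;
}

// Turning `a*` into `a*+` before `b` is only safe if the two classes cannot
// match a common character; this sketch checks the < 256 range only.
bool low_ranges_disjoint(const LowClass& a, const LowClass& b) {
  return (effective_low(a) & effective_low(b)).none();
}

// Loosely mirrors the corrected return value: a class with an empty
// low map is disjoint from the < 256 range only when it is not negated.
bool empty_map_is_disjoint(const LowClass& c) {
  return c.low.none() && !c.negated;
}
```

For /[^\x{100}-\x{ffff}]*[\x80-\xff]/ the first class is negated with an empty low map, so it overlaps [\x80-\xff] and must not be auto-possessified.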
@@ -7645,6 +7645,8 @@ for (;; ptr++)
         /* Can't determine a first byte now */
         if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
+        zerofirstchar = firstchar;
+        zerofirstcharflags = firstcharflags;
         continue;
@@ -8685,10 +8687,18 @@ do {
     if (!is_anchored(scode, new_map, cd, atomcount)) return FALSE;
     }
-  /* Positive forward assertions and conditions */
+  /* Positive forward assertion */
-  else if (op == OP_ASSERT || op == OP_COND)
+  else if (op == OP_ASSERT)
     {
     if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
     }
+  /* Condition; not anchored if no second branch */
+  else if (op == OP_COND)
+    {
+    if (scode[GET(scode,1)] != OP_ALT) return FALSE;
+    if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE;
+    }
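The new OP_COND branch refuses to treat a conditional as anchored unless it has a second alternative (the test against OP_ALT). The toy model below illustrates that rule on a hypothetical Node tree rather than PCRE's compiled code; the condition itself, such as the (1) in /(?(1)^())b/, is not modelled.

```cpp
#include <vector>

// Hypothetical toy model, much simpler than PCRE's compiled byte code.
enum class Kind { Circumflex, Literal, Group, PositiveAssert, Conditional };

struct Node {
  Kind kind;
  // Alternatives of a Group, PositiveAssert or Conditional; each one is a
  // sequence of nodes. Empty for Circumflex and Literal.
  std::vector<std::vector<Node>> branches;
};

bool node_anchored(const Node& n);

// A sequence is anchored if its first item is anchored.
bool seq_anchored(const std::vector<Node>& seq) {
  return !seq.empty() && node_anchored(seq.front());
}

// A bracket is anchored only if every alternative is anchored.
bool all_branches_anchored(const Node& n) {
  if (n.branches.empty()) return false;
  for (const auto& b : n.branches)
    if (!seq_anchored(b)) return false;
  return true;
}

bool node_anchored(const Node& n) {
  switch (n.kind) {
    case Kind::Circumflex:
      return true;
    case Kind::Literal:
      return false;
    case Kind::Group:
    case Kind::PositiveAssert:
      return all_branches_anchored(n);
    case Kind::Conditional:
      // One written branch implies an empty second branch, which can match
      // anywhere, so the conditional cannot be anchored.
      return n.branches.size() >= 2 && all_branches_anchored(n);
  }
  return false;
}
```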
pcre/pcre_jit_compile.c
@@ -9002,7 +9002,7 @@ if (exact > 1)
 #ifdef SUPPORT_UTF
       && !common->utf
 #endif
-      )
+      && type != OP_ANYNL && type != OP_EXTUNI)
     {
     OP2(SLJIT_ADD, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(exact));
     add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_GREATER, TMP1, 0, STR_END, 0));
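The patched condition keeps the early "STR_PTR + exact > STR_END" test away from OP_ANYNL (\R) and OP_EXTUNI (\X). A check like that reserves one code unit per repetition, but \R can match the two-unit sequence CR LF and \X can also span several units, so the reservation undercounts; this is consistent with the overread in ChangeLog item 7. The sketch below is a hypothetical model using a subset of \R (CRLF, CR, LF), not the JIT's code.

```cpp
#include <cstddef>
#include <string>

// Subset of \R: returns the code units matched at s[i] (2 for CRLF, 1 for
// a lone CR or LF) or 0 if there is no newline. Bounds-checked.
std::size_t newline_len(const std::string& s, std::size_t i) {
  if (i >= s.size()) return 0;
  if (s[i] == '\r') return (i + 1 < s.size() && s[i + 1] == '\n') ? 2 : 1;
  return s[i] == '\n' ? 1 : 0;
}

// Matches "\R{exact}" at `start`, checking the subject end on every
// iteration. Returns code units consumed, or -1 if there is no match.
long match_newlines_exact(const std::string& s, std::size_t start, int exact) {
  std::size_t pos = start;
  for (int k = 0; k < exact; ++k) {
    std::size_t len = newline_len(s, pos);
    if (len == 0) return -1;
    pos += len;
  }
  return static_cast<long>(pos - start);
}

// Up-front test reserving exactly one code unit per repetition. It can only
// replace per-iteration checks when every repetition is one unit wide.
bool upfront_check(const std::string& s, std::size_t start, int exact) {
  return start + static_cast<std::size_t>(exact) <= s.size();
}
```

On "\r\n\r\n" the up-front check reserves 2 units for \R{2}, yet the match consumes 4.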
pcre/pcrecpp.cc
@@ -80,6 +80,24 @@ static const string empty_string;
 // If the user doesn't ask for any options, we just use this one
 static RE_Options default_options;
+// Specials for the start of patterns. See comments where start_options is used
+// below. (PH June 2018)
+static const char *start_options[] = {
+  "(*UTF8)",
+  "(*UTF)",
+  "(*UCP)",
+  "(*NO_START_OPT)",
+  "(*NO_AUTO_POSSESS)",
+  "(*LIMIT_RECURSION=",
+  "(*LIMIT_MATCH=",
+  "(*CRLF)",
+  "(*CR)",
+  "(*BSR_UNICODE)",
+  "(*BSR_ANYCRLF)",
+  "(*ANYCRLF)",
+  "(*ANY)",
+  "" };
 void RE::Init(const string& pat, const RE_Options* options) {
   pattern_ = pat;
   if (options == NULL) {
@@ -135,7 +153,49 @@ pcre* RE::Compile(Anchor anchor) {
   } else {
     // Tack a '\z' at the end of RE. Parenthesize it first so that
     // the '\z' applies to all top-level alternatives in the regexp.
-    string wrapped = "(?:";  // A non-counting grouping operator
+    /* When this code was written (for PCRE 6.0) it was enough just to
+    parenthesize the entire pattern. Unfortunately, when the feature of
+    starting patterns with (*UTF8) or (*CR) etc. was added to PCRE patterns,
+    this code was never updated. This bug was not noticed till 2018, long after
+    PCRE became obsolescent and its maintainer no longer around. Since PCRE is
+    frozen, I have added a hack to check for all the existing "start of
+    pattern" specials - knowing that no new ones will ever be added. I am not a
+    C++ programmer, so the code style is no doubt crude. It is also
+    inefficient, but is only run when the pattern starts with "(*".
+    PH June 2018. */
+    string wrapped = "";
+    if (pattern_.c_str()[0] == '(' && pattern_.c_str()[1] == '*') {
+      int kk, klen, kmat;
+      for (;;) {   // Loop for any number of leading items
+        for (kk = 0; start_options[kk][0] != 0; kk++) {
+          klen = strlen(start_options[kk]);
+          kmat = strncmp(pattern_.c_str(), start_options[kk], klen);
+          if (kmat >= 0) break;
+        }
+        if (kmat != 0) break;  // Not found
+        // If the item ended in "=" we must copy digits up to ")".
+        if (start_options[kk][klen-1] == '=') {
+          while (isdigit(pattern_.c_str()[klen])) klen++;
+          if (pattern_.c_str()[klen] != ')') break;  // Syntax error
+          klen++;
+        }
+        // Move the item from the pattern to the start of the wrapped string.
+        wrapped += pattern_.substr(0, klen);
+        pattern_.erase(0, klen);
+      }
+    }
+    // Wrap the rest of the pattern.
+    wrapped += "(?:";  // A non-counting grouping operator
     wrapped += pattern_;
     wrapped += ")\\z";
     re = pcre_compile(wrapped.c_str(), pcre_options,
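The idea behind the hunk above is that leading "(*...)" items are only recognized at the very start of a pattern, so they have to be copied out before the rest is wrapped in (?:...)\z. The sketch below is a simplified re-implementation, not the pcrecpp code: the function wrap_for_end_anchor is hypothetical, it checks each listed item for an exact prefix match, and it lists only a subset of the specials.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Subset of the start-of-pattern items, for illustration only.
static const char* const kStartItems[] = {
    "(*UTF8)", "(*UTF)", "(*UCP)", "(*NO_START_OPT)", "(*LIMIT_MATCH=",
    "(*CRLF)", "(*CR)", "(*ANY)"};

// Hoists leading start items, then end-anchors the remaining pattern.
std::string wrap_for_end_anchor(std::string pattern) {
  std::string wrapped;
  bool found = true;
  while (found) {
    found = false;
    for (const char* item : kStartItems) {
      std::string opt(item);
      if (pattern.compare(0, opt.size(), opt) != 0) continue;
      std::size_t len = opt.size();
      if (opt.back() == '=') {
        // Items ending in '=' take a number and a closing parenthesis.
        while (len < pattern.size() &&
               std::isdigit(static_cast<unsigned char>(pattern[len])))
          ++len;
        if (len >= pattern.size() || pattern[len] != ')') break;  // malformed
        ++len;
      }
      wrapped += pattern.substr(0, len);  // move the item to the front
      pattern.erase(0, len);
      found = true;
      break;
    }
  }
  return wrapped + "(?:" + pattern + ")\\z";
}
```

For example, "(*UCP)(*LIMIT_MATCH=1000)(*UTF)..." becomes "(*UCP)(*LIMIT_MATCH=1000)(*UTF)(?:...)\z", which matches the shape of the new re_testZ4 unit test.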
@@ -415,7 +475,7 @@ int RE::GlobalReplace(const StringPiece& rewrite,
         matchend++;
       }
       // We also need to advance more than one char if we're in utf8 mode.
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
       if (options_.utf8()) {
         while (matchend < static_cast<int>(str->length()) &&
                ((*str)[matchend] & 0xc0) == 0x80)
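With the guard spelled SUPPORT_UTF8, the loop above was compiled out, so after an empty match GlobalReplace could resume in the middle of a multi-byte character (ChangeLog item 1). A minimal sketch of the intended step, using a hypothetical advance_one_char function: skip one byte, then skip any UTF-8 continuation bytes, which have the form 10xxxxxx, i.e. (byte & 0xC0) == 0x80.

```cpp
#include <cstddef>
#include <string>

// Returns the position just past the character that starts at `pos`.
// In UTF-8 mode this is a whole code point; otherwise a single byte.
std::size_t advance_one_char(const std::string& s, std::size_t pos, bool utf8) {
  if (pos < s.size()) ++pos;  // step over the lead (or only) byte
  if (utf8) {
    // Skip continuation bytes; cast to avoid sign issues with plain char.
    while (pos < s.size() &&
           (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
      ++pos;
  }
  return pos;
}
```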
pcre/pcrecpp_unittest.cc
@@ -309,7 +309,7 @@ static void TestReplace() {
     "@aa",
     "@@@",
     3 },
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
     { "b*",
       "bb",
       "\xE3\x83\x9B\xE3\x83\xBC\xE3\x83\xA0\xE3\x81\xB8",  // utf8
@@ -327,7 +327,7 @@ static void TestReplace() {
     { "", NULL, NULL, NULL, NULL, 0 }
   };
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   const bool support_utf8 = true;
 #else
   const bool support_utf8 = false;
@@ -535,7 +535,7 @@ static void TestQuoteMetaLatin1() {
 }
 static void TestQuoteMetaUtf8() {
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   TestQuoteMeta("Pl\xc3\xa1\x63ido Domingo", pcrecpp::UTF8());
   TestQuoteMeta("xyz", pcrecpp::UTF8());  // No fancy utf8
   TestQuoteMeta("\xc2\xb0", pcrecpp::UTF8());  // 2-byte utf8 (degree symbol)
@@ -1178,7 +1178,7 @@ int main(int argc, char** argv) {
     CHECK(re.error().empty());  // Must have no error
   }
-#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UTF
   // Check UTF-8 handling
   {
     printf("Testing UTF-8 handling\n");
@@ -1203,6 +1203,30 @@ int main(int argc, char** argv) {
     RE re_test2("...", pcrecpp::UTF8());
     CHECK(re_test2.FullMatch(utf8_string));
+    // PH added these tests for leading option settings
+    RE re_testZ0("(*CR)(*NO_START_OPT).........");
+    CHECK(re_testZ0.FullMatch(utf8_string));
+#ifdef SUPPORT_UTF
+    RE re_testZ1("(*UTF8)...");
+    CHECK(re_testZ1.FullMatch(utf8_string));
+    RE re_testZ2("(*UTF)...");
+    CHECK(re_testZ2.FullMatch(utf8_string));
+#ifdef SUPPORT_UCP
+    RE re_testZ3("(*UCP)(*UTF)...");
+    CHECK(re_testZ3.FullMatch(utf8_string));
+    RE re_testZ4("(*UCP)(*LIMIT_MATCH=1000)(*UTF)...");
+    CHECK(re_testZ4.FullMatch(utf8_string));
+    RE re_testZ5("(*UCP)(*LIMIT_MATCH=1000)(*ANY)(*UTF)...");
+    CHECK(re_testZ5.FullMatch(utf8_string));
+#endif
+#endif
     // Check that '.' matches one byte or UTF-8 character
     // according to the mode.
     string ss;
@@ -1248,7 +1272,7 @@ int main(int argc, char** argv) {
     CHECK(!match_sentence.FullMatch(target));
     CHECK(!match_sentence_re.FullMatch(target));
   }
-#endif  /* def SUPPORT_UTF8 */
+#endif  /* def SUPPORT_UTF */
   printf("Testing error reporting\n");
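ChangeLog item 1 notes that the unit test missed the GlobalReplace bug because its own UTF-8 code was cut out too. The toy translation unit below is a hypothetical illustration of that mechanism: once the build defines SUPPORT_UTF instead of SUPPORT_UTF8, any block still guarded by the old name is silently dropped by the preprocessor rather than failing to compile.

```cpp
// Stand-in for what the build defines after the macro rename.
#define SUPPORT_UTF 1

#ifdef SUPPORT_UTF8
constexpr bool old_guard_compiled = true;
#else
constexpr bool old_guard_compiled = false;  // taken: SUPPORT_UTF8 is undefined
#endif

#ifdef SUPPORT_UTF
constexpr bool new_guard_compiled = true;   // taken after the rename
#else
constexpr bool new_guard_compiled = false;
#endif
```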
pcre/pcregrep.c
@@ -2252,7 +2252,7 @@ if (isdirectory(pathname))
     int fnlength = strlen(pathname) + strlen(nextfile) + 2;
     if (fnlength > 2048)
       {
-      fprintf(stderr, "pcre2grep: recursive filename is too long\n");
+      fprintf(stderr, "pcregrep: recursive filename is too long\n");
       rc = 2;
       break;
       }
@@ -3034,7 +3034,7 @@ LC_ALL environment variable is set, and if so, use it. */
 if (locale == NULL)
   {
   locale = getenv("LC_ALL");
-  locale_from = "LCC_ALL";
+  locale_from = "LC_ALL";
   }
 if (locale == NULL)
pcre/testdata/testinput1
@@ -5742,4 +5742,19 @@ AbcdCBefgBhiBqz
 /X+(?#comment)?/
     >XXX<
+/ (?<word> \w+ )* \. /xi
+    pokus.
+/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
+    pokus.
+/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
+    pokus.
+/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
+    pokus.
+/(?&word)* \. (?<word> \w+ )/xi
+    pokus.hokus
 /-- End of testinput1 --/
pcre/testdata/testinput2
@@ -4257,4 +4257,7 @@ backtracking verbs. --/
     ab
     aaab
+/(?(?=^))b/
+    abc
 /-- End of testinput2 --/
pcre/testdata/testinput4
@@ -727,4 +727,7 @@
 /\C(\W?ſ)'?{{/8
     \\C(\\W?ſ)'?{{
+/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
+    \x{99}\x{99}\x{99}
 /-- End of testinput4 --/
pcre/testdata/testoutput1
@@ -9446,4 +9446,28 @@ No match
     >XXX<
  0: X
+/ (?<word> \w+ )* \. /xi
+    pokus.
+ 0: pokus.
+ 1: pokus
+/(?(DEFINE) (?<word> \w+ ) ) (?&word)* \./xi
+    pokus.
+ 0: pokus.
+/(?(DEFINE) (?<word> \w+ ) ) ( (?&word)* ) \./xi
+    pokus.
+ 0: pokus.
+ 1: <unset>
+ 2: pokus
+/(?&word)* (?(DEFINE) (?<word> \w+ ) ) \./xi
+    pokus.
+ 0: pokus.
+/(?&word)* \. (?<word> \w+ )/xi
+    pokus.hokus
+ 0: pokus.hokus
+ 1: hokus
 /-- End of testinput1 --/
pcre/testdata/testoutput2
@@ -14721,4 +14721,8 @@ No need char
  0: ab
  1: a
+/(?(?=^))b/
+    abc
+ 0: b
 /-- End of testinput2 --/
pcre/testdata/testoutput4
@@ -1277,4 +1277,8 @@ No match
     \\C(\\W?ſ)'?{{
 No match
+/[^\x{100}-\x{ffff}]*[\x80-\xff]/8
+    \x{99}\x{99}\x{99}
+ 0: \x{99}\x{99}\x{99}
 /-- End of testinput4 --/