Merge branch 'merge-pcre' into 10.0

1cac281e · Vicențiu Ciorbaru · 895b2539 · dfd77491 · 1cac281e · 1cac281e
Commit 1cac281e authored Mar 05, 2017 by Vicențiu Ciorbaru
30 changed files
--- a/pcre/AUTHORS
+++ b/pcre/AUTHORS
@@ -8,7 +8,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2016 University of Cambridge
+Copyright (c) 1997-2017 University of Cambridge
 All rights reserved


@@ -19,7 +19,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2010-2016 Zoltan Herczeg
+Copyright(c) 2010-2017 Zoltan Herczeg
 All rights reserved.


@@ -30,7 +30,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2009-2016 Zoltan Herczeg
+Copyright(c) 2009-2017 Zoltan Herczeg
 All rights reserved.



--- a/pcre/CMakeLists.txt
+++ b/pcre/CMakeLists.txt
@@ -66,6 +66,7 @@
 # 2013-10-08 PH got rid of the "source" command, which is a bash-ism (use ".")
 # 2013-11-05 PH added support for PARENS_NEST_LIMIT
 # 2016-03-01 PH applied Chris Wilson's patch for MSVC static build
+# 2016-06-24 PH applied Chris Wilson's revised patch (adds a separate option)

 PROJECT(PCRE C CXX)


--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@@ -4,6 +4,53 @@ ChangeLog for PCRE
 Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.

+Version 8.40 11-January-2017
+----------------------------
+
+1.  Using -o with -M in pcregrep could cause unnecessary repeated output when
+    the match extended over a line boundary.
+
+2.  Applied Chris Wilson's second patch (Bugzilla #1681) to CMakeLists.txt for
+    MSVC static compilation, putting the first patch under a new option.
+
+3.  Fix register overwrite in JIT when SSE2 acceleration is enabled.
+
+4.  Ignore "show all captures" (/=) for DFA matching.
+
+5.  Fix JIT unaligned accesses on x86. Patch by Marc Mutz.
+
+6.  In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode),
+    without PCRE_UCP set, a negative character type such as \D in a positive
+    class should cause all characters greater than 255 to match, whatever else
+    is in the class. There was a bug that caused this not to happen if a
+    Unicode property item was added to such a class, for example [\D\P{Nd}] or
+    [\W\pL].
+
+7.  When pcretest was outputting information from a callout, the caret indicator
+    for the current position in the subject line was incorrect if it was after
+    an escape sequence for a character whose code point was greater than
+    \x{ff}.
+
+8.  A pattern such as (?<RA>abc)(?(R)xyz) was incorrectly compiled such that
+    the conditional was interpreted as a reference to capturing group 1 instead
+    of a test for recursion. Any group whose name began with R was
+    misinterpreted in this way. (The reference interpretation should only
+    happen if the group's name is precisely "R".)
+
+9.  A number of bugs have been mended relating to match start-up optimizations
+    when the first thing in a pattern is a positive lookahead. These all
+    applied only when PCRE_NO_START_OPTIMIZE was *not* set:
+
+    (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
+        both an initial 'X' and a following 'X'.
+    (b) Some patterns starting with an assertion that started with .* were
+        incorrectly optimized as having to match at the start of the subject or
+        after a newline. There are cases where this is not true, for example,
+        (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
+        start with spaces. Starting .* in an assertion is no longer taken as an
+        indication of matching at the start (or after a newline).
+
+
 Version 8.39 14-June-2016
 -------------------------


--- a/pcre/LICENCE
+++ b/pcre/LICENCE
@@ -25,7 +25,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2016 University of Cambridge
+Copyright (c) 1997-2017 University of Cambridge
 All rights reserved.


@@ -36,7 +36,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2010-2016 Zoltan Herczeg
+Copyright(c) 2010-2017 Zoltan Herczeg
 All rights reserved.


@@ -47,7 +47,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2009-2016 Zoltan Herczeg
+Copyright(c) 2009-2017 Zoltan Herczeg
 All rights reserved.



--- a/pcre/NEWS
+++ b/pcre/NEWS
 News about PCRE releases
 ------------------------

+Release 8.40 11-January-2017
+----------------------------
+
+This is a bug-fix release.
+
+
 Release 8.39 14-June-2016
 -------------------------


--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [39])
+m4_define(pcre_minor, [40])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2016-06-14])
+m4_define(pcre_date, [2017-01-11])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:7:2])
-m4_define(libpcre16_version, [2:7:2])
-m4_define(libpcre32_version, [0:7:0])
+m4_define(libpcre_version, [3:8:2])
+m4_define(libpcre16_version, [2:8:2])
+m4_define(libpcre32_version, [0:8:0])
 m4_define(libpcreposix_version, [0:4:0])
 m4_define(libpcrecpp_version, [0:1:0])


--- a/pcre/doc/html/pcrecompat.html
+++ b/pcre/doc/html/pcrecompat.html
@@ -128,7 +128,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
 14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact the PCRE
 works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b)B),
+between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B),
 where the two capturing parentheses have the same number but different names,
 is not supported, and causes an error at compile time. If it were allowed, it
 would not be possible to distinguish which parentheses matched, because both

--- a/pcre/doc/html/pcrepattern.html
+++ b/pcre/doc/html/pcrepattern.html
@@ -358,24 +358,24 @@ When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
 generate the appropriate EBCDIC code values. The \c escape is processed
 as specified for Perl in the <b>perlebcdic</b> document. The only characters
 that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; after \c the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
 </P>
 <P>
-Thus, apart from \?, these escapes generate the same character code values as
+Thus, apart from \c?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \G always generates code value 7, which is BEL in ASCII
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 </P>
 <P>
-The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE makes \? generate 95; otherwise it generates 255.
+values, PCRE makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
@@ -1512,13 +1512,8 @@ J, U and X respectively.
 <P>
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the <b>pcre_fullinfo()</b> function).
-</P>
-<P>
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 <pre>
  (a(?i)b)c
 </pre>
@@ -2160,6 +2155,14 @@ capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 </P>
 <P>
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation, and as PCRE1 is now in
+maintenance-only status, it is unlikely ever to change.
+</P>
+<P>
 For compatibility with Perl, assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. In practice, there only three
@@ -3264,9 +3267,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 June 2015
+Last updated: 23 October 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
--- a/pcre/doc/pcrecompat.3
+++ b/pcre/doc/pcrecompat.3
@@ -113,7 +113,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
 14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact the PCRE
 works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
+between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
 where the two capturing parentheses have the same number but different names,
 is not supported, and causes an error at compile time. If it were allowed, it
 would not be possible to distinguish which parentheses matched, because both

--- a/pcre/doc/pcrepattern.3
+++ b/pcre/doc/pcrepattern.3
-.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
+.TH PCREPATTERN 3 "23 October 2016" "PCRE 8.40"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -336,22 +336,22 @@ When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
 generate the appropriate EBCDIC code values. The \ec escape is processed
 as specified for Perl in the \fBperlebcdic\fP document. The only characters
 that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \e@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\e? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; after \ec the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
 .P
-Thus, apart from \e?, these escapes generate the same character code values as
+Thus, apart from \ec?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \eG always generates code value 7, which is BEL in ASCII
+differ. For example, \ecG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 .P
-The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE makes \e? generate 95; otherwise it generates 255.
+values, PCRE makes \ec? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
 digits, just those that are present are used. Thus the sequence \e0\ex\e015
@@ -1511,12 +1511,8 @@ J, U and X respectively.
 .P
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the \fBpcre_fullinfo()\fP function).
-.P
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 .sp
  (a(?i)b)c
 .sp
@@ -2171,6 +2167,13 @@ numbering the capturing subpatterns in the whole pattern. However, substring
 capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 .P
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation, and as PCRE1 is now in
+maintenance-only status, it is unlikely ever to change.
+.P
 For compatibility with Perl, assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. In practice, there only three
@@ -3296,6 +3299,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 14 June 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 23 October 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@@ -5579,6 +5579,34 @@ for (;; ptr++)
 #endif
 #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
      {
+      /* For non-UCP wide characters, in a non-negative class containing \S or
+      similar (should_flip_negation is set), all characters greater than 255
+      must be in the class. */
+
+      if (
+#if defined COMPILE_PCRE8
+           utf &&
+#endif
+           should_flip_negation && !negate_class && (options & PCRE_UCP) == 0)
+        {
+        *class_uchardata++ = XCL_RANGE;
+        if (utf)   /* Will always be utf in the 8-bit library */
+          {
+          class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
+          class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
+          }
+        else       /* Can only happen for the 16-bit & 32-bit libraries */
+          {
+#if defined COMPILE_PCRE16
+          *class_uchardata++ = 0x100;
+          *class_uchardata++ = 0xffffu;
+#elif defined COMPILE_PCRE32
+          *class_uchardata++ = 0x100;
+          *class_uchardata++ = 0xffffffffu;
+#endif
+          }
+        }
+
      *class_uchardata++ = XCL_END;    /* Marks the end of extra data */
      *code++ = OP_XCLASS;
      code += LINK_SIZE;
@@ -6923,7 +6951,8 @@ for (;; ptr++)
        slot = cd->name_table;
        for (i = 0; i < cd->names_found; i++)
          {
-          if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0) break;
+          if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0 &&
+            slot[IMM2_SIZE+namelen] == 0) break;
          slot += cd->name_entry_size;
          }

@@ -7889,15 +7918,17 @@ for (;; ptr++)
        }
      }

-    /* For a forward assertion, we take the reqchar, if set. This can be
-    helpful if the pattern that follows the assertion doesn't set a different
-    char. For example, it's useful for /(?=abcde).+/. We can't set firstchar
-    for an assertion, however because it leads to incorrect effect for patterns
-    such as /(?=a)a.+/ when the "real" "a" would then become a reqchar instead
-    of a firstchar. This is overcome by a scan at the end if there's no
-    firstchar, looking for an asserted first char. */
-
-    else if (bravalue == OP_ASSERT && subreqcharflags >= 0)
+    /* For a forward assertion, we take the reqchar, if set, provided that the
+    group has also set a first char. This can be helpful if the pattern that
+    follows the assertion doesn't set a different char. For example, it's
+    useful for /(?=abcde).+/. We can't set firstchar for an assertion, however
+    because it leads to incorrect effect for patterns such as /(?=a)a.+/ when
+    the "real" "a" would then become a reqchar instead of a firstchar. This is
+    overcome by a scan at the end if there's no firstchar, looking for an
+    asserted first char. */
+
+    else if (bravalue == OP_ASSERT && subreqcharflags >= 0 &&
+             subfirstcharflags >= 0)
      {
      reqchar = subreqchar;
      reqcharflags = subreqcharflags;
@@ -8686,8 +8717,8 @@ matching and for non-DOTALL patterns that start with .* (which must start at
 the beginning or after \n). As in the case of is_anchored() (see above), we
 have to take account of back references to capturing brackets that contain .*
 because in that case we can't make the assumption. Also, the appearance of .*
-inside atomic brackets or in a pattern that contains *PRUNE or *SKIP does not
-count, because once again the assumption no longer holds.
+inside atomic brackets or in an assertion, or in a pattern that contains *PRUNE
+or *SKIP does not count, because once again the assumption no longer holds.

 Arguments:
  code           points to start of expression (the bracket)
@@ -8696,13 +8727,14 @@ count, because once again the assumption no longer holds.
                  the less precise approach
  cd             points to the compile data
  atomcount      atomic group level
+  inassert       TRUE if in an assertion

 Returns:         TRUE or FALSE
 */

 static BOOL
 is_startline(const pcre_uchar *code, unsigned int bracket_map,
-  compile_data *cd, int atomcount)
+  compile_data *cd, int atomcount, BOOL inassert)
 {
 do {
   const pcre_uchar *scode = first_significant_code(
@@ -8729,7 +8761,7 @@ do {
       return FALSE;

       default:     /* Assertion */
-       if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+       if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE;
       do scode += GET(scode, 1); while (*scode == OP_ALT);
       scode += 1 + LINK_SIZE;
       break;
@@ -8743,7 +8775,7 @@ do {
   if (op == OP_BRA  || op == OP_BRAPOS ||
       op == OP_SBRA || op == OP_SBRAPOS)
     {
-     if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+     if (!is_startline(scode, bracket_map, cd, atomcount, inassert)) return FALSE;
     }

   /* Capturing brackets */
@@ -8753,33 +8785,33 @@ do {
     {
     int n = GET2(scode, 1+LINK_SIZE);
     int new_map = bracket_map | ((n < 32)? (1 << n) : 1);
-     if (!is_startline(scode, new_map, cd, atomcount)) return FALSE;
+     if (!is_startline(scode, new_map, cd, atomcount, inassert)) return FALSE;
     }

   /* Positive forward assertions */

   else if (op == OP_ASSERT)
     {
-     if (!is_startline(scode, bracket_map, cd, atomcount)) return FALSE;
+     if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE;
     }

   /* Atomic brackets */

   else if (op == OP_ONCE || op == OP_ONCE_NC)
     {
-     if (!is_startline(scode, bracket_map, cd, atomcount + 1)) return FALSE;
+     if (!is_startline(scode, bracket_map, cd, atomcount + 1, inassert)) return FALSE;
     }

   /* .* means "start at start or after \n" if it isn't in atomic brackets or
-   brackets that may be referenced, as long as the pattern does not contain
-   *PRUNE or *SKIP, because these break the feature. Consider, for example,
-   /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e. not at the
-   start of a line. */
+   brackets that may be referenced or an assertion, as long as the pattern does
+   not contain *PRUNE or *SKIP, because these break the feature. Consider, for
+   example, /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e.
+   not at the start of a line. */

   else if (op == OP_TYPESTAR || op == OP_TYPEMINSTAR || op == OP_TYPEPOSSTAR)
     {
     if (scode[1] != OP_ANY || (bracket_map & cd->backref_map) != 0 ||
-         atomcount > 0 || cd->had_pruneorskip)
+         atomcount > 0 || cd->had_pruneorskip || inassert)
       return FALSE;
     }

@@ -9634,7 +9666,7 @@ if ((re->options & PCRE_ANCHORED) == 0)
      re->flags |= PCRE_FIRSTSET;
      }

-    else if (is_startline(codestart, 0, cd, 0)) re->flags |= PCRE_STARTLINE;
+    else if (is_startline(codestart, 0, cd, 0, FALSE)) re->flags |= PCRE_STARTLINE;
    }
  }
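
Note on the name-table hunk (@@ -6923) above, which implements ChangeLog item 8: the old lookup compared only the first namelen characters, so a reference to "R" also matched an entry such as "RA". The sketch below reproduces the idea with plain C strings. It is not PCRE code: find_by_prefix, find_exact and the flat string table are hypothetical, and the real table stores a group number in front of each name and uses pcre_uchar rather than char.

```c
#include <stddef.h>
#include <string.h>

/* Old behaviour: only the first namelen characters are compared, so the
   reference "R" also matches a table entry such as "RA".
   Returns the index of the first hit, or -1. */
static int find_by_prefix(const char *const *table, size_t count,
                          const char *name, size_t namelen)
{
  for (size_t i = 0; i < count; i++)
    if (strncmp(name, table[i], namelen) == 0) return (int)i;
  return -1;
}

/* Fixed behaviour: the entry must also end exactly at namelen.
   Assumes name holds namelen characters with no embedded NUL, so the
   strncmp check guarantees table[i][namelen] is within the entry. */
static int find_exact(const char *const *table, size_t count,
                      const char *name, size_t namelen)
{
  for (size_t i = 0; i < count; i++)
    if (strncmp(name, table[i], namelen) == 0 && table[i][namelen] == '\0')
      return (int)i;
  return -1;
}
```

With a table holding only "RA", looking up "R" with namelen 1 succeeds under find_by_prefix and fails under find_exact, which is what lets (?(R)xyz) fall through to the recursion test.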


--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
@@ -4004,12 +4004,12 @@ sljit_emit_op_custom(compiler, instruction, 4);

 if (load_twice)
  {
-  OP1(SLJIT_MOV, TMP3, 0, TMP2, 0);
+  OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP2, 0);
  instruction[3] = 0xc0 | (tmp2_ind << 3) | 1;
  sljit_emit_op_custom(compiler, instruction, 4);

  OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0);
-  OP1(SLJIT_MOV, TMP2, 0, TMP3, 0);
+  OP1(SLJIT_MOV, TMP2, 0, RETURN_ADDR, 0);
  }

 OP2(SLJIT_ASHR, TMP1, 0, TMP1, 0, TMP2, 0);

--- a/pcre/pcre_jit_test.c
+++ b/pcre/pcre_jit_test.c
@@ -687,6 +687,7 @@ static struct regression_test_case regression_test_cases[] = {
 	{ PCRE_FIRSTLINE | PCRE_NEWLINE_LF | PCRE_DOTALL, 0 | F_NOMATCH, "ab.", "ab" },
 	{ MUA | PCRE_FIRSTLINE, 1 | F_NOMATCH, "^[a-d0-9]", "\nxx\nd" },
 	{ PCRE_NEWLINE_ANY | PCRE_FIRSTLINE | PCRE_DOTALL, 0, "....a", "012\n0a" },
+	{ MUA | PCRE_FIRSTLINE, 0, "[aC]", "a" },

 	/* Recurse. */
 	{ MUA, 0, "(a)(?1)", "aa" },

--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@@ -1803,6 +1803,12 @@ while (ptr < endptr)
        match = FALSE;
        if (line_buffered) fflush(stdout);
        rc = 0;                      /* Had some success */
+
+        /* If the current match ended past the end of the line (only possible
+        in multiline mode), we are done with this line. */
+
+        if ((unsigned int)offsets[1] > linelength) goto END_ONE_MATCH;
+
        startoffset = offsets[1];    /* Restart after the match */
        if (startoffset <= oldstartoffset)
          {

--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
@@ -1982,6 +1982,7 @@ return(result);
 static int pchar(pcre_uint32 c, FILE *f)
 {
 int n = 0;
+char tempbuffer[16];
 if (PRINTOK(c))
  {
  if (f != NULL) fprintf(f, "%c", c);
@@ -2003,6 +2004,8 @@ if (c < 0x100)
  }

 if (f != NULL) n = fprintf(f, "\\x{%02x}", c);
+  else n = sprintf(tempbuffer, "\\x{%02x}", c);
+
 return n >= 0 ? n : 0;
 }

@@ -5042,7 +5045,7 @@ while (!done)

    if ((all_use_dfa || use_dfa) && find_match_limit)
      {
-      printf("**Match limit not relevant for DFA matching: ignored\n");
+      printf("** Match limit not relevant for DFA matching: ignored\n");
      find_match_limit = 0;
      }

@@ -5255,10 +5258,17 @@ while (!done)

        if (do_allcaps)
          {
-          if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) < 0)
-            goto SKIP_DATA;
-          count++;   /* Allow for full match */
-          if (count * 2 > use_size_offsets) count = use_size_offsets/2;
+          if (all_use_dfa || use_dfa)
+            {
+            fprintf(outfile, "** Show all captures ignored after DFA matching\n");
+            }
+          else
+            {
+            if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) < 0)
+              goto SKIP_DATA;
+            count++;   /* Allow for full match */
+            if (count * 2 > use_size_offsets) count = use_size_offsets/2;
+            }
          }

        /* Output the captured substrings. Note that, for the matched string,
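
Note on the pchar() hunks above (ChangeLog item 7): pchar() returns the number of characters an escape occupies, and callers pass a NULL file when they only need that width. Before this change the width stayed 0 for code points above \x{ff} in that case, which misplaced the callout caret. A minimal sketch of the measuring path follows. The name escape_width is hypothetical, and snprintf is used here as a bounds-checked stand-in for the sprintf call in the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper modelled on the tail of pchar(): write c as \x{..}
   to f, or only measure the escape when f is NULL.
   Returns the number of characters the escape occupies. */
static int escape_width(uint32_t c, FILE *f)
{
  char tempbuffer[16];   /* "\x{ffffffff}" is 12 characters plus the NUL */
  int n;

  if (f != NULL)
    n = fprintf(f, "\\x{%02x}", (unsigned int)c);
  else
    n = snprintf(tempbuffer, sizeof tempbuffer, "\\x{%02x}", (unsigned int)c);

  return n >= 0 ? n : 0;
}
```

For example, escape_width(0x100, NULL) measures the seven characters of \x{100} without printing anything.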

--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@@ -5733,4 +5733,10 @@ AbcdCBefgBhiBqz
 "(?|(\k'Pm')|(?'Pm'))"
    abcd

+/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/
+    \  Fred:099
+
+/(?=.*X)X$/ 
+    \  X
+     
 /-- End of testinput1 --/
--- a/pcre/testdata/testinput16
+++ b/pcre/testdata/testinput16
@@ -38,4 +38,30 @@
 /s+/i8SI
    SSss\x{17f}

+/[\W\p{Any}]/BZ
+    abc
+    123 
+
+/[\W\pL]/BZ
+    abc
+    ** Failers 
+    123
+    
+/[\D]/8
+    \x{1d7cf}
+
+/[\D\P{Nd}]/8
+    \x{1d7cf}
+
+/[^\D]/8
+    a9b
+    ** Failers
+    \x{1d7cf}
+
+/[^\D\P{Nd}]/8
+    a9b
+    \x{1d7cf}
+    ** Failers
+    \x{10000}
+
 /-- End of testinput16 --/
--- a/pcre/testdata/testinput19
+++ b/pcre/testdata/testinput19
@@ -25,4 +25,21 @@
 /s+/i8SI
    SSss\x{17f}

+/[\D]/8
+    \x{1d7cf}
+
+/[\D\P{Nd}]/8
+    \x{1d7cf}
+
+/[^\D]/8
+    a9b
+    ** Failers
+    \x{1d7cf}
+
+/[^\D\P{Nd}]/8
+    a9b
+    \x{1d7cf}
+    ** Failers
+    \x{10000}
+
 /-- End of testinput19 --/ 
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
@@ -4243,4 +4243,10 @@ backtracking verbs. --/

 /\N(?(?C)0?!.)*/

+/(?<RA>abc)(?(R)xyz)/BZ
+
+/(?<R>abc)(?(R)xyz)/BZ
+
+/(?=.*[A-Z])/I
+
 /-- End of testinput2 --/
--- a/pcre/testdata/testinput6
+++ b/pcre/testdata/testinput6
@@ -1562,4 +1562,10 @@
    \x{389}
    \x{20ac}

+/(?=.*b)\pL/
+    11bb
+    
+/(?(?=.*b)(?=.*b)\pL|.*c)/
+    11bb
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testinput7
+++ b/pcre/testdata/testinput7
@@ -838,15 +838,6 @@ of case for anything other than the ASCII letters. --/
 /^s?c/mi8I
    scat

-/[\W\p{Any}]/BZ
-    abc
-    123 
-
-/[\W\pL]/BZ
-    abc
-    ** Failers 
-    123     
-
 /a[[:punct:]b]/WBZ

 /a[[:punct:]b]/8WBZ

--- a/pcre/testdata/testinput8
+++ b/pcre/testdata/testinput8
@@ -4841,4 +4841,8 @@
    bbb
    aaa 

+/()()a+/O=
+    aaa\D
+    a\D
+
 /-- End of testinput8 --/
--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@@ -9434,4 +9434,12 @@ No match
 0: 
 1: 

+/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/
+    \  Fred:099
+ 0: 
+
+/(?=.*X)X$/ 
+    \  X
+ 0: X
+     
 /-- End of testinput1 --/
--- a/pcre/testdata/testoutput16
+++ b/pcre/testdata/testoutput16
@@ -138,4 +138,56 @@ Starting chars: S s \xc5
    SSss\x{17f}
 0: SSss\x{17f}

+/[\W\p{Any}]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{Any}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    123 
+ 0: 1
+
+/[\W\pL]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{L}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    ** Failers 
+ 0: *
+    123
+No match
+    
+/[\D]/8
+    \x{1d7cf}
+ 0: \x{1d7cf}
+
+/[\D\P{Nd}]/8
+    \x{1d7cf}
+ 0: \x{1d7cf}
+
+/[^\D]/8
+    a9b
+ 0: 9
+    ** Failers
+No match
+    \x{1d7cf}
+No match
+
+/[^\D\P{Nd}]/8
+    a9b
+ 0: 9
+    \x{1d7cf}
+ 0: \x{1d7cf}
+    ** Failers
+No match
+    \x{10000}
+No match
+
 /-- End of testinput16 --/
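
Note on the [\D] expectations above (ChangeLog item 6): in a wide-character mode without PCRE_UCP, a negated type such as \D in a positive class must admit every code point above 255, whatever else the class contains. The sketch below is a toy model of that rule only. The toy_class type and both functions are hypothetical and do not reflect PCRE's internal class layout, and negated classes and the UCP case are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a positive character class, non-UCP case only. */
typedef struct {
  uint8_t bitmap[32];        /* one bit per code point 0..255 */
  bool    has_negated_type;  /* class contains \D, \S, \W or similar */
} toy_class;

/* Build the class [\D]: every code point below 256 except ASCII digits. */
static void toy_class_init_not_digit(toy_class *cls)
{
  memset(cls, 0, sizeof *cls);
  for (unsigned int c = 0; c < 256; c++)
    if (c < '0' || c > '9')
      cls->bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
  cls->has_negated_type = true;
}

/* Code points below 256 use the bitmap; anything above 255 matches
   whenever the class contains a negated type. */
static bool toy_class_match(const toy_class *cls, uint32_t c)
{
  if (c < 256) return ((cls->bitmap[c / 8] >> (c % 8)) & 1u) != 0;
  return cls->has_negated_type;
}
```

In this model \x{1d7cf} matches [\D] even though it is a Unicode digit, which agrees with the expected output recorded above for /[\D]/8.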
--- a/pcre/testdata/testoutput19
+++ b/pcre/testdata/testoutput19
@@ -105,4 +105,30 @@ Starting chars: S s \xff
    SSss\x{17f}
 0: SSss\x{17f}

+/[\D]/8
+    \x{1d7cf}
+ 0: \x{1d7cf}
+
+/[\D\P{Nd}]/8
+    \x{1d7cf}
+ 0: \x{1d7cf}
+
+/[^\D]/8
+    a9b
+ 0: 9
+    ** Failers
+No match
+    \x{1d7cf}
+No match
+
+/[^\D\P{Nd}]/8
+    a9b
+ 0: 9
+    \x{1d7cf}
+ 0: \x{1d7cf}
+    ** Failers
+No match
+    \x{10000}
+No match
+
 /-- End of testinput19 --/ 
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
@@ -9380,7 +9380,7 @@ No need char
 /(?(?=.*b).*b|^d)/I
 Capturing subpattern count = 0
 No options
-First char at start or follows newline
+No first char
 No need char

 /xyz/C
@@ -14670,4 +14670,39 @@ No match
 /\N(?(?C)0?!.)*/
 Failed: assertion expected after (?( or (?(?C) at offset 4

+/(?<RA>abc)(?(R)xyz)/BZ
+------------------------------------------------------------------
+        Bra
+        CBra 1
+        abc
+        Ket
+        Cond
+        Cond recurse any
+        xyz
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(?<R>abc)(?(R)xyz)/BZ
+------------------------------------------------------------------
+        Bra
+        CBra 1
+        abc
+        Ket
+        Cond
+      1 Cond ref
+        xyz
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(?=.*[A-Z])/I
+Capturing subpattern count = 0
+May match empty string
+No options
+No first char
+No need char
+
 /-- End of testinput2 --/
--- a/pcre/testdata/testoutput6
+++ b/pcre/testdata/testoutput6
@@ -2573,4 +2573,12 @@ No match
    \x{20ac}
 No match

+/(?=.*b)\pL/
+    11bb
+ 0: b
+    
+/(?(?=.*b)(?=.*b)\pL|.*c)/
+    11bb
+ 0: b
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testoutput7
+++ b/pcre/testdata/testoutput7
@@ -2295,32 +2295,6 @@ Need char = 'c' (caseless)
    scat
 0: sc

-/[\W\p{Any}]/BZ
------------------------------------------------------------------
-        Bra
-        [\x00-/:-@[-^`{-\xff\p{Any}]
-        Ket
-        End
------------------------------------------------------------------
-    abc
- 0: a
-    123 
- 0: 1
-
-/[\W\pL]/BZ
------------------------------------------------------------------
-        Bra
-        [\x00-/:-@[-^`{-\xff\p{L}]
-        Ket
-        End
------------------------------------------------------------------
-    abc
- 0: a
-    ** Failers 
- 0: *
-    123     
-No match
-
 /a[[:punct:]b]/WBZ
 ------------------------------------------------------------------
        Bra

--- a/pcre/testdata/testoutput8
+++ b/pcre/testdata/testoutput8
@@ -7791,4 +7791,14 @@ Matched, but offsets vector is too small to show all matches
    aaa 
 No match

+/()()a+/O=
+    aaa\D
+** Show all captures ignored after DFA matching
+ 0: aaa
+ 1: aa
+ 2: a
+    a\D
+** Show all captures ignored after DFA matching
+ 0: a
+
 /-- End of testinput8 --/