Merge branch 'merge-pcre' into 10.0

48636f09 · Sergei Golubchik · 5ae2656b · cf242ade · 48636f09 · 48636f09
Commit 48636f09 authored Apr 26, 2018 by Sergei Golubchik
19 changed files
--- a/pcre/AUTHORS
+++ b/pcre/AUTHORS
@@ -8,7 +8,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2017 University of Cambridge
+Copyright (c) 1997-2018 University of Cambridge
 All rights reserved


@@ -19,7 +19,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2010-2017 Zoltan Herczeg
+Copyright(c) 2010-2018 Zoltan Herczeg
 All rights reserved.


@@ -30,7 +30,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2009-2017 Zoltan Herczeg
+Copyright(c) 2009-2018 Zoltan Herczeg
 All rights reserved.



--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@@ -4,6 +4,59 @@ ChangeLog for PCRE
 Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.

+
+Version 8.42 20-March-2018
+--------------------------
+
+1.  Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.
+
+2.  Fixed outdated real_pcre definitions in pcre.h.in (patch by Evgeny Kotkov).
+
+3.  pcregrep was truncating components of file names to 128 characters when
+processing files with the -r option, and also (some very odd code) truncating
+path names to 512 characters. There is now a check on the absolute length of
+full path file names, which may be up to 2047 characters long.
+
+4.  Using pcre_dfa_exec(), in UTF mode when UCP support was not defined, there
+was the possibility of a false positive match when caselessly matching a "not
+this character" item such as [^\x{1234}] (with a code point greater than 127)
+because the "other case" variable was not being initialized.
+
+5. Although pcre_jit_exec checks whether the pattern is compiled
+in a given mode, it was also expected that at least one mode is available.
+This is fixed and pcre_jit_exec returns with PCRE_ERROR_JIT_BADOPTION
+when the pattern is not optimized by JIT at all.
+
+6. The line number and related variables such as match counts in pcregrep
+were all int variables, causing overflow when files with more than 2147483647
+lines were processed (assuming 32-bit ints). They have all been changed to
+unsigned long ints.
+
+7. If a backreference with a minimum repeat count of zero was first in a
+pattern, apart from assertions, an incorrect first matching character could be
+recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
+as the first character of a match.
+
+8. Fix out-of-bounds read for partial matching of /./ against an empty string
+when the newline type is CRLF.
+
+9. When matching using the the REG_STARTEND feature of the POSIX API with a
+non-zero starting offset, unset capturing groups with lower numbers than a
+group that did capture something were not being correctly returned as "unset"
+(that is, with offset values of -1).
+
+10. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string
+containing multi-code-unit characters caused bad behaviour and possibly a
+crash. This issue was fixed for other kinds of repeat in release 8.37 by change
+38, but repeating character classes were overlooked.
+
+11. A small fix to pcregrep to avoid compiler warnings for -Wformat-overflow=2.
+
+12. Added --enable-jit=auto support to configure.ac.
+
+13. Fix misleading error message in configure.ac.
+
+
 Version 8.41 05-July-2017
 -------------------------


--- a/pcre/INSTALL
+++ b/pcre/INSTALL
--- a/pcre/LICENCE
+++ b/pcre/LICENCE
@@ -25,7 +25,7 @@ Email domain:     cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.

-Copyright (c) 1997-2017 University of Cambridge
+Copyright (c) 1997-2018 University of Cambridge
 All rights reserved.


@@ -36,7 +36,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2010-2017 Zoltan Herczeg
+Copyright(c) 2010-2018 Zoltan Herczeg
 All rights reserved.


@@ -47,7 +47,7 @@ Written by:       Zoltan Herczeg
 Email local part: hzmester
 Emain domain:     freemail.hu

-Copyright(c) 2009-2017 Zoltan Herczeg
+Copyright(c) 2009-2018 Zoltan Herczeg
 All rights reserved.



--- a/pcre/NEWS
+++ b/pcre/NEWS
 News about PCRE releases
 ------------------------

+Release 8.42 20-March-2018
+--------------------------
+
+This is a bug-fix release.
+
+
 Release 8.41 13-June-2017
 -------------------------


--- a/pcre/NON-AUTOTOOLS-BUILD
+++ b/pcre/NON-AUTOTOOLS-BUILD
@@ -760,13 +760,14 @@ The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and
 applications can be supported through UNIX System Services, and in such an
 environment PCRE can be built in the same way as in other systems. However, in
 native z/OS (without UNIX System Services) and in z/VM, special ports are
-required. For details, please see this web site:
+required. PCRE1 version 8.39 is available in file 882 on this site:

-  http://www.zaconsultants.net
+  http://www.cbttape.org

-You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
-executable, is in EBCDIC and native z/OS file formats and this is the
-recommended download site.
+Everything, source and executable, is in EBCDIC and native z/OS file formats.
+However, this software is not maintained and will not be upgraded. If you are
+new to PCRE you should be looking at PCRE2 (version 10.30 or later).

-==========================
-Last Updated: 25 June 2015
+===============================
+Last Updated: 13 September 2017
+===============================
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@@ -9,18 +9,18 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [41])
+m4_define(pcre_minor, [42])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2017-07-05])
+m4_define(pcre_date, [2018-03-20])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:9:2])
-m4_define(libpcre16_version, [2:9:2])
-m4_define(libpcre32_version, [0:9:0])
-m4_define(libpcreposix_version, [0:5:0])
+m4_define(libpcre_version, [3:10:2])
+m4_define(libpcre16_version, [2:10:2])
+m4_define(libpcre32_version, [0:10:0])
+m4_define(libpcreposix_version, [0:6:0])
 m4_define(libpcrecpp_version, [0:1:0])

 AC_PREREQ(2.57)
@@ -155,6 +155,18 @@ AC_ARG_ENABLE(jit,
                             [enable Just-In-Time compiling support]),
              , enable_jit=no)

+# This code enables JIT if the hardware supports it.
+
+if test "$enable_jit" = "auto"; then
+  AC_LANG(C)
+  AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+  #define SLJIT_CONFIG_AUTO 1
+  #include "sljit/sljitConfigInternal.h"
+  #if (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED)
+  #error unsupported
+  #endif]])], enable_jit=yes, enable_jit=no)
+fi
+
 # Handle --disable-pcregrep-jit (enabled by default)
 AC_ARG_ENABLE(pcregrep-jit,
              AS_HELP_STRING([--disable-pcregrep-jit],
@@ -469,7 +481,7 @@ pcre_have_type_traits="0"
 pcre_have_bits_type_traits="0"

 if test "x$enable_cpp" = "xyes" -a -z "$CXX"; then
-   AC_MSG_ERROR([You need a C++ compiler for C++ support.])
+   AC_MSG_ERROR([Invalid C++ compiler or C++ compiler flags])
 fi

 if test "x$enable_cpp" = "xyes" -a -n "$CXX"

--- a/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -760,13 +760,14 @@ The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and
 applications can be supported through UNIX System Services, and in such an
 environment PCRE can be built in the same way as in other systems. However, in
 native z/OS (without UNIX System Services) and in z/VM, special ports are
-required. For details, please see this web site:
+required. PCRE1 version 8.39 is available in file 882 on this site:

-  http://www.zaconsultants.net
+  http://www.cbttape.org

-You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
-executable, is in EBCDIC and native z/OS file formats and this is the
-recommended download site.
+Everything, source and executable, is in EBCDIC and native z/OS file formats.
+However, this software is not maintained and will not be upgraded. If you are
+new to PCRE you should be looking at PCRE2 (version 10.30 or later).

-==========================
-Last Updated: 25 June 2015
+===============================
+Last Updated: 13 September 2017
+===============================
--- a/pcre/pcre.h.in
+++ b/pcre/pcre.h.in
@@ -321,11 +321,11 @@ these bits, just add new ones on the end, in order to remain compatible. */

 /* Types */

-struct real_pcre;                 /* declaration; the definition is private  */
-typedef struct real_pcre pcre;
+struct real_pcre8_or_16;          /* declaration; the definition is private  */
+typedef struct real_pcre8_or_16 pcre;

-struct real_pcre16;               /* declaration; the definition is private  */
-typedef struct real_pcre16 pcre16;
+struct real_pcre8_or_16;          /* declaration; the definition is private  */
+typedef struct real_pcre8_or_16 pcre16;

 struct real_pcre32;               /* declaration; the definition is private  */
 typedef struct real_pcre32 pcre32;

--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@@ -8063,7 +8063,7 @@ for (;; ptr++)
        single group (i.e. not to a duplicated name. */

        HANDLE_REFERENCE:
-        if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
+        if (firstcharflags == REQ_UNSET) zerofirstcharflags = firstcharflags = REQ_NONE;
        previous = code;
        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF;

--- a/pcre/pcre_dfa_exec.c
+++ b/pcre/pcre_dfa_exec.c
@@ -2287,12 +2287,14 @@ for (;;)
      case OP_NOTI:
      if (clen > 0)
        {
-        unsigned int otherd;
+        pcre_uint32 otherd;
 #ifdef SUPPORT_UTF
        if (utf && d >= 128)
          {
 #ifdef SUPPORT_UCP
          otherd = UCD_OTHERCASE(d);
+#else
+          otherd = d;
 #endif  /* SUPPORT_UCP */
          }
        else
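
Note on the hunk above: ChangeLog item 4 describes the bug it fixes. With UTF support but no UCP support, `otherd` was never assigned for code points of 128 and above. The following is a minimal standalone sketch of that pattern, not PCRE's own code. The names `other_case`, `not_char_caseless`, `HAVE_CASE_TABLES` and `lookup_other_case` are hypothetical stand-ins for `UCD_OTHERCASE` and `SUPPORT_UCP`.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for PCRE's other-case lookup. Without case tables
   (HAVE_CASE_TABLES undefined), a code point >= 128 is treated as its own
   other case; this explicit fallback mirrors the #else branch in the hunk. */
static uint32_t other_case(uint32_t d)
{
  if (d >= 128)
    {
#ifdef HAVE_CASE_TABLES
    return lookup_other_case(d);   /* hypothetical table lookup */
#else
    return d;                      /* never leave the result unset */
#endif
    }
  return islower((int)d) ? (uint32_t)toupper((int)d)
                         : (uint32_t)tolower((int)d);
}

/* Caseless "not this character" test: c matches only if it differs from
   both d and the other case of d. */
static int not_char_caseless(uint32_t c, uint32_t d)
{
  return c != d && c != other_case(d);
}
```

With the fallback in place, `not_char_caseless(0x1234, 0x1234)` is reliably false. If the other-case value were left unset, the result would depend on whatever happened to be in the variable.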

--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2014 University of Cambridge
+           Copyright (c) 1997-2018 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -2313,7 +2313,7 @@ for (;;)
    case OP_ANY:
    if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH);
    if (md->partial != 0 &&
-        eptr + 1 >= md->end_subject &&
+        eptr == md->end_subject - 1 &&
        NLBLOCK->nltype == NLTYPE_FIXED &&
        NLBLOCK->nllen == 2 &&
        UCHAR21TEST(eptr) == NLBLOCK->nl[0])
@@ -3061,7 +3061,7 @@ for (;;)
            {
            RMATCH(eptr, ecode, offset_top, md, eptrb, RM18);
            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (eptr-- == pp) break;        /* Stop if tried at original pos */
+            if (eptr-- <= pp) break;        /* Stop if tried at original pos */
            BACKCHAR(eptr);
            }
          }
@@ -3218,7 +3218,7 @@ for (;;)
          {
          RMATCH(eptr, ecode, offset_top, md, eptrb, RM21);
          if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (eptr-- == pp) break;        /* Stop if tried at original pos */
+          if (eptr-- <= pp) break;        /* Stop if tried at original pos */
 #ifdef SUPPORT_UTF
          if (utf) BACKCHAR(eptr);
 #endif
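
Note on the two `eptr-- <= pp` hunks above: ChangeLog item 10 ties them to `\C` followed by a repeated character class in UTF mode. One plausible reading, stated here as an assumption rather than taken from the commit, is that `\C` can leave the start position `pp` inside a multi-byte character, so that stepping back to a character boundary jumps over `pp` and an equality test never fires. The sketch below models this with byte indices. `back_to_char_start` and `count_backtrack_attempts` are hypothetical helpers, and the first one adds a lower-bound guard that PCRE's `BACKCHAR` macro does not have.

```c
#include <stddef.h>

/* Step back to the first byte of a UTF-8 character by skipping
   continuation bytes of the form 10xxxxxx. */
static size_t back_to_char_start(const unsigned char *s, size_t i)
{
  while (i > 0 && (s[i] & 0xc0) == 0x80) i--;
  return i;
}

/* Count the positions tried while backtracking from `end` toward `start`.
   The loop stops once the position is at or below `start`. A test for
   equality alone would miss `start` whenever it lies inside a character. */
static int count_backtrack_attempts(const unsigned char *s, size_t start,
  size_t end)
{
  int attempts = 0;
  size_t pos = end;
  for (;;)
    {
    attempts++;                      /* stands in for RMATCH at pos */
    if (pos <= start) break;         /* the fixed test: <=, not == */
    pos = back_to_char_start(s, pos - 1);
    }
  return attempts;
}
```

For the bytes `0xCE 0x8F 'B'` with `start` at index 1 (the continuation byte), the positions tried are 3, 2 and 0. The loop ends at 0 because 0 is below `start`, whereas a loop that waited for `pos == start` would keep stepping backwards.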

--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@@ -1387,8 +1387,8 @@ Returns:            nothing
 */

 static void
-do_after_lines(int lastmatchnumber, char *lastmatchrestart, char *endptr,
-  char *printname)
+do_after_lines(unsigned long int lastmatchnumber, char *lastmatchrestart,
+  char *endptr, char *printname)
 {
 if (after_context > 0 && lastmatchnumber > 0)
  {
@@ -1398,7 +1398,7 @@ if (after_context > 0 && lastmatchnumber > 0)
    int ellength;
    char *pp = lastmatchrestart;
    if (printname != NULL) fprintf(stdout, "%s-", printname);
-    if (number) fprintf(stdout, "%d-", lastmatchnumber++);
+    if (number) fprintf(stdout, "%lu-", lastmatchnumber++);
    pp = end_of_line(pp, endptr, &ellength);
    FWRITE(lastmatchrestart, 1, pp - lastmatchrestart, stdout);
    lastmatchrestart = pp;
@@ -1502,11 +1502,11 @@ static int
 pcregrep(void *handle, int frtype, char *filename, char *printname)
 {
 int rc = 1;
-int linenumber = 1;
-int lastmatchnumber = 0;
-int count = 0;
 int filepos = 0;
 int offsets[OFFSET_SIZE];
+unsigned long int linenumber = 1;
+unsigned long int lastmatchnumber = 0;
+unsigned long int count = 0;
 char *lastmatchrestart = NULL;
 char *ptr = main_buffer;
 char *endptr;
@@ -1609,7 +1609,7 @@ while (ptr < endptr)

  if (endlinelength == 0 && t == main_buffer + bufsize)
    {
-    fprintf(stderr, "pcregrep: line %d%s%s is too long for the internal buffer\n"
+    fprintf(stderr, "pcregrep: line %lu%s%s is too long for the internal buffer\n"
                    "pcregrep: check the --buffer-size option\n",
                    linenumber,
                    (filename == NULL)? "" : " of file ",
@@ -1747,7 +1747,7 @@ while (ptr < endptr)
          prevoffsets[1] = offsets[1];

          if (printname != NULL) fprintf(stdout, "%s:", printname);
-          if (number) fprintf(stdout, "%d:", linenumber);
+          if (number) fprintf(stdout, "%lu:", linenumber);

          /* Handle --line-offsets */

@@ -1862,7 +1862,7 @@ while (ptr < endptr)
          {
          char *pp = lastmatchrestart;
          if (printname != NULL) fprintf(stdout, "%s-", printname);
-          if (number) fprintf(stdout, "%d-", lastmatchnumber++);
+          if (number) fprintf(stdout, "%lu-", lastmatchnumber++);
          pp = end_of_line(pp, endptr, &ellength);
          FWRITE(lastmatchrestart, 1, pp - lastmatchrestart, stdout);
          lastmatchrestart = pp;
@@ -1902,7 +1902,7 @@ while (ptr < endptr)
          int ellength;
          char *pp = p;
          if (printname != NULL) fprintf(stdout, "%s-", printname);
-          if (number) fprintf(stdout, "%d-", linenumber - linecount--);
+          if (number) fprintf(stdout, "%lu-", linenumber - linecount--);
          pp = end_of_line(pp, endptr, &ellength);
          FWRITE(p, 1, pp - p, stdout);
          p = pp;
@@ -1916,7 +1916,7 @@ while (ptr < endptr)
        endhyphenpending = TRUE;

      if (printname != NULL) fprintf(stdout, "%s:", printname);
-      if (number) fprintf(stdout, "%d:", linenumber);
+      if (number) fprintf(stdout, "%lu:", linenumber);

      /* In multiline mode, we want to print to the end of the line in which
      the end of the matched string is found, so we adjust linelength and the
@@ -2112,7 +2112,7 @@ if (count_only && !quiet)
    {
    if (printname != NULL && filenames != FN_NONE)
      fprintf(stdout, "%s:", printname);
-    fprintf(stdout, "%d\n", count);
+    fprintf(stdout, "%lu\n", count);
    }
  }

@@ -2234,7 +2234,7 @@ if (isdirectory(pathname))

  if (dee_action == dee_RECURSE)
    {
-    char buffer[1024];
+    char buffer[2048];
    char *nextfile;
    directory_type *dir = opendirectory(pathname);

@@ -2249,7 +2249,14 @@ if (isdirectory(pathname))
    while ((nextfile = readdirectory(dir)) != NULL)
      {
      int frc;
-      sprintf(buffer, "%.512s%c%.128s", pathname, FILESEP, nextfile);
+      int fnlength = strlen(pathname) + strlen(nextfile) + 2;
+      if (fnlength > 2048)
+        {
+        fprintf(stderr, "pcre2grep: recursive filename is too long\n");
+        rc = 2;
+        break;
+        }
+      sprintf(buffer, "%s%c%s", pathname, FILESEP, nextfile);
      frc = grep_or_recurse(buffer, dir_recurse, FALSE);
      if (frc > 1) rc = frc;
       else if (frc == 0 && rc == 1) rc = 0;
@@ -2520,7 +2527,14 @@ if ((popts & PO_FIXED_STRINGS) != 0)
    }
  }

-sprintf(buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
+if (snprintf(buffer, PATBUFSIZE, "%s%.*s%s", prefix[popts], patlen, ps,
+      suffix[popts]) > PATBUFSIZE)
+  {
+  fprintf(stderr, "pcregrep: Buffer overflow while compiling \"%s\"\n",
+    ps);
+  return FALSE;
+  }
+
 p->compiled = pcre_compile(buffer, options, &error, &errptr, pcretables);
 if (p->compiled != NULL) return TRUE;

@@ -2756,8 +2770,15 @@ for (i = 1; i < argc; i++)
        int arglen = (argequals == NULL || equals == NULL)?
          (int)strlen(arg) : (int)(argequals - arg);

-        sprintf(buff1, "%.*s", baselen, op->long_name);
-        sprintf(buff2, "%s%.*s", buff1, fulllen - baselen - 2, opbra + 1);
+        if (snprintf(buff1, sizeof(buff1), "%.*s", baselen, op->long_name) >
+              (int)sizeof(buff1) ||
+            snprintf(buff2, sizeof(buff2), "%s%.*s", buff1,
+              fulllen - baselen - 2, opbra + 1) > (int)sizeof(buff2))
+          {
+          fprintf(stderr, "pcregrep: Buffer overflow when parsing %s option\n",
+            op->long_name);
+          pcregrep_exit(2);
+          }

        if (strncmp(arg, buff1, arglen) == 0 ||
           strncmp(arg, buff2, arglen) == 0)
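
Note on the `sprintf` replacements above: they move from unbounded or silently truncating formatting to explicit length checks. The sketch below shows the same idea in isolation and is not the code in the commit. The helper `join_path` is hypothetical. It relies on the C99 rule that `snprintf` returns the number of characters the complete output would need, excluding the terminating NUL, so a return value greater than or equal to the buffer size means the output was truncated. That makes this sketch stricter than the `>` comparisons in the diff.

```c
#include <stdio.h>

/* Join dir and name with sep into buf. Returns 0 on success and -1 if the
   result, including the terminating NUL, does not fit in bufsize bytes or
   if snprintf reports an encoding error. */
static int join_path(char *buf, size_t bufsize, const char *dir, char sep,
  const char *name)
{
  int n = snprintf(buf, bufsize, "%s%c%s", dir, sep, name);
  if (n < 0 || (size_t)n >= bufsize) return -1;
  return 0;
}
```

A caller would treat -1 the way the recursive-directory hunk treats an over-long name: report the error and stop, rather than continue with a truncated path.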

--- a/pcre/pcreposix.c
+++ b/pcre/pcreposix.c
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2017 University of Cambridge
+           Copyright (c) 1997-2018 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -389,8 +389,8 @@ if (rc >= 0)
    {
    for (i = 0; i < (size_t)rc; i++)
      {
-      pmatch[i].rm_so = ovector[i*2] + so;
-      pmatch[i].rm_eo = ovector[i*2+1] + so;
+      pmatch[i].rm_so = (ovector[i*2] < 0)? -1 : ovector[i*2] + so;
+      pmatch[i].rm_eo = (ovector[i*2+1] < 0)? -1: ovector[i*2+1] + so;
      }
    if (allocated_ovector) free(ovector);
    for (; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1;
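
Note on the hunk above: ChangeLog item 9 explains that with `REG_STARTEND` and a non-zero starting offset, unset groups must still be reported as -1. Adding the start offset `so` unconditionally turned -1 into `so - 1`. The sketch below reproduces the corrected mapping on plain arrays. `match_pair` and `fill_matches` are hypothetical names standing in for `regmatch_t` and the loop in `regexec`.

```c
#include <stddef.h>

/* Hypothetical stand-in for POSIX regmatch_t. */
typedef struct { int rm_so; int rm_eo; } match_pair;

/* Copy `count` offset pairs from ovector into pmatch, shifting each set
   offset by the starting offset `so`. Negative values mark an unset group
   and are stored as -1 instead of being shifted. */
static void fill_matches(match_pair *pmatch, const int *ovector,
  size_t count, int so)
{
  for (size_t i = 0; i < count; i++)
    {
    pmatch[i].rm_so = (ovector[i*2] < 0) ? -1 : ovector[i*2] + so;
    pmatch[i].rm_eo = (ovector[i*2+1] < 0) ? -1 : ovector[i*2+1] + so;
    }
}
```

With `so` equal to 10 and an ovector of `{3, 5, -1, -1, 4, 5}`, the middle pair stays `(-1, -1)` while the other two become `(13, 15)` and `(14, 15)`.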

--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
@@ -4249,4 +4249,12 @@ backtracking verbs. --/

 /(?=.*[A-Z])/I

+"(?<=(a))\1?b"
+    ab
+    aaab 
+
+"(?=(a))\1?b"
+    ab
+    aaab 
+
 /-- End of testinput2 --/
--- a/pcre/testdata/testinput5
+++ b/pcre/testdata/testinput5
@@ -798,4 +798,10 @@
 /(?<=\K\x{17f})/8G+
    \x{17f}\x{17f}\x{17f}\x{17f}\x{17f}

+/\C[^\v]+\x80/8
+    [AΏBŀC]
+
+/\C[^\d]+\x80/8
+    [AΏBŀC]
+
 /-- End of testinput5 --/

--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
@@ -14705,4 +14705,20 @@ No options
 No first char
 No need char

+"(?<=(a))\1?b"
+    ab
+ 0: b
+ 1: a
+    aaab 
+ 0: ab
+ 1: a
+
+"(?=(a))\1?b"
+    ab
+ 0: ab
+ 1: a
+    aaab 
+ 0: ab
+ 1: a
+
 /-- End of testinput2 --/
--- a/pcre/testdata/testoutput5
+++ b/pcre/testdata/testoutput5
@@ -1942,4 +1942,12 @@ Need char = 'z'
 0: \x{17f}
 0+ 

+/\C[^\v]+\x80/8
+    [AΏBŀC]
+No match
+
+/\C[^\d]+\x80/8
+    [AΏBŀC]
+No match
+
 /-- End of testinput5 --/