X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~ian/git?a=blobdiff_plain;f=ChangeLog;h=a34f845f8a19d457fb06b60ec822c277d2c9ba22;hb=cef3458b870c4ae76061d6238197dcf96dc9690b;hp=7801ef841179c7ca8030646d62af4ea85e6e50db;hpb=dd986e8b547c0dde924c4b566ad0894ad4f1beb9;p=pcre3.git diff --git a/ChangeLog b/ChangeLog index 7801ef8..a34f845 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,6 +1,539 @@ ChangeLog for PCRE ------------------ +Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All +development is happening in the PCRE2 10.xx series. + +Version 8.39 14-June-2016 +------------------------- + +1. If PCRE_AUTO_CALLOUT was set on a pattern that had a (?# comment between + an item and its qualifier (for example, A(?#comment)?B) pcre_compile() + misbehaved. This bug was found by the LLVM fuzzer. + +2. Similar to the above, if an isolated \E was present between an item and its + qualifier when PCRE_AUTO_CALLOUT was set, pcre_compile() misbehaved. This + bug was found by the LLVM fuzzer. + +3. Further to 8.38/46, negated classes such as [^[:^ascii:]\d] were also not + working correctly in UCP mode. + +4. The POSIX wrapper function regexec() crashed if the option REG_STARTEND + was set when the pmatch argument was NULL. It now returns REG_INVARG. + +5. Allow for up to 32-bit numbers in the ordin() function in pcregrep. + +6. An empty \Q\E sequence between an item and its qualifier caused + pcre_compile() to misbehave when auto callouts were enabled. This bug was + found by the LLVM fuzzer. + +7. If a pattern that was compiled with PCRE_EXTENDED started with white + space or a #-type comment that was followed by (?-x), which turns off + PCRE_EXTENDED, and there was no subsequent (?x) to turn it on again, + pcre_compile() assumed that (?-x) applied to the whole pattern and + consequently mis-compiled it. This bug was found by the LLVM fuzzer. + +8. A call of pcre_copy_named_substring() for a named substring whose number + was greater than the space in the ovector could cause a crash. + +9. Yet another buffer overflow bug involved duplicate named groups with a + group that reset capture numbers (compare 8.38/7 below). Once again, I have + just allowed for more memory, even if not needed. (A proper fix is + implemented in PCRE2, but it involves a lot of refactoring.) + +10. pcre_get_substring_list() crashed if the use of \K in a match caused the + start of the match to be earlier than the end. + +11. Migrating appropriate PCRE2 JIT improvements to PCRE. + +12. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind + assertion, caused pcretest to generate incorrect output, and also to read + uninitialized memory (detected by ASAN or valgrind). + +13. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply + nested set of parentheses of sufficient size caused an overflow of the + compiling workspace (which was diagnosed, but of course is not desirable). + +14. And yet another buffer overflow bug involving duplicate named groups, this + time nested, with a nested back reference. Yet again, I have just allowed + for more memory, because anything more needs all the refactoring that has + been done for PCRE2. An example pattern that provoked this bug is: + /((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/ and the bug was + registered as CVE-2016-1283. + +15. pcretest went into a loop if global matching was requested with an ovector + size less than 2. It now gives an error message. This bug was found by + afl-fuzz. + +16. An invalid pattern fragment such as (?(?C)0 was not diagnosing an error + ("assertion expected") when (?(?C) was not followed by an opening + parenthesis. + +17. Fixed typo ("&&" for "&") in pcre_study(). Fortunately, this could not + actually affect anything, by sheer luck. + +18. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC + static compilation. + +19. Modified the RunTest script to incorporate a valgrind suppressions file so + that certain errors, provoked by the SSE2 instruction set when JIT is used, + are ignored. + +20. A racing condition is fixed in JIT reported by Mozilla. + +21. Minor code refactor to avoid "array subscript is below array bounds" + compiler warning. + +22. Minor code refactor to avoid "left shift of negative number" warning. + +23. Fix typo causing compile error when 16- or 32-bit JIT is compiled without + UCP support. + +24. Refactor to avoid compiler warnings in pcrecpp.cc. + +25. Refactor to fix a typo in pcre_jit_test.c + +26. Patch to support compiling pcrecpp.cc with Intel compiler. + + +Version 8.38 23-November-2015 +----------------------------- + +1. If a group that contained a recursive back reference also contained a + forward reference subroutine call followed by a non-forward-reference + subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to + compile correct code, leading to undefined behaviour or an internally + detected error. This bug was discovered by the LLVM fuzzer. + +2. Quantification of certain items (e.g. atomic back references) could cause + incorrect code to be compiled when recursive forward references were + involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. + This bug was discovered by the LLVM fuzzer. + +3. A repeated conditional group whose condition was a reference by name caused + a buffer overflow if there was more than one group with the given name. + This bug was discovered by the LLVM fuzzer. + +4. A recursive back reference by name within a group that had the same name as + another group caused a buffer overflow. For example: + /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer. + +5. A forward reference by name to a group whose number is the same as the + current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused + a buffer overflow at compile time. This bug was discovered by the LLVM + fuzzer. + +6. A lookbehind assertion within a set of mutually recursive subpatterns could + provoke a buffer overflow. This bug was discovered by the LLVM fuzzer. + +7. Another buffer overflow bug involved duplicate named groups with a + reference between their definition, with a group that reset capture + numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been + fixed by always allowing for more memory, even if not needed. (A proper fix + is implemented in PCRE2, but it involves more refactoring.) + +8. There was no check for integer overflow in subroutine calls such as (?123). + +9. The table entry for \l in EBCDIC environments was incorrect, leading to its + being treated as a literal 'l' instead of causing an error. + +10. There was a buffer overflow if pcre_exec() was called with an ovector of + size 1. This bug was found by american fuzzy lop. + +11. If a non-capturing group containing a conditional group that could match + an empty string was repeated, it was not identified as matching an empty + string itself. For example: /^(?:(?(1)x|)+)+$()/. + +12. In an EBCDIC environment, pcretest was mishandling the escape sequences + \a and \e in test subject lines. + +13. In an EBCDIC environment, \a in a pattern was converted to the ASCII + instead of the EBCDIC value. + +14. The handling of \c in an EBCDIC environment has been revised so that it is + now compatible with the specification in Perl's perlebcdic page. + +15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in + ASCII/Unicode. This has now been added to the list of characters that are + recognized as white space in EBCDIC. + +16. When PCRE was compiled without UCP support, the use of \p and \P gave an + error (correctly) when used outside a class, but did not give an error + within a class. + +17. \h within a class was incorrectly compiled in EBCDIC environments. + +18. A pattern with an unmatched closing parenthesis that contained a backward + assertion which itself contained a forward reference caused buffer + overflow. And example pattern is: /(?=di(?<=(?1))|(?=(.))))/. + +19. JIT should return with error when the compiled pattern requires more stack + space than the maximum. + +20. A possessively repeated conditional group that could match an empty string, + for example, /(?(R))*+/, was incorrectly compiled. + +21. Fix infinite recursion in the JIT compiler when certain patterns such as + /(?:|a|){100}x/ are analysed. + +22. Some patterns with character classes involving [: and \\ were incorrectly + compiled and could cause reading from uninitialized memory or an incorrect + error diagnosis. + +23. Pathological patterns containing many nested occurrences of [: caused + pcre_compile() to run for a very long time. + +24. A conditional group with only one branch has an implicit empty alternative + branch and must therefore be treated as potentially matching an empty + string. + +25. If (?R was followed by - or + incorrect behaviour happened instead of a + diagnostic. + +26. Arrange to give up on finding the minimum matching length for overly + complex patterns. + +27. Similar to (4) above: in a pattern with duplicated named groups and an + occurrence of (?| it is possible for an apparently non-recursive back + reference to become recursive if a later named group with the relevant + number is encountered. This could lead to a buffer overflow. Wen Guanxing + from Venustech ADLAB discovered this bug. + +28. If pcregrep was given the -q option with -c or -l, or when handling a + binary file, it incorrectly wrote output to stdout. + +29. The JIT compiler did not restore the control verb head in case of *THEN + control verbs. This issue was found by Karl Skomski with a custom LLVM + fuzzer. + +30. Error messages for syntax errors following \g and \k were giving inaccurate + offsets in the pattern. + +31. Added a check for integer overflow in conditions (?() and + (?(R). This omission was discovered by Karl Skomski with the LLVM + fuzzer. + +32. Handling recursive references such as (?2) when the reference is to a group + later in the pattern uses code that is very hacked about and error-prone. + It has been re-written for PCRE2. Here in PCRE1, a check has been added to + give an internal error if it is obvious that compiling has gone wrong. + +33. The JIT compiler should not check repeats after a {0,1} repeat byte code. + This issue was found by Karl Skomski with a custom LLVM fuzzer. + +34. The JIT compiler should restore the control chain for empty possessive + repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer. + +35. Match limit check added to JIT recursion. This issue was found by Karl + Skomski with a custom LLVM fuzzer. + +36. Yet another case similar to 27 above has been circumvented by an + unconditional allocation of extra memory. This issue is fixed "properly" in + PCRE2 by refactoring the way references are handled. Wen Guanxing + from Venustech ADLAB discovered this bug. + +37. Fix two assertion fails in JIT. These issues were found by Karl Skomski + with a custom LLVM fuzzer. + +38. Fixed a corner case of range optimization in JIT. + +39. An incorrect error "overran compiling workspace" was given if there were + exactly enough group forward references such that the last one extended + into the workspace safety margin. The next one would have expanded the + workspace. The test for overflow was not including the safety margin. + +40. A match limit issue is fixed in JIT which was found by Karl Skomski + with a custom LLVM fuzzer. + +41. Remove the use of /dev/null in testdata/testinput2, because it doesn't + work under Windows. (Why has it taken so long for anyone to notice?) + +42. In a character class such as [\W\p{Any}] where both a negative-type escape + ("not a word character") and a property escape were present, the property + escape was being ignored. + +43. Fix crash caused by very long (*MARK) or (*THEN) names. + +44. A sequence such as [[:punct:]b] that is, a POSIX character class followed + by a single ASCII character in a class item, was incorrectly compiled in + UCP mode. The POSIX class got lost, but only if the single character + followed it. + +45. [:punct:] in UCP mode was matching some characters in the range 128-255 + that should not have been matched. + +46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated + class, all characters with code points greater than 255 are in the class. + When a Unicode property was also in the class (if PCRE_UCP is set, escapes + such as \w are turned into Unicode properties), wide characters were not + correctly handled, and could fail to match. + + +Version 8.37 28-April-2015 +-------------------------- + +1. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges + for those parentheses to be closed with whatever has been captured so far. + However, it was failing to mark any other groups between the hightest + capture so far and the currrent group as "unset". Thus, the ovector for + those groups contained whatever was previously there. An example is the + pattern /(x)|((*ACCEPT))/ when matched against "abcd". + +2. If an assertion condition was quantified with a minimum of zero (an odd + thing to do, but it happened), SIGSEGV or other misbehaviour could occur. + +3. If a pattern in pcretest input had the P (POSIX) modifier followed by an + unrecognized modifier, a crash could occur. + +4. An attempt to do global matching in pcretest with a zero-length ovector + caused a crash. + +5. Fixed a memory leak during matching that could occur for a subpattern + subroutine call (recursive or otherwise) if the number of captured groups + that had to be saved was greater than ten. + +6. Catch a bad opcode during auto-possessification after compiling a bad UTF + string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad + UTF with NO_UTF_CHECK is documented as having an undefined outcome. + +7. A UTF pattern containing a "not" match of a non-ASCII character and a + subroutine reference could loop at compile time. Example: /[^\xff]((?1))/. + +8. When a pattern is compiled, it remembers the highest back reference so that + when matching, if the ovector is too small, extra memory can be obtained to + use instead. A conditional subpattern whose condition is a check on a + capture having happened, such as, for example in the pattern + /^(?:(a)|b)(?(1)A|B)/, is another kind of back reference, but it was not + setting the highest backreference number. This mattered only if pcre_exec() + was called with an ovector that was too small to hold the capture, and there + was no other kind of back reference (a situation which is probably quite + rare). The effect of the bug was that the condition was always treated as + FALSE when the capture could not be consulted, leading to a incorrect + behaviour by pcre_exec(). This bug has been fixed. + +9. A reference to a duplicated named group (either a back reference or a test + for being set in a conditional) that occurred in a part of the pattern where + PCRE_DUPNAMES was not set caused the amount of memory needed for the pattern + to be incorrectly calculated, leading to overwriting. + +10. A mutually recursive set of back references such as (\2)(\1) caused a + segfault at study time (while trying to find the minimum matching length). + The infinite loop is now broken (with the minimum length unset, that is, + zero). + +11. If an assertion that was used as a condition was quantified with a minimum + of zero, matching went wrong. In particular, if the whole group had + unlimited repetition and could match an empty string, a segfault was + likely. The pattern (?(?=0)?)+ is an example that caused this. Perl allows + assertions to be quantified, but not if they are being used as conditions, + so the above pattern is faulted by Perl. PCRE has now been changed so that + it also rejects such patterns. + +12. A possessive capturing group such as (a)*+ with a minimum repeat of zero + failed to allow the zero-repeat case if pcre2_exec() was called with an + ovector too small to capture the group. + +13. Fixed two bugs in pcretest that were discovered by fuzzing and reported by + Red Hat Product Security: + + (a) A crash if /K and /F were both set with the option to save the compiled + pattern. + + (b) Another crash if the option to print captured substrings in a callout + was combined with setting a null ovector, for example \O\C+ as a subject + string. + +14. A pattern such as "((?2){0,1999}())?", which has a group containing a + forward reference repeated a large (but limited) number of times within a + repeated outer group that has a zero minimum quantifier, caused incorrect + code to be compiled, leading to the error "internal error: + previously-checked referenced subpattern not found" when an incorrect + memory address was read. This bug was reported as "heap overflow", + discovered by Kai Lu of Fortinet's FortiGuard Labs and given the CVE number + CVE-2015-2325. + +23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine + call within a group that also contained a recursive back reference caused + incorrect code to be compiled. This bug was reported as "heap overflow", + discovered by Kai Lu of Fortinet's FortiGuard Labs, and given the CVE + number CVE-2015-2326. + +24. Computing the size of the JIT read-only data in advance has been a source + of various issues, and new ones are still appear unfortunately. To fix + existing and future issues, size computation is eliminated from the code, + and replaced by on-demand memory allocation. + +25. A pattern such as /(?i)[A-`]/, where characters in the other case are + adjacent to the end of the range, and the range contained characters with + more than one other case, caused incorrect behaviour when compiled in UTF + mode. In that example, the range a-j was left out of the class. + +26. Fix JIT compilation of conditional blocks, which assertion + is converted to (*FAIL). E.g: /(?(?!))/. + +27. The pattern /(?(?!)^)/ caused references to random memory. This bug was + discovered by the LLVM fuzzer. + +28. The assertion (?!) is optimized to (*FAIL). This was not handled correctly + when this assertion was used as a condition, for example (?(?!)a|b). In + pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect + error about an unsupported item. + +29. For some types of pattern, for example /Z*(|d*){216}/, the auto- + possessification code could take exponential time to complete. A recursion + depth limit of 1000 has been imposed to limit the resources used by this + optimization. + +30. A pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class + such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored + because \S ensures they are all in the class. The code for doing this was + interacting badly with the code for computing the amount of space needed to + compile the pattern, leading to a buffer overflow. This bug was discovered + by the LLVM fuzzer. + +31. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside + other kinds of group caused stack overflow at compile time. This bug was + discovered by the LLVM fuzzer. + +32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment + between a subroutine call and its quantifier was incorrectly compiled, + leading to buffer overflow or other errors. This bug was discovered by the + LLVM fuzzer. + +33. The illegal pattern /(?(?.*!.*)?)/ was not being diagnosed as missing an + assertion after (?(. The code was failing to check the character after + (?(?< for the ! or = that would indicate a lookbehind assertion. This bug + was discovered by the LLVM fuzzer. + +34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with + a fixed maximum following a group that contains a subroutine reference was + incorrectly compiled and could trigger buffer overflow. This bug was + discovered by the LLVM fuzzer. + +35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1))) + caused a stack overflow instead of the diagnosis of a non-fixed length + lookbehind assertion. This bug was discovered by the LLVM fuzzer. + +36. The use of \K in a positive lookbehind assertion in a non-anchored pattern + (e.g. /(?<=\Ka)/) could make pcregrep loop. + +37. There was a similar problem to 36 in pcretest for global matches. + +38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*), + and a subsequent item in the pattern caused a non-match, backtracking over + the repeated \X did not stop, but carried on past the start of the subject, + causing reference to random memory and/or a segfault. There were also some + other cases where backtracking after \C could crash. This set of bugs was + discovered by the LLVM fuzzer. + +39. The function for finding the minimum length of a matching string could take + a very long time if mutual recursion was present many times in a pattern, + for example, /((?2){73}(?2))((?1))/. A better mutual recursion detection + method has been implemented. This infelicity was discovered by the LLVM + fuzzer. + +40. Static linking against the PCRE library using the pkg-config module was + failing on missing pthread symbols. + + +Version 8.36 26-September-2014 +------------------------------ + +1. Got rid of some compiler warnings in the C++ modules that were shown up by + -Wmissing-field-initializers and -Wunused-parameter. + +2. The tests for quantifiers being too big (greater than 65535) were being + applied after reading the number, and stupidly assuming that integer + overflow would give a negative number. The tests are now applied as the + numbers are read. + +3. Tidy code in pcre_exec.c where two branches that used to be different are + now the same. + +4. The JIT compiler did not generate match limit checks for certain + bracketed expressions with quantifiers. This may lead to exponential + backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This + issue should be resolved now. + +5. Fixed an issue, which occures when nested alternatives are optimized + with table jumps. + +6. Inserted two casts and changed some ints to size_t in the light of some + reported 64-bit compiler warnings (Bugzilla 1477). + +7. Fixed a bug concerned with zero-minimum possessive groups that could match + an empty string, which sometimes were behaving incorrectly in the + interpreter (though correctly in the JIT matcher). This pcretest input is + an example: + + '\A(?:[^"]++|"(?:[^"]*+|"")*+")++' + NON QUOTED "QUOT""ED" AFTER "NOT MATCHED + + the interpreter was reporting a match of 'NON QUOTED ' only, whereas the + JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test + for an empty string was breaking the inner loop and carrying on at a lower + level, when possessive repeated groups should always return to a higher + level as they have no backtrack points in them. The empty string test now + occurs at the outer level. + +8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern + ^\w+(?>\s*)(?<=\w) which caused it not to match "test test". + +9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl + doesn't). + +10. Change 8.34/15 introduced a bug that caused the amount of memory needed + to hold a pattern to be incorrectly computed (too small) when there were + named back references to duplicated names. This could cause "internal + error: code overflow" or "double free or corruption" or other memory + handling errors. + +11. When named subpatterns had the same prefixes, back references could be + confused. For example, in this pattern: + + /(?Pa)?(?Pb)?(?()c|d)*l/ + + the reference to 'Name' was incorrectly treated as a reference to a + duplicate name. + +12. A pattern such as /^s?c/mi8 where the optional character has more than + one "other case" was incorrectly compiled such that it would only try to + match starting at "c". + +13. When a pattern starting with \s was studied, VT was not included in the + list of possible starting characters; this should have been part of the + 8.34/18 patch. + +14. If a character class started [\Qx]... where x is any character, the class + was incorrectly terminated at the ]. + +15. If a pattern that started with a caseless match for a character with more + than one "other case" was studied, PCRE did not set up the starting code + unit bit map for the list of possible characters. Now it does. This is an + optimization improvement, not a bug fix. + +16. The Unicode data tables have been updated to Unicode 7.0.0. + +17. Fixed a number of memory leaks in pcregrep. + +18. Avoid a compiler warning (from some compilers) for a function call with + a cast that removes "const" from an lvalue by using an intermediate + variable (to which the compiler does not object). + +19. Incorrect code was compiled if a group that contained an internal recursive + back reference was optional (had quantifier with a minimum of zero). This + example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples + caused segmentation faults because of stack overflows at compile time. + +20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a + group that is quantified with an indefinite repeat, caused a compile-time + loop which used up all the system stack and provoked a segmentation fault. + This was not the same bug as 19 above. + +21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h. + Patch by Mike Frysinger. + + Version 8.35 04-April-2014 -------------------------- @@ -27,9 +560,9 @@ Version 8.35 04-April-2014 6. Improve character range checks in JIT. Characters are read by an inprecise function now, which returns with an unknown value if the character code is - above a certain treshold (e.g: 256). The only limitation is that the value - must be bigger than the treshold as well. This function is useful, when - the characters above the treshold are handled in the same way. + above a certain threshold (e.g: 256). The only limitation is that the value + must be bigger than the threshold as well. This function is useful when + the characters above the threshold are handled in the same way. 7. The macros whose names start with RAWUCHAR are placeholders for a future mode in which only the bottom 21 bits of 32-bit data items are used. To