1 .TH PCRESYNTAX 3 "08 January 2014" "PCRE 8.35"
3 PCRE - Perl-compatible regular expressions
4 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
7 The full syntax and semantics of the regular expressions that are supported by
8 PCRE are described in the
12 documentation. This document contains a quick-reference summary of the syntax.
18 \ex where x is non-alphanumeric is a literal x
19 \eQ...\eE treat enclosed characters as literal
25 \ea alarm, that is, the BEL character (hex 07)
26 \ecx "control-x", where x is any ASCII character
28 \ef form feed (hex 0C)
30 \er carriage return (hex 0D)
32 \e0dd character with octal code 0dd
33 \eddd character with octal code ddd, or backreference
34 \eo{ddd..} character with octal code ddd..
35 \exhh character with hex code hh
36 \ex{hhh..} character with hex code hhh..
38 Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal
39 characters "8" and "9".
45 . any character except newline;
46 in dotall mode, any character whatsoever
47 \eC one data unit, even in UTF mode (best avoided)
49 \eD a character that is not a decimal digit
50 \eh a horizontal white space character
51 \eH a character that is not a horizontal white space character
52 \eN a character that is not a newline
53 \ep{\fIxx\fP} a character with the \fIxx\fP property
54 \eP{\fIxx\fP} a character without the \fIxx\fP property
55 \eR a newline sequence
56 \es a white space character
57 \eS a character that is not a white space character
58 \ev a vertical white space character
59 \eV a character that is not a vertical white space character
60 \ew a "word" character
61 \eW a "non-word" character
62 \eX a Unicode extended grapheme cluster
64 By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
65 or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
66 happening, \es and \ew may also match characters with code points in the range
67 128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
68 is changed to use Unicode properties and they match many more characters.
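The difference between ASCII-only and Unicode-aware matching can be sketched as
follows. This is an illustration only: it uses Python's re module rather than
PCRE, on the assumption that a self-contained example is more useful than one
that needs the PCRE library. Python's default for text patterns is the reverse
of PCRE's, so the re.ASCII flag stands in for PCRE's default behaviour and the
absence of that flag stands in for PCRE_UCP.

```python
import re

# U+0663 (ARABIC-INDIC DIGIT THREE) between two ASCII letters.
text = "x\u0663y"

# ASCII-only matching: comparable to PCRE's default behaviour for \d.
ascii_digits = re.findall(r"\d", text, re.ASCII)

# Unicode-aware matching: comparable to PCRE with PCRE_UCP set.
unicode_digits = re.findall(r"\d", text)

print(ascii_digits)
print(unicode_digits)
```

The first list is empty and the second contains the single non-ASCII digit.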
71 .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
100 Pc Connector punctuation
104 Pi Initial punctuation
111 Sm Mathematical symbol
116 Zp Paragraph separator
120 .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
123 Xan Alphanumeric: union of properties L and N
124 Xps POSIX space: property Z or tab, NL, VT, FF, CR
125 Xsp Perl space: property Z or tab, NL, VT, FF, CR
126 Xuc Universally-named character: one that can be
127 represented by a Universal Character Name
128 Xwd Perl word: property Xan or underscore
130 Perl and POSIX space are now the same. Perl added VT to its space character set
131 at release 5.18 and PCRE changed at release 8.34.
134 .SH "SCRIPT NAMES FOR \ep AND \eP"
164 Egyptian_Hieroglyphs,
181 Inscriptional_Pahlavi,
182 Inscriptional_Parthian,
208 Meroitic_Hieroglyphs,
264 .SH "CHARACTER CLASSES"
267 [...] positive character class
268 [^...] negative character class
269 [x-y] range (can be used for hex characters)
270 [[:xxx:]] positive POSIX named set
271 [[:^xxx:]] negative POSIX named set
277 cntrl control character
279 graph printing, excluding space
280 lower lower case letter
281 print printing, including space
282 punct printing, excluding alphanumeric
284 upper upper case letter
286 xdigit hexadecimal digit
288 In PCRE, POSIX character set names recognize only ASCII characters by default,
289 but some of them use Unicode properties if PCRE_UCP is set. You can use
290 \eQ...\eE inside a character class.
297 ?+ 0 or 1, possessive
300 *+ 0 or more, possessive
303 ++ 1 or more, possessive
306 {n,m} at least n, no more than m, greedy
307 {n,m}+ at least n, no more than m, possessive
308 {n,m}? at least n, no more than m, lazy
309 {n,} n or more, greedy
310 {n,}+ n or more, possessive
311 {n,}? n or more, lazy
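The greedy and lazy forms listed above can be compared with a short sketch. It
uses Python's re module as a stand-in (an assumption, not PCRE itself), because
the greedy and lazy quantifier syntax is the same in both. The possessive forms
are left out, since older Python versions do not accept them.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as possible, then backtracks to the last ">".
greedy = re.match(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as possible, stopping at the first ">".
lazy = re.match(r"<.+?>", subject).group(0)

# Bounded repetition: greedy prefers the upper bound, lazy the lower.
bounded_greedy = re.match(r"a{2,3}", "aaaa").group(0)
bounded_lazy = re.match(r"a{2,3}?", "aaaa").group(0)

print(greedy, lazy, bounded_greedy, bounded_lazy)
```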
314 .SH "ANCHORS AND SIMPLE ASSERTIONS"
318 \eB not a word boundary
320 also after internal newline in multiline mode
323 also before newline at end of subject
324 also before internal newline in multiline mode
326 also before newline at end of subject
328 \eG first matching position in subject
331 .SH "MATCH POINT RESET"
334 \eK reset start of match
336 \eK is honoured in positive assertions, but ignored in negative ones.
348 (...) capturing group
349 (?<name>...) named capturing group (Perl)
350 (?'name'...) named capturing group (Perl)
351 (?P<name>...) named capturing group (Python)
352 (?:...) non-capturing group
353 (?|...) non-capturing group; reset group numbers for
354 capturing groups in each alternative
360 (?>...) atomic, non-capturing group
368 (?#....) comment (not nestable)
375 (?J) allow duplicate names
377 (?s) single line (dotall)
378 (?U) default ungreedy (lazy)
379 (?x) extended (ignore white space)
380 (?-...) unset option(s)
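As a hedged illustration of two of the settings above, the following sketch
places (?s) and (?x) at the start of a pattern. It uses Python's re module as a
stand-in for PCRE (an assumption), and shows only these two settings, which
have the same meaning in both.

```python
import re

# Without (?s), "." does not match a newline.
dot_default = re.search(r"a.b", "a\nb")

# With (?s) (dotall), "." matches any character, including newline.
dot_all = re.search(r"(?s)a.b", "a\nb")

# With (?x) (extended), unescaped white space in the pattern is ignored
# and "#" starts a comment that runs to the end of the line.
extended = re.search(r"(?x) a \s b  # literal a, one space, literal b", "a b")

print(dot_default, dot_all.group(0) == "a\nb", extended.group(0))
```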
382 The following are recognized only at the very start of a pattern or after one
383 of the newline or \eR options with similar syntax. More than one of them may
386 (*LIMIT_MATCH=d) set the match limit to d (decimal number)
387 (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
388 (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
389 (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
390 (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
391 (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
392 (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32)
393 (*UTF) set appropriate UTF mode for the library in use
394 (*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
396 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
397 limits set by the caller of pcre_exec(), not increase them.
400 .SH "NEWLINE CONVENTION"
403 These are recognized only at the very start of the pattern or after option
404 settings with a similar syntax.
406 (*CR) carriage return only
408 (*CRLF) carriage return followed by linefeed
409 (*ANYCRLF) all three of the above
410 (*ANY) any Unicode newline sequence
413 .SH "WHAT \eR MATCHES"
416 These are recognized only at the very start of the pattern or after option
417 settings with a similar syntax.
419 (*BSR_ANYCRLF) CR, LF, or CRLF
420 (*BSR_UNICODE) any Unicode newline sequence
423 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
426 (?=...) positive look ahead
427 (?!...) negative look ahead
428 (?<=...) positive look behind
429 (?<!...) negative look behind
431 Each top-level branch of a look behind must be of a fixed length.
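The four assertion forms can be sketched as follows, using Python's re module
as a stand-in for PCRE (an assumption; the assertion syntax is the same). Note
that Python is stricter than the rule stated above: it requires the whole
lookbehind to be of fixed width, not just each top-level branch. The last part
of the sketch shows a variable-length lookbehind being rejected at compile
time.

```python
import re

# Positive lookahead: a word only if it is followed by a colon.
keys = re.findall(r"\w+(?=:)", "key: value")

# Negative lookahead: a number not followed by another digit or by "px".
not_px = re.findall(r"\d+(?!\d|px)", "10px 20em")

# Positive lookbehind: digits only if preceded by a dollar sign.
price = re.search(r"(?<=\$)\d+", "cost: $42").group(0)

# Negative lookbehind: a number not preceded by a minus sign.
unsigned = re.findall(r"(?<!-)\b\d+", "5 -3 7")

# A lookbehind containing "a+" has no fixed length and is rejected.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(keys, not_px, price, unsigned, variable_width_ok)
```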
437 \en reference by number (can be ambiguous)
438 \egn reference by number
439 \eg{n} reference by number
440 \eg{-n} relative reference by number
441 \ek<name> reference by name (Perl)
442 \ek'name' reference by name (Perl)
443 \eg{name} reference by name (Perl)
444 \ek{name} reference by name (.NET)
445 (?P=name) reference by name (Python)
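Two of the forms above, reference by number and the Python-style reference by
name, can be sketched with Python's re module. This is used as a stand-in for
PCRE (an assumption); the other forms in the list are not shown.

```python
import re

# Reference by number: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Reference by name: the closing quote must be the same as the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group(0)

# A mismatched pair of quotes does not match.
mismatched = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi\" now")

print(doubled, quoted, mismatched)
```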
448 .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
451 (?R) recurse whole pattern
452 (?n) call subpattern by absolute number
453 (?+n) call subpattern by relative number
454 (?-n) call subpattern by relative number
455 (?&name) call subpattern by name (Perl)
456 (?P>name) call subpattern by name (Python)
457 \eg<name> call subpattern by name (Oniguruma)
458 \eg'name' call subpattern by name (Oniguruma)
459 \eg<n> call subpattern by absolute number (Oniguruma)
460 \eg'n' call subpattern by absolute number (Oniguruma)
461 \eg<+n> call subpattern by relative number (PCRE extension)
462 \eg'+n' call subpattern by relative number (PCRE extension)
463 \eg<-n> call subpattern by relative number (PCRE extension)
464 \eg'-n' call subpattern by relative number (PCRE extension)
467 .SH "CONDITIONAL PATTERNS"
470 (?(condition)yes-pattern)
471 (?(condition)yes-pattern|no-pattern)
473 (?(n)... absolute reference condition
474 (?(+n)... relative reference condition
475 (?(-n)... relative reference condition
476 (?(<name>)... named reference condition (Perl)
477 (?('name')... named reference condition (Perl)
478 (?(name)... named reference condition (PCRE)
479 (?(R)... overall recursion condition
480 (?(Rn)... specific group recursion condition
481 (?(R&name)... specific recursion condition
482 (?(DEFINE)... define subpattern for reference
483 (?(assert)... assertion condition
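The absolute reference condition can be sketched as follows, again with
Python's re module as a stand-in for PCRE (an assumption; Python supports only
the numbered and named reference conditions, not the recursion, DEFINE, or
assertion conditions). The hypothetical pattern below requires a closing angle
bracket only if an opening one was matched by group 1.

```python
import re

# (<)?      optionally capture an opening bracket as group 1
# \w+       the name
# (?(1)>)   if group 1 matched, require a closing bracket; else match nothing
pattern = re.compile(r"^(<)?\w+(?(1)>)$")

results = {s: bool(pattern.match(s)) for s in ["<tag>", "tag", "<tag", "tag>"]}

print(results)
```

Only the balanced forms, with both brackets or with neither, are accepted.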
486 .SH "BACKTRACKING CONTROL"
489 The following act as soon as they are reached:
491 (*ACCEPT) force successful match
492 (*FAIL) force backtrack; synonym (*F)
493 (*MARK:NAME) set name to be passed back; synonym (*:NAME)
495 The following act only when a subsequent match failure causes a backtrack to
496 reach them. They all force a match failure, but they differ in what happens
497 afterwards. Those that advance the start-of-match point do so only if the
498 pattern is not anchored.
500 (*COMMIT) overall failure, no advance of starting point
501 (*PRUNE) advance to next starting character
502 (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
503 (*SKIP) advance to current matching position
504 (*SKIP:NAME) advance to position corresponding to an earlier
505 (*MARK:NAME); if not found, the (*SKIP) is ignored
506 (*THEN) local failure, backtrack to next alternation
507 (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
514 (?Cn) callout with data n
520 \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
521 \fBpcrematching\fP(3), \fBpcre\fP(3).
529 University Computing Service
530 Cambridge CB2 3QH, England.
538 Last updated: 08 January 2014
539 Copyright (c) 1997-2014 University of Cambridge.