PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
\fBpcrepattern\fP
documentation. This document contains a quick-reference summary of the syntax.
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE   treat enclosed characters as literal
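A backslash before any non-alphanumeric character makes that character
literal. This means arbitrary text can be made safe to embed in a pattern by
escaping every such character. The C sketch below does this for ASCII input.
The name quote_literal is hypothetical and is not part of the PCRE API. Bytes
at or above hex 80 are copied unchanged, on the assumption that they are never
metacharacters. Enclosing the text in \eQ...\eE is an alternative, as long as
the text itself does not contain \eE.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): return a newly allocated copy
   of `in` with a backslash before every ASCII character that is not a
   letter or digit, so that each such character is matched literally.
   Bytes >= 0x80 are copied unchanged. Returns NULL if malloc fails; the
   caller frees the result. Assumes the "C" locale for isalnum(). */
static char *quote_literal(const char *in)
{
    assert(in != NULL);
    char *out = malloc(2 * strlen(in) + 1);   /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    char *p = out;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80 && !isalnum(c))
            *p++ = '\\';                      /* "\x" is a literal x */
        *p++ = (char)c;
    }
    *p = '\0';
    return out;
}
```

Spaces are escaped too, so the result remains literal even in extended
(?x) mode.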
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \er         carriage return (hex 0D)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
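The table gives only the name "control-x". The exact mapping is described in
pcrepattern(3): a lower case letter is first converted to upper case, and then
bit 6 (hex 40) of the character is inverted. Here is a minimal C sketch of that
rule, using a hypothetical function name:

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: the character that \cx denotes, for ASCII x.
   A lower case letter is converted to upper case, then bit 6 (hex 40)
   is inverted. */
static int control_char(int x)
{
    assert(x >= 0 && x < 0x80);    /* \cx requires an ASCII character */
    if (islower(x))
        x = toupper(x);
    return x ^ 0x40;
}
```

Thus \ecz gives hex 1A, but \ec{ gives hex 3B. This is because { is not a
letter, so it is not converted first.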
  .             any character except newline;
                  in dotall mode, any character whatsoever
  \eC           one data unit, even in UTF mode (best avoided)
  \eD           a character that is not a decimal digit
  \eh           a horizontal whitespace character
  \eH           a character that is not a horizontal whitespace character
  \eN           a character that is not a newline
  \ep{\fIxx\fP}  a character with the \fIxx\fP property
  \eP{\fIxx\fP}  a character without the \fIxx\fP property
  \eR           a newline sequence
  \es           a whitespace character
  \eS           a character that is not a whitespace character
  \ev           a vertical whitespace character
  \eV           a character that is not a vertical whitespace character
  \ew           a "word" character
  \eW           a "non-word" character
  \eX           an extended Unicode sequence
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
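To illustrate the default (non-UCP) behaviour, the sketch below tests an
ASCII character against \ed, \es, \ew and their negations. It assumes PCRE's
built-in character tables, which correspond to the "C" locale. It also follows
the Perl-space definition given for Xsp below, so \es does not match VT
(hex 0B). The name matches_type is hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: does the ASCII character c match \<type>, where type
   is one of d, D, s, S, w, W? Sketch of the default (non-UCP) behaviour with
   "C"-locale tables. Upper case types are the negations of lower case. */
static int matches_type(char type, int c)
{
    assert(c >= 0 && c < 0x80);
    int r;
    switch (tolower((unsigned char)type)) {
    case 'd':                              /* decimal digit */
        r = c >= '0' && c <= '9';
        break;
    case 's':                              /* HT, LF, FF, CR, space; not VT */
        r = c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
        break;
    case 'w':                              /* letter, digit or underscore */
        r = isalnum(c) || c == '_';
        break;
    default:
        assert(!"unsupported character type");
        return 0;
    }
    return isupper((unsigned char)type) ? !r : r;
}
```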
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
  Pc          Connector punctuation
  Pi          Initial punctuation
  Sm          Mathematical symbol
  Zp          Paragraph separator
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
  Xan         Alphanumeric: union of properties L and N
  Xps         POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp         Perl space: property Z or tab, NL, FF, CR
  Xwd         Perl word: property Xan or underscore
.SH "SCRIPT NAMES FOR \ep AND \eP"
Egyptian_Hieroglyphs,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
.SH "CHARACTER CLASSES"
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (x and y may be written as hex escapes)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
  cntrl       control character
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding space and alphanumeric
  upper       upper case letter
  xdigit      hexadecimal digit
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
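In the "C" locale, the <ctype.h> classification functions have the same
meanings as the POSIX names listed above. An ASCII-only sketch can therefore
map each name onto one of them. The sketch ignores PCRE_UCP and
locale-specific character tables. The name posix_set_matches is
hypothetical, and the function returns -1 for any name not shown in the list
above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if the ASCII character c is in the POSIX named
   set, 0 if not, -1 if the name is not one this sketch knows. Assumes the
   "C" locale, where the <ctype.h> predicates match the POSIX definitions. */
static int posix_set_matches(const char *name, int c)
{
    static const struct {
        const char *name;
        int (*pred)(int);
    } sets[] = {
        {"cntrl", iscntrl}, {"graph", isgraph}, {"lower", islower},
        {"print", isprint}, {"punct", ispunct}, {"upper", isupper},
        {"xdigit", isxdigit},
    };
    assert(name != NULL && c >= 0 && c < 0x80);
    for (size_t i = 0; i < sizeof sets / sizeof sets[0]; i++)
        if (strcmp(sets[i].name, name) == 0)
            return sets[i].pred(c) != 0;
    return -1;
}
```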
  ?+          0 or 1, possessive
  *+          0 or more, possessive
  ++          1 or more, possessive
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
.SH "ANCHORS AND SIMPLE ASSERTIONS"
  \eB          not a word boundary
  ^           start of subject
                also after internal newline in multiline mode
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \eZ          end of subject
                also before newline at end of subject
  \eG          first matching position in subject
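The multiline rules for ^ and $ can be sketched by listing the offsets at
which each assertion succeeds. This C sketch assumes that LF is the only
newline character and that no other options are set. The function names are
hypothetical. In multiline mode, ^ does not match after a newline that ends
the subject, so the first loop stops one character early.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers: write into pos[] every offset in s where ^ (or $)
   can match and return how many there are. pos[] needs room for
   strlen(s) + 1 entries. Assumes LF is the only newline. */
static size_t circumflex_positions(const char *s, int multiline, size_t *pos)
{
    assert(s != NULL && pos != NULL);
    size_t n = 0, len = strlen(s);
    pos[n++] = 0;                           /* start of subject */
    if (multiline)
        for (size_t i = 0; i + 1 < len; i++)
            if (s[i] == '\n')
                pos[n++] = i + 1;           /* after an internal newline */
    return n;
}

static size_t dollar_positions(const char *s, int multiline, size_t *pos)
{
    assert(s != NULL && pos != NULL);
    size_t n = 0, len = strlen(s);
    if (multiline) {
        for (size_t i = 0; i < len; i++)
            if (s[i] == '\n')
                pos[n++] = i;               /* before any newline */
    } else if (len > 0 && s[len - 1] == '\n') {
        pos[n++] = len - 1;                 /* before a newline at the end */
    }
    pos[n++] = len;                         /* end of subject */
    return n;
}
```

For the subject "ab\encd\en", the matching offsets are:

  ^ (default)     0
  ^ (multiline)   0 and 3
  $ (default)     5 and 6
  $ (multiline)   2, 5 and 6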
.SH "MATCH POINT RESET"
  \eK          reset start of match
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
  (?>...)         atomic, non-capturing group
  (?#....)        comment (not nestable)
  (?J)            allow duplicate names
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc.)
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
Each top-level branch of a look behind must be of a fixed length.
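The fixed-length rule exists because a look behind is tested by stepping back
from the current position by a known distance and trying to match there.
Different top-level branches may have different lengths, because each branch
can be tried with its own distance. In this sketch each branch is assumed to
be a plain literal string, which keeps the code short. The name
lookbehind_matches is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does (?<=b1|b2|...) succeed at offset pos, where
   each branch is a literal string of fixed length? */
static int lookbehind_matches(const char *subject, size_t pos,
                              const char *const *branches, size_t nbranches)
{
    assert(subject != NULL && pos <= strlen(subject));
    for (size_t i = 0; i < nbranches; i++) {
        size_t len = strlen(branches[i]);   /* known length of this branch */
        if (len <= pos && memcmp(subject + pos - len, branches[i], len) == 0)
            return 1;                       /* step back len and compare */
    }
    return 0;
}
```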
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)        reference by name (Python)
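Both the ambiguity of \en and the meaning of a relative reference come down to
simple rules about group numbers. The sketch below follows the rule that
pcrepattern(3) gives for a backslash followed by digits outside a character
class:

  \(bu The digits are read as a decimal number.
  \(bu The escape is a back reference if that number is less than 10, or if at
       least that many capturing groups have been opened earlier.
  \(bu \e0 always starts an octal escape.

The sketch does not model how the digits 8 and 9 are handled when the sequence
is not a back reference. For \eg{-n}, -1 means the most recently opened group
before the reference. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: is "\<digits>" (outside a character class) read as a
   back reference, given how many capturing groups were opened before it? */
static int digits_are_backreference(const char *digits, int groups_before)
{
    assert(digits != NULL && digits[0] >= '0' && digits[0] <= '9');
    if (digits[0] == '0')
        return 0;                           /* \0 starts an octal escape */
    long n = strtol(digits, NULL, 10);      /* reads all following digits */
    return n < 10 || n <= groups_before;
}

/* Hypothetical helper: absolute number for a relative reference such as
   \g{-n}; -1 means the most recently opened group before the reference. */
static int relative_to_absolute(int n, int groups_before)
{
    assert(n < 0 && -n <= groups_before);
    return groups_before + n + 1;
}
```

For example, in (abc(def)ghi)\eg{-1} two groups have been opened before the
reference, so \eg{-1} means group 2.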
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
  (?R)             recurse whole pattern
  (?n)             call subpattern by absolute number
  (?+n)            call subpattern by relative number
  (?-n)            call subpattern by relative number
  (?&name)         call subpattern by name (Perl)
  (?P>name)        call subpattern by name (Python)
  \eg<name>         call subpattern by name (Oniguruma)
  \eg'name'         call subpattern by name (Oniguruma)
  \eg<n>            call subpattern by absolute number (Oniguruma)
  \eg'n'            call subpattern by absolute number (Oniguruma)
  \eg<+n>           call subpattern by relative number (PCRE extension)
  \eg'+n'           call subpattern by relative number (PCRE extension)
  \eg<-n>           call subpattern by relative number (PCRE extension)
  \eg'-n'           call subpattern by relative number (PCRE extension)
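Recursion lets a pattern match nested constructs of unlimited depth, which a
pattern without recursion cannot do. A common example is
\e((?:[^()]++|(?R))*\e), which matches a parenthesized string with correctly
nested parentheses. The C function below is a hand-written recursive-descent
equivalent of that pattern. It assumes the match must cover the whole subject.
Each nested ( is handled by calling the function again, much as (?R)
re-enters the whole pattern. The code shows what the pattern accepts, not how
PCRE executes it, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: match one parenthesized group at s, allowing nested
   groups. Returns a pointer just past its closing ")" or NULL on failure. */
static const char *match_group(const char *s)
{
    assert(s != NULL);
    if (*s != '(')
        return NULL;
    s++;
    for (;;) {
        if (*s == ')')
            return s + 1;                   /* end of this group */
        if (*s == '\0')
            return NULL;                    /* subject ended too soon */
        if (*s == '(') {
            s = match_group(s);             /* nested group: like (?R) */
            if (s == NULL)
                return NULL;
        } else {
            s++;                            /* any other character, [^()] */
        }
    }
}

/* True if the whole of s is one correctly nested parenthesized group. */
static int is_balanced_group(const char *s)
{
    const char *end = match_group(s);
    return end != NULL && *end == '\0';
}
```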
.SH "CONDITIONAL PATTERNS"
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
  (?(n)...         absolute reference condition
  (?(+n)...        relative reference condition
  (?(-n)...        relative reference condition
  (?(<name>)...    named reference condition (Perl)
  (?('name')...    named reference condition (Perl)
  (?(name)...      named reference condition (PCRE)
  (?(R)...         overall recursion condition
  (?(Rn)...        specific group recursion condition
  (?(R&name)...    specific recursion condition
  (?(DEFINE)...    define subpattern for reference
  (?(assert)...    assertion condition
.SH "BACKTRACKING CONTROL"
The following act as soon as they are reached:
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
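The verbs in the second list differ in where the next match attempt starts.
The sketch below models only that step, not the way each verb cuts
backtracking short within an attempt. It takes two offsets: where the failed
attempt started, and where backtracking reached the verb. It returns the next
starting offset, or -1 if matching stops. Offsets count characters. The enum
and function names are hypothetical, and (*THEN) and the named forms are left
out. A (*SKIP) reached at the starting offset is assumed to behave like
(*PRUNE).

```c
#include <assert.h>

/* Hypothetical names for "no verb" and three of the verbs listed above. */
enum verb { NO_VERB, VERB_COMMIT, VERB_PRUNE, VERB_SKIP };

/* Where the next match attempt starts after the attempt that began at
   `start` failed because backtracking reached verb v at offset `current`.
   Returns -1 if there is no further attempt. */
static long next_start(enum verb v, long start, long current, int anchored)
{
    assert(current >= start);
    if (anchored || v == VERB_COMMIT)
        return -1;                          /* no advance of starting point */
    if (v == VERB_SKIP && current > start)
        return current;                     /* where (*SKIP) was passed */
    return start + 1;                       /* next starting character; an
                                               unmoved (*SKIP) is assumed to
                                               act like (*PRUNE) */
}
```

Without a verb, a failed unanchored attempt also moves on by one character.
So (*PRUNE) differs from ordinary failure in abandoning the attempt early,
not in where the next attempt starts.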
.SH "NEWLINE CONVENTIONS"
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
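A newline convention decides which byte sequences count as line endings, for
example for ^ and $ in multiline mode. The sketch below recognizes a newline
at a given offset under each convention and counts the newlines in a buffer.
It treats CRLF as a single sequence where the convention allows it. It assumes
8-bit data in non-UTF mode, so (*ANY) is limited to CR, LF, CRLF, VT, FF, and
NEL (hex 85). The multi-byte LS and PS characters are not handled. All names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum newline { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };  /* hypothetical */

/* Length of the newline sequence starting at s[i] under the convention
   conv, or 0 if there is none there. 8-bit, non-UTF data assumed. */
static size_t newline_len(const unsigned char *s, size_t len, size_t i,
                          enum newline conv)
{
    assert(s != NULL && i < len);
    int crlf = s[i] == '\r' && i + 1 < len && s[i + 1] == '\n';
    int cr_or_lf = s[i] == '\r' || s[i] == '\n';
    switch (conv) {
    case NL_CR:
        return s[i] == '\r';
    case NL_LF:
        return s[i] == '\n';
    case NL_CRLF:
        return crlf ? 2 : 0;
    case NL_ANYCRLF:
        return crlf ? 2 : (size_t)cr_or_lf;
    case NL_ANY:                            /* also VT, FF and NEL (hex 85) */
        if (crlf)
            return 2;
        return cr_or_lf || s[i] == 0x0b || s[i] == 0x0c || s[i] == 0x85;
    }
    return 0;
}

/* Number of newline sequences in s; a CRLF counts once when the
   convention treats it as a single sequence. */
static size_t count_newlines(const unsigned char *s, size_t len,
                             enum newline conv)
{
    size_t count = 0;
    for (size_t i = 0; i < len;) {
        size_t n = newline_len(s, len, i, conv);
        if (n > 0) {
            count++;
            i += n;
        } else {
            i++;
        }
    }
    return count;
}
```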
.SH "WHAT \eR MATCHES"
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
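\eR behaves like an atomic group, so it always takes CRLF as one unit and
never gives back the LF. This sketch returns the length of the \eR match at a
given offset for both settings. It again assumes 8-bit non-UTF data, so
(*BSR_UNICODE) adds only VT, FF, and NEL (hex 85) to CR, LF, and CRLF. The
name bsr_len is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the \R match at s[i], or 0 if none.
   bsr_unicode == 0 models (*BSR_ANYCRLF); nonzero models (*BSR_UNICODE)
   for 8-bit, non-UTF data. */
static size_t bsr_len(const unsigned char *s, size_t len, size_t i,
                      int bsr_unicode)
{
    assert(s != NULL && i < len);
    if (s[i] == '\r')                       /* CRLF is taken as one unit */
        return (i + 1 < len && s[i + 1] == '\n') ? 2 : 1;
    if (s[i] == '\n')
        return 1;
    if (bsr_unicode && (s[i] == 0x0b || s[i] == 0x0c || s[i] == 0x85))
        return 1;                           /* VT, FF, NEL */
    return 0;
}
```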
  (?Cn)           callout with data n
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
University Computing Service
Cambridge CB2 3QH, England.
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.