Talk:Regular expression examples/sandbox

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 120.138.100.72 (talk) at 11:57, 16 October 2009. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Trying to come up with a language-independent version of this article...


| Metacharacter(s) | Description | Example string | Example regex | Match | Notes |
|---|---|---|---|---|---|
| `.` | Normally matches any character except a newline. Within square brackets the dot is literal. | Hello World | `......` | Hello«space» | The six dots match the first six characters, including the space. |
| `( )` | Groups a series of pattern elements into a single element. [1] | Hello World | `(H..).(o..)` | Hello W | Group 1: Hel; Group 2: o W |
| `?` | Matches the preceding pattern element zero or one times. | Hello World | `H(.?)e` | He | Group 1: «empty». At most one character may appear between 'H' and 'e'; in this case there is none. |
| `+` | Matches the preceding pattern element one or more times. | Hello World | `l+` | ll | There are one or more consecutive 'l' characters in "Hello World". |
| `*` | Matches the preceding pattern element zero or more times. | Hello World | `el*o` | ello | There is an 'e' followed by zero or more 'l' characters, followed by 'o' (eo, elo, ello, elllo, ...). |
| `{M,N}` | Denotes the minimum M and the maximum N match count. | Hello World | `l{1,2}` | ll | There is a substring with at least 1 and at most 2 consecutive 'l' characters. |
| `?` | Modifies the preceding `*`, `+`, or `{M,N}` quantifier to match as few times as possible. | Hello World | `l+?` | l | Compare this non-greedy match with the greedy, unmodified `+` above. |
| `[...]` | Denotes a set of possible character matches. | Hello World | `[aeiou]+` | e | Matches the first run of one or more vowels. |
| `[^...]` | Matches every character except the ones inside the brackets. | Hello World | `[^aeiou]+` | H | Matches the first run of one or more non-vowels. |
| `\|` | Separates alternate possibilities. | Hello World | `(Hello\|Hi\|Pogo)` | Hello | At least one of Hello, Hi, or Pogo is contained in the string. |
| `\b` | Matches a word boundary. | Hello World | `ell\b` | matches nothing | No occurrence of 'ell' sits at the end of a word. |
| `\w` | Matches a 'word' character (an alphanumeric character or the underscore "_"; same as `[A-Za-z0-9_]`). | Hello World | `\w` | H | The string contains at least one word character (A-Z, a-z, 0-9, _). |
| `\W` | Matches a non-word character, that is, anything other than an alphanumeric character or "_"; same as `[^A-Za-z0-9_]`. | Hello World | `\W` | «space» | The space between Hello and World is not a word character. |
| `\s` | Matches a whitespace character (space, tab, newline, form feed). | Hello World | `\s.*` | «space»World | A whitespace character followed by any characters (0 or more). |
| `\S` | Matches anything but a whitespace character. | Hello World | `\S.*\S` | Hello World | There are two non-whitespace characters, which may be separated by other characters. |
| `\d` | Matches a digit; same as `[0-9]`. | 99 bottles of beer on the wall | `(\d+)` | Group 1: 99 | Group 1 is the first number in the string. |
| `\D` | Matches a non-digit; same as `[^0-9]`. | 99 bottles of beer on the wall | `\D` | «space» | The first non-digit character is the space after 99. |
| `^` | Matches the beginning of a line or string. | Hello World | `^He` | He | The string starts with the characters 'He'. |
| `$` | Matches the end of a line or string. | Hello World | `rld$` | rld | The string ends with 'rld'. |
| `\A` | Matches the beginning of a string (but not an internal line). | Hello\nWorld | `\AH` | H | The string starts with 'H'. |
| `\z` | Matches the end of a string (but not an internal line). [2] | Hello\nWorld\n | `d\n\z` | d\n | The string ends with 'd\n'. |
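The example rows can be checked mechanically. The following is a minimal sketch using Python's `re` module, chosen here only for illustration since the table itself aims to be language-independent. The helper name `first_match` is hypothetical, and the sketch assumes Python's `\Z` (which matches only at the very end of the string) as a stand-in for `\z`.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

s = "Hello World"
beer = "99 bottles of beer on the wall"

# (string, regex, expected first match) for each row's string and regex.
cases = [
    (s, r"......", "Hello "),            # six characters, including the space
    (s, r"(H..).(o..)", "Hello W"),
    (s, r"H(.?)e", "He"),
    (s, r"l+", "ll"),
    (s, r"el*o", "ello"),
    (s, r"l{1,2}", "ll"),
    (s, r"l+?", "l"),                    # non-greedy: as few as possible
    (s, r"[aeiou]+", "e"),
    (s, r"[^aeiou]+", "H"),
    (s, r"(Hello|Hi|Pogo)", "Hello"),
    (s, r"ell\b", None),                 # 'ell' is never at a word end
    (s, r"\w", "H"),
    (s, r"\W", " "),
    (s, r"\s.*", " World"),              # the match includes the space
    (s, r"\S.*\S", "Hello World"),
    (beer, r"(\d+)", "99"),
    (beer, r"\D", " "),
    (s, r"^He", "He"),
    (s, r"rld$", "rld"),
    ("Hello\nWorld", r"\AH", "H"),
    ("Hello\nWorld\n", r"d\n\Z", "d\n"), # assumption: \Z stands in for \z
]

for text, pattern, expected in cases:
    actual = first_match(pattern, text)
    assert actual == expected, (pattern, actual, expected)
```

Other regex engines may differ in details such as the spelling of the end-of-string anchor or the exact set of characters matched by `\w` and `\s`, so the expected values above should be read as what this particular engine returns.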

Notes

  1. When you match a pattern within parentheses, you can later use $1, $2, ... to refer to the text matched by each group.
  2. See Perl Best Practices, page 240.
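
Note 1 uses Perl-style `$1`, `$2`. As a hedged illustration of the same idea in another language, the sketch below assumes Python's `re` module, where captured groups are read with `m.group(n)` and referenced as `\1`, `\2` in patterns and replacement strings. The variable names are hypothetical.

```python
import re

text = "Hello World"

# Read the captured groups after a successful match.
m = re.search(r"(H..).(o..)", text)
group1, group2 = m.group(1), m.group(2)   # "Hel" and "o W"

# Refer to the groups in a replacement string: emit group 2, a dash, group 1.
swapped = re.sub(r"(H..).(o..)", r"\2-\1", text)

# Refer to a group inside the pattern itself: (l)\1 matches a doubled 'l'.
doubled = re.search(r"(l)\1", text).group(0)

print(group1, "|", group2, "|", swapped, "|", doubled)
```

Only the matched portion ("Hello W") is replaced by `re.sub`; the unmatched remainder of the string is left in place.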