12 June 2007

Regular Expressions

by mo


Regular expressions find patterns in strings, which makes them useful for validation: checking that data is in a particular format. Compilers also use them, typically during lexical analysis, to recognise the tokens of a program.

Character Classes

  • \d: Matches any digit
  • \D: Matches any non-digit
  • \w: Matches any word character (a letter, a digit or an underscore)
  • \W: Matches any non-word character
  • \s: Matches any whitespace
  • \S: Matches any non-whitespace
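
The character classes above can be tried out with a short sketch. Python's re module is used here purely for illustration (an assumption, since the post names no engine beyond its tag), and the sample string is made up; these patterns should behave the same way in .NET's Regex class.

```python
import re

# Hypothetical sample text, chosen to contain every kind of character.
sample = "Room 42, floor_3!"

# \d picks out the digits; \D picks out everything that is not a digit.
digits = re.findall(r"\d", sample)
non_digit_count = len(re.findall(r"\D", sample))

# \w matches word characters: letters, digits and the underscore.
word_chars = "".join(re.findall(r"\w", sample))
# \W matches whatever \w does not: here spaces and punctuation.
non_word = re.findall(r"\W", sample)

# \s matches whitespace; \S matches anything that is not whitespace.
whitespace = re.findall(r"\s", sample)
non_whitespace_count = len(re.findall(r"\S", sample))

print(digits)       # ['4', '2', '3']
print(word_chars)   # Room42floor_3
print(non_word)     # [' ', ',', ' ', '!']
```

Each upper-case class is the complement of its lower-case partner, so for any string the \d and \D counts add up to its length.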

Quantifiers

  • * Matches zero or more occurrences of the preceding pattern.
  • + Matches one or more occurrences of the preceding pattern.
  • ? Matches zero or one occurrence of the preceding pattern.
  • {n} Matches exactly n occurrences of the preceding pattern.
  • {n,} Matches at least n occurrences of the preceding pattern.
  • {n,m} Matches between n and m (inclusive) occurrences of the preceding pattern.
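
A minimal sketch of the quantifiers in use as validation checks, again assuming Python's re module for illustration. The helper name matches is hypothetical; it uses re.fullmatch so that the whole input has to fit the pattern, which is what validation usually needs.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text fits pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# * allows zero or more occurrences; + requires at least one.
print(matches(r"A*", ""))       # True
print(matches(r"A+", ""))       # False
print(matches(r"A+", "AAA"))    # True

# ? makes the preceding item optional.
print(matches(r"colou?r", "color"))    # True
print(matches(r"colou?r", "colour"))   # True

# {n}, {n,} and {n,m} count occurrences of the preceding item.
print(matches(r"\d{4}", "2007"))       # True
print(matches(r"\d{4}", "207"))        # False
print(matches(r"\d{2,}", "7"))         # False
print(matches(r"\d{2,4}", "12345"))    # False
```

Without the whole-string requirement, a search for \d{4} would also succeed on "12345", because four of the five digits are enough to satisfy it.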

Examples:

  • . matches any single character except a newline character.
  • .* matches any run of characters, including an empty one, up to a newline.
  • * applied to a pattern causes it to match zero or more occurrences.
  • + applied to a pattern causes it to match one or more occurrences.
    • E.g. A* and A+ both match A, but only A* also matches the empty string.
  • \d matches any numeric digit
  • [] specifies a set of characters, for cases that the predefined character classes do not cover.
    • E.g. [aeiou] matches any vowel.
    • - can be used inside a set to specify ranges of characters.
    • E.g. [0-35-9] -> 0 to 3, or 5 to 9. Not 4!
  • ^ at the start of a set negates it, so the set matches any character except those listed.
    • E.g. [^4] matches any single character other than 4: every non-digit, and every digit except 4.
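
The dot, set, range and negation examples can be checked with a short sketch. Python's re module is assumed here for illustration, and the input strings are made up.

```python
import re

# . matches any single character except a newline,
# so "c\nt" is skipped while "cat" and "cot" are found.
dot_hits = re.findall(r"c.t", "cat cot c\nt")

# .* takes everything up to, but not including, the first newline.
first_line = re.match(r".*", "first line\nsecond line").group()

# [aeiou] matches any one vowel.
vowels = re.findall(r"[aeiou]", "regular")

# [0-35-9] is two ranges side by side: 0 to 3 and 5 to 9, so 4 is left out.
kept_digits = "".join(re.findall(r"[0-35-9]", "0123456789"))

# [^4] matches any character that is not 4, digit or otherwise.
not_four = re.findall(r"[^4]", "a4b42")

print(dot_hits)      # ['cat', 'cot']
print(first_line)    # first line
print(vowels)        # ['e', 'u', 'a']
print(kept_digits)   # 012356789
print(not_four)      # ['a', 'b', '2']
```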
csharp