Lesson 22b: Regular Expressions: Atoms and Quantifiers

Regular Expression Bits and Pieces

A regular expression is normally delimited by two slashes ("/").
Everything between the slashes is a pattern to match. Patterns can
be made up of the following Atoms:

  1. Ordinary characters: a-z, A-Z, 0-9 and some punctuation. These
    match themselves.

  2. The "." character, which matches any single character except a newline.

  3. A bracket list of characters, such as [AaGgCcTtNn], [A-F0-9], or
    [^A-Z] (the last means anything BUT A-Z).

  4. Certain predefined character sets:
    \d  The digits [0-9]
    \w  A word character [A-Za-z_0-9]
    \s  White space [ \t\n\r]
    \D  A non-digit
    \W  A non-word character

  5. Anchors:
    ^   Matches the beginning of the string
    $   Matches the end of the string
    \b  Matches a word boundary (between a \w and a \W)
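To try the atoms above, here is a minimal sketch in Python. It uses the `re` module as a stand-in for the slash-delimited syntax: the pattern is written as a raw string instead of between slashes, and `re.search` reports the leftmost match. The helper name `first_match` is hypothetical. One caveat: in Python 3, `\d`, `\w` and `\s` also accept non-ASCII digits, letters and spaces unless the `re.ASCII` flag is passed. For plain ASCII text they behave like the bracket lists in item 4.

```python
import re

def first_match(pattern: str, text: str):
    """Return the leftmost substring of text matched by pattern, or None.

    first_match is a hypothetical helper written for this lesson.
    """
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, text) pairs covering each kind of atom.
cases = [
    (r"gat", "agattc"),       # ordinary characters match themselves
    (r"g.t", "got"),          # "." matches any one character...
    (r"g.t", "g\nt"),         # ...except a newline
    (r"[A-F0-9]", "x7"),      # bracket list: one character from the set
    (r"[^A-Z]", "ABCd"),      # negated list: anything BUT A-Z
    (r"\d", "chr7"),          # a digit
    (r"\w", "  _"),           # a word character (the underscore counts)
    (r"\s", "a\tb"),          # a whitespace character (here a tab)
    (r"^gat", "gattaca"),     # anchored to the beginning of the string
    (r"aca$", "gattaca"),     # anchored to the end of the string
    (r"\bcat", "scat"),       # no word boundary between "s" and "c"
]

for pattern, text in cases:
    print(f"{pattern!r:12} searched in {text!r:10} -> {first_match(pattern, text)!r}")
```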

Examples:

  • /g..t/ matches "gaat", "goat", and "gotta get a goat" (twice)
  • /g[gatc][gatc]t/ matches "gaat", "gttt", "gatt", and
    "gotta get an agatt" (once)
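  • /\d\d\d-\d\d\d\d/ matches 376-8380 and 5128-8181; without anchors,
    the match can occur anywhere inside the text.
  • /^\d\d\d-\d\d\d\d/ matches 376-8380 and 376-83801, but no longer
    5128-8181, because the three digits must start at the beginning of
    the string (trailing text is still allowed).
  • /^\d\d\d-\d\d\d\d$/ matches only a string that is exactly a seven-digit
    telephone number such as 376-8380, with no text before or after it.
  • /\bcat/ matches "cat", "catsup", and "more catsup please",
    but not "scat".
  • /\bcat\b/ matches only text containing the whole word "cat", such as
    "the cat sat", but not "catsup" or "scat".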
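The claims in these examples can be checked directly. This is a minimal sketch, again using Python's `re` module in place of the slash syntax. `re.search` answers "does the pattern match anywhere?". `re.findall` counts non-overlapping matches while scanning left to right, which is where the "(twice)" and "(once)" counts come from. The helpers `matches` and `count` are hypothetical names.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

def count(pattern: str, text: str) -> int:
    """Number of non-overlapping matches, scanning left to right (hypothetical helper)."""
    return len(re.findall(pattern, text))

print(count(r"g..t", "gotta get a goat"))              # "gott" and "goat"
print(count(r"g[gatc][gatc]t", "gotta get an agatt"))  # only "gatt"
print(matches(r"\d\d\d-\d\d\d\d", "5128-8181"))        # "128-8181" occurs inside it
print(matches(r"^\d\d\d-\d\d\d\d", "5128-8181"))       # must start at position 0
print(matches(r"^\d\d\d-\d\d\d\d", "376-83801"))       # trailing digit is allowed
print(matches(r"^\d\d\d-\d\d\d\d$", "376-83801"))      # $ forbids trailing text
print(matches(r"\bcat", "more catsup please"), matches(r"\bcat", "scat"))
print(matches(r"\bcat\b", "the cat sat"), matches(r"\bcat\b", "catsup"))
```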


By default, an atom matches once. This can be modified by following
the atom with a quantifier:

    atom?      matches zero times or exactly once
    atom*      matches zero or more times
    atom+      matches one or more times
    atom{3}    matches exactly three times
    atom{2,4}  matches between two and four times, inclusive
    atom{4,}   matches at least four times
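This minimal sketch shows exactly which repetition counts each quantifier accepts. It uses Python's `re` module, which writes these quantifiers the same way. Each quantifier is applied to the single-character atom `a`, and strings of zero to six `a`s are tested with `re.fullmatch`, which succeeds only if the entire string matches. The helper `accepted_counts` is hypothetical.

```python
import re

def accepted_counts(quantified_atom: str, max_n: int = 6) -> list:
    """Return the counts n in 0..max_n for which "a" * n fully matches.

    accepted_counts is a hypothetical helper written for this lesson.
    """
    return [n for n in range(max_n + 1) if re.fullmatch(quantified_atom, "a" * n)]

for q in ["a?", "a*", "a+", "a{3}", "a{2,4}", "a{4,}"]:
    print(f"{q:7} accepts {accepted_counts(q)}")
```

The printed lists are, in order, [0, 1], [0, 1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [3], [2, 3, 4] and [4, 5, 6]. The "or more" quantifiers stop at 6 only because the test range stops there.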

Examples:

  • /goa?t/ matches "goat" and "got", and also any text that contains
    either string, such as "gotten".
  • /g.+t/ matches "goat", "goot", and "grant", among others.
  • /g.*t/ matches "gt", "goat", "goot", and "grant", among others.
  • /^\d{3}-\d{4}$/ matches seven-digit US telephone numbers such as
    376-8380 (no extra text allowed); it is equivalent to /^\d\d\d-\d\d\d\d$/.
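The quantifier examples can be confirmed the same way. This is a minimal sketch using Python's `re` module, which writes the quantifiers identically. `matches` is a hypothetical helper that reports whether a pattern occurs anywhere in the text.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# /goa?t/: the "a" is optional, and without anchors the match may sit
# inside longer text.
print(matches(r"goa?t", "goat"), matches(r"goa?t", "got"), matches(r"goa?t", "gotten"))

# /g.+t/ needs at least one character between g and t; /g.*t/ allows none.
print(matches(r"g.+t", "gt"), matches(r"g.*t", "gt"))

# /^\d{3}-\d{4}$/ is a shorter spelling of /^\d\d\d-\d\d\d\d$/.
phone = re.compile(r"^\d{3}-\d{4}$")
for text in ["376-8380", "376-83801", "call 376-8380"]:
    print(text, bool(phone.search(text)))
```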

Exercises:

  1. Design a pattern to recognize an email address.
  2. Design a pattern to recognize the ID portion of a sequence entry in a
    FASTA file.

