Next: Regexp Functions, Previous: Syntax of Regexps, Up: Regular Expressions
Here is a complicated regexp which was formerly used by Emacs to
recognize the end of a sentence together with any whitespace that
follows. (Nowadays Emacs uses a similar but more complex default
regexp constructed by the function sentence-end.
See Standard Regexps.)
First, we show the regexp as a string in Lisp syntax to distinguish spaces from tab characters. The string constant begins and ends with a double-quote. `\"' stands for a double-quote as part of the string, `\\' for a backslash as part of the string, `\t' for a tab and `\n' for a newline.
"[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
In contrast, if you evaluate this string, you will see the following:
"[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
⇒ "[.?!][]\"')}]*\\($\\| $\\| \\| \\)[
]*"
In this output, tab and newline appear as themselves.
This regular expression contains four parts in succession and can be deciphered as follows:
[.?!][]\"')}]*\" is Lisp syntax for a double-quote in
a string. The `*' at the end indicates that the immediately
preceding regular expression (a character alternative, in this case) may be
repeated zero or more times.
\\($\\| $\\|\t\\| \\)[ \t\n]*