Emacs: Text Pattern Matching (regex) tutorial


2007-08, …, 2011-12-06

This page is a tutorial on emacs regex.

Emacs's regex syntax is not based on Perl's or Python's, but it is very similar. One notable difference: in emacs regex, the parenthesis characters ( and ) are literal. To capture a pattern, escape the parentheses, like this: \(myPattern\).
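A minimal sketch of the difference between literal and capturing parentheses, using the built-in string-match and match-string functions. The function names and sample text are made up for illustration, and the backslashes are doubled because the regex is written as a lisp string.

```elisp
(defun my-literal-paren-index (text)
  "Return the index in TEXT where a literal \"(a...)\" starts, or nil."
  ;; Unescaped parens match the characters ( and ) themselves.
  (string-match "(a+)" text))

(defun my-captured-as (text)
  "Return the run of a's captured from TEXT, or nil if there is none."
  ;; Escaped parens form capture group 1.
  (when (string-match "\\(a+\\)" text)
    (match-string 1 text)))

(my-literal-paren-index "x(aaa)y")   ; returns 1
(my-captured-as "x(aaa)y")           ; returns "aaa"
```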

Common Patterns

Here are some common patterns:

Pattern            Matches
.                  Any single character except newline
\.                 One period
[0-9]+             Sequence of digits
[A-Za-z]+          Sequence of letters
[-A-Za-z0-9]+      Sequence of letter, digit, hyphen
[_A-Za-z0-9]+      Sequence of letter, digit, underscore
[-_A-Za-z0-9]+     Sequence of letter, digit, hyphen, underscore
[[:ascii:]]+       Sequence of ASCII chars
[[:nonascii:]]+    Sequence of non-ASCII chars (e.g. Unicode chars)
[\t\n ]+           Sequence of {tab, newline, space}

Pattern            Matches
"\([^"]+?\)"       Capture text between double quotes (non-greedy)
“\([^”]+?\)”       Capture text between curly double quotes (non-greedy; Unicode char)
(\([^)]+?\))       Capture text between parentheses (non-greedy)

Pattern            Matches
+                  Match previous pattern 1 or more times
*                  Match previous pattern 0 or more times
?                  Match previous pattern 0 or 1 time
+?                 Match previous pattern 1 or more times, but with minimal match (aka non-greedy)

Pattern            Matches
^…                 Beginning of {line, string, buffer}
…$                 End of {line, string, buffer}
\`…                Beginning of {string, buffer}
…\'                End of {string, buffer}
\b                 Word boundary marker
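A few of the patterns above can be tried out on strings with a small helper. This is only a sketch: my-regex-first-match is a hypothetical name, not a built-in, and the sample strings are made up. The regexes are written as lisp strings, so each backslash is doubled.

```elisp
(defun my-regex-first-match (regex text &optional group)
  "Return the part of TEXT matched by REGEX, or nil if there is no match.
GROUP selects a capture group; it defaults to 0, the whole match."
  (save-match-data
    (when (string-match regex text)
      (match-string (or group 0) text))))

(my-regex-first-match "[0-9]+" "room 42b")                      ; "42"
(my-regex-first-match "a+" "caaat")                             ; "aaa" (greedy)
(my-regex-first-match "a+?" "caaat")                            ; "a" (non-greedy)
(my-regex-first-match "\"\\([^\"]+?\\)\"" "say \"hi\" now" 1)   ; "hi"
(my-regex-first-match "\\`[A-Za-z]+" "abc def")                 ; "abc"

;; \b keeps "cat" from matching inside "concat"; the match starts at index 7.
(string-match "\\bcat\\b" "concat cat")                         ; 7
```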

Differences from Perl's Regex

If you are familiar with Perl's regex, here are some practical major differences.

Testing Your Regex

Emacs has an interactive regex mode. It shows matches as you type. To enter the mode, call regexp-builder.

Alternatively, you can call query-replace-regexp to test your pattern. I prefer this.

Regex in Emacs Lisp Code

Regex is also used in elisp code, much as it is in Perl code.

How to Test Regex in Elisp Code

To test a regex in your elisp code, you can open an empty file and place the regex function at the top and the text you want to match below it, like this:

(search-forward-regexp "yourRegex")

whatever text here

Then, put your cursor to the right of the closing parenthesis and call eval-last-sexp 【Ctrl+x Ctrl+e】. If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying the search failed, your regex didn't match. If you get a lisp syntax error, you probably made a mistake with the backslashes.
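The same check can also be scripted without moving the cursor by hand. The following is a sketch under a few assumptions: my-regex-test-in-buffer is a hypothetical helper name, and it uses re-search-forward (the function that search-forward-regexp is an alias for) inside a temporary buffer.

```elisp
(defun my-regex-test-in-buffer (regex text)
  "Search for REGEX in a temporary buffer holding TEXT.
Return a list (MATCHED-STRING END-POSITION), or nil if the search fails."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; A non-nil third argument makes a failed search return nil
    ;; instead of signaling a "Search failed" error.
    (when (re-search-forward regex nil t)
      (list (match-string 0) (point)))))

(my-regex-test-in-buffer "[0-9]+" "abc 123 def")   ; ("123" 8)
(my-regex-test-in-buffer "[0-9]+" "no digits")     ; nil
```

Buffer positions start at 1, so the end position 8 is the point just after the "3".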

Double Backslash in Lisp Code

In a lisp function that takes a regex string (e.g. search-forward-regexp), you need to double each backslash. This is because, in an elisp string, a literal backslash must itself be written as two backslashes. The lisp reader turns each pair into a single backslash, and the resulting string is then passed to emacs's regex engine.

For example, suppose you have this text:

Sin[x] + Sin[y]

and you need to capture the bracketed argument, [x] or [y]. If you are calling a regex command such as query-replace-regexp, you can type this at the prompt:

\(\[[a-z]\]\)

But in lisp code, you'll need to double the backslashes, like this:

(search-forward-regexp "\\(\\[[a-z]\\]\\)")

The regex engine actually receives:

\(\[[a-z]\]\)
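One way to see this for yourself is to inspect the string after lisp has read it. In this sketch the variable and function names are hypothetical, chosen only for the example.

```elisp
(defvar my-bracket-regex "\\(\\[[a-z]\\]\\)"
  "Regex for one bracketed lowercase letter, written as a lisp string.")

(defun my-first-bracketed (text)
  "Return the first bracketed letter in TEXT, brackets included, or nil."
  (when (string-match my-bracket-regex text)
    (match-string 1 text)))

;; The source text between the quotes has 17 characters, but the string
;; lisp actually holds has 13: each doubled backslash became one.
(length my-bracket-regex)                  ; 13
(my-first-bracketed "Sin[x] + Sin[y]")     ; "[x]"
```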

The C-style escapes for newline (line feed) \n and tab \t must not be written with a double backslash in an elisp string, whether or not the string is a regex.
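A short sketch of why, with made-up sample strings and hypothetical variable names: the lisp reader converts \n and \t into the actual newline and tab characters before the regex engine sees the string.

```elisp
(defvar my-newline-length (length "\n")
  "Length of the lisp string \"\\n\": a single newline character.")

(defvar my-whitespace-index (string-match "[\t\n ]+" "a \t\n b")
  "Index where the run of space, tab, and newline starts.")

(defvar my-doubled-index (string-match "a\\nb" "a\nb")
  "Result of searching with a doubled backslash before the n.")

my-newline-length     ; 1
my-whitespace-index   ; 1
;; With "\\n" the regex engine receives a backslash followed by the
;; letter n, not a newline, so the text containing a newline is not matched.
my-doubled-index      ; nil
```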
