Regular Expressions in Python


4.2.4 Regex Objects and Methods

compile( pattern[, flags])
Compile a regular expression pattern into a regular expression object, whose match() and search() methods can then be called on the text to be matched.

For example, this code

result = re.search(pat, str)

is equivalent to

regexObj = re.compile(pat)
result = regexObj.search(str)

If the same regex pattern is to be used to match different texts, use compile() to create a regex object first. This avoids compiling the same regex over and over, which is relatively time-consuming. A compiled pattern can also be passed to the module-level functions in place of a pattern string. For example, the following is also equivalent to the above:

result = re.search(re.compile(pat), str)
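
As a minimal sketch of the "compile once, reuse many times" advice, the following compiles a pattern outside a loop and applies it to several texts. The sample strings and the names lines, error_pat, and found are made up for illustration.

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["error: disk full", "ok", "error: timeout"]

# Compile once, outside the loop.
error_pat = re.compile(r"error: (\w+)")

found = []
for line in lines:
    m = error_pat.search(line)   # reuse the same compiled object
    if m:
        found.append(m.group(1))

print(found)  # ['disk', 'timeout']
```

The re module also keeps an internal cache of recently compiled patterns, so the saving is usually modest for a handful of patterns; an explicit compile() mainly helps in tight loops and keeps the pattern in one named place.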

compile() accepts an optional second argument, flags, that modifies the meaning of the given pattern. The flags can be any of I, L, M, S, U, X, and can be combined with the | operator. For example, re.compile(pat, re.M|re.U) creates a regex object in which '^' and '$' also match at the start and end of each line (re.M) and character classes such as \w follow Unicode rules (re.U). For a detailed account, see Regex Functions.
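
The effect of a flag is easiest to see side by side. This sketch uses a made-up two-line string and compares the same pattern with and without re.M, then combines two flags with the | operator.

```python
import re

# Hypothetical two-line sample text.
text = "first line\nSecond line"

# Without re.M, '^' matches only at the very start of the string.
no_m = re.compile(r"^\w+").findall(text)
print(no_m)    # ['first']

# With re.M, '^' also matches just after each newline.
with_m = re.compile(r"^\w+", re.M).findall(text)
print(with_m)  # ['first', 'Second']

# Flags combined with |: ignore case and treat each line separately.
pat = re.compile(r"^second", re.I | re.M)
hit = pat.search(text).group()
print(hit)     # Second
```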

Compiled regular expression objects support the following methods and attributes. They are documented only briefly here; for more detailed documentation, see their function equivalents at Regex Functions.


The following are methods for a compiled pattern object.

search( string[, pos[, endpos]])
re.compile(pat).search(str) is equivalent to re.search(pat, str). If the pattern matches anywhere in the string, a MatchObject is returned; otherwise None is returned. See re.search() at Regex Functions for a detailed account.

The optional parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.

The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found. Otherwise, if rx is a compiled regular expression object, rx.match(string, 0, 50) is equivalent to rx.match(string[:50], 0).
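
The difference between pos and slicing, and the effect of endpos, can be sketched as follows. The strings and the names rx, digits, s, and t are illustrative only.

```python
import re

rx = re.compile(r"^b")
s = "ab"

# Slicing builds a new string, so '^' matches at the start of the slice.
sliced = rx.search(s[1:])
print(sliced is not None)   # True

# With pos=1 the scan starts at index 1, but '^' still means the real
# beginning of s, so nothing matches.
with_pos = rx.search(s, 1)
print(with_pos)             # None

# endpos=7 makes the search behave as if t were only 7 characters long.
digits = re.compile(r"\d+")
t = "abc 12345 xyz"
limited = digits.search(t, 0, 7).group()
print(limited)                          # 123
print(digits.search(t[:7]).group())     # 123, same as slicing to [:7]
```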

match( string[, pos[, endpos]])
re.compile(pat).match(str) is equivalent to re.match(pat, str).

The optional pos and endpos parameters have the same meaning as for the search() method.
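
Since match() only succeeds at the position where matching starts, pos changes where that position is. A small sketch, with a made-up pattern and string:

```python
import re

rx = re.compile(r"o")

# match() anchors at index 0 by default, and "dog" starts with 'd'.
at_start = rx.match("dog")
print(at_start)            # None

# With pos=1, matching is attempted at index 1, where 'o' is.
at_one = rx.match("dog", 1)
print(at_one.span())       # (1, 2)

# search() scans forward, so it finds the same 'o' without pos.
scanned = rx.search("dog")
print(scanned.span())      # (1, 2)
```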

split( string[, maxsplit = 0])
re.compile(pat).split(str) is equivalent to re.split(pat, str).

findall( string[, pos[, endpos]])
re.compile(pat).findall(str) is equivalent to re.findall(pat, str).

finditer( string[, pos[, endpos]])
re.compile(pat).finditer(str) is equivalent to re.finditer(pat, str).

sub( repl, string[, count = 0])
re.compile(pat).sub(repl, str) is equivalent to re.sub(pat, repl, str).

subn( repl, string[, count = 0])
re.compile(pat).subn(repl, str) is equivalent to re.subn(pat, repl, str).
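
The methods above can be exercised together on one compiled pattern. This is only a sketch; the pattern, the sample text, and the variable names are chosen for illustration.

```python
import re

pat = re.compile(r"\d+")
text = "a1b22c333"

parts = pat.split(text)
print(parts)        # ['a', 'b', 'c', '']

first_split = pat.split(text, maxsplit=1)
print(first_split)  # ['a', 'b22c333']

numbers = pat.findall(text)
print(numbers)      # ['1', '22', '333']

# finditer yields match objects, which carry positions as well as text.
spans = [m.span() for m in pat.finditer(text)]
print(spans)        # [(1, 2), (3, 5), (6, 9)]

replaced = pat.sub("#", text)
print(replaced)     # a#b#c#

# subn returns the new string and the number of replacements made.
replaced_n = pat.subn("#", text, count=2)
print(replaced_n)   # ('a#b#c333', 2)
```

Each call gives the same result as the module-level function with the pattern as first argument, for example re.sub(r"\d+", "#", text).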

The following are read-only attributes set when a pattern object is compiled.

flags
The flags argument used when the regex object was compiled, or 0 if no flags were provided. Example:
patObj = re.compile(r'\w', re.M|re.U)
print(patObj.flags)  # prints 40

The number is the bitwise OR of the integer values of the flags in effect: here re.M is 8 and re.U is 32, giving 40. To test whether a given flag is set, use a bitwise AND, such as patObj.flags & re.M.
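
One way to turn the flags number back into flag names is to test each flag's bit in turn. This is a sketch rather than an official recipe; the names pat_obj, has_m, has_i, and names are illustrative.

```python
import re

pat_obj = re.compile(r"\w", re.M | re.U)
flags = pat_obj.flags

# Each flag occupies one bit, so a bitwise AND tests for it.
has_m = bool(flags & re.M)
has_i = bool(flags & re.I)
print(has_m)   # True
print(has_i)   # False

# Collect the short names of all flags whose bit is set.
names = [name for name in ("I", "L", "M", "S", "U", "X")
         if flags & getattr(re, name)]
print(names)   # ['M', 'U']
```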

groupindex
A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern. Example:
myText='<img src="some.jpg" alt="beauty" width="123" height="456">'
patObj = re.compile(r'src="(?P<filename>[^"]+)" alt="(?P<alttext>[^"]+)" width="(?P<width>\d+)" height="(?P<height>\d+)"')
print(dict(patObj.groupindex))

# prints: {'filename': 1, 'alttext': 2, 'width': 3, 'height': 4} (key order may vary by Python version)
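
A group number looked up in groupindex can be used interchangeably with the group name on a match object. A minimal sketch with a shorter, made-up pattern:

```python
import re

pat_obj = re.compile(r"(?P<width>\d+)x(?P<height>\d+)")
m = pat_obj.search("size: 123x456")

# Look up the number of a named group, then fetch it either way.
idx = pat_obj.groupindex["height"]
print(idx)                 # 2
print(m.group("height"))   # 456
print(m.group(idx))        # 456
```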

pattern
The pattern string from which the regex object was compiled.

Page created: 2005-04, by Xah Lee.