Python Regex Functions

2005, …, 2011-12-05

search()

search( ‹pattern›, ‹string›)

If pattern matches part of a string, then a MatchObject is returned.

Returns None if pattern is not found in the string.

Note: A successful match can be empty, that is, it need not contain any characters of the given string. For example, the patterns r'' and r'y*' match any string.
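A minimal sketch of this point, written for Python 3: the pattern r'y*' succeeds on a string that contains no "y" at all, and the matched text is empty. The sample string 'abc' is only an illustration.

```python
import re

# r'y*' means "zero or more y", so it can match the empty string at position 0.
m = re.search(r'y*', 'abc')
print(m is not None)      # True
print(repr(m.group()))    # ''
print(m.span())           # (0, 0)
```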

Example:

# python
import re
result = re.search(r'\w+@\w+\.com', 'long text xyz@xyz.com long')
if result:
    print("yes!")
    print(result.group())  # xyz@xyz.com
else:
    print("no!")

Note: the pattern should be written as a raw string, like this: r'…'. Otherwise, each backslash in it must be escaped. For example, to search for a sequence of tabs, use re.search(r'\t+', ‹string›) or re.search('\\t+', ‹string›).
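As a quick check of the two spellings, the following Python 3 sketch compares the raw string with the escaped ordinary string and then uses one of them in a search. The sample text is made up for illustration.

```python
import re

raw = r'\t+'      # raw string: the backslash and "t" are kept as two characters
escaped = '\\t+'  # ordinary string: the backslash must be doubled

same = (raw == escaped)
print(same)       # True
print(len(raw))   # 3

text = 'a\t\tb'   # sample text containing two tab characters
m = re.search(raw, text)
print(m.span())   # (1, 3)
```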

search( ‹pattern›, ‹string›, ‹flags›)

The optional third argument “flags” modifies the meaning of the given pattern. The flags can be any of re.I, re.L, re.M, re.S, re.U, re.X. They can be combined with the | operator. For example, re.search(pat, ‹string›, re.M|re.U) searches with both the multiline flag and the Unicode flag in effect. For detail, see: Python Regex Flags.

Python Regex Flags
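To illustrate how combined flags change a search, here is a small Python 3 sketch. The sample text and the choice of re.I with re.M are assumptions made for the example.

```python
import re

text = 'first line\nSecond Line'

# Without flags: "^" only matches at the start of the string, and case matters.
no_flags = re.search(r'^second', text)
# With re.I | re.M: case is ignored and "^" also matches right after each newline.
with_flags = re.search(r'^second', text, re.I | re.M)

print(no_flags)            # None
print(with_flags.group())  # Second
```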

match()

match(‹pattern›, ‹string›)

match(‹pattern›, ‹string›, ‹flags›)

The “match” function is like “search” except that the match must start at the beginning of string. For example, re.search('me','somestring') matches, but re.match('me','somestring') returns None.

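Note: match() is not exactly equivalent to search() with ^. With the re.M flag, ^ also matches right after each newline, while match() still tries only the very start of the string. Example: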

re.search(r'^B', 'A\nB',re.M) # succeeds
re.match(r'B', 'A\nB',re.M)   # fails

split()

split( ‹pattern›, ‹string›)

Returns a list of the substrings obtained by splitting the string at each match of pattern. Example:

re.split(r' +', 'what   do  you think')
# returns ['what', 'do', 'you', 'think']

If the boundary pattern is enclosed in parentheses, then the matched boundary text is also included in the returned list. For example:

re.split(r'( +)', 'what   do  you think')
# returns ['what', '   ', 'do', '  ', 'you', ' ', 'think']

If there is more than one capturing group in the pattern, the text of each group is included in the returned list, in sequence. For example:

  
re.split(r'( +)(@+)', 'what   @@do  @@you @@think')
# returns ['what', '   ', '@@', 'do', '  ', '@@', 'you', ' ', '@@', 'think']

split( ‹pattern›, ‹string›, maxsplit = 0)

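If the optional “maxsplit” is given and is not zero, at most “maxsplit” splits are made, and the rest of the string is returned as the last element of the list. Without capturing parentheses, the returned list therefore has at most maxsplit + 1 elements.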
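A short Python 3 sketch of maxsplit, reusing the sample string from the examples above. The value 2 is arbitrary.

```python
import re

text = 'what   do  you think'

# Split at most twice; whatever is left stays together as the last element.
limited = re.split(r' +', text, maxsplit=2)
print(limited)  # ['what', 'do', 'you think']

# maxsplit=0 (the default) means "no limit".
unlimited = re.split(r' +', text, maxsplit=0)
print(unlimited)  # ['what', 'do', 'you', 'think']
```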

findall()

findall( ‹pattern›, ‹string›)

Returns a list of all non-overlapping matches of ‹pattern› in ‹string›. Example:

re.findall(r'@+', 'what   @@@do  @@you @think')
# returns ['@@@', '@@', '@']

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Example:

re.findall(r'( +)(@+)', 'what   @@@do  @@you @think')
# returns [('   ', '@@@'), ('  ', '@@'), (' ', '@')]
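The difference between one group and several groups can be seen side by side in this Python 3 sketch. The one-group pattern r' (@+)' is an assumption added for illustration.

```python
import re

text = 'what   @@@do  @@you @think'

# One group: each result is the text captured by that group (a string).
one_group = re.findall(r' (@+)', text)
# Two groups: each result is a tuple with one entry per group.
two_groups = re.findall(r'( +)(@+)', text)

print(one_group)   # ['@@@', '@@', '@']
print(two_groups)  # [('   ', '@@@'), ('  ', '@@'), (' ', '@')]
```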

Empty matches are included in the result. (Documentation for Python versions before 3.7 adds the qualifier “unless they touch the beginning of another match”.) Example:

re.findall(r'\b', 'what   @@@do  @@you @think')
# returns ['', '', '', '', '', '', '', '']

findall( ‹pattern›, ‹string›, ‹flags›)

finditer()

finditer( ‹pattern›, ‹string›)

finditer( ‹pattern›, ‹string›, ‹flags›)

Like “findall”, except that it returns an iterator that yields a MatchObject for each match. It is typically used in a loop. Example:

for matched in re.finditer(r'(\w+)', 'what   do  you think'):
    print(matched.group())
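Because each item is a MatchObject, position information is available as well as the matched text. A minimal Python 3 sketch, using the same sample string:

```python
import re

text = 'what   do  you think'

# Collect (matched text, start, end) for every word.
words = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\w+', text)]
for word, start, end in words:
    print(word, start, end)
# what 0 4
# do 7 9
# you 11 14
# think 15 20
```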

sub()

sub( ‹pattern›, ‹repl›, ‹string›)

sub( ‹pattern›, ‹repl›, ‹string›, ‹count›)

Returns the string obtained by replacing the matches of pattern in string with the replacement repl. If the pattern isn't found, string is returned unchanged. Any \number in repl is replaced by the text captured by the corresponding group in pattern (that is, a subpattern enclosed in parentheses). Example:

  
newstr=re.sub(r'([^-]+)--(.+)$', r'\1--Me, not \2','"what do you mean?" --A Sage')
# returns: "what do you mean?" --Me, not A Sage

repl can also be a function for more complicated replacement. When a match is found, the function is called and its return value used as the replacement string. Example:

def fun(matchObj):
    if matchObj.group(0) == '--A Sage':
        return '--Me'
    else:
        return '--Some Joe'

newstr = re.sub(r'--.+$', fun, '"what do you mean?" --xyz')
print(newstr)      # prints: "what do you mean?" --Some Joe

The first argument pattern may be a string or a regex object. In Python versions before 2.7, sub() takes no flags argument, so if you need to specify regular expression flags there, you must use a regex object. Alternatively, you can embed flags in your regex pattern with (?iLmsux) at the beginning of the pattern. For example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. (See regex pattern syntax for detail.)
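The regex-object route can be sketched as follows in Python 3. The variable names are illustrative only.

```python
import re

# Compile once with the ignore-case flag, then call sub() on the regex object.
regex = re.compile(r'b+', re.I)
result = regex.sub('x', 'bbbb BBBB')
print(result)  # x x

# The embedded-flag form from the text gives the same result.
same = re.sub(r'(?i)b+', 'x', 'bbbb BBBB')
print(same)    # x x
```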

The optional argument count is the maximum number of pattern occurrences to be replaced. If it is omitted or 0, all occurrences are replaced.
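A small Python 3 sketch of count, with a made-up sample string. The argument is passed by keyword here, which keeps the call unambiguous.

```python
import re

text = 'a@@b@c@@@d'

# Replace only the first two runs of "@".
first_two = re.sub(r'@+', '#', text, count=2)
print(first_two)  # a#b#c@@@d

# With no count, every run is replaced.
all_runs = re.sub(r'@+', '#', text)
print(all_runs)   # a#b#c#d
```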

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>…) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character 0. The backreference \g<0> substitutes in the entire substring matched by the pattern.
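The three \g forms can be tried out with this Python 3 sketch. The group names "first" and "last" and the sample strings are assumptions chosen for the example.

```python
import re

# Named groups: swap two words by referring to the groups by name.
swapped = re.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', 'Ada Lovelace')
print(swapped)   # Lovelace, Ada

# \g<1>0 means "group 1, then a literal 0"; r'\10' would refer to group 10 instead.
tens = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
print(tens)      # a10b20

# \g<0> stands for the whole match.
quoted = re.sub(r'\w+', r'<\g<0>>', 'hi there')
print(quoted)    # <hi> <there>
```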

subn()

subn( ‹pattern›, ‹repl›, ‹string›)

subn( ‹pattern›, ‹repl›, ‹string›, ‹count›)

Performs the same operation as sub(), but returns a tuple: (new_string, number_of_subs_made).
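For instance, collapsing runs of spaces in the sample string used earlier gives both the new string and the number of replacements. A Python 3 sketch:

```python
import re

# Collapse each run of spaces to a single space and count the replacements.
new_string, n_subs = re.subn(r' +', ' ', 'what   do  you think')
print(new_string)  # what do you think
print(n_subs)      # 3
```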

escape(‹string›)

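Returns a string with a backslash character “\” inserted in front of every non-alphanumeric character. (In Python 3.7 and later, only characters that can have special meaning in a regular expression are escaped.) This is useful if you want to use a given string as a pattern for an exact match.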
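A minimal Python 3 sketch of exact matching with escape(). The literal 'a.b*c' and the two test strings are made up for illustration.

```python
import re

literal = 'a.b*c'             # text to be matched exactly
pattern = re.escape(literal)  # "." and "*" get a backslash in front
print(pattern)                # a\.b\*c

exact = re.search(pattern, 'xx a.b*c yy')
loose = re.search(pattern, 'aXbbbc')  # would match if "." and "*" were still special
print(exact.group())          # a.b*c
print(loose)                  # None
```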

exception error

Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.
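The two cases described above, an invalid pattern versus a valid pattern with no match, can be told apart as in this Python 3 sketch. The patterns are illustrative.

```python
import re

# An unbalanced parenthesis is not a valid pattern, so compiling it raises re.error.
try:
    re.compile(r'(abc')
    compiled = True
except re.error as exc:
    compiled = False
    print('invalid pattern:', exc)

# A valid pattern that simply finds nothing returns None and raises nothing.
no_match = re.search(r'xyz', 'abc')
print(compiled)   # False
print(no_match)   # None
```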
