The module defines the following functions and constants, and an exception:
- compile(pattern[, flags])
- Compile a regular expression pattern into a regular expression object, which can be used
for matching using its match() and search()
methods, described below.
The expression's behaviour can be modified by specifying a flags value.
Values can be any of the following variables, combined using bitwise OR (the |
operator).
The sequence
prog = re.compile(pat)
result = prog.match(str)
is equivalent to
result = re.match(pat, str)
but the version using compile() is more efficient when the
expression will be used several times in a single program.
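As a rough illustration of the equivalence described above, the following sketch compiles a pattern once and reuses it, then compares the results with the module-level function. The pattern and the sample strings are made up for the example.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
word_re = re.compile(r'[a-z]+')
lines = ['alpha 1', '2 beta', 'gamma']

# Reuse the compiled object; match() only succeeds at the start of a string.
matched = [word_re.match(line) is not None for line in lines]

# The module-level function gives the same answers without an explicit compile step.
same = [re.match(r'[a-z]+', line) is not None for line in lines]

print(matched)  # [True, False, True]
```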
- I
- IGNORECASE
- Perform case-insensitive matching; expressions like [A-Z] will
match lowercase letters, too. This is not affected by the current locale.
- L
- LOCALE
- Make \w, \W, \b,
and \B dependent on the current locale.
- M
- MULTILINE
- When specified, the pattern character "^" matches
at the beginning of the string and at the beginning of each line (immediately following
each newline); and the pattern character "$" matches
at the end of the string and at the end of each line (immediately preceding each newline).
By default, "^" matches only at the beginning of the
string, and "$" only at the end of the string and
immediately before the newline (if any) at the end of the string.
- S
- DOTALL
- Make the "." special character match any character
at all, including a newline; without this flag, "."
will match anything except a newline.
- U
- UNICODE
- Make \w, \W, \b,
and \B dependent on the Unicode character properties database. New in version 2.0.
- X
- VERBOSE
- This flag allows you to write regular expressions that look nicer. Whitespace within the
pattern is ignored, except when in a character class or preceded by an unescaped
backslash, and, when a line contains a "#" neither in
a character class nor preceded by an unescaped backslash, all characters from the leftmost
such "#" through the end of the line are ignored.
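The flags listed above can be tried out with a short sketch like the one below. The sample text and patterns are invented for illustration, and the flags are passed through compile() so the example does not rely on a flags argument in the other functions.

```python
import re

text = 'first line\nSecond line'

# IGNORECASE: [a-z] also matches the uppercase 'S'.
ignorecase_hit = re.compile(r'[a-z]+', re.IGNORECASE).match('Second').group(0)

# MULTILINE: '^' also matches immediately after the newline.
default_starts = re.compile(r'^\w+').findall(text)
multiline_starts = re.compile(r'^\w+', re.MULTILINE).findall(text)

# DOTALL: '.' is allowed to cross the newline.
without_dotall = re.compile(r'.+').match(text).group(0)
with_dotall = re.compile(r'.+', re.DOTALL).match(text).group(0)

# Flags are combined with bitwise OR; VERBOSE permits whitespace and comments.
number_re = re.compile(r"""
    \d+      # integer part
    \. \d*   # decimal point and optional fraction
    """, re.VERBOSE | re.IGNORECASE)
number_hit = number_re.match('3.14 is pi').group(0)

print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'Second']
print(number_hit)        # 3.14
```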
- search(pattern, string[, flags])
- Scan through string looking for a location where the regular expression pattern
produces a match, and return a corresponding MatchObject instance.
Return
None if no position in the string matches the pattern; note that this
is different from finding a zero-length match at some point in the string.
- match(pattern, string[, flags])
- If zero or more characters at the beginning of string match the regular
expression pattern, return a corresponding MatchObject
instance. Return
None if the string does not match the pattern; note that
this is different from a zero-length match.
Note: If you want to locate a match anywhere in
string, use search() instead.
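The difference between match() and search(), and between "no match" and a zero-length match, can be seen in a small sketch such as this one. The input strings are arbitrary examples.

```python
import re

# match() is anchored at the start of the string; search() scans forward.
at_start = re.match(r'\d+', 'abc 123')    # None: the string starts with 'abc'
anywhere = re.search(r'\d+', 'abc 123')   # match object covering '123'

# A zero-length match is still a match object, not None.
empty = re.match(r'\d*', 'abc')

print(at_start)                             # None
print(anywhere.group(0), anywhere.start())  # 123 4
print(repr(empty.group(0)))                 # ''
```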
- split(pattern, string[, maxsplit = 0])
- Split string by the occurrences of pattern. If capturing
parentheses are used in pattern, then the text of all groups in the pattern is
also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit
splits occur, and the remainder of the string is returned as the final element of the
list. (Incompatibility note: in the original Python 1.5 release, maxsplit was
ignored. This has been fixed in later releases.)
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
This function combines and extends the functionality of the old regsub.split()
and regsub.splitx().
- findall(pattern, string)
- Return a list of all non-overlapping matches of pattern in string.
If one or more groups are present in the pattern, return a list of groups; this will be a
list of tuples if the pattern has more than one group. Empty matches are included in the
result unless they touch the beginning of another match. New in
version 1.5.2.
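How the shape of the result depends on the number of groups can be sketched as follows. The sample text is made up for the example.

```python
import re

text = 'width=10, height=20'

# No groups: a list of the whole matches.
whole = re.findall(r'\d+', text)

# One group: a list of that group's text.
names = re.findall(r'(\w+)=', text)

# More than one group: a list of tuples, one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(whole)  # ['10', '20']
print(names)  # ['width', 'height']
print(pairs)  # [('width', '10'), ('height', '20')]
```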
- finditer(pattern, string)
- Return an iterator over all non-overlapping matches for the RE pattern in string.
For each match, the iterator returns a match object. Empty matches are included in the
result unless they touch the beginning of another match. New in
version 2.2.
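Because the iterator yields match objects rather than strings, positions are available as well as the matched text. A minimal sketch, using an invented sample string:

```python
import re

text = 'width=10, height=20'

# Collect the matched text together with its start and end offsets.
spans = [(m.group(0), m.start(), m.end()) for m in re.finditer(r'\d+', text)]

print(spans)  # [('10', 6, 8), ('20', 17, 19)]
```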
- sub(pattern, repl, string[, count])
- Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern
in string by the replacement repl. If the pattern isn't found, string
is returned unchanged. repl can be a string or a function; if it is a string,
any backslash escapes in it are processed. That is, "\n"
is converted to a single newline character, "\r" is
converted to a carriage return, and so forth. Unknown escapes such as "\j"
are left alone. Backreferences, such as "\6", are replaced
with the substring matched by group 6 in the pattern. For example:
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
... r'static PyObject*\npy_\1(void)\n{',
... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
If repl is a function, it is called for every non-overlapping occurrence of pattern.
The function takes a single match object argument, and returns the replacement string. For
example:
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
...
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
The pattern may be a string or an RE object; if you need to specify regular expression
flags, you must use an RE object or embed modifiers in the pattern; for example,
sub("(?i)b+", "x", "bbbb BBBB")
returns 'x x'.
The optional argument count is the maximum number of pattern occurrences to
be replaced; count must be a non-negative integer. If omitted or zero, all
occurrences will be replaced. Empty matches for the pattern are replaced only when not
adjacent to a previous match, so "sub('x*', '-', 'abc')"
returns '-a-b-c-'.
In addition to character escapes and backreferences as described above, "\g<name>"
will use the substring matched by the group named "name", as defined by the
(?P<name>...) syntax. "\g<number>" uses the corresponding group number; "\g<2>" is
therefore equivalent to "\2", but isn't ambiguous in a replacement such as "\g<2>0".
"\20" would be interpreted as a reference to group 20, not a reference to group 2
followed by the literal character "0". The backreference "\g<0>" substitutes in the
entire substring matched by the RE.
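The named and numbered "\g<...>" forms and the count argument can be sketched as below. The date pattern, group names, and input strings are hypothetical and serve only as an illustration.

```python
import re

# Hypothetical date pattern with three named groups.
date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

# Refer to groups by name, or by number with the unambiguous \g<number> form.
by_name = date_re.sub(r'\g<day>/\g<month>/\g<year>', 'released 2003-07-29')
by_number = date_re.sub(r'\g<3>/\g<2>/\g<1>', 'released 2003-07-29')

# \g<1>0 means group 1 followed by a literal '0'; r'\10' would mean group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')

# count limits how many occurrences are replaced.
first_only = re.sub(r'\d', '#', 'a1b2', count=1)

print(by_name)     # released 29/07/2003
print(padded)      # a10b20
print(first_only)  # a#b2
```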
- subn(pattern, repl, string[, count])
- Perform the same operation as sub(), but return a tuple
(new_string, number_of_subs_made).
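A short sketch of the returned tuple, using an arbitrary input string:

```python
import re

# Collapse each run of whitespace into one space and count the replacements.
new_string, number_of_subs_made = re.subn(r'\s+', ' ', 'too   many    spaces')

print(new_string)           # too many spaces
print(number_of_subs_made)  # 2
```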
- escape(string)
- Return string with all non-alphanumerics backslashed; this is useful if you
want to match an arbitrary literal string that may have regular expression metacharacters
in it.
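The following sketch shows why escaping matters when a literal string contains metacharacters. The literal and the surrounding text are invented for the example.

```python
import re

literal = 'price: $5.00 (approx.)'
haystack = 'The price: $5.00 (approx.) today'

# Used as-is, '$', '.', '(' and ')' are treated as metacharacters,
# so the pattern does not find the literal text.
unescaped = re.search(literal, haystack)

# After escape(), every metacharacter is matched literally.
escaped = re.search(re.escape(literal), haystack)

print(unescaped)         # None
print(escaped.group(0))  # price: $5.00 (approx.)
```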
- exception error
- Exception raised when a string passed to one of the functions here is not a valid
regular expression (for example, it might contain unmatched parentheses) or when some
other error occurs during compilation or matching. It is never an error if a string
contains no match for a pattern.
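A minimal sketch of the distinction drawn above: an invalid pattern raises the exception, while a valid pattern that simply finds nothing returns None. The patterns are arbitrary examples.

```python
import re

# An unmatched parenthesis is not a valid regular expression.
try:
    re.compile(r'(unclosed')
    compiled_ok = True
except re.error:
    compiled_ok = False

# A valid pattern with no match is not an error; the result is just None.
no_match = re.search(r'\d', 'no digits here')

print(compiled_ok)  # False
print(no_match)     # None
```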
© 2002-2004 Active-Venture.com
Webhosting
Service
Disclaimer: This
documentation is provided only for the benefits of our hosting customers.
For authoritative source of the documentation, please refer to http://python.org/doc/