import re
For each of the character class abbreviations, capitalizing it creates its negated version:
\w matches word characters (letters, digits, and the underscore); \W matches everything else.
\d matches all digits; \D matches everything else.
\s matches all whitespace; \S matches everything else.
By default, quantifiers match greedily, consuming as much text as they can. To make a quantifier match lazily instead, add a question mark after it:
re.search(r".+,", "aa,bb,") ## Matches 'aa,bb,'
re.search(r".+?,", "aa,bb,") ## Matches 'aa,'
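To see the shorthand classes and the lazy quantifier side by side, here is a small sketch. The sample strings are made up for illustration, and it borrows re.findall (covered further down), which returns every match as a list.

```python
import re

text = "id_42: 7 apples"  # made-up sample string

# \w matches word characters: letters, digits, and the underscore
print(re.findall(r"\w+", text))   # ['id_42', '7', 'apples']
# \W matches everything else (here, the colon and spaces)
print(re.findall(r"\W+", text))   # [': ', ' ']
# \d matches single digits; \D+ matches runs of non-digits
print(re.findall(r"\d", text))    # ['4', '2', '7']
print(re.findall(r"\D+", text))   # ['id_', ': ', ' apples']
# \S+ matches runs of non-whitespace
print(re.findall(r"\S+", text))   # ['id_42:', '7', 'apples']

html = "<b>bold</b>"  # made-up sample string
# Greedy: .+ runs as far as it can, so the match spans both tags
print(re.search(r"<.+>", html).group())   # <b>bold</b>
# Lazy: .+? stops at the first '>' that lets the match succeed
print(re.findall(r"<.+?>", html))         # ['<b>', '</b>']
```

The lazy version is the usual choice when the same closing delimiter can appear more than once in the input.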
Parentheses are used both for grouping and for capturing. If you want to use them only for grouping, without capturing, then start the group with ?:, writing (?:...) instead of (...):
re.search(r"(a)b", "ab").group(1) ## Returns 'a'
re.search(r"(?:a)b", "ab").group(1) ## Raises IndexError, because there isn't a group 1.
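A short sketch of what that means for group numbering: non-capturing groups are skipped when groups are counted, but they still group things for quantifiers. The patterns and strings are illustrative only.

```python
import re

# The non-capturing group is not numbered, so (b) becomes group 1
m = re.search(r"(?:a)(b)", "ab")
print(m.group(1))    # b
print(m.groups())    # ('b',)

# Grouping still works: + repeats the whole "ab" unit
print(re.search(r"(?:ab)+", "ababx").group())   # abab

# Asking for a group that does not exist raises IndexError
try:
    re.search(r"(?:a)b", "ab").group(1)
except IndexError:
    print("IndexError raised")
```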
findall
The function re.findall is a simple variant of re.search in which all the (non-overlapping) matches are returned as a list:
re.findall(r"abc", "abcabc") ## ['abc', 'abc']
A common gotcha is that, if the pattern contains capturing groups, then findall returns only the captured text rather than the whole match:
re.findall(r"(ab|AB)c", "abcABc") ## ['ab', 'AB']
This is a situation in which we often want to turn off the capturing while keeping the grouping:
re.findall(r"(?:ab|AB)c", "abcABc") ## ['abc', 'ABc']
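The gotcha has a second half worth knowing: with two or more capturing groups, findall returns a tuple of groups per match. When you need both the whole match and its groups, re.finditer, which yields match objects, is one way out. The sample string below is made up.

```python
import re

s = "x=1, y=22"  # made-up sample string

# Two capturing groups: one tuple per match
print(re.findall(r"(\w)=(\d+)", s))   # [('x', '1'), ('y', '22')]

# No capturing groups: the whole matched text
print(re.findall(r"\w=\d+", s))       # ['x=1', 'y=22']

# finditer yields match objects, so nothing is lost
for m in re.finditer(r"(\w)=(\d+)", s):
    print(m.group(0), m.group(1), m.group(2))
# x=1 x 1
# y=22 y 22
```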
re.sub lets you replace matches with something else:
re.sub(r"abc", r"XYZ", "abc") ## Returns 'XYZ'
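re.sub has a few more options that come up often. This sketch uses the count argument, a function as the replacement (it receives each match object and returns the text to substitute), and the sibling re.subn, which also reports the number of substitutions. The helper name double is just for illustration.

```python
import re

# count limits how many replacements are made
print(re.sub(r"abc", r"XYZ", "abcabc", count=1))   # XYZabc

# Hypothetical helper: doubles the number found in each match
def double(m):
    return str(int(m.group()) * 2)

# A function as the replacement is called once per match
print(re.sub(r"\d+", double, "3 apples and 10 pears"))   # 6 apples and 20 pears

# re.subn returns (new_string, number_of_substitutions)
print(re.subn(r"abc", r"XYZ", "abcabc"))   # ('XYZXYZ', 2)
```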
You can refer to captured groups in the replacement using \1, \2, etc.:
re.sub(r"(a)(b)", r"\2\1", "abab") ## Returns 'baba'
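Notice the use of a raw string for the replacement. This matters because of the backslash: in an ordinary string literal, "\1" is an octal escape for the control character chr(1), not a backslash followed by 1. I always try to use a raw string for this argument.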
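Here is a small sketch of what goes wrong without the raw string, plus the \g<1> form, which is handy when a group reference is followed by a digit. The patterns and inputs are illustrative only.

```python
import re

# Without the r prefix, Python turns "\2\1" into chr(2) and chr(1)
# before re sees it, so no group is referenced at all
print(repr(re.sub(r"(a)(b)", "\2\1", "abab")))   # '\x02\x01\x02\x01'

# Doubling each backslash by hand gives the same string as the raw version
assert "\\2\\1" == r"\2\1"
print(re.sub(r"(a)(b)", "\\2\\1", "abab"))       # baba

# \g<1> means group 1 unambiguously; r"\10" would mean group 10
print(re.sub(r"(a)", r"\g<1>0", "a"))             # a0
```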
We've only scratched the surface of what can be done with regular expressions in Python. For more: https://docs.python.org/3/library/re.html