The Python re Module
00:00 In the previous lesson, I finished up the language of regular expressions and introduced the concept of grouping. In this lesson, I’ll be showing you how to use regular expressions inside of Python.
First, a little review. The regex on the left has some grouping in it. The square brackets indicate a character class of the vowels, the parentheses around it make a group of that choice of vowel, and the letter r is literal. This highlights the 'er' in the string, creating three groups, including ones matching an 'o' and an 'e'. Later, when I show you how to do this in Python, you’ll be able to access this content. Here’s another one, this time a character class of digits. That class is grouped, and the plus (+) quantifier lets the group occur one or more times. Notice that this matches the '9' and the '888', but the grouping is the '9' and, for the second match, only a single '8', because a repeated group keeps just its last repetition.
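Here’s a minimal sketch of both review patterns in Python. It assumes the first regex is ([aeiou])r, as the description suggests. The exact digit class in the second example isn’t recoverable, so [0-9] is an assumption, and both sample strings are hypothetical. group(0) is the whole match and group(1) is the captured group.

```python
import re

text = "water for her"  # hypothetical sample string

# ([aeiou])r: a group capturing one vowel, followed by a literal "r"
for m in re.finditer(r"([aeiou])r", text):
    print(m.group(0), m.group(1))  # whole match, then the captured vowel
# er e
# or o
# er e

# ([0-9])+: a grouped digit class (assumed to be [0-9]) repeated one or more times
for m in re.finditer(r"([0-9])+", "9 and 888"):
    print(m.group(0), m.group(1))  # the group keeps only its last repetition
# 9 9
# 888 8
```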
The re library (short for regular expression) is a standard part of Python. Most of the functions in the re module take a string pattern, which is the regex, and a string to search against, and return a result. The result is usually a Match object, which tells you whether a match happened and which portion of the string matched. When there’s no match, these functions return None instead. Match objects are truthy and None is falsy, so you can evaluate the result as a Boolean: call a re function that returns one of these objects and test the result in an if statement to see whether a match happened. We’ll start out by importing the module.
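A minimal sketch of the truthiness pattern follows. The helper name has_match is hypothetical and is used only for illustration. It relies on re.search() returning either a Match object, which is truthy, or None, which is falsy.

```python
import re

def has_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern occurs anywhere in text."""
    # re.search() returns a Match object (truthy) or None (falsy)
    if re.search(pattern, text):
        return True
    return False

print(has_match("spam", "Lovely spam!"))  # True
print(has_match("eggs", "Lovely spam!"))  # False
```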
The search() function in the re module returns a Match object. In this line, I’ve searched for the literal expression "spam" inside the string question. Because a Match object was returned, you know that 'spam' was successfully found within the …
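This is the opposite of the curly brackets inside the regular expressions themselves. A quantifier like {2,3} includes its upper bound, whereas the span of a Match excludes its end index, just like a Python slice. It can be a little confusing as you switch back and forth between the two mechanisms, but the span shown in the Match object is closer to the Pythonic mechanism.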
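This short sketch contrasts the two conventions, using a hypothetical string. The {2,3} quantifier is inclusive, so it can match three characters. The reported span uses an exclusive end index, which is why it can be passed straight to a slice.

```python
import re

text = "baaaad"  # hypothetical sample string

# a{2,3}: at least 2 and at most 3 'a's; the upper bound 3 is inclusive
m = re.search(r"a{2,3}", text)
print(m.group())  # aaa
print(m.span())   # (1, 4): index 4 is exclusive, like a slice
print(text[1:4])  # aaa
```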
Evaluating this variable as a Boolean returns True, indicating that a match was found. I can call .span() on the Match object to get the lower and upper boundaries as a tuple. Another method, .start(), returns the lower boundary, and finally .end() gives you the upper boundary.
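Here’s a sketch of the three boundary methods. It uses a hypothetical sample string, since the lesson’s exact string isn’t shown here. Because the end index is exclusive, slicing with .start() and .end() gives back exactly the matched text.

```python
import re

text = "Lovely spam! Wonderful spam!"  # hypothetical sample string
m = re.search("spam", text)

print(m.span())   # (7, 11)
print(m.start())  # 7
print(m.end())    # 11
# Because the end is exclusive, slicing with start/end recovers the match
print(text[m.start():m.end()])  # spam
```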
Comparing this result to None gives True, and converting it to a Boolean gives False, because a failed search returns None rather than a Match object. This shows you how you could test the results of your regular expression functions inside an if statement.
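A minimal sketch of the failure case, with a hypothetical pattern and string. A search that finds nothing returns None. None compares as identical to None and is falsy in an if statement.

```python
import re

result = re.search("eggs", "Lovely spam!")  # no "eggs" in the string

print(result)          # None
print(result is None)  # True
print(bool(result))    # False

if result:
    print("match")
else:
    print("no match")  # this branch runs
```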
This regex was successful. It’s looking for five repetitions of the word meta-character. As the string starts with 'Lovel', which are all word characters, the match result covers those first 5 characters. Python 3.4 added a function called …
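Here’s a sketch of a five-word-character match. The pattern \w{5}, the use of re.match(), and the sample string are assumptions inferred from the description, not taken from the lesson’s screen. re.match() only tries to match at the start of the string, whereas re.search() scans forward.

```python
import re

text = "Lovely spam! Wonderful spam!"  # assumed sample string

# \w{5}: exactly five word characters (letters, digits, or underscore)
m = re.match(r"\w{5}", text)  # re.match only tries at the start of the string
print(m.group())  # Lovel
print(m.span())   # (0, 5)

# With leading punctuation, re.match fails but re.search still finds a run
print(re.match(r"\w{5}", "!! Lovely"))           # None
print(re.search(r"\w{5}", "!! Lovely").group())  # Lovel
```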
06:29 Let’s break this regular expression down. Looking at the inner group first, there’s the word meta-character with zero or more instances, followed by the whitespace character with zero or more instances.
06:41 So I’m looking for something that looks like an actual word, possibly followed by whitespace. That is wrapped in a group, which repeats zero or more times and is then followed by an exclamation mark. All of that is wrapped in an outer group.
The outer group can be repeated zero or more times. The match succeeds because the pattern covers both sub-parts of this string: "Lovely spam!" and "Wonderful spam!" each match the outer group, and because the outer group is repeated, this regular expression matches the entire string, giving a truth value for …
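The following is a sketch of the nested-group pattern. The pattern ((\w*\s*)*!)* is reconstructed from the verbal description, and the sample string is assumed. It uses re.fullmatch(), which exists in the standard library and succeeds only when the whole string matches. Whether that is the Python 3.4 function the lesson introduces is an assumption.

```python
import re

text = "Lovely spam! Wonderful spam!"  # assumed sample string

# Inner group (\w*\s*): a word followed by optional whitespace, repeated,
# then "!"; the outer group repeats that whole sentence-like unit.
pattern = r"((\w*\s*)*!)*"

m = re.fullmatch(pattern, text)  # succeeds only if the WHOLE string matches
print(bool(m))             # True
print(m.group(0) == text)  # True: the match spans the entire string
```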
Unlike the other functions I’ve shown you so far, findall() doesn’t return a Match object; it returns a list. It applies the regular expression, finds each non-overlapping match in the string, and returns the matched substrings in a list. If the pattern contains groups, the list holds the group contents instead of the full matches.
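A sketch of both findall() behaviors, using a hypothetical sample string. Without groups, you get the whole matches. With one group, you get only what that group captured.

```python
import re

text = "Lovely spam! Wonderful spam!"  # hypothetical sample string

print(re.findall(r"spam", text))        # ['spam', 'spam']
# With one group, findall returns what the group captured
print(re.findall(r"(\w+) spam", text))  # ['Lovely', 'Wonderful']
```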
Because findall() builds the whole list at once, sometimes you want an iterator instead. Enter finditer(). It scans the string the same way as findall(), but it returns an iterator that yields a Match object for each match rather than a list of strings. This is more memory-efficient when you’re dealing with a large number of matches.
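Here’s a minimal sketch of finditer(), with a hypothetical sample string. Each item the iterator yields is a full Match object, so methods like .group() and .span() are available for every match.

```python
import re

text = "Lovely spam! Wonderful spam!"  # hypothetical sample string

matches = re.finditer(r"spam", text)  # lazy iterator of Match objects
for m in matches:
    print(m.group(), m.span())
# spam (7, 11)
# spam (23, 27)
```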