The Python re Module
00:00 In the previous lesson, I finished up the language of regular expressions and introduced the concept of grouping. In this lesson, I’ll be showing you how to use regular expressions inside of Python.
First, a little review. The regex on the left has some grouping in it. The square brackets indicate a character class of the vowels, the parentheses around it are a group of that choice of vowel, and the r is literal.
This highlights the 'er' in the string, creating three groups matching an 'o', and an 'e'. Later when I show you how to do this in Python, you’ll actually be able to access this content.
Here’s another one, this time a character class of the digits up to 9, and that is grouped, and then one or more of those can happen using the plus (+) quantifier. Notice that this matches the '45', the '9', and the '888', but the grouping is the '5', the '9', and the '8'.
When you apply a quantifier to a group, only the last repetition ends up in the group. So the '4' and the '5' match the regex, but only the '5' ends up in the group. The '9' matches the regex and ends up in the group, and the '888' matches the regex and the last '8' ends up in the group.
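As a rough sketch of that behavior in Python, the snippet below repeats a grouped digit class and prints the whole match next to the captured group. The test string and the exact `[0-9]` class are assumptions, since the transcript does not show the on-screen regex or text.

```python
import re

# Hypothetical test string; the original on-screen text is not in the transcript.
text = "45 and 9 and 888"

# A digit character class, grouped, then repeated one or more times.
# The exact class used in the lesson is assumed to be [0-9].
pattern = r"([0-9])+"

# Pair each whole match with what ended up in group 1.
matches = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
print(matches)
```

Each tuple holds the full match first and the group second, so the group only ever carries the final digit of each run.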
This regex looks for the literal letters 'the' grouped together, followed by zero or more repetitions of any character, then uses a backreference.
So you’re looking for whatever matched the first group happening again inside of the string. So what you’re getting is a match that runs from one 'the' to a later 'the'. All of the red text matches, but only the 'the' ends up in the group.
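The backreference can be sketched in Python like this. The pattern is reconstructed from the spoken description and the sentence being searched is made up for illustration, so treat both as assumptions rather than the lesson's exact example.

```python
import re

# Hypothetical sentence; the lesson's own text is not in the transcript.
text = "the cat sat on the mat"

# Group the letters "the", allow any characters, then require the same
# text again via the backreference \1.
pattern = r"(the).*\1"

m = re.search(pattern, text)
whole = m.group(0)   # everything from the first "the" to the last "the"
group = m.group(1)   # only the grouped "the"
print(repr(whole), repr(group))
```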
01:53 You’ve been pretty patient with me up until now. Everything’s been about regexes without really talking about Python, so now I’m going to show you how to use this inside of Python.
The re library, short for regular expression, is a standard part of Python. Most of the functions inside of the re module take a string pattern, which is the regex, and a string to search against, and return a result.
The result is usually a Match object. This Match object gives information about the match: whether or not a match happened and what portions of the string matched the result.
Match objects are truthy. That means they can be evaluated as Booleans, so you can use a re function that returns one of these objects and then test the object in an if statement to see whether or not a match happened. We’ll start out by importing the module. question will be my test string.
The search() function inside of the re module returns a Match object. In this line, I’ve searched for the literal expression "spam" inside of the question. As a Match object was returned, that tells you that 'spam' was successfully found within the string. The span parameter inside of the Match object tells you where the match happened. This is between letters 7 and 11.
The numbers in the span are equivalent to a slice in a list or a slice of a string, so this indicates that it starts at 7 and finishes just before 11. The 11 is the upper limit, not included.
This is the opposite of the curly brackets inside of the regular expressions themselves, where the upper bound is included. It can be a little confusing as you switch back and forth between the two mechanisms, but the span parameter of the Match is closer to the Pythonic mechanism.
Here, you can see I’ve sliced question using the 7 and 11 from the span, and I get back 'spam', the match from the string.
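The REPL steps described so far probably look something like the sketch below. The value of question is an assumption, inferred from the string quoted later in the lesson, since the on-screen assignment is not part of the transcript.

```python
import re

# Assumed test string, inferred from the text quoted later in the lesson.
question = "Lovely spam! Wonderful spam!"

# search() scans the whole string for the first place the pattern matches.
result = re.search("spam", question)

# The span works like slice boundaries: start included, end excluded.
start, end = result.span()
sliced = question[start:end]
print(result.span(), repr(sliced))

# A Match object is truthy, so it can drive an if statement directly.
if result:
    print("found a match")
```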
04:06 I’ll do that again, this time storing it in a variable.
Evaluating this variable as a Boolean returns True, indicating that a match was found. I can run a method called .span() on the Match object that returns the lower and upper boundaries, another method called .start(), showing the lower boundary, and finally .end() to give you the upper boundary.
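A small sketch of those accessors, again assuming the same test string as before; the variable names here are illustrative and not taken from the lesson.

```python
import re

# Assumed test string, inferred from the text quoted later in the lesson.
question = "Lovely spam! Wonderful spam!"

m = re.search("spam", question)

found = bool(m)      # truthy because a match happened
bounds = m.span()    # (lower, upper) as a tuple
lower = m.start()    # lower boundary, included
upper = m.end()      # upper boundary, excluded
print(found, bounds, lower, upper)
```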
The .string attribute shows you what was being matched against. Somewhat confusingly, the re module also has a function called match().
To be clear as I’m moving forward, if I’m talking about the function, I will be explicit and say the match() function. Otherwise, I’m talking about a resulting Match object. The match() function matches the beginning of the string. This is the equivalent of using a caret anchor (^) inside of your regex.
05:07 This did not return anything, and that’s because no match was found. Let me do that again, this time storing it in a variable.
05:18 The variable doesn’t contain anything. Comparing it to None shows that it’s True. Or, converting it to a Boolean means it’s False. This shows you how you could test the results of your regular expression functions inside of an if statement.
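One way to picture the failed match is the sketch below. The pattern passed to match() is an assumption, because the transcript does not show which call returned nothing; "spam" is used here only because it does not occur at the start of the assumed test string.

```python
import re

# Assumed test string, inferred from the text quoted later in the lesson.
question = "Lovely spam! Wonderful spam!"

# Assumed call: match() only succeeds at the very beginning of the string.
result = re.match("spam", question)

is_none = result is None   # comparing against None
as_bool = bool(result)     # None is falsy

# search() with a caret anchor behaves the same way on this string.
anchored = re.search("^spam", question)

if result:
    print("matched at the start")
else:
    print("no match at the start")
```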
This regex was successful. It’s looking for the repetition of 5 word-like meta-characters. As the string starts with 'Lovel', which are all word characters, the match result shows a span of 0 to 5. Python 3.4 added a function called fullmatch().
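The successful match() call can be sketched as follows. The regex `\w{5}` is reconstructed from the description of five word characters, and the test string is assumed as before.

```python
import re

# Assumed test string, inferred from the text quoted later in the lesson.
question = "Lovely spam! Wonderful spam!"

# Five word characters, anchored to the start of the string by match().
result = re.match(r"\w{5}", question)

text = result.group()   # the matched characters
bounds = result.span()  # where they sit in the string
print(repr(text), bounds)
```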
As you might guess from its name, it’s looking for a regular expression that matches the entire string. Of course, looking for "spam", that’s not going to match the whole string, so once again, you’re getting back None.
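A minimal sketch of that difference, assuming the same test string: fullmatch() only succeeds when the pattern accounts for every character.

```python
import re

# Assumed test string, inferred from the text quoted later in the lesson.
question = "Lovely spam! Wonderful spam!"

# "spam" appears inside the string but does not cover all of it.
partial = re.fullmatch("spam", question)

# The same pattern does cover a string that is exactly "spam".
whole = re.fullmatch("spam", "spam")

print(partial, bool(whole))
```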
06:29 Let’s break this regular expression down. Looking at the inner group first, there’s the word meta-character with zero or more instances, followed by the whitespace character with zero or more instances.
06:41 So I’m looking for something that looks like an actual word. That is inside of a group. That group repeats itself zero or more times, and then is followed by an exclamation mark. All of that is grouped.
The outside group can be repeated zero or more times. This is successful because it matches the two sub-parts of this string. "Lovely spam!" and "Wonderful spam!" each match the outer group. And because the outer group is repeated, this regular expression matches the entire string, giving a truthy value for fullmatch().
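The pattern below is a reconstruction from the spoken breakdown, not a copy of the on-screen regex, so the exact spelling is an assumption. It is only meant to show how the nested groups described above can cover the whole (assumed) test string.

```python
import re

# Assumed test string, inferred from the text quoted in the lesson.
question = "Lovely spam! Wonderful spam!"

# Inner group: zero or more word characters, then zero or more whitespace.
# That group repeats, is followed by "!", and the whole thing is grouped
# and repeated again. Reconstructed from the description, so an assumption.
pattern = r"((\w*\s*)*!)*"

result = re.fullmatch(pattern, question)
print(bool(result), result.span())
```

Patterns that nest one repeated group inside another can backtrack heavily on inputs that fail to match, so a flatter pattern may be a safer choice outside of a demonstration.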
Another function the library has is findall(). Unlike the other functions I’ve shown you so far, findall() doesn’t return a Match object. It returns a list. It applies the regular expression and finds each match inside of the string, returning the matching characters in a list.
This regular expression is looking for a vowel, followed by not a vowel. Inside of "Lovely spam! Wonderful spam!" you have matches in words such as 'spam', et cetera.
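Here is a sketch of that findall() call. The regex is reconstructed from the description (a lowercase vowel followed by anything that is not a lowercase vowel), so its exact form is an assumption.

```python
import re

question = "Lovely spam! Wonderful spam!"

# A vowel followed by one character that is not a vowel (assumed spelling).
pattern = r"[aeiou][^aeiou]"

found = re.findall(pattern, question)
print(found)
```

Note that the negated class also accepts spaces and punctuation, not only consonants.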
findall() returns a list. Sometimes instead of wanting a list, you want an iterator. Enter finditer(). It essentially does the same thing as findall() but returns an iterator of Match objects instead of a list of strings. This is more efficient in memory if you’re doing a large number of matches.
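The same search with finditer() might look like the sketch below, using the same assumed pattern as for findall(). Each item the iterator yields is a Match object, so the position of every hit is available as well as its text.

```python
import re

question = "Lovely spam! Wonderful spam!"

# Same assumed pattern: a vowel followed by a non-vowel.
pattern = r"[aeiou][^aeiou]"

spans = []
for m in re.finditer(pattern, question):
    # Matches are produced one at a time instead of being built into a list up front.
    spans.append((m.group(), m.span()))
    print(m.group(), m.span())
```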
08:17 Well, that was your first exposure to using regular expressions inside of Python. Next up, I’ll show you how to take advantage of grouped results.
Yep, I find I use .findall() the most myself, but there are cases where you’re looking for the beginning to match. Think of it like using .startswith in strings instead of searching the whole string. If you are only looking at the beginning, .match() will definitely be faster, especially for longer chunks of data to be matched against.
@Walid A common use case for re.match() is password or input validation. Although, re.fullmatch() is usually a better choice in such a case.
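To illustrate why re.fullmatch() tends to suit validation better, here is a small sketch with a made-up rule (exactly four digits). The rule and the input are hypothetical and only serve the comparison.

```python
import re

# Hypothetical validation rule: the input must be exactly four digits.
pin_pattern = r"\d{4}"
candidate = "1234abc"

# match() only checks the beginning, so trailing junk slips through.
prefix_ok = re.match(pin_pattern, candidate) is not None

# fullmatch() requires the whole input to fit the pattern.
whole_ok = re.fullmatch(pin_pattern, candidate) is not None

print(prefix_ok, whole_ok)
```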
Thank you @Bartosz. I wish there was a like button ;-)
Christopher, your clarification of what multiline mode means may well have launched my career. Thanks! Whatever we are paying you… it ain’t enough.
Glad you’re finding the course helpful.
Walid on Jan. 19, 2021
re.match() seems like a very restrictive use case! Not sure why one would use it?