Explore a Match Object
00:00
In the previous lesson, you identified a substring using re.search(), and what you got from running the function was a Match object. In this lesson, you’ll see how you can get information about the substring from that Match object. You could assign that to a variable.
00:16
Often these are named m, which stands for match. That’s not super descriptive, but it’s somewhat of a convention, so I’m going to use it here.
00:24
I can separately address the information that is saved in this Match object. I can say m.group(),
00:34
and while using a method named .group() might sound a bit strange, bear with me for a moment. I’ll show you in the upcoming lesson why the term group makes sense.
00:44
And in this case, this right away gives you the word that got matched, and then you can say m.span(), and that would give you the beginning and the end index of the substring that got matched in here. So if you use the re module and, for example, .search(), you can find a substring according to somewhat more elaborate conditions than you did when, for example, you used .index() to get just the index.
01:12 With .index(), you got only the beginning index. In this case, you get both the beginning and the end index, and you also get the actual string that got matched. So this gives you more to work with if you really need to do something with the substring that you’re searching for. And you can also match other things.
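As a rough illustration of the difference described above, here is a minimal sketch. The sample text and the pattern are assumptions made up for this example, not the ones shown on screen in the lesson.

```python
import re

# Hypothetical sample text, not the one used in the lesson.
text = "I secretly kept the secret. Nobody knew the secret, though."

# str.index() returns only the starting index of the first occurrence.
start = text.index("secret")
print(start)  # 2

# re.search() returns a Match object (or None if nothing matches).
# The pattern here is an assumed example: "secret" plus more word characters.
m = re.search(r"secret\w+", text)

print(m.group())  # secretly
print(m.span())   # (2, 10)

# The span works as a slice into the original string.
begin, end = m.span()
print(text[begin:end])  # secretly
```

Because re.search() returns None when there is no match, real code would usually check `if m:` before calling .group() or .span().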
01:27
So for example, if you didn’t want to match "secretly", but you wanted to match just the word "secret" when it’s followed by a punctuation character.
01:37
Let’s say just "secret." or "secret," in that case, but not the word "secret" with a whitespace before and after it, and also not the word "secretly".
01:47 I can write a different regex pattern that in this case would look
01:54 like so. Let me copy this.
01:59
So by opening and closing a square bracket here, I can define some characters where just one of them needs to match. So I’m going to say a dot, and I’m escaping this with the backslash (\) character because the dot has a special meaning in regex, and then also a comma.
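The pattern being described is probably something like the one below. This is a sketch: the exact pattern on screen isn’t part of the transcript, and the test strings are made up for illustration.

```python
import re

# Assumed reconstruction of the pattern: "secret" followed by a dot or a comma.
pattern = r"secret[\.,]"

# Hypothetical test strings covering the cases mentioned in the lesson.
candidates = ["secret.", "secret,", "secret ", "secretly"]

# Record whether the pattern is found in each candidate.
results = {s: bool(re.search(pattern, s)) for s in candidates}
print(results)
# {'secret.': True, 'secret,': True, 'secret ': False, 'secretly': False}
```

Inside square brackets a dot is already treated as a literal character, so `r"secret[.,]"` would behave the same way. The backslash is harmless here and makes the intent explicit.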
02:14
So with this, I’m saying give me anything that is the string "secret" and then followed by either a dot or a comma. Now if I run this,
02:27
you see that we do get a match and the match is "secret.", but what happened to the comma?
02:35
And the reason is that "secret." appears first in the text, and .search() is just going to return the first match that it finds when scanning the string that you’re searching in from the beginning. So in that case, after it finds "secret.", it basically stops and just returns that match.
02:54
And that’s not dependent on the order of the characters here in the square brackets. If you wrote the comma first, and then the dot, Python would still match the substring "secret." just because it appears first in the text that you’re searching in. All right, so this seems useful, but at the same time, what if you do want to have both of those results back?
03:15 You know that there’s something else in there. In that case, you want to use a different function, and you’ll learn more about that one in the next lesson.
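To make the first-match behavior described above concrete, here is a small sketch. The sample text is a stand-in invented for this example, and the patterns are assumed reconstructions of what was typed in the lesson.

```python
import re

# Hypothetical sample text in which "secret." comes before "secret,".
text = "I secretly kept the secret. Nobody knew the secret, though."

# Dot first, then comma inside the square brackets.
m1 = re.search(r"secret[\.,]", text)

# Comma first, then dot: the order inside the brackets doesn't matter.
m2 = re.search(r"secret[,\.]", text)

print(m1.group(), m1.span())  # secret. (20, 27)
print(m2.group(), m2.span())  # secret. (20, 27)
```

Both calls return the same match, because re.search() reports the first position in the text where the pattern matches and then stops. The later "secret," is never reported.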