Find All Pattern Matches
00:01
Previously, you’ve used re.search()
to look for appearances of the word "secret"
that are then delimited by a dot or a comma. And using .search()
only gave you the first match.
00:13
So what about the second one? For that, you want a different function. You can use re.findall(),
00:23 which also shows you how powerful these regular expression patterns actually are. If you run this instead, you’ll see that Python returns a list of two strings.
00:34
One is the first one—that’s "secret."
from up here—and then also "secret,"
from this line in the text that you searched in. So that’s pretty cool.
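Here is a minimal sketch of that call. The sample text below is made up for illustration and is not the text used in the lesson, and the pattern is an assumption based on the description (the word "secret" followed by a dot or a comma):

```python
import re

# Hypothetical sample text, not the one shown in the lesson.
text = (
    "The box holds a secret.\n"
    "Nobody shares the secret, and the secretary keeps quiet.\n"
)

# Assumed pattern: the word "secret" followed by a dot or a comma.
matches = re.findall(r"secret[.,]", text)

print(matches)  # ['secret.', 'secret,']
```

Note that "secretary" isn't matched, because the character after "secret" there is neither a dot nor a comma.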
00:46 Maybe you do want to find these two specific instances of the substring, but you don’t want to keep the punctuation characters with them. You want to clean it up a bit more.
00:58
And you can do that already in the search by using something that’s called a match group. So if you wrap part of the pattern in parentheses, (), then you create a match group. So this is the first match group here.
01:12
And while the whole pattern still looks for the word "secret", followed by either a dot or a comma, what you get returned from .findall() here is only going to be the word "secret", which is what you’re wrapping inside of this match group.
01:26 So let me show you this.
01:28
If I run the .findall()
function using this match group, then you can see that from the two strings with the dot and the comma, you are only getting the word back.
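As a rough sketch of that idea, again with a made-up sample text and an assumed pattern rather than the exact ones from the lesson, a single match group around the word could look like this:

```python
import re

# Hypothetical sample text, not the one shown in the lesson.
text = (
    "The box holds a secret.\n"
    "Nobody shares the secret, and the secretary keeps quiet.\n"
)

# The parentheses create one match group around the word only.
# The punctuation is still required for a match, but it isn't returned.
words = re.findall(r"(secret)[.,]", text)

print(words)  # ['secret', 'secret']
```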
01:38 So that’s pretty neat, and it gives you a lot of power to look around. Just for the fun of it, let me show you that you get something different depending on how you create these match groups. So I could also say, give me back just the last character of the word and then the punctuation character.
02:00 So what you actually get returned in this list depends on where you place these parentheses.
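To illustrate how moving the parentheses changes the result, here is a small sketch. The sample text and the exact placement of the group are assumptions chosen for illustration:

```python
import re

# Hypothetical sample text, not the one shown in the lesson.
text = (
    "The box holds a secret.\n"
    "Nobody shares the secret, and the secretary keeps quiet.\n"
)

# The group now wraps only the final "t" and the punctuation character.
tails = re.findall(r"secre(t[.,])", text)

print(tails)  # ['t.', 't,']
```

The overall pattern matches the same two places in the text as before. Only the part inside the parentheses ends up in the list.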
02:07 And you could see that in these cases, you always got back a list of strings. The return value could also be a list of tuples, and that’s when you’re using more than one capturing group in the pattern.
02:18 So let me say for example, you want to get the punctuation separately. You can create two capturing groups, the first one being the word and the second one, the punctuation.
02:31 If I run this, you can see that the return value is still a list, but instead of strings, it contains tuples, each holding the strings of all of the capturing groups for one match.
02:46
So that’s just something to keep in mind. If you only have one capturing group, then .findall()
returns a list of strings. If you have more than one capturing group in your pattern, then it’s going to return a list of tuples. Okay, pretty cool.
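A short sketch of the two-group case, assuming the same made-up sample text and a pattern with one group for the word and one for the punctuation:

```python
import re

# Hypothetical sample text, not the one shown in the lesson.
text = (
    "The box holds a secret.\n"
    "Nobody shares the secret, and the secretary keeps quiet.\n"
)

# Two capturing groups: the word first, the punctuation second.
pairs = re.findall(r"(secret)([.,])", text)

print(pairs)  # [('secret', '.'), ('secret', ',')]

# Each tuple can be unpacked into its groups.
for word, punctuation in pairs:
    print(f"word={word!r}, punctuation={punctuation!r}")
```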
03:01
You’ve found multiple occurrences that match your conditions, but you’ve lost something compared to using .search() before. You don’t have a Match object anymore.
03:13
You don’t have the additional information, such as the start and the end index. So in the next lesson, you’ll see how you can use yet another function to get back multiple Match objects that you can work with further.
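To make that trade-off concrete, here is a brief sketch comparing the two functions you’ve seen so far. The sample text and pattern are assumptions for illustration, not the ones from the lesson:

```python
import re

# Hypothetical sample text, not the one shown in the lesson.
text = (
    "The box holds a secret.\n"
    "Nobody shares the secret, and the secretary keeps quiet.\n"
)
pattern = r"secret[.,]"

# re.search() returns a Match object for the first match only,
# which knows where in the text the match was found.
first = re.search(pattern, text)
print(first.group(), first.start(), first.end())
print(text[first.start():first.end()])

# re.findall() returns every match, but only as plain strings,
# so the position information is gone.
found = re.findall(pattern, text)
print(found)
```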