Match a Substring Using re.search()
00:00 Let’s think about one of the issues that you may have encountered early on. While you did lowercase all of the text to make it a bit more general to search, you might have noticed that, because every character is just a character for Python, you can still get results that you’re not expecting.
00:20 So earlier, you were looking for how often secret is in there, and the count gave you four, even though one of those is actually a different word, secretly. Still, the word secret is in there, so the count makes complete sense according to Python.
00:34 But now maybe you’re looking for a way to identify this one specific word that includes secret but then continues, like maybe there’s more of those in there, right?
00:44 So one way of doing this is using the re module from the standard library, which you’ll need to import. So I’m going to say import re, and then I can use re.search()
00:58 and pass it a regular expression pattern.
01:04 And here I’m adding the r before the string to create a raw string. This is normal Python and doesn’t have anything to do with regex. In a raw string, Python treats the string literally without interpreting any special characters.
01:17 This means that an escape sequence, which in Python starts with a backslash character (\), isn’t interpreted. Think of \n, which stands for the newline character.
01:27 If the combination of these two characters is in a normal Python string, then it stands for a newline character. If it’s in a raw string, then Python treats it as two separate characters: \ and n. Such raw strings are useful for writing regex patterns because the backslash character also has a special meaning in regular expressions. And if you use a raw string, then you can avoid having to escape every backslash with a second backslash. For now, the quick takeaway is that prefixing your regex pattern with r can make writing it a bit less complicated, and you can use every bit of reduced complication that you can get with regex patterns.
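As a quick illustration of the raw string idea, here is a minimal sketch. The variable names normal and raw are made up for this example and don’t appear in the lesson.

```python
# A normal string interprets \n as a single newline character.
normal = "a\nb"

# A raw string keeps the backslash and the n as two separate characters.
raw = r"a\nb"

print(len(normal))  # 3
print(len(raw))     # 4

# For regex patterns, the raw string r"\w" is the same as the escaped "\\w".
print(r"\w" == "\\w")  # True
```

Both spellings of the pattern work, but the raw string is usually easier to read once a pattern contains several backslashes.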
02:06 In this case, I’m going to say "secret", so similar to the substring that you searched before. Now I’m adding a regex word character, that is \w. It’s a placeholder for any word character.
02:20 And in regex, a word character means any letter, digit, or underscore, which means that it’s going to match this l here that comes after, but it won’t match a whitespace that would come after, or a dot, for that matter. So in this case, it’s going to match the l, but you want to get the full word, so you need more than one word character, and you can do that by adding the quantifier plus (+) at the end.
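To see the difference the plus quantifier makes, here is a small sketch. The sample_text string is a made-up stand-in, not the text used in the lesson, and .group() is just one way to read the matched substring.

```python
import re

# Hypothetical sample text, not the text from the lesson.
sample_text = "they kept the secret. she secretly smiled."

# \w matches exactly one word character after "secret".
one_char = re.search(r"secret\w", sample_text)

# \w+ matches one or more word characters after "secret".
whole_word = re.search(r"secret\w+", sample_text)

print(one_char.group())    # secretl
print(whole_word.group())  # secretly
```

Note that the first "secret" is followed by a dot, which isn’t a word character, so both patterns skip it and match inside "secretly" instead.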
02:47 So this regular expression pattern is going to find the word "secretly" here, but it’s more flexible than that. The same pattern would also match other words such as "secrets" or "secretary", or even "secret_9".
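You could check that flexibility with a short loop like the one below. This is only a sketch: the candidates list and the results dictionary are assumptions added for illustration, and the last two entries are included to show what the pattern doesn’t match.

```python
import re

pattern = r"secret\w+"

# Illustrative inputs. The last two have no word character after "secret".
candidates = ["secretly", "secrets", "secretary", "secret_9", "secret", "secret."]

# re.search() returns None when there is no match, so bool() gives False.
results = {word: bool(re.search(pattern, word)) for word in candidates}

for word, matched in results.items():
    print(word, matched)
```

The first four words match, while plain "secret" and "secret." don’t, because the plus quantifier requires at least one word character after "secret".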
03:04 That’s the kind of flexibility you get when using regular expressions.
03:08 And of course, you also need to say where it can search, and that’s going to be text_lower. If you press Enter here, you can see that the re module returns a Match object,
03:23 and you have quite a bit of information in there. So you can see that it matched the word 'secretly', and it also tells you where the substring starts and where it ends.
03:34 So I get quite a bit of information in here, basically out of the box.
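Put together, the call could look like the sketch below. The contents of text_lower here are a hypothetical stand-in, so the positions in the output will differ from what you see with the text from the lesson.

```python
import re

# Hypothetical stand-in for the lowercased text from the lesson.
text_lower = "it was a secret plan that she secretly kept."

match = re.search(r"secret\w+", text_lower)

# The representation shows the start and end positions and the matched text.
print(match)  # <re.Match object; span=(30, 38), match='secretly'>

# Slicing the text with the span gives back the matched substring.
start, end = match.span()
print(text_lower[start:end])  # secretly
```

If the pattern doesn’t occur in the text at all, re.search() returns None instead of a Match object, so it’s worth checking for that before working with the result.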
03:40 Now you can work with that, right? In the next lesson, you’ll pick apart this Match object and learn about a few methods that you can use to extract different pieces of information from it.