Plain Matching and Class Matching
00:00 In the previous lesson, I gave you a quick overview of what regular expressions are and how to use them. In this lesson, I’ll show you your first regular expression, showing you how to do plain character matching. Regexes can range in complexity.
The simplest concept is just a plain string match. This regular expression looks for the five letters "thing" inside of the text it’s being applied to. It’ll match anywhere those letters appear. To be strictly correct, I shouldn’t call it a word: "thing" will match "thing" on the end of "something", as well as "thing" on its own.
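As a minimal sketch of how that plain match behaves, here is the same pattern run through Python's re module, which the course covers later. The sample strings are made up for illustration and are not the lesson's test text.

```python
import re

pattern = "thing"  # five literal characters, nothing about spacing

# Hypothetical sample strings, not the lesson's test text.
samples = ["thing", "something", "nothing to see", "think"]

# re.search returns a match object for the first hit, or None if there is none.
spans = {}
for text in samples:
    match = re.search(pattern, text)
    spans[text] = match.span() if match else None
    print(f"{text!r}: {spans[text]}")
```

The span is the start and end position of the match, so you can see that "something" matches partway through the string rather than at a word boundary.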
00:39 There’s nothing in this plain text match that talks about the spacing around it, so it’s looking for those five letters and nothing else. On the other end of the complexity spectrum, there’s the regex for matching an email address header as defined in RFC 822.
01:04 Don’t worry, no human being actually looks at this. It was created by combining several smaller regular expressions that are easier to read. Later on, I’ll show you where this came from. But for now, I’ll get you started with simpler matching. As I mentioned in the overview, regexes are a language unto themselves. In order to understand how to use them in Python, you’ll need to understand both regexes and how Python uses regexes.
I’ll just be showing you the regexes. Later on, I’ll show you how to put those inside of your Python code and how to use the re (regular expression) module. First up, I’ll show you some plain text matching.
01:53 And then after that, I’ll introduce you to character classes—a way of choosing from a range of characters. To introduce you to the language of regular expressions, I’m going to be using an online tool called pythex.
02:06 There are several different websites out there that help you build and debug regular expressions. This is just one of them. In the further reading section, I’ll show you a few more. The tool consists of three basic parts.
02:19 The first line is where the actual regular expression is. Then there’s a text area of the string that you are testing the regular expression against. And then down at the bottom, there’s an area showing you the matching. Throughout these lessons I’m going to be using the same chunk of text, so you’ll be very familiar with it by the time you’re done.
Starting with a straight text match, I’ve put in Super with a capital S. The test string is looked through and a result is shown. Because all this doesn’t fit on the screen at once, I’m going to shrink the test string. As the matched result shows the entire message and highlights where the matches are, I don’t need both on the screen at the same time, so I’ll just be leaving this minimized. When you look at the match results, you’ll see the word 'Super' is highlighted three times.
'failing' gets matched. If you don’t include information about the whitespace inside of the regex, you will get a straight text match, whether it’s a word on its own or part of a longer word.
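Here is a rough Python equivalent of what the tool is doing with the Super pattern. The sample text is invented for this sketch and is not the lesson's test string. It also shows that matching is case-sensitive by default, so a lowercase "super" is skipped.

```python
import re

# Invented sample text; the lesson's own test string is different.
text = "Super glue, Superman, and SuperDuper deals are not super cheap."

# Collect the start position of every non-overlapping match.
hits = [m.start() for m in re.finditer("Super", text)]
print(len(hits), hits)
```

"Superman" and "SuperDuper" both count, because nothing in the pattern says the match has to be a word on its own.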
And here, I’m matching any vowel. So—you’re looking for a pattern that starts with the letter
t and then one of the vowels. Let’s look at some of the matches: the
05:09 You need to be careful when you’re using a tool like pythex to know when you’re looking at a match and when you’re looking at multiple matches—it doesn’t make the distinction in the resulting screen.
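One way to tell adjacent matches apart is to ask Python for each match and its position. This is a small sketch: the pattern t[aeiou] is my reading of the expression described above (an assumption, since the transcript doesn't show it), and "potato" is a made-up test word.

```python
import re

# Assumed pattern: the letter t followed by exactly one vowel.
pattern = r"t[aeiou]"

# Made-up test word with two matches sitting right next to each other.
matches = [(m.group(), m.span()) for m in re.finditer(pattern, "potato")]

for text, span in matches:
    print(text, span)
```

A highlighter would show "tato" as one continuous block, while finditer reports two separate two-character matches.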
You can also do ranges on capital letters. Here’s another expression. This matches a single capital letter followed by either a capital letter or a small letter. This finds 'RP', which is two capital letters, and 'St', which is a capital letter and a small letter.
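A quick sketch of that idea in Python. The pattern [A-Z][A-Za-z] is an assumption based on the description above, and the sample text is hypothetical.

```python
import re

# Assumed pattern: one capital letter, then any capital or small letter.
pattern = r"[A-Z][A-Za-z]"

# Hypothetical sample text.
text = "Model RP-7 on Main St"

pairs = re.findall(pattern, text)
print(pairs)
```

Notice that the start of any capitalized word matches too, not just abbreviations like 'RP' and 'St'.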
This expression combines the ideas. This is looking for any digit followed by an even digit, or more specifically, not an odd digit—not the digits
9. Matches include
'1a'! Notice that the second part only rules out those digits, so anything else, letters included, will match. This is why tools like pythex are useful when you’re building your regular expressions.
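To make the surprise concrete, here is a sketch comparing a negated class with a class that lists the even digits explicitly. Both patterns and the sample text are my own illustration, assumed from the description above rather than copied from the lesson.

```python
import re

text = "12 1a 35 7"  # made-up sample

# Assumed pattern: a digit, then any character that is NOT an odd digit.
not_odd = re.findall(r"[0-9][^13579]", text)

# Stricter alternative: a digit, then one of the even digits.
even_only = re.findall(r"[0-9][02468]", text)

print(not_odd)
print(even_only)
```

The negated class accepts letters and even a space after the digit, while the explicit list only accepts an even digit.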
So this expression says to match the number sign (
#), the colon (
:), or the caret (
^). Part of Wile E. Coyote’s cartoon utterance of a swear word is being matched. Notice, this is actually two matches—the
'#' and the
To match a literal square bracket, you put it inside the square brackets of a character class. This regex has three pieces. The first part is the left square bracket character, the second part is a capital letter, and the third part is the right square bracket character. Combined, these match '[X]' inside of the model number.
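Both of these character class tricks can be sketched in Python. The sample strings are hypothetical stand-ins for the lesson's test text, and the patterns are my reading of the expressions described above.

```python
import re
import warnings

# Hypothetical sample strings standing in for the lesson's test text.
utterance = "What the #^!@"
model = "Model ZX-[X]-9"

# Inside a class, # : and ^ are ordinary characters
# (^ only negates the class when it comes first).
symbols = re.findall(r"[#:^]", utterance)

# Some Python versions may warn about a possible nested set when a class
# starts with "[", so warnings are silenced around this call.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    bracketed = re.findall(r"[[][A-Z][]]", model)

print(symbols)
print(bracketed)
```

The class matches one character at a time, which is why the symbols come back as separate matches rather than one.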
Alternatively, you can use the backslash to escape the special character. Backslash left bracket (\[) and backslash right bracket (\]) tell the regular expression to actually look for left and right bracket characters, matching the '[X]' in the model number below.
10:09 You have to be very careful with this. Escape sequences are also used in Python strings to escape other kinds of characters, so when you have a string containing a regular expression, you can end up with escape sequences of escape sequences.
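A common way to sidestep the double escaping is a raw string, which leaves backslashes alone. This is a small sketch with a hypothetical model number, not the lesson's test text.

```python
import re

model = "Model ZX-[X]-9"  # hypothetical sample text

raw = r"\[[A-Z]\]"        # raw string: backslashes reach the regex engine as-is
doubled = "\\[[A-Z]\\]"   # regular string: each backslash must itself be escaped

same_pattern = raw == doubled
found = re.findall(raw, model)

# "\b" in a regular string is a single backspace character,
# while r"\b" keeps both the backslash and the letter b.
lengths = (len("\b"), len(r"\b"))

print(same_pattern, found, lengths)
```

Both spellings produce the same pattern text, but the raw version is easier to read and harder to get wrong.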
There’s another way of expressing this concept. The pipe operator acts as OR, so this expression is equivalent to the previous one, using either the letter A or the letter C. With a short expression like this one, it doesn’t make much difference which mechanism you use. When you start to write more complicated expressions and group things together, the OR symbol allows you to do things that square bracket character sets do not. So, those were your first regular expressions.
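Here is a short sketch of that equivalence in Python, plus one thing the pipe can do that a character set can't. The sample strings and the cat|dog pattern are made up for illustration.

```python
import re

text = "ABCABC"  # made-up sample

# For single characters, a class and an alternation find the same things.
with_class = re.findall(r"[AC]", text)
with_pipe = re.findall(r"A|C", text)

# Alternation can also choose between multi-character alternatives,
# which a character class cannot express.
words = re.findall(r"cat|dog", "hotdog, cat, cot")

print(with_class, with_pipe, words)
```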