00:11 I’m going to cover four concepts using pythex: conditional matches, lookahead matches, lookbehind matches, and comments. Conditional matches change the matching behavior based on whether or not something is present.
Comments are, well, comments. We’ll start with a familiar plain text match inside of a group, looking for the word ACME. I can add a quantifier to say this can occur zero or more times. For the purposes of this expression, nothing’s changed.
The part that’s been added is inside of this group. The question mark with brackets says that this is a conditional group. The 1 is a backreference, so this conditional depends on the presence of backreference 1.
You can see the two different places where this is happening. The first match has 'Super', and because 'ACME' is present, it’s including the 'Out' part of 'Outfit', but only the 'Out' part.
You can also see how this works out inside of the groups. For the first match, 'ACME' is found and is set for the first group. The conditional is run, so the first part of the conditional is evaluated, which is the second group, in this case 'Out'. The third group, which is the 'fit', does not get evaluated because the conditional passed. Match 2 is the opposite. The first group is empty, the second group is empty because it’s part of the True portion of the conditional, and the third group is run because it’s part of the False portion of the conditional. Because the 'ACME' group wasn’t found, the second part of the conditional is run and 'fit' is matched.
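Here’s a minimal sketch of that conditional behavior using Python’s re module. The pattern and the two sample strings are assumptions made up for illustration, not necessarily the exact ones shown on screen.

```python
import re

# Hypothetical pattern: group 1 is the optional 'ACME'. The conditional
# (?(1)yes|no) tries (Out) if group 1 took part in the match, else (fit).
pattern = re.compile(r"(ACME)*\s*(?(1)(Out)|(fit))")

with_acme = pattern.search("Super ACME Outfit")  # hypothetical sample text
without_acme = pattern.search("Basic Outfit")    # hypothetical sample text

# Group 1 is set, so the True branch runs and group 3 is never evaluated.
print(with_acme.group(0), with_acme.groups())

# Group 1 is not set, so the False branch runs and group 2 stays unset.
print(without_acme.group(0), without_acme.groups())
```

Python reports a group that didn’t participate as None, which is what pythex displays as an empty group.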
The reason this is called a lookahead is that, at the point of the match, the engine looks ahead to see whether the next character is a t. If it does find it, then it matches, but it doesn’t include the t as part of the match.
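As a small sketch of the idea, the snippet below looks for an i that is followed by a t. The pattern and the sample word are assumptions chosen for illustration.

```python
import re

# Hypothetical example: match 'i' only when the next character is 't'.
match = re.search(r"i(?=t)", "writing")

matched_text = match.group(0)  # the text that was actually matched
matched_span = match.span()    # start and end offsets of the match

# Only the first 'i' qualifies; the second one is followed by 'n'.
all_matches = re.findall(r"i(?=t)", "writing")

print(matched_text, matched_span, all_matches)
```

The span is one character wide, which shows that the t was checked but not made part of the match.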
This matches the model numbers below. Because it’s a lookahead, only the '3990' actually participates in the match. But you’ll notice, because the lookahead is there, none of the other four-digit numbers is matched. The lookahead looks for the [<letter>] form, limiting this to just the digits inside of the model number.
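A rough Python equivalent might look like the following. The exact expression from the video isn’t reproduced in this transcript, so the pattern and the sample text here are assumptions.

```python
import re

# Hypothetical sample text with one model number and two other numbers.
text = "Model 3990[X] shipped in 2021, order 4512"

# Four digits, but only when followed by '[', an uppercase letter, and ']'.
model_digits = re.findall(r"\d{4}(?=\[[A-Z]\])", text)

print(model_digits)
```

The bracketed letter is required for the match to succeed, but it isn’t part of what gets returned.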
Changing the equal sign (=) to an exclamation mark (!) changes it to a negative lookahead, meaning “Only match situations where the digits are not followed by [<letter>].” This matches all the four-digit numbers that aren’t associated with the model number.
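The negative version can be sketched the same way. Again, the pattern and the sample text are assumptions for illustration only.

```python
import re

# Hypothetical sample text with one model number and two other numbers.
text = "Model 3990[X] shipped in 2021, order 4512"

# Four digits, but only when NOT followed by '[', an uppercase letter, ']'.
other_numbers = re.findall(r"\d{4}(?!\[[A-Z]\])", text)

print(other_numbers)
```

Because the digit count is fixed at four, the engine can’t shorten the match to slip past the lookahead, so the model number is skipped entirely.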
?<= is lookbehind. This is a similar concept to lookahead, but it happens before the pattern that you’re looking for. So in this situation, I’m looking to match the literal ], but only when it’s preceded by the pattern inside the lookbehind.
06:10 Regular expressions are built on something called finite-state machines. Finite-state machines only allow certain kinds of computing patterns, and this is not one of them. In the reference material in a later lesson, I’ll show you where you can dig out more information on how this works.
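To make the lookbehind described earlier concrete, here’s a small sketch. The bracketed-letter pattern and the sample text are assumptions. It also shows one restriction that Python’s re module places on lookbehind: the pattern inside must be fixed width, so adding a quantifier like + is rejected at compile time.

```python
import re

text = "3990[X] and [12]"  # hypothetical sample text

# Match a literal ']' only when preceded by '[' and one uppercase letter.
closing = re.compile(r"(?<=\[[A-Z])\]")
positions = [m.start() for m in closing.finditer(text)]

# A variable-width lookbehind fails to compile in the re module.
try:
    re.compile(r"(?<=\[[A-Z]+)\]")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(positions, variable_width_ok)
```

Only the ] that closes [X] is matched. The one that closes [12] is skipped because the two characters before it aren’t a bracket and a letter.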
You can put comments inside of your regular expression. ?# is the comment symbol. Anything inside of the group is ignored. This is part of the regular expression standard. In Python, where you have the VERBOSE flag, I would much rather use that.
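For comparison, here’s a sketch of the same hypothetical pattern written both ways: once with inline comment groups and once with the VERBOSE flag. The pattern and the sample text are assumptions for illustration.

```python
import re

text = "3990[X] shipped in 2021"  # hypothetical sample text

# Inline comment groups: everything inside (?#...) is ignored.
inline = re.compile(r"\d{4}(?#four digits)(?=\[)(?#followed by a bracket)")

# VERBOSE mode: whitespace is ignored and '#' starts a comment.
verbose = re.compile(
    r"""
    \d{4}    # four digits
    (?=\[)   # followed by an opening square bracket
    """,
    re.VERBOSE,
)

inline_result = inline.findall(text)
verbose_result = verbose.findall(text)

print(inline_result, verbose_result)
```

Both versions match the same text. The VERBOSE form tends to be easier to read because each piece of the pattern sits on its own line next to its explanation.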