Matching Filename Patterns
00:00 You now have the ability to create and manipulate files and directories in a number of different ways, but a convenient thing to have access to would be some way to filter the different types and subsegments of files that you want to interact with, based on characteristics of those files.
The functions that I’ll be using for pattern matching are listed in ascending order of complexity, but also of convenience. So first off, there are .startswith() and .endswith(), which just operate on strings. They’re built-in methods of Python’s str type. Those can be useful when you’re dealing strictly with filenames. So, you could say filename.endswith('.txt') or something like that; they take in a substring as a parameter.
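As a minimal sketch of that idea, the snippet below filters a list of names with the string methods. The filenames are hypothetical and chosen only for illustration; they are not the files from the lesson’s sample directory.

```python
# Hypothetical filenames, used only for illustration.
filenames = ["notes.txt", "data_01.txt", "data_01_backup.txt", "script.py"]

# Keep only the names that end with ".txt".
text_files = [name for name in filenames if name.endswith(".txt")]

# .startswith() does the same kind of check at the beginning of the string.
data_files = [name for name in filenames if name.startswith("data")]

print(text_files)  # ['notes.txt', 'data_01.txt', 'data_01_backup.txt']
print(data_files)  # ['data_01.txt', 'data_01_backup.txt']
```

Both methods only look at one end of the string, which is why they stop being enough once the interesting part of the name is in the middle.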
Then you have fnmatch.fnmatch(), which takes in a filename and then a pattern. That pattern follows the conventions that the Unix or Bash shell uses to match filenames. Then it simply returns whether the filename matches that pattern. So, it’s a little bit more complex, but a little bit more useful than .endswith() because it can deal with patterns in the middle of strings and more sophisticated patterns than just plain text. Then you have glob.glob(), which takes a pattern and returns the paths that match it. So it’s even a little bit more convenient than fnmatch because you don’t have to loop through the files. You just put in the pattern. But it’s also not quite as simple to use because you have to make sure your pattern takes into account all the different possible files that you could be dealing with. And then pathlib.Path().glob() works similarly to glob, and it just operates on a Path object, as usual.
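To make the contrast between these three options concrete, here is a rough sketch that runs the same pattern through each of them. It assumes a throwaway temporary directory with made-up filenames, so none of the names below come from the lesson’s sample directory.

```python
import fnmatch
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created only for this illustration.
    for name in ["data_01.txt", "data_02.txt", "admin.py"]:
        Path(tmp, name).touch()

    # fnmatch tests one name at a time, so you loop over the listing yourself.
    via_fnmatch = sorted(
        name for name in os.listdir(tmp) if fnmatch.fnmatch(name, "data_*.txt")
    )

    # glob takes the pattern and searches the directory for you.
    via_glob = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(tmp, "data_*.txt"))
    )

    # Path.glob() does the same search starting from a Path object.
    via_pathlib = sorted(path.name for path in Path(tmp).glob("data_*.txt"))

print(via_fnmatch)  # ['data_01.txt', 'data_02.txt']
print(via_glob)     # ['data_01.txt', 'data_02.txt']
print(via_pathlib)  # ['data_01.txt', 'data_02.txt']
```

All three find the same files here; what changes is who does the looping and whether you get back strings or Path objects.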
01:58 So, let’s take a look at how these work in the REPL. The sample directory that I’ll use for this has a bunch of similarly named files that’ll be useful for taking a look at with pattern matching.
03:13 I’ll leave you to try that out a little bit on your own. But, as you can see, this might fail. It starts to become a little bit more difficult to work with when you want to say something like, “Well, what if I want to match for something that’s in the middle of the string,” right?
You could use something like the in operator to check whether a piece of text is a substring of the filename, but that starts to become kind of complex, and it starts to become a bit of a maintenance hassle, where you start to say, “Well, what am I looking for? What is a substring of what? How can I look for this? How do I easily check if it’s a match?” A better way to do that is with fnmatch. So you can say
fnmatch.fnmatch(), and you just pass in some name, so let’s say something like
"data"—let’s use one of these actual files here—so,
04:08 Then you just pass in a pattern. And this gives you access to all kinds of awesome things like wildcard characters. The * wildcard stands for any sequence of characters, including none at all.
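A small sketch of wildcard matching with fnmatch.fnmatch() might look like the following. The filenames are assumptions made up for the example, not necessarily the ones in the lesson’s sample directory.

```python
import fnmatch

# Hypothetical filenames, used only for illustration.
filenames = [
    "data_01.txt",
    "data_01_backup.txt",
    "data_02_backup.txt",
    "admin.py",
]

# "*" can stand in for the part of the name you don't care about.
print(fnmatch.fnmatch("data_01_backup.txt", "data_*_backup.txt"))  # True
print(fnmatch.fnmatch("data_01.txt", "data_*_backup.txt"))  # False

# A wildcard on both sides finds the text anywhere in the name.
backups = [name for name in filenames if fnmatch.fnmatch(name, "*backup*")]
print(backups)  # ['data_01_backup.txt', 'data_02_backup.txt']
```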
As you can see, this isolates all of the things with backup in it. So, that’s pretty darn useful, and you can also use fnmatch with something called a character class, which is a set of characters in square brackets, such as [0-9], that matches any single one of them.
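Character classes can be sketched along these lines. The filenames and the exact patterns are assumptions for illustration; the patterns typed in the lesson may differ.

```python
import fnmatch

# Hypothetical filenames, used only for illustration.
filenames = [
    "data_1.txt",
    "data_01.txt",
    "data_02_backup.txt",
    "data_123.txt",
    "admin.py",
]

# Two character classes in a row require two consecutive digits in the name.
two_digits = [n for n in filenames if fnmatch.fnmatch(n, "*[0-9][0-9]*")]

# A single character class only requires one digit somewhere in the name.
any_digit = [n for n in filenames if fnmatch.fnmatch(n, "*[0-9]*")]

# Without the wildcards, the class pins down exactly one digit in that spot.
exactly_one = [n for n in filenames if fnmatch.fnmatch(n, "data_[0-9].txt")]

print(two_digits)   # ['data_01.txt', 'data_02_backup.txt', 'data_123.txt']
print(any_digit)    # ['data_1.txt', 'data_01.txt', 'data_02_backup.txt', 'data_123.txt']
print(exactly_one)  # ['data_1.txt']
```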
05:23 But then if I just delete one, then it will match anything with a single-digit number in it. So this is convenient and useful, but what it doesn’t let you do is easily search through your whole directory.
With glob.glob(), you just pass in a pathname and you tell it whether you want it to be recursive or not. So, this pathname is really a pattern, like one of the ones that you used with fnmatch.
So this will match anything in any directory. And then, of course, you’ll get both of the Python files in the subdirectory as well. As you might imagine, the pathlib.Path() option works really similarly.
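For reference, a recursive glob.glob() call of the kind discussed above could be sketched like this. The directory layout is built in a temporary folder and every file and directory name is hypothetical.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one Python file at the top, two in a subdirectory.
    Path(tmp, "sub_dir").mkdir()
    for rel in ["admin.py", "data_01.txt", "sub_dir/file1.py", "sub_dir/file2.py"]:
        Path(tmp, rel).touch()

    # Without "**", only the top-level directory is searched.
    top_level = sorted(
        os.path.relpath(path, tmp)
        for path in glob.glob(os.path.join(tmp, "*.py"))
    )

    # "**" together with recursive=True also descends into subdirectories.
    recursive = sorted(
        os.path.relpath(path, tmp)
        for path in glob.glob(os.path.join(tmp, "**", "*.py"), recursive=True)
    )

print(top_level)  # ['admin.py']
print(recursive)  # admin.py plus both files under sub_dir
```

Note that glob.glob() only treats ** as "any number of directories" when recursive=True is passed; otherwise it behaves like a plain *.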
07:14 You can either convert that to a list and deal with a little memory overhead, or you can loop through it. I’m going to convert to a list, just because that’s a little more convenient to do in a tutorial.
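The pathlib version might be sketched as follows, again assuming a temporary directory with made-up file names. It shows both choices mentioned above: looping over the result lazily or collecting it into a list.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout, created only for this illustration.
    (root / "sub_dir").mkdir()
    for rel in ["admin.py", "data_01.txt", "sub_dir/file1.py", "sub_dir/file2.py"]:
        (root / rel).touch()

    # .glob() hands back a lazy iterator rather than a list.
    matches = root.glob("*.py")
    returned_a_list = isinstance(matches, list)

    # Option 1: loop over it directly, one Path at a time.
    top_level = sorted(path.name for path in matches)

    # Option 2: collect everything into a list (here via sorted()).
    # A "**" pattern searches subdirectories without any extra argument.
    recursive = sorted(
        path.relative_to(root).as_posix() for path in root.glob("**/*.py")
    )

print(returned_a_list)  # False
print(top_level)        # ['admin.py']
print(recursive)        # ['admin.py', 'sub_dir/file1.py', 'sub_dir/file2.py']
```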
As you can see, it does the same thing as glob.glob(), except that it returns a generator object instead of a list, and a ** pattern recurses without you having to pass recursive=True. So, those are several different ways to pattern match in Python, and I would encourage you to also look at how the Bash shell does file pattern matching, because that will show you some even more useful little tricks that you can use with wildcard characters, with optional characters, with character classes. You can do a whole bunch of amazing stuff with it. Just check that out on your own. In the next lesson, I’m going to cover the os.walk() function, which lets you recursively walk over file trees and process the files as you like.