Matching Filename Patterns

00:00 You now have the ability to create and manipulate files and directories in a number of different ways, but a convenient thing to have access to would be some way to filter the different types and subsegments of files that you want to interact with, based on characteristics of those files.

00:16 Let’s say you want all files with a .txt extension, or something like that. Well, the way to do that is using something called filename pattern matching.

00:25 In this lesson, I’ll go over the main ways to do that in Python.

00:30 The different functions that I’ll be using for pattern matching are kind of listed in an ascending order of complexity, but also of convenience. So first off, there’s .startswith() and .endswith(), which just operate on strings.

00:44 They’re built-in methods of Python’s str type. Those can be useful when you’re dealing strictly with filenames. So, you could say filename.endswith('.txt') or something like that; each one takes the substring to check for as a parameter.
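As a minimal sketch of those two string methods, the snippet below filters a list of names. Every filename except "data_01_backup.txt" is a hypothetical stand-in, not taken from the lesson. It also shows that both methods accept a tuple of options, which is handy when more than one extension counts as a match.

```python
# Hypothetical filenames, used only for illustration.
filenames = ["data_01.txt", "data_01_backup.txt", "admin.py", "tests.py", "notes.md"]

# A single suffix or prefix is passed as a plain string.
text_files = [name for name in filenames if name.endswith(".txt")]
data_files = [name for name in filenames if name.startswith("data")]

# Both methods also accept a tuple and match if any entry fits.
code_or_notes = [name for name in filenames if name.endswith((".py", ".md"))]

print(text_files)
print(data_files)
print(code_or_notes)
```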

00:56 Then you have fnmatch.fnmatch(), which takes in a filename and then a pattern. That pattern is of the general form of the Unix or Bash shell, the way that you match filename patterns with that shell.

01:11 Then it simply returns whether the filename matches that pattern. So, it’s a little bit more complex, but a little bit more useful than .startswith() and .endswith() because it can deal with patterns in the middle of strings and more sophisticated patterns than just plain text. Then you have glob.glob().

01:28 glob stands for global, and it just takes in a search pattern and it returns a list of all the files in the current directory that match that pattern.

01:36 So it’s even a little bit more convenient than fnmatch because you don’t have to loop through the files. You just put in the pattern. But it’s also not quite as simple to use because you have to make sure your pattern takes into account all the different possible files that you could be dealing with. And then pathlib.Path.glob() works similarly to glob.glob(), and it just operates on a Path object, as usual.

01:58 So, let’s take a look at how these work in the REPL. The sample directory that I’ll use for this has a bunch of similarly named files that’ll be useful for taking a look at with pattern matching.

02:11 I have all of my imports up here and of course, I’ve imported all of these just because I want to demonstrate them all. In reality, you probably only want one, whichever one you like best.

02:20 I’ll just list the things in the directory. As you can see, you’ve got a lot of data files, text files, a couple of Python files, and then a subdirectory that also has some Python files in it.

02:31 The first thing that you might want to do is maybe you want to just isolate all the files that end in .txt, and that’s quite simple to do with just basic string functions.

02:41 You can say for fname in os.listdir():, and then relying on the fact that listdir() just returns a list of strings, you can say if fname.endswith(".txt"): print(fname).

02:58 As you see, that gives you all of the text files that you could want. And it’s really simple to use and pretty easy, especially if you are familiar with string methods.
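Written out as a script, the loop might look like the sketch below. It builds a throwaway directory first so that it runs anywhere; the file names other than "data_01_backup.txt" are hypothetical stand-ins for the lesson's sample directory, and the results are sorted only because os.listdir() makes no promise about order.

```python
import os
import tempfile

# Hypothetical file names standing in for the lesson's sample directory.
sample_names = ["data_01.txt", "data_01_backup.txt", "data_02.txt", "admin.py"]

with tempfile.TemporaryDirectory() as sample_dir:
    for name in sample_names:
        # Create an empty file for each name.
        open(os.path.join(sample_dir, name), "w").close()

    # os.listdir() returns plain strings, so string methods work directly.
    text_files = sorted(
        fname for fname in os.listdir(sample_dir) if fname.endswith(".txt")
    )

for fname in text_files:
    print(fname)
```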

03:07 And then, of course, you can do the same thing with .startswith(). You could say .startswith("data"), .startswith("t"), whatever you want to do.

03:13 I’ll leave you to try that out a little bit on your own. But, as you can see, this might fail. It starts to become a little bit more difficult to work with when you want to say something like, “Well, what if I want to match for something that’s in the middle of the string,” right?

03:25 “What if I want to find all files that contain the word 'backup',” or something like that, right? It’s not clear how you would get that with just .endswith() or .startswith().

03:34 You could use something like Python’s in operator to check whether "backup" is a substring of the filename, but that starts to become kind of complex, and it starts to become a bit of a maintenance hassle, where you start to say, “Well, oh, what am I looking for?

03:48 What is a substring of what? How can I look for this? How do I easily check if it’s a match?” A better way to do that is with fnmatch. So you can say fnmatch.fnmatch(), and you just pass in some name. Let’s use one of these actual files here, so, "data_01_backup.txt".

04:08 Then you just pass in a pattern. And this gives you access to all kinds of awesome things like wildcard characters. A wildcard character, written *, simply stands for zero or more characters of any kind.

04:22 It matches anything before the "backup" and anything after "backup". So in this case, this returns True because "data_01_backup.txt" does contain "backup", right?
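Here is a small sketch of that call on the filename from the lesson. The second and third patterns are extra illustrations that are not in the video; they are there to show that the pattern has to cover the whole name, not just part of it.

```python
import fnmatch

name = "data_01_backup.txt"

# "*" on both sides allows any text before and after "backup".
contains_backup = fnmatch.fnmatch(name, "*backup*")

# Without a leading "*", the name itself would have to start with "backup".
starts_with_backup = fnmatch.fnmatch(name, "backup*")

# The same wildcard covers the .endswith() use case.
is_text_file = fnmatch.fnmatch(name, "*.txt")

print(contains_backup, starts_with_backup, is_text_file)
```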

04:32 Then, if you want to filter it for everything in there, you could just say for fname in os.listdir(): if fnmatch.fnmatch(fname, "*backup*"),

04:47 and then, of course, you have to actually have a : at the end, because that’s how if statements work, and then print(fname) underneath.
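The loop can be sketched as follows. To keep it runnable anywhere, it iterates over a hypothetical list of names where the lesson uses os.listdir(). The last line uses fnmatch.filter(), a standard-library helper that is not shown in the video but does the same filtering in one call.

```python
import fnmatch

# Hypothetical names; in the lesson these come from os.listdir().
names = ["data_01.txt", "data_01_backup.txt", "data_02.txt", "data_02_backup.txt"]

backups = []
for fname in names:
    if fnmatch.fnmatch(fname, "*backup*"):
        backups.append(fname)
        print(fname)

# fnmatch.filter() returns the matching names as a list.
backups_via_filter = fnmatch.filter(names, "*backup*")
```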

04:54 As you can see, this isolates all of the things with backup in it. So, that’s pretty darn useful, and you can also use fnmatch with something called a character class.

05:05 You can check, for example, for anything that contains a two-digit number in it. And so again, this result is True because there is a two-digit number.

05:15 But if I delete one of these digits, then I get False, because there’s no longer a two-digit number in there and you need to match both of these character classes.

05:23 But then if I just delete one, then it will match anything with a single-digit number in it. So this is convenient and useful, but what it doesn’t let you do is easily search through your whole directory.
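The transcript does not show the exact patterns typed in the video, so the ones below are an assumption about what such a check could look like: [0-9] is a character class matching a single digit, and two of them in a row require two adjacent digits. The name "data_1_backup.txt" is a hypothetical variant with one digit removed.

```python
import fnmatch

# Assumed patterns: one character class per required digit.
two_digits = "*[0-9][0-9]*"
one_digit = "*[0-9]*"

# "01" satisfies both character classes.
has_two = fnmatch.fnmatch("data_01_backup.txt", two_digits)

# With a digit removed from the name, the two-digit pattern no longer fits.
has_two_after_delete = fnmatch.fnmatch("data_1_backup.txt", two_digits)

# Dropping one character class relaxes the pattern to a single digit.
has_one = fnmatch.fnmatch("data_1_backup.txt", one_digit)

print(has_two, has_two_after_delete, has_one)
```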

05:35 You have to do this looping logic. The way around that is with glob, which is short for global.

05:42 With glob.glob(), you just pass in a pathname and you tell it whether you want it to be recursive or not. So, this pathname is really a pattern, like one of the ones that you used with fnmatch.fnmatch().

05:54 So with that, I could say something like, “Oh, well, let’s take a look at the backups.” I can find everything with "backup" in the filename in the current directory, right?

06:04 Then I can do all of the same things that I could do with .startswith() or .endswith(), as well, by just using a wildcard character.

06:11 I can find all .py (Python) files.
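A rough sketch of both searches is below. It creates a temporary directory with hypothetical file names so that it does not depend on your current directory; in the lesson, the pattern is passed on its own and is resolved against the current directory. glob.escape() is used on the directory part only as a precaution, in case the temporary path contains pattern characters.

```python
import glob
import os
import tempfile

# Hypothetical file names standing in for the lesson's sample directory.
sample_names = [
    "data_01.txt", "data_01_backup.txt", "data_02_backup.txt",
    "admin.py", "tests.py",
]

with tempfile.TemporaryDirectory() as sample_dir:
    for name in sample_names:
        open(os.path.join(sample_dir, name), "w").close()

    # Escape the directory so only the last component acts as a pattern.
    root = glob.escape(sample_dir)
    backup_paths = glob.glob(os.path.join(root, "*backup*"))
    python_paths = glob.glob(os.path.join(root, "*.py"))

# glob.glob() returns paths in arbitrary order, so sort for display.
backups = sorted(os.path.basename(path) for path in backup_paths)
python_files = sorted(os.path.basename(path) for path in python_paths)

print(backups)
print(python_files)
```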

06:14 And now, what if you want to find these recursively? Because if I call os.listdir() on the "sub_dir" I’ll see that it has some Python files in it, as well.

06:23 I want to find all the Python files, you know, in all of the subdirectories. Well, that’s also relatively easy to do with the recursive=True option.

06:33 There’s one other thing you have to do, as well, which is you have to specify that it can be anything in any directory. The way to do that is with a double wildcard and then a slash, "**/".

06:42 So this will match anything in any directory. And then, of course, you’ll get both of the Python files in the subdirectory as well. As you might imagine, the pathlib.Path() option works really similarly.
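The recursive search might be sketched like this. The directory name "sub_dir" comes from the lesson, while the individual .py file names are hypothetical. The example compares a plain "*.py" search with a "**" search that has recursive=True set.

```python
import glob
import os
import tempfile

# Hypothetical .py files; only the "sub_dir" name comes from the lesson.
relative_paths = ["admin.py", "tests.py", "sub_dir/file1.py", "sub_dir/file2.py"]

with tempfile.TemporaryDirectory() as sample_dir:
    os.mkdir(os.path.join(sample_dir, "sub_dir"))
    for rel in relative_paths:
        open(os.path.join(sample_dir, *rel.split("/")), "w").close()

    root = glob.escape(sample_dir)

    # Only the top level of the directory is searched here.
    top_level_paths = glob.glob(os.path.join(root, "*.py"))

    # "**" with recursive=True also descends into subdirectories.
    all_paths = glob.glob(os.path.join(root, "**", "*.py"), recursive=True)

def relative(paths):
    """Return sorted paths relative to the sample directory, using "/"."""
    return sorted(
        os.path.relpath(path, sample_dir).replace(os.sep, "/") for path in paths
    )

top_level = relative(top_level_paths)
everything = relative(all_paths)

print(top_level)
print(everything)
```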

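06:54 You just have to create the Path object for the current directory, and then you can say path.glob(). With this one, a pattern that starts with the double wildcard is searched recursively without any extra argument, while a plain pattern like "*.py" still only looks in the directory itself.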

07:06 That’s convenient because you don’t even need to pass in another parameter. But you do need to close all your strings, of course. As you can see, it returns a generator.

07:14 You can either convert that to a list and deal with a little memory overhead, or you can loop through it. I’m going to convert to a list, just because that’s a little more convenient to do in a tutorial.
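A sketch of the pathlib version follows, again using a temporary directory and hypothetical .py file names inside the lesson's "sub_dir". It collects the lazily produced matches into sorted lists and also shows Path.rglob(), a related standard-library method not covered in the video that behaves like glob() with "**/" placed in front of the pattern.

```python
import tempfile
from pathlib import Path

# Hypothetical .py files; only the "sub_dir" name comes from the lesson.
relative_paths = ["admin.py", "tests.py", "sub_dir/file1.py", "sub_dir/file2.py"]

with tempfile.TemporaryDirectory() as sample_dir:
    root = Path(sample_dir)
    (root / "sub_dir").mkdir()
    for rel in relative_paths:
        (root / rel).touch()

    # glob() yields Path objects lazily; collect them while the files exist.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))

    # Without "**", only the directory itself is searched.
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.py"))

    # rglob() searches recursively with a plain pattern.
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(recursive)
print(top_level)
```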

07:24 As you can see, it does the same thing as glob.glob(), it just returns a generator object and it doesn’t need a separate recursive argument. So, those are several different ways to pattern match in Python. I would encourage you to also look at the Bash shell ways of file pattern matching, because that will show you some even more useful little tricks that you can use with wildcard characters, with optional characters, with character classes.

07:48 You can do a whole bunch of amazing stuff with it. Just check that out on your own. In the next lesson, I’m going to cover the os.walk() function which lets you recursively walk over file trees and process the files as you like.
