Using the Star Wildcard
00:00
Let’s take a look at these wildcard characters, starting again with the * wildcard. This character matches any number of characters, including none, in a file or folder name.
00:11
So if I head back over to IDLE, I’m again inside of this notes/ directory. Let’s take a moment and remember what it looks like. Inside of the notes/ directory, you have a couple of folders, plans/ and yearly/, and then you also have three files: a README.md file and two text files, one called goals1.txt and one called goals2.txt.
00:34
Now I want to filter for only the .txt files that are directly in the notes/ directory. So I want to get goals1.txt and goals2.txt as a result, but none of the folders and also not the Markdown file.
00:49
And I can do this by using the * wildcard in a pattern. I’m going to say notes_dir.glob() and then pass in the pattern, where I will say * for anything, as long as the name ends with .txt.
01:04
And I will wrap this whole thing into a call to list() just so that we get the results from the iterator and we can look at them here in IDLE.
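The call described here can be sketched as follows. This is a minimal example that rebuilds the notes/ layout in a temporary directory, so the location on disk is an assumption and not the path used in the video.

```python
import tempfile
from pathlib import Path

# Rebuild the described layout in a throwaway location (the location is assumed).
tmp = tempfile.TemporaryDirectory()
notes_dir = Path(tmp.name) / "notes"
(notes_dir / "plans").mkdir(parents=True)
(notes_dir / "yearly").mkdir()
for name in ("README.md", "goals1.txt", "goals2.txt"):
    (notes_dir / name).touch()

# .glob() yields its matches lazily, so list() collects them for display.
matches = list(notes_dir.glob("*.txt"))

# The order of the results is not guaranteed, so sort the names before printing.
txt_names = sorted(path.name for path in matches)
print(txt_names)  # ['goals1.txt', 'goals2.txt']

tmp.cleanup()
```

Sorting is only there to make the printed result predictable. The video shows goals2.txt before goals1.txt, which is the order the file system happened to return.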
01:14
And here you go. So it returns goals2.txt and goals1.txt because the star matches any of these characters: the g, the o, the a, the l, the s, the 1, and also the 2.
01:26
And then both of these files end with .txt, so they’re a match for this pattern. Now, you can also use the * character more than once in a pattern. For that, let’s move to a different directory.
01:42
We’re going to look inside of yearly/, and I want to match the files that start with a 2 and then end with a 3. So I only want to match this one file, 2033, but I’m going to build a pattern using the * character.
02:00
So first I need a path for the yearly/ directory. I’m going to create yearly_dir by joining notes_dir and "yearly" with the / operator.
02:12
And then just to confirm … All right, this is the correct path. And then in here I can say yearly_dir.glob(). And now I can build the pattern, and I said it should start with a 2, then could have any characters, and then it should end with a 3.
02:33
And then again, it could have any characters. So I don’t care which file extension it has. I’ve got to close the string as well. And then, again, wrap it inside of a call to list(), just so we can see the output.
02:46
And the result that I get here is two files, of course, because I said anything could come after the 3, so a name that continues with a 4 matches just as well as one that continues with a 3.
02:56
So let’s change the pattern a tiny bit so that I actually get the result I was aiming for before.
03:02
So I’m going to say 3. and afterwards, it doesn’t matter to me, so it can have any file extension, but I want the 3 to be directly followed by the dot.
03:15
And this now only gives me 2033 because now it’s matching this specific 3 character followed by a dot. And the file that matched before, which has a 4 here before the dot, doesn’t fit this pattern anymore. So as you can see, you can use the * more than once in a pattern, and each one matches any number of any characters.
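The difference between the two patterns can be tried out in a short sketch. The filenames below are hypothetical stand-ins, since the transcript doesn’t spell out the full names in yearly/, and the directory is created in a temporary location.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    yearly_dir = Path(tmp) / "yearly"
    yearly_dir.mkdir()
    # Hypothetical filenames; the real ones are not shown in the transcript.
    for name in ("2033.txt", "2034.txt", "1999.txt"):
        (yearly_dir / name).touch()

    # "2*3*": starts with 2, contains a 3 somewhere later, then anything.
    loose = sorted(path.name for path in yearly_dir.glob("2*3*"))

    # "2*3.*": the 3 must sit directly in front of the dot.
    strict = sorted(path.name for path in yearly_dir.glob("2*3.*"))

print(loose)   # ['2033.txt', '2034.txt']
print(strict)  # ['2033.txt']
```

In the first pattern, the trailing * is free to swallow "4.txt", which is why the second file slips through. Adding the literal dot pins the 3 to the end of the file stem.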
03:38
What happens if nothing matches the search pattern that you’re using? Let’s try that out as well. So if I use the same pattern, but let’s say I want it to start … oops, accidental Enter press. Let me try that again.
03:55
So if I want it to start with a 4, currently I don’t have any files that match this pattern. So if I run this, then I just get an empty list because there are no matching files.
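A quick sketch of the no-match case, again using hypothetical filenames in a temporary yearly/ directory. The point is that .glob() doesn’t raise an error when nothing matches. It simply yields nothing.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    yearly_dir = Path(tmp) / "yearly"
    yearly_dir.mkdir()
    # Hypothetical filenames, none of which start with a 4.
    for name in ("2033.txt", "2034.txt"):
        (yearly_dir / name).touch()

    # No name starts with 4, so the iterator is exhausted immediately.
    no_match = list(yearly_dir.glob("4*3.*"))

print(no_match)  # []

# An empty list is falsy, which makes the "nothing found" check straightforward.
if not no_match:
    print("No files matched the pattern.")
```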
04:09
Also, it’s important to remember that the pattern is only matched against the names inside the directory you’re searching, so it does not apply to the whole path. This is just a representation of the Path object. So if I were to look for,
04:22
let’s say anything, and then it ends with ly. So I’m trying to address this yearly here, right? But it’s not the filename. So this is not going to match anything because the search only looks at the last part of the path, which means the actual files and folders that are inside of the directory that I’m calling the .glob() method on, which in this case is yearly_dir.
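The following sketch contrasts the same pattern on two different starting directories. It assumes the layout described earlier, with hypothetical filenames inside yearly/, all created in a temporary location.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes_dir = Path(tmp) / "notes"
    yearly_dir = notes_dir / "yearly"
    yearly_dir.mkdir(parents=True)
    # Hypothetical filenames inside yearly/.
    for name in ("2033.txt", "2034.txt"):
        (yearly_dir / name).touch()

    # Searching inside yearly/: no entry in there has a name ending in "ly".
    from_inside = list(yearly_dir.glob("*ly"))

    # Searching inside notes/: the yearly/ folder itself is an entry, so it matches.
    from_parent = sorted(path.name for path in notes_dir.glob("*ly"))

print(from_inside)  # []
print(from_parent)  # ['yearly']
```

The second call also shows that .glob() returns directories as well as files, as long as their names fit the pattern.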
04:52
Okay, so this is the * wildcard. It matches any number of characters in the file path pattern. We tried this example at the beginning to call it on notes_dir and use the * pattern.
05:03
*.txt matches all the .txt files that are directly in the notes/ directory. And then also keep in mind that you can use the * wildcard multiple times in a single pattern.
Martin Breuss RP Team on Jan. 4, 2023
Hi @akazimierz glad the course is useful :)
You’re correct, you can also use .glob() to find directories. If you want to make sure that you’re only matching files, you can check for that by considering the dot (.) that separates the file extension from the file name and building a pattern that includes it:
>>> list(notes_dir.glob('*ly.*'))
Directories generally don’t have a dot in their name, although you can name a folder also with a dot. It’s a great way to confuse yourself and Python :P
More importantly, however, there are quite a lot of files that don’t have a file extension. These wouldn’t match with the code shown above.
So if you can’t rely on your folders not having a dot in their name, or you can’t rely on all files having a file extension, then you might have to write some logic to filter only for files. For example, you could do that using a list comprehension:
>>> [file for file in notes_dir.glob('*ly') if file.is_file()]
Hope that helps and was a somewhat interesting footnote to your comment :)
akazimierz on Jan. 5, 2023
Hello Martin.
Thanks for the details. I think I got the impression that glob applied to files only, and not directories (I just simply won’t admit that all this is new to me…)
Martin Breuss RP Team on Jan. 5, 2023
I think that makes sense, I believe I only show how to match files and maybe there’s even an example where I show that it doesn’t match a folder—but in that case it’s about the search location.
In any case, thanks for asking! Having our comments here in the course will help other learners who might have the same question :)
akazimierz on Jan. 3, 2023
Hello Martin. Thanks for the thorough explanation of quite a few ideas on pathlib.Path. I’ve got a remark: glob also matches directories, as I tried in IDLE, e.g.: (Or I got something wrong…)