Pathlib and Globbing Improvements
00:00
In the previous lesson, I showed you the new replace function in the copy module. This lesson talks about the improvements to the pathlib module.
00:09
A few changes have been made to pathlib. If you’re still using os, I highly recommend changing over to pathlib. It does almost all the same things, requires less code, and in my opinion, it’s easier to read.
00:21 The first change in 3.13 is the ability to construct a path from a file URI. Next is a new method that does matching and comparisons, supporting wildcards.
00:32
And that’s not the only change having to do with wildcards. Previously, if you used ** with glob or rglob, you only got directories and not files.
00:42
This was an odd choice, since ** doesn’t behave this way anywhere else you use it. The documentation did explicitly tell you about it, but who reads that stuff? Am I right?
00:53
Still on changes to glob and rglob, there’s a new argument called recurse_symlinks, which defaults to False. When True, glob calls will descend through symlink directories.
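Here is a minimal sketch of what recurse_symlinks changes, assuming Python 3.13 or later. The helper names find_txt and build_tree, and the real/, link, and song.txt names, are made up for illustration and aren't from the lesson.

```python
import sys
import tempfile
from pathlib import Path


def build_tree(root: Path) -> None:
    """Create root/real/song.txt plus root/link, a symlink to root/real."""
    real = root / "real"
    real.mkdir()
    (real / "song.txt").write_text("placeholder\n")
    # Creating symlinks may need extra privileges on Windows.
    (root / "link").symlink_to(real, target_is_directory=True)


def find_txt(root: Path, follow_links: bool) -> list[str]:
    """Return the .txt files under root as sorted, root-relative strings.

    recurse_symlinks (new in 3.13) controls whether "**" descends into
    directories that are reached through a symlink.
    """
    hits = root.glob("**/*.txt", recurse_symlinks=follow_links)
    return sorted(p.relative_to(root).as_posix() for p in hits)


if __name__ == "__main__" and sys.version_info >= (3, 13):
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_tree(root)
        print(find_txt(root, follow_links=False))
        print(find_txt(root, follow_links=True))
```

With the default of False, the file should only show up under real/. With True, it should show up a second time under link/.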
01:05
And similar to that, the follow_symlinks argument has been added to .is_file(), .is_dir(), .owner() and .group(). follow_symlinks defaults to True.
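As a rough illustration of follow_symlinks, assuming Python 3.13 or later, the sketch below asks the same question of a path twice. The helper name is_file_both_ways and the file names are hypothetical.

```python
import sys
import tempfile
from pathlib import Path


def is_file_both_ways(path: Path) -> tuple[bool, bool]:
    """Return (is_file following symlinks, is_file without following)."""
    followed = path.is_file()  # default is follow_symlinks=True
    not_followed = path.is_file(follow_symlinks=False)  # keyword new in 3.13
    return followed, not_followed


if __name__ == "__main__" and sys.version_info >= (3, 13):
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "song.txt"
        target.write_text("placeholder\n")
        link = Path(tmp) / "link.txt"
        # Creating symlinks may need extra privileges on Windows.
        link.symlink_to(target)
        print(is_file_both_ways(target))
        print(is_file_both_ways(link))
```

A regular file answers True both ways. A symlink to a file only answers True when the link is followed, because the link itself is not a regular file.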
01:15 To demonstrate some of the globbing features, I’m going to need some files. Here, I’ve got a music directory containing two genres, and each genre has a few text files with the titles of the songs.
01:26
Have you got it memorized? Good. Once more into the REPL, let’s play with some paths. First, I need a Path class. Even before this release, I could take an existing path and call .as_uri().
01:48 This function gives me the URI version of the path. Newly added to this release is the ability to go in the other direction as well.
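A minimal sketch of that round trip, assuming Python 3.13 or later for from_uri(). The helper name round_trip is hypothetical, and the current working directory simply stands in for whatever path was used on screen.

```python
import sys
from pathlib import Path


def round_trip(path: Path) -> Path:
    """Convert an absolute path to a file URI and back again."""
    uri = path.as_uri()  # raises ValueError if the path is relative
    return Path.from_uri(uri)  # class method added in Python 3.13


if __name__ == "__main__" and sys.version_info >= (3, 13):
    here = Path.cwd()
    print(here.as_uri())
    print(round_trip(here) == here)
```

Note that as_uri() only works on absolute paths, which is why the sketch starts from Path.cwd() rather than a relative path.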
02:04
The from_uri() method is a constructor method that takes a URI and returns a path, so now you can go in both directions. The next new feature is the full_match() method, which does path comparisons, supporting wildcards.
02:26
Matching *.py in the a/ directory means any file with a .py extension, which of course includes b.py.
02:39
Whereas matching *.py in the b/ directory does not match any file in the a/ directory as the directories are different.
02:52
Likewise, the generic *.py doesn’t match, as b.py is in a directory and *.py on its own doesn’t recurse into directories. To do that, you use the double star mechanism.
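The three single-star comparisons above can be sketched like this, assuming Python 3.13 or later. The path a/b.py is taken from the description, while the compare helper is a hypothetical name. PurePosixPath is used so the result doesn't depend on your operating system.

```python
import sys
from pathlib import PurePosixPath


def compare(path: PurePosixPath, patterns: list[str]) -> dict[str, bool]:
    """Map each pattern to whether it matches the whole of path."""
    # full_match() is new in Python 3.13 and never touches the disk.
    return {pattern: path.full_match(pattern) for pattern in patterns}


if __name__ == "__main__" and sys.version_info >= (3, 13):
    results = compare(PurePosixPath("a/b.py"), ["a/*.py", "b/*.py", "*.py"])
    for pattern, matched in results.items():
        print(f"{pattern!r:10} -> {matched}")
```

Only a/*.py matches. The other two fail because a single star never crosses a directory separator and the pattern has to account for the entire path.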
03:08
A double star means any file or directory underneath a/. You can also use double star as a prefix.
03:20
This case matches c.py in any subdirectory. You can combine the two types of wildcards.
03:32
This gives the case of any file ending in .py in any directory. full_match() compares paths using pattern matching, but like with a lot of path operations, the path object doesn’t actually have to point to something on your drive.
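Here is a sketch of the double-star cases, again assuming Python 3.13 or later. The path a/b/c.py is an assumption, since the transcript only names c.py, and double_star_results is a hypothetical helper. Nothing here needs to exist on disk.

```python
import sys
from pathlib import PurePosixPath

# Assumed example path; only the file name c.py appears in the lesson text.
EXAMPLE = PurePosixPath("a/b/c.py")


def double_star_results(path: PurePosixPath) -> dict[str, bool]:
    """Check a few recursive patterns, plus one single-star contrast."""
    patterns = [
        "a/**",     # anything underneath a/
        "**/c.py",  # c.py in any subdirectory
        "**/*.py",  # any .py file in any directory
        "a/*",      # single star: only one level below a/
    ]
    return {pattern: path.full_match(pattern) for pattern in patterns}


if __name__ == "__main__" and sys.version_info >= (3, 13):
    for pattern, matched in double_star_results(EXAMPLE).items():
        print(f"{pattern!r:10} -> {matched}")
```

The three double-star patterns match, while a/* does not, because c.py sits two levels below a/.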
03:47
The glob() function, on the other hand, actually does a file operation looking for those files on your system that match a pattern. It similarly uses wildcards to help you find those matches.
03:59
Remember that music directory I told you to memorize? Well, let’s play with it now. That’s a path for the music directory. Now, if I call glob() on it using a wildcard, I get back a map object.
04:17 This is a lazy iterator that produces the results on demand. To see them, you’ll have to consume the iterator. Let me write a quick function that prints out the contents of the response.
04:35
Nothing magical here. Just printing out the returned paths as strings. Let me try that glob() again, this time passing it to my new show() function.
04:48
glob('*') means return what is in the directory, which in this case is the opera/ and rap/ directories. Notice there are no files here.
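A sketch of the show() helper and the glob('*') call might look like this. The song-list file names are invented, since the transcript doesn't give them, and make_music is a hypothetical helper that builds a throwaway copy of the tree. Sorting inside show() is my own addition, because glob() makes no promise about ordering.

```python
import tempfile
from pathlib import Path

# Hypothetical file names; the lesson's real ones are not in the transcript.
LAYOUT = {"opera": ["arias.txt", "duets.txt"], "rap": ["singles.txt"]}


def make_music(root: Path) -> Path:
    """Build root/music/<genre>/<file> and return the music directory."""
    music = root / "music"
    for genre, names in LAYOUT.items():
        (music / genre).mkdir(parents=True)
        for name in names:
            (music / genre / name).write_text("song titles\n")
    return music


def show(paths) -> None:
    """Print each path as a string, sorted for a stable order."""
    for path in sorted(paths):
        print(str(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        music = make_music(Path(tmp))
        show(music.glob("*"))  # only the two genre directories
```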
04:57
You can also glob() inside a partial path.
05:05
Here I’m getting back everything inside the opera/ directory. Like with the match paths before, I can glob() on a file extension.
05:18
This didn’t return anything as there’s no text files in the music directory. They’re inside the opera/ and rap/ directories instead. If I want to descend into subdirectories, I combine the star with the double star.
05:36 And that’s all the text files inside both directories. You can also get at all the files using just double star. This is actually new, as logical as it is, and although it’s the way all the other libraries work, this isn’t how older versions of Python worked.
05:58
Previously, double star only returned directories, so the same call in Python 3.12 would only return opera/ and rap/. You would then have to use glob() on the results to find the files, but that takes an extra step.
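To see the difference for yourself, here is a sketch that separates the directories from the files in each result. The file names and the make_music and split_matches helpers are hypothetical. The same code runs on older Pythons too, which makes it easy to compare versions.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical file names; the lesson's real ones are not in the transcript.
LAYOUT = {"opera": ["arias.txt", "duets.txt"], "rap": ["singles.txt"]}


def make_music(root: Path) -> Path:
    """Build root/music/<genre>/<file> and return the music directory."""
    music = root / "music"
    for genre, names in LAYOUT.items():
        (music / genre).mkdir(parents=True)
        for name in names:
            (music / genre / name).write_text("song titles\n")
    return music


def split_matches(music: Path, pattern: str) -> tuple[list[str], list[str]]:
    """Return (directories, files) matched by pattern, relative to music."""
    dirs, files = [], []
    for path in music.glob(pattern):
        rel = path.relative_to(music).as_posix()
        (dirs if path.is_dir() else files).append(rel)
    return sorted(dirs), sorted(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        music = make_music(Path(tmp))
        print(sys.version_info[:2])
        print(split_matches(music, "**/*.txt"))  # text files at any depth
        print(split_matches(music, "**"))        # files included from 3.13 on
```

On 3.13, the second call should list the text files alongside the directories. On 3.12 and earlier, the files list for a bare ** comes back empty.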
06:11 This new way means no extra step and more importantly, consistency with other languages, and in fact, even just consistency within the standard library. But wait, there’s more.
06:23 In the next lesson, I’ll cover a few of the odds and ends in Python 3.13 that I didn’t have time to cover in this course.