Traversing Directory Trees
In this lesson, I’ll show you how to traverse entire directory trees and process the files that you find. That’s distinct from getting a directory listing: when you get a listing with something like os.listdir(), you need to do some extra work to process all the subdirectories as well.
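To give a sense of that extra work, here is a minimal sketch of descending into subdirectories by hand with os.listdir(). The list_tree() helper and the file names are made up for illustration, and the tree is built in a temporary directory so the example is self-contained.

```python
import os
import tempfile

def list_tree(root):
    """Collect every file path under root using only os.listdir()."""
    found = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # os.listdir() does not descend on its own, so recurse by hand.
            found.extend(list_tree(path))
        else:
            found.append(path)
    return found

# Hypothetical sample tree: one text file and one folder holding a Python file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "folder_1"))
    for rel in ("a.txt", os.path.join("folder_1", "b.py")):
        open(os.path.join(root, rel), "w").close()
    listed = [os.path.relpath(p, root) for p in list_tree(root)]

print(listed)
```

The recursion and the isdir() check are exactly the bookkeeping that you would otherwise have to write yourself.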
But I’ll be showing you the os.walk() function, which lets you walk an entire directory tree with very little work. It takes in a directory path, top, which is the root of the traversal, and then a parameter called topdown, which says whether to start processing at that directory path or at the farthest children of that path.
00:50 os.walk() gives you back tuples, one per directory. Each tuple contains the current directory’s path (a string), a list of the names of its subdirectories, which are also strings, and then a list of the names of all of the files in that directory. So on each iteration, it splits things up into nice, easy pieces: the current directory path, the directories, and the files.
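Here is a small sketch of what those tuples look like and what topdown changes. The snapshot() helper and the names folder_1, notes.txt, and script.py are assumptions for illustration, not the lesson's actual sample directory.

```python
import os
import tempfile

def snapshot(root, topdown=True):
    """Return (relative_dir, subdir_names, file_names) for each step of os.walk()."""
    rows = []
    for dir_path, sub_dirs, files in os.walk(root, topdown=topdown):
        rows.append((os.path.relpath(dir_path, root), sorted(sub_dirs), sorted(files)))
    return rows

# Hypothetical tree built in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "folder_1"))
    open(os.path.join(root, "notes.txt"), "w").close()
    open(os.path.join(root, "folder_1", "script.py"), "w").close()
    top_first = snapshot(root, topdown=True)      # root is reported first
    bottom_first = snapshot(root, topdown=False)  # root is reported last

print(top_first)
print(bottom_first)
```

With topdown=True the starting directory comes first, and with topdown=False it comes last, after everything beneath it.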
01:13 Let’s take a look at how it works in the Python REPL, after looking at the sample directory. The sample directory is pretty simple: a couple of text files, and then the two folders, both of which have Python files with different names in them.
Well, first let’s talk a little about the parameters. As I said, there’s the top and the topdown parameter. There are also a couple of others that I would encourage you to look up in the documentation, because I’m not going to talk much about them. There’s onerror, which is a function that says what to do on error and defaults to None. Then there’s a followlinks parameter, which just says whether to follow symbolic links or not; symbolic links are entries that point to other files or directories.
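If you do want to poke at those two parameters, here is a minimal sketch. The remember_error() callback and the missing path are hypothetical; the point is only that os.walk() hands the OSError to onerror instead of skipping it quietly.

```python
import os
import tempfile

errors = []

def remember_error(err):
    # os.walk() passes in the OSError it hit while listing a directory.
    errors.append(err)

with tempfile.TemporaryDirectory() as root:
    missing = os.path.join(root, "missing")  # hypothetical path that does not exist
    # With the default onerror=None, this failure would be skipped silently.
    # followlinks=False is the default and is spelled out here only for clarity.
    visited = list(os.walk(missing, onerror=remember_error, followlinks=False))

print(visited)      # nothing could be walked
print(len(errors))  # the callback received the error
```

A callback like this could just as well log the error or re-raise it, depending on how strict you want the traversal to be.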
Definitely take a look on your own if you’re interested. So, I’m just going to call it on the current directory, and I’m going to leave topdown as True. And, as you can see, it’s a generator object, which means that you have to iterate through it if you really want to get much out of it.
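As a quick sketch of what "generator object" means in practice, the snippet below checks the type and pulls out just the first tuple. The variable names are only for illustration.

```python
import os
import types

walker = os.walk(".")  # nothing is read from disk yet
is_generator = isinstance(walker, types.GeneratorType)

# Each next() call (or each loop iteration) produces one tuple.
first = next(walker)
dir_path, sub_dirs, files = first

print(is_generator)
print(dir_path)
```

In most code you would write a for loop over os.walk() rather than calling next() yourself.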
As you can see, first it processes the current directory, with two subdirectories and two files. Then it processes folder_2/, which is just the first folder in the sub_dirs list and has no subdirectories, but a few files.
03:31 One other thing I want to do with the ordering is a quick little exploration to show you whether this traversal behaves as a depth-first or a breadth-first search. Simply put, does this walking procedure go all the way down through the children of a given child before it moves on to that child’s siblings?
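So I’ll give folder_2/ a subfolder. Then I’m going to run the same thing here, with topdown=True. You’ll see that first it processes the current folder, then it processes folder_2/ and its new subfolder before it moves on to the other folder. That’s a depth-first search, or DFS.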
04:33 If you’re not familiar with that, be on the lookout for tutorials on the subject from Real Python, or from any other source that you use, and take a look at those because it’s a really foundational computer science concept and the behavior will be very predictable once you understand how the DFS works.
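If you want to check the depth-first ordering yourself, here is a small sketch. The folder names are hypothetical, and the reverse sort is only there to make folder_2 come first, since the order in which os.walk() lists subdirectories is otherwise not guaranteed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: folder_1/ and folder_2/, where folder_2/ has a subfolder.
    os.makedirs(os.path.join(root, "folder_1"))
    os.makedirs(os.path.join(root, "folder_2", "sub"))

    order = []
    for dir_path, sub_dirs, files in os.walk(root, topdown=True):
        # Editing sub_dirs in place controls the visiting order when topdown=True.
        sub_dirs.sort(reverse=True)
        order.append(os.path.relpath(dir_path, root))

print(order)
```

The subfolder of folder_2 shows up before folder_1 does. A breadth-first traversal would have visited both top-level folders before going any deeper.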
I find os.walk() really convenient, not only because of its awesome recursive behavior, but also because it splits up the traversal so nicely: you get your current directory, and the directories and files come in two separate lists, really quickly and easily.
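One way to get that split without any recursion is to take only the first tuple from the generator. This is a sketch with made-up file names, not something shown in the lesson itself.

```python
import os
import tempfile

# Hypothetical tree built in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "folder_1"))
    open(os.path.join(root, "notes.txt"), "w").close()
    open(os.path.join(root, "folder_1", "script.py"), "w").close()

    # next() takes only the first tuple, so nothing below the top level is visited.
    dir_path, sub_dirs, files = next(os.walk(root))
    same_dir = dir_path == root

print(sub_dirs)
print(files)
```

The file inside folder_1 never appears, because the walk stops after the starting directory.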
05:06 So I like to use this even when I don’t need recursion. In the next lesson, I’m going to cover temporary files and temporary directories, which can also be really useful, especially in testing constructs.