Tree Traversal
00:00 In the previous lesson, I showed you how to write a factorial function using recursion. In this lesson, I’ll show you how to use recursion to traverse tree data structures.
00:12 A tree-based data structure is used to encapsulate hierarchical relationships. Some examples are file systems, indexes in databases, decision trees and flow charts, and data partitioning, which is heavily used in data science and machine learning.
00:30
There is a Unix command called tree
that prints the file structure as a vertical tree. The example here on the screen is the result from me running it on one of my Python projects. That’s not strictly true.
00:42
I modified it a bit to make the example work, but that’s just inside baseball. There are three levels to this tree. The first has four Python files: actions
, lexers
, parser
, and ui
, as well as a directory named animation/
.
00:57
Inside of animation/
, there are three files and a directory named color/
. There’s a hint that I changed the example. color
is spelled the American way.
01:06
Inside the color/
directory are five more Python files. You don’t have to worry about what any of this does. I’ll only be using it as a data structure example.
01:18
This is the same data visualized in a different way. The animation
and color
boxes have children, giving a total of three levels.
01:27 One way of storing this in Python is as a list. The items in the list are either the files—represented by strings here—or a tuple. If it is a tuple, the first thing in the tuple is the name of the directory, and the second thing in the tuple is the level below the folder.
01:45
Let’s pop into the REPL and see this as code. Inside a file called data.py
, I’ve created a value named purdy-tree
. Purdy is the name of the program whose directory structure I’m using as the example.
02:00
purdy-tree
is a list, with the first item in the list being the "actions"
string, representing action.py
. The second item is a tuple.
02:11
The tuple contains the name "animation"
and a nested list. The nested list contains three strings and the "color"
tuple.
02:22
Then at the same level as "actions"
are three more strings in the list. This is all the coding equivalent of the previous diagram. Okay, let’s import that tree into the REPL and take a look at it.
02:41 If I want to know how many items there are in the tree …
02:46
yeah, that doesn’t work. Because of how it is nested, I only get the size of the first list, the top one. The "animation"
tuple is counted as one thing, so there are only five items in the purdy-tree
list.
03:00 Let’s take a closer look at that.
03:11
Iterating through it, you can see the five items, the second one being the "animation"
tuple, which itself has the "color"
tuple nested inside.
03:20 So, how do you actually visit the nodes to count them or print them? If you guessed the answer was recursion, gold star for you.
03:32
This is print_tree()
in walker.py
. It recursively traverses a data tree printing out the contents at each node. The function takes two arguments, a node to parse and a depth.
03:44
The depth defaults to 0
. I want to indent each node when it’s printed based on the depth in the tree, so I’m multiplying the depth times a few spaces.
03:55
If you haven’t seen this before, when you multiply a string, you get another string repeated that many times. On the first run, this will be empty because 0
times a string is an empty string.
04:07 I then iterate through the items in the node, and here is where I check the content. If it’s a tuple, then I print the directory name and recursively call the function, passing in the child of the tuple.
04:20
I’m also incrementing the depth by 1
here. Any printing that happens in the child call will be indebted by one more multiple of the spacer.
04:31 If the item wasn’t a tuple, then it’s our node. So I print it out. Note that this function is depth-first. That means every time it hits a branch in the tree, it descends, going down to the deepest node before moving to the rest of the tree.
04:46 With a little tweaking, you could also make the function breadth-first, meaning it would print the entire level before descending. Let’s try it out as it is in the REPL.
05:00
Okay, I’ve imported it. Let’s call it on purdy-tree
.
05:08
And there you go. Let me just scroll back a little bit. The leading space here is because I used the comma inside of print()
. Comma in print()
always prints a space before printing the thing after the comma, so actions
was printed with an empty spacer, and because there was a comma between them, there’s a single space first.
05:29
After actions
is animation
, and then the indent increases When the recursion is called. When it hits the color
directory, it indents again because you’re down to the third level of recursion.
05:44
After that, it ascends and ascends again and finishes off the first level with lexers
, parser
, and ui
.
05:56 Everything you can do with recursion, you can do with loops. But that doesn’t mean it’s as easy to write or read. To do it with loops, you need extra data to store the nested items.
06:07 This is typically done with a stack data object, different from the calling stack. And because of this storage, it tends to take up extra memory. The code often ends up being about twice as long as well, and that’s kind of the argument for recursion. I’ll tell you a little secret: this isn’t my natural way of thinking.
06:27 I’ve known coders who go to recursion first, but looping for me always makes more sense. Over time, though, I’ve learned the places where recursion is a cleaner algorithm, and although it means I have to think about it a little harder, I know when it is time to use this mechanism.
06:42 With Python’s built-in data structures and accompanying search and sorting mechanisms, the need for recursion is less frequent in this language. But when I have a tree structure, I know for sure recursion is almost always the answer.
06:59 Next up, one of those stalwarts of computer science education: the Quicksort.
Become a Member to join the conversation.