00:12 A tree-based data structure is used to encapsulate hierarchical relationships. Some examples are file systems, indexes in databases, decision trees and flow charts, and data partitioning, which is heavily used in data science and machine learning.
There is a Unix command called
tree that prints the file structure as a vertical tree. The example here on the screen is the result from me running it on one of my Python projects. That’s not strictly true.
I modified it a bit to make the example work, but that’s just inside baseball. There are three levels to this tree. The first has four Python files:
ui, as well as a directory named
01:27 One way of storing this in Python is as a list. The items in the list are either the files—represented by strings here—or a tuple. If it is a tuple, the first thing in the tuple is the name of the directory, and the second thing in the tuple is the level below the folder.
Let’s pop into the REPL and see this as code. Inside a file called
data.py, I’ve created a value named
purdy-tree. Purdy is the name of the program whose directory structure I’m using as the example.
Then at the same level as
"actions" are three more strings in the list. This is all the coding equivalent of the previous diagram. Okay, let’s import that tree into the REPL and take a look at it.
yeah, that doesn’t work. Because of how it is nested, I only get the size of the first list, the top one. The
"animation" tuple is counted as one thing, so there are only five items in the
04:07 I then iterate through the items in the node, and here is where I check the content. If it’s a tuple, then I print the directory name and recursively call the function, passing in the child of the tuple.
04:31 If the item wasn’t a tuple, then it’s our node. So I print it out. Note that this function is depth-first. That means every time it hits a branch in the tree, it descends, going down to the deepest node before moving to the rest of the tree.
And there you go. Let me just scroll back a little bit. The leading space here is because I used the comma inside of
print(). Comma in
print() always prints a space before printing the thing after the comma, so
actions was printed with an empty spacer, and because there was a comma between them, there’s a single space first.
06:07 This is typically done with a stack data object, different from the calling stack. And because of this storage, it tends to take up extra memory. The code often ends up being about twice as long as well, and that’s kind of the argument for recursion. I’ll tell you a little secret: this isn’t my natural way of thinking.
06:27 I’ve known coders who go to recursion first, but looping for me always makes more sense. Over time, though, I’ve learned the places where recursion is a cleaner algorithm, and although it means I have to think about it a little harder, I know when it is time to use this mechanism.
06:42 With Python’s built-in data structures and accompanying search and sorting mechanisms, the need for recursion is less frequent in this language. But when I have a tree structure, I know for sure recursion is almost always the answer.
Become a Member to join the conversation.