Understanding Search Algorithms
00:06 Taking off our expert programmer hats for a moment, if I asked you to search something like a full backpack for an eraser, how would you do it? You might take something out, check to see if it’s the eraser, and then toss it back in if it’s not, and repeat. That’s the exact idea behind random search, which has an implementation like one you’ll find here on the right.
I’ve got a file here where I’m going to put in the various search algorithms that I’ll demonstrate, and as you can see, it has a basic function and a couple of calls to that function to load the names from
So you can see the different algorithms’ behavior on some really long lists of actual items. So, let’s type this thing out.
from random import randint because that’s the only function I’ll need.
Let’s see this thing work in the Python REPL. If you’re following along in a REPL, just remember to import as needed, so something like
from search_algorithms import random_find, and this will make the function available to you in your REPL.
So here I am in the REPL and I’ve defined a familiar list of elements here. Let’s test
random_find() with this
els and I’ll just search for, let’s say,
4. It seems to run pretty quickly and it does return the index of what you’re looking for, so that’s good. Now let’s try it with names—
Now, this isn’t quite so good, because I’ve hit Enter but it seems to be hanging for a while. And this illustrates one of the really basic problems with this
random_find(), which is that
random_find() doesn’t actually narrow the search space at all, so it could be testing some of these different indices hundreds or even thousands of times before it gets to the one that it needs to actually find what it’s looking for. So, there it was—and it did find it after a long time—but another problem here is that if I call this even on a short array with an element that’s not actually in it, it will just hang forever.
So as you can see,
random_find() is a really flawed algorithm here because it doesn’t actually allow you any opportunity to determine if something is or isn’t in the list of items unless it finds that element, because there’s no narrowing of the search space.
03:53 First, that it’s inconsistent—sometimes it might be fast, while other times it might be really slow. And the worst thing of all is that when something is missing, the only bound on the time it takes to run is whether you actually put a stop condition in there, which is what happens in this benchmark.
04:20 So, what I’m sure all of you intelligent listeners were thinking about as I talked about random search was that the problem with random search is that you don’t narrow the search space at all. Going back to the backpack analogy, once you’ve taken a ruler out of your backpack and eliminated it from contention because you know it’s not an eraser, you should never have to look at it again.
04:40 This is the basic idea of linear search. You just have to process the elements in your collection in a linear order so that you can’t repeat elements. That way, the worst thing that can happen is if the element isn’t in your list, you look at all of the elements in the list once, whereas with random search, you might look at all of the different elements many different times before you find what you’re looking for.
You could also do
-1 or any other value that works for you to actually indicate that the value was not found. Let’s see it work. If I run this on the familiar list of elements, it works just fine—pretty much instantaneous. And if I try it on something a little bit longer, let’s say looking through the
basic_names list, it takes a second—it seemed like about a second to me—but it got me this huge index of 10 million.
So it was much faster than the random search. Now, of course, if you try this with a name that’s earlier in the list, just due to how linear search works,
"Fred Astaire" will be found much more quickly than
"Baoyin Liu" because the index is so much smaller. Now, one more thing to note is that something like
"Baoyin Liu" in basic_names, using the
in operator, will run much faster than the Python version because these basic operators like
in are implemented in C under the hood, and so they run a lot faster than this pure Python implementation could ever work.
07:05 Something else you’ll notice is that the best average and worst time are much, much closer together than they were for random search, which is great for predictability and for being able to write your code confident in knowing how long certain operations will take. Something else you might notice, though, is that the element index seems to have an effect on how long the search actually takes, and if you plot these on a graph, you’ll see that there is exactly a linear correlation between the element index and the amount of search time.
07:44 This isn’t always great because even though this still runs pretty quickly for 10 million items, when you start to get on the data scale of a social media network or something like that, you can be talking about billions and even trillions of items, and so this linear algorithm will eventually grind to a halt when you add too much data. Now let’s look at binary search, which doesn’t fall victim to that same problem to quite the same degree.
08:10 Binary search lends itself to a bit of a different analogy here. Binary search is kind of like flipping through the pages of a book, because what it allows you to do is if the elements of your collection are sorted, it lets you eliminate about half of the possibilities with each comparison.
So if I were looking through this basic array here for the number
3, the way this would work is I’d inspect the middle element and see whether the middle element is less than, greater than, or equal to what I’m looking for.
And since it happens to be greater than, I can discard this element and anything that’s greater than it, because by the transitive property, anything greater than this element must also be greater than
Then I either recurse or simply modify my bounds so that I can consider the three remaining elements. Then I check the middle again—it’s less than
3, I can discard this and anything less than it.
And then my last array here is actually just the one element array
3, and so I’ve found what I’m looking for. Now, the awesome thing about this is that the number of comparisons is so much smaller than it would be with a linear search. So for example, think about the worst case behavior for a linear search of this array.
09:48 So, binary search has, frankly, amazing speed. No matter where in the actual input array that you’re looking, the average time is always on the order of around six or seven microseconds, which is just tiny! I mean, that’s just a unbelievably small fraction of a second.
10:12 So, binary search uses the fact that the array is sorted to be able to give you just incredible speed. In the next lesson, I’ll get into how you can start to actually use binary search in your code.
Become a Member to join the conversation.