Implementing Binary Search

Creating a Binary Search in Python Liam Pulsifer 10:59

Transcript
Discussion

00:00 In this lesson, we’re going to get into how to actually implement binary search on your own in Python from scratch.

00:08 Let’s take it a look at this binary search function. I’m going to start out with an iterative approach, so I’m just going to call it binary_iterative(), and it’s going to take in a list of elements and then a search_item.

00:22 Ideally, it will return the index of the search_item or None if not found.

00:29 Okay. So with that boilerplate out of the way, first, you have to define the search space, which is what the binary search will actually decrease—hopefully, by around half—on each run-through of the algorithm.

00:42 Normally, the way that you can do that is you can say left and right, and those are equal to the boundaries of the actual list that you’re searching, right? So the zeroth index and then the far-rightmost index.

00:55 Now, you have to have a termination condition. I like to say, while left <= right. You can also do while left < right and do a little bit more work later, but I like this because it essentially just says, “As long as there’s anything that you’re searching for at all, keep running.” And if there’s nothing left to search for—if your search base is 0—then you know to just kick out. So, while left <= right: the first thing that you have to do is that you have to get a middle index, and a way that works pretty well to do that is to just say (left + right), with integer division of 2.

01:30 So, divide it in half and truncate to the nearest integer. Then you can get the actual middle element, which is just equal to, of course, elements at the middle_dex. Now, there are three cases.

01:43 The first is if your middle element is equal to your search_item. And this case is simple—you just return the middle_dex, so you return the currently searched index. Then you have two more cases.

01:56 If middle element is less than your search_item—remember that these have to be comparable and in sorted order—so if that’s the case, then you can discard the half of the elements that are less than or equal to the middle element. And a way to do that that’s really convenient is to just say left is equal to the middle_dex + 1—you just move the left bound over. Then you can say elif middle_el—and really, you could just say else here, because you’ve already checked for this one case and there’s nothing else.

02:28 But I’m going to say just for clarity’s sake if middle_el > search_item: then you can say right = middle_dex - 1. And then you just continue on into your loop. And if you exit out of the loop, you know that you’ve never returned a middle index, so you just know that you need to return None. So, this works just fine.

02:48 Let’s check it out in the Python REPL. Okay, so I’ve defined a little list of elements here called els, and then I’m going to run binary_iterative() on els with the value 5.

03:01 As you can see, it returns 2—0, 1, 2—so that’s pretty nice. And then I can run it on 9 and it gets 4, which works just great. Run it on 12 and it returns None, which isn’t printed out but it was in fact what was returned. Run it on 3, it gives me 0. It seems to be working just fine.

03:19 One important thing to note, though, is that when I called this on 5, I got 0, 1, 2—so I got the left-hand 5, even though there are two 5s in this list.

03:28 But if I modify els just a little bit and I say els +=, let’s just say [9, 10], and then I run it again here on 5, I’ll actually get 3—0, 1, 2, 3—simply because els is now equal to this seven-element list that has the rightmost 5 in the middle, so that’s what gets picked first. So importantly, this iterative method that I’ve defined so far doesn’t make any guarantees about which index of an item that you get, which isn’t always desirable because, in general, you’d like your algorithms to be a little bit more deterministic.

04:02 So let’s take a look at how to write a version of this algorithm that gives you the leftmost or the rightmost of the items that you’re searching for.

04:12 So there are a couple of different ways that you can do this, and I’m going to do the simplest and the easiest of these methods. I’m going to define binary_leftmost(), which is going to take in—just as you might expect—elements and a search_item.

04:24 And one way you can handle this slight issue where you’re not sure if you’re going to get the leftmost, the rightmost, or somewhere in the middle, is you can first call your iterative algorithms—so you can get, let’s call it tentative for the tentative index, is just the binary_iterative() of those elements and that search_item.

04:43 And then you say, first, if tentative is None: then you just return None because you know that’s not going to change. But otherwise, you can say while tentative, or I should say, while elements[tentative] == search_item:

05:05 then you just subtract from tentative. So you say tentative -= 1. And then finally, at the end, you just return tentative + 1, because you know that this while loop will cause you to go one too far, so you want to actually return the last one that you did. So tentative + 1.

05:22 And this essentially just says, “Hey, I know that I’ve found an instance of the search item and now I just need to look left until I find the end of those search items.” This implementation actually has a slight bug in it.

05:35 Can you see what it is? As a hint, consider what might happen if your whole list

05:41 was composed of copies of what you’re looking for. At the moment, there’s no guard to make sure that if you hit the leftmost index, the zeroth index, you stop because that’s about as far left as it’s possible to go. With this implementation you might run into negative indices and keep looping indefinitely. Of course, one super-simple way to fix this is to just add in a quick and statement that says and tentative >= 0.

06:13 That will just say when tentative first hits less than 0—so, -1—then it will stop looping, and so then it will return tentative + 1, which will be equal to 0.

06:24 So this essentially just controls for that case.

06:28 So, I have my little test list els here, and I’m going to call binary_leftmost()—and remember to import this function if

06:37 you’re following along in the REPL. But I’m going to call binary_leftmost() on the repeated element here, 5.

06:43 And as you’ll notice, it returns 2, which is the correct index for the leftmost 5. But remember, of course, that the original binary search would also have returned the same index in this case, so I need to give it one more test here, which is to add [9, 10], which was the case that this failed on earlier.

07:02 And now I’ll call it one more time, and I still get 2, even though els now has more elements and so 5 is now at the middle.

07:11 So, the original binary search would now return this middle index instead of this leftmost index, but binary_leftmost() works just fine. So it seems to be working. On your own, of course, you should do a little bit more rigorous testing than I’ve shown in this demonstration, but that seems to fix the general problem. Next, I want to show you the recursive implementation of binary search, which is a little different than the iterative version, but follows pretty much the same logic in a different paradigm.

07:39 Recursive implementation of the binary search functions in much the same way and with exactly the same logic as the iterative approach, but what it lets you do is it lets you discard these left and right bounds, and instead use slices of your actual input array and then recursive calls to accomplish the same objective.

07:58 Let me type this out and then I’ll run you through it. So there it is, the recursive implementation of binary search. Now, the biggest thing you’ll probably notice is this dex_mod parameter with a default value that I’ve now passed into binary_recursive().

08:15 This dex_mod parameter can be a little confusing, so I’ve made a slide to hopefully illustrate it a little bit more clearly. The root of the problem is that the recursive binary search algorithm can return the wrong index if you’re not careful.

08:28 And that’s because of what I’m going to call the sub-list indexing problem. So, take a look at how it works. Imagine that I’m searching this list for the element 5, which is at index 4 because of zero indexing.

08:40 Now I choose 4, the middle element of the list, and I notice that it’s less than 5, the element that I’m searching for, so I discard all of these first four elements.

08:50 But then I recursively call the function on the last three elements of the list as a sub-list. But of course, if you look at this new list, 5 is at the index 0 in this sub-list, so its sub-list index is 0, but its real index is 4 in the main list.

09:08 And without some kind of parameter to keep track of the things that you’ve discarded, there is no way for a new function call to know that you’ve already discarded four elements and so that that needs to be added to the index of 5.

09:21 So when I make the second and final recursive call after discarding 6 and 7, 5’s index remains 0, which is totally wrong because the index should be 4.

09:30 So, whenever you recurse on a right-hand half of the list, you need to actually add in the number of elements you’ve just discarded. It doesn’t matter when you’re discarding elements that are on the right-hand side, because then this index is still 0 no matter if you discard two elements or if you don’t, right?

09:49 So discarding elements from the right side doesn’t matter but whenever you discard on the left, you need to start adding that into an accumulator, what I call the dex_mod parameter.

10:00 I would encourage you to check out this code that I’ll include below this lesson if you want to get a closer look at this and see why some of the logic works exactly as it does. But in terms of the actual functionality of the algorithm, nothing here has really changed.

10:14 When I call binary_recursive() on my els list, it will give me exactly the same results that the original binary_iterative() implementation would do. And of course, if I want to find the leftmost, I’ll have to either work this recursive algorithm into the binary leftmost search, somehow. And that won’t be hard to do, so I encourage you to do that as an exercise on your own.

10:36 But this is the recursive approach—different philosophy, but same results.

10:42 So now you know several ways to implement a binary search and how to deal with some of the deficiencies of a super basic implementation. In the next lesson, I’ll get into some even trickier parts of this algorithm that often trip people up.

Become a Member to join the conversation.