Covering Tricky Details
00:00 As usual in these kinds of demonstrations, there are some tricky details that I haven’t fully gotten into yet. Let’s run through some of these now and get into a mode of thinking where you’re always looking for these kinds of little edge cases when you implement algorithms like binary search.
A famous binary search bug is the integer overflow problem. When you calculate a middle index, you add left and right together and then divide the result by 2 using integer division. However, in many languages where an integer is represented by a fixed number of bits, if left and right are sufficiently large, this calculation may overflow and give you a completely wrong number, because the sum of these two numbers is bigger than it’s possible to represent. Now, this isn’t actually a problem, technically, in Python, because Python integers can be as large as you want, up to the limits of your system memory.
00:59 But I’ll tell you why it’s still an issue: because many libraries in Python are written in C under the hood, and C is a language that does have the possibility of integer overflow.
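You can see this happen without leaving Python by using NumPy, whose integer types are fixed-width C integers under the hood. This is a sketch assuming NumPy is installed; it isn’t part of the standard library:

```python
import numpy as np

# Python's built-in int never overflows, but NumPy's fixed-width
# integers wrap around silently in element-wise array arithmetic.
left = np.array([2_000_000_000], dtype=np.int32)
right = np.array([2_000_000_000], dtype=np.int32)

naive_mid = (left + right) // 2        # left + right wraps past 2**31 - 1
safe_mid = left + (right - left) // 2  # stays within the int32 range

print(naive_mid[0])  # -147483648, a nonsensical "middle index"
print(safe_mid[0])   # 2000000000, as expected
```

The naive sum, 4,000,000,000, doesn’t fit in a signed 32-bit integer, so it wraps to a negative value before the division ever happens.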
01:10 So it’s always best to be safe in case you’re trying to pipe this result into something that’s being used by a C library. You can fix this problem by calculating the offset in advance.
The offset is (right - left) // 2, and you then add that to the lower bound, left. This cannot overflow, no matter how big left and right are, as long as left and right themselves can be represented in the integer system.
01:36 And in general, it’s always best to be thinking about these kinds of issues so that your Python knowledge can be transferable to other languages as well.
01:43 So, that’s the problem of integer overflow, but don’t worry because as long as you use this little fix, it won’t be a problem.
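To put the fix in context, here’s a minimal iterative sketch of binary search using the overflow-safe midpoint. The function name and return convention here are my own and may differ from the course’s implementation:

```python
def binary_search(elements, value):
    """Return an index of value in sorted elements, or None if absent."""
    left, right = 0, len(elements) - 1
    while left <= right:
        # left + (right - left) // 2 instead of (left + right) // 2:
        # equivalent in pure Python, but safe from overflow if this
        # logic is ever ported to a language with fixed-width integers.
        middle = left + (right - left) // 2
        if elements[middle] == value:
            return middle
        if elements[middle] < value:
            left = middle + 1
        else:
            right = middle - 1
    return None

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # None
```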
01:50 Another possible problem you might run into is a stack overflow exception. You might recognize this from the name of the popular internet forum, but a stack overflow is what happens when you make too many recursive calls and you overflow the stack. You don’t need to get into the technical details here, but generally, every recursive call that you make puts more local variables and such onto the stack of your computer’s memory. But the stack has a limited size, so there’s a recursion limit: the number of recursive calls a function can make is limited by the system. In Python, the default is 1,000, though it can be changed, and it varies in other languages.
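You can check the limit yourself with the standard library, and see what happens when runaway recursion hits it. This is a small sketch; the helper function here is my own illustration:

```python
import sys

# The recursion limit is a property of the interpreter; the CPython
# default is 1000, but environments are free to change it.
print(sys.getrecursionlimit())

def recurse_forever(n):
    # No base case: every call just makes another call.
    return recurse_forever(n + 1)

try:
    recurse_forever(0)
except RecursionError as exc:
    print(exc)  # "maximum recursion depth exceeded"
```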
Now, of course, a good binary search implementation should never cause a stack overflow, but it’s really easy to write bugs that do, so you should know how to recognize this, just in case. For example, take this truncated binary_search() implementation here. If you were to make a recursive binary_search() call without shrinking the input as necessary, so if you didn’t actually take a slice of the elements, as I showed you how to do in the recursive implementation earlier, then this would recurse infinitely, because you would never actually diminish your search space.
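For contrast, here’s a sketch of a recursive version that does shrink its search space on every call, so it can never recurse forever. The name and the boolean return value are my own choices, not necessarily the course’s:

```python
def binary_search_recursive(elements, value):
    """Return True if value is in sorted elements, else False."""
    if not elements:  # base case: empty search space, value not found
        return False
    middle = len(elements) // 2
    if elements[middle] == value:
        return True
    if elements[middle] < value:
        # Recurse on a strictly smaller slice; passing elements
        # unchanged here is exactly the bug that recurses forever.
        return binary_search_recursive(elements[middle + 1:], value)
    return binary_search_recursive(elements[:middle], value)

print(binary_search_recursive([1, 3, 5, 7, 9], 5))  # True
print(binary_search_recursive([1, 3, 5, 7, 9], 4))  # False
```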
03:01 So that’s a case where you could have a stack overflow.
03:04 A problem that I’ve already discussed a little bit is that binary search in its most basic form doesn’t give you a consistent left- or right-most index of a given item. It just gives you one of the indices of a given item.
03:17 And of course, you know several ways to deal with that already, so I won’t belabor the point, but just make sure that in a tense situation—like, say, a technical interview or just a higher-pressure coding environment—that you’re not relying on behavior that doesn’t actually exist with the basic binary search algorithm.
A problem that’s not unique to binary search but can show up in binary search is the problem of floating-point rounding. Due to limitations in how computers represent floating-point numbers, comparisons in Python can sometimes behave strangely. Take a look at this example on the right. If you build a list with the comprehension [.1 * i for i in range(10)], you’d expect it to contain 0.0, 0.1, 0.2, 0.3, 0.4, all the way up to 0.9. But if you try to actually do some of these comparisons, .1 in floats and .2 in floats will work, while .3 in floats will be False, simply because of quirks in the IEEE 754 floating-point representation.
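You can reproduce this in a REPL with the same comprehension:

```python
# Each element is computed as 0.1 * i, and 0.1 * 3 rounds to a value
# just slightly above 0.3, so an exact membership test fails.
floats = [0.1 * i for i in range(10)]

print(floats[3])      # 0.30000000000000004, not 0.3
print(0.2 in floats)  # True: 0.1 * 2 happens to round to exactly 0.2
print(0.3 in floats)  # False
```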
When I enter this list comprehension, [.1 * i for i in range(10)], you can take a look at it. What it actually contains, for some of these numbers, are values that are really, really close to what you might expect, like 0.7, but they’re just barely off. This is, of course, not necessarily a bug in Python; it’s just a consequence of the fact that real numbers can require infinite precision. I could write 0.00000..., you know, with a hundred 0s, or a hundred thousand 0s, and then a 1 at the end, and that would be a perfectly legitimate number.
05:00 But on a computer you would need infinite precision to represent that, which means you would need infinite bits, which means you would need infinite memory, which is just not possible.
05:09 So there have to be some small weaknesses in how floating-point numbers work, no matter how you choose to implement them. For this reason, you can often decide on the precision that you need and then use integers in your code. If, for example, you only need four decimal places of precision, you might scale every value by 10,000 and work with integers, or wrap that idea in a small data class of your own.
But if you really need to do comparisons with floating-point numbers, you can also use the math.isclose() function to get more reliable comparisons.
05:40 This, of course, will just compare to see whether a number is within a small margin of the actual number that you’re looking for. So that’s a way to deal with the fact that when you do a binary search, you have to do comparisons, and if those comparisons are between floats, they can generate weird results. In the next lesson, I’ll talk about the time and space complexity of binary search, and I’ll also talk about how you can search even more quickly than binary search.
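As a sketch of what that looks like, here’s a tolerant membership check built on math.isclose(). The function name close_index is my own, invented for illustration:

```python
import math

floats = [0.1 * i for i in range(10)]

def close_index(elements, value):
    """Return the index of the first element within tolerance of value,
    or None. Uses math.isclose() instead of exact equality."""
    for i, element in enumerate(elements):
        # math.isclose() defaults to a relative tolerance of 1e-09,
        # far larger than typical rounding error, so 0.30000000000000004
        # counts as a match for 0.3.
        if math.isclose(element, value):
            return i
    return None

print(0.3 in floats)             # False: exact comparison fails
print(close_index(floats, 0.3))  # 3: tolerant comparison succeeds
```

This is a linear scan for clarity; the same isclose() comparison can replace the equality check inside a binary search as well.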