Merge Sort
Here are resources for indexing and slicing in Python:
00:00 The next algorithm I want to consider is merge sort. Now merge sort is a really clever algorithm that relies on an interesting property of sorted lists.
00:11 You can merge two sorted lists into one bigger sorted list just by including the elements from both, and you can actually do that in order of n time—that is, in as many operations as there are elements in the sum of those two sorted lists. That may not seem that important but it actually is the key to this algorithm, which is a recursive algorithm.
00:34 So let’s take a look at how it works.
00:36 The basic idea of merge sort is that you break the list into two halves. You sort those halves recursively also using merge sort, and then you merge the halves together again.
00:47 It relies on the fact that merging two sorted lists is actually an order of n operation. So if you take a look here, what you can do is divide this list of five elements into two smaller lists, which are both about half the size of the big list.
01:01 And then you do the same on all of these sublists that you generate there. And then you actually merge those two lists back together in a sorted manner.
01:10
So [8]
and [2]
, which are two one-element lists, become the two-element list [2, 8]
. And then that gets merged with the result of this recursive computation on [6, 4, 5]
, which is [4, 5, 6]
, and those two get merged into one big list. Now if you’re not super familiar with algorithmic theory, this may not be obvious to you that this is faster than something like insertion sort or bubble sort. But there’s a basic rule of algorithmic study which is that if you can perform recursive operations on half-sized lists, and then if you can merge them together quickly, you often get a big performance boost as opposed to having to do some kind of iterative method.
01:52 So, let’s take a look at how this algorithm actually gets implemented in practice, and it may become a little bit more clear to you.
02:01
The great thing about merge sort is that if you have a function called merge_sorted_lists()
that takes in two already sorted lists—let’s call them left
and right
—and merges them into one larger sorted list that contains all of the elements of both of the sublists, then the actual logic of merge_sort()
itself is pretty much exactly as I described it in the slide a second ago.
02:25
You just say if the length of the items
is less than or equal to 1
, so a little guard clause, you just return items
, because obviously you can’t sort a list with one item in it. It’s already sorted, you can’t get any better than that.
02:40 So the next thing to do is, as I described earlier, you have to divide the list into two sublists roughly of equal size, and the best way to do that is to divide them right at the middle.
02:52
So I can choose the middle index, which is the len(items) // 2
, and I use integer division to make sure that the midpoint
is actually a valid index, and it’s not a huge deal because it doesn’t really matter if one sublist has one more element than the other. That’s not a problem for this algorithm.
03:09
So once you have that, you can divide the list pretty straightforwardly into left and right halves by saying items
at the beginning of the list up to the midpoint
, and then items
03:24
from the midpoint
up to the end of the list. And this is Python’s awesome slice operator, and if you’re not super familiar with it, I would encourage you to go check out some of the Real Python resources on the slice operator. It’s amazingly convenient.
03:38
It’s one of my favorite Python features, this ability to slice lists in a variety of different ways as you might need them. Now there’s really only one step left here, which is to say return merge_sorted_lists()
of the recursively sorted left-hand list and the recursively sorted right-hand list.
03:59 So you merge sort the two halves and then you merge them together. That’s really all there is to merge sort, which is crazy because it turns out to only be what—what is this? 5 lines of code? That’s crazy.
04:11
But of course you actually need this merge_sorted_lists()
function, so let’s take a look and implement that. Before I get into it, though, I’d encourage you to think about how you can merge sorted lists as quickly as possible. The ideal, of course, is to do it in order of n time, because that will give this algorithm the order of nlog(n) runtime, which is the, kind of, coveted sorting runtime. In fact, there’s a proof out there, a mathematical proof, which shows that any comparison-based sorting algorithm, the fastest possible time you can take is order of nlog(n). And so if we can achieve that here, that’s a pretty darn great accomplishment, especially with five lines of code for the real meat of the algorithm.
04:54 So, let’s merge some sorted lists. The key to understanding this algorithm—which is, I’d say, a little bit more opaque than the actual merge sort—is that the idea is if you have two sorted lists, you know that the smallest elements of both of those lists are at the beginning of the lists, right? Because they’re sorted.
05:14 And if you want to find the smallest element in either of those two lists, you just have to take the smallest of the two elements at the beginning of each list, right?
05:23 Because even if one list has all smaller elements than the other, you know that its smallest element has to be at the beginning because we know it’s already sorted. So this is a really convenient property, and it leads to an algorithm which looks something like this.
05:39
You get a left index and a right index, and those will both start at 0
, so start at the beginning of both lists, and the idea is to slowly build up a return list, which is going to start as empty, and then you can just say, while the length of the return_list
is less than the length of left
plus the length of right
. And you don’t actually really need these parentheses here, but I just put them there for my own kind of clarity.
06:09 What you can do is you can say, let’s get the elements at each of these indices.
06:18
And there’s a little bit of a tricky step here, which is to say, I can’t just get the left item and say left
at left_index
. I actually have to check if this is a valid index for it to be at, so I say left[left_index] if left_index < len(left)
.
06:39
And what do I put otherwise? What do I put in the else
clause? Well, this is another little tricky thing, which is I’m going to say float()
of 'inf'
(infinity), which is a common Python idiom which essentially just gives an infinite or max numeric value.
06:54
And the reason for why I do that will become clear in one second. So let me just type out, I’ll do the exact same thing with the right_index
and the right
list.
07:07 And then the key to this is I’m just going to choose between these two items and slowly move through these two lists—or quickly, really, as it were—and I’ll just keep adding one element at a time.
07:19
So, if left_item < right_item:
then I know that left_item
is the smallest remaining item that hasn’t yet been added to the return_list
.
07:29
So I can say return_list.append(left_item)
.
07:35
And then, of course, since I appended that left_item
, I need to add 1
to my left_index
because I no longer need to consider that item. And in all other cases, I can say here return_list.append(right_item)
and increment the right_index
.
07:53
And, of course, if the two are equal, then I’ll just add the right_item
but it won’t really matter because I could have added either of those two in that case. Then all that’s left to do is just return the return_list
.
08:05 So, of course you might expect that to be the final step because of its name.
08:10
And so really all this is saying is “Choose the smallest remaining item and add it to the return_list
.” So that will guarantee that you’re always adding things to this return_list
in sorted order, and that’s the key to this algorithm. So, there it is.
08:28 And if this is not looking 100% clear to you, I encourage you to check out the downloadable code underneath this lesson, because it will have a little bit more comments and a little bit more, kind of, just annotation for you to hopefully help you understand it.
08:42
And you can also run that code, of course, and add whatever print statements you want to better kind of follow the progress of this algorithm as it goes. So, I already imported random
and time
, so let me just add a little bit of quick kind of benchmarking code here.
08:57 So, my little decorator doesn’t work so well anymore because this is a recursive algorithm and it’s kind of hard, I would just have to write some more code and it would be a little more opaque if I used a decorator for this.
09:08 So I’m just going to do this in kind of a brute force way, which is just start the timer, run the algorithm, and then print out how much time has passed.
09:16 So let’s see this work with just a 10-element list in the terminal.
09:21 When I run this from the command line in Python,
09:25 as you can see, this outputs a nicely sorted list of these random integers, and then it says—wow, this took an unbelievably small amount of time, almost infinitesimal.
09:36
So let’s bump it up to 100
, see how long that takes to run and make sure it’s still sorted. And as you can see, it looks like everything is in sorted order.
09:46
Of course, you can take some time to check that if you want, but I can guarantee it’s in sorted order. And it still took a really small period of time. Let’s remove this print statement, I don’t think I want to see a thousand-element list printed into my terminal. And I’ll change that to 1000
, and we can kind of start to see it move up.
10:04 And of course, you know, it would be nice if I had made this into a nice script that just would increment slowly and go up all of this but, you know, it’s nice to kind of see it done in just a low-tech kind of way as well. As you can see, it’s starting to take a little bit longer. It took a little over a second there. And if I add one more zero it will take an even longer period of time, but it’s still so much faster than insertion sort and bubble sort for the same number of elements in the list. So, that’s merge sort for you.
10:35 And as I said, feel free to browse the code for a little bit more context and the ability to look at this on your own. A great thing about merge sort—it’s super fast compared to the two sorts that I showed you earlier, and it’s really quite simple once you understand the power of recursion.
10:52 The actual algorithm itself is just not super, super complicated. And it has another great property, which is that merge sorts are stable, meaning that you can sort on multiple keys and maintain the order of previous sorts.
11:06 That’s a very desirable property when you’re sorting large data structures that often need to be in very complex orders based on multiple pieces of the data that they contain. Next up, I’m going to talk about quicksort, which in some ways is a really similar algorithm to merge sort but ends up having a lot of distinct behavior.
Become a Member to join the conversation.