Why Sorting Matters?
00:00 The first and most important question before I get into actually implementing some sorting algorithms is: Why does sorting actually matter? Why is it something that you might want to do when you’re working in your Python code? To answer that question, an important thing to have an understanding of is what sorting means in a programming context specifically. So, when we talk about sorting and programming, we normally mean putting a collection of items into some well-defined order.
00:33 This implies that there has to be some way to actually compare elements in that collection. So for example, you couldn’t sort a list of numbers if you didn’t have a way to say that 5 was greater than 4.
00:44 There would simply be no way to actually put things into any defined order unless you had some way to compare them. This can often be a simple numerical comparison, but it can also be a lot more complex. So imagine, for example, you have a database with information about customers at an online store, and you’re trying to figure out who to invite to participate in a rewards program. You can’t simply say, “Oh, customer A has a greater numerical value than customer B.” You have to do some other comparisons.
01:12 So you might want to give first priority to people who have been customers the longest—so a timescale comparison. But you also might want to weigh how much money each customer has spent and maybe the last time that they made a purchase.
01:23 So people who have made a purchase more recently, you might want to offer them to participate in the rewards program more readily. You can combine all of those things into one big comparison function that takes into account all those different factors, and as long as that comparison function works, then you’ll be able to order those customers in a way that makes sense.
01:49 Here are a few keywords and properties of sorting algorithms that will be useful as I talk about the different kinds of sorting algorithms we’ll consider in this lesson. The first is the runtime, which is, of course, how quickly the algorithm runs.
02:10 But that’s just generally something that you want to think about when you’re implementing sorting algorithms—how quickly do they run? A property that’s of interest when you design a sorting algorithm is whether that sorting algorithm is stable or not.
02:23 So, a stable sort is a kind of sort that retains its sorted nature even when you sort it multiple times with different keys. So for example, if you sort a list of store item objects alphabetically, then you sort again by price, are the items that are equal in price still sorted alphabetically?
02:53 Another property that’s interesting is whether a sorting algorithm is in-place. That means, does it take up any more additional memory space than the memory taken up by the input? Some algorithms are in-place—that is, they don’t take up any more memory—and some algorithms require extra space, so might call those out-of-place sorting algorithms. They need to, for example, allocate a new list or something like that in order to actually fulfill their function.
03:19 Then one more key word that I’d like to mention is something called a pathological case. And that’s normally how you want to consider the performance of your sorting algorithm—a case in which the input data is specifically designed to make your algorithm run as slowly or as poorly, in general, as possible.
03:37 So for example, say you wrote a sorting algorithm that relied on replacing some values with -1 or something like that in order to mark them as placeholders. Well, of course the pathological case for that would be a list that had negative numbers in it, right?
03:52 Because -1 would be a valid case for that list that your algorithm didn’t take into account. And there are many other ways that this can kind of show itself and I’ll talk about that later when I talk about quicksort.
04:02 Now let’s get into the big question. Why is sorted data so important for computer scientists and programmers writ large? There are a lot of things that sorting makes easier when you’re actually trying to work with some data. For example, searching for an item in a list is much, much faster if the list is sorted. Selecting items from a list based on their relationship to the rest of the items is easier, so finding the k-th largest or smallest value, finding the median, so on. Right? What the idea is is that you have guarantees in a sorted list as to properties of the data, right?
04:37 So figuring out things related to those properties becomes much easier. In that same vein, finding duplicate values on a list is a very quick operation when the list is sorted. And analyzing the frequency distribution of items, which is a really common task in data science, is super fast if the list is sorted. Of course, there are all kinds of crazy optimizations as well that you can do with sorted lists. For example, things like compilers might want to keep sorted items, and really, any kind of data structure where you’re trying to access elements quickly and easily is better done and easier done with sorted data in some sense.
05:17 Now that you’re familiar with why sorting is a good thing, let’s talk about some ways to analyze the performance of sorting algorithms. And then after that, I’ll get into actually implementing them.
Become a Member to join the conversation.