Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Locked learning resources

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Quicksort Algorithm

Introduction to Sorting Algorithms in Python Liam Pulsifer 08:07

Transcript
Discussion (1)

00:00 The next algorithm I’ll discuss is called quicksort. You might be wondering why quicksort is written as one word, as opposed to the other kinds of sorts that I’ve covered so far, which were two words. And the reasons for that are really just historical.

00:13 The guy who first wrote a paper that elaborated the method of quicksort wrote it as one word, and so that’s kind of become tradition. So, quicksort is a cool sorting algorithm.

00:23 It has very different characteristics than some of the others we’ve talked about. One of those things is that merge sort, insertion sort, and bubble sort—which I’ve already mentioned—are what are called stable sorts.

00:35 But quicksort is not a stable sort, and what that means is that with the preceding three algorithms, if you sort by one characteristic—so, say you sort alphabetically, and then you sort again numerically, the elements that are the same numerically will remain sorted alphabetically, whereas you don’t quite get that with quicksort.

00:54 That can be kind of a problem but quicksort has advantages in that it’s really fast and it’s really easy to write. So let’s take a look at how it works.

01:04 The basic idea of quicksort is that you choose some pivot element, which is normally somewhere near, kind of, the middle of the list, ideally. Then you divide the list into three sections—the elements that are less than, equal to, and greater than the pivot element.

01:19 And then you call quicksort recursively on the less-than and greater-than sublists and then concatenate those results into one big list.

01:28 So, it’s a lot like merge sort in that you divide the list up and then call recursive functions on it, but it’s different in the strategy that it uses to divide.

01:36 It actually divides not purely based on position in the list, but also on the actual values of the elements in the list. So, let’s take a look at it in code.

01:48 The best thing about quicksort, in my opinion, is how easy it is to actually implement. Once you understand the idea of it and why it works, then the actual implementation is really straightforward.

02:00 Just like merge sort, you want to say if the length of the items is less than or equal to 1, then you can just return those items because you know that a list of length 1 or 0 is already sorted by definition. Now you need to choose a pivot.

02:15 And for the first strategy, I’m just going to say pivot = items[0]. So just choose the first element of the list and use that as the pivot.

02:23 I’ll talk in a little bit about how there are different pivot strategies which can change the performance of the algorithm. Now Python makes the rest of this implementation incredibly easy.

02:33 You could say less_than_pivot = [x for x in items if x < pivot]. This gets you all of the items that are less than the pivot. Now you can do the exact same thing here for the equal_to_pivot and greater_than_pivot.

02:49 And then the simplest part of this algorithm is just the end. You just say return quicksort(less_than_pivot) + equal_to_pivot, so concatenating these lists together or just sticking them together, + quicksort(greater_than_pivot).

03:10 And of course, the reason you don’t need to quicksort this middle list is because you know all of these elements have the same value. They’re all equal to the pivot, so there’s no need to actually sort it. And guess what? That’s it.

03:20 That’s the whole algorithm, which is kind of crazy. It seems like it shouldn’t work, but it really does because you just divide it up into less than, equal to, and greater than. And you know that the equal_to_pivot, those are already in the correct place, so you just kind of keep on recursing down until you get to these one element lists and you know those are already sorted.

03:39 And then you just stick them all back together again. So, it’s really quite simple. Let’s take a look and see how it works, and then I’ll just do the same timing. Because this is also a recursive algorithm, so I’m just going to do the same timing that I did in the last video with merge sort.

03:54 And of course I need to import time, but after that, I should be able to show you how this works.

04:00 So, let’s watch it run. quicksort. Wow, so this took two thousandths of a second to run on a list of a thousand elements. Now what about 10,000 elements? That only took a little less than two hundredths of a second.

04:17 Let’s bump it straight up to a million again with the same thing as the merge sort and let’s see how long it takes. It’s definitely going to take a little bit longer than it has in the past, but this only took three seconds to sort a million elements, which is pretty darn good because even the merge sort was running at a much longer time. Now something that’s interesting about quicksort is that it can be kind of a variable algorithm.

04:42 It’s worst-case performance is actually on the order of n squared. Consider what happens if this list is already sorted when you pass it in. So, for example, you choose the pivot is the first element, but since it’s already sorted, you know that all of the other items are actually greater than the pivot.

04:59 So the length of this list is really just the length of the whole list minus one. And what that means is that every time you choose a pivot, you’re just choosing the smallest element, and so you’re actually trying to sort something that’s already sorted and doing a lot of extra work, which actually results in an order of n squared runtime.

05:17 But if the list is random or is semi-random or is not sorted, then quicksort tends to be really fast. One thing that I could do is I could just say here, instead of doing randint(), I could just say [_ for _ in range(1000000)], and then I could try to run this again. So as you’ll notice here, I actually got a RecursionError for maximum recursion depth exceeded, and that’s because I’m calling this quicksort() function a million times, right?

05:43 Because every time I call quicksort(), it’s actually just getting the very first element and greater_than_pivot just has len(items) - 1 length, and so quicksort() will just keep getting called for as many items as there are items.

05:57 So that becomes both an O(n**2) runtime and you risk running into the maximum recursion depth on your interpreter. So, a better pivot is actually… If I change this import to be at the very beginning of things…

06:14 A better pivot is random.choice() of items. And so what that does is it just chooses a random pivot, which means that even if the list is sorted, the random.choice() will in the vast, vast majority of cases choose an element that’s farther away from the bottom or the top of the random array. So now if I say python quicksort.py, it still will take a little bit longer on this million-element list, but it only took about five seconds as opposed to reaching a maximum recursion depth and taking a huge amount of time. So this is a better pivot.

06:54 There are also other ways that you can choose the pivot, right? You could make a few different random choices and you could take the median of those or the mean of those, and that can also have some performance gains depending on the kind of data that you have. So there are a lot of different ways to make this pivot choice.

07:11 That’s one of the funny things about sorting is that depending on the kind of data that you’re working with, you can almost always find a different sorting algorithm that will perform best.

07:21 So, you know, merge sort is great if your data structures tend to be recursively defined. Quicksort is great because it can actually sort in place—my version of this algorithm does not actually do that, it allocates extra lists. But if you want to, you can implement a version of this that actually does some swapping so that you actually use only the space for the items that you take in.

07:45 You don’t have to allocate any more space. I’ll leave that to you to do a little bit of research and to see if you can implement a version that works that way and then I’ll probably include one in the source code for this lesson as well.

07:56 So, there are always trade-offs with sorting algorithms but quicksort is super fast and the basic implementation is incredibly straightforward to write, so it’s a good one to consider.

hauntarl on Feb. 14, 2021

Here’s an in-place version for the quicksort algorithm.

from random import randint
from timer import timed


def sort(items, low, high):
    '''
    Three-way Partition
    . quick sort can be inplace as well, in order to reduce space complexity
      using inplace sorting and have support for duplicate elements you need to
      perform a three-way partition as follows
    '''
    if low >= high:
        return items

    pivot = items[randint(low, high)]  # selecting a pivot
    # initialize variables for partition
    left = low    # [low, left) - less than elements
    right = low   # [left, right) - equal to pivot
    upper = high  # [right, upper] - unexamined, (upper, high] - greater than
    while right <= upper:
        if items[right] < pivot:
            items[left], items[right] = items[right], items[left]
            left += 1
            right += 1
        elif items[right] > pivot:
            items[right], items[upper] = items[upper], items[right]
            upper -= 1
        else:
            right += 1

    sort(items, low, left-1)
    sort(items, right, high)
    return items


@timed
def to_sort(items): sort(items, 0, len(items) - 1)


# Testing
items = [randint(1, 10) for _ in range(10)]
print(items)
print(sort(items, 0, len(items) - 1))

# Benchmarking: to test how scalable the algorithm is
to_sort([randint(1, 1000) for _ in range(1000)])
to_sort([randint(1, 1000) for _ in range(10000)])
'''
OUTPUT:
[4, 10, 5, 6, 2, 1, 7, 10, 1, 7]
[1, 1, 2, 4, 5, 6, 7, 7, 10, 10]
Finished 'to_sort' in 0.0039396000 secs
Finished 'to_sort' in 0.0343496000 secs
'''

PS: It might not be perfect or even well-optimized, open to suggestions and further improvements.

Become a Member to join the conversation.