
Hard Interview Question (Optimal Solution & PriorityQueue)

In this lesson, you’ll see the final optimal solution to solving the merge_k_lists hard interview question. The solution takes advantage of the PriorityQueue abstract data type, which is similar to a queue but associates each element with a priority. Getting an element, it retrieves the element with the highest priority. Both .put() and .get() take logarithmic time.

Here’s an example:

Python
>>> from queue import PriorityQueue
>>> pq = PriorityQueue()
>>> pq.put(2)
>>> pq.put(1)
>>> pq.put(3)
>>> pq.get()
1
>>> pq.get()
2
>>> pq.get()
3
>>> pq.empty()
True
>>> pq.put((1, "hello"))
>>> pq.put((2, "world"))
>>> pq.get()
(1, 'hello')
>>> pq.get()
(2, 'world')

You can check out the Python documentation on PriorityQueue and the Wikipedia article.

You’ll see all solutions at the end of the course.

00:00 In this video, we’ll look at the current solution we have for the merge_k_linked_lists() question and see if we can improve it. Just to review, the algorithm looks at the front value of all the linked lists, finds the minimum, puts it in the result linked list, removes that value from copy_linked_lists, and then keeps going until there are no more values to add.
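
For reference, here’s a rough sketch of the kind of code being reviewed here. The Link(val, next=None) class, the dummy-head pattern, and the variable names are assumptions based on how the solution is described in the video, not the literal code on screen:

Python
# Minimal sketch of the "find the min" solution being reviewed (assumed Link class).
class Link:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def merge_k_linked_lists(linked_lists):
    copy_linked_lists = list(linked_lists)  # work on a copy, don't mutate the input
    result = Link(0)                        # dummy head
    pointer = result
    while any(copy_linked_lists):                                      # O(k)
        min_val = min(link.val for link in copy_linked_lists if link)  # O(k)
        for i, link in enumerate(copy_linked_lists):                   # O(k) again
            if link and link.val == min_val:
                pointer.next = Link(min_val)
                pointer = pointer.next
                copy_linked_lists[i] = link.next  # advance that linked list
                break
    return result.next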

00:18 So really, there are a couple places in this while loop that we need to look at. These two O(k) lines can sort of be combined into one by iterating, but that’s not really an optimization. Here, this is sort of annoying that we have to look through all the linked lists again to find the one that matches that minimum value.

00:35 It would be great if we had a mapping from the value of the linked list to that link. So, the key idea here is # keep a mapping of the value to the linked list.

00:46 That way, we don’t need to iterate through the linked list again to find the one that corresponds to the min value. We can think, long term, that if we optimize this to have a mapping and then somehow optimize a way to get the minimum value faster than O(k), and then optimize the while condition, we can get to a better solution.

01:02 So, let’s start with optimizing this by having a mapping. Let’s actually just put this out here. # keep a mapping of the value to the linked list. Well, whenever you think of mapping, you think of dictionary.

01:13 What is this dictionary going to map? It’s going to map val_to_link. Well, what if linked lists have the same value? This should be val_to_links.

01:22 It’s going to be a number map to a list of linked lists. So, instead of using a regular dictionary, we can use a defaultdict. Just import it right here, from collections import defaultdict, and the default is going to be a list().

01:37 Then, let’s add all the linked lists to this mapping for link in copy_linked_lists: val_to_links[link.val].append(link). Let’s print val_to_links, just to see what happens here. Let’s just return, so we don’t execute the rest of the code.
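
As a sketch of that step, using the assumed Link class from the earlier sketch and example lists chosen to match the output described below:

Python
from collections import defaultdict

# Example input lists matching the video's output (assumed).
copy_linked_lists = [Link(1, Link(2)), Link(2, Link(4)), Link(3, Link(3))]

# Map each front value to the linked lists that currently start with it.
val_to_links = defaultdict(list)
for link in copy_linked_lists:
    val_to_links[link.val].append(link)

print(val_to_links)  # exact output depends on Link's __repr__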

01:56 Okay. We got a defaultdict mapping 1 to Link(1, Link(2)), 2 to Link(2, Link(4)), and 3 to Link(3, Link(3)).

02:04 It seemed to map correctly. I guess in this example, 1, 2, 3 are all unique, so each value maps to a list containing just one linked list.

02:14 Okay. Now, we have a mapping of the value to the linked lists. This logic here becomes pretty redundant, so let’s just comment it out. And now, we can find the link with this expression: val_to_links[min_val].pop().

02:30 I’m using .pop() because .pop() is constant time, and it will actually pop off one of the linked lists in our mapping. pointer.next = Link(link.val).

02:41 pointer = pointer.next. And let’s save and see what happens. Let’s just remove this to make it really clean, like this, and run it. pop from empty list—okay, that makes sense in our solution, because every time through the loop we’re popping off from the list, and that list can become empty. So, what should we do? Well, we know that if we’ve popped off the last linked list, then there are no more linked lists mapping to that min value.

03:05 So we can do something like if the len() of this == 0—let’s just close this like that—

03:17 then, we should just delete the mapping, because there are no more linked lists mapped to that min_val. And the issue this would otherwise cause is that on the next iteration, that value would still be the min value, but there wouldn’t be any linked list mapped to it.

03:31 And then, we actually forgot one part. We keep popping off, but nowhere in the code—except for up here—are we ever adding a linked list back to this mapping.

03:42 Well, if link.next exists, then we need to add the mapping of the front of link.next to link.next. val_to_links[link.next.val] = link.next.

03:58 Let’s save it, let’s run it. Okay. Again, we’re popping off an empty list. I assume—well, this line’s not even right. It should be .append().

04:09 I don’t think that’s going to fix it, but let’s see. Well, this logic is wrong. This is legacy logic, where we loop through the copy_linked_lists.

04:17 We’re not changing copy_linked_lists anymore, so I’m assuming that might be causing it? Let’s remove this. Let’s get min_val as actually the min of val_to_links.

04:30 Run it. min() arg is an empty sequence. Okay, well that’s because this condition is wrong, because we’re not changing copy_linked_lists. We’re actually not even using copy_linked_lists anymore, so let’s just replace this, replace this, delete this copy—just to really make the code clean.

04:44 And here, it should really be if there are any values in val_to_links.
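
Putting those pieces together, a sketch of this intermediate val_to_links solution might look roughly like this (reusing the assumed Link class from the earlier sketch; the actual course code may differ in details):

Python
from collections import defaultdict

def merge_k_linked_lists(linked_lists):
    # Map each value to the linked lists whose front node currently holds it.
    val_to_links = defaultdict(list)
    for link in linked_lists:
        val_to_links[link.val].append(link)

    result = Link(0)  # dummy head
    pointer = result
    while val_to_links:                                    # constant-time check
        min_val = min(val_to_links)                        # O(k): at most k keys at any time
        link = val_to_links[min_val].pop()                 # constant time
        if len(val_to_links[min_val]) == 0:
            del val_to_links[min_val]                      # no more lists start with min_val
        pointer.next = Link(link.val)
        pointer = pointer.next
        if link.next:
            val_to_links[link.next.val].append(link.next)  # re-add the advanced list
    return result.next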

04:55 Cool! And it passed. It’s a little hard to read the code—I’m going to remove the terminal output—but basically, this is checking, “Okay, are there still values mapped? And if there are, then do this logic, but if there aren’t, then that means we’ve run out of linked lists to map.” Okay, so let’s look at the runtime. Let’s scroll all the way to the top. Here, this is constant.

05:14 This is going to be O(k). Appending to a list is constant time. Constant, constant. This is actually constant because taking the length of a list is constant.

05:23 This line—so, how many keys can there possibly be in val_to_links? Well, from the initial loop, there can be at most k keys, but you might be thinking, “Okay, as we’re adding and deleting links from that mapping, won’t it grow?” Well, no, it can’t, because if there are no more linked lists mapped to a certain value, that value gets removed from the keys.

05:43 And so, basically—scrolling all the way to the top—the mapping would include 1, 2, and 3 as keys, but then when we’ve looked at all the linked lists mapping to 1, it will get removed from the keys, and 2 already exists there.

05:55 So, at any given time, there can be at most k unique numbers mapped in val_to_links, so this min() actually still takes O(k). This .pop() is still constant.

06:06 Constant, constant, constant—deleting from a dictionary is constant. Constant. And appending is constant. So, what is our final runtime? Well, how many times does the while loop run? Well, even though we changed the condition, it still runs k * n times because it has to go through every number in all the linked lists. So the current solution is still k * n * k.

06:24 The solution from the previous video was k * n * k, and we still got k * n * k. So, you might be thinking, “Okay, that’s a lot of work to do nothing,” but notice how we’ve removed all the O(k) runtimes except for one line.

06:37 So, if there was some way to get this minimum in less than O(k) time, we’ve optimized the solution. We don’t have to worry about any other place because everything else is constant time. And that’s exactly what we’re going to do.

06:49 So, we’re going to introduce a new data structure called the PriorityQueue. I felt like the PriorityQueue is not a large enough data structure to warrant its own video, but we’re going to sneak it into here and optimize this solution.

07:00 A PriorityQueue is a data structure that allows you to put items into it and get items based on a priority. So, from queue import PriorityQueue.

07:10 pq = PriorityQueue(). And then, when you do pq.put(), the value that you’re putting in is going to also be the priority. So, pq.put(1), then 2, then 3.

07:23 1 has the highest priority, so pq.get() is going to return 1, then 2, then 3. For some reason, negative numbers have a higher priority, so pq.put(-1), 1, and -5. pq.get() will be -5, -1, and 1. You

07:42 can also put tuples where the first number is the priority. (1, "hello"), (2, "world"), pq.get() will return a tuple (1, 'hello'), and then, (2, 'world'). You can check if it’s empty by pq.empty().
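
For example, the lower-value-first ordering described above looks like this in a REPL session:

Python
>>> from queue import PriorityQueue
>>> pq = PriorityQueue()
>>> pq.put(-1)
>>> pq.put(1)
>>> pq.put(-5)
>>> pq.get()
-5
>>> pq.get()
-1
>>> pq.get()
1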

08:02 PriorityQueues are really useful because they’re usually implemented with heaps, which allows you to put() and get() in log(n) time, where n is the length of the PriorityQueue. I will link some more documentation down below, but let’s use this to speed up our solution.

08:20 from queue import PriorityQueue.

08:24 pq = PriorityQueue().

08:29 pq.put(link.val). And now, to get the minimum, it’s just pq.get().

08:38 It doesn’t even need to be min(), it’s literally just like that, which now takes log(k) time because the PriorityQueue will have at most k values at any time. Here, we don’t need to do anything because the .get() actually removes it from the PriorityQueue. Here, though, we need to put

08:55 link.next.val into the PriorityQueue. Save it, exit, and run it…

09:05 and it passed!
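
Here’s a sketch of roughly how the final PriorityQueue-based version might look, with the same assumptions as the earlier sketches (the course’s actual code may differ slightly):

Python
from collections import defaultdict
from queue import PriorityQueue

def merge_k_linked_lists(linked_lists):
    val_to_links = defaultdict(list)
    pq = PriorityQueue()
    for link in linked_lists:
        val_to_links[link.val].append(link)
        pq.put(link.val)                                   # O(log k)

    result = Link(0)  # dummy head
    pointer = result
    while val_to_links:
        min_val = pq.get()                                 # O(log k) instead of O(k)
        link = val_to_links[min_val].pop()
        if len(val_to_links[min_val]) == 0:
            del val_to_links[min_val]
        pointer.next = Link(link.val)
        pointer = pointer.next
        if link.next:
            val_to_links[link.next.val].append(link.next)
            pq.put(link.next.val)                          # keep the queue in sync
    return result.next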

09:08 So now, let’s call this # find min val solution and this’ll be # val to link solution, and now, finally, # priority queue solution.

09:19 The final runtime will be

09:23 O(k * n * log(k)). So, compared to the brute force solution, we actually optimized that factor from log(k * n) down to log(k), and interviewers really, really like these small optimizations, because it shows that you know your data structures, you know how to think about the best solution, and you can think about ways to improve it.

09:42 Let’s review all four solutions and think about what concepts we used and what we learned. So, the brute force solution—you put all the values of all the linked lists into a list, sorted it, and created a final linked list.

09:55 Here, we took advantage of the sorted() built-in function, and it got us comfortable with building a linked list. The final resulting runtime was k * n * log(k * n). Then, we used a solution that took advantage of the fact that the lists were sorted: we looked at the front of all the linked lists, found the minimum, put it in the result linked list, removed the value we’d added, and kept going until there were no more values to add.

10:18 We copied the input because we didn’t want to mutate the input, and then kept going until there were no more values, found the minimum value, found the linked list that was mapped to the minimum value, and then added that value to the result, and then returned it.

10:33 The final runtime was k * n * k. Then, we saw that we could optimize this part by using a map to map the value to the linked lists.

10:41 We used a defaultdict to build val_to_links, saw a way to optimize the while condition so that it’s constant time, found the minimum value, found the link that corresponds to it, deleted that min value’s mapping if there were no more linked lists mapped to it, and then added a mapping for the .next value of the Link that we just popped off. The final runtime was k * n * k.

11:03 But we did that because we wanted the min() call to be the only line taking O(k) time, and then we used a PriorityQueue to decrease that runtime from O(k) to log(k), so that the final PriorityQueue solution is k * n * log(k).

11:17 This concludes the three-part video on finding multiple ways to solve the same problem and using all these different concepts that we’ve talked about: defaultdict, min(), sorted(), and then a new data structure, PriorityQueue.

11:30 The next video is a conclusion video, and you’ll go through everything that you learned in this course.

James Uejio RP Team on April 27, 2020

Here is the Python documentation on PriorityQueue and a Wikipedia article.

Daniel A on May 18, 2020

Not sure if I’m doing the Big O notation right, but I believe this solution may run in O(n) time.

from functools import reduce

def merge_k_linked_lists(linked_lists):
    next_links = [link.next for link in linked_lists if link.next]  # O(n)
    total_links = next_links + linked_lists # O(1)
    total_links_sorted = sorted(total_links, key=lambda link: link.val, reverse=True)  # Olog(k)
    result = reduce(lambda val, next_val: Link(next_val.val, next=val), total_links_sorted) # O(n)
    return result

Daniel A on May 18, 2020

This solution essentially builds the links backwards from the largest link to the smallest.

Simon Yates on July 11, 2020

You’re relying on the linked lists only being 2 items deep, which wasn’t stated in the problem (although it’s true of both the test cases).

ambreen2006 on Nov. 19, 2020

In the optimal solution without the priority queue, there is an improvement made by removing the redundant iteration to find the index to which the min value belongs. I’m thinking, though, that it does a tad bit more than just this change, because it has this line:

val_to_links[min_val].pop()

The pop() without an argument should be removing the last item from the list associated with the min value in the dictionary. However, if there are multiple linked lists with the same min value, the program in the previous video advances the linked list found first, but this one advances the linked list that was found last in the creation/update of the ‘val_to_links’ dictionary. That’s fine here, but if the Link object had more attributes associated with it, the order in which the sorted list is returned would be different for these two programs.

ambreen2006 on Nov. 19, 2020

The PriorityQueue solution can also do without the additional bookkeeping of indices.

Because duplicates are allowed, the entry into the PQ needs to be wrapped (or the next element of the tuple needs to be comparable).

class PQ_Entry:
    def __init__(self, value, data):
        self.value = value
        self.data = data

    def __lt__(self, other):
        return self.value < other.value

Now, we can just use the PQ without needing the dictionary.

from queue import PriorityQueue

def merge_k_linked_lists(linked_lists):
    pq = PriorityQueue()
    for link in linked_lists:
        if link:
            pq.put(PQ_Entry(link.val, link))

    result = Link(0)
    ptr = result

    while not pq.empty():
        pq_entry = pq.get()
        min_val, min_link = pq_entry.value, pq_entry.data
        ptr.next = Link(min_val)
        ptr = ptr.next
        if min_link.next:
            pq.put(PQ_Entry(min_link.next.val, min_link.next))

    return result.next

Geoff Rliey on Jan. 30, 2021

A quick note about PriorityQueue, in the video at TC:07:23 you say “1 has the highest priority, so pq.get() is going to return 1, then 2, then 3. For some reason, negative numbers have a higher priority, so pq.put(-1), 1 and -5. pq.get() will be -5, -1, and 1.”

I’m sure you’ve realised by now, but in case anyone else is wondering, the reason that negative numbers have a higher priority is simply because they are lower values than positive numbers.

I could probably get a degree in stating the blinking obvious. ;)

pavolsevera on Feb. 22, 2021

If we’re allowed to use the standard library, the simplest solution seems to use heapq.merge. The only thing we need to do then is to convert linked lists to iterators and iterators to linked lists, which is easy and pleasant.
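
For reference, a rough sketch of that idea, assuming the course’s Link(val, next=None) class; the link_to_iter helper name is made up for illustration:

import heapq

def link_to_iter(link):
    # Yield the values of a linked list from front to back.
    while link:
        yield link.val
        link = link.next

def merge_k_linked_lists(linked_lists):
    # heapq.merge lazily merges the already-sorted iterators.
    merged_values = heapq.merge(*(link_to_iter(link) for link in linked_lists))
    result = Link(0)  # dummy head
    pointer = result
    for val in merged_values:
        pointer.next = Link(val)
        pointer = pointer.next
    return result.next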

stephencarey1905 on July 20, 2021

I solved the problem initially with recursion, but I’m not sure what the big O complexity might be for that, mainly because I don’t know what the complexity of my merge(merged_list, list) is... it’s effectively merge sort, so O(n log n)?

def merge_k_linked_lists(linked_lists):
    merged_list = linked_lists.pop(0)

    # o(k)
    for list in linked_lists:
        # o(nlogn)??? 
        merged_list = merge(merged_list, list)
    return merged_list


def merge(lst_1, lst_2):
    if lst_1 is None:
        return lst_2
    if lst_2 is None:
        return lst_1

    if lst_1.val < lst_2.val:
        return Link(lst_1.val, merge(lst_1.next, lst_2))
    return Link(lst_2.val, merge(lst_1, lst_2.next))

Engleang Sam on Jan. 4, 2022

I think it’s O(k * n), not O(k * n * log(n)). By merging the lists it will take O(n) rather than O(n * log(n)).
