Hard Interview Question (Optimal Solution & PriorityQueue)
In this lesson, you’ll see the final optimal solution to solving the
merge_k_lists hard interview question. The solution takes advantage of the
PriorityQueue abstract data type, which is similar to a queue but associates each element with a priority. Getting an element, it retrieves the element with the highest priority. Both
.get() take logarithmic time.
Here’s an example:
>>> from queue import PriorityQueue >>> pq = PriorityQueue() >>> pq.put(2) >>> pq.put(1) >>> pq.put(3) >>> pq.get() 1 >>> pq.get() 2 >>> pq.get() 3 >>> pq.empty() True >>> pq.put((1, "hello")) >>> pq.put((2, "world")) >>> pq.get() (1, "hello") >>> pq.get() (2, "world")
You’ll see all solutions at the end of the course.
In this video, we’ll look at the current solution we have for the
merge_k_linked_list() question and see if we can improve it. Just to review, the algorithm looks at the front value of all the linked lists, finds the minimum, puts it in the result linked list, remove that value from this
copy_linked_lists, then keeps going until there are no more values to add.
So really, there are a couple places in this
while loop that we need to look at. These two O(k) lines can sort of be combined into one by iterating, but that’s not really an optimization. Here, this is sort of annoying that we have to look through all the linked lists again to find the one that matches that minimum value.
That way, we don’t need to iterate through the linked list again to find the one that corresponds to the min value. We can think, long term, that if we optimize this to have a mapping and then somehow optimize a way to get the minimum value faster than O(k), and then optimize the
while condition, we can get to a better solution.
So, let’s start with optimizing this by having a mapping. Let’s actually just put this out here.
# keep a mapping of the value to the linked list. Well, whenever you think of mapping, you think of dictionary.
It’s going to be a number map to a list of linked lists. So, instead of using a regular dictionary, we can use a
defaultdict. Just import it right here,
from collections import defaultdict, and the default is going to be a
Then, let’s add all the linked lists to this mapping
for link in copy_linked_lists:
val_to_links[link.val].append(link). Let’s print
val_to_links, just to see what happens here. Let’s just return, so we don’t execute the rest of the code.
Okay. Now, we have a mapping of the value to linked list. This logic here becomes pretty duplicate, so let’s just comment it out. And now, we can find the
link basically being this variable,
pointer = pointer.next. And let’s save and see what happens. Let’s just remove this to make it really clean, like this, run it.
pop from empty list—okay, that makes sense in our solution because every time we’re popping off from the list—and that can be empty. So, what should we do? Well, we know that if we’ve popped off the last linked list, then there are no more linked lists mapping to that min value.
then, we should just delete the mapping, because there are no more linked lists mapped to that
min_val. And the issue that this is causing is the next iteration, that min value is going to be the min value, but there’s not going to be any linked list mapped to it.
min() arg is an empty sequence. Okay, well that’s because this condition is wrong, because we’re not changing
copy_link_lists. We’re actually not even using
copy_linked_lists, so let’s actually just replace this, replace this, delete this copy—just to really make the code clean.
04:55 Cool! And it passed. It’s a little hard to read the code—I’m going to remove the terminal output—but basically, this is checking, “Okay, are there still values mapped? And if there are, then do this logic, but if there aren’t, then that means we’ve ran out of linked lists to map.” Okay, so let’s look at the runtime. Let’s scroll all the way to the top. Here, this is constant.
This line—so, how many keys can there possibly be in
val_to_links? Well, the initial loop—there can be max k keys, but you might be thinking, “Okay, as we’re adding and deleting links from that mapping, won’t it grow?” Well, no, it can’t. Because if there are no more linked lists mapped to a certain value, it will get removed from the keys.
And so, basically—scrolling all the way to the top—the mapping would include
3 as keys, but then when we’ve looked at all the linked list mapping to
1, it will get removed from the keys, and then
2 already exists there.
Constant, constant, constant—deleting from a dictionary is constant. Constant. And appending is constant. So, what is our final runtime? Well, how many times does a
while loop run? Well, even though we changed the condition, it still runs k * n times because it has to go through every number in all the linked lists. So the current solution is still k * n * k.
06:24 The previous solution in the previous video was k * n * k, and we still got k * n * k. So, you might be thinking, “Okay, that’s a lot of work to do nothing,” but notice how we’ve removed all the O(k) runtimes, except for one line.
06:37 So, if there was some way to get this minimum in less than O(k) time, we’ve optimized the solution. We don’t have to worry about any other place because everything else is constant time. And that’s exactly what we’re going to do.
So, we’re going to introduce a new data structure called the
PriorityQueue. I felt like the
PriorityQueue is not a large enough data structure to warrant its own video, but we’re going to sneak it into here and optimize this solution.
1 has the highest priority, so
pq.get() is going to return
3. For some reason, negative numbers have a higher priority, so
pq.get() will be
can also put tuples where the first number is the priority.
pq.get() will return a tuple
(1, 'hello'), and then,
(2, 'world'). You can check if it’s empty by
PriorityQueues are really useful because they’re usually implemented with heaps, which allows you to
get() in log(n) time, where n is the length of the
PriorityQueue. I will link some more documentation down below, but let’s use this to speed up our solution.
It doesn’t even need to be
min(), it’s literally just like that, which now takes log(k) time because the
PriorityQueue will at any time have max k values. Here, we don’t need to do anything because the
.get() actually removes it from the
PriorityQueue. Here, though, we need to put
09:23 O(k * n * log(k)). So, we actually optimized from log(k * n) to log(k), and interviewers really, really like these small optimizations, because it shows that you know your data structures, you know how to think about the best solution, and think about ways to improve it.
09:42 Let’s review all four solutions and think about what concepts we used and what we learned. So, the brute force solution—you put all the values of all the linked lists into a list, sorted it, and created a final linked list.
Here, we took advantage of the
sorted() built-in function, and it got us comfortable with building a linked list. The final resulting runtime was k * n * (log(k * n). Then, we used a solution that took advantage of the fact that the lists were sorted by looking at the front of all the linked lists, found the minimum, put it in the result linked list, removed that value that we’ve added, kept going until there were no more values to add.
10:18 We copied the input because we didn’t want to mutate the input, and then kept going until there were no more values, found the minimum value, found the linked list that was mapped to the minimum value, and then added that value to the result, and then returned it.
We used a
defaultdict to map
val_to_links, actually saw a way to optimize the
while condition so that it’s constant time, found the minimum value, found the link that corresponds to it, deleted that min value mapping if there were no more linked lists, and then added a mapping to the
.next value of that
Link that we just popped off, and the final runtime was k * n * k.
But we did that because we wanted the only line to take O(k) time to be this
min() call, and then used a
PriorityQueue to decrease that runtime from O(k) to log(k) so that the final
PriorityQueue solution is k * n * log(k).
This concludes the three-part video on finding multiple ways to solve the same problem and using all these different concepts that we’ve talked about:
sorted(), and then a new data structure,
Become a Member to join the conversation.