The zip() Function

Parallel Iteration With Python's zip() Function Liam Pulsifer 07:38

00:28 You have something that you want to iterate through, and you normally just say for x in that thing, do something to x, right? And the way that that generally works is that you have an index, which generates the ordering of that item that you’re iterating through, and then the index allows you to get each element of that item in turn, right?

01:31 And finally, what it returns is an iterator of tuples, with the i-th element of each iterable that you passed in placed in the i-th tuple. Now, this is super convenient because it lets you zip up or group multiple iterables (most of the time they’ll be related iterables) in a nice, easy-to-package fashion.

01:53 And so you can get access to all of the corresponding elements of each iterable in one simple little tuple package. So, that’s very convenient, especially when the data that you have is related and related on an index.
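To make the "tuple package" idea concrete, here is a minimal sketch of a loop over two related lists. The names `names` and `ages` and their values are made up for illustration and are not from the lesson.

```python
# Hypothetical data, related by index (not from the lesson).
names = ["Ada", "Grace", "Linus"]
ages = [36, 85, 54]

# Each tuple that zip() yields is unpacked into two loop variables.
pairs = []
for name, age in zip(names, ages):
    pairs.append(f"{name} is {age}")

print(pairs)  # ['Ada is 36', 'Grace is 85', 'Linus is 54']
```

Unpacking in the `for` header is optional; a single loop variable would simply receive each tuple whole.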

02:06 So, let’s take a look at the documentation of the zip() function in Python to get a little bit more nuance in how this functions. So, from the documentation, you can see that zip() “returns an iterator of tuples where the i-th tuple contains the i-th element from each of the argument sequences or iterables.”
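One way to read that sentence from the documentation is as a small generator. The function below, `zip_sketch`, is a hypothetical name and a simplified stand-in written for illustration. It is not how the built-in is actually implemented, but it should behave the same way for ordinary inputs.

```python
def zip_sketch(*iterables):
    """Simplified stand-in for the built-in zip() (illustration only)."""
    iterators = [iter(it) for it in iterables]
    # With no inputs, the loop never runs and nothing is yielded.
    while iterators:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # The first exhausted input ends the whole thing.
                return
        yield tuple(row)


result = list(zip_sketch("abc", [1, 2, 3]))
print(result)  # [('a', 1), ('b', 2), ('c', 3)]
```

The i-th tuple is built by taking one element from every input in turn, which is exactly the "i-th element from each of the argument sequences" wording.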

02:53 Let’s take a look at how some of these concepts work themselves out in real code.

02:58 So, I’ve defined a few iterables, just ahead of time so you don’t have to watch me type them. Let’s do a simple example first with just the prices list and the car_sizes list. So if I say zip(prices, car_sizes), what do I get? Well, I get a zip object, which is an iterator, and then you can also see the memory address at which that object is stored.

03:22 If I want to take a look at the actual things in this iterator, I can convert it to a list. I can say list(zip(prices, car_sizes)).
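Here is a sketch of that step that you can run yourself. The transcript only confirms the first element of each list (10000 and "small"); the remaining values are assumptions chosen for illustration.

```python
prices = [10000, 20000, 30000]            # only 10000 is confirmed by the lesson
car_sizes = ["small", "medium", "large"]  # only "small" is confirmed

zipped = zip(prices, car_sizes)
print(zipped)  # something like <zip object at 0x...>; the address varies

pairs = list(zipped)
print(pairs)  # [(10000, 'small'), (20000, 'medium'), (30000, 'large')]

# A zip object is a one-shot iterator: once consumed, it yields nothing more.
leftover = list(zipped)
print(leftover)  # []
```

Because the iterator is used up after the first `list()` call, you need to call `zip()` again if you want to go through the pairs a second time.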

03:33 And as you can see, the behavior is exactly like what you observed in the documentation. So, at the zeroth index of prices there’s the value 10000, and at the zeroth index of car_sizes there’s the string "small", and at the zeroth index of the output there is simply a tuple with 10000 and 'small'.

03:53 And then the same pattern continues for index 1 and index 2. And as you can see, the length of the output is 3 because the length of both inputs is 3.

04:03 And then the number of elements in each tuple is two, because I passed in two iterables to the zip() function.
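Those two observations about shape can be checked directly. This is a small sketch; apart from 10000 and "small", the list values are assumed.

```python
prices = [10000, 20000, 30000]            # values after the first are assumed
car_sizes = ["small", "medium", "large"]  # values after the first are assumed

pairs = list(zip(prices, car_sizes))

first = pairs[0]
print(first)       # (10000, 'small')
print(len(pairs))  # 3, one tuple per index
print(len(first))  # 2, one slot per iterable passed in

# The i-th tuple holds the i-th element of each input.
matches = all(
    pairs[i] == (prices[i], car_sizes[i]) for i in range(len(pairs))
)
print(matches)  # True
```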

04:10 If I add the colors iterable to this zip() function, then I get a slightly different output—and of course I’ll have to convert it to a list if I want to actually see what’s in the zip object offhand.
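A sketch of the three-iterable call is below. The contents of `colors`, including its length of four, are an assumption, since this transcript does not show them.

```python
prices = [10000, 20000, 30000]             # values after the first are assumed
car_sizes = ["small", "medium", "large"]   # values after the first are assumed
colors = ["red", "green", "blue", "black"]  # entirely assumed, one item longer

triples = list(zip(prices, car_sizes, colors))
print(triples)
# [(10000, 'small', 'red'), (20000, 'medium', 'green'), (30000, 'large', 'blue')]

print(len(triples[0]))  # 3, because three iterables were passed in
print(len(triples))     # 3, so the extra color "black" never appears
```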

04:51 And so as you saw in the documentation, the zip() function—the length of that output—is dependent on the shortest input. If I want to use the longest input to determine the number of elements in my output iterator, then I can import from itertools the function zip_longest(), and that does almost exactly the same thing as zip(),

05:13 except that it uses the longest input to determine—oh, I’m sorry. I said zip().
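For comparison, here is a minimal sketch that runs zip() and itertools.zip_longest() side by side on inputs of unequal length. The list contents are assumed for illustration; only 10000 and "small" come from the lesson.

```python
from itertools import zip_longest

prices = [10000, 20000, 30000]             # values after the first are assumed
car_sizes = ["small", "medium", "large"]   # values after the first are assumed
colors = ["red", "green", "blue", "black"]  # assumed, one item longer

shortest = list(zip(prices, car_sizes, colors))
longest = list(zip_longest(prices, car_sizes, colors))

print(len(shortest))  # 3, stops at the shortest input
print(len(longest))   # 4, keeps going to the longest input

# Missing positions are filled with None by default.
print(longest[-1])  # (None, None, 'black')
```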

05:57 And I can use this to just add in a different value for the fillvalue. In this case, it’s a question mark string ("?"). I could also make it the value 1.

06:06 So you can put any value you want in here as the fillvalue. So that’s pretty nice. That’s how you can avoid the shortening of the zip() output.
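A short sketch of the `fillvalue` keyword follows, using the two fill values mentioned above. The list contents are assumptions made for illustration.

```python
from itertools import zip_longest

prices = [10000, 20000, 30000]             # values after the first are assumed
colors = ["red", "green", "blue", "black"]  # assumed, one item longer

with_question = list(zip_longest(prices, colors, fillvalue="?"))
print(with_question[-1])  # ('?', 'black')

with_one = list(zip_longest(prices, colors, fillvalue=1))
print(with_one[-1])  # (1, 'black')
```

Note that `fillvalue` has to be passed by keyword; a bare extra positional argument would be treated as another iterable to zip.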

06:17 Let’s take a look at just the other two edge cases that the documentation warned you about. So, if you want to say list(zip(prices)), you get a list of 1-tuples.

06:30 And this is important because if you index this list, what you get is a tuple with one item, so if you actually need this number, you’ll need to index twice to get that actual value in there.
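The double indexing looks like this in a quick sketch. The values after 10000 are assumed.

```python
prices = [10000, 20000, 30000]  # values after the first are assumed

singles = list(zip(prices))
print(singles)  # [(10000,), (20000,), (30000,)]

one_tuple = singles[0]
print(one_tuple)  # (10000,), still a tuple

value = singles[0][0]
print(value)  # 10000, the actual number
```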

06:43 That’s something to be aware of when you’re writing code that uses this. And then if I pass in just nothing to the zip() function, I just get an empty iterator as output. And of course, an empty iterator as a list is an empty list.
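The no-argument case is easy to verify with a couple of lines. The sentinel string below is just an illustrative default.

```python
empty = list(zip())
print(empty)  # []

# Asking an empty zip for its next item falls back to the default right away.
first = next(zip(), "nothing to yield")
print(first)  # nothing to yield
```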

06:56 So, that is how the zip() function works and what it looks like in practice. I worked up just a little slide which is an overview, and as you can see, it goes over everything that I’ve done in this lesson so far.

07:08 So, it’s a function for parallel iteration that gets the i-th element of each input and then zips them into a tuple. zip() uses the length of the shortest input by default.

07:17 You can use itertools.zip_longest() if you don’t want that to happen. It returns 1-tuples if given only one input, and it returns an empty iterator with no inputs.

07:26 So, that’s pretty much everything that I’ve just covered in short form.

07:30 In the next lesson, I’m going to cover the differences between the zip() function in Python 2 and in Python 3.
