The zip() Function
00:00
In this lesson, I’m going to take you through what the zip() function is and how it works. The key to understanding the Python zip() function is that it’s a function for parallel iteration. Now, what does that mean, exactly? Well, when you think about the concept of iteration (moving through a list, a tuple, or another kind of iterable), you normally think of saying something like for number in A: print(number), or doing some other operation on number, right?
00:28
You have something that you want to iterate through, and you normally just say for x in that thing, do something to x, right? And the way you can picture that working is that there’s an index, which gives the ordering of the thing you’re iterating through, and that index lets you get each element of it in turn, right?
00:48
Now, what zip() lets you do is pass in multiple iterables, anywhere from zero up to any number of them. It follows that same process of starting at some index, but at each index it gets the value from each input iterable, in the order that you passed them in. So in this example, if you passed in three lists, A, B, and C, first zip() goes to index 0 and it says, “Okay, I’ll get 1, I’ll get 'd', I’ll get .4, and then package all of those into a tuple.” And then it does the same thing for all of the remaining indices of the input iterables.
01:31 And finally, what it returns is an iterator of tuples, with the i-th element of each iterable that you passed in placed in the i-th tuple. Now, this is super convenient because it lets you zip up, or group, multiple iterables (most of the time they’ll be related iterables) in a nice, easy-to-package fashion.
01:53 And so you can get access to all of the corresponding elements of each iterable in one simple little tuple package. So, that’s very convenient, especially when the data that you have is related and related on an index.
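To make the index picture concrete, here’s a minimal sketch that builds the tuples by hand with a shared index and then compares that to what zip() produces. The contents of A, B, and C are assumptions: only the first elements (1, 'd', and .4) are mentioned in the lesson, and the name by_index is made up for illustration.

```python
# Hypothetical lists: only the first elements (1, 'd', .4) come from the lesson.
A = [1, 2, 3]
B = ["d", "e", "f"]
C = [0.4, 0.5, 0.6]

# Manual parallel iteration: one shared index across all three lists.
by_index = []
for i in range(min(len(A), len(B), len(C))):
    by_index.append((A[i], B[i], C[i]))

# zip() produces the same grouping without any index bookkeeping.
zipped = list(zip(A, B, C))

print(zipped)              # [(1, 'd', 0.4), (2, 'e', 0.5), (3, 'f', 0.6)]
print(by_index == zipped)  # True
```

One thing worth knowing is that the index is just a mental model. zip() pulls the next item from each input’s iterator rather than indexing into it, so it also works with iterables that don’t support indexing, such as generators.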
02:06
So, let’s take a look at the documentation of the zip() function in Python to get a little bit more nuance on how it works. So, from the documentation, you can see that zip() “returns an iterator of tuples where the i-th tuple contains the i-th element from each of the argument sequences or iterables.”
02:24 But another couple of interesting things to note are that “the iterator stops when the shortest input iterable is exhausted.” So, the length of the output is determined by the shortest input iterable, rather than the longest. “With a single iterable argument, it returns an iterator of 1-tuples.” So you still get tuples, even if there’s only one element in each of them, and then “with no arguments, it simply returns an empty iterator.” So, you don’t get empty tuples or anything, you just get a completely empty iterator.
02:53 Let’s take a look at how some of these concepts work themselves out in real code.
02:58
So, I’ve defined a few iterables ahead of time so you don’t have to watch me type them. Let’s do a simple example first with just the prices list and the car_sizes list. So if I say zip(prices, car_sizes), what do I get? Well, I get a zip object, which is an iterator, and you can also see the memory address at which that object is stored.
03:22
If I want to take a look at the actual things in this iterator, I can convert it to a list. I can say list(zip(prices, car_sizes)).
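Because the zip object is an iterator, it can only be walked through once. Here’s a small sketch of that, assuming some values for prices and car_sizes: only 10000 and "small" are stated in the lesson, and the rest are placeholders.

```python
# Assumed contents: only 10000 and "small" are taken from the lesson.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]

pairs = zip(prices, car_sizes)
print(pairs)            # a zip object; the memory address differs per run

first = next(pairs)     # pull a single tuple out of the iterator
rest = list(pairs)      # list() consumes whatever is left
leftover = list(pairs)  # nothing remains after that

print(first)     # (10000, 'small')
print(rest)      # [(20000, 'medium'), (30000, 'large')]
print(leftover)  # []
```

So if you need the pairs more than once, it’s probably easiest to keep the list around or just call zip() again.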
03:33
And as you can see, the behavior is exactly like what you observed in the documentation. So, at the zeroth index of prices there’s the value 10000, and at the zeroth index of car_sizes there’s the string "small", and at the zeroth index of the output there is simply a tuple with 10000 and 'small'.
03:53
And then the same pattern continues for index 1 and index 2. And as you can see, the length of the output is 3 because each of the inputs has length 3.
04:03
And then the number of elements in each tuple is two, because I passed in two iterables to the zip() function.
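As a quick check on those two observations, this sketch measures the shape of the output: how many tuples come back, and how wide each tuple is. The list contents are again assumed apart from 10000 and "small", and result and widths are illustrative names.

```python
# Assumed contents: only 10000 and "small" are taken from the lesson.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]

result = list(zip(prices, car_sizes))
widths = {len(pair) for pair in result}  # distinct tuple sizes in the output

print(result[0])    # (10000, 'small')
print(len(result))  # 3, one tuple per index of the inputs
print(widths)       # {2}, one slot per iterable passed to zip()
```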
04:10
If I add the colors iterable to this zip() call, then I get a slightly different output, and of course I’ll have to convert it to a list if I want to actually see what’s in the zip object.
04:26
Normally, you would put this in a for loop so that you could iterate through it, but I’m using the list() constructor just to make it easier to see. Now, since I passed in colors as well, the size of each output tuple has increased to 3, because I passed in three inputs, but the length of the total output has actually decreased by one, and that’s because colors only has two elements in it, "red" and "blue".
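Here’s roughly what that looks like in the for loop form, with each tuple unpacked into its own name. The colors list matches the lesson ("red" and "blue"); the other two lists are assumed apart from their first elements, and rows is just an illustrative name.

```python
# Assumed contents for prices and car_sizes beyond their first elements.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]
colors = ["red", "blue"]  # only two elements, as in the lesson

rows = []
for price, size, color in zip(prices, car_sizes, colors):
    # Each pass unpacks one 3-tuple into three separate names.
    rows.append(f"{color} {size} car: {price}")

print(rows)       # ['red small car: 10000', 'blue medium car: 20000']
print(len(rows))  # 2, because colors runs out first
```

Notice that the third price and size are dropped silently, with no error or warning.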
04:51
And so as you saw in the documentation, the length of the zip() output depends on the shortest input. If I want to use the longest input to determine the number of elements in my output iterator, then I can import the function zip_longest() from itertools, and that does almost exactly the same thing as zip(),
05:13
except that it uses the longest input to determine... oh, I’m sorry. I said zip().
05:19
I was trying to type and talk at the same time, and that’s always dangerous. So, I am now going to use zip_longest(), and as you can see, the length of the output is once again 3, except that I now have this padding None value, because there is no third element of the colors list, and so zip_longest() has to fill it with something, so it just uses None. If you don’t want to use None, you can use the fillvalue parameter of the function, and as you can see, there’s this nice param iter1, param iter2... (which just means any number of iterables) and then the fillvalue parameter.
05:57
And I can use this to just add in a different value for the fillvalue. In this case, it’s a question mark string ("?"). I could also make it the value 1.
06:06
You can put any value you want in here for fillvalue, so that’s pretty nice. That’s how you can avoid the shortening of the zip() output.
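A short sketch of both variants, the default None padding and a custom fillvalue, might look like this. As before, the prices and car_sizes contents are assumed beyond their first elements, and padded_default and padded_custom are illustrative names.

```python
from itertools import zip_longest

# Assumed contents for prices and car_sizes beyond their first elements.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]
colors = ["red", "blue"]

# Default behavior: missing values are padded with None.
padded_default = list(zip_longest(prices, car_sizes, colors))

# fillvalue must be passed by keyword; a positional value would be
# treated as one more iterable.
padded_custom = list(zip_longest(prices, car_sizes, colors, fillvalue="?"))

print(padded_default[-1])  # (30000, 'large', None)
print(padded_custom[-1])   # (30000, 'large', '?')
print(len(padded_custom))  # 3
```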
06:17
Let’s take a look at the other two edge cases that the documentation mentioned. So, if you say list(zip(prices)), you get a list of 1-tuples.
06:30 And this is important because if you index this list, what you get is a tuple with one item, so if you actually need this number, you’ll need to index twice to get that actual value in there.
06:43
That’s something to be aware of when you’re writing code that uses this. And then if I pass nothing at all to the zip() function, I just get an empty iterator as output. And of course, an empty iterator converted to a list is an empty list.
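Both edge cases fit in a few lines. This sketch assumes the same placeholder prices as before (only 10000 is from the lesson), and singles and empty are illustrative names.

```python
# Assumed contents: only 10000 is taken from the lesson.
prices = [10000, 20000, 30000]

singles = list(zip(prices))
print(singles)        # [(10000,), (20000,), (30000,)]
print(singles[0])     # (10000,) is still a tuple
print(singles[0][0])  # 10000, index twice to reach the number

empty = list(zip())   # no arguments at all
print(empty)          # []
```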
06:56
So, that is how the zip() function works and what it looks like in practice. I worked up just a little slide which is an overview, and as you can see, it goes over everything that I’ve done in this lesson so far.
07:08
So, it’s a function for parallel iteration that gets the i-th element of each input and then zips them into a tuple. zip() uses the length of the shortest input by default.
07:17
You can use itertools.zip_longest() if you don’t want that to happen. It returns 1-tuples if given only one input, and it returns an empty iterator with no inputs.
07:26 So, that’s pretty much everything that I’ve just covered in short form.
07:30
In the next lesson, I’m going to cover the differences between the zip() function in Python 2 and in Python 3.