The zip() Function
In this lesson, I’m going to take you through what the zip() function is and how it works. The key to understanding the Python zip() function is that it’s a function for parallel iteration. Now, what does that mean, exactly? Well, when you think about the concept of iteration, which is moving through a list, or a tuple, or another form of iterable, you normally think of saying something like for number in A: print(number), or do some other operation on number. You have something that you want to iterate through, and you normally just say for x in that thing, do something to x, right? And the way that generally works is that you have an index, which gives the ordering of the item that you’re iterating through, and then the index allows you to get each element of that item in turn, right?
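As a rough illustration of that idea, the sketch below loops over a list directly and then by explicit index, and both visit the same elements in the same order. The name A comes from the example above; its contents are made up here.

```python
# Hypothetical data; only the name A comes from the lesson.
A = [10, 20, 30]

# Direct iteration: Python hands you each element in turn.
direct = []
for number in A:
    direct.append(number)

# Index-based iteration: the same elements, fetched by position.
by_index = []
for i in range(len(A)):
    by_index.append(A[i])

print(direct)
print(by_index)
```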
What zip() lets you do is pass in multiple iterables, so zero through any number of iterables. It follows that same process of starting at some index, but it gets the value from each input iterable at that index, in the order that you passed them in. So in this example, if you passed in three lists, zip() goes through index 0 and it says, “Okay, I’ll get 1, I’ll get 'd', I’ll get .4, and then package all of those into a tuple.” And then it does the same thing for all of the remaining indices of the input iterables.
01:31 And finally, what it returns is an iterator of tuples, with the i-th element of each iterable that you passed in placed in the i-th tuple. Now, this is super convenient because it lets you zip up or group multiple iterables (and most of the time they’ll be related iterables) in a nice, easy-to-package fashion.
01:53 And so you can get access to all of the corresponding elements of each iterable in one simple little tuple package. So, that’s very convenient, especially when the data that you have is related and related on an index.
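A minimal sketch of that three-list example follows. The first elements (1, 'd', and .4) are the ones mentioned above; the list names and the remaining values are assumptions made purely for illustration.

```python
# Only the first element of each list comes from the lesson;
# the names and the other values are hypothetical.
numbers = [1, 2, 3]
letters = ['d', 'e', 'f']
floats = [.4, .5, .6]

# Parallel iteration: each pass yields one tuple, unpacked into three names.
rows = []
for number, letter, value in zip(numbers, letters, floats):
    rows.append((number, letter, value))
    print(number, letter, value)

# The first tuple holds the index-0 element of every input.
print(rows[0])  # (1, 'd', 0.4)
```

Unpacking the tuple directly in the for statement is a common way to work with the grouped values without indexing into each tuple.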
So, let’s take a look at the documentation of the zip() function in Python to get a little bit more nuance in how this functions. So, from the documentation, you can see that zip() “returns an iterator of tuples where the i-th tuple contains the i-th element from each of the argument sequences or iterables.”
02:24 But another couple of interesting things to note are that “the iterator stops when the shortest input iterable is exhausted.” So, the length of the output is determined by the shortest input iterable, rather than the longest. “With a single iterable argument, it returns an iterator of 1-tuples.” So you still get tuples, even if there’s only one element in each of them, and then “with no arguments, it simply returns an empty iterator.” So, you don’t get empty tuples or anything, you just get a completely empty iterator.
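To make those three rules concrete, here is a simplified pure-Python sketch that mirrors the documented behavior. The name simple_zip is hypothetical, and this is not how the built-in is actually implemented; it is only meant to show why the output stops at the shortest input.

```python
def simple_zip(*iterables):
    """Hypothetical, simplified stand-in for the built-in zip()."""
    iterators = [iter(it) for it in iterables]
    # With no arguments there is nothing to yield: an empty iterator.
    if not iterators:
        return
    while True:
        row = []
        for iterator in iterators:
            try:
                row.append(next(iterator))
            except StopIteration:
                # The shortest input is exhausted, so stop entirely.
                return
        yield tuple(row)


print(list(simple_zip([1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]
print(list(simple_zip([1, 2, 3])))        # [(1,), (2,), (3,)]
print(list(simple_zip()))                 # []
```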
02:53 Let’s take a look at how some of these concepts work themselves out in real code.
So, I’ve defined a few iterables ahead of time so you don’t have to watch me type them. Let’s do a simple example first with just the prices list and the car_sizes list. So if I say zip(prices, car_sizes), what do I get? Well, I get a zip object, which is an iterator, and you can also see the memory address at which that object is stored.
If I want to take a look at the actual things in this iterator, I can convert it to a list. I can say list(zip(prices, car_sizes)). And as you can see, the behavior is exactly like what you observed in the documentation. So, at the zeroth index of prices there’s the value 10000, and at the zeroth index of car_sizes there’s the string "small", and at the zeroth index of the output there is simply a tuple with 10000 and "small". And then the same pattern continues for index 1 and index 2. And as you can see, the length of the output is 3 because the length of both of the inputs is 3. And then the number of elements in each tuple is two, because I passed in two iterables to the zip() function.
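Here is roughly what that session looks like as a script. Only 10000 and "small" are taken from the lesson; the other values in prices and car_sizes are placeholders assumed for illustration.

```python
# Only prices[0] and car_sizes[0] come from the lesson; the rest is assumed.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]

zipped = zip(prices, car_sizes)
print(zipped)  # a zip object; the address shown varies from run to run

pairs = list(zipped)
print(pairs[0])    # (10000, 'small')
print(len(pairs))  # 3, the same length as both inputs
```

Note that a zip object can only be consumed once, so the list is built from it a single time and reused afterwards.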
If I add the colors iterable to this zip() function, then I get a slightly different output, and of course I’ll have to convert it to a list if I want to actually see what’s in the zip object offhand. Normally, you would put this in a for loop so that you could iterate through it, but I’m using the list() constructor just to make it easier to see. Now, since I passed in colors as well, the size of each output tuple has increased to 3, because I passed in three inputs, but the length of the total output has actually decreased by one, and that’s because colors only has two elements in it.
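The truncation can be sketched as follows. The lesson only establishes that colors has two elements; the specific colors, like the other placeholder values, are assumed here.

```python
# All values other than 10000 and "small" are assumed for illustration.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]
colors = ["red", "blue"]  # only two elements, as in the lesson

triples = list(zip(prices, car_sizes, colors))
print(triples)
print(len(triples))     # 2, limited by the shortest input
print(len(triples[0]))  # 3, one element per input
```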
And so as you saw in the documentation, the length of the zip() output is dependent on the shortest input. If I want to use the longest input to determine the number of elements in my output iterator, then I can import from itertools the function zip_longest(), and that does almost exactly the same thing as zip(), except that it uses the longest input to determine... oh, I’m sorry. I misspoke. I was trying to type and talk at the same time and that’s always dangerous. So, I am now going to use zip_longest() and as you can see, the length of the output is once again 3, except that I now have this padding None value because there is no third element of the colors list, and so zip_longest() has to fill it with something, so it just uses None.
If you don’t want to use None, you can use the fillvalue parameter of the function, and as you can see there’s this nice param iter1, param iter2... (so this just means any number of iterables) and then the fillvalue parameter. And I can use this to just add in a different value for the fillvalue. In this case, it’s a question mark string ("?"). I could also make it some other value. You can put any value in here that you want for fillvalue. So that’s pretty nice. That’s how you can avoid the shortening of the output.
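A short sketch of itertools.zip_longest() with and without fillvalue is shown here. As before, every value other than 10000 and "small" is a placeholder assumed for illustration.

```python
from itertools import zip_longest

# All values other than 10000 and "small" are assumed for illustration.
prices = [10000, 20000, 30000]
car_sizes = ["small", "medium", "large"]
colors = ["red", "blue"]

# Default padding: the missing third color becomes None.
padded = list(zip_longest(prices, car_sizes, colors))
print(len(padded))  # 3, set by the longest input
print(padded[-1])   # (30000, 'large', None)

# Custom padding via the keyword-only fillvalue parameter.
custom = list(zip_longest(prices, car_sizes, colors, fillvalue="?"))
print(custom[-1])   # (30000, 'large', '?')
```

Choosing a fillvalue that cannot be confused with real data can make the padded positions easier to spot later.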
Let’s take a look at the other two edge cases that the documentation warned you about. So, if you say list(zip(prices)), you get a list of 1-tuples.
06:30 And this is important because if you index this list, what you get is a tuple with one item, so if you actually need this number, you’ll need to index twice to get that actual value in there.
That’s something to be aware of when you’re writing code that uses this. And then if I pass in nothing to the zip() function, I just get an empty iterator as output. And of course, an empty iterator converted to a list is an empty list.
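Both edge cases can be checked in a few lines. The values in prices after 10000 are again assumed placeholders.

```python
# Values after 10000 are assumed for illustration.
prices = [10000, 20000, 30000]

# One input: every element is wrapped in a 1-tuple.
singles = list(zip(prices))
print(singles)        # [(10000,), (20000,), (30000,)]
print(singles[0])     # (10000,)
print(singles[0][0])  # 10000, which takes two index operations

# No inputs: an empty iterator, which becomes an empty list.
empty = list(zip())
print(empty)          # []
```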
So, that is how the zip() function works and what it looks like in practice. I worked up just a little slide which is an overview, and as you can see, it goes over everything that I’ve done in this lesson so far.
So, it’s a function for parallel iteration that gets the i-th element of each input and then zips them into a tuple. zip() uses the length of the shortest input by default. You can use itertools.zip_longest() if you don’t want that to happen. It returns 1-tuples if given only one input, and it returns an empty iterator with no inputs.
07:26 So, that’s pretty much everything that I’ve just covered in short form.
In the next lesson, I’m going to cover the differences between the zip() function in Python 2 and in Python 3.