Python Versions

Parallel Iteration With Python's zip() Function Liam Pulsifer 08:02

00:00 In this lesson, I’m going to cover the differences between how the zip() function works in Python 2 and Python 3. So far, I have described—and later in this series, I will continue to describe—the behavior of zip() in Python 3, because that’s the most recent and widely used family of Python.

00:18 But there are still legacy codebases out there that use a lot of Python 2, and it’s likely that you’ll run into it at least at some point in your Python career.

00:26 So, it’s always best to make sure you’re familiar with the differences between the two versions of any Python function, including zip() in Python 2 and Python 3.

00:36 The main difference between how zip() works in Python 2 versus Python 3 is simple: in Python 2, it returns a list of tuples instead of an iterator. Otherwise, it has almost exactly the same behavior and effects. However, this little change can have a dramatic effect on your memory usage if you’re passing large input iterables to the zip() function.

00:58 The reason is that when you return a list, you need to construct that entire list and hold it somewhere in memory so that it can be returned, whereas an iterator constructs the tuples of the zipped output one at a time, so only one tuple is ever actually being processed at a given time.

01:20 So that can save a lot of memory when you have large inputs to the zip() function.
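A rough way to see this difference in Python 3 is to compare the size of a zip object with the size of the equivalent list. This is only a sketch: the input sizes and variable names are made up for illustration, and sys.getsizeof() measures just the container, not the tuples it points to, so the real gap is larger than the numbers printed here.

```python
import sys

# Hypothetical large inputs, chosen only for illustration.
left = range(100_000)
right = range(100_000)

lazy_pairs = zip(left, right)         # Python 3: an iterator, no tuples built yet
eager_pairs = list(zip(left, right))  # roughly what Python 2's zip() handed back

# getsizeof() reports only the container itself, not the tuples inside it.
lazy_size = sys.getsizeof(lazy_pairs)
eager_size = sys.getsizeof(eager_pairs)
print(lazy_size, eager_size)

# The iterator produces tuples one at a time, on demand.
first_pair = next(lazy_pairs)
print(first_pair)
```

The zip object stays small no matter how long the inputs are, while the list grows with them.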

01:25 Of course you can mimic the Python 2 behavior in Python 3 by just wrapping all of your calls to zip() with a list constructor, if for some reason you really need a list. And in fact, that’s what I did in the last lesson.

01:37 Just to show you the contents of the zip() output, I often found it convenient to wrap it in a list constructor, just because that has more convenient printed output. But remember, if you use this little trick to get a list, then you’ll incur the same potential memory downsides if you’re using some really large inputs.

01:55 So, keep that in mind, but know that it’s very easy to get a list if you need to in Python 3. The reverse is also true, luckily. It’s not quite as simple, but you can use the itertools.izip() function in Python 2 to get the same behavior as Python 3’s zip() function.
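As a small Python 3 sketch of the list() wrapping described above, using sample fruit and price values like the ones in this lesson (the variable names pairs, pairs_list, and leftover are just for illustration):

```python
fruits = ["Apple", "Orange", "Banana"]
prices = (1.2, 1.4, 1.8)

pairs = zip(fruits, prices)
print(pairs)  # prints the zip object itself, not its contents

# Wrapping in list() builds every tuple up front, like Python 2's zip() did.
pairs_list = list(pairs)
print(pairs_list)

# The iterator has now been consumed, so a second pass yields nothing.
leftover = list(pairs)
print(leftover)
```

One thing this shows beyond the printed output: a zip object can only be walked once, whereas the list can be reused as many times as you like.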

02:12 So, let’s take a look over in the REPL and see what this looks like in practice. So as you can see, I’ve set up a Python 2 REPL instead of my usual Python 3.

02:24 What I can do is I can demonstrate just the behavior of the zip() function by constructing a couple of basic iterables. So just for fun, let’s say something like fruits = a list with "Apple",

02:39 "Orange", and "Banana"—and I should keep them all capitalized for consistency’s sake. And then I’ll have a tuple of prices.

02:50 And this is a classic example for me, and these are not whatsoever accurate to real prices, so please don’t assume that this is what you’ll pay for an apple or an orange or banana. Ha!

03:40 then I get an itertools.izip object, and I’ll just use the type() function one more time just to make sure that everything is in order, and as you can see, yup!

03:51 It’s an itertools.izip object at this memory location. But of course, you can iterate through this, so you could say for f, p in izip(fruits, prices): print(f, p).

04:10 And as you can see, it does exactly what you might expect it to do. This is indeed an iterator that one can iterate through. And I’ll go through this unpacking paradigm a little bit more in the next lesson, but just to show you that it’s not a list but it is still an iterable thing, and it functions in exactly the same way as the zip() from Python 3.
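For comparison, here is a sketch of the same loop written for Python 3, where the built-in zip() plays the role that izip() plays in Python 2. The lines list is an illustrative addition so the results can be inspected after the loop.

```python
fruits = ["Apple", "Orange", "Banana"]
prices = (1.2, 1.4, 1.8)

lines = []
# Each tuple from zip() is unpacked into f (fruit) and p (price).
for f, p in zip(fruits, prices):
    print(f, p)
    lines.append("{} {}".format(f, p))
```

No list of tuples is ever built here; each pair is produced, unpacked, and discarded before the next one is created.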

04:31 So, something that I think is kind of fun about this difference—or, kind of a fun way to get around it—is a little trick that you can do to make sure that your usage of zip() is compatible in both Python 2 and Python 3.

04:46 If you’re not sure which Python version is going to run your code, you can actually do a trick like this, where you can say try: from itertools import izip as zip (so, give izip() the alias zip), and then you can say except Error: and then you just say pass.

05:46 And I should’ve said except ImportError because that’s the more clear thing to do, but in that case, it will just pass—it won’t do anything.

05:54 And zip() will indeed still have the Python 3 behavior because nothing happened, right? And so in this case, now I can say zip(fruits, prices) and I get an itertools.izip object, which behaves in almost exactly the same way as the zip object that zip() returns in Python 3.
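Written out, the compatibility trick might look like the sketch below, using except ImportError as recommended. The fruits and prices values and the names zipped, is_list, and pairs are only for illustration.

```python
try:
    # Python 2: swap the list-returning zip() for the lazy izip().
    from itertools import izip as zip
except ImportError:
    # Python 3: itertools has no izip, and the built-in zip() is already lazy.
    pass

fruits = ["Apple", "Orange", "Banana"]
prices = (1.2, 1.4, 1.8)

zipped = zip(fruits, prices)
is_list = isinstance(zipped, list)  # False on either version after the import
print(type(zipped), is_list)

pairs = list(zipped)
print(pairs)
```

Catching ImportError specifically, rather than a broader exception, means that any unrelated error raised during the import still surfaces instead of being silently swallowed.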

06:13 So, if I just return really quick to my normal Python REPL, which is ptpython, then I can say—let’s see. I’ll just reconstruct this. I’ll say fruits = ["Apple", "Orange", "Banana"],

06:29 and then prices = [1.2, 1.4, 1.8].

07:05 And then I can say except ImportError: and just pass.

07:11 And as you can see, that does, really, nothing here. And if I then say zip(fruits, prices), I still get the same kind of zip object that I did in the beginning.

07:22 So, this will still work in Python 3 and it will essentially guarantee that your zip() has the iterator returned rather than the list version.
