Stricter Zipping
00:00
In the previous lesson, you learned about the three new functions in the statistics module. In this lesson, I’ll show you the zip() function and its new strict parameter.
00:10
The built-in zip() function is used to combine two or more sequences of data into a series of tuples. The easiest way to understand this is to use an example.
00:19
Consider this table that shows data for five different sets of Lego. Each set has a name, an ID number, and some number of pieces. I’ll use this data in the REPL to show you zip().
00:33
Let’s say you’ve got three lists, one for each column in the Lego table. What zip() lets you do is create a series of tuples, each tuple containing the name, number, and size of a Lego set.
00:46
You run zip() by passing in the sequences you want to operate on, and you get back an iterator. That’s great from a memory management perspective but not particularly instructive. So let me do that again, converting the iterator into a list.
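As a rough sketch of those two steps, the snippet below zips three lists and then converts the result into a list. The variable names names and num_pieces are assumptions (only set_numbers is named in the lesson), and the ID numbers and piece counts are made-up placeholders, not the values from the table in the video.

```python
# Placeholder data: the IDs and piece counts are invented for illustration.
names = ["Louvre", "Taj Mahal", "Statue of Liberty", "Trafalgar Square", "New York City"]
set_numbers = [1001, 1002, 1003, 1004, 1005]
num_pieces = [100, 200, 300, 400, 500]

# zip() returns a lazy zip object, so printing it shows no data.
zipped = zip(names, set_numbers, num_pieces)
print(zipped)

# Wrapping the call in list() consumes the iterator and shows the tuples.
lego_sets = list(zip(names, set_numbers, num_pieces))
for lego_set in lego_sets:
    print(lego_set)
```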
01:11
That’s better. The result of the zip is a list of tuples, each tuple describing a Lego set, from the famous French museum to the New York City skyline. Now, here’s the footgun.
01:24
zip() just assumes your data is all good, but what if it isn’t? Let me remove the last number from the list of set IDs.
01:38
Now, set_numbers has only four items in it, while the name and pieces lists still have five.
01:56
Running zip() on this corrupted data is problematic. It just runs without complaint, stopping as soon as the shortest input is used up. This can be a hard bug to find. You end up with a zipped sequence with just one fewer item in it. This is even worse if the missing piece of data is in the middle.
02:09
The wrong things will get associated with each other. Python 3.10 to the rescue.
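To make the two failure modes described above concrete, here is a small sketch. The data is again made-up placeholder values, and names, num_pieces, and set_numbers_gap are assumed variable names rather than ones taken from the video.

```python
# Placeholder data: the IDs and piece counts are invented for illustration.
names = ["Louvre", "Taj Mahal", "Statue of Liberty", "Trafalgar Square", "New York City"]
num_pieces = [100, 200, 300, 400, 500]

# Case 1: the last ID is missing, so zip() silently drops the last Lego set.
set_numbers = [1001, 1002, 1003, 1004]
truncated = list(zip(names, set_numbers, num_pieces))
print(len(truncated))  # 4 tuples instead of 5

# Case 2: an ID is missing from the middle (1003), so every later ID shifts
# up by one position and gets paired with the wrong name.
set_numbers_gap = [1001, 1002, 1004, 1005]
shifted = list(zip(names, set_numbers_gap, num_pieces))
print(shifted[2])  # ('Statue of Liberty', 1004, 300)
```

In both cases no error is raised, which is what makes the problem easy to miss.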
02:22
zip() now supports an optional argument called strict. When set to True, zip() will no longer silently truncate if there is a length mismatch. It raises a ValueError instead.
02:34
There is an alternative to this that’s been kicking around for a while in the itertools module. It’s called zip_longest(). It works like zip() but inserts None into the data when there’s a length mismatch. Let’s run it on the Lego data.
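Roughly, that run could look like the sketch below. As before, the data is made-up placeholder values, and names, num_pieces, and padded are assumed variable names.

```python
from itertools import zip_longest

# Placeholder data: the IDs and piece counts are invented for illustration.
names = ["Louvre", "Taj Mahal", "Statue of Liberty", "Trafalgar Square", "New York City"]
set_numbers = [1001, 1002, 1003, 1004]  # one ID short
num_pieces = [100, 200, 300, 400, 500]

# zip_longest() keeps going until the longest input is exhausted,
# padding the shorter ones with None.
padded = list(zip_longest(names, set_numbers, num_pieces))
print(padded[-1])  # ('New York City', None, 500)
```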
02:56
This time, New York City is included; it just ends up with None as its set ID. zip_longest() also takes an optional argument called fillvalue.
03:06
This allows you to fill in the missing data with a value that you specify instead of None.
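A short sketch of fillvalue, using the same made-up placeholder data and assumed variable names as the earlier examples:

```python
from itertools import zip_longest

# Placeholder data: the IDs and piece counts are invented for illustration.
names = ["Louvre", "Taj Mahal", "Statue of Liberty", "Trafalgar Square", "New York City"]
set_numbers = [1001, 1002, 1003, 1004]  # one ID short
num_pieces = [100, 200, 300, 400, 500]

# fillvalue replaces the default None padding with a value of your choice.
padded = list(zip_longest(names, set_numbers, num_pieces, fillvalue=0))
print(padded[-1])  # ('New York City', 0, 500)
```

A single fillvalue applies to whichever input runs short, so pick one that makes sense for every column or check lengths up front with strict instead.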
03:18
Here, I’ve set fillvalue to 0, and you can see the difference at the end of the sequence in the New York City tuple. Appropriately enough, the lesson on zip() was rather zippy. Next up, a miscellaneous, catchall, collage, grab bag, hodgepodge, potpourri smorgasbord of the smaller stuff that’s left in 3.10. Why, I do own a thesaurus! Why do you ask?