00:00 In this lesson, I’m going to show you how to manipulate archive files such as ZIP and TAR files. These files are convenient because they package multiple files into a single file, which is often compressed.
00:16 They can be very helpful, especially when you need to save space or send files over a network. The rest of this lesson is organized into two parts.
First, I’ll show you the zipfile.ZipFile class, which lets you manipulate ZIP files at a fairly low level.
00:38 That is, you can work with the individual files inside the ZIP, extract single files, and so on. It’s a granular approach to working with ZIP files.
That approach also translates to the tarfile module, which I won’t cover in detail in this tutorial but which has syntax and usage very similar to the zipfile module.
In the second part, I’ll show you the shutil.make_archive() and shutil.unpack_archive() functions, which make all of this much easier but give up some of that granular control. So, as I said, zipfile.ZipFile makes working with a ZIP file almost like working with any other file, and its .extract() and .extractall() methods get files back out of the archive.
shutil.make_archive() and shutil.unpack_archive(), by contrast, each do the whole job in one call: give make_archive() a directory and it packs that directory’s contents into a ZIP file, a TAR file, or another supported format; give unpack_archive() an archive and it extracts everything for you.
01:36 So it’s the harder but more granular approach versus the easier but less flexible one. For the tutorial, I’ll use a very simple sample directory, which I’ll zip and tar in several different ways.
01:56 So let’s take a look at how this works over in the REPL.
02:00 All right, I have all my imports together. First, I’ll show you the contents of the directory so that I don’t zip anything unexpected.
Then I’ll show you how the zipfile.ZipFile() constructor works. You give it the name you want your ZIP file to have, and then a file mode, just as you would with open() in a with statement: 'w' to write to it or 'r' to read from it. Then, to add a file, you can say something as simple as z.write('file1.py').
And then if I call os.listdir() here, I can see that first.zip has indeed been created. To look at what’s inside that ZIP file, I use the ZipFile() constructor again, this time passing the read mode, 'r'.
Now I can say print(z.filelist). Take a look at the other attributes of a ZipFile object on your own, because it has a lot of useful ones.
So as you can see, it lists file1.py along with its file mode (the read and write permissions) and its size, which is 0 because I made this file as an empty test.
03:08 It doesn’t have anything in it. So adding files is pretty easy, and you can do the same thing in a loop if you want to add several items.
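The write-then-read round trip from the REPL can be sketched as a standalone script. This is a minimal sketch rather than the exact code from the video: it works in a throwaway directory created with tempfile.mkdtemp(), and it passes arcname="file1.py" so the archive stores only the bare filename instead of the full temporary path.

```python
import tempfile
import zipfile
from pathlib import Path

# Throwaway working directory so the example doesn't touch your own files.
workdir = Path(tempfile.mkdtemp())
(workdir / "file1.py").touch()  # an empty file, like the test file in the video

archive = workdir / "first.zip"

# Mode "w" creates (or overwrites) the archive, much like open(..., "w").
with zipfile.ZipFile(archive, "w") as z:
    # arcname controls the name stored inside the archive.
    z.write(workdir / "file1.py", arcname="file1.py")

# Mode "r" opens it read-only; the archive on disk is not modified.
with zipfile.ZipFile(archive, "r") as z:
    names = z.namelist()
    sizes = [info.file_size for info in z.filelist]  # filelist holds ZipInfo objects
    for info in z.filelist:
        print(info.filename, info.file_size)  # prints: file1.py 0
```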
You just pass z.write() a file or directory name, and it writes that entry into the ZIP file. Note, though, that writing a directory adds only an entry for the directory itself, not the files inside it, so to include those you need to add each file explicitly. I’ll also use os.listdir() one more time to show you that opening the archive in read mode doesn’t change the ZIP; it just gives you access to the data inside it.
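Here’s a small sketch of that directory caveat, assuming a hypothetical sub_dir that contains one file, inner.txt. Writing the directory alone yields only a 'sub_dir/' entry; walking it with Path.rglob() and writing each path it yields adds the contents too.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
sub_dir = workdir / "sub_dir"
sub_dir.mkdir()
(sub_dir / "inner.txt").write_text("hello")  # hypothetical file inside the directory

# 1) Writing only the directory stores a single "sub_dir/" entry.
with zipfile.ZipFile(workdir / "dir_only.zip", "w") as z:
    z.write(sub_dir, arcname="sub_dir")
with zipfile.ZipFile(workdir / "dir_only.zip") as z:
    shallow = z.namelist()

# 2) Writing each path under the directory includes its contents.
with zipfile.ZipFile(workdir / "dir_with_contents.zip", "w") as z:
    for path in sorted(sub_dir.rglob("*")):
        # Store paths relative to workdir, e.g. "sub_dir/inner.txt".
        z.write(path, arcname=str(path.relative_to(workdir)))
with zipfile.ZipFile(workdir / "dir_with_contents.zip") as z:
    deep = z.namelist()

print(shallow)  # ['sub_dir/']
print(deep)     # ['sub_dir/inner.txt']
```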
03:37 That’s all well and good, but how do you actually get the files back out? I’ve shown you how to print what’s in the archive, but not how to extract it.
That’s pretty simple, too. You call z.extract() to extract a single member, such as one file, or, as I’ll show you in a second, you can call .extractall() to get everything.
So let’s say .extract('file1.py'). The second argument is the directory to extract into, not the path of the output file. I’m going to target a new directory called extracted, which doesn’t exist yet; zipfile creates the extracted directory for me and then puts the extracted file into it. So if I call os.listdir('extracted'), you can see that file1.py is in there, as expected.
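As a sketch of the single-file case, assuming the same first.zip layout in a temporary directory: the second argument to extract() is a destination directory, not a destination filename, and extract() returns the path of the file it wrote. Passing 'extracted/file1.py' instead would produce extracted/file1.py/file1.py, which matches what one commenter below reports.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
(workdir / "file1.py").write_text("print('hi')\n")
with zipfile.ZipFile(workdir / "first.zip", "w") as z:
    z.write(workdir / "file1.py", arcname="file1.py")

with zipfile.ZipFile(workdir / "first.zip") as z:
    # The second argument is the destination DIRECTORY; it is created if missing.
    # extract() returns the normalized path of the file it wrote.
    out = z.extract("file1.py", workdir / "extracted")

print(Path(out).relative_to(workdir))  # extracted/file1.py (on POSIX)
```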
04:24 Now I’m going to create a new ZIP file containing everything in this directory. This shows that a ZIP file can hold directories, and even other ZIP files.
So I’ll write for f in os.listdir(): z.write(f), and there we go. Keep in mind that this loop also picks up the new ZIP file itself, because it already exists in the directory while the loop runs, and that subdirectories are added as empty entries. Now I’ll do what I did above, except that I’m changing the call from .extract() to .extractall() and passing 'extracted_all/', which will be the new directory holding all of the extracted content.
.extractall() also accepts two optional parameters besides the target path: members, to extract only some of the entries, and pwd, the password (as bytes) for extracting from a password-protected ZIP file. Note that zipfile can read encrypted members but cannot create encrypted archives.
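The members parameter can be sketched like this, using a hypothetical three-file archive built from strings with writestr() so that no source files are needed. No encrypted archive is involved here, so pwd appears only in a comment.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive = workdir / "bundle.zip"

# Build a small archive from strings with writestr(), so no source files are needed.
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("file1.py", "x = 1\n")
    z.writestr("file2.py", "y = 2\n")
    z.writestr("notes/readme.txt", "notes\n")

with zipfile.ZipFile(archive) as z:
    # Extract everything; the target directory is created if needed.
    z.extractall(workdir / "extracted_all")
    # Extract only the .py members by passing members=.
    py_members = [name for name in z.namelist() if name.endswith(".py")]
    z.extractall(workdir / "only_py", members=py_members)
    # For an encrypted archive you would also pass pwd=b"..." (bytes, not str).

all_files = sorted(
    p.relative_to(workdir / "extracted_all").as_posix()
    for p in (workdir / "extracted_all").rglob("*")
    if p.is_file()
)
only_py = sorted(p.name for p in (workdir / "only_py").iterdir())
print(all_files)  # ['file1.py', 'file2.py', 'notes/readme.txt']
print(only_py)    # ['file1.py', 'file2.py']
```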
Now, if I call os.listdir(), you can see the 'extracted_all' directory, and listing it shows that it has everything that was in the original directory.
05:36 So this all works well. That’s how you can work with ZIP files using zipfile, and it gives you a lot of control over exactly what you extract and what you write.
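When you zip a whole directory with zipfile, two details from this lesson matter: subdirectories need their files added explicitly, and the archive shouldn’t be added to itself. Here’s a minimal sketch of a hypothetical helper, zip_tree(), that uses os.walk() to handle both and stores paths relative to the root. This version records files only, so empty directories are not included.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def zip_tree(root: Path, archive: Path) -> list[str]:
    """Hypothetical helper: add every file under root to archive, skipping the archive itself."""
    archive = archive.resolve()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath, name)
                if path.resolve() == archive:
                    continue  # the archive already exists on disk; don't add it to itself
                # Store paths relative to root instead of the full filesystem path.
                z.write(path, arcname=str(path.relative_to(root)))
        return sorted(z.namelist())


workdir = Path(tempfile.mkdtemp())
(workdir / "file1.py").touch()
(workdir / "sub_dir").mkdir()
(workdir / "sub_dir" / "file2.py").touch()

names = zip_tree(workdir, workdir / "everything.zip")
print(names)  # ['file1.py', 'sub_dir/file2.py']
```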
But there’s an even easier way, which is to use shutil. With shutil, I can call make_archive() and give it a base name, a format, and a root directory.
The base name is the name of the new file without its extension. So I’ll say 'made_with_shutil.zip', and then I’ll pass in the 'zip' format option.
I won’t give it a root_dir, because it defaults to the current directory, so everything in it gets archived. Now I can run shutil.make_archive() with those arguments. Of course, what I forgot when choosing this name is that make_archive() appends the format’s extension, .zip in this case, to the base name for me, whereas zipfile uses the filename exactly as you give it.
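A minimal sketch of the naming rule, assuming a hypothetical sample_directory inside a temporary folder: pass the base name without an extension and let make_archive() add it. The function returns the path of the archive it created.

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
src = workdir / "sample_directory"  # hypothetical directory to archive
src.mkdir()
(src / "file1.py").touch()

# base_name has no extension: make_archive() appends ".zip" for the "zip" format.
result = shutil.make_archive(str(workdir / "made_with_shutil"), "zip", root_dir=src)
print(Path(result).name)  # made_with_shutil.zip
```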
So I now have made_with_shutil.zip.zip, which looks a bit silly but works fine as a ZIP file, so it’s not a big deal. Then I can call shutil.unpack_archive(), and I’ll show you when I unpack it that it has all the right contents.
So I unpack the archive with the filename 'made_with_shutil.zip.zip', and then I give it the extraction directory, 'from_shutil_archive'.
This is all getting a little hairy with so many different ZIP files, so I hope you’re keeping track. Now I can call os.listdir() for, hopefully, the final time in this lesson.
As you can see, there’s a 'from_shutil_archive' directory. If I list it, things start to look almost recursive, because 'from_shutil_archive' contains all of the contents of the current directory except itself.
That is exactly the expected behavior, because 'from_shutil_archive' didn’t exist yet when the archive was created. So you can see that make_archive() and unpack_archive() let me do the same things I did above, quickly and easily.
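Here is a sketch of the full shutil round trip, using a hypothetical lesson directory. Building the archive next to the directory being archived, and only then moving it in, is one way to keep the archive from containing a copy of itself; that addresses the question raised in the comments below.

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
src = workdir / "lesson"  # hypothetical directory to archive
src.mkdir()
(src / "file1.py").touch()
(src / "sub_dir").mkdir()
(src / "sub_dir" / "file2.py").touch()

# Build the archive OUTSIDE src, so make_archive() never sees it while walking root_dir...
built = shutil.make_archive(str(workdir / "made_with_shutil"), "zip", root_dir=src)
# ...then move it into src if that's where you want it to live.
archive = shutil.move(built, str(src / "made_with_shutil.zip"))

# unpack_archive() infers the format from the extension; extract_dir is created if needed.
shutil.unpack_archive(archive, str(workdir / "from_shutil_archive"))

unpacked = sorted(
    p.relative_to(workdir / "from_shutil_archive").as_posix()
    for p in (workdir / "from_shutil_archive").rglob("*")
)
print(unpacked)  # ['file1.py', 'sub_dir', 'sub_dir/file2.py']
```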
You can also use shutil.make_archive() to make a TAR archive. I haven’t talked much about TAR files, mostly because the zipfile module has a corresponding tarfile module that shares much of its syntax.
So I encourage you to check it out, because once you’ve watched how zipfile works in this video, tarfile will be no problem for you at all.
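For comparison, here is a rough tarfile counterpart of the ZipFile steps, assuming a gzip-compressed archive and a temporary working directory. tarfile.open() takes a mode such as "w:gz" or "r:gz", add() plays the role of write() (and, unlike ZipFile.write(), recurses into directories by default), and getnames() corresponds to namelist(). On Python 3.12 and later, extract() also accepts a filter argument, and filter="data" is the safer choice for archives you don’t trust.

```python
import tarfile
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
(workdir / "file1.py").write_text("x = 1\n")

archive = workdir / "first.tar.gz"

# "w:gz" writes a gzip-compressed tar; plain "w" writes an uncompressed one.
with tarfile.open(archive, "w:gz") as t:
    t.add(workdir / "file1.py", arcname="file1.py")  # like ZipFile.write()

with tarfile.open(archive, "r:gz") as t:
    names = t.getnames()                                # like ZipFile.namelist()
    t.extract("file1.py", path=workdir / "extracted")  # like ZipFile.extract()

print(names)  # ['file1.py']
```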
But you can use shutil.make_archive() with the 'tar' format, or a compressed variant such as 'gztar', to make a TAR file, which works just as well, and you can unpack it with the same unpack_archive() call. So this all works great, and shutil makes things simple, but again, you lose some of the granularity.
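The TAR version of the shutil workflow differs only in the format string. This sketch assumes the 'gztar' format and a hypothetical sample_directory; shutil.get_archive_formats() lists the formats available on your system.

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
src = workdir / "sample_directory"  # hypothetical directory to archive
src.mkdir()
(src / "file1.py").touch()

# "gztar" produces a gzip-compressed tarball; "tar" would be uncompressed.
tarball = shutil.make_archive(str(workdir / "made_with_shutil"), "gztar", root_dir=src)
print(Path(tarball).name)  # made_with_shutil.tar.gz

# Same unpacking call as for ZIP; the format is inferred from ".tar.gz".
shutil.unpack_archive(tarball, str(workdir / "from_tarball"))
```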
08:13 For example, with shutil it’s hard to extract just one file from an archive, or to make an archive from just a few files without first putting them in a directory.
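You don’t have to pick one approach for the whole workflow, though. As a sketch, assuming a hypothetical sample_directory: build the archive with shutil, then open the result with zipfile when you need just one member, since make_archive() produces an ordinary ZIP file.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
src = workdir / "sample_directory"  # hypothetical directory to archive
(src / "sub_dir").mkdir(parents=True)
(src / "file1.py").write_text("x = 1\n")
(src / "sub_dir" / "file2.py").write_text("y = 2\n")

archive = shutil.make_archive(str(workdir / "bundle"), "zip", root_dir=src)

# shutil has no single-member extraction, but zipfile can open the same archive.
with zipfile.ZipFile(archive) as z:
    data = z.read("sub_dir/file2.py")            # member contents as bytes, nothing written
    z.extract("file1.py", workdir / "just_one")  # writes only this one member

print(data)  # b'y = 2\n'
```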
So that’s something to keep in mind. In the next and last lesson, I’ll discuss how you can use the fileinput module to combine multiple files into one input stream, which makes reading from a large number of related files easy.
Can you review the provided approach for extract()? When I use
    with zipfile.ZipFile(base_dir / "first.zip", "r") as z:
        z.extract("file1.py", base_dir / "extracted/file1.py")
I get ‘file1.py’ inside a directory named ‘file1.py’ under the directory ‘extracted’. The only way I have found to extract ‘file1.py’ directly into the directory ‘extracted’ is
z.extract("file1.py", base_dir / "extracted")
I don’t understand the logic behind
    with zipfile.ZipFile("first.zip", "w") as z:
        for f in os.listdir():
            z.write(f)
This is adding not just files and directories to “first.zip” but “first.zip” itself. Also, the directories that are added are empty, they don’t include any content. So when extracting to ‘extract_all’, the directories ‘extracted’ and ‘sub_dir’ are empty and the extracted file ‘first.zip’ under ‘extracted_all’ is invalid. What am I missing?
It seems counterintuitive for the ZIP file of a directory to contain itself, i.e., ‘made_with_shutil.zip.zip’ contains not just the contents of the directory ‘Lesson 10’ but also the file ‘made_with_shutil.zip.zip’ (which, on Windows, is an inaccessible file).
So apart from creating ‘made_with_shutil.zip.zip’ elsewhere in the directory tree but outside of the path which includes ‘Lesson 10’ (which means that ‘made_with_shutil.zip.zip’ does not contain a file ‘made_with_shutil.zip.zip’), is there a way to create ‘made_with_shutil.zip.zip’ within directory ‘Lesson 10’ but without including the file ‘made_with_shutil.zip.zip’? Or is that not possible with shutil.make_archive()?
tonypy on March 17, 2023
It’s worth noting that when the directories and files are on a defined path then the files added to the zipfile will contain the directory structure.
This results in the content of ‘first.zip’ as ‘Python\Real Python\sample_directories\Lesson 10\file1.py’
To avoid this I used
The content is then