Archiving Files

00:00 In this lesson, I’m going to show you how to manipulate archive files like ZIP and TAR files. These files are convenient because they package multiple files into one, often compressed, version of that multiple file set.

00:16 They can be very helpful, especially when you need to save space, or send files over a network, or something like that. The rest of this lesson will kind of be organized into a two-part sequence.

00:27 First, I’ll show you the zipfile.ZipFile class, which lets you manipulate ZIP files at a fairly low level.

00:38 As in, you can work with the individual files involved from that ZIP, you can extract single files, and so on. It’s kind of a granular approach to working with ZIP files.

00:49 And that approach will also translate to the tarfile module, which I’m not going to really cover in this tutorial, but it has very similar syntax and usability to the zipfile module.

00:59 And then in the second section, I’ll show you the shutil.make_archive() and .unpack_archive() functions that

01:07 make all of this super easy, but remove a little bit of that granular control. So, as I said, zipfile.ZipFile makes working with ZIP files almost like working with any other file, and it has .extract() and .extractall() methods to get those files back out of the ZIP.

01:22 Then the shutil.make_archive() and unpack_archive() functions are pretty much just—give this a bunch of files and it will make them into a ZIP file, or a TAR file, or whatever you want, and then give this the archive and it will extract it all for you.

01:36 So kind of, like, the hard, but more granular approach versus the easy, but a little bit less flexible version. For the tutorial I’ll use a super-simple sample directory that I’ll then just recombine, and zip and tar in multiple ways, and just start making a whole bunch of crazy stuff happen there.

01:56 So let’s take a look at how this works over in the REPL.

02:00 All right, so I have all my imports together. The first thing I’ll do is just show you the contents of the directory so that I make sure not to zip anything weird together.

02:10 And then I’ll show you how the zipfile.ZipFile() constructor works. You give it the name that you want your ZIP file to have, and then you give it a file mode, just like you would in a with open() as statement.

02:22 You just want to make sure that you’re writing to it or reading from it. And then you can say something as simple as z.write('file1.py').
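As a minimal sketch of the step just described, the snippet below creates a ZIP file in write mode and adds one file to it. The temporary directory and the contents of file1.py are assumptions made so the example can run anywhere. The arcname argument is optional and only controls the name stored inside the archive.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so nothing real gets touched (an assumption
# for this sketch; the lesson works in the current directory instead).
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "file1.py")
with open(sample, "w") as f:
    f.write("print('hello')\n")

zip_path = os.path.join(workdir, "first.zip")

# "w" creates (or overwrites) the archive; the with block closes it for us.
with zipfile.ZipFile(zip_path, "w") as z:
    # arcname stores the file under a short name instead of its full path.
    z.write(sample, arcname="file1.py")

print(sorted(os.listdir(workdir)))
```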

02:32 And then if I use my listdir() here, I can see that first.zip has indeed been created. If I want to actually take a look at the things that are in that ZIP file, I can also use the ZipFile() constructor, I just have to now pass in the read mode parameter.

02:46 And so I can now say, let’s just say, print(z.filelist). Take a look at the different attributes of a ZipFile on your own, because it has a lot of useful attributes here.

02:57 So as you can see, it has file1.py and then it has its file mode, so the read and write permissions, it has a size, which is just 0 because I just made this as a test.
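To illustrate inspecting an archive in read mode, here is a small sketch that lists member names and sizes. It builds its own archive with an empty file1.py, mirroring the empty test file in the lesson. The temporary directory is an assumption for the sake of a self-contained example.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "first.zip")

# Build a tiny archive holding one empty member.
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("file1.py", "")

# Opening in "r" mode only reads the archive; it does not modify it.
with zipfile.ZipFile(zip_path, "r") as z:
    names = z.namelist()  # just the member names
    sizes = {info.filename: info.file_size for info in z.infolist()}

print(names)
print(sizes)
```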

03:08 It doesn’t have anything in it. So, that’s all pretty cool, pretty easy to add. And you can do this same thing with a loop if you want to add all of these items in here, and it works just fine with directories as well.

03:20 You just need to pass z.write() a file or directory name, and it will write it into the ZIP file. So, that’s all pretty convenient and I’ll just use os.listdir() one more time to show you that opening it in read mode doesn’t actually do anything to the ZIP, it just gives you access to the data that’s inside it.
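One caveat, which the comments below also raise: passing a directory name to z.write() stores only the directory entry, not the files inside it. A sketch of one way to add a directory's contents recursively, assuming a hypothetical sample_dir layout created just for this example, could look like this:

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample_dir")  # hypothetical directory to archive
os.makedirs(os.path.join(src, "sub_dir"))
for rel in ("file1.py", os.path.join("sub_dir", "file2.py")):
    with open(os.path.join(src, rel), "w") as f:
        f.write("# sample\n")

# Keep the archive outside src so it cannot end up inside itself.
zip_path = os.path.join(workdir, "recursive.zip")

with zipfile.ZipFile(zip_path, "w") as z:
    # Walk the tree and add every file under a path relative to src.
    for folder, _subdirs, files in os.walk(src):
        for name in files:
            full = os.path.join(folder, name)
            z.write(full, arcname=os.path.relpath(full, src))

with zipfile.ZipFile(zip_path) as z:
    members = sorted(z.namelist())

print(members)
```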

03:37 So, that’s all well and good, but how do you actually get the files back out? I’ve shown you how to print the things that are in there, but I haven’t shown you how to actually get them.

03:44 Well, that’s pretty simple, too. You just say z.extract() and then you can extract a single file, or a directory, or, as I’ll show you in a second, you can use .extractall() to just get everything.

03:57 So, let’s say .extract('file1.py') and then you can pass it a target of where to extract it to. So I’m going to create a new directory called extracted, and I’ll say 'file1.py' here.

04:10 And so zipfile will actually do all this work for me to create this extracted directory and then put the extracted file into it. So if I just listdir('extracted'), you can see that that’s in there, just as expected.
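A hedged sketch of single-file extraction follows. Note that the second argument to .extract() is treated as a directory to extract into, not as the new file's name, so this example passes just the directory. The temporary directory and file contents are assumptions.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "first.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("file1.py", "print('hello')\n")

target_dir = os.path.join(workdir, "extracted")

with zipfile.ZipFile(zip_path, "r") as z:
    # path is the destination directory; it is created if it is missing.
    # extract() returns the path of the file it wrote.
    written = z.extract("file1.py", path=target_dir)

print(os.listdir(target_dir))
```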

04:24 So now, let me quickly—I’m going to create a new ZIP file, which is going to be a ZIP file with all of the things in here in it. So it’ll show you that you can actually create nested ZIP files, you can have directories in your ZIP files.

04:39 All of it works just fine. So I’ll say for f in os.listdir():,

04:46 z.write(f), there we go. So now, if I take a look, I’ll do just what I did above, except that I’m going to change this call from .extract() to .extractall(), and I’m going to say 'extracted_all/', and so that will be the new directory, which will have all of this extracted content.

05:10 And .extractall() is great. It actually has some other parameters, too, like it has a password parameter, pwd, so if you’re extracting from a password-protected ZIP file, that’ll work just fine.

05:21 It also has a members parameter that lets you extract only some of the files. Now, if I listdir(), you can see I have this 'extracted_all' directory, and then I will list that and show you that it has everything that was in this listdir() above.
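The following sketch shows .extractall() unpacking every member into a new directory. The archive contents and the temporary directory are made up for the example.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "everything.zip")  # hypothetical archive name

# Build an archive with a top-level file and a file in a subdirectory.
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("file1.py", "a = 1\n")
    z.writestr("sub_dir/file2.py", "b = 2\n")

out_dir = os.path.join(workdir, "extracted_all")

with zipfile.ZipFile(zip_path, "r") as z:
    # Creates out_dir and any nested directories as needed.
    z.extractall(out_dir)

print(sorted(os.listdir(out_dir)))
```

Passing members=["file1.py"] to .extractall() would restrict extraction to that one file.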

05:36 So, this all works great. That’s how you can work with ZIP files, and it gives you a lot of control over how much you want to extract, how much you want to write, all of that.

05:45 But there’s actually an even easier way here, which is to use shutil. With shutil I can say make_archive(), and then I give it a base name, a format, and a root directory.

05:58 The base name is just the name of the new file. So I’ll say, 'made_with_shutil.zip', and then I will pass in the 'zip' format option.

06:11 I won’t give it a root_dir, because it’s just going to default to zipping everything in my current directory. So I can take that, and now I can say shutil... And, of course, what I forgot when I was making this name was that shutil is a little smarter than zipfile, so it actually appends the .zip on, without me even needing to.

06:29 So I now have a made_with_shutil.zip.zip, which is kind of silly but works just fine as a ZIP file, so it’s not a big deal. And then I can also say shutil, and I’ll show you when I unpack it that it has all the correct stuff in it.

06:44 So, I unpack the archive with the filename 'made_with_shutil.zip.zip', and then I’ll give it the extract_dir of "from_shutil_archive".
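Here is a minimal sketch of the shutil round trip. It differs from the lesson in two deliberate ways, both assumptions of this example: the base name is given without the .zip extension, and the archive is written outside the directory being archived so that it does not end up containing itself.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample_dir")  # hypothetical directory to archive
os.makedirs(src)
with open(os.path.join(src, "file1.py"), "w") as f:
    f.write("# sample\n")

# No extension here: make_archive appends ".zip" based on the format.
base_name = os.path.join(workdir, "made_with_shutil")
archive = shutil.make_archive(base_name, "zip", root_dir=src)

out_dir = os.path.join(workdir, "from_shutil_archive")
shutil.unpack_archive(archive, extract_dir=out_dir)

print(os.path.basename(archive))
print(os.listdir(out_dir))
```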

06:58 This is all starting to get a little bit hairy with all of these different ZIP files and everything. I hope that you can keep track of it just fine. And so now, I can say os.listdir() for, hopefully, the final time in this lesson.

07:10 And as you can see, there’s a 'from_shutil_archive'. And if I list that,

07:17 then it has all of these, kind of, crazy—now it’s starting to get really recursive, almost, because I have this 'from_shutil_archive', which has all of the contents of this except for itself.

07:28 But that, actually, is exactly the expected behavior. So you can see that unpack_archive() and make_archive() do the same things I was able to do above really quickly and easily.

07:37 You can also use shutil.make_archive() to make a TAR archive, which I haven’t talked much about, mostly because the zipfile module has a corresponding tarfile module that has a lot of the same syntax.

07:50 So I encourage you to check that out, because once you’ve seen this video and watched how zipfile works, tarfile will be no problem for you at all.

07:58 But you can use shutil.make_archive() to make a TAR file, which works just as well. Then you can also unpack with the same logic. So, this all works great, and shutil makes things super simple and easy, but again, you lose a little bit of the granularity.
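As a sketch of the TAR variant, the same two shutil calls work with a different format string. This example assumes the "gztar" format (a gzip-compressed TAR) and uses made-up file and directory names.

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample_dir")  # hypothetical directory to archive
os.makedirs(src)
with open(os.path.join(src, "file1.py"), "w") as f:
    f.write("# sample\n")

base_name = os.path.join(workdir, "made_with_shutil")

# "gztar" produces a .tar.gz file; "tar" would produce an uncompressed .tar.
archive = shutil.make_archive(base_name, "gztar", root_dir=src)

out_dir = os.path.join(workdir, "from_tar_archive")
# unpack_archive picks the right unpacker from the file extension.
shutil.unpack_archive(archive, extract_dir=out_dir)

print(os.path.basename(archive))
print(tarfile.is_tarfile(archive))
```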

08:13 Like, it’s hard to extract just one file from the archive or to make an archive from just a few different files without wrapping them in a directory first.

08:23 So, that’s something to keep in mind. In the next and last lesson, I’m going to discuss how you can use the fileinput module to put multiple files together into one input stream, which can make reading from a large number of related files quite easy.

tonypy on March 17, 2023

It’s worth noting that when the directories and files are on a defined path then the files added to the zipfile will contain the directory structure.

base_dir = Path(r"D:\Python\Real Python\sample_directories\Lesson 10")
with zipfile.ZipFile(base_dir / "first.zip", "w") as z: 
    z.write(base_dir / "file1.py")

This results in the content of ‘first.zip’ as ‘Python\Real Python\sample_directories\Lesson 10\file1.py’

[<ZipInfo filename='Python/Real Python/sample_directories/Lesson 10/file1.py' filemode='-rw-rw-rw-' file_size=24>]

To avoid this I used

z.write(base_dir / "file1.py", arcname = "file1.py")

The content is then

[<ZipInfo filename='file1.py' filemode='-rw-rw-rw-' file_size=24>]

See docs.python.org/3/library/zipfile.html#zipfile.ZipFile.write

tonypy on March 17, 2023

Can you review the provided approach for extract()? When I use

with zipfile.ZipFile(base_dir / "first.zip", "r") as z:
    z.extract("file1.py", base_dir / "extracted/file1.py")

I get ‘file1.py’ in a directory named ‘file1.py’ under the directory ‘extracted’. The only solution I have for extracting ‘file1.py’ directly into the directory ‘extracted’ is

    z.extract("file1.py", base_dir / "extracted")

Thoughts?

tonypy on March 17, 2023

I don’t understand the logic behind

with zipfile.ZipFile("first.zip", "w") as z:
    for f in os.listdir():
        z.write(f)

This is adding not just files and directories to “first.zip” but “first.zip” itself. Also, the directories that are added are empty; they don’t include any content. So when extracting to ‘extracted_all’, the directories ‘extracted’ and ‘sub_dir’ are empty and the extracted file ‘first.zip’ under ‘extracted_all’ is invalid. What am I missing?

tonypy on March 18, 2023

It seems counter intuitive to have a zip file of a directory within the zip file of that directory i.e. ‘made_with_shutil.zip.zip’ contains not just the contents of the directory ‘Lesson 10’ but also the file ‘made_with_shutil.zip.zip’ (which when implemented on Windows is an inaccessible file).

So apart from creating ‘made_with_shutil.zip.zip’ elsewhere in the directory tree but outside of the path which includes ‘Lesson 10’ (which means that ‘made_with_shutil.zip.zip’ does not contain a file ‘made_with_shutil.zip.zip’), is there a way to create ‘made_with_shutil.zip.zip’ within directory ‘Lesson 10’ but without including the file ‘made_with_shutil.zip.zip’? Or is that not possible with shutil.make_archive()?

Dick de Goede on July 12, 2023

You can put a slash at the end of the target to have the file being extracted into a directory:

with zipfile.ZipFile('first.zip', 'r') as z:
    z.extract('file1.py', 'extracted/')

When you use extracted/file1.py as target, you will have a directory extracted/file1.py with the file file1.py in it. So to extract in a directory just use <dirname>/
