Python's zipfile: Manipulate Your ZIP Files Efficiently

Python's zipfile: Manipulate Your ZIP Files Efficiently

by Leodanis Pozo Ramos Feb 14, 2022 intermediate python

Python’s zipfile is a standard library module intended to manipulate ZIP files. This file format is a widely adopted industry standard when it comes to archiving and compressing digital data. You can use it to package together several related files. It also allows you to reduce the size of your files and save disk space. Most importantly, it facilitates data exchange over computer networks.

Knowing how to create, read, write, populate, extract, and list ZIP files using the zipfile module is a useful skill to have as a Python developer or a DevOps engineer.

In this tutorial, you’ll learn how to:

  • Read, write, and extract files from ZIP files with Python’s zipfile
  • Read metadata about the content of ZIP files using zipfile
  • Use zipfile to manipulate member files in existing ZIP files
  • Create new ZIP files to archive and compress files

If you commonly deal with ZIP files, then this knowledge can help to streamline your workflow to process your files confidently.

To get the most out of this tutorial, you should know the basics of working with files, using the with statement, handling file system paths with pathlib, and working with classes and object-oriented programming.

To get the files and archives that you’ll use to code the examples in this tutorial, click the link below:

Getting Started With ZIP Files

ZIP files are a well-known and popular tool in today’s digital world. These files are fairly popular and widely used for cross-platform data exchange over computer networks, notably the Internet.

You can use ZIP files for bundling regular files together into a single archive, compressing your data to save some disk space, distributing your digital products, and more. In this tutorial, you’ll learn how to manipulate ZIP files using Python’s zipfile module.

Because the terminology around ZIP files can be confusing at times, this tutorial will stick to the following conventions regarding terminology:

Term Meaning
ZIP file, ZIP archive, or archive A physical file that uses the ZIP file format
File A regular computer file
Member file A file that is part of an existing ZIP file

Having these terms clear in your mind will help you avoid confusion while you read through the upcoming sections. Now you’re ready to continue learning how to manipulate ZIP files efficiently in your Python code!

What Is a ZIP File?

You’ve probably already encountered and worked with ZIP files. Yes, those with the .zip file extension are everywhere! ZIP files, also known as ZIP archives, are files that use the ZIP file format.

PKWARE is the company that created and first implemented this file format. The company put together and maintains the current format specification, which is publicly available and allows the creation of products, programs, and processes that read and write files using the ZIP file format.

The ZIP file format is a cross-platform, interoperable file storage and transfer format. It combines lossless data compression, file management, and data encryption.

Data compression isn’t a requirement for an archive to be considered a ZIP file. So you can have compressed or uncompressed member files in your ZIP archives. The ZIP file format supports several compression algorithms, though Deflate is the most common. The format also supports information integrity checks with CRC32.

Even though there are other similar archiving formats, such as RAR and TAR files, the ZIP file format has quickly become a common standard for efficient data storage and for data exchange over computer networks.

ZIP files are everywhere. For example, office suites such as Microsoft Office and Libre Office rely on the ZIP file format as their document container file. This means that .docx, .xlsx, .pptx, .odt, .ods, .odp files are actually ZIP archives containing several files and folders that make up each document. Other common files that use the ZIP format include .jar, .war, and .epub files.

You may be familiar with GitHub, which provides web hosting for software development and version control using Git. GitHub uses ZIP files to package software projects when you download them to your local computer. For example, you can download the exercise solutions for Python Basics: A Practical Introduction to Python 3 book in a ZIP file, or you can download any other project of your choice.

ZIP files allow you to aggregate, compress, and encrypt files into a single interoperable and portable container. You can stream ZIP files, split them into segments, make them self-extracting, and more.

Why Use ZIP Files?

Knowing how to create, read, write, and extract ZIP files can be a useful skill for developers and professionals who work with computers and digital information. Among other benefits, ZIP files allow you to:

  • Reduce the size of files and their storage requirements without losing information
  • Improve transfer speed over the network due to reduced size and single-file transfer
  • Pack several related files together into a single archive for efficient management
  • Bundle your code into a single archive for distribution purposes
  • Secure your data by using encryption, which is a common requirement nowadays
  • Guarantee the integrity of your information to avoid accidental and malicious changes to your data

These features make ZIP files a useful addition to your Python toolbox if you’re looking for a flexible, portable, and reliable way to archive your digital files.

Can Python Manipulate ZIP Files?

Yes! Python has several tools that allow you to manipulate ZIP files. Some of these tools are available in the Python standard library. They include low-level libraries for compressing and decompressing data using specific compression algorithms, such as zlib, bz2, lzma, and others.

Python also provides a high-level module called zipfile specifically designed to create, read, write, extract, and list the content of ZIP files. In this tutorial, you’ll learn about Python’s zipfile and how to use it effectively.

Manipulating Existing ZIP Files With Python’s zipfile

Python’s zipfile provides convenient classes and functions that allow you to create, read, write, extract, and list the content of your ZIP files. Here are some additional features that zipfile supports:

  • ZIP files greater than 4 GiB (ZIP64 files)
  • Data decryption
  • Several compression algorithms, such as Deflate, Bzip2, and LZMA
  • Information integrity checks with CRC32

Be aware that zipfile does have a few limitations. For example, the current data decryption feature can be pretty slow because it uses pure Python code. The module can’t handle the creation of encrypted ZIP files. Finally, the use of multi-disk ZIP files isn’t supported either. Despite these limitations, zipfile is still a great and useful tool. Keep reading to explore its capabilities.

Opening ZIP Files for Reading and Writing

In the zipfile module, you’ll find the ZipFile class. This class works pretty much like Python’s built-in open() function, allowing you to open your ZIP files using different modes. The read mode ("r") is the default. You can also use the write ("w"), append ("a"), and exclusive ("x") modes. You’ll learn more about each of these in a moment.

ZipFile implements the context manager protocol so that you can use the class in a with statement. This feature allows you to quickly open and work with a ZIP file without worrying about closing the file after you finish your work.

Before writing any code, make sure you have a copy of the files and archives that you’ll be using:

To get your working environment ready, place the downloaded resources into a directory called python-zipfile/ in your home folder. Once you have the files in the right place, move to the newly created directory and fire up a Python interactive session there.

To warm up, you’ll start by reading the ZIP file called sample.zip. To do that, you can use ZipFile in reading mode:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

The first argument to the initializer of ZipFile can be a string representing the path to the ZIP file that you need to open. This argument can accept file-like and path-like objects too. In this example, you use a string-based path.

The second argument to ZipFile is a single-letter string representing the mode that you’ll use to open the file. As you learned at the beginning of this section, ZipFile can accept four possible modes, depending on your needs. The mode positional argument defaults to "r", so you can get rid of it if you want to open the archive for reading only.

Inside the with statement, you call .printdir() on archive. The archive variable now holds the instance of ZipFile itself. This function provides a quick way to display the content of the underlying ZIP file on your screen. The function’s output has a user-friendly tabular format with three informative columns:

  • File Name
  • Modified
  • Size

If you want to make sure that you’re targeting a valid ZIP file before you try to open it, then you can wrap ZipFile in a tryexcept statement and catch any BadZipFile exception:

>>>
>>> import zipfile

>>> try:
...     with zipfile.ZipFile("sample.zip") as archive:
...         archive.printdir()
... except zipfile.BadZipFile as error:
...     print(error)
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

>>> try:
...     with zipfile.ZipFile("bad_sample.zip") as archive:
...         archive.printdir()
... except zipfile.BadZipFile as error:
...     print(error)
...
File is not a zip file

The first example successfully opens sample.zip without raising a BadZipFile exception. That’s because sample.zip has a valid ZIP format. On the other hand, the second example doesn’t succeed in opening bad_sample.zip, because the file is not a valid ZIP file.

To check for a valid ZIP file, you can also use the is_zipfile() function:

>>>
>>> import zipfile

>>> if zipfile.is_zipfile("sample.zip"):
...     with zipfile.ZipFile("sample.zip", "r") as archive:
...         archive.printdir()
... else:
...     print("File is not a zip file")
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

>>> if zipfile.is_zipfile("bad_sample.zip"):
...     with zipfile.ZipFile("bad_sample.zip", "r") as archive:
...         archive.printdir()
... else:
...     print("File is not a zip file")
...
File is not a zip file

In these examples, you use a conditional statement with is_zipfile() as a condition. This function takes a filename argument that holds the path to a ZIP file in your file system. This argument can accept string, file-like, or path-like objects. The function returns True if filename is a valid ZIP file. Otherwise, it returns False.

Now say you want to add hello.txt to a hello.zip archive using ZipFile. To do that, you can use the write mode ("w"). This mode opens a ZIP file for writing. If the target ZIP file exists, then the "w" mode truncates it and writes any new content you pass in.

If the target ZIP file doesn’t exist, then ZipFile creates it for you when you close the archive:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("hello.zip", mode="w") as archive:
...     archive.write("hello.txt")
...

After running this code, you’ll have a hello.zip file in your python-zipfile/ directory. If you list the file content using .printdir(), then you’ll notice that hello.txt will be there. In this example, you call .write() on the ZipFile object. This method allows you to write member files into your ZIP archives. Note that the argument to .write() should be an existing file.

The append mode ("a") allows you to append new member files to an existing ZIP file. This mode doesn’t truncate the archive, so its original content is safe. If the target ZIP file doesn’t exist, then the "a" mode creates a new one for you and then appends any input files that you pass as an argument to .write().

To try out the "a" mode, go ahead and add the new_hello.txt file to your newly created hello.zip archive:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("hello.zip", mode="a") as archive:
...     archive.write("new_hello.txt")
...

>>> with zipfile.ZipFile("hello.zip") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
new_hello.txt                             2021-08-31 17:13:44           13

Here, you use the append mode to add new_hello.txt to the hello.zip file. Then you run .printdir() to confirm that the new file is present in the ZIP file.

ZipFile also supports an exclusive mode ("x"). This mode allows you to exclusively create new ZIP files and write new member files into them. You’ll use the exclusive mode when you want to make a new ZIP file without overwriting an existing one. If the target file already exists, then you get FileExistsError.

Finally, if you create a ZIP file using the "w", "a", or "x" mode and then close the archive without adding any member files, then ZipFile creates an empty archive with the appropriate ZIP format.

Reading Metadata From ZIP Files

You’ve already put .printdir() into action. It’s a useful method that you can use to list the content of your ZIP files quickly. Along with .printdir(), the ZipFile class provides several handy methods for extracting metadata from existing ZIP files.

Here’s a summary of those methods:

Method Description
.getinfo(filename) Returns a ZipInfo object with information about the member file provided by filename. Note that filename must hold the path to the target file inside the underlying ZIP file.
.infolist() Returns a list of ZipInfo objects, one per member file.
.namelist() Returns a list holding the names of all the member files in the underlying archive. The names in this list are valid arguments to .getinfo().

With these three tools, you can retrieve a lot of useful information about the content of your ZIP files. For example, take a look at the following example, which uses .getinfo():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     info = archive.getinfo("hello.txt")
...

>>> info.file_size
83

>>> info.compress_size
83

>>> info.filename
'hello.txt'

>>> info.date_time
(2021, 9, 7, 19, 50, 10)

As you learned in the table above, .getinfo() takes a member file as an argument and returns a ZipInfo object with information about it.

ZipInfo objects have several attributes that allow you to retrieve valuable information about the target member file. For example, .file_size and .compress_size hold the size, in bytes, of the original and compressed files, respectively. The class also has some other useful attributes, such as .filename and .date_time, which return the filename and the last modification date.

With .infolist(), you can extract information from all the files in a given archive. Here’s an example that uses this method to generate a minimal report with information about all the member files in your sample.zip archive:

>>>
>>> import datetime
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for info in archive.infolist():
...         print(f"Filename: {info.filename}")
...         print(f"Modified: {datetime.datetime(*info.date_time)}")
...         print(f"Normal size: {info.file_size} bytes")
...         print(f"Compressed size: {info.compress_size} bytes")
...         print("-" * 20)
...
Filename: hello.txt
Modified: 2021-09-07 19:50:10
Normal size: 83 bytes
Compressed size: 83 bytes
--------------------
Filename: lorem.md
Modified: 2021-09-07 19:50:10
Normal size: 2609 bytes
Compressed size: 2609 bytes
--------------------
Filename: realpython.md
Modified: 2021-09-07 19:50:10
Normal size: 428 bytes
Compressed size: 428 bytes
--------------------

The for loop iterates over the ZipInfo objects from .infolist(), retrieving the filename, the last modification date, the normal size, and the compressed size of each member file. In this example, you’ve used datetime to format the date in a human-readable way.

If you just need to perform a quick check on a ZIP file and list the names of its member files, then you can use .namelist():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for filename in archive.namelist():
...         print(filename)
...
hello.txt
lorem.md
realpython.md

Because the filenames in this output are valid arguments to .getinfo(), you can combine these two methods to retrieve information about selected member files only.

For example, you may have a ZIP file containing different types of member files (.docx, .xlsx, .txt, and so on). Instead of getting the complete information with .infolist(), you just need to get the information about the .docx files. Then you can filter the files by their extension and call .getinfo() on your .docx files only. Go ahead and give it a try!

Reading From and Writing to Member Files

Sometimes you have a ZIP file and need to read the content of a given member file without extracting it. To do that, you can use .read(). This method takes a member file’s name and returns that file’s content as bytes:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for line in archive.read("hello.txt").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

To use .read(), you need to open the ZIP file for reading or appending. Note that .read() returns the content of the target file as a stream of bytes. In this example, you use .split() to split the stream into lines, using the line feed character "\n" as a separator. Because .split() is operating on a byte object, you need to add a leading b to the string used as an argument.

ZipFile.read() also accepts a second positional argument called pwd. This argument allows you to provide a password for reading encrypted files. To try this feature, you can rely on the sample_pwd.zip file that you downloaded with the material for this tutorial:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt", pwd=b"secret").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt").split(b"\n"):
...         print(line)
...
Traceback (most recent call last):
    ...
RuntimeError: File 'hello.txt' is encrypted, password required for extraction

In the first example, you provide the password secret to read your encrypted file. The pwd argument accepts values of the bytes type. If you use .read() on an encrypted file without providing the required password, then you get a RuntimeError, as you can note in the second example.

For large encrypted ZIP files, keep in mind that the decryption operation can be extremely slow because it’s implemented in pure Python. In such cases, consider using a specialized program to handle your archives instead of using zipfile.

If you regularly work with encrypted files, then you may want to avoid providing the decryption password every time you call .read() or another method that accepts a pwd argument. If that’s the case, you can use ZipFile.setpassword() to set a global password:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     archive.setpassword(b"secret")
...     for file in archive.namelist():
...         print(file)
...         print("-" * 20)
...         for line in archive.read(file).split(b"\n"):
...             print(line)
...
hello.txt
--------------------
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''
lorem.md
--------------------
b'# Lorem Ipsum'
b''
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    ...

With .setpassword(), you just need to provide your password once. ZipFile uses that unique password for decrypting all the member files.

In contrast, if you have ZIP files with different passwords for individual member files, then you need to provide the specific password for each file using the pwd argument of .read():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_file_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt", pwd=b"secret1").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

>>> with zipfile.ZipFile("sample_file_pwd.zip", mode="r") as archive:
...     for line in archive.read("lorem.md", pwd=b"secret2").split(b"\n"):
...         print(line)
...
b'# Lorem Ipsum'
b''
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    ...

In this example, you use secret1 as a password to read hello.txt and secret2 to read lorem.md. A final detail to consider is that when you use the pwd argument, you’re overriding whatever archive-level password you may have set with .setpassword().

If you’re looking for a more flexible way to read from member files and create and add new member files to an archive, then ZipFile.open() is for you. Like the built-in open() function, this method implements the context manager protocol, and therefore it supports the with statement:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     with archive.open("hello.txt", mode="r") as hello:
...         for line in hello:
...             print(line)
...
b'Hello, Pythonista!\n'
b'\n'
b'Welcome to Real Python!\n'
b'\n'
b"Ready to try Python's zipfile module?\n"

In this example, you open hello.txt for reading. The first argument to .open() is name, indicating the member file that you want to open. The second argument is the mode, which defaults to "r" as usual. ZipFile.open() also accepts a pwd argument for opening encrypted files. This argument works the same as the equivalent pwd argument in .read().

You can also use .open() with the "w" mode. This mode allows you to create a new member file, write content to it, and finally append the file to the underlying archive, which you should open in append mode:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="a") as archive:
...     with archive.open("new_hello.txt", "w") as new_hello:
...         new_hello.write(b"Hello, World!")
...
13

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.printdir()
...     print("------")
...     archive.read("new_hello.txt")
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428
new_hello.txt                             1980-01-01 00:00:00           13
------
b'Hello, World!'

In the first code snippet, you open sample.zip in append mode ("a"). Then you create new_hello.txt by calling .open() with the "w" mode. This function returns a file-like object that supports .write(), which allows you to write bytes into the newly created file.

In this example, you write b'Hello, World!' into new_hello.txt. When the execution flow exits the inner with statement, Python writes the input bytes to the member file. When the outer with statement exits, Python writes new_hello.txt to the underlying ZIP file, sample.zip.

The second code snippet confirms that new_hello.txt is now a member file of sample.zip. A detail to notice in the output of this example is that .write() sets the Modified date of the newly added file to 1980-01-01 00:00:00, which is a weird behavior that you should keep in mind when using this method.

Reading the Content of Member Files as Text

As you learned in the above section, you can use the .read() and .write() methods to read from and write to member files without extracting them from the containing ZIP archive. Both of these methods work exclusively with bytes.

However, when you have a ZIP archive containing text files, you may want to read their content as text instead of as bytes. There are at least two way to do this. You can use:

  1. bytes.decode()
  2. io.TextIOWrapper

Because ZipFile.read() returns the content of the target member file as bytes, .decode() can operate on these bytes directly. The .decode() method decodes a bytes object into a string using a given character encoding format.

Here’s how you can use .decode() to read text from the hello.txt file in your sample.zip archive:

>>>
>>> import zipfile

>>>  with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     text = archive.read("hello.txt").decode(encoding="utf-8")
...

>>> print(text)
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

In this example, you read the content of hello.txt as bytes. Then you call .decode() to decode the bytes into a string using UTF-8 as encoding. To set the encoding argument, you use the "utf-8" string. However, you can use any other valid encoding, such as UTF-16 or cp1252, which can be represented as case-insensitive strings. Note that "utf-8" is the default value of the encoding argument to .decode().

It’s important to keep in mind that you need to know beforehand the character encoding format of any member file that you want to process using .decode(). If you use the wrong character encoding, then your code will fail to correctly decode the underlying bytes into text, and you can end up with a ton of indecipherable characters.

The second option for reading text out of a member file is to use an io.TextIOWrapper object, which provides a buffered text stream. This time you need to use .open() instead of .read(). Here’s an example of using io.TextIOWrapper to read the content of the hello.txt member file as a stream of text:

>>>
>>> import io
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     with archive.open("hello.txt", mode="r") as hello:
...         for line in io.TextIOWrapper(hello, encoding="utf-8"):
...             print(line.strip())
...
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

In the inner with statement in this example, you open the hello.txt member file from your sample.zip archive. Then you pass the resulting binary file-like object, hello, as an argument to io.TextIOWrapper. This creates a buffered text stream by decoding the content of hello using the UTF-8 character encoding format. As a result, you get a stream of text directly from your target member file.

Just like with .encode(), the io.TextIOWrapper class takes an encoding argument. You should always specify a value for this argument because the default text encoding depends on the system running the code and may not be the right value for the file that you’re trying to decode.

Extracting Member Files From Your ZIP Archives

Extracting the content of a given archive is one of the most common operations that you’ll do on ZIP files. Depending on your needs, you may want to extract a single file at a time or all the files in one go.

ZipFile.extract() allows you to accomplish the first task. This method takes the name of a member file and extracts it to a given directory signaled by path. The destination path defaults to the current directory:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extract("new_hello.txt", path="output_dir/")
...
'output_dir/new_hello.txt'

Now new_hello.txt will be in your output_dir/ directory. If the target filename already exists in the output directory, then .extract() overwrites it without asking for confirmation. If the output directory doesn’t exist, then .extract() creates it for you. Note that .extract() returns the path to the extracted file.

The name of the member file must be the file’s full name as returned by .namelist(). It can also be a ZipInfo object containing the file’s information.

You can also use .extract() with encrypted files. In that case, you need to provide the required pwd argument or set the archive-level password with .setpassword().

When it comes to extracting all the member files from an archive, you can use .extractall(). As its name implies, this method extracts all the member files to a destination path, which is the current directory by default:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extractall("output_dir/")
...

After running this code, all the current content of sample.zip will be in your output_dir/ directory. If you pass a non-existing directory to .extractall(), then this method automatically creates the directory. Finally, if any of the member files already exist in the destination directory, then .extractall() will overwrite them without asking for your confirmation, so be careful.

If you only need to extract some of the member files from a given archive, then you can use the members argument. This argument accepts a list of member files, which should be a subset of the whole list of files in the archive at hand. Finally, just like .extract(), the .extractall() method also accepts a pwd argument to extract encrypted files.

Closing ZIP Files After Use

Sometimes, it’s convenient for you to open a given ZIP file without using a with statement. In those cases, you need to manually close the archive after use to complete any writing operations and to free the acquired resources.

To do that, you can call .close() on your ZipFile object:

>>>
>>> import zipfile

>>> archive = zipfile.ZipFile("sample.zip", mode="r")

>>> # Use archive in different parts of your code
>>> archive.printdir()
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428
new_hello.txt                             1980-01-01 00:00:00           13

>>> # Close the archive when you're done
>>> archive.close()
>>> archive
<zipfile.ZipFile [closed]>

The call to .close() closes archive for you. You must call .close() before exiting your program. Otherwise, some writing operations might not be executed. For example, if you open a ZIP file for appending ("a") new member files, then you need to close the archive to write the files.

Creating, Populating, and Extracting Your Own ZIP Files

So far, you’ve learned how to work with existing ZIP files. You’ve learned to read, write, and append member files to them by using the different modes of ZipFile. You’ve also learned how to read relevant metadata and how to extract the content of a given ZIP file.

In this section, you’ll code a few practical examples that’ll help you learn how to create ZIP files from several input files and from an entire directory using zipfile and other Python tools. You’ll also learn how to use zipfile for file compression and more.

Creating a ZIP File From Multiple Regular Files

Sometimes you need to create a ZIP archive from several related files. This way, you can have all the files in a single container for distributing them over a computer network or sharing them with friends or colleagues. To this end, you can create a list of target files and write them into an archive using ZipFile and a loop:

>>>
>>> import zipfile

>>> filenames = ["hello.txt", "lorem.md", "realpython.md"]

>>> with zipfile.ZipFile("multiple_files.zip", mode="w") as archive:
...     for filename in filenames:
...         archive.write(filename)
...

Here, you create a ZipFile object with the desired archive name as its first argument. The "w" mode allows you to write member files into the final ZIP file.

The for loop iterates over your list of input files and writes them into the underlying ZIP file using .write(). Once the execution flow exits the with statement, ZipFile automatically closes the archive, saving the changes for you. Now you have a multiple_files.zip archive containing all the files from your original list of files.

Building a ZIP File From a Directory

Bundling the content of a directory into a single archive is another everyday use case for ZIP files. Python has several tools that you can use with zipfile to approach this task. For example, you can use pathlib to read the content of a given directory. With that information, you can create a container archive using ZipFile.

In the python-zipfile/ directory, you have a subdirectory called source_dir/, with the following content:

source_dir/
│
├── hello.txt
├── lorem.md
└── realpython.md

In source_dir/, you only have three regular files. Because the directory doesn’t contain subdirectories, you can use pathlib.Path.iterdir() to iterate over its content directly. With this idea in mind, here’s how you can build a ZIP file from the content of source_dir/:

>>>
>>> import pathlib
>>> import zipfile

>>> directory = pathlib.Path("source_dir/")

>>> with zipfile.ZipFile("directory.zip", mode="w") as archive:
...    for file_path in directory.iterdir():
...        archive.write(file_path, arcname=file_path.name)
...

>>> with zipfile.ZipFile("directory.zip", mode="r") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
realpython.md                             2021-09-07 19:50:10          428
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609

In this example, you create a pathlib.Path object from your source directory. The first with statement creates a ZipFile object ready for writing. Then the call to .iterdir() returns an iterator over the entries in the underlying directory.

Because you don’t have any subdirectories in source_dir/, the .iterdir() function yields only files. The for loop iterates over the files and writes them into the archive.

In this case, you pass file_path.name to the second argument of .write(). This argument is called arcname and holds the name of the member file inside the resulting archive. All the examples that you’ve seen so far rely on the default value of arcname, which is the same filename you pass as the first argument to .write().

If you don’t pass file_path.name to arcname, then your source directory will be at the root of your ZIP file, which can also be a valid result depending on your needs.

Now check out the root_dir/ folder in your working directory. In this case, you’ll find the following structure:

root_dir/
│
├── sub_dir/
│   └── new_hello.txt
│
├── hello.txt
├── lorem.md
└── realpython.md

You have the usual files and a subdirectory with a single file in it. If you want to create a ZIP file with this same internal structure, then you need a tool that recursively iterates through the directory tree under root_dir/.

Here’s how to zip a complete directory tree, like the one above, using zipfile along with Path.rglob() from the pathlib module:

>>>
>>> import pathlib
>>> import zipfile

>>> directory = pathlib.Path("root_dir/")

>>> with zipfile.ZipFile("directory_tree.zip", mode="w") as archive:
...     for file_path in directory.rglob("*"):
...         archive.write(
...             file_path,
...             arcname=file_path.relative_to(directory)
...         )
...

>>> with zipfile.ZipFile("directory_tree.zip", mode="r") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
sub_dir/                                  2021-09-09 20:52:14            0
realpython.md                             2021-09-07 19:50:10          428
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
sub_dir/new_hello.txt                     2021-08-31 17:13:44           13

In this example, you use Path.rglob() to recursively traverse the directory tree under root_dir/. Then you write every file and subdirectory to the target ZIP archive.

This time, you use Path.relative_to() to get the relative path to each file and then pass the result to the second argument of .write(). This way, the resulting ZIP file ends up with the same internal structure as your original source directory. Again, you can get rid of this argument if you want your source directory to be at the root of your ZIP file.

Compressing Files and Directories

If your files are taking up too much disk space, then you might consider compressing them. Python’s zipfile supports a few popular compression methods. However, the module doesn’t compress your files by default. If you want to make your files smaller, then you need to explicitly supply a compression method to ZipFile.

Typically, you’ll use the term stored to refer to member files written into a ZIP file without compression. That’s why the default compression method of ZipFile is called ZIP_STORED, which actually refers to uncompressed member files that are simply stored in the containing archive.

The compression method is the third argument to the initializer of ZipFile. If you want to compress your files while you write them into a ZIP archive, then you can set this argument to one of the following constants:

Constant Compression Method Required Module
zipfile.ZIP_DEFLATED Deflate zlib
zipfile.ZIP_BZIP2 Bzip2 bz2
zipfile.ZIP_LZMA LZMA lzma

These are the compression methods that you can currently use with ZipFile. A different method will raise a NotImplementedError. There are no additional compression methods available to zipfile as of Python 3.10.

As an additional requirement, if you choose one of these methods, then the compression module that supports it must be available in your Python installation. Otherwise, you’ll get a RuntimeError exception, and your code will break.

Another relevant argument of ZipFile when it comes to compressing your files is compresslevel. This argument controls which compression level you use.

With the Deflate method, compresslevel can take integer numbers from 0 through 9. With the Bzip2 method, you can pass integers from 1 through 9. In both cases, when the compression level increases, you get higher compression and lower compression speed.

Now say you want to archive and compress the content of a given directory using the Deflate method, which is the most commonly used method in ZIP files. To do that, you can run the following code:

>>>
>>> import pathlib
>>> from zipfile import ZipFile, ZIP_DEFLATED

>>> directory = pathlib.Path("source_dir/")

>>> with ZipFile("comp_dir.zip", "w", ZIP_DEFLATED, compresslevel=9) as archive:
...     for file_path in directory.rglob("*"):
...         archive.write(file_path, arcname=file_path.relative_to(directory))
...

In this example, you pass 9 to compresslevel to get maximum compression. To provide this argument, you use a keyword argument. You need to do this because compresslevel isn’t the fourth positional argument to the ZipFile initializer.

After running this code, you’ll have a comp_dir.zip file in your current directory. If you compare the size of that file with the size of your original sample.zip file, then you’ll note a significant size reduction.

Creating ZIP Files Sequentially

Creating ZIP files sequentially can be another common requirement in your day-to-day programming. For example, you may need to create an initial ZIP file with or without content and then append new member files as soon as they become available. In this situation, you need to open and close the target ZIP file multiple times.

To solve this problem, you can use ZipFile in append mode ("a"), as you have already done. This mode allows you to safely append new member files to a ZIP archive without truncating its current content:

>>>
>>> import zipfile

>>> def append_member(zip_file, member):
...     with zipfile.ZipFile(zip_file, mode="a") as archive:
...         archive.write(member)
...

>>> def get_file_from_stream():
...     """Simulate a stream of files."""
...     for file in ["hello.txt", "lorem.md", "realpython.md"]:
...         yield file
...

>>> for filename in get_file_from_stream():
...     append_member("incremental.zip", filename)
...

>>> with zipfile.ZipFile("incremental.zip", mode="r") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

In this example, append_member() is a function that appends a file (member) to the input ZIP archive (zip_file). To perform this action, the function opens and closes the target archive every time you call it. Using a function to perform this task allows you to reuse the code as many times as you need.

The get_file_from_stream() function is a generator function simulating a stream of files to process. Meanwhile, the for loop sequentially adds member files to incremental.zip using append_number(). If you check your working directory after running this code, then you’ll find an incremental.zip archive containing the three files that you passed into the loop.

Extracting Files and Directories

One of the most common operations you’ll ever perform on ZIP files is to extract their content to a given directory in your file system. You already learned the basics of using .extract() and .extractall() to extract one or all the files from an archive.

As an additional example, get back to your sample.zip file. At this point, the archive contains four files of different types. You have two .txt files and two .md files. Now say you want to extract only the .md files. To do so, you can run the following code:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for file in archive.namelist():
...         if file.endswith(".md"):
...             archive.extract(file, "output_dir/")
...
'output_dir/lorem.md'
'output_dir/realpython.md'

The with statement opens sample.zip for reading. The loop iterates over each file in the archive using namelist(), while the conditional statement checks if the filename ends with the .md extension. If it does, then you extract the file at hand to a target directory, output_dir/, using .extract().

Exploring Additional Classes From zipfile

So far, you’ve learned about ZipFile and ZipInfo, which are two of the classes available in zipfile. This module also provides two more classes that can be handy in some situations. Those classes are zipfile.Path and zipfile.PyZipFile. In the following two sections, you’ll learn the basics of these classes and their main features.

Finding Path in a ZIP File

When you open a ZIP file with your favorite archiver application, you see the archive’s internal structure. You may have files at the root of the archive. You may also have subdirectories with more files. The archive looks like a normal directory on your file system, with each file located at a specific path.

The zipfile.Path class allows you to construct path objects to quickly create and manage paths to member files and directories inside a given ZIP file. The class takes two arguments:

  • root accepts a ZIP file, either as a ZipFile object or a string-based path to a physical ZIP file.
  • at holds the location of a specific member file or directory inside the archive. It defaults to the empty string, representing the root of the archive.

With your old friend sample.zip as the target, run the following code:

>>>
>>> import zipfile

>>> hello_txt = zipfile.Path("sample.zip", "hello.txt")

>>> hello_txt
Path('sample.zip', 'hello.txt')

>>> hello_txt.name
'hello.txt'

>>> hello_txt.is_file()
True

>>> hello_txt.exists()
True

>>> print(hello_txt.read_text())
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

This code shows that zipfile.Path implements several features that are common to a pathlib.Path object. You can get the name of the file with .name. You can check if the path points to a regular file with .is_file(). You can check if a given file exists inside a particular ZIP file, and more.

Path also provides an .open() method to open a member file using different modes. For example, the code below opens hello.txt for reading:

>>>
>>> import zipfile

>>> hello_txt = zipfile.Path("sample.zip", "hello.txt")

>>> with hello_txt.open(mode="r") as hello:
...     for line in hello:
...         print(line)
...
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

With Path, you can quickly create a path object pointing to a specific member file in a given ZIP file and access its content immediately using .open().

Just like with a pathlib.Path object, you can list the content of a ZIP file by calling .iterdir() on a zipfile.Path object:

>>>
>>> import zipfile

>>> root = zipfile.Path("sample.zip")
>>> root
Path('sample.zip', '')

>>> root.is_dir()
True

>>> list(root.iterdir())
[
    Path('sample.zip', 'hello.txt'),
    Path('sample.zip', 'lorem.md'),
    Path('sample.zip', 'realpython.md')
]

It’s clear that zipfile.Path provides many useful features that you can use to manage member files in your ZIP archives in almost no time.

Building Importable ZIP Files With PyZipFile

Another useful class in zipfile is PyZipFile. This class is pretty similar to ZipFile, and it’s especially handy when you need to bundle Python modules and packages into ZIP files. The main difference from ZipFile is that the initializer of PyZipFile takes an optional argument called optimize, which allows you to optimize the Python code by compiling it to bytecode before archiving it.

PyZipFile provides the same interface as ZipFile, with the addition of .writepy(). This method can take a Python file (.py) as an argument and add it to the underlying ZIP file. If optimize is -1 (the default), then the .py file is automatically compiled to a .pyc file and then added to the target archive. Why does this happen?

Since version 2.3, the Python interpreter has supported importing Python code from ZIP files, a capability known as Zip imports. This feature is quite convenient. It allows you to create importable ZIP files to distribute your modules and packages as a single archive.

PyZipFile is helpful when you need to generate importable ZIP files. Packaging the .pyc file rather than the .py file makes the importing process way more efficient because it skips the compilation step.

Inside the python-zipfile/ directory, you have a hello.py module with the following content:

"""Print a greeting message."""
# hello.py

def greet(name="World"):
    print(f"Hello, {name}! Welcome to Real Python!")

This code defines a function called greet(), which takes name as an argument and prints a greeting message to the screen. Now say you want to package this module into a ZIP file for distribution purposes. To do that, you can run the following code:

>>>
>>> import zipfile

>>> with zipfile.PyZipFile("hello.zip", mode="w") as zip_module:
...     zip_module.writepy("hello.py")
...

>>> with zipfile.PyZipFile("hello.zip", mode="r") as zip_module:
...     zip_module.printdir()
...
File Name                                        Modified             Size
hello.pyc                                 2021-09-13 13:25:56          311

In this example, the call to .writepy() automatically compiles hello.py to hello.pyc and stores it in hello.zip. This becomes clear when you list the archive’s content using .printdir().

Once you have hello.py bundled into a ZIP file, then you can use Python’s import system to import this module from its containing archive:

>>>
>>> import sys

>>> # Insert the archive into sys.path
>>> sys.path.insert(0, "/home/user/python-zipfile/hello.zip")
>>> sys.path[0]
'/home/user/python-zipfile/hello.zip'

>>> # Import and use the code
>>> import hello

>>> hello.greet("Pythonista")
Hello, Pythonista! Welcome to Real Python!

The first step to import code from a ZIP file is to make that file available in sys.path. This variable holds a list of strings that specifies Python’s search path for modules. To add a new item to sys.path, you can use .insert().

For this example to work, you need to change the placeholder path and pass the path to hello.zip on your file system. Once your importable ZIP file is in this list, then you can import your code just like you’d do with a regular module.

Finally, consider the hello/ subdirectory in your working folder. It contains a small Python package with the following structure:

hello/
|
├── __init__.py
└── hello.py

The __init__.py module turns the hello/ directory into a Python package. The hello.py module is the same one that you used in the example above. Now suppose you want to bundle this package into a ZIP file. If that’s the case, then you can do the following:

>>>
>>> import zipfile

>>> with zipfile.PyZipFile("hello.zip", mode="w") as zip_pkg:
...     zip_pkg.writepy("hello")
...

>>> with zipfile.PyZipFile("hello.zip", mode="r") as zip_pkg:
...     zip_pkg.printdir()
...
File Name                                        Modified             Size
hello/__init__.pyc                        2021-09-13 13:39:30          108
hello/hello.pyc                           2021-09-13 13:39:30          317

The call to .writepy() takes the hello package as an argument, searches for .py files inside it, compiles them to .pyc files, and finally adds them to the target ZIP file, hello.zip. Again, you can import your code from this archive by following the steps that you learned before:

>>>
>>> import sys

>>> sys.path.insert(0, "/home/user/python-zipfile/hello.zip")

>>> from hello import hello

>>> hello.greet("Pythonista")
Hello, Pythonista! Welcome to Real Python!

Because your code is in a package now, you first need to import the hello module from the hello package. Then you can access your greet() function normally.

Running zipfile From Your Command Line

Python’s zipfile also offers a minimal command-line interface that allows you to access the module’s main functionality quickly. For example, you can use the -l or --list option to list the content of an existing ZIP file:

$ python -m zipfile --list sample.zip
File Name                                         Modified             Size
hello.txt                                  2021-09-07 19:50:10           83
lorem.md                                   2021-09-07 19:50:10         2609
realpython.md                              2021-09-07 19:50:10          428
new_hello.txt                              1980-01-01 00:00:00           13

This command shows the same output as an equivalent call to .printdir() on the sample.zip archive.

Now say you want to create a new ZIP file containing several input files. In that case, you can use the -c or --create option:

$ python -m zipfile --create new_sample.zip hello.txt lorem.md realpython.md

$ python -m zipfile -l new_sample.zip
File Name                                         Modified             Size
hello.txt                                  2021-09-07 19:50:10           83
lorem.md                                   2021-09-07 19:50:10         2609
realpython.md                              2021-09-07 19:50:10          428

This command creates a new_sample.zip file containing your hello.txt, lorem.md, realpython.md files.

What if you need to create a ZIP file to archive an entire directory? For example, you may have your own source_dir/ with the same three files as the example above. You can create a ZIP file from that directory by using the following command:

$ python -m zipfile -c source_dir.zip source_dir/

$ python -m zipfile -l source_dir.zip
File Name                                         Modified             Size
source_dir/                                2021-08-31 08:55:58            0
source_dir/hello.txt                       2021-08-31 08:55:58           83
source_dir/lorem.md                        2021-08-31 09:01:08         2609
source_dir/realpython.md                   2021-08-31 09:31:22          428

With this command, zipfile places source_dir/ at the root of the resulting source_dir.zip file. As usual, you can list the archive content by running zipfile with the -l option.

You can also extract all the content of a given ZIP file using the -e or --extract option from your command line:

$ python -m zipfile --extract sample.zip sample/

After running this command, you’ll have a new sample/ folder in your working directory. The new folder will contain the current files in your sample.zip archive.

The final option that you can use with zipfile from the command line is -t or --test. This option allows you to test if a given file is a valid ZIP file. Go ahead and give it a try!

Using Other Libraries to Manage ZIP Files

There are a few other tools in the Python standard library that you can use to archive, compress, and decompress your files at a lower level. Python’s zipfile uses some of these internally, mainly for compression purposes. Here’s a summary of some of these tools:

Module Description
zlib Allows compression and decompression using the zlib library
bz2 Provides an interface for compressing and decompressing data using the Bzip2 compression algorithm
lzma Provides classes and functions for compressing and decompressing data using the LZMA compression algorithm

Unlike zipfile, some of these modules allow you to compress and decompress data from memory and data streams other than regular files and archives.

In the Python standard library, you’ll also find tarfile, which supports the TAR archiving format. There’s also a module called gzip, which provides an interface to compress and decompress data, similar to how the GNU Gzip program does it.

For example, you can use gzip to create a compressed file containing some text:

>>>
>>> import gzip

>>> with gzip.open("hello.txt.gz", mode="wt") as gz_file:
...     gz_file.write("Hello, World!")
...
13

Once you run this code, you’ll have a hello.txt.gz archive containing a compressed version of hello.txt in your current directory. Inside hello.txt, you’ll find the text Hello, World!.

A quick and high-level way to create a ZIP file without using zipfile is to use shutil. This module allows you to perform several high-level operations on files and collections of files. When it comes to archiving operations, you have make_archive(), which can create archives, such as ZIP or TAR files:

>>>
>>> import shutil

>>> shutil.make_archive("shutil_sample", format="zip", root_dir="source_dir/")
'/home/user/sample.zip'

This code creates a compressed file called sample.zip in your working directory. This ZIP file will contain all the files in the input directory, source_dir/. The make_archive() function is convenient when you need a quick and high-level way to create your ZIP files in Python.

Conclusion

Python’s zipfile is a handy tool when you need to read, write, compress, decompress, and extract files from ZIP archives. The ZIP file format has become an industry standard, allowing you to package and optionally compress your digital data.

The benefits of using ZIP files include archiving related files together, saving disk space, making it easy to transfer data over computer networks, bundling Python code for distribution purposes, and more.

In this tutorial, you learned how to:

  • Use Python’s zipfile to read, write, and extract existing ZIP files
  • Read metadata about the content of your ZIP files with zipfile
  • Use zipfile to manipulate member files in existing ZIP files
  • Create your own ZIP files to archive and compress your digital data

You also learned how to use zipfile from your command line to list, create, and extract your ZIP files. With this knowledge, you’re ready to efficiently archive, compress, and manipulate your digital data using the ZIP file format.

🐍 Python Tricks 💌

Get a short & sweet Python Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.

Python Tricks Dictionary Merge

About Leodanis Pozo Ramos

Leodanis Pozo Ramos Leodanis Pozo Ramos

Leodanis is an industrial engineer who loves Python and software development. He's a self-taught Python developer with 6+ years of experience. He's an avid technical writer with a growing number of articles published on Real Python and other sites.

» More about Leodanis

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Master Real-World Python Skills With Unlimited Access to Real Python

Join us and get access to hundreds of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

Master Real-World Python Skills
With Unlimited Access to Real Python

Join us and get access to hundreds of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

What Do You Think?

Real Python Comment Policy: The most useful comments are those written with the goal of learning from or helping out other readers—after reading the whole article and all the earlier comments. Complaints and insults generally won’t make the cut here.

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Keep Learning

Related Tutorial Categories: intermediate python