Reading Information From ZIP Files
00:00 Reading Information From ZIP Files. In this section of the course, you’ll see a number of methods for reading information from ZIP files, starting with how to read metadata.
00:10
You’ve already put .printdir() into action. It’s a useful method that you can use to list the content of your ZIP files quickly. Along with .printdir(), the ZipFile class provides several handy methods for extracting metadata from existing ZIP files. On-screen, you can see a summary of these methods: .getinfo() returns a ZipInfo object, .infolist() returns a list of ZipInfo objects, and .namelist() returns a list holding the names of all the member files.
00:39
With these three tools, you can retrieve a lot of useful information about the content of your ZIP files. On-screen, you can see .getinfo() in use.
01:01
.getinfo() takes a member file’s name as an argument and returns a ZipInfo object with information about that member. ZipInfo objects have several attributes that allow you to retrieve valuable information about the target member file. For example, .file_size and .compress_size hold the size, in bytes, of the original and compressed files, respectively.
01:22
The class also has some other useful attributes, such as .filename and .date_time, which hold the filename and the last modification date.
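The on-screen code isn't part of this transcript, so here's a minimal sketch of .getinfo() in use. It builds a small throwaway archive in memory instead of opening a file from the course materials, and the content of hello.txt is made up for illustration.

```python
import io
import zipfile

# Build a tiny archive in memory (assumption: stands in for a real ZIP file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    info = archive.getinfo("hello.txt")
    print(info.filename)       # Name of the member file
    print(info.file_size)      # Original size in bytes
    print(info.compress_size)  # Size inside the archive in bytes
    print(info.date_time)      # (year, month, day, hour, minute, second)
```

With a ZIP file on disk, you would pass its path to ZipFile instead of the in-memory buffer.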
01:34
By default, ZipFile doesn’t compress the input files to add them to the final archive. That’s why the size and the compressed size are the same in the examples seen previously.
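To check this behavior yourself, a short sketch like the following could help. It writes a made-up, highly repetitive member to an in-memory archive without passing any compression argument, then inspects the documented ZipInfo.compress_type attribute.

```python
import io
import zipfile

buffer = io.BytesIO()
# No compression argument, so ZipFile falls back to its default.
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("lorem.md", "lorem ipsum " * 100)  # Made-up content

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("lorem.md")
    stored = info.compress_type == zipfile.ZIP_STORED
    same_size = info.file_size == info.compress_size

print(stored, same_size)
```

Even though the text repeats itself a hundred times, both sizes match because the member is stored uncompressed.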
01:44
You’ll look at compressing files and directories later on in the course. With .infolist(), you can extract information from all the files in a given archive.
01:55
On-screen is an example that uses this method to generate a minimal report with information about all the member files in your sample.zip archive.
02:12
The for loop iterates over the ZipInfo objects from .infolist(), retrieving the filename, the last modification date, the normal size, and the compressed size of each member file. In this example, you use datetime to format the date in a human-readable way.
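A report along these lines might look like the sketch below. It isn't the exact code from the video. It uses an in-memory archive with two made-up members in place of sample.zip, and the report layout is an assumption.

```python
import datetime
import io
import zipfile

# Stand-in for sample.zip (assumption: member names and content are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("lorem.md", "# Lorem Ipsum\n")

report = []
with zipfile.ZipFile(buffer, mode="r") as archive:
    for info in archive.infolist():
        # .date_time is a 6-item tuple, so it unpacks straight into datetime.
        modified = datetime.datetime(*info.date_time)
        report.append(
            f"Filename: {info.filename}\n"
            f"Modified: {modified:%Y-%m-%d %H:%M:%S}\n"
            f"Normal size: {info.file_size} bytes\n"
            f"Compressed size: {info.compress_size} bytes"
        )

print("\n\n".join(report))
```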
02:42
If you just need to perform a quick check on a ZIP file and list the names of its member files, then you can use .namelist().
03:01
Because the filenames in this output are valid arguments to .getinfo(), you can combine these two methods to retrieve information about selected member files only.
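As a rough illustration of that combination, the sketch below lists the member names and then looks up the size of the Markdown files only. The archive is built in memory, and the choice to filter on the .md extension is just an example.

```python
import io
import zipfile

# In-memory stand-in archive (assumption: names and content are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("lorem.md", "# Lorem Ipsum\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    # Feed selected names back into .getinfo() to inspect only those members.
    markdown_sizes = {
        name: archive.getinfo(name).file_size
        for name in names
        if name.endswith(".md")
    }

print(names)
print(markdown_sizes)
```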
03:13
Sometimes you have a ZIP file and need to read the content of a given member file without extracting it. To do that, you can use .read(). This method takes a member file’s name and returns that file’s content as bytes.
03:33
To use .read(), you need to open the ZIP file for reading or appending. Note that .read() returns the content of the target file as a stream of bytes. In this example, you use .split() to split the stream into lines using the line feed character "\n" as a separator.
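Here's a minimal sketch of that idea, using an in-memory archive whose content is made up for illustration. Note the b prefix on the separator, which is needed because .read() hands back bytes rather than text.

```python
import io
import zipfile

# In-memory stand-in archive (assumption: the text below is invented).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, Pythonista!\nWelcome to ZIP files.\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    content = archive.read("hello.txt")  # A bytes object, not a str
    lines = content.split(b"\n")         # The separator must be bytes too

for line in lines:
    print(line)
```

Because the content ends with a line feed, the last item in the resulting list is an empty bytes object.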
03:50
Because .split() is operating on a bytes object, you need to add a leading b to the string used as an argument. ZipFile’s .read() method also accepts a second positional argument called pwd.
04:07
This argument allows you to provide a password for reading encrypted files. To try this feature, you can rely on the sample_pwd.zip file that you downloaded with the materials for this course.
04:31
First, you provide the password secret to read the encrypted file. The pwd argument accepts values of the bytes type. As you can see here, if you call .read() on an encrypted file without providing the required password, then you get a RuntimeError.
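One way to wrap this behavior is sketched below. read_member() is a hypothetical helper, not part of zipfile. It encodes a str password to bytes and catches the RuntimeError that zipfile raises when a password is missing or wrong. Since the standard library can't create encrypted archives, the demo runs against an unencrypted in-memory archive, where pwd is simply ignored.

```python
import io
import zipfile


def read_member(archive_file, member, password=None):
    """Return a member's content, or None if the password is missing or wrong.

    Hypothetical helper for illustration only.
    """
    # pwd must be bytes, so encode a str password before passing it on.
    pwd = password.encode("utf-8") if isinstance(password, str) else password
    with zipfile.ZipFile(archive_file, mode="r") as archive:
        try:
            return archive.read(member, pwd=pwd)
        except RuntimeError as error:
            print(f"Could not read {member!r}: {error}")
            return None


# Demo on an unencrypted stand-in archive (assumption: content is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")

content = read_member(buffer, "hello.txt", password="secret")
print(content)
```

With the course materials at hand, you could pass the path to sample_pwd.zip and the name of one of its members instead of the buffer.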
04:59
Note that Python’s zipfile supports decryption. However, it doesn’t support the creation of encrypted ZIP files. That’s why you’d need to use an external file archiver to encrypt your files.
05:11
Some popular file archivers include 7z and WinRAR for Windows, Ark and GNOME Archive Manager for Linux, and Archiver and Keka for macOS. For large encrypted ZIP files, keep in mind that the decryption operation can be extremely slow because it’s implemented in pure Python. In such cases, consider using a specialized program to handle your archives instead of using zipfile. If you regularly work with encrypted files, then you may want to avoid providing the decryption password every time you call .read() or another method that accepts a pwd argument.
05:51
If that’s the case, you can use ZipFile.setpassword() to set a global password. With .setpassword(), you just need to provide the password once. ZipFile uses that single password for decrypting all of the member files.
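The calling pattern could look like this sketch. The archive here is an unencrypted in-memory stand-in, because zipfile can't create encrypted files, so the password has no effect on the reads. The sketch also shows that .setpassword() expects bytes and rejects a str.

```python
import io
import zipfile

# Unencrypted stand-in archive (assumption: content is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")

type_error = None
with zipfile.ZipFile(buffer, mode="r") as archive:
    archive.setpassword(b"secret")  # Set once, as bytes
    # No pwd argument needed on the individual .read() calls.
    contents = {name: archive.read(name) for name in archive.namelist()}
    try:
        archive.setpassword("secret")  # A str password is rejected
    except TypeError as error:
        type_error = error

print(contents)
print(type_error)
```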
06:25
In contrast, if you have ZIP files with different passwords for individual member files, then you need to provide the specific password for each file using the pwd argument of .read().
06:47
Here, you use secret1 as a password to read hello.txt
07:01
and secret2 to read lorem.md.
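If there are many members, one option is to keep the passwords in a dictionary keyed by member name, as in this sketch. The mapping is an illustrative choice, not something the course prescribes, and the archive is again an unencrypted in-memory stand-in where pwd is ignored.

```python
import io
import zipfile

# Unencrypted stand-in archive (assumption: content is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("lorem.md", "# Lorem Ipsum\n")

# One password per member file, as bytes.
passwords = {"hello.txt": b"secret1", "lorem.md": b"secret2"}

with zipfile.ZipFile(buffer, mode="r") as archive:
    contents = {
        name: archive.read(name, pwd=pwd)
        for name, pwd in passwords.items()
    }

print(contents)
```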
07:14
A final detail to consider is that when you use the pwd argument, you’re overriding whatever archive-level password you may have set with .setpassword().
07:24
If you call .read() on a ZIP file that uses an unsupported compression method, this raises a NotImplementedError. You’ll also get an error if the required compression module isn’t available in your Python installation. In the next section of the course, you’ll see some other ways of opening and reading the contents of ZIP files.