Using Other Methods to Open and Read Member Files
00:00
Other Methods of Opening and Reading Member Files. If you are looking for a more flexible way to read from member files and create and add new member files to an archive, then ZipFile
’s .open()
method is for you. Like the built-in open()
function, this method implements the context manager protocol, and therefore it supports the with
statement.
00:29
Here you open hello.txt
for reading. The first argument to .open()
is name
, indicating the member file that you want to open. The second argument is the mode, which defaults to "r"
as usual.
00:44
ZipFile.open()
also accepts a pwd
argument for opening encrypted files. This argument works as the equivalent pwd
argument does in .read()
.
00:56
You can also use .open()
with the "w"
mode. This mode allows you to create a new member file, write content to it, and finally append the file to the underlying archive, which you should open in append mode.
01:14
First, you open sample.zip
in append mode. Then you create new_hello.txt
by calling .open()
with the "w"
mode.
01:23
This function returns a file-like object that supports .write()
, which allows you to write bytes into this newly created file. Note that you need to supply a non-existing filename to .open()
.
01:36
If you use a filename that already exists in the underlying archive, then you’ll end up with a duplicated file and a UserWarning
exception.
01:45
Here you write the bytes "Hello, World!"
into new_hello.txt
. When the execution flow exits the inner with
statement, Python writes the input bytes into the member file.
01:57
When the outer with
statement exits, Python writes new_hello.txt
into the underlying ZIP file, sample.zip
.
02:11
The second part of the code confirms that new_hello.txt
is now a member file of sample.zip
. A detail to notice in the output of this example is that .write()
sets the Modified
date of the newly added file to the 1980-01-01
, which is a slightly unusual behavior that you should keep in mind when using this method.
02:34
As you saw previously, you can use the .read()
and .write()
methods to read from and write to member files without extracting them from the containing ZIP archive.
02:43 Both of these methods work exclusively with bytes. However, when you have a ZIP archive containing text files, you may want to read their content as text instead of bytes.
02:53
There are at least two ways to do this. You can use bytes.decode()
or io.TextIOWrapper
. Because ZipFile.read()
returns the content of the target member file as bytes, .decode()
can operate on these bytes directly. The .decode()
method decodes a bytes
object into a string using a given character encoding format.
03:18
On-screen, you’ll see how to use .decode()
to read text from the hello.txt
file in the sample.zip
archive.
03:35
Here, you read the content of hello.txt
as bytes. Then you call .decode()
to decode the bytes into a string using UTF-8 as encoding.
03:48
To set the encoding
argument, you use the "utf-8"
string.
03:55
However, you could use any other valid encoding, such as UTF-16 or cp1252, which can be represented as case-insensitive strings. Note that "utf-8"
is the default value of the encoding
argument to .decode()
.
04:11
It’s important to keep in mind that you need to know beforehand the character encoding format of any member file that you want to process using .decode()
.
04:19 If you use the wrong character encoding, then your code will fail to decode the underlying bytes into text, and you can end up instead with indecipherable characters.
04:29
The second option for reading text out of a member file is to use an io.TextIOWrapper
object, which provides a buffered text stream. This time you need to use .open()
instead of .read()
.
04:43
On-screen is an example of using io.TextIOWrapper
to read the content of the hello.txt
member file as a stream of text.
04:57
In the inner with
statement, you open the hello.txt
member file from the sample.zip
archive. You then pass the resulting binary file-like object, hello
, as an argument to io.TextIOWrapper
.
05:11
This creates a buffered text stream by decoding the content of hello
using the UTF-8 character encoding format. As a result, you get a stream of text directly from the target member file.
05:26
Just as with .encode()
, the io.TextIOWrapper
class takes an encoding
argument. You should always specify a value for this argument because the default text encoding depends on the system running the code and may not be the right value for the file that you are trying to decode.
05:43 In the next section of the course, you’ll see how to extract member files from ZIP archives and how to close ZIP files after use.
Become a Member to join the conversation.