Go From Bytes to File
00:00 In this lesson, you’ll learn how to go from bytes to file. If you wanted to decode bytes into text, you’re now good to go. But what if you want to write the body of a response to a file?
00:11 We have two options. Option one, write the bytes directly to the file. Option two, decode the bytes into a Python string and then encode the string back into bytes for the file.
00:22 The first method is the most straightforward, but the second method allows you to change the encoding if you want to. You can see an example of the first method in the code.
00:33 With my code editor open, a few things are already in place from the previous example. At the top, you’ll have urlopen imported from urllib.request.
00:41 Below that is the with statement that passes in example.com and sets the body variable to response.read(). To write the bytes directly to a file without decoding or re-encoding them, you’ll need the built-in open() function, and you’ll need to make sure that you use write binary mode.
00:57 So first, type with open("example.html") and then set the mode to write binary, which is "wb".
01:08 You’ll open it as html_file. Then you write the body to the file with html_file.write(), passing in the body variable that was previously set from the response object with the .read() method.
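The steps described so far can be pulled together in a minimal sketch. The exact URL string and the helper names save_body and fetch_and_save are assumptions made for illustration. The lesson's own script runs these statements at the top level rather than inside functions.

```python
from urllib.request import urlopen


def save_body(body: bytes, path: str) -> int:
    """Write raw bytes to path unchanged and return the number of bytes written."""
    # "wb" is write binary mode: no decoding or encoding takes place.
    with open(path, mode="wb") as html_file:
        return html_file.write(body)


def fetch_and_save(url: str = "https://www.example.com",
                   path: str = "example.html") -> int:
    """Hypothetical wrapper: request the URL and dump the response body to a file."""
    # Requires network access. The URL default is an assumption.
    with urlopen(url) as response:
        body = response.read()
    return save_body(body, path)
```

Calling fetch_and_save() should leave an example.html file in the current directory, which is what the ls and cat steps in the lesson then inspect.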
01:23 To run the script, type py urllib_requests.py. You may notice that you do not see any output. If you type ls, which stands for list, into your terminal, this will list the files and directories in the current directory.
01:40 In this case, when you type ls, you’ll see that example.html was created in your directory by the script that was just run. Now, type cat example.html.
01:54 This will display the content of example.html. It will look like an HTML document that contains the content from your request.
02:05 To recap, using open() in write binary mode bypasses the need to decode or encode and dumps the bytes of the HTTP message body into the example.html file.
02:16 That’s it. You’ve written the bytes directly to a file without encoding or decoding anything. Now say you have a URL that doesn’t use UTF-8, but you want to write the contents to a file with UTF-8.
02:27 For this, you’d first decode the bytes into a string and then write the string to a file, specifying the character encoding. You can try this out in the code.
02:38 The beginning of the file will look similar to the examples from before. Start by importing urlopen: from urllib.request import urlopen.
02:50 This time, instead of requesting content from example.com, you’re now using google.com. Google’s home page seems to use different encodings depending on your location.
02:59 In much of Europe and the US, it uses the ISO 8859-1 encoding. So you can use Google as the example you pass into urlopen() this time around.
03:09 Do this by typing with urlopen(), passing in the Google home page
03:18 as response, and set body equal to response.read(). Next, set the variable character_set to response.headers.get_content_charset().
03:38 This will attempt to determine the character set of the content by reading the Content-Type header of the response. You can then set the variable content equal to body.decode() and pass in character_set.
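To see how the character set is read from the Content-Type header without making a network request, here is a small offline sketch. It builds the headers by hand with email.message.Message, the standard-library class that provides get_content_charset(). The header value, the sample body, and the "utf-8" fallback are assumptions for illustration and are not part of the lesson's script.

```python
from email.message import Message

# Stand-in for response.headers, built by hand so no request is needed.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"

# The charset parameter is returned in lowercase.
# If the header carries no charset, the method returns None.
character_set = headers.get_content_charset()

# Sample bytes in the advertised encoding (an assumption for the demo).
body = "café".encode("iso-8859-1")

# Fallback to "utf-8" is an assumption: body.decode(None) would raise TypeError.
content = body.decode(character_set or "utf-8")
print(character_set, content)
```

The fallback is one possible design choice. A script could just as reasonably stop with an error when the server does not declare a character set.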
03:55 This will decode the body using the detected character set. To save the content to the file, you can type out with open("google.html", encoding="utf-8", mode="w") as file and write to the file with file.write(), passing in content.
04:23 This will open a file named google.html in write mode with UTF-8 encoding, and write the decoded HTML content that is stored in the content variable to the file.
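Put together, the decode-then-re-encode approach might look like the sketch below. The helper names save_as_utf8 and fetch_google and the exact URL string are assumptions for illustration. The lesson's script runs the same statements at the top level.

```python
from urllib.request import urlopen


def save_as_utf8(body: bytes, character_set: str, path: str) -> None:
    """Decode body with its source character set and write it out as UTF-8."""
    content = body.decode(character_set)
    # Text mode plus encoding="utf-8" makes open() encode the string on write.
    with open(path, encoding="utf-8", mode="w") as file:
        file.write(content)


def fetch_google(path: str = "google.html") -> None:
    """Hypothetical wrapper around the request. Requires network access."""
    with urlopen("https://www.google.com") as response:
        body = response.read()
        # May be None if the Content-Type header declares no charset.
        character_set = response.headers.get_content_charset()
    save_as_utf8(body, character_set, path)
```

For example, the ISO 8859-1 byte for é is a single byte, while the file written by save_as_utf8 stores the same character as two UTF-8 bytes.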
04:35 This essentially saves the HTML content of the google.com web page as a file named google.html on your local system. Let’s take a look at the terminal and see what this looks like.
04:51 As you can see, when you run the script, it doesn’t look like anything happened, but if you type ls to list everything in your current directory, you’ll see that google.html was created.
05:03 Next, type cat google.html, and you’ll see the content that was saved into the file. As you can see, you get back something that looks a lot like JavaScript and HTML.
05:22 To recap, in this code, you got the response character set and used it to decode the bytes object into a string. Then you wrote the string to a file, encoding it using UTF-8.
05:33 Once you’ve written to a file, you should be able to open the resulting file in your browser.
05:39 In this lesson, you learned how to write bytes to a file. In the next lesson, you’ll revisit an example from before and learn how to turn a JSON response into a Python dictionary.