Go From Bytes to Dictionary
00:00 In this lesson, you’ll learn how to go from bytes to dictionary. Responses with the application/json content type often don’t include any encoding information.
00:10 Fortunately, Python’s json module can handle decoding JSON data even without encoding information. You may recall working with JSON in the first lesson, where you performed basic HTTP requests and didn’t need to bother with encoding. What you’ll find now is that you may not need .get_content_charset() when you work with JSON content.
00:32 Let’s try this out in the code.
00:39 This time, you’ll use the /json endpoint of httpbin, a service that allows you to experiment with different types of requests and responses.
00:47 Using what you’ve learned in previous lessons, you’ll call the response.read() method to read the response content and store it in the body variable.
00:58 Next, you can set your character_set variable to the return value of the response.headers.get_content_charset() method,
01:08 and then you can print the content.
01:15 If you head over to the terminal and run this with py urllib_requests.py, you’ll see that character_set prints out None because encoding information is not included.
01:31 The /json endpoint simulates a typical API that returns JSON data. Note that the .get_content_charset() method returns None for this response.
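To see why None comes back, here’s a minimal offline sketch. It assumes that the headers object you get from urlopen() behaves like an email.message.Message, which is the class that http.client.HTTPMessage is built on. The charset_of() helper is a hypothetical name used only for this illustration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset declared in a Content-Type header value, if any."""
    # Hypothetical helper: builds a headers object similar to response.headers.
    headers = Message()
    headers["Content-Type"] = content_type
    return headers.get_content_charset()


# A JSON API often sends the media type with no charset parameter.
json_charset = charset_of("application/json")

# An HTML page usually declares its charset explicitly.
html_charset = charset_of("text/html; charset=UTF-8")

print(json_charset)
print(html_charset)
```

The method only reports what the server declared, so a missing charset parameter gives you None rather than a guess.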
01:40 Even though there’s no character encoding information, all is not lost. According to RFC 4627, JSON text is encoded in Unicode, and UTF-8 is the default encoding for the application/json media type.
01:55 RFC 4627 is a specification document published by the Internet Engineering Task Force. This RFC defines the standards and guidelines for representing data structures in JSON format.
02:07 It also specifies how JSON should be used as a media type in HTTP headers and other internet-related contexts. That’s not to say that every single server plays by the rules, but generally you can assume that if JSON is being transmitted, it’ll almost always be encoded using UTF-8.
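If you ever need the body as text rather than parsed JSON, one way to act on that assumption is to fall back to UTF-8 whenever the server doesn’t declare a character set. This is only a sketch. The decode_body() function is a hypothetical helper, and the sample bytes stand in for what response.read() would return.

```python
def decode_body(body, charset=None):
    """Decode a bytes body, assuming UTF-8 when no charset was declared."""
    # charset is what response.headers.get_content_charset() returned,
    # which may be None for JSON responses.
    return body.decode(charset or "utf-8")


# No charset declared: fall back to UTF-8.
json_text = decode_body('{"city": "Zürich"}'.encode("utf-8"), None)

# Charset declared: use it as given.
latin_text = decode_body("café".encode("latin-1"), "latin-1")

print(json_text)
print(latin_text)
```

If a server sends something other than UTF-8 without saying so, this fallback will raise a UnicodeDecodeError or produce wrong characters, so treat it as a default rather than a guarantee.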
02:24 Fortunately, json.loads() decodes bytes objects under the hood and even has some leeway in terms of the encodings it can deal with: UTF-8, UTF-16, and UTF-32.
02:32 So json.loads() should be able to cope with most bytes objects that you throw at it, as long as they’re valid JSON. You can see an example of this in the code.
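Before moving on to the network example, here’s a small offline sketch of that leeway. The payload dictionary is made up for illustration. It’s encoded three different ways and passed to json.loads() as bytes each time, with no encoding argument.

```python
import json

# Hypothetical sample data, including a non-ASCII character.
payload = {"greeting": "héllo", "count": 3}
text = json.dumps(payload, ensure_ascii=False)

# Encode the same JSON text in each encoding that json.loads() detects.
results = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    raw = text.encode(encoding)
    results[encoding] = json.loads(raw)  # bytes in, dict out

for encoding, data in results.items():
    print(encoding, type(data).__name__, data == payload)
```

Other encodings, such as Latin-1, aren’t detected, so bytes in those encodings need to be decoded to a string first.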
02:43 For this example, you’ll need to import json, and once again, you’ll need to import urlopen with from urllib.request import urlopen.
02:54 Now let’s go ahead and use urlopen() like we have before. Write with urlopen(), pass in the httpbin URL, "https://httpbin.org/json", add as response, and then set the body variable to response.read().
03:17 To see the type being converted without the use of a character set, you can print out the type of the body with print(type(body)).
03:27 You can expect this to print out bytes. Next, you can set the data variable to json.loads() and pass in body. Now you can print out the type again to see that it has changed, with print(type(data)),
03:44 and while you’re at it, go ahead and print out the data. Let’s head over to the terminal to see what this looks like by running py urllib_requests.py. As you can see, the first print() printed out <class 'bytes'> like we expected, and the second print() printed out <class 'dict'>, a dictionary.
04:06 This proves to us the character set was not needed.
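Put together, the steps from this lesson might look something like the sketch below. It’s an approximation of the script described in the video, not an exact copy. The fetch_json() and decode_json_body() names are hypothetical, and sample_body is a made-up stand-in for response.read() so that the demonstration at the bottom runs without network access.

```python
import json
from urllib.request import urlopen


def decode_json_body(body):
    """Parse a raw bytes body as JSON; no character set is passed in."""
    return json.loads(body)


def fetch_json(url="https://httpbin.org/json"):
    """Fetch a URL and parse its body as JSON (needs network access)."""
    with urlopen(url) as response:
        body = response.read()
    return decode_json_body(body)


# Offline stand-in for the bytes that response.read() would return.
sample_body = b'{"slideshow": {"title": "Sample Slide Show"}}'
data = decode_json_body(sample_body)

print(type(sample_body))  # <class 'bytes'>
print(type(data))         # <class 'dict'>
print(data)
```

When you have a connection available, calling fetch_json() performs the real request against httpbin and returns the parsed dictionary.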
04:12 As you can see, the json module handles the decoding automatically and produces a Python dictionary. Most modern web APIs return key-value information as JSON, although you might run into some older APIs that work with XML.
04:27 For that, you might want to look into the Roadmap to XML Parsers in Python. With that, you should know enough about bytes, encodings, and urllib.request to be dangerous.