Explore HTTP Messages and Their Representation

Alexandra Davis

HTTP Requests With Python's urllib.request Alexandra Davis 07:37

00:00 In this lesson, you’ll explore HTTP messages and their representation in urllib.request. In a nutshell, an HTTP message can be understood as text transmitted as a stream of bytes.

00:14 A decoded HTTP message can be as simple as two lines: GET / HTTP/1.1, followed by Host: www.google.com. The first line of the message starts with the HTTP method being used.

00:26 In this case, it is a GET, a commonly used HTTP method to request a specific resource from a server. After the HTTP method, you can see the requested resource.

00:36 In this case, it’s indicated by the forward slash, which represents the root directory of the website. It means that the client is requesting the default resource from the server. Following the requested resource, you have the HTTP version being used, which is HTTP/1.1 in this example. On the last line is the Host header.

00:56 It specifies the domain name of the server the client wants to communicate with. This example shows the client is requesting the resource from www.google.com.

01:06 To summarize, this specifies a GET request at the root using the HTTP/1.1 protocol. The only required header is Host, which here is www.google.com.
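To make the "text transmitted as bytes" idea concrete, here is a minimal sketch that assembles the same request message by hand and encodes it. The variable names are made up for illustration, and nothing is sent over the network. In practice, urllib.request builds messages like this for you.

```python
# Assemble the example request by hand. All names here are illustrative.
request_line = "GET / HTTP/1.1"
headers = {"Host": "www.google.com"}

# Each line ends with CRLF, and an empty line marks the end of the headers.
lines = [request_line] + [f"{name}: {value}" for name, value in headers.items()]
raw_request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# The request line splits into method, requested resource, and HTTP version.
method, target, version = request_line.split(" ")

print(raw_request)
print(method, target, version)
```

The bytes in raw_request are what would travel over the connection. Decoding them gives back the readable text shown in the lesson.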

01:17 The target server has enough information to make a response with this. A response is similar in structure to a request. HTTP messages have two main parts, the metadata and the body.

01:30 In the previous example, the message is all metadata with no body. The response does have two parts. The response starts with a status line that specifies the HTTP protocol, HTTP/1.1, and the status 200 OK.

01:44 After the status line, you get many key-value pairs, such as Server: gws representing all the response headers. This is the metadata of the response.

01:56 The blank line at the end of the headers, which is just a line break on its own, is the divider between the headers and the body. Everything that follows the blank line makes up the body.
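As a rough sketch of that layout, the snippet below splits a response on the first blank line and then picks the metadata apart. The response bytes are invented for illustration (only the 200 OK status and the Server: gws header come from the lesson), and the manual parsing is a simplification of what the http module does for you.

```python
# A made-up response: status line, headers, blank line, then the body.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: gws\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

# The first blank line (CRLF CRLF) divides the metadata from the body.
head, _, body = raw_response.partition(b"\r\n\r\n")

# The metadata is a status line followed by key-value header lines.
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
version, status_code, reason = status_line.split(" ", 2)

headers = {}
for line in header_lines:
    name, _, value = line.partition(":")
    headers[name] = value.strip()

print(version, status_code, reason)
print(headers)
print(body)
```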

02:07 This is the part that gets read when you’re using urllib.request.

02:12 The main representation of an HTTP message that you’ll be interacting with when using urllib.request is the HTTPResponse object.

02:20 The urllib.request module itself depends on the low-level http module, which you don’t need to interact with directly. You do end up using some of the data structures that http provides, such as HTTPResponse and HTTPMessage.

02:35 When you make a request with urllib.request.urlopen(), you get an HTTPResponse object in return. If you spend some time exploring the HTTPResponse object with pprint() and dir(), you can see all the different methods and properties that belong to it.

02:51 Let’s jump into the code and see what that looks like.

02:55 Here I’m in my code editor with the urllib_requests.py file open. At the top of the file, you may still have urlopen imported. If not, you can do so by typing from urllib.request import urlopen. First you should import pprint.

03:15 pprint is pretty print. It’s another way of printing out data structures in a pretty, more formatted way. Next, you’re going to use urlopen() in a with statement, pass in the example website, which is example.com, and name the result response.

03:37 Then inside the with block, call pprint(dir(response)), so you’ll be pretty-printing the list of names that dir() returns for the response object.

03:47 dir() is a built-in function in Python, so it does not need to be imported. dir() is used to list the attributes of an object. It returns a list of all valid attributes and methods associated with the object you pass to it.

04:02 Once again, you can run the script with py urllib_requests.py and hit Enter. The attributes of the response object are printed out in an easy-to-read format.

04:13 Within the output, you’ll see some key attributes, like .code, which represents the HTTP status code; .headers, which contains the HTTP headers in a dictionary-like object; .url, which is the URL that was requested; and a lot more.
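If you’d like to poke at an HTTPResponse without making a network call, the sketch below is one way to do it. It is not the lesson’s script: instead of urlopen(), it feeds canned bytes to HTTPResponse through a hypothetical FakeSocket class and calls .begin(), a method that http.client normally calls for you. The response bytes are invented for illustration.

```python
import io
from http.client import HTTPResponse
from pprint import pprint

# Canned response bytes, invented for this sketch.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


response = HTTPResponse(FakeSocket(RAW_RESPONSE))
response.begin()  # parse the status line and the headers

# dir() returns a list of attribute names; skip the underscore-prefixed ones.
public_names = [name for name in dir(response) if not name.startswith("_")]
pprint(public_names)

print(response.status, response.reason)
body = response.read()
print(body)
```

With a response that comes from urlopen(), the same dir() call lists these names plus a few that urllib.request adds, such as .url.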

04:31 One way to inspect all the headers is to access the .headers attribute of the HTTPResponse object. This will return an HTTPMessage object.

04:40 You can treat an HTTPMessage like a dictionary by calling .items() on it to get all the headers as tuples. You can take a look at this in the code.

04:49 Once again, you’ll use urlopen() to make a request to example.com.

05:03 This time, you can use pass as a placeholder statement in the body, because the with statement requires an indented block. Next, you can access all the headers using the .headers attribute. Wrap the call in pprint() so that you can see all these headers in an easy-to-read format: pprint(response.headers.items()).

05:25 Let’s take a look at what the terminal returns.

05:30 After typing python urllib_requests.py and hitting Enter, you can now see a list of tuples where you can see a header key and the corresponding value.

05:40 A few of the keys you should see are Accept-Ranges, Age, Cache-Control, Content-Type, and Date, all the way down to Connection.

05:50 You probably won’t need most of this information, but some applications do use it. Your browser, for example, might use the headers to read the response, set cookies, and determine an appropriate cache lifetime.
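To see the dictionary-like behavior of HTTPMessage in isolation, here is a small sketch that builds one from a few header lines using parse_headers() from http.client. The header values are invented for illustration. In the lesson’s script, the HTTPMessage comes from response.headers instead.

```python
import io
from http.client import HTTPMessage, parse_headers
from pprint import pprint

# Header lines invented for this sketch, ending with the usual blank line.
raw_headers = (
    b"Accept-Ranges: bytes\r\n"
    b"Cache-Control: max-age=604800\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
)

headers = parse_headers(io.BytesIO(raw_headers))

# .items() gives a list of (name, value) tuples, in the order received.
header_pairs = headers.items()
pprint(header_pairs)

print(isinstance(headers, HTTPMessage))
print(headers["content-type"])  # lookups ignore the case of the name
print(headers["X-Missing"])     # a missing header gives None, not a KeyError
```

The last two lines show where HTTPMessage differs from a plain dict: header names are matched case-insensitively, and indexing a header that isn’t present returns None.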

06:03 You can also call .getheaders() directly on the HTTPResponse object, which will return the same list of tuples. And if you’re only interested in one header, say the Server header, then you can use the singular .getheader("Server"), or you can use the square bracket syntax on .headers, which is an HTTPMessage.

06:23 You can see what this looks like in the code.

06:27 You can make a minor adjustment to the previous example: where you have response.headers.items(), you can ask for just one header by calling response.getheader() and passing in "Server".

06:43 You can take a look at the terminal to see what this returns.

06:48 After typing py urllib_requests.py and hitting Enter, you’ll see that it returns 'ECS (oxr/830c)'.

07:03 Another option for returning a single header is to keep the .headers attribute you had before and index it, so response.headers["Server"]. This returns exactly the same thing that you saw in the terminal previously.

07:18 You can take a look at this now.
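The three ways of reading headers can be compared side by side in a sketch that runs offline. As an assumption for illustration, the response is built from canned bytes through a hypothetical FakeSocket class rather than fetched with urlopen(), and "X-Missing" is a made-up header name. The Server value is the one from the lesson’s output.

```python
import io
from http.client import HTTPResponse

# Canned response bytes, using the Server value seen in the lesson.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ECS (oxr/830c)\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


response = HTTPResponse(FakeSocket(RAW_RESPONSE))
response.begin()  # normally called for you; parses status line and headers

server_via_method = response.getheader("Server")
server_via_brackets = response.headers["Server"]
all_pairs = response.getheaders()

# .getheader() also accepts a default for headers that were not sent.
missing = response.getheader("X-Missing", "not sent")

print(server_via_method)
print(server_via_brackets)
print(all_pairs)
print(missing)
```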

07:24 It isn’t likely you’ll need to interact with the headers directly like this, but now you have the tools you need to dig deeper if that need arises. Next up, you’ll learn the importance of closing HTTPResponse objects.
