Fetching a Web Page
00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll introduce you to the networking concepts you need to understand when programming with sockets. If you’re like me, you use the web every day. Hmm, let’s be honest. Almost every waking hour of every day. There’s a chance I do it in my sleep as well.
00:18 Just what does it mean to use the web, though? Well, your computer establishes a connection to another computer, say python.org, and asks for a web page, and python.org obliges by sending back text that your browser renders.
00:33 I’ve glossed over a few things here, so let’s drill down one layer.
00:38 First off, computers don’t use names; they use numbers. So when I type python.org, my computer first contacts another computer that keeps a master list of domain names and maps them to numeric addresses. That lookup service is called DNS, the Domain Name System. Even this is an oversimplification, but let’s stick with it for now.
00:58 The numeric address for python.org is 151.101.0.223. That’s also an oversimplification, as they’ve got multiple addresses, but let’s stick with this for now.
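In Python, you can ask the operating system to do this name-to-number lookup for you with socket.getaddrinfo(). The sketch below is illustrative rather than part of the course code; the helper name resolve() is hypothetical, and it keeps only IPv4 TCP results to match the single-address simplification above.

```python
import socket


def resolve(hostname: str, port: int) -> list[tuple[str, int]]:
    """Return the distinct IPv4 (address, port) pairs a hostname maps to.

    Hypothetical helper for illustration only.
    """
    # getaddrinfo() asks the OS resolver, which may in turn query DNS,
    # for every address the name maps to. Restricting the family and
    # socket type keeps only IPv4 TCP results.
    results = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    pairs = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr not in pairs:  # drop duplicate entries
            pairs.append(sockaddr)
    return pairs
```

Calling resolve("python.org", 443) needs network access, and because python.org has several addresses, the list you get back may not match the one quoted in the lesson. Passing a numeric address such as "127.0.0.1" skips the DNS query entirely.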
01:11 Yeah, I’m going to be saying that a lot. The good folks at python.org are probably offering all sorts of computing services. They definitely host web pages, and I suspect they also support email and other things.
01:24 To distinguish between these services, a server has another number called the port. Each service is registered against a port number. A lot of services on the internet have standardized port numbers.
01:36 For example, the web uses port 80, while the encrypted web uses port 443. So to connect to python.org, I’m actually connecting to 151.101.0.223 on port 443. Establishing the connection to that port on their machine is only the first step, though.
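To see how a port number tells services on the same machine apart, here is a small local sketch. It assumes nothing about python.org: it starts two pretend services on 127.0.0.1, each with a hypothetical banner, and lets the OS pick unused ports (port 0) instead of 80 or 443, since binding to ports below 1024 usually needs extra privileges. The helpers start_service() and ask() are hypothetical names.

```python
import socket
import threading


def start_service(banner: bytes) -> int:
    """Start a tiny TCP service on localhost that sends `banner` to each client.

    Returns the port number the OS assigned to the service.
    """
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick
    port = server.getsockname()[1]

    def serve() -> None:
        while True:
            conn, _addr = server.accept()
            with conn:  # closing the connection marks the end of the reply
                conn.sendall(banner)

    threading.Thread(target=serve, daemon=True).start()
    return port


def ask(address: str, port: int) -> bytes:
    """Connect to (address, port) and read everything the service sends."""
    with socket.create_connection((address, port), timeout=5) as sock:
        chunks = []
        while data := sock.recv(1024):  # b"" means the service hung up
            chunks.append(data)
        return b"".join(chunks)


# Same machine (same address), two services told apart only by port number.
web_port = start_service(b"hello from the pretend web server")
mail_port = start_service(b"hello from the pretend mail server")
```

Calling ask("127.0.0.1", web_port) and ask("127.0.0.1", mail_port) connects to the same address both times, yet each call reaches a different service because the port differs.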
01:56 Since I want the connection to be encrypted, my browser and the server exchange a bunch of data back and forth, doing all the fancy math that makes sure the conversation remains private.
02:05 This is done over a protocol known as TLS, short for Transport Layer Security. Once the TLS connection has been established, I can finally ask for the web page. I do that with yet another protocol, this one being HTTP.
02:18 Yes, there are protocols on top of protocols here. It’s like the old turtles joke: it’s protocols all the way down. The HTTP protocol allows me to say things like, “Hey, don’t compress this stuff I’m asking for,” and “Hey, this is the page I want.”
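Here’s a minimal sketch of that layering using only Python’s standard library: a TCP connection to port 443, TLS wrapped around it with the ssl module, and a plain-text HTTP request sent through the encrypted channel. The function names build_request() and fetch() are hypothetical, and the request is deliberately bare-bones. The Accept-Encoding: identity header is how HTTP says “don’t compress this,” and the GET line names the page you want.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",       # "Hey, this is the page I want"
        f"Host: {host}",
        "Accept-Encoding: identity",  # "Hey, don't compress this stuff"
        "Connection: close",          # ask the server to hang up when done
        "",
        "",                           # blank line ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 443) -> bytes:
    """Open TCP, layer TLS on top, then speak HTTP over it (hypothetical)."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # The TLS handshake, the "fancy math", happens inside wrap_socket().
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_request(host, path))
            chunks = []
            while data := tls_sock.recv(4096):
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("python.org") requires network access and returns the raw response: status line, headers, and body, possibly in chunked transfer encoding. A real HTTP client parses all of that for you, which is one reason libraries exist for this layer.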
02:32 With all that done, python.org’s web server finally responds with the content that I requested.
02:39 Let’s start peeling apart some of those protocols. When I make a connection to an address, I do so using a socket. A socket is an abstraction provided by your operating system that connects two applications.
02:52 The applications can be on the same machine or on different ones. You can send and receive data over a socket as a stream. This is similar to reading and writing data to and from a file, but the data in this case is sent and received by whoever is on the other end of the socket.
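The file comparison is quite literal in Python. This sketch uses socket.socketpair(), which returns two sockets already connected to each other, so you can watch both ends of the stream in one process. It writes lines into one end and reads them out of the other through makefile(), which wraps a socket in a file-like object.

```python
import socket

# socketpair() returns two already-connected sockets, which makes it easy
# to see both ends of a conversation inside one process.
left, right = socket.socketpair()

# Sending looks a lot like writing to a file...
left.sendall(b"first line\n")
left.sendall(b"second line\n")
left.close()  # closing signals end-of-stream to the other side

# ...and makefile() lets you read the other end exactly like a file.
with right, right.makefile("rb") as stream:
    received = stream.readlines()
```

After this runs, received holds the two lines in the order they were sent. Over a network the only difference is that left and right live in different programs, possibly on different machines.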
03:08 You know those protocols I was talking about? Well, there are protocols below them, and thankfully, your operating system provides an interface for creating sockets that abstracts away even the lower-level stuff.
03:20 In Python, you take advantage of the OS’s mechanism through the socket module.
03:25 Let’s move from the vague “there are protocols below your protocols” to the slightly more specific “you can think of your network as a series of layers”. At the bottom is Ethernet, which is the hardware and what is known as the data link layer.
03:40 This is your actual network card along with the device drivers your operating system uses to interface with it. Ethernet isn’t the only networking hardware out there.
03:48 There are others, but as most of the internet is based upon it, it’s the one you’re most likely to bump into. Being an old man now, I could regale you with stories about what a pain in the butt it was to interconnect offices that used different hardware, but I’ll spare you. Sitting on top of the data link layer is the Internet Protocol Suite.
04:07 Yep. That’s where the internet gets its name from. This suite is split up into layers as well.
04:13 The bottom portion of the suite is the IP layer, where IP stands for Internet Protocol. This layer is responsible for sending and receiving the chunks of data known as packets over the data link layer.
04:24 This layer is fairly simple, and if something goes wrong during the transmission or the packet gets mangled, this layer does nothing about it. It’s just there to send data between two machines. On top of the IP layer, you have a choice.
04:39 The first option is UDP, which stands for User Datagram Protocol. This protocol includes checksum information, which means if your packet gets mangled, there’s a chance it can be detected.
04:50 UDP does not do error correction, though. It often gets used for real-time communication. If you’re doing a voice call and a packet gets mangled, there’s no point in getting a correct version, as you’re already playing the audio from the next packet.
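To make “there’s a chance it can be detected” concrete, here is a sketch of the ones’-complement sum described in RFC 1071, which is the arithmetic at the heart of the UDP checksum. It is simplified: a real UDP checksum also covers a pseudo-header (source and destination IP addresses, protocol, and length), which this sketch leaves out. The function name internet_checksum() is hypothetical, and the sample bytes are the example data from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement checksum in the style of RFC 1071 (simplified).

    Omits the UDP pseudo-header; this is only the core arithmetic.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


packet = bytes.fromhex("0001f203f4f5f6f7")  # sample data from RFC 1071
checksum = internet_checksum(packet)

# The receiver recomputes over data + checksum; 0 means no damage detected.
intact = internet_checksum(packet + checksum.to_bytes(2, "big"))

mangled = bytearray(packet)
mangled[0] ^= 0x01  # flip one bit "in transit"
damaged = internet_checksum(bytes(mangled) + checksum.to_bytes(2, "big"))
```

Here checksum comes out to 0x220D, intact is 0, and damaged is non-zero, so the flipped bit is caught. The “chance” part matters, though: some corruptions slip through. Swapping two 16-bit words, for example, leaves the sum unchanged. And even when UDP detects damage, it simply discards the packet rather than repairing it.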
05:04 And because mangled packets are problematic in most applications, your second option is TCP. This is a fully error-checked protocol. If something goes wrong, it uses the IP layer to ask the sender to repeat itself.
05:17 It’s also aware of packet order and delivers your data stream in the order it was sent.
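In the socket module, the choice between these two options comes down to the socket type: SOCK_STREAM gives you TCP and SOCK_DGRAM gives you UDP. This loopback sketch sends the same three messages each way to show the practical difference. TCP hands you one ordered byte stream in which the boundaries between sends disappear, while UDP hands you each datagram separately. The function names are hypothetical.

```python
import socket


def tcp_roundtrip(messages: list[bytes]) -> bytes:
    """Send several messages over TCP on loopback and read the byte stream."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        client = socket.create_connection(server.getsockname())
        conn, _addr = server.accept()
        with client:
            for message in messages:
                client.sendall(message)
        # The client has closed, so the receiver reads until end-of-stream.
        with conn:
            chunks = []
            while data := conn.recv(1024):
                chunks.append(data)
    return b"".join(chunks)


def udp_roundtrip(messages: list[bytes]) -> list[bytes]:
    """Send several datagrams over UDP on loopback and read them back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            receiver.bind(("127.0.0.1", 0))
            receiver.settimeout(5)  # don't wait forever if one goes missing
            for message in messages:
                sender.sendto(message, receiver.getsockname())
            # Each recvfrom() call returns exactly one datagram.
            return [receiver.recvfrom(1024)[0] for _ in messages]
```

With [b"one", b"two", b"three"], tcp_roundtrip() returns the single stream b"onetwothree", in order. udp_roundtrip() returns three separate datagrams. On loopback they almost always arrive intact and in order, but across a real network UDP makes no such promise, which is exactly the gap TCP fills.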
05:24 TCP is still pretty low level though. The communication for sending a file is rightfully different from the communication for logging into a server, and as such, there are protocols that use TCP to do all those things you know and love on the internet.
05:37 This top layer is called the application layer. If you ask a Python library like requests to fetch a web page, this entire stack gets used. Like I said, protocols all the way down.
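requests is a third-party package, so this self-contained sketch swaps in the standard library’s http.server and urllib.request to exercise the whole stack on one machine. An application-layer HTTP server and client talk over TCP, over IP, on the loopback interface standing in for Ethernet. There’s no TLS here, and the handler class and page content are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """A hypothetical one-page web server used only for this demo."""

    def do_GET(self) -> None:
        body = b"<h1>Hello, layers!</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


# Application layer (HTTP) on top of TCP on top of IP, all on loopback.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks one
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
# An empty ProxyHandler makes sure no configured proxy intercepts the request.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://{host}:{port}/", timeout=5) as response:
    status = response.status
    page = response.read()

server.shutdown()
server.server_close()
```

After it runs, status is 200 and page holds the HTML the handler sent. Every layer from this lesson took part, even though the code only ever mentions the top one.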
05:50 In the next lesson, I’ll continue the networking overview and delve into IP addresses a little more.