Seeing the Network With Wireshark
00:00 In the previous lesson, I delved into IP addresses. In this lesson, I’ll show you some actual TCP/IP traffic. To peek into what actually happens over a socket, I’m going to write a short program that fetches the homepage from python.org.
00:14 I’ll be using the `urllib` module to connect to the server and get the page. To watch the actual network traffic, I’m going to be using a tool called Wireshark.
00:23 This is an open-source network monitoring tool, which allows you to see all the packets going in and out of your machine. As that’s a lot of packets, I’m going to filter the content when I capture it, and I’m going to do that using the IP address of python.org.
00:38 To find the IP address of python.org, I’m going to use the `nslookup` utility available on most operating systems. This takes the name of a server and returns the address or addresses associated with it.
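A similar lookup can be done from Python itself. The following is a minimal sketch that uses the standard library's `socket.getaddrinfo()`. The helper names `unique_addresses()` and `lookup()` are made up for illustration and are not part of the lesson's script.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses from getaddrinfo()-style tuples, in order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        address = sockaddr[0]  # sockaddr starts with the host address
        if address not in seen:
            seen.append(address)
    return seen


def lookup(hostname):
    """Resolve a hostname to its IPv4 addresses using the system resolver."""
    # AF_INET limits the answer to IPv4; SOCK_STREAM avoids one row per socket type.
    info = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return unique_addresses(info)
```

Calling `lookup("python.org")` on a machine with network access should return a list of address strings, though the exact answers depend on the resolver and can change over time.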
00:50 You might recall in an earlier lesson I mentioned that python.org has multiple addresses. To get around this, I’m going to use a network mask. This is a way of specifying a group of addresses.
01:02 It is typically denoted using an IP address, a slash, and then the number of bits in the mask. A mask of 16 bits fixes the first two blocks of the address, so it covers every address that differs only in the third and fourth blocks.
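To make the slash notation concrete, here is a small sketch using Python's standard `ipaddress` module. The network `151.101.0.0/16` and the sample addresses are illustrative assumptions, and `in_network()` is a hypothetical helper name.

```python
import ipaddress

# A /16 network: the first 16 bits (two blocks) are fixed, the rest can vary.
network = ipaddress.ip_network("151.101.0.0/16")


def in_network(address):
    """Return True if the address falls inside the /16 network above."""
    return ipaddress.ip_address(address) in network


print(network.num_addresses)         # 65536, which is 2 ** 16
print(in_network("151.101.64.223"))  # True: only blocks three and four differ
print(in_network("151.102.0.1"))     # False: the second block differs
```

With 16 bits left free, the mask covers 65,536 addresses, which is why a single /16 is enough to match every address that shares the same first two blocks.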
01:15 Here is my simple page downloading script. It uses the `urllib` module to fetch content. This is the URL of the content I want, which is the homepage of python.org.
01:27 Note the commented out URL below it. I’ll be talking about that later. You connect to a webpage using the `urllib.request` module’s `urlopen()` function.
01:38 You can use it as a context manager so that the connection is automatically closed once the `with` block is done. Within the block, the context variable is the connection, and here I’m calling the `read()` method to grab everything in the page.
01:52 And finally at the bottom, I merely print out some info showing the amount of content retrieved and then a preview of the first 50 characters of it. Let’s try this out.
02:06 Running the code shows that the page is just over 51K in size. Not surprisingly, it starts with an HTML header. My goal here is to run this exact same script while watching with Wireshark.
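The script itself isn't reproduced in this transcript, so the following is only a sketch of what a script matching that description might look like. The exact URL strings, the function names `download()` and `report()`, and the wording of the printed message are assumptions.

```python
import urllib.request

# Assumed URLs; the transcript only names the sites, not the exact strings.
URL = "https://www.python.org"
# URL = "http://httpforever.com"


def download(url):
    """Fetch the full body at url, closing the connection when done."""
    with urllib.request.urlopen(url) as connection:
        return connection.read()


def report(content):
    """Summarize the downloaded bytes: total size plus a 50-byte preview."""
    return f"Retrieved {len(content)} bytes\nStarts with: {content[:50]!r}"
```

Running `print(report(download(URL)))` on a machine with network access would print the size of the page followed by the start of its HTML. Note that `read()` returns bytes, so the preview here is the first 50 bytes rather than decoded characters.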
02:17 Before doing that though, I need a bit of information.
02:21 First, I’ll use `nslookup` to get python.org’s IP address.
02:26 The response here starts out by telling me where the lookup was done. The server in this case is actually my router, which is forwarding the request off to somewhere else.
02:35 Then there are four answers, all of which start with 151.101, which is why I mentioned net masks before. In Wireshark, I’ll use a filter so that only stuff in the 151.101 range gets captured.
02:49 Speaking of Wireshark, let me open it up.
02:53 When you first open Wireshark, you get a list of interfaces that you can monitor. It even has a little squiggly graph here to show you where there is traffic.
03:02 I want the Ethernet `en0` interface, and I want to set up a filter so that you aren’t overwhelmed by everything going on on my machine. I have, let’s call it, a few tabs open in my browsers.
03:16 Yep, that’s plural, browsers.
03:24 In addition to selecting an interface, I’m setting a capture filter. That limits what Wireshark will capture from the card. This filter says I only want TCP/IP traffic that is involved with the 151.101 blocks, and with that set up, I click the little shark fin icon at the top and it starts a capture.
03:42 There’s nothing to see here yet because there isn’t any traffic. Now off screen, I’ll run the download program
03:50 and look at all that content. I’m going to stop the capture. This is just a good habit. In theory, my filter should stop other stuff from showing up, but I’ve got everything I want, so why keep looking?
04:03 Let me just scroll up to the top here.
04:06 The _Protocol_ column here shows you what protocol the packet is using. Since the connection to python.org is encrypted, the protocol I’m interested in is TLS.
04:16 Note that the first time this protocol shows up is actually the fourth chunk. TCP needs to chatter a bit, in its three-way handshake, to establish a connection before you can move further up the stack to the application. The ‘Client Hello’ message here is the beginning of the TLS protocol, establishing a secure connection to python.org.
04:33 When I click on the content, you can see all the data that went over the wire. Buried in here is the name of the server, and at the bottom the fact that it’s going to use the HTTP protocol inside of the TLS protocol.
04:48 I could walk you through all of these packets, but since they’re encrypted, there isn’t much to see. The highlighted red item here is the close of the connection.
05:00 Note how you can actually see the ports involved in this connection. 56395 is the ephemeral port on my side, while 443 is the HTTPS port on python.org. Encryption is useful, but gets in the way of understanding, so let’s try an unencrypted connection.
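To see why those two port numbers look so different, here is a small sketch that classifies a port using the ranges defined by IANA. The function name `classify_port()` is hypothetical, and operating systems are free to pick their own ephemeral ranges, so treat the "dynamic" label as a guideline rather than a guarantee.

```python
def classify_port(port):
    """Classify a TCP/UDP port number using the IANA port ranges."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    if port <= 1023:
        return "system"   # well-known services such as HTTPS on 443
    if port <= 49151:
        return "user"     # registered ports
    return "dynamic"      # typically handed out as ephemeral ports


print(classify_port(443))    # system
print(classify_port(56395))  # dynamic
```

The server listens on a fixed, well-known port so clients can find it, while the client side only needs a port that is free at the moment, so the operating system picks one from its ephemeral pool.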
05:19 You might remember there was another URL in my download script. It was for a site called httpforever.com. It does not support HTTPS. This is actually handy if you happen to need a non-HTTPS connection.
05:33 Say, when a hotel’s Wi-Fi makes you click through a page before it lets you onto the actual network. Off screen I’m going to change the URL, then I’m going to click _Close Capture_,
05:51 and this time I’m going to need a different filter.
05:58 Now I will start the capture, run the download off screen, and stop the capture. The same kind of thing happens here as it did before. TCP still needs to squawk back and forth a bit, and this time, instead of TLS, you see the HTTP connection.
06:17 Since HTTP isn’t encrypted, there’s a lot more to look at here on the right-hand side. These are the headers that `urllib` sent to the server.
06:27 In fact, Wireshark even understands some protocols, so you can expand the part on the left and it’ll show all of this in a more readable form.
06:35 These are those same headers.
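Because an unencrypted HTTP request is plain text, turning it into a readable form takes very little code. The sketch below parses a request by hand. The `RAW_REQUEST` text is an assumed example of what `urllib` might send, not bytes copied from the capture, and `parse_request()` is a hypothetical helper.

```python
# Assumed example request; the real header values depend on your Python version.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Accept-Encoding: identity\r\n"
    "Host: httpforever.com\r\n"
    "User-Agent: Python-urllib/3.12\r\n"
    "Connection: close\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into method, path, version, and a header dict."""
    head, _, _body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers


method, path, version, headers = parse_request(RAW_REQUEST)
print(method, path, version)
for name, value in headers.items():
    print(f"{name}: {value}")
```

This is roughly the kind of work a protocol-aware tool does when it shows the request line and each header as separate fields, although a real dissector handles many more edge cases.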
06:39 Let’s try a little further down in the communication. This is the second chunk using the HTTP protocol and it has all of the actual content.
06:51 If I open up the text-based data,
06:56 this is the HTML, and all those `\t`s are tab characters. I guess I know which side of the tabs versus spaces argument the writer of HTTP Forever falls on.
07:08 Now that you’ve seen how all these pieces fit together, it’s time to use Python’s `socket` library to do some of this yourself.