Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Locked learning resources

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Dealing With Complications

Christopher Trudeau

Programming Sockets in Python Christopher Trudeau 05:41

Transcript
Discussion

00:00 In the previous lesson, I worked our application protocol into a server. In this lesson, I’ll talk about all the things I’ve glossed over that might come along to bite you in the umption.

00:10 In order to try and keep the code along the way simpler to read, I’ve made a bunch of assumptions. These could cause issues in the real world. For starters, I assumed that it would be Python 3 programs talking to other Python 3 programs, and as such, all the text was UTF-8.

00:26 The real world is more complicated than that. You can get around this in two ways. You could restrict all your headers to be ASCII so you don’t have to care about their encoding, or you could include any text content’s encoding information as an additional header.

00:41 I also assumed that my header block would never exceed 64 kilobytes as I used two bytes in my proto-header to store the header block’s size. You could get away with this, but you should have code in the serializer that screams if the header block gets too large.

00:56 The code is rather nonchalant about converting message content to bytes. If it is text, it assumes UTF-8, and if it’s binary, it just converts it. Not all Python objects can be serialized this way, so depending on what the caller passed in as message content, exceptions might occur.

01:13 You could detect this and try to handle it, or you could just let the exception surface and leave the coder to figure it out for themselves.

01:21 As I mentioned when I was showing you the code, a better implementation of Message would be capable of having its .content and .content_type attributes modified and still work.

01:30 You could do this by not caching the headers and recalculating them just before serialization. It means a bit of duplicated code, but it’s a bit more programmer-proof.

01:40 Throughout this course, I’ve been using IPv4. The world is sort of thinking about transitioning to IPv6, sort of. A few years back, everyone was desperate for more IP addresses, so there was a panic, but because IPv6 wasn’t getting adopted all that quickly, everyone resorted to private networks and proxies and the problem lessened.

02:02 IPv6 has a significantly larger address space than version 4, and to use it, you construct a socket with the AF_INET6 flag instead of AF_INET. Version 6 addresses can be up to eight groups of 16-bit hex digits. I say up to as a lot of them end in mostly zeros, and the protocol allows you to drop the zeros off the end.

02:24 This gives plenty of room for expansion. In addition to using one of these longer addresses, you’ll also have to add some additional flags to your bind() and connect() steps as well.

02:33 I’ll leave the documentation delving to you if you plan on going down this path. As a quick aside, in case you were curious, there was an IP version 5. It wasn’t meant as a replacement to 4, just to add features.

02:45 It was designed to do multicasting and broadcasting with the thought that IP could be used for media transmission. It didn’t take off. In fact, I don’t think it ever fully made it out of committee.

02:55 Two things happened. IPv4 became the de facto standard pretty much everywhere and speeds increased so much that multicasting became less important. And two, IPv6 came along, which is more compatible with 4 and has all those same features of 5 built in.

03:11 Speaking of IP addresses, throughout the course, I’ve been using those instead of hostnames.

03:16 You can open a socket using a hostname instead, but that means the name must be resolved into an IP address. That resolution is done by the operating system and it could be based on local configuration or a separate call to a DNS server.

03:29 Yep. Opening a socket might require the system to open a socket for you so you can open your socket. This can get a little messy and the Python documentation warns that behavior may not be deterministic.

03:40 Because of the way the OS does this as well as caching and other things, you might not get the same address if you do another lookup to the same place a second time.

03:50 Part of this complication comes from the fact that a hostname can resolve into multiple IP addresses. You saw this back in the beginning with python.org.

03:58 Python, the language, just uses whichever address is first. The DNS server doesn’t necessarily guarantee the same address order in each response. In fact, it may intentionally shuffle things around to share out the load.

04:12 And finally, I also made an assumption about the way the machine I was running on stored information. It turns out that different computer architectures store bytes in different ways, and this can cause complications.

04:23 Consider the two hex bytes F03B. These get stored as 16 bits. A big-endian machine stores the most significant byte in these two bytes first. The storage in memory looks the same way you’d read the value, but there are also little-endian machines out there that store the bytes the other way around.

04:43 In this case, the least significant bytes get stored first.

04:47 Although this may seem counterintuitive from a reading standpoint, a lot of computers use stacks to do calculations and storing things the other way around makes it easier to grab the content off the stack.

04:57 Because the world is messy, most network protocols actually use big-endian for transmission, but the ubiquitous PC is a little-endian architecture. To deal with this problem, you need more information in your proto-header.

05:10 In fact, if you’re familiar with a byte order mark used to indicate that a stream is Unicode, one of its purposes is to indicate the endianness of the stream.

05:19 If you think back to the pack() call that I used to write, the proto-header, its indicator was >H. The greater than in that argument actually specified which endianness to use.

05:33 One more lesson to go before wrapping up. Next, I’ll give you a quick tour of some of the tools out there that can help you out.

Become a Member to join the conversation.