Byte Order and Bit Packing
00:00 In the previous lesson, I showed you some examples of bitwise operators in practice. In this lesson, I’m going to talk about the two different ways that bytes are strung together to make larger numbers.
Before digging into endianness, let’s take a look at ways of getting at binary information for different types of data. In a previous lesson, I showed you how to use a bitmask as well as the struct module. Well, masking only works for integers, so let’s dig into struct some more.
Let me import the math module so I can get at a rather famous float. Who doesn’t want some pie? No, I’m a cake man, myself. Anyway, as I mentioned, masking only works with integers. In case you were curious, floats aren’t integers.
If a byte is within the printable ASCII range, Python shows it as the equivalent ASCII character. That first character is the at symbol (@) because the first byte grabbed from pi turns into the number 64, which is ASCII for @. Some of the other characters here are also printable ASCII, and some of them are hex escape sequences.
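As a rough sketch of the packing step described above: the exact template isn’t shown here, so I’m assuming ">d", meaning a big-endian double-precision float. The variable name packed is just for illustration.

```python
import math
import struct

# ">d" is an assumed template: ">" for big-endian, "d" for an 8-byte double.
packed = struct.pack(">d", math.pi)

# Printable ASCII bytes show up as characters, the rest as escape sequences.
print(packed)        # b'@\t!\xfbTD-\x18'

# Indexing a bytes object gives the integer value of that byte.
print(packed[0])     # 64, the ASCII code for "@"

# The same eight bytes as plain integers.
print(list(packed))  # [64, 9, 33, 251, 84, 68, 45, 24]
```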
01:53 What if you want to see the bits? Then you need to loop over each part of that list and change it into bits. I’m going to use a list comprehension with an eight-bit f-string format to do just that.
This loops over every byte inside of the list and prints it out using the f-string. The result is the floating-point approximation of pi in binary. Well, technically it’s a string with the binary digits inside of it, but you get my meaning.
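Here’s a minimal sketch of that comprehension, again assuming the bytes came from a ">d" template. The names bit_groups and bits are my own placeholders.

```python
import math
import struct

# Assumed template: big-endian double.
packed = struct.pack(">d", math.pi)

# Format each byte as eight binary digits, padded with leading zeros.
bit_groups = [f"{byte:08b}" for byte in packed]
print(bit_groups[0])  # 01000000, which is 64, the "@" byte

# Join the groups into a single 64-character bit string.
bits = "".join(bit_groups)
print(bits)
print(len(bits))      # 64
```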
pack() goes forward. unpack() is expecting a series of bytes. The bytes() function provides this. I’m passing in a comprehension that converts eight bits of that bit string at a time into an integer, essentially reconstructing the output of pack() from just a moment ago.
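One way that round trip might look, as a sketch under the same ">d" assumption. The names bits, raw, and result are illustrative.

```python
import math
import struct

# Rebuild the bit string so this example stands on its own.
bits = "".join(f"{byte:08b}" for byte in struct.pack(">d", math.pi))

# Slice the string eight characters at a time and parse each slice as base 2.
raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# unpack() always returns a tuple, even when the template has one item.
result = struct.unpack(">d", raw)
print(result)  # (3.141592653589793,)
```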
And the end result is a tuple with pi inside. Why a tuple? Well, unpack() can return multiple chunks depending on the template. I’ll do the same thing, but with a different template to demonstrate this.
03:50 That first number contains the float’s sign bit, the eleven exponent bits, and four bits of the mantissa. The remaining three numbers are sixteen bits each of the remainder of the mantissa, for a total of sixty-four bits in a double-precision float.
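To make that breakdown concrete, here’s a sketch that assumes the template was something like ">4H", meaning four big-endian unsigned 16-bit integers. That template is my guess based on the description, and the shifts and masks below are just one way to pull the first number apart.

```python
import math
import struct

raw = struct.pack(">d", math.pi)

# ">4H" (assumed) reads the same eight bytes as four unsigned shorts.
chunks = struct.unpack(">4H", raw)
print(chunks)  # (16393, 8699, 21572, 11544)

first = chunks[0]
sign = first >> 15               # top bit of the first 16 bits
exponent = (first >> 4) & 0x7FF  # next eleven bits
mantissa_top = first & 0xF       # last four bits start the mantissa

print(sign)          # 0, so the number is positive
print(exponent)      # 1024
print(mantissa_top)  # 9
```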
Each of the pack and unpack templates that I’ve used so far has had a greater than symbol (>) in it. That symbol has to do with byte order, also known as endianness. That’s what I’ll talk about next.
04:32 Big-endian is left to right, meaning the most significant byte is at the lowest address in memory. This format is common in mainframes and some of the POWER and ARM family of processors. On the other hand, little-endian is the opposite.
04:49 It’s right-to-left. The least significant byte is at the lowest address in memory. The x86 architecture family is little-endian. An example might shine some light on the differences between these two.
05:05 The Python language isn’t named after a snake, but after Monty Python, the comedy troupe. Monty Python’s first show aired in 1969. Seems like as good a value as any to show off endianness. I’ve broken 1969 into a four-byte integer here.
05:37 The big-endian case puts the bytes in Western reading order. The first address has the leftmost byte. By contrast, little-endian stores the least significant byte in the lowest address, acting like a stack.
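If you’d like to see those two layouts for yourself, here’s a small sketch using int.to_bytes(). The variable names are just for illustration, and the separator argument to hex() assumes Python 3.8 or later.

```python
year = 1969

# Most significant byte first, at the lowest address.
big = year.to_bytes(4, byteorder="big")

# Least significant byte first, at the lowest address.
little = year.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 00 00 07 b1
print(little.hex(" "))  # b1 07 00 00

# Same four bytes, opposite order.
print(big == little[::-1])  # True
```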
05:53 This is the same four bytes of content but stored in two different fashions. The terms big- and little-endian are a reference to the book Gulliver’s Travels, where Gulliver meets two warring factions whose primary argument was over which end of a boiled egg to crack first: the big end or the little end.
06:13 Depending on who you ask, this was biting satire about religious wars or a child’s book filled with nonsense. So, why would you prefer one of these over the other? Well, neither of them really has an advantage.
06:40 Little-endian makes certain kinds of math operations a tiny bit more efficient in the CPU. Big-endian, on the other hand, has the advantage that the sign of a number is easy to find. The most significant byte is always the first one, and of course, the most significant bit in that byte is the sign bit of the number. This difference isn’t a big deal when you’re dealing with a modern high-level language.
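As a quick illustration of the sign-bit point, here’s a sketch with two hypothetical helper functions. They assume the bytes hold a signed two’s complement integer, which is what the struct "i" format produces.

```python
import struct


def is_negative_big_endian(raw: bytes) -> bool:
    # Big-endian: the sign bit is the top bit of the first byte.
    return bool(raw[0] & 0x80)


def is_negative_little_endian(raw: bytes) -> bool:
    # Little-endian: the sign bit is the top bit of the last byte.
    return bool(raw[-1] & 0x80)


print(is_negative_big_endian(struct.pack(">i", -1969)))     # True
print(is_negative_big_endian(struct.pack(">i", 1969)))      # False
print(is_negative_little_endian(struct.pack("<i", -1969)))  # True
```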
07:04 It’s all abstracted away. In the early days of the internet, it was decided that byte order on networks was big-endian. I suspect this is because most of the networking stuff was done in the early days at universities, most of which had mainframes, which were big-endian.
08:41 And this is why you have to be careful with endianness. If you assume the wrong one and reconstruct, you’ll get the wrong value. Building from the raw bytes with little-endian order gives you something that is definitely not 1969 when the value was sent as two bytes using big-endian order.
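Here’s a sketch of that mistake using int.to_bytes() and int.from_bytes(). The names sent, right, and wrong are illustrative.

```python
# The sender writes 1969 as two bytes in big-endian order.
sent = (1969).to_bytes(2, byteorder="big")
print(sent.hex(" "))  # 07 b1

# Reading with the matching byte order recovers the value.
right = int.from_bytes(sent, byteorder="big")
print(right)  # 1969

# Reading with the wrong byte order swaps the bytes: 0xb107 instead of 0x07b1.
wrong = int.from_bytes(sent, byteorder="little")
print(wrong)  # 45319
```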
htons() (host to network short). It converts the host’s format, whatever that is, to the network’s format for a short int. This is htonl() (host to network long). Same thing but converting to a long int.
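Python’s socket module has functions with these same names, so here’s a sketch of how they might be used. The printed values depend on your machine’s byte order, so I’m comparing against struct’s network-order "!" template instead of showing fixed numbers.

```python
import socket
import struct
import sys

value = 1969
print(sys.byteorder)  # "little" on x86, for example

# Convert from the host's byte order to network (big-endian) order.
net_short = socket.htons(value)  # 16-bit version
net_long = socket.htonl(value)   # 32-bit version
print(net_short, net_long)

# Laid out in native order ("="), the converted values match what
# struct produces directly with the network-order template ("!").
print(struct.pack("=H", net_short) == struct.pack("!H", value))  # True
print(struct.pack("=L", net_long) == struct.pack("!L", value))   # True

# ntohs() goes the other way, from network order back to host order.
print(socket.ntohs(net_short))  # 1969
```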
10:04 JPEG is even trickier, as it supports both. Tired yet? Next up, the penultimate lesson: overloading bitwise operators for fun and profit. Hmm, sorry. My producer’s telling me that profit thing is optional.