Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Define the File Header and Body

00:00 Define the File Header and Body. Remember that the header follows the magic number, which uniquely identifies your file format. To avoid hard-coding literal numeric values, it makes sense to keep your magic number in a python constant, which you can refer to by name.

00:20 You can define the magic number as a read-only bytes object in Python by prepending the letter b to a string literal.

00:31 Even though a bytes literal looks similar to a Python string, it’s actually a disguised sequence of integers in the range of 0 to 255. The letters get internally turned into their ASCII codes.

00:46 Because you can only use ASCII characters in such a literal, it may sometimes be more convenient to encode them with hexdecimal notation as an alternative.

00:58 In general, dealing with bytes instead of characters is preferable when you work with binary data. The file header is perhaps best modeled as a Python data class, letting you declare the fields and their types in the expected order.

01:36 This definition closely follows the table that you saw earlier on in the course, which is shown now as a reminder. These are three mandatory fields in the header, all represented as integers in Python.

01:50 A file header object should be capable of writing its own content into a supplied binary file. You can use the standard library’s struct module to tell Python how to serialize the individual fields as C data types.

02:25 You start by dumping the bytes of your magic number and then proceed to write the remaining fields using types encoded with a special format string. The uppercase letter B stands for unsigned byte, which agrees with your expected version field.

02:43 The less than symbol (<) indicates a little-endian byte order. The number that follows communicates how many consecutive values of the same type you’re going to provide. Finally, the uppercase letter I denotes a 32-bit unsigned integer type.

02:59 So, the string means two unsigned integers, one after another, in little-endian order. It makes sense to group as many fields together as possible to limit the number of expensive system calls that write a block of data to the file.

03:12 Reading back the header from a file is fairly similar to writing it, just in reverse order. At the time of reading the header from a file, no class instance exists yet, so you must define a class method that can act on the class rather than an object.

03:30 These lines read the number of bytes equal to the length of your magic number and compare the result with a constant. Any discrepancy will stop further processing by raising an exception to indicate an unknown file format.

03:47 Here you read a single unsigned byte and assign the result to a local variable with the file format version. Note that there’s a comma right after the variable declaration, which is mandatory. Because struct.unpack always returns a tuple, even if it only has one value, the comma helps unpack the first element from that tuple.

04:14 This reads eight bytes into two unsigned integers in the little-endian byte order. The resulting tuple is then unpacked into a matching number of local variables with the assignment operator. Finally, the method returns a new instance of the FileHeader class initialized with the local variables.

04:35 Note that your file header is currently version-agnostic, so it should work with multiple file format versions. Sometimes you’ll want to read only the metadata from the header, despite not being able to process the file’s body due to an unsupported format.

04:54 When you finish reading the header, you’ll know how many squares there are and how they’re arranged in rows and columns. This will allow you to interpret the remaining part of the file correctly, which you’ll do next.

05:08 The file body is less exciting than the header because it’s merely a stream of small integers representing border patterns and the square roles in the maze.

05:17 It might be your first instinct to put them in a Python list or tuple, but the array module in the standard library provides a much better-suited data type for efficiently storing such homogenous numeric sequences.

05:31 Lists and tuples are heterogeneous, meaning you can stuff elements of all kinds into them, but the price of that versatility is they take up more memory and are slower to access than an array. Because you’ll be storing plain numbers instead of Python objects in the file on disk, you can take advantage of a faster and more suitable array data structure.

05:52 So that’s imported first,

06:01 and then the new FileBody class is added at the bottom of the file.

06:16 When passing an array instance to the file body object, you’ll need to ensure that the array was created with the right type code that identifies the underlying C type.

06:38 Here, you use the uppercase letter B to mean an array of unsigned bytes. You’ll do that right before reading the file body. To read the body, you must have read the header before so that the internal file pointer is at the right offset. Besides that, the header object is one of the parameters expected by this class method.

06:59 You use the information stored in the file header to calculate the number of remaining bytes to read by multiplying the width and height of the maze. This data is then converted to a bite array and passed to the FileBody instance.

07:23 Writing the file body becomes more straightforward when you call the array’s .tobytes() method, which takes care of serializing the items to the correct data type in the requested byte order. However, the byte order becomes irrelevant when your array contains only single-byte values. Later, you’ll use the numbers stored in your array to create Border and Role instances just like you did before.

07:50 So now you can individually write and read the file header and body, which are low-level details of your custom file format, but you still need to serialize and deserialize the higher-level maze object to allow it to be stored in the file.

08:05 And that’s what you’ll be covering in the next part of the course, starting with serializing the maze.

Become a Member to join the conversation.