Load the Maze From a Binary File

Mazes in Python: Build, Visualize, Store, and Solve Darren Jones 04:55

Transcript
Discussion

00:00 Load the Maze From a Binary File. After completing part one of this course, you can build and draw the maze. However, creating it by hand in Python takes a lot of error-prone and tedious effort.

00:13 By the end of this step, you’ll have a custom file format for the persistent storage of mazes, which you’ll be able to save and load at the push of a button. Besides that, you’ll also be able to load the sample mazes supplied with the course materials.

00:29 Squares in the maze are very much like pixels in an image. One of the sample mazes included in the supporting materials has several hundred of them, but you could very easily be looking at thousands of squares in some larger mazes. If you were to represent the squares as text, then they would quickly amount to a huge file size.

00:48 Given that you already devised a compact representation for border patterns that leverages a bit field, it makes perfect sense to define a binary file format instead. When defining a binary file format, the first thing you need to decide on is whether you’ll store values that can fit in a single byte or more.

01:07 The reason is that different computer architectures have different endianness, which is a fancy way of saying that they read a sequence of bytes in the opposite order. The two standards for byte order are called big-endian and little-endian.

01:23 In big-endian, the most significant byte comes first, while in little-endian, the least significant byte comes first.

01:33 If you wrote a numeric value using one standard, but read it with another, then you’ll get a garbled result, just as if you applied the wrong character encoding.

01:44 You are going to need more than one byte to store values like the width or height of the maze, which can be far greater than the maximum byte value of 255.

01:53 Because many popular file formats stick with a little-endian standard, so will you. Binary files often begin with a fixed sequence of bytes to uniquely identify their particular format. For example, Java’s class files always start with the four bytes seen on-screen, which is one of the geeky references to coffee in the language’s vast ecosystem.

02:16 Such an initial value is known as the magic number, although it can sometimes be textual.

02:24 Tools like the Unix file command leverage known magic numbers to detect file types. In this case, the tool correctly recognized an SVG graphics file

02:39 and the contents of a TOML file as ASCII text.

02:46 You can choose any magic number for your custom maze file format, but in this course you’ll use the four ASCII characters MAZE in uppercase.

02:54 The hexadecimal representation seen on screen looks less coffee-like, but it’s just as unique and recognizable. The magic number is usually followed by a file header, which provides useful metadata about the actual data in the file.

03:09 The header has a fixed structure consisting of fields, such as the width of the maze. However, the header may have a variable length—for example, due to enemy positions kept in the lookup table or other optional fields.

03:25 If you foresee introducing breaking changes to your file format in the future—for example, due to new features—then you should consider including a version field either at the beginning or before the header.

03:36 This should enable you to handle slight differences across file format versions. Each header field must have a known type, size, and offset measured in bytes since the beginning of the file, so you know where the next field begins.

03:52 The maze header file will have the three fields seen on-screen. The first field is a single byte with the file format version, which you may increment by one with each new release.

04:03 Notice that the version field starts with an offset of four bytes because the magic number occupies the first four bytes in the file. The other two fields are unsigned integers taking up four bytes each, which means they can have values between zero and over four billion, which should be enough for anybody.

04:22 Once you get through the header and have complete information about the amount and layout of the data, you may start reading the file body, which is the main part of the file. It’s where you’ll find most of your data, and in this case, the body will be a very long one-dimensional array of numbers representing the various border patterns around the squares and their roles.

04:44 Now that you’ve seen how the maze data can be represented in a binary file, in the next section you’ll look into packing the data of the file body using as few bytes as possible.

Become a Member to join the conversation.