This Git repository contains several pieces of code that read and write to MAZ files, associated with the .maz file extension. This document aims to give motivation for the MAZ file format, describe its structure, and provide a starting point to implementers of MAZ serialization and deserialization code.

Background

Maze generation is a favourite pastime of many programmers. The author of this document, one Katy Petrova, has cultivated a particular fascination with maze-generating, -analyzing, and -solving programs. Though many styles of mazes can be generated algorithmically, let us consider two-dimensional, “rectangular” mazes.

A rectangular maze can be drawn on graph paper. It has a width and a height, call them w and h. There are w * h cells in a rectangular maze. Each cell has 4 bits of data associated with it: the N, E, S, and W bits. The bits correspond to the cardinal directions: North, East, South, and West. ^[2‍]

Each bit with a value of 1 represents a path between a cell and its neighbour. Accordingly, a bit with value 0 represents a wall. In this document, when referring to a cell with value 0b0110, we mean a cell with the E and S bits set to 1, and the N and W bits set to 0. The order of the bits, NESW is arbitrary.

[2‍] Or the U, R,D, and L bits, if you aren't cartographically inclined.

Equivalently, a more mathematically-inclined reader may consider a graph where each vertex has at least 2 and at most 4 connections as the “stone” into which a maze is carved – a square grid graph.

So, a maze carved into the stone is just a subset of the graph's edges. I mean, most subsets are probably pretty bad mazes, but you can call them mazes, and, more importantly, represent them in MAZ format.

Then, a “perfect” maze is a spanning tree of such a graph, where there is a path from each vertex to each other vertex. All maze-generating algorithms that produce perfect mazes, really produce spanning trees. Ain’t that neat.

A Simple Approach

A generally good strategy for storing maze data as it is being generated (though not the only viable strategy, especially for certain algorithms with reduced memory requirements,) is to use a simple two-dimensional array of bytes.

Then, to carve a path between a cell at array index current in the two-dimensional array m, and its southern neighbour:

For programmer convenience, the direction bits NESW are often put into integer constants:

The above code is a C implementation of the “carve south” procedure described in pseudocode a couple of paragraphs ago. It uses pointers, but you could write a version that works with indices instead.

Likewise, the bitwise AND operator can be used to check whether a given cell has an opening in a given direction:

Note that earlier we said that we would store the maze in an array of bytes as it was being generated, and indeed, the above code uses the char datatype (8 bits on most machines) to refer to the values of cells in the maze. But we only make use of the lower 4 bits of this char, because we'll never write a value greater than 0b00001111 into the maze!

Motivation

In short: I wanted to create my own bitmap file format, specifically for storing maze data.

Like many bitmap formats, it should have a few header bytes that clarify what dimensions the maze should take. At its most basic, it's more or less equivalent to a bit-depth 4 uncompressed bitmap.

It also supports including auxiliary data, for example: Comments, annotations with associated bounding boxes, information about the generation method and the author's name, as well as labelled blobs of data containing information about each cell. (Colours, IDs, etc.)

Maze data can also be represented as a sort of LOGO-style program:

####################################
#---------------------------------+#
##################################|#
##################################|#
#+--------------------------------+#
#|##################################
#|##################################
#+--------------------------------o#
####################################

TO jshape
fd 12
rt 90
fd 1
rt 90
END
TO lshape
fd 12
lt 90
fd 1
lt 90
END
rt 90
jshape
lshape
jshape
lshape

I suspect that for some mazes, naïvely compressing the bitmap representation will be a lot less effective than compressing the procedural-program-style version. (Ones with lots of regular squiggles and straight lines?) So, another motivation is to test this theory, and if it bears fruit, support more compressable mazes!

As already discussed, while generating mazes, it's convenient to not care too much about memory usage – it's okay to use a whole byte when a nibble would do. But if we want to store the generated maze for later display/analysis/critique/etc., it seems that emitting a whole byte for each cell is very wasteful – a 50x50 maze could be represented in 1,250 bytes, but just dumping it out of memory would produce 2,500.

At time of writing, (this document is being written concurrently with the implementation of the first version of serialization and deserialization code for the MAZ format,) the author has written 5 implementations of one “Recursive Backtracker” algorithm for maze generation, across 3 different languages.

I also expect to write more maze-related utilities, for example structure analyzers and solvers, that don't generate their own mazes but must instead read them from an input file – so I need a file format, so why not make my own.

MAZ Version 1 Spec

File Header

This spells "\xe4\xe5\maze<3".

This is meant to be filled with more useful stuff further down the line – version and extension information.

The next 8 bytes are the width and height fields – 32-bit unsigned integers.

Packed Data

The next chunk contains the actual maze data. It begins with a single byte defining the packing method – at present the only defined method is Simple Raw Packing. In the future the LOGO-style method mentioned above, and compression techniques like Run Length Encoding may be supported. A value of ff indicates that this isn't a packed data chunk - this is a reserved value for extensions to the format.

The following ceil((width * height) / 2) bytes are the packed representation of the maze data. When reconstituting the packed data into a byte-array, a deserializer would consume a byte, write the high nibble to the array, move to the next location in the array, then, write the low nibble into the array.

In a maze with an odd number of cells, the last nibble in the packed data section should be set to 0000.

Reference Implementation

Special Thanks

Maddie, Stefanie, Eve, Zil, Kaylee – you know who you are, thanks for being my cheerleaders.

The MAZ File Format