This Git repository contains several pieces of code that read and write to MAZ
files, associated
with the .maz
file extension. This document aims to give motivation for the MAZ
file
format, describe its structure, and provide a starting point to implementers of MAZ
serialization
and deserialization code.
Maze generation is a favourite pastime of many programmers. The author of this document, one Katy Petrova, has cultivated a particular fascination with maze-generating, -analyzing, and -solving programs. Though many styles of mazes can be generated algorithmically, let us consider two-dimensional, “rectangular” mazes.
A rectangular maze can be drawn on graph paper. It has a width and a height, call them w
and
h
. There are w * h
cells in a rectangular maze. Each cell has 4 bits of data
associated with it: the N
, E
, S
, and W
bits. The bits
correspond to the cardinal directions: North, East, South, and West.
[2]
Each bit with a value of 1
represents a path between a cell and its neighbour. Accordingly, a bit
with value 0
represents a wall. In this document, when referring to a cell with value
0b0110
, we mean a cell with the E
and S
bits set to 1
, and
the N
and W
bits set to 0
. The order of the bits, NESW
is
arbitrary.
[2]
Or the U
, R
,D
, and L
bits, if you aren't
cartographically inclined.
Equivalently, a more mathematically-inclined reader may consider a graph where each vertex has at least 2 and at most 4 connections as the “stone” into which a maze is carved – a square grid graph.
So, a maze carved into the stone is just a subset of the graph's edges. I
mean, most subsets are probably pretty bad mazes, but you can call them
mazes, and, more importantly, represent them in MAZ
format.
Then, a “perfect” maze is a spanning tree of such a graph, where there is a path from each vertex to each other vertex. All maze-generating algorithms that produce perfect mazes, really produce spanning trees. Ain’t that neat.
Left: A maze with w=3, h=3
. All cell values are 0b0000
.
Each cell is displayed with 9 characters – A more compact ASCII display
is possible, but will not reproduce all of the detail in an imperfect
maze, because it assumes the neighbours' bits are consistent with each
other.
Right: The same maze, but all cell values are 0b1111
.
This is an invalid maze for most purposes, since there are connections
leading out of bounds.
######### # ## ## # ######### ######### # ## ## # ######### ######### # ## ## # #########
# ## ## # # ## ## # # ## ## # # ## ## # # ## ## # # ## ## #
A generally good strategy for storing maze data as it is being generated (though not the only viable strategy, especially for certain algorithms with reduced memory requirements,) is to use a simple two-dimensional array of bytes.
Then, to carve a path between a cell at array index current
in the two-dimensional array
m
, and its southern neighbour:
S
bit on m[current]
.N
bit on m[w + current]
.For programmer convenience, the direction bits NESW
are often put into integer constants:
const char N = 8 // 0b1000
const char E = 4 // 0b0100
const char S = 2 // 0b0010
const char W = 1 // 0b0001
And are composed with bitwise operators:
void carve_south(char *cell, struct maze m) {
*cell |= S;
*(cell + m.w) |= N;
}
The above code is a C implementation of the “carve south” procedure described in pseudocode a couple of paragraphs ago. It uses pointers, but you could write a version that works with indices instead.
Likewise, the bitwise AND operator can be used to check whether a given cell has an opening in a given direction:
bool goes_south(char *cell) {
return (*cell & S) != 0;
}
Note that earlier we said that we would store the maze in an array of
bytes as it was being generated, and indeed, the above code uses the char
datatype (8 bits on most machines) to refer to the values of cells in
the maze. But we only make use of the lower 4 bits of this char
, because we'll never write a value
greater than 0b00001111
into the maze!
In short: I wanted to create my own bitmap file format, specifically for storing maze data.
Like many bitmap formats, it should have a few header bytes that clarify what dimensions the maze should take. At its most basic, it's more or less equivalent to a bit-depth 4 uncompressed bitmap.
It also supports including auxiliary data, for example: Comments, annotations with associated bounding boxes, information about the generation method and the author's name, as well as labelled blobs of data containing information about each cell. (Colours, IDs, etc.)
#################################### #---------------------------------+# ##################################|# ##################################|# #+--------------------------------+# #|################################## #|################################## #+--------------------------------o# ####################################
TO jshape fd 12 rt 90 fd 1 rt 90 END TO lshape fd 12 lt 90 fd 1 lt 90 END rt 90 jshape lshape jshape lshape
I suspect that for some mazes, naïvely compressing the bitmap representation will be a lot less effective than compressing the procedural-program-style version. (Ones with lots of regular squiggles and straight lines?) So, another motivation is to test this theory, and if it bears fruit, support more compressable mazes!
As already discussed, while generating mazes, it's convenient to not care too much about memory usage – it's okay to use a whole byte when a nibble would do. But if we want to store the generated maze for later display/analysis/critique/etc., it seems that emitting a whole byte for each cell is very wasteful – a 50x50 maze could be represented in 1,250 bytes, but just dumping it out of memory would produce 2,500.
At time of writing, (this document is being written concurrently with
the implementation of the first version of serialization and
deserialization code for the MAZ
format,) the author has written 5 implementations of one “Recursive
Backtracker” algorithm for maze generation, across 3 different languages.
I also expect to write more maze-related utilities, for example structure analyzers and solvers, that don't generate their own mazes but must instead read them from an input file – so I need a file format, so why not make my own.
All MAZ
-format files start with some header bytes:
e4 e5 6d 61 7a 65 3c 33
This spells "\xe4\xe5\maze<3"
.
The header contains intentionally-blank padding of 16 bytes:
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
This is meant to be filled with more useful stuff further down the line – version and extension information.
The next 8 bytes are the width
and height
fields – 32-bit unsigned integers.
00 00 00 05 00 00 00 05
The next chunk contains the actual maze data. It begins with a single
byte defining the packing method – at present the only defined method
is Simple Raw Packing. In the future the LOGO-style method mentioned
above, and compression techniques like Run Length Encoding may be
supported. A value of ff
indicates that this isn't a packed data chunk - this is a reserved value
for extensions to the format.
+======+====================+
| Byte | Packing Method |
+======+====================+
| 00 | Simple Raw Packing |
+------+--------------------+
| ff | (Reserved) |
+------+--------------------+
The following ceil((width * height) / 2)
bytes are the packed representation of the maze data. When
reconstituting the packed data into a byte-array, a deserializer would
consume a byte, write the high nibble to the array, move to the next
location in the array, then, write the low nibble into the array.
In a maze with an odd number of cells, the last nibble in the packed data section should be set to
0000
.
See export.c
and
import.c
in this repository.
Jamis Buck, for writing the Internet's best maze-related content.
Maddie, Stefanie, Eve, Zil, Kaylee – you know who you are, thanks for being my cheerleaders.
:)