Shortly after we ported SHENZHEN SOLITAIRE to MS-DOS, I started working on a version of HACK*MATCH for the Sega Saturn. (I have one and love it! It’s a fun console with a surprisingly solid and acquirable library of games.) Even though I found a game engine that would have made it easy and convinced Keith to help, we ended up bailing on the project because no one would be able to play it; you’d either need a soft-modded Sega Saturn (so, no one?) or an emulator (none of which seemed to support CD audio).
I never owned an NES but I’d seen a little about the architecture and found it intriguing. (The TEC Redshift in EXAPUNKS is kind of inspired by it, although I guess most game consoles from that era are the same, with sprites and tiles and square waves and stuff.) And, considering that NES emulators are a completely solved problem, and that you can even purchase cheap, flashable NES cartridges that work in unmodded consoles, I figured we could make an NES port and people might pay attention.
And then we spent two years working on it. Ugh!
This wasn't exactly a money-making project, but we did run an IndieGoGo campaign to make and sell physical cartridges, which came with a ROM file and the source code for the game and its compiler. If you're looking for a physical cartridge you can buy one from the Zachtronics pop-up store. If you're looking for the ROM file and source code you can buy it from our itch.io page. If you just want to see some neat pictures and code snippets, keep scrolling.
The best (or worst?) part of developing a game for the NES is how limited the graphics hardware is. This is not an NES development tutorial, so I’m not going to explain it in depth, or even accurately, but you should know that:
Making a block-based action puzzle like HACK*MATCH on the NES is almost certainly going to require drawing all the blocks as the background (rather than having them be sprites). The blocks in HACK*MATCH are large, so it was easy to make them 16x16 and give each block its own color palette. The hardest part was figuring out how to have 5 different color blocks with only 4 palettes. We ended up making both the red and purple blocks use the same pink highlight color, which allowed us to store some grays in the other palettes to draw the UI frame (palettes 2 and 3) and lock blocks (palette 1).
For small graphics that needed different colors or did not align neatly to the background grid we were able to use sprites, so long as we didn’t run over the per-scanline (8) or per-screen (64) limits. The main uses of this ended up being the players' EXA cursors (so that we could use more grays and have them “hold” a block drawn behind them in the background) and most of the text (so that we could perfectly align it and use brighter colors).
We use our own game engine for all of our real games, and one of its features is that it automatically pulls all of our content (textures, sounds, music, etc.) into the code as constants so that they’re easy to reference. I ended up writing something similar for HACK*MATCH, so that we could create a bunch of PNGs and have them be automatically packed into the two tile banks of our 8K CHR-ROM.
Most of the images are 2-bit grayscale, so that we can apply any palette to them at runtime, but the tool also supports fullscreen background images that include color information and are converted into tilemaps that include information about which palette should be used for each 2x2 block of tiles. There are lots of neat sub-problems involved, like de-duplicating tiles (there isn’t enough memory to have every on-screen tile be unique) and choosing a palette that includes all the colors used in the tile.
I knew that making a game for the NES would involve a lot of optimization, but I didn't expect the most oppressive bottleneck to be the limit on how many background tiles we could change per frame. Writing to the background tilemap memory takes time, and you can only do it during a small window when the screen isn't actively being drawn, which means that you need to buffer your writes and quickly blast them out when the time comes.
This also means that large background updates (like, say, every time the block grid advances in HACK*MATCH) need to be spread out over multiple frames. In addition to being difficult to track with limited hardware, these updates need to be optimized and sequenced so that the game continues to feel responsive. It was a challenge to write rendering code that could efficiently handle either a large chunk of the screen or a bunch of small slices.
Everyone knows that you can use color palettes to do cheap fullscreen fades, so I did, and it worked! When we switch between game modes we also need to switch palettes, so instead of setting them directly we fade the palettes to and from black using a busy-loop to delay between steps. The NES system palette is arranged such that you can repeatedly subtract 0x10 from a color to make it darker.
I ended up doing something similar on the pause screen to remove all the color from the blocks as a sort of vague anti-cheat mechanism, which seems to be common in the action puzzle genre. I had originally tried to hide all of the block symbols when pausing, but updating all those tiles took too many frames and felt bad.
There are tools especially made for creating music for NES games, but it seemed cruel to make Matthew learn and use one for a single project. He ended up creating a simplified version of the game’s music using his usual tools, exported it as a MIDI file, and then made some instruments in FamiTracker to go with it. Much like with the graphics, I wrote a tool that imported the MIDI files, converted them into a simple compressed format that lived in code, and then wrote my own playback engine that hardcoded the logic for his instruments.
Sound effects are essentially hacked into the music playback engine, and briefly override the less important parts of the music to make their sounds. This was a lot trickier than I thought it would be, and involved making changes to both the music and the sound effects until everything stopped sounding terrible. I even had to move the sound effect for when a player moves their cursor to another audio channel because it was too obvious when it overrode another sound effect, let alone part of the music. It’s remarkable what developers managed to do with such limited audio hardware.
Most of the gameplay logic was implemented by directly translating the C# code from EXAPUNKS into C.
This almost worked, except for the lists of dictionaries, and other similarly complex data structures that I used in the match detection algorithm. I had to completely rewrite it to use fixed arrays, and be much faster, and not use so much RAM, and it was gross and difficult and I don’t want to talk about it. I can’t imagine having to write code like this in assembly, which must have been the case for most NES games. It’d be so hard to organize your thoughts!
Zach began this project by porting HACK*MATCH from C# to C. Using the CC65 compiler, this was enough to get the game running in an NES emulator, but performance was bad and there were some tricky compiler bugs to work around. Since the 6502 is a famously difficult compiler target, I wanted to see if I could find a way to generate good code for it. After about a year and a half, the new compiler was able to compile HACK*MATCH into efficient code for the NES.
My approach to development was inspired by the paper An Incremental Approach to Compiler Construction. The idea is to start with a very limited, but fully functioning, compiler, and then to incrementally expand its features while keeping it working at each step. With this approach, the first version of the compiler will only handle a minimal program. But even to compile this tiny program, the entire compiler pipeline must exist: tokenizer, parser, code generator, and tests.
// The first test case:
int main()
{
return 42;
}
Compiler projects often spend too much time in the parsing stage, which is unfortunate, because parsing is not very interesting.
This compiler uses a recursive descent parser because it is easy to write and to debug.
C's concept of function calls is difficult to map to the capabilities of the 6502. The main problem is how to store local variables and function parameters, which are essentially the same thing. Locals (and parameters) must be allocated when a function is called, and are expected to be quickly and randomly accessible at all times in the function.
There are many plausible solutions to this problem, with various strengths and weaknesses. I decided to keep the C stack (containing locals and parameters) in the zero page, using register X as the stack pointer. This provides fast access to stack variables with very small code. The main disadvantage is that the stack cannot exceed 256 bytes. Since HACK*MATCH uses only about 50 bytes of stack-allocated variables at most, this limitation does not matter.
On the other hand, the 6502 is very good at managing function return addresses. These are stored on the hardware-supported "return stack," completely separate from the “C stack” described here.
SomeFunction:
DEX ; allocate space for local variables
DEX
INC X,1 ; modify a local variable
LDA (X,0) ; dereference a pointer stored in a local variable
INX ; deallocate stack space and return
INX
RTS
Some interesting references that I didn't end up using:
Code generation is the most challenging part of a compiler for a 6502 machine. Arbitrary C code must be translated into fast and compact machine code. This is made very difficult by the irregularity of the 6502's instruction set. There are two key techniques used in this code generator:
First, the code generator consists of a series of overlapping “pattern matching” rules. Earlier rules match highly specific code patterns that can be translated into especially compact 6502 code; later patterns handle more general cases with larger, slower code. Detecting and optimizing special cases makes the difference between good and terrible 6502 code.
; General case: b = c + d
LDA c
CLC
ADC d
STA b
; Detect a special case: b = b + 1
INC b
To support the flexible pattern-matching code generation strategy, the AST is represented by a dynamically typed tree of nodes -- essentially Lisp s-expressions. Pattern matching cases can test any part of an AST expression, including in sub-expressions, to determine whether the case applies.
// Original C code:
drawnRows[i] = GridWidth - 1;
// The parser produces an abstract syntax tree:
(ASSIGN
(INDEX (VAR drawnRows) (VAR i))
(SUBTRACT (VAR GridWidth) (INT 1)))
// The code generator translates the AST into assembly code:
LDY 0,X ; address 0,X contains local variable 'i'
LDA #5 ; 5 == constant expression 'GridWidth - 1'
STA (10),Y ; address 10 contains global variable 'drawnRows'
Second, rules use “speculative register allocation.” Early rules attempt to perform all calculations in registers, which is fastest. However, if an expression is too complicated, the code generator falls back to a more general case that uses more CPU cycles and memory.
Whenever anything got too slow we’d just precalculate it and shove it in a table.
I started investigating how to codegen routines for multiplying and dividing by constants, but then I realized it’d be both simpler and faster to just generate a table and store it in PRG-ROM. For 8-bit multiplication and division by a constant there are only 256 possible values, which is about 1% of our available code space. We tend to use the same constants throughout the code (mostly for 2D array lookups) so there are only a few of these tables in practice, and they’re really quite fast. The compiler automatically generates one of these lookup tables when an 8-bit multiplication, division, or modulus by a constant value appears in the source code.
Similarly, when I realized my tile shuffling code was hopelessly slow (Fisher-Yates shuffle requires dividing by a variable) I precalculated 256 unique deals of tiles, stashed them in PRG-ROM, and then simply choose one at random at runtime. With 32KB of ROM it seemed like we would never run out, until we did…
It is essential to have a comprehensive set of unit tests while developing a compiler. It is easiest to fix a bug immediately after you introduce it; automated testing ensures that regressions are caught as soon as possible. I followed a simple rule: each time I added a feature or fixed a bug, I added at least one test to cover it.
Because this compiler targets a different platform -- the NES -- a simulator is needed to run the test cases. For testing purposes, I created a simulator by combining an open source implementation of the 6502 CPU with callbacks to simulate the NES' banks of ROM and RAM. The simulator loads standard iNES files.
To enable tests to emit output data, the simulator defines several hardware registers in a part of the address space not used by real NES hardware. Each test is expected to produce a certain sequence of byte or word (16 bit) values by writing to these registers.
// A simple test case: expected output and a test program.
@out 1 9
void reset()
{
OUTB = 10 - 5 - 4;
OUTB = 10 - (5 - 4);
STOP = 0;
}