Where we left off
The v1.0 post ended at a kind of natural stopping point: the CPU was correct, the TUI was usable, the
release pipeline was paranoid, and a 718-line essay had been written about it. The repo had a README that didn’t
lie, a state-format.md that froze the on-disk schema at v1, and a SECURITY.md that pointed at a private-advisory
flow nobody was ever going to use. If chippy had stopped there it would still have been the most complete piece of
software I’ve shipped to myself in a long time.
It didn’t stop there. Two days after v1.0 went out the door I tagged v1.1.0, and the day after that I tagged v1.1.1. This post is about both of them, because they’re really one release split over a weekend by a feature sequence I hadn’t fully appreciated: the moment chippy became a debug-adapter server is the same moment a second project — an NES emulator named nessy — became viable. v1.1.0 is the protocol release. v1.1.1 is the patch that made the protocol survive contact with a second process.
I’m going to keep writing one of these per release for as long as I have releases left to talk about. The cadence will be roughly weekly until we reach the current head and then I’ll fold in a separate piece about nessy. There are six chippy releases past v1.0 right now (v1.1.0 → v1.6.0) and eight nessy releases (v0.1.0 → v0.8.0), and the two projects are tightly enough entangled that you can’t really tell the story of one without the other.
OK, v1.1 — let’s do it.
The thesis: chippy as a library with a debugger, not a debugger with a library
The thing that v1.0 didn’t quite admit out loud is that the TUI was the only consumer of the CPU core. That’s fine
for shipping a tool — the TUI was the whole point — but it leaves a lot of value on the table. The same 6502 core
that runs a Klaus test rig also wants to run an NES game. The same DAP transport that drives a VS Code launch wants
to also drive a remote attach against a running program. The same expr package that evaluates a TUI watch wants to
evaluate a conditional breakpoint over the wire.
The release planning for v1.1.0 (issue #182, in case you ever want to dig through the commit log) collapsed all of that into a single thesis: chippy stops being a 6502 emulator with a debug UI bolted on, and starts being a debuggable 6502 library that ships with one specific frontend. The TUI is just the first frontend. The DAP server is the seam every other frontend hangs off of.
This is the kind of architectural decision you can make in a few hours and pay for over a few weeks. The actual landings broke down into four threads:
- A Debug Adapter Protocol server and a Go client library for the same protocol.
- A Source interface so the TUI runs against an in-process CPU or a remote DAP server identically.
- A new CPU variant, VariantNES, modeled on the Ricoh 2A03 used in the original NES.
- An optional Ticker hook on the bus, so peripherals can advance per CPU cycle without costing a cycle when they’re not there.
Each of those four threads has a story. Let me take them in the order they made sense at the time.
Thread 1 — DAP, two halves
The Debug Adapter Protocol is one of those specs you read and immediately want to use, because it’s a JSON-RPC
flavor that someone actually thought through. There are requests, responses, and events. There’s a small core that
nearly every adapter implements (initialize, launch/attach, disconnect, setBreakpoints, continue, next, stepIn,
stepOut, pause, stackTrace, scopes, variables, evaluate), a handful of capability-gated requests that a
machine-level debugger wants anyway (disassemble, readMemory, writeMemory), and a much larger set of optional
ones that you can fill in as you grow. Editors that speak DAP (VS Code, nvim-dap, emacs dape, IntelliJ via a
plugin) will drive any debug adapter that implements the core surface and advertises the rest through its
capabilities, which means you write the protocol once and get N editor integrations for free.
For v1.1.0 I wrote both halves.
The server (dap/server.go) is transport-agnostic: it takes any io.Reader and any io.Writer, which means it
runs over stdio (chippy -dap stdio, the default for editors), over TCP (chippy -dap :PORT, for remote attach),
and over an in-memory pipe (mostly for tests, but the real fruit of this comes way later — see ADR 0008 and the
v1.5 post). Dispatch is a switch on command, with handlers under cpuMu so two concurrent requests can’t tear
the CPU’s state out from under each other. That mutex contract is the most important load-bearing detail in the
whole protocol surface and we’ll be coming back to it.
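For readers who haven’t seen DAP on the wire: each message is a JSON body preceded by a Content-Length header and a blank line, which is why a plain io.Reader / io.Writer pair is enough of a transport. Here is a minimal sketch of that framing. The names Message, writeMessage, readMessage, and roundTrip are illustrative, not chippy’s actual API, and the struct carries only a few of the real message fields.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Message holds only the fields this sketch needs; real DAP messages carry more.
type Message struct {
	Seq     int    `json:"seq"`
	Type    string `json:"type"`
	Command string `json:"command,omitempty"`
}

// writeMessage frames one JSON body with the Content-Length header DAP uses.
func writeMessage(w io.Writer, m Message) error {
	body, err := json.Marshal(m)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintf(w, "Content-Length: %d\r\n\r\n%s", len(body), body)
	return err
}

// readMessage reads header lines up to the blank line, then exactly
// Content-Length bytes of JSON.
func readMessage(r *bufio.Reader) (Message, error) {
	var m Message
	length := -1
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return m, err
		}
		line = strings.TrimRight(line, "\r\n")
		if line == "" {
			break // blank line ends the header section
		}
		name, value, ok := strings.Cut(line, ":")
		if ok && strings.EqualFold(strings.TrimSpace(name), "Content-Length") {
			length, err = strconv.Atoi(strings.TrimSpace(value))
			if err != nil {
				return m, err
			}
		}
	}
	if length < 0 {
		return m, fmt.Errorf("missing Content-Length header")
	}
	body := make([]byte, length)
	if _, err := io.ReadFull(r, body); err != nil {
		return m, err
	}
	if err := json.Unmarshal(body, &m); err != nil {
		return m, err
	}
	return m, nil
}

// roundTrip pushes one message through an in-memory buffer, which stands in
// for stdio, a TCP connection, or a pipe.
func roundTrip(m Message) (Message, error) {
	var wire bytes.Buffer
	if err := writeMessage(&wire, m); err != nil {
		return Message{}, err
	}
	return readMessage(bufio.NewReader(&wire))
}

func main() {
	got, err := roundTrip(Message{Seq: 1, Type: "request", Command: "initialize"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got.Seq, got.Type, got.Command)
}
```

Because nothing in the framing code knows what is behind the reader and writer, the same two functions would serve all three transports mentioned above.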
The v1.1.0 request set covers the pre-1.0 ground from v0.1.0–v0.3.0 — initialize/launch/disconnect handshake, the six step controls, stackTrace + scopes + variables + setVariable, source + instruction breakpoints, disassemble + readMemory + writeMemory + evaluate, stepBack and function breakpoints and exception filters and the conditional / log breakpoint variations. The CHANGELOG has the gory PR list. The thing that changed in v1.1.0 isn’t the request menu, which was already mostly there. It’s that the server now runs cleanly next to a separately-built emulator that owns its own CPU.
The client (dap/client.go) is the same protocol from the other side. client.Request(cmd, args, out)
serializes a request, blocks on the matching response, and unmarshals the body into your typed struct. There are
event callbacks too — OnStopped, OnOutput, OnTerminated, OnContinued, and a custom-event channel I’ll come
back to. The client is a few hundred lines and it is the single thing that made the next decade of work possible,
because every later frontend talks to chippy through the client library, not through a hand-rolled JSON wrapper.
The pattern I’m trying to advertise here is one I want to internalize for future projects: if you write a protocol, write the client too, in the same PR, and make the server’s own tests use the client. The cost is small. The payoff is that you discover every ambiguity in your protocol at exactly the moment you can still fix them.
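The “blocks on the matching response” part is the piece worth seeing in code. A rough sketch of how a client can correlate responses to requests by sequence number follows. It assumes an injected send function and a deliver method that a read loop would call; both are stand-ins invented for this example, and the fake server in demoRequest exists only so the snippet runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// request and response carry only the DAP fields this sketch needs.
type request struct {
	Seq       int    `json:"seq"`
	Command   string `json:"command"`
	Arguments any    `json:"arguments,omitempty"`
}

type response struct {
	RequestSeq int             `json:"request_seq"`
	Success    bool            `json:"success"`
	Body       json.RawMessage `json:"body,omitempty"`
}

// Client matches responses to requests by sequence number.
type Client struct {
	mu      sync.Mutex
	nextSeq int
	pending map[int]chan response
	send    func(request) error // transport write, injected for the sketch
}

func NewClient(send func(request) error) *Client {
	return &Client{pending: make(map[int]chan response), send: send}
}

// Request sends cmd, blocks until the matching response arrives, and
// unmarshals the body into out (which may be nil).
func (c *Client) Request(cmd string, args, out any) error {
	ch := make(chan response, 1) // buffered so deliver never blocks
	c.mu.Lock()
	c.nextSeq++
	seq := c.nextSeq
	c.pending[seq] = ch
	c.mu.Unlock()

	if err := c.send(request{Seq: seq, Command: cmd, Arguments: args}); err != nil {
		c.mu.Lock()
		delete(c.pending, seq)
		c.mu.Unlock()
		return err
	}
	resp := <-ch
	if !resp.Success {
		return fmt.Errorf("%s: request failed", cmd)
	}
	if out == nil || len(resp.Body) == 0 {
		return nil
	}
	return json.Unmarshal(resp.Body, out)
}

// deliver is what a read loop would call for every decoded response.
func (c *Client) deliver(resp response) {
	c.mu.Lock()
	ch, ok := c.pending[resp.RequestSeq]
	delete(c.pending, resp.RequestSeq)
	c.mu.Unlock()
	if ok {
		ch <- resp
	}
}

// demoRequest wires the client to a fake server that answers every request
// with a fixed body from another goroutine, the way a read loop would.
func demoRequest() (string, error) {
	var c *Client
	c = NewClient(func(r request) error {
		body := json.RawMessage(`{"result":"$C000","variablesReference":0}`)
		go c.deliver(response{RequestSeq: r.Seq, Success: true, Body: body})
		return nil
	})
	var out struct {
		Result string `json:"result"`
	}
	err := c.Request("evaluate", map[string]string{"expression": "PC"}, &out)
	return out.Result, err
}

func main() {
	result, err := demoRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println("evaluate returned", result)
}
```

Registering the pending channel before sending matters: a fast server could otherwise answer before the client is ready to receive.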
Thread 2 — Source: same TUI, two backends
The first place the client got used was inside chippy’s own TUI. Up through v1.0, the TUI ran one and only one
debug target: the in-process CPU it had constructed itself. Every panel — disassembly, memory, stack, flags,
watch — reached into a *cpu.CPU and read its fields directly. That works fine when there’s exactly one CPU and
it lives in this process. It does not work at all when the CPU is in another process and you only have a
JSON pipe to it.
The fix was to put an interface between the TUI and the CPU it was driving:
type Source interface {
	Step() error
	StepOver() error
	StepOut() error
	Continue() error
	Reset() error
	SetBreakpoint(addr uint16) error
	ClearBreakpoint(addr uint16) error
	// ...
}
(The real interface is a bit bigger and has more methods for breakpoint syncing, source-line resolution, and live state read-back. I’m trimming it down here for the post.)
A LocalSource wraps a *cpu.CPU directly and does what the TUI used to do. A RemoteSource wraps a *dap.Client
and translates the same method calls into DAP requests. The TUI doesn’t know — and doesn’t care — which one it
has. It stores them behind the same Source field and calls the same methods.
The thing this interface is also hiding is the mirror. The TUI’s display panels still want a cpu.CPU and a
cpu.RAM to read from. A LocalSource returns the real ones. A RemoteSource keeps shadow copies — a
cpu.CPU populated from the registers/flags it pulls over variables, a cpu.RAM populated from readMemory
calls — and refreshes them when the remote stops. The panels never see the wire. They see the mirror.
This is the lever I’ll pull at v1.5 to migrate the Registers panel to read through DAP even in local mode
(an in-process DAP server attached to the same CPU, zero-marshal transport, ~90× faster than the unix-socket
case). For now, in v1.1.0, the only thing it does is let chippy -dap-attach HOST:PORT open the chippy TUI
against a remote target. The remote target, at this point, is the same chippy binary running with -dap :14785.
That’ll change in twenty-four hours.
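To make the LocalSource / RemoteSource split and the mirror concrete, here is a deliberately tiny sketch. Everything in it is a stand-in: the one-field CPU type, the two-method Source, the Requester interface that plays the role of the DAP client, and the fakeRemote that plays the role of the other process. The real interface is larger, as noted above.

```go
package main

import "fmt"

// CPU is a stand-in for chippy's cpu.CPU; only PC is modeled here.
type CPU struct{ PC uint16 }

// Step pretends every instruction is one byte long.
func (c *CPU) Step() { c.PC++ }

// Source is a trimmed-down, hypothetical version of the interface in the post.
type Source interface {
	Step() error
	CPU() *CPU // what the display panels read from
}

// LocalSource drives a CPU that lives in this process.
type LocalSource struct{ cpu *CPU }

func (s *LocalSource) Step() error { s.cpu.Step(); return nil }
func (s *LocalSource) CPU() *CPU   { return s.cpu }

// Requester is the slice of a DAP client this sketch needs (hypothetical).
type Requester interface {
	Next() error             // would send a DAP "next" request
	ReadPC() (uint16, error) // would read the PC back, e.g. via "variables"
}

// RemoteSource drives a CPU in another process and keeps a shadow copy.
type RemoteSource struct {
	client Requester
	mirror CPU
}

func (s *RemoteSource) Step() error {
	if err := s.client.Next(); err != nil {
		return err
	}
	pc, err := s.client.ReadPC() // refresh the mirror once the remote has stopped
	if err != nil {
		return err
	}
	s.mirror.PC = pc
	return nil
}

func (s *RemoteSource) CPU() *CPU { return &s.mirror }

// fakeRemote plays the role of the remote process for this example.
type fakeRemote struct{ cpu CPU }

func (f *fakeRemote) Next() error             { f.cpu.Step(); return nil }
func (f *fakeRemote) ReadPC() (uint16, error) { return f.cpu.PC, nil }

// stepAndShow is "the TUI": it only knows about Source.
func stepAndShow(s Source) (uint16, error) {
	if err := s.Step(); err != nil {
		return 0, err
	}
	return s.CPU().PC, nil
}

func main() {
	local := &LocalSource{cpu: &CPU{PC: 0xC000}}
	remote := &RemoteSource{client: &fakeRemote{cpu: CPU{PC: 0xC000}}}
	for _, s := range []Source{local, remote} {
		pc, err := stepAndShow(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%T: PC=$%04X\n", s, pc)
	}
}
```

The point of the shape is that stepAndShow has no branch on which backend it was given; the panels read the mirror exactly as they would read the real CPU.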
Thread 3 — VariantNES: the NES CPU is just NMOS minus BCD
Here is the chunk of v1.1.0 I am the most quietly pleased about, because the work was almost zero and the consequence was enormous.
The CPU in the original NES is a Ricoh 2A03. It’s a 6502 with one specific change: the ADC and SBC instructions don’t honor the decimal-mode flag. Nintendo had Ricoh disable the BCD logic — the rumor is licensing, although the truth is probably “we didn’t need it, why pay for it” — and shipped what is otherwise a stock NMOS 6502. Every other opcode, every other addressing mode, every other flag behavior is identical to the chip that ran the Apple I.
So adding a new CPU variant for the NES was, in code, a six-line file:
// VariantNES is the Ricoh 2A03: NMOS 6502 with BCD disabled in ADC/SBC.
var VariantNES = &Variant{
	Name: "NES",
	Init: func(c *CPU) {
		c.opcodes = OpcodesNMOS // shared table with NMOS
		c.bcdDisabled = true    // ADC/SBC ignore D flag
	},
}
The bcdDisabled flag is a single branch inside opADC / opSBC. The opcode table is shared with NMOS — no
copy, no override — because every other opcode is identical. The variant struct is a pointer (D1 in the v1.0 ADR:
variant-based dispatch via a per-CPU opcode-table pointer), so picking up NES is a one-line construction:
c := cpu.NewVariant(bus, cpu.VariantNES)
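For a sense of how small that single branch is, here is a sketch of an ADC core with the decimal path gated on a bcdDisabled flag. This is not chippy’s opADC: the function signature is invented for the example, the N, V, and Z flags are left out, and the decimal path is only meant to be right for valid BCD operands.

```go
package main

import "fmt"

// adc returns the new accumulator and carry-out for A + M + carry-in.
// decimal is the D flag; bcdDisabled models the 2A03 ignoring it.
// N, V, and Z are omitted, and decimal mode assumes valid BCD operands.
func adc(a, m uint8, carryIn, decimal, bcdDisabled bool) (uint8, bool) {
	var c uint16
	if carryIn {
		c = 1
	}
	if decimal && !bcdDisabled { // the one extra branch the variant needs
		lo := uint16(a&0x0F) + uint16(m&0x0F) + c
		if lo >= 0x0A {
			lo = ((lo + 0x06) & 0x0F) + 0x10 // fix up the low digit, carry into the high one
		}
		sum := uint16(a&0xF0) + uint16(m&0xF0) + lo
		if sum >= 0xA0 {
			sum += 0x60 // fix up the high digit
		}
		return uint8(sum), sum >= 0x100
	}
	sum := uint16(a) + uint16(m) + c
	return uint8(sum), sum > 0xFF
}

func main() {
	nmos, _ := adc(0x09, 0x01, false, true, false)
	nes, _ := adc(0x09, 0x01, false, true, true)
	fmt.Printf("D set, NMOS: $%02X\n", nmos)
	fmt.Printf("D set, NES:  $%02X\n", nes)
}
```

With the D flag set, the same operands produce a BCD result on the NMOS path and a plain binary result on the NES path, which is the whole behavioral difference the variant has to carry.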
The consequence: chippy’s 6502 core can now run NES code. Not the NES, not yet — there’s no PPU, no APU, no cart, no controller — but the CPU half of the problem is solved. From here, the work to build a real NES emulator is everything except the CPU, which is a much shorter list than “everything including the CPU.”
That observation is what unlocked nessy. I’m getting ahead of myself; the nessy story really lives in the v1.2
post. But the v1.1 release notes hand-wave at it — there’s a cmd/nessy directory that lands in this same PR
chain, with an Ebiten game loop and a DAP listener and three hand-rolled ca65 demos that draw “HELLO NESSY” on the
screen and bounce a tile around inside a vblank handler. The headline of v1.1.0 is the DAP server. The seed of
v1.1.0 is the NES variant.
(There’s also one small caveat to put on a sticky note for later: the per-cycle CPU↔PPU interleave path that lands
in v1.2.0 is gated on VariantNES, which means it inherits BCD-disabled. The Tom Harte bus-trace harness later
wants per-cycle behavior on VariantNMOS with BCD intact, which is going to force the per-cycle path to be
toggleable on NMOS too. That’s a v1.5 problem. Logged.)
Thread 4 — the bus ticker that costs nothing when it’s not there
The fourth thing v1.1.0 landed is a one-method interface that looks like it doesn’t matter and turns out to be the substrate every later cycle-accuracy improvement runs on.
type Ticker interface {
	Tick(cycles int)
}
That’s it. That is the whole interface.
The intent: peripherals that need to advance per CPU cycle (an NES PPU running at 3 PPU dots per CPU cycle, an APU
running its frame counter, a cart with a scanline IRQ) implement Ticker. The bus checks if its peripheral is also
a Ticker at SetBus time and caches the type assertion in a struct field. Step then checks one field — a nil
in the case where no ticker is wired, a function-call-shaped value otherwise — instead of doing a per-call
type-assertion that the optimizer can’t elide.
This is one of those choices that sounds like premature optimization until you remember the constraint chippy is designed to honor: the bare debugger CPU stays single-digit nanoseconds per step. The CI has a perfgate that fails the build if a refactor regresses the inner loop by more than a few percent. The ticker has to be free when it’s not there, or every TUI-only consumer pays a tax for the existence of a feature they don’t use.
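A sketch of the cached type assertion, under one assumption I should flag: I’m reading “SetBus” as a method on the CPU that inspects the bus value it is handed, and the Bus, plainRAM, and nesBus types here are invented for the example. The fixed two-cycle Step is likewise a placeholder.

```go
package main

import "fmt"

// Bus is a minimal stand-in for the real bus interface.
type Bus interface {
	Read(addr uint16) uint8
}

// Ticker is the one-method interface from the post.
type Ticker interface {
	Tick(cycles int)
}

type CPU struct {
	bus    Bus
	ticker Ticker // nil unless the bus also implements Ticker
	cycles uint64
}

// SetBus does the type assertion once and caches the result.
func (c *CPU) SetBus(b Bus) {
	c.bus = b
	c.ticker, _ = b.(Ticker) // stays nil when b is not a Ticker
}

// Step pretends every instruction takes two cycles.
func (c *CPU) Step() {
	const n = 2
	_ = c.bus.Read(0) // stand-in for the opcode fetch
	c.cycles += n
	if c.ticker != nil { // the only cost when no ticker is wired
		c.ticker.Tick(n)
	}
}

// plainRAM is a bus with no per-cycle peripherals.
type plainRAM struct{ mem [65536]uint8 }

func (r *plainRAM) Read(addr uint16) uint8 { return r.mem[addr] }

// nesBus also implements Ticker, advancing three PPU dots per CPU cycle.
type nesBus struct {
	plainRAM
	dots int
}

func (b *nesBus) Tick(cycles int) { b.dots += 3 * cycles }

func main() {
	bare := &CPU{}
	bare.SetBus(&plainRAM{})
	bare.Step()
	fmt.Println("bare CPU has ticker:", bare.ticker != nil)

	bus := &nesBus{}
	nes := &CPU{}
	nes.SetBus(bus)
	nes.Step()
	fmt.Println("PPU dots after one step:", bus.dots)
}
```

The inner loop pays for a single nil comparison on a struct field, which is the property the perfgate is protecting.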
The thing this hook unlocks is more important than the hook itself. In v1.2.0 the per-cycle CPU↔PPU interleave —
the whole Mesen2-aligned master-clock model — is implemented as code that runs inside Step after the Ticker
check. In v1.5.0 the optional cpu.SetAccessHook(func(addr, kind)) for the debugger heatmap is layered on the same
“one nil-check per access, free when unused” idea. The pattern in both cases is the same: have the bare core
expose the hook, but cost the consumer nothing when the hook is unset.
I think this is the single most reusable design rule I picked up writing chippy. Optional behavior costs zero when absent. That’s the contract. Everything else flows from it.
v1.1.0 in five lines
If you want the actual changelog without the philosophy:
- DAP server + DAP client library, transport-agnostic over io.Reader/io.Writer.
- Source interface so the TUI runs against either a local CPU or a remote DAP target.
- cpu.VariantNES (Ricoh 2A03: NMOS minus decimal-mode BCD).
- cpu.Ticker interface + cached SetBus type-assertion, zero cost when unused.
- A cmd/nessy scaffold (Ebiten game loop, NROM-only mapper, three hand-rolled ca65 demos) that runs entirely on the chippy core’s CPU and DAP server.
And one quiet decision in the release pipeline: the monorepo grew a chippy + nessy release split that lets a tag
push cut binaries for one tool without disturbing the other. I’ll come back to why the split happened the way it
did when I write the v1.2 carve-out, but the seam is already here.
Twenty-four hours later: v1.1.1
This is the part where the protocol got tested by reality.
The v1.1.0 launcher (chippy -nessy ROM) was supposed to do one thing: spawn nessy in the background, dial its
DAP listener, and open the TUI in attach mode paused at the reset vector. Five integration tests passed locally.
The thing that actually happened the first time I ran it on a fresh machine was that the TUI opened on a 64 KiB
swath of BRK instructions, the disassembly panel showed 00 00 00 00…, the source view was blank, and the
program counter was happily wandering through what looked like virgin RAM.
Three separate bugs were colliding.
Bug 1 — both processes were stepping the CPU
When the v1.1.0 nessy binary launched, it started its game loop, which ticked the CPU at ~29780 cycles per frame.
When the chippy TUI attached over DAP and the user hit r to run, the DAP server’s run loop also started
stepping the CPU. Two loops were now driving the same CPU instance from different goroutines: nessy’s own game loop, and the DAP server’s run loop acting on behalf of the TUI process. The CPU mutex
(cpuMu — the load-bearing mutex from Thread 1) prevented memory corruption, but it didn’t prevent the semantic
race: the game loop was advancing the PC while the DAP server was trying to single-step it, and the user got a
sort of nondeterministic chunky drift that looked from the outside like “step does the wrong thing about a third
of the time.”
The fix is the ownership model that becomes the canonical concurrency contract for every later DAP feature:
On DAP attach, the host pauses its own loop. The DAP server owns CPU execution from the moment the attach handshake completes until the disconnect. Continue / pause / step go through the server. The host’s game loop resumes only after disconnect (or the launcher’s terminate) restores it.
A two-line change in cmd/nessy/main.go flipped a dapAttached boolean inside the attach handler. The game loop
checks it at the top of Update and bails. That’s it. The bug was conceptual, not mechanical — the mechanics
were a one-line set, a one-line check.
This is the rule I now write down on a sticky note any time two processes share state: one of them owns it. You can hand ownership back and forth, but at any moment exactly one party is allowed to mutate. Locks aren’t enough; you also need a clear who-owns-execution model.
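The mechanics really are that small. A sketch of the handoff, assuming an atomic flag since the attach handler and the game loop run on different goroutines (the post only says “boolean”, so the atomic.Bool, the Host type, and the method names are my assumptions):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Host stands in for the emulator that owns the game loop.
type Host struct {
	dapAttached atomic.Bool // true while the DAP server owns execution
	frames      int         // frames the game loop has actually run
}

// OnAttach is called once the attach handshake completes.
func (h *Host) OnAttach() { h.dapAttached.Store(true) }

// OnDisconnect hands execution back to the game loop.
func (h *Host) OnDisconnect() { h.dapAttached.Store(false) }

// Update is the per-frame game-loop callback.
func (h *Host) Update() {
	if h.dapAttached.Load() {
		return // the debugger owns the CPU; do not step it here
	}
	h.frames++ // stand-in for running one frame of CPU cycles
}

func main() {
	h := &Host{}
	h.Update()
	h.OnAttach()
	h.Update() // skipped: debugger attached
	h.OnDisconnect()
	h.Update()
	fmt.Println("frames run:", h.frames)
}
```

The flag does not replace cpuMu. The mutex keeps individual accesses safe; the flag decides who is allowed to advance the machine at all.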
Bug 2 — the TUI’s RAM mirror was zero
The reason the disassembly looked like 64 KiB of BRK is that BRK is opcode 0x00. The TUI’s mirror RAM had
been initialized to zeros when the TUI started, and v1.1.0 never refilled it from the remote target. There was no
RefreshMemory call. The mirror sat at zero until the user manually scrolled past an address whose disassembly
forced a readMemory round-trip — which, in a freshly-attached session, was approximately never.
The fix is mostly mechanical. RemoteSource.Attach now performs a one-time readMemory(0, 65536) after the
handshake completes, populating the mirror in full. Subsequent step and continue calls do incremental
re-reads of the regions the stop event flagged as modified.
There is a quieter subtlety inside this fix. The DAP readMemory handler can’t naively call bus.Read(addr)
because some addresses are memory-mapped registers with read side effects — a $2002 read on the NES PPU clears
the vblank flag; a $4015 read on the APU clears the frame IRQ. A debugger has no business triggering those. So
readMemory routes through MMIO.Peek, a side-effect-free path that reads the underlying state without invoking
the peripheral’s Read function. Peripherals that need to expose a peek route implement it explicitly. Most
just return the RAM-shadowed value.
This is one of those distinctions (“read vs. peek”) that you read in a debugger spec and shrug at, and then you build a remote debugger against memory-mapped I/O and you understand why the distinction is load-bearing.
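Here is the distinction in miniature, using the vblank bit of $2002 as the example. The Peripheral interface and the ppuStatus type are invented for this sketch, and the real register has other read side effects (it also resets the PPU’s address latch) that are left out.

```go
package main

import "fmt"

// Peripheral separates the CPU-visible read from the debugger's read.
type Peripheral interface {
	Read(addr uint16) uint8 // may have side effects
	Peek(addr uint16) uint8 // must not change state
}

// ppuStatus models only the vblank bit (bit 7) of $2002.
type ppuStatus struct{ vblank bool }

func (p *ppuStatus) value() uint8 {
	if p.vblank {
		return 0x80
	}
	return 0
}

// Read returns the status and clears vblank, as a CPU read would.
func (p *ppuStatus) Read(addr uint16) uint8 {
	v := p.value()
	p.vblank = false
	return v
}

// Peek returns the same value without touching the flag.
func (p *ppuStatus) Peek(addr uint16) uint8 { return p.value() }

func main() {
	var reg Peripheral = &ppuStatus{vblank: true}
	fmt.Printf("peek: $%02X\n", reg.Peek(0x2002))
	fmt.Printf("peek: $%02X\n", reg.Peek(0x2002))
	fmt.Printf("read: $%02X\n", reg.Read(0x2002))
	fmt.Printf("peek: $%02X\n", reg.Peek(0x2002))
}
```

A memory panel that refreshed through Read would clear the flag the game is polling for, so the program under test would behave differently just because a debugger was looking at it.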
Bug 3 — .dbg source resolution
The third bug is the smallest and the most embarrassing. cc65’s .dbg file references source files by absolute
path. The nessy binary had been built from /Users/nkane/dev/.../nessy/roms/demos/hello-bg/hello-bg.s, the chippy
binary was running from a different working directory, and the TUI was trying to open the file at the literal
absolute path baked into the .dbg. On my own machine this worked fine because the path happened to still be
valid. On any other machine, including a CI runner, the source view was blank.
The fix is to try multiple resolution strategies in order: (1) the absolute path from .dbg verbatim; (2) the
path relative to the .dbg file’s directory; (3) the path relative to the current working directory; (4) the path
relative to the ROM’s directory. First hit wins. None of them are clever. The fix is a tryPaths() function and
fifteen lines of test data.
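The four strategies can be sketched as a first-hit-wins loop. One assumption to flag: the post doesn’t say how an absolute recorded path is made “relative” to a directory, so this version falls back to the file’s base name in that case. The parameter list and the injected exists function are also mine, chosen so the example can be exercised without touching the disk.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// tryPaths returns the first candidate location for which exists reports true.
// recorded is the source path as written in the .dbg file.
func tryPaths(recorded, dbgDir, cwd, romDir string, exists func(string) bool) (string, bool) {
	rel := recorded
	if filepath.IsAbs(recorded) {
		rel = filepath.Base(recorded) // assumption: fall back to the file name
	}
	candidates := []string{
		recorded,                   // 1. the path from .dbg verbatim
		filepath.Join(dbgDir, rel), // 2. relative to the .dbg file's directory
		filepath.Join(cwd, rel),    // 3. relative to the working directory
		filepath.Join(romDir, rel), // 4. relative to the ROM's directory
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists is the production-style check; tests can inject a fake instead.
func fileExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	// The paths below are placeholders for illustration only.
	p, ok := tryPaths("/nonexistent/src/hello-bg.s", "build", cwd, "roms", fileExists)
	fmt.Println(p, ok)
}
```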
Bug 4 — the breakpoint sync I missed
There’s a fourth bug v1.1.1 fixed that I want to mention because it’s the kind of thing you don’t notice until you’re three hours into using your own tool:
When the user typed :bp $C123 to set a breakpoint, the TUI added it to its local set and the panels rendered it.
But the remote server didn’t know about it. The next continue ran straight past $C123 and the user was left
staring at the wrong frame, wondering why their breakpoint didn’t fire.
The fix is to forward the breakpoint set to the remote on every mutation: :bp sets it, :run (which the user
hits before continue) syncs the full set as a clean state, :bd deletes. The DAP server then short-circuits the
hit check on its side without needing a TUI round-trip. This becomes important when conditional breakpoints land
in v1.3.0 (the condition evaluates on the server, against fresh state), but the wiring was put in here.
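A sketch of the forwarding, assuming the remote is updated by pushing the complete set each time (DAP’s breakpoint requests replace the previous set rather than patching it). The Breakpoints type, its method names, and the push callback are invented for the example; in the real TUI the callback would be a request through the DAP client.

```go
package main

import (
	"fmt"
	"sort"
)

// Breakpoints keeps the TUI's local set and forwards every change.
type Breakpoints struct {
	set  map[uint16]struct{}
	push func(addrs []uint16) error // sends the full set to the remote
}

func NewBreakpoints(push func([]uint16) error) *Breakpoints {
	return &Breakpoints{set: make(map[uint16]struct{}), push: push}
}

// Set adds a breakpoint (the ":bp" path) and syncs.
func (b *Breakpoints) Set(addr uint16) error {
	b.set[addr] = struct{}{}
	return b.Sync()
}

// Clear removes a breakpoint (the ":bd" path) and syncs.
func (b *Breakpoints) Clear(addr uint16) error {
	delete(b.set, addr)
	return b.Sync()
}

// Sync pushes the whole set in address order (the ":run" path calls this too).
func (b *Breakpoints) Sync() error {
	addrs := make([]uint16, 0, len(b.set))
	for a := range b.set {
		addrs = append(addrs, a)
	}
	sort.Slice(addrs, func(i, j int) bool { return addrs[i] < addrs[j] })
	return b.push(addrs)
}

func main() {
	var remote []uint16 // what the "server" currently believes
	bp := NewBreakpoints(func(addrs []uint16) error {
		remote = addrs
		return nil
	})
	if err := bp.Set(0xC123); err != nil {
		panic(err)
	}
	for _, a := range remote {
		fmt.Printf("remote has breakpoint at $%04X\n", a)
	}
}
```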
What v1.1.1 actually means
If v1.1.0 is the protocol release, v1.1.1 is the “the protocol is now usable across a process boundary” release. The actual diff is small (maybe two hundred lines once you exclude tests), but every one of the four bugs is a case of “we shipped a thing that worked in tests but didn’t survive a single end-to-end run against a real second process.” That’s a category of bug I want to be slightly paranoid about for the rest of this project.
Specifically, the rule I now hold: for any feature that involves two processes, the first integration test must
be a real exec.Command of one binary attaching to another over a real socket, and assert state on both sides.
Mock pipes are fine for unit tests. They lie about exactly the failure modes you want to catch.
cmd/nessy/wait_for_debugger_test.go is that test now. It spawns a real nessy binary, dials its TCP listener,
attaches a real DAP client, and asserts that the PC is at the reset vector — the exact sequence of events that
the -wait-for-debugger boot gate guarantees. If anything in the launcher chain breaks, that test goes red.
What v1.1 sets up
Three things in this release are bigger than they look.
The DAP server is now the extension seam. Every later debugger feature lands as a new DAP request or event — conditional breakpoints in v1.3, a custom-request escape hatch in v1.4, an in-process zero-marshal transport plus a live-state streaming event in v1.5, dirty-region memory streaming in v1.6. The protocol was the right place to invest.
The TUI is no longer the only frontend. It is a frontend. The same Source interface that lets the TUI
attach over a wire is the lever I’ll pull to migrate panels to read through DAP even in local mode. By v1.5,
the Registers panel renders from Source.Registers() over an in-process DAP server, and I have a working
prototype of the v2.0 architecture: TUI as a DAP client, full stop, no special-case for local execution.
The NES variant is a doorway. v1.2 will carve nessy out into its own repository. v1.5 will introduce host
debug hooks so nessy can layer NES-aware breakpoints (scanline == 30), step granularities (run-to-NMI), and
custom debug-state channels (PPU/OAM/mapper inspectors) on top of chippy’s protocol without forking it. That whole
arc starts with the one-line VariantNES decision in this release.
Next week: v1.2.0. The library gets promoted out of internal/. The per-cycle CPU↔PPU interleave lands so every
Blargg accuracy ROM passes. And nessy moves out of the monorepo and into its own repository.
The 6502 is the easy part. The work around it has its own gravity.