Where we left off

The v1.0 post ended at a kind of natural stopping point: the CPU was correct, the TUI was usable, the release pipeline was paranoid, and a 718-line essay had been written about it. The repo had a README that didn’t lie, a state-format.md that froze the on-disk schema at v1, and a SECURITY.md that pointed at a private-advisory flow nobody was ever going to use. If chippy had stopped there it would still have been the most complete piece of software I’ve shipped to myself in a long time.

It didn’t stop there. Two days after v1.0 went out the door I tagged v1.1.0, and the day after that I tagged v1.1.1. This post is about both of them, because they’re really one release split over a weekend by a feature sequence I hadn’t fully appreciated: the moment chippy became a debug-adapter server is the same moment a second project — an NES emulator named nessy — became viable. v1.1.0 is the protocol release. v1.1.1 is the patch that made the protocol survive contact with a second process.

I’m going to keep writing one of these per release for as long as I have releases left to talk about. The cadence will be roughly weekly until we reach the current head and then I’ll fold in a separate piece about nessy. There are six chippy releases past v1.0 right now (v1.1.0 → v1.6.0) and eight nessy releases (v0.1.0 → v0.8.0), and the two projects are tightly enough entangled that you can’t really tell the story of one without the other.

OK, v1.1 — let’s do it.

The thesis: chippy as a library with a debugger, not a debugger with a library

The thing that v1.0 didn’t quite admit out loud is that the TUI was the only consumer of the CPU core. That’s fine for shipping a tool — the TUI was the whole point — but it leaves a lot of value on the table. The same 6502 core that runs a Klaus test rig also wants to run an NES game. The same DAP transport that drives a VS Code launch wants to also drive a remote attach against a running program. The same expr package that evaluates a TUI watch wants to evaluate a conditional breakpoint over the wire.

The release planning for v1.1.0 (issue #182, in case you ever want to dig through the commit log) collapsed all of that into a single thesis: chippy stops being a 6502 emulator with a debug UI bolted on, and starts being a debuggable 6502 library that ships with one specific frontend. The TUI is just the first frontend. The DAP server is the seam every other frontend hangs off of.

This is the kind of architectural decision you can make in a few hours and pay for over a few weeks. The actual landings broke down into four threads:

  1. A Debug Adapter Protocol server and a Go client library for the same protocol.
  2. A Source interface so the TUI runs against an in-process CPU or a remote DAP server identically.
  3. A new CPU variant — VariantNES, modeled on the Ricoh 2A03 used in the original NES.
  4. An optional Ticker hook on the bus, so peripherals can advance per CPU cycle without costing a cycle when they’re not there.

Each of those four threads has a story. Let me take them in the order they made sense at the time.

Thread 1 — DAP, two halves

The Debug Adapter Protocol is one of those specs you read and immediately want to use, because it’s a JSON-RPC-style protocol (JSON messages behind a Content-Length header) that someone actually thought through. There are requests, responses, and events. There’s a small core that every adapter needs (initialize, launch/attach, disconnect, setBreakpoints, continue, next, stepIn, stepOut, pause, stackTrace, scopes, variables, evaluate) and a much larger set of capability-gated requests (disassemble, readMemory, and writeMemory among them) that you can fill in as you grow. Editors that speak DAP (VS Code, nvim-dap, emacs dape, IntelliJ via a plugin) will drive any debug adapter that implements the core surface, which means you write the protocol once and get N editor integrations for free.

For v1.1.0 I wrote both halves.

The server (dap/server.go) is transport-agnostic: it takes any io.Reader and any io.Writer, which means it runs over stdio (chippy -dap stdio, the default for editors), over TCP (chippy -dap :PORT, for remote attach), and over an in-memory pipe (mostly for tests, but the real fruit of this comes way later — see ADR 0008 and the v1.5 post). Dispatch is a switch on command, with handlers under cpuMu so two concurrent requests can’t tear the CPU’s state out from under each other. That mutex contract is the most important load-bearing detail in the whole protocol surface and we’ll be coming back to it.

The v1.1.0 request set covers the pre-1.0 ground from v0.1.0–v0.3.0 — initialize/launch/disconnect handshake, the six step controls, stackTrace + scopes + variables + setVariable, source + instruction breakpoints, disassemble + readMemory + writeMemory + evaluate, stepBack and function breakpoints and exception filters and the conditional / log breakpoint variations. The CHANGELOG has the gory PR list. The thing that changed in v1.1.0 isn’t the request menu, which was already mostly there. It’s that the server now runs cleanly next to a separately-built emulator that owns its own CPU.

The client (dap/client.go) is the same protocol from the other side. client.Request(cmd, args, out) serializes a request, blocks on the matching response, and unmarshals the body into your typed struct. There are event callbacks too — OnStopped, OnOutput, OnTerminated, OnContinued, and a custom-event channel I’ll come back to. The client is a few hundred lines and it is the single thing that made the next decade of work possible, because every later frontend talks to chippy through the client library, not through a hand-rolled JSON wrapper.
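
The core of a blocking Request call is matching each response to its caller by sequence number. The sketch below shows that idea only, under loud assumptions: the type and function names are hypothetical, the transport is a plain JSON stream (no Content-Length framing), and events, timeouts, and error responses are left out. It is not the real dap/client.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

type request struct {
	Seq       int    `json:"seq"`
	Command   string `json:"command"`
	Arguments any    `json:"arguments,omitempty"`
}

type response struct {
	RequestSeq int             `json:"request_seq"`
	Body       json.RawMessage `json:"body"`
}

// Client matches responses to blocked callers by sequence number.
type Client struct {
	mu      sync.Mutex
	enc     *json.Encoder
	nextSeq int
	pending map[int]chan response
}

func NewClient(r io.Reader, w io.Writer) *Client {
	c := &Client{enc: json.NewEncoder(w), pending: map[int]chan response{}}
	go c.readLoop(json.NewDecoder(r))
	return c
}

// readLoop hands each incoming response to whoever is waiting on its seq.
func (c *Client) readLoop(dec *json.Decoder) {
	for {
		var resp response
		if err := dec.Decode(&resp); err != nil {
			return // transport closed; a real client would also fail pending calls
		}
		c.mu.Lock()
		ch := c.pending[resp.RequestSeq]
		delete(c.pending, resp.RequestSeq)
		c.mu.Unlock()
		if ch != nil {
			ch <- resp
		}
	}
}

// Request sends cmd, blocks on the matching response, and unmarshals its body into out.
func (c *Client) Request(cmd string, args, out any) error {
	ch := make(chan response, 1)
	c.mu.Lock()
	c.nextSeq++
	seq := c.nextSeq
	c.pending[seq] = ch
	err := c.enc.Encode(request{Seq: seq, Command: cmd, Arguments: args})
	c.mu.Unlock()
	if err != nil {
		return err
	}
	resp := <-ch
	return json.Unmarshal(resp.Body, out)
}

// fakeServer echoes the command name back in the body; it stands in for a real adapter.
func fakeServer(r io.Reader, w io.Writer) {
	dec, enc := json.NewDecoder(r), json.NewEncoder(w)
	for {
		var req request
		if dec.Decode(&req) != nil {
			return
		}
		body, _ := json.Marshal(map[string]string{"command": req.Command})
		if enc.Encode(response{RequestSeq: req.Seq, Body: body}) != nil {
			return
		}
	}
}

// demo wires a client to the fake server over two in-memory pipes.
func demo() (string, error) {
	clientIn, serverOut := io.Pipe()
	serverIn, clientOut := io.Pipe()
	go fakeServer(serverIn, serverOut)
	c := NewClient(clientIn, clientOut)
	var body struct {
		Command string `json:"command"`
	}
	err := c.Request("stackTrace", nil, &body)
	return body.Command, err
}

func main() {
	fmt.Println(demo())
}
```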

The pattern I’m trying to advertise here is one I want to internalize for future projects: if you write a protocol, write the client too, in the same PR, and make the server’s own tests use the client. The cost is small. The payoff is that you discover every ambiguity in your protocol at exactly the moment you can still fix them.

Thread 2 — Source: same TUI, two backends

The first place the client got used was inside chippy’s own TUI. Up through v1.0, the TUI ran one and only one debug target: the in-process CPU it had constructed itself. Every panel — disassembly, memory, stack, flags, watch — reached into a *cpu.CPU and read its fields directly. That works fine when there’s exactly one CPU and it lives in this process. It does not work at all when the CPU is in another process and you only have a JSON pipe to it.

The fix was to put an interface between the TUI and the CPU it was driving:

type Source interface {
    Step() error
    StepOver() error
    StepOut() error
    Continue() error
    Reset() error
    SetBreakpoint(addr uint16) error
    ClearBreakpoint(addr uint16) error
    // ...
}

(The real interface is a bit bigger and has more methods for breakpoint syncing, source-line resolution, and live state read-back. I’m trimming it down here for the post.)

A LocalSource wraps a *cpu.CPU directly and does what the TUI used to do. A RemoteSource wraps a *dap.Client and translates the same method calls into DAP requests. The TUI doesn’t know — and doesn’t care — which one it has. It stores them behind the same Source field and calls the same methods.

The thing this interface is also hiding is the mirror. The TUI’s display panels still want a cpu.CPU and a cpu.RAM to read from. A LocalSource returns the real ones. A RemoteSource keeps shadow copies — a cpu.CPU populated from the registers/flags it pulls over variables, a cpu.RAM populated from readMemory calls — and refreshes them when the remote stops. The panels never see the wire. They see the mirror.
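
A toy version of the two backends makes the mirror idea easier to see. Everything here is a stand-in I made up for the post: Registers replaces cpu.CPU, core is a fake CPU whose step just bumps the PC, and the wire interface plays the role of *dap.Client. The only point is that the caller gets the same answer from either Source.

```go
package main

import "fmt"

// Registers is a hypothetical snapshot type standing in for cpu.CPU.
type Registers struct {
	PC uint16
	A  uint8
}

// Source is a trimmed-down stand-in for the interface in the post.
type Source interface {
	Step() error
	Registers() Registers
}

// core is a toy CPU: each step advances PC by one.
type core struct{ regs Registers }

func (c *core) step() { c.regs.PC++ }

// LocalSource reads the real CPU directly.
type LocalSource struct{ cpu *core }

func (s *LocalSource) Step() error          { s.cpu.step(); return nil }
func (s *LocalSource) Registers() Registers { return s.cpu.regs }

// wire stands in for a DAP client: one call to step, one to pull state.
type wire interface {
	Next() error
	FetchRegisters() (Registers, error)
}

// RemoteSource keeps a shadow copy and refreshes it when the target stops.
type RemoteSource struct {
	conn   wire
	mirror Registers
}

func (s *RemoteSource) Step() error {
	if err := s.conn.Next(); err != nil {
		return err
	}
	regs, err := s.conn.FetchRegisters()
	if err != nil {
		return err
	}
	s.mirror = regs // panels read this, never the wire
	return nil
}

func (s *RemoteSource) Registers() Registers { return s.mirror }

// loopback is a fake wire backed by an in-process core.
type loopback struct{ cpu *core }

func (l loopback) Next() error                        { l.cpu.step(); return nil }
func (l loopback) FetchRegisters() (Registers, error) { return l.cpu.regs, nil }

// stepTwice is the "TUI": it only knows about Source.
func stepTwice(s Source) (uint16, error) {
	for i := 0; i < 2; i++ {
		if err := s.Step(); err != nil {
			return 0, err
		}
	}
	return s.Registers().PC, nil
}

func main() {
	local := &LocalSource{cpu: &core{regs: Registers{PC: 0xC000}}}
	remote := &RemoteSource{conn: loopback{cpu: &core{regs: Registers{PC: 0xC000}}}}
	fmt.Println(stepTwice(local))
	fmt.Println(stepTwice(remote))
}
```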

This is the lever I’ll pull at v1.5 to migrate the Registers panel to read through DAP even in local mode (an in-process DAP server attached to the same CPU, zero-marshal transport, ~90× faster than the unix-socket case). For now, in v1.1.0, the only thing it does is let chippy -dap-attach HOST:PORT open the chippy TUI against a remote target. The remote target, at this point, is the same chippy binary running with -dap :14785. That’ll change in twenty-four hours.

Thread 3 — VariantNES: the NES CPU is just NMOS minus BCD

Here is the chunk of v1.1.0 I am the most quietly pleased about, because the work was almost zero and the consequence was enormous.

The CPU in the original NES is a Ricoh 2A03. It’s a 6502 with one specific change: the ADC and SBC instructions don’t honor the decimal-mode flag. Nintendo had Ricoh disable the BCD logic — the rumor is licensing, although the truth is probably “we didn’t need it, why pay for it” — and shipped what is otherwise a stock NMOS 6502. Every other opcode, every other addressing mode, every other flag behavior is identical to the chip that ran the Apple I.

So adding a new CPU variant for the NES was, in code, a file of about eight lines:

// VariantNES is the Ricoh 2A03 — NMOS 6502 with BCD disabled in ADC/SBC.
var VariantNES = &Variant{
    Name: "NES",
    Init: func(c *CPU) {
        c.opcodes = OpcodesNMOS // shared table with NMOS
        c.bcdDisabled = true    // ADC/SBC ignore D flag
    },
}

The bcdDisabled flag is a single branch inside opADC / opSBC. The opcode table is shared with NMOS — no copy, no override — because every other opcode is identical. The variant struct is a pointer (D1 in the v1.0 ADR: variant-based dispatch via a per-CPU opcode-table pointer), so picking up NES is a one-line construction:

c := cpu.NewVariant(bus, cpu.VariantNES)
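
For a feel of what the single bcdDisabled branch does, here is a simplified sketch of an add-with-carry that honors or ignores the decimal flag. The adc function and its signature are hypothetical, and it models only the result byte and carry-out; the N/V/Z flag behavior of NMOS decimal mode is deliberately left out, so this is not chippy’s opADC.

```go
package main

import "fmt"

// adc adds m and the carry-in to a.
// With decimalFlag set and bcdDisabled clear it does packed-BCD addition;
// otherwise (including on a 2A03-style variant) it does plain binary addition.
func adc(a, m uint8, carryIn, decimalFlag, bcdDisabled bool) (result uint8, carryOut bool) {
	var c uint16
	if carryIn {
		c = 1
	}
	if decimalFlag && !bcdDisabled {
		lo := uint16(a&0x0F) + uint16(m&0x0F) + c
		if lo > 9 {
			lo += 6 // skip the six unused nibble values A..F
		}
		hi := uint16(a>>4) + uint16(m>>4)
		if lo > 0x0F {
			hi++ // carry from the low digit
		}
		if hi > 9 {
			hi += 6
		}
		return uint8(hi<<4 | lo&0x0F), hi > 0x0F
	}
	sum := uint16(a) + uint16(m) + c
	return uint8(sum), sum > 0xFF
}

func main() {
	nmos, _ := adc(0x09, 0x01, false, true, false)
	nes, _ := adc(0x09, 0x01, false, true, true)
	fmt.Printf("NMOS: %#02x  NES: %#02x\n", nmos, nes)
}
```

With the D flag set, 09 + 01 comes out as 10 (BCD) on the NMOS path and as 0A (binary) on the NES path. That one difference is the whole variant.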

The consequence: chippy’s 6502 core can now run NES code. Not the NES, not yet — there’s no PPU, no APU, no cart, no controller — but the CPU half of the problem is solved. From here, the work to build a real NES emulator is everything except the CPU, which is a much shorter list than “everything including the CPU.”

That observation is what unlocked nessy. I’m getting ahead of myself; the nessy story really lives in the v1.2 post. But the v1.1 release notes hand-wave at it — there’s a cmd/nessy directory that lands in this same PR chain, with an Ebiten game loop and a DAP listener and three hand-rolled ca65 demos that draw “HELLO NESSY” on the screen and bounce a tile around inside a vblank handler. The headline of v1.1.0 is the DAP server. The seed of v1.1.0 is the NES variant.

(There’s also one small caveat to put on a sticky note for later: the per-cycle CPU↔PPU interleave path that lands in v1.2.0 is gated on VariantNES, which means it inherits BCD-disabled. The Tom Harte bus-trace harness later wants per-cycle behavior on VariantNMOS with BCD intact, which is going to force the per-cycle path to be toggleable on NMOS too. That’s a v1.5 problem. Logged.)

Thread 4 — the bus ticker that costs nothing when it’s not there

The fourth thing v1.1.0 landed is a one-method interface that looks like it doesn’t matter and turns out to be the substrate every later cycle-accuracy improvement runs on.

type Ticker interface {
    Tick(cycles int)
}

That’s it. That is the whole interface.

The intent: peripherals that need to advance per CPU cycle (an NES PPU running at 3 PPU dots per CPU cycle, an APU running its frame counter, a cart with a scanline IRQ) implement Ticker. The check for whether the peripheral is also a Ticker happens exactly once, at SetBus time, and the result of the type assertion is cached in a struct field. Step then checks that one field (nil when no ticker is wired, a callable value otherwise) instead of doing a per-call type assertion that the optimizer can’t elide.

This is one of those choices that sounds like premature optimization until you remember the constraint chippy is designed to honor: the bare debugger CPU stays single-digit nanoseconds per step. The CI has a perfgate that fails the build if a refactor regresses the inner loop by more than a few percent. The ticker has to be free when it’s not there, or every TUI-only consumer pays a tax for the existence of a feature they don’t use.
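
The shape of the cached assertion is roughly the following. This is a sketch with invented types (CPU, Bus, plainBus, ppuBus) and a fake two-cycle Step, not the real chippy core; what it shows is the one type assertion in SetBus and the one nil check in Step.

```go
package main

import "fmt"

type Bus interface {
	Read(addr uint16) uint8
}

type Ticker interface {
	Tick(cycles int)
}

type CPU struct {
	bus    Bus
	ticker Ticker // nil unless the bus also implements Ticker
	cycles int
}

// SetBus does the type assertion once and caches the result.
func (c *CPU) SetBus(b Bus) {
	c.bus = b
	c.ticker, _ = b.(Ticker) // stays nil if the assertion fails
}

// Step pretends every instruction takes two cycles; the real decode is omitted.
func (c *CPU) Step() {
	const n = 2
	c.cycles += n
	if c.ticker != nil { // the only cost when no ticker is wired
		c.ticker.Tick(n)
	}
}

// plainBus has no per-cycle peripherals.
type plainBus struct{}

func (plainBus) Read(uint16) uint8 { return 0 }

// ppuBus advances three dots per CPU cycle, as an NTSC PPU would.
type ppuBus struct{ dots int }

func (b *ppuBus) Read(uint16) uint8 { return 0 }
func (b *ppuBus) Tick(cycles int)   { b.dots += 3 * cycles }

func main() {
	bus := &ppuBus{}
	var c CPU
	c.SetBus(bus)
	c.Step()
	c.Step()
	fmt.Println("cpu cycles:", c.cycles, "ppu dots:", bus.dots)
}
```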

The thing this hook unlocks is more important than the hook itself. In v1.2.0 the per-cycle CPU↔PPU interleave — the whole Mesen2-aligned master-clock model — is implemented as code that runs inside Step after the Ticker check. In v1.5.0 the optional cpu.SetAccessHook(func(addr, kind)) for the debugger heatmap is layered on the same “one nil-check per access, free when unused” idea. The pattern in both cases is the same: have the bare core expose the hook, but cost the consumer nothing when the hook is unset.

I think this is the single most reusable design rule I picked up writing chippy. Optional behavior costs zero when absent. That’s the contract. Everything else flows from it.

v1.1.0 in five lines

If you want the actual changelog without the philosophy:

  • DAP server + DAP client library, transport-agnostic over io.Reader/io.Writer.
  • Source interface so the TUI runs against either a local CPU or a remote DAP target.
  • cpu.VariantNES (Ricoh 2A03: NMOS minus decimal-mode BCD).
  • cpu.Ticker interface + cached SetBus type-assertion, zero cost when unused.
  • A cmd/nessy scaffold (Ebiten game loop, NROM-only mapper, three hand-rolled ca65 demos) that runs entirely on the chippy core’s CPU and DAP server.

And one quiet decision in the release pipeline: the monorepo grew a chippy + nessy release split that lets a tag push cut binaries for one tool without disturbing the other. I’ll come back to why the split happened the way it did when I write the v1.2 carve-out, but the seam is already here.

Twenty-four hours later: v1.1.1

This is the part where the protocol got tested by reality.

The v1.1.0 launcher (chippy -nessy ROM) was supposed to do one thing: spawn nessy in the background, dial its DAP listener, and open the TUI in attach mode paused at the reset vector. Five integration tests passed locally. The thing that actually happened the first time I ran it on a fresh machine was that the TUI opened on a 64 KiB swath of BRK instructions, the disassembly panel showed 00 00 00 00…, the source view was blank, and the program counter was happily wandering through what looked like virgin RAM.

Three separate bugs were colliding.

Bug 1 — both processes were stepping the CPU

When the v1.1.0 nessy binary launched, it started its game loop, which ticked the CPU at roughly 29,780 cycles per frame (the NTSC figure). When the chippy TUI attached over DAP and the user hit r to run, the DAP server’s run loop also started stepping the CPU. Both the nessy game loop and the TUI’s DAP session were now driving the same CPU instance through different goroutines. The CPU mutex (cpuMu, the load-bearing mutex from Thread 1) prevented memory corruption, but it didn’t prevent the semantic race: the game loop was advancing the PC while the DAP server was trying to single-step it, and the user got a nondeterministic, chunky drift that looked from the outside like “step does the wrong thing about a third of the time.”

The fix is the ownership model that becomes the canonical concurrency contract for every later DAP feature:

On DAP attach, the host pauses its own loop. The DAP server owns CPU execution from the moment the attach handshake completes until the disconnect. Continue / pause / step go through the server. The host’s game loop resumes only after disconnect (or the launcher’s terminate) restores it.

A two-line change in cmd/nessy/main.go flipped a dapAttached boolean inside the attach handler. The game loop checks it at the top of Update and bails. That’s it. The bug was conceptual, not mechanical — the mechanics were a one-line set, a one-line check.
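
In sketch form, the set and the check look something like this. The Host type, the OnAttach / OnDisconnect hooks, and the cyclesPerFrame placeholder are all hypothetical; I’m also using an atomic.Bool here so the flag is safe to flip from the DAP goroutine, which is an assumption of the sketch rather than a claim about what cmd/nessy/main.go does.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cyclesPerFrame is a placeholder value, not the real NES figure.
const cyclesPerFrame = 100

// Host is a stand-in for the emulator's game loop state.
type Host struct {
	dapAttached atomic.Bool
	cycles      int
}

// Update is called once per frame by the game loop.
func (h *Host) Update() {
	if h.dapAttached.Load() {
		return // the DAP server owns execution; do not advance the CPU
	}
	h.cycles += cyclesPerFrame
}

// OnAttach runs when the DAP attach handshake completes.
func (h *Host) OnAttach() { h.dapAttached.Store(true) }

// OnDisconnect hands ownership back to the game loop.
func (h *Host) OnDisconnect() { h.dapAttached.Store(false) }

func main() {
	h := &Host{}
	h.Update()       // host owns the CPU
	h.OnAttach()     // debugger takes over
	h.Update()       // no-op
	h.OnDisconnect() // ownership returns
	h.Update()
	fmt.Println("cycles run by the host:", h.cycles)
}
```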

This is the rule I now write down on a sticky note any time two processes share state: one of them owns it. You can hand ownership back and forth, but at any moment exactly one party is allowed to mutate. Locks aren’t enough; you also need a clear who-owns-execution model.

Bug 2 — the TUI’s RAM mirror was zero

The reason the disassembly looked like 64 KiB of BRK is that BRK is opcode 0x00. The TUI’s mirror RAM had been initialized to zeros when the TUI started, and v1.1.0 never refilled it from the remote target. There was no RefreshMemory call. The mirror sat at zero until the user manually scrolled past an address whose disassembly forced a readMemory round-trip — which, in a freshly-attached session, was approximately never.

The fix is mostly mechanical. RemoteSource.Attach now performs a one-time readMemory(0, 65536) after the handshake completes, populating the mirror in full. Subsequent step and continue calls do incremental re-reads of the regions the stop event flagged as modified.

There is a quieter subtlety inside this fix. The DAP readMemory handler can’t naively call bus.Read(addr) because some addresses are memory-mapped registers with read side effects — a $2002 read on the NES PPU clears the vblank flag; a $4015 read on the APU clears the frame IRQ. A debugger has no business triggering those. So readMemory routes through MMIO.Peek, a side-effect-free path that reads the underlying state without invoking the peripheral’s Read function. Peripherals that need to expose a peek route implement it explicitly. Most just return the RAM-shadowed value.
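
Here is the read-vs-peek split reduced to the $2002 case. The Bus and PPUStatus types are invented for illustration and the address decoding is drastically simplified (everything that isn’t $2002 falls through to a small RAM array); it is not the real MMIO.Peek plumbing.

```go
package main

import "fmt"

// PPUStatus models only the vblank bit (bit 7) of the NES PPU status register.
type PPUStatus struct{ vblank bool }

// Peek returns the register value without changing anything.
func (p *PPUStatus) Peek() uint8 {
	if p.vblank {
		return 0x80
	}
	return 0
}

// Read returns the value and then clears vblank, as a CPU read of $2002 does.
func (p *PPUStatus) Read() uint8 {
	v := p.Peek()
	p.vblank = false
	return v
}

// Bus is a toy bus: $2002 goes to the PPU, everything else to 2 KiB of RAM.
type Bus struct {
	ram [0x800]uint8
	ppu PPUStatus
}

// Read is the path the CPU uses; it triggers side effects.
func (b *Bus) Read(addr uint16) uint8 {
	if addr == 0x2002 {
		return b.ppu.Read()
	}
	return b.ram[addr%0x800]
}

// Peek is the path a debugger's readMemory should use; it never mutates state.
func (b *Bus) Peek(addr uint16) uint8 {
	if addr == 0x2002 {
		return b.ppu.Peek()
	}
	return b.ram[addr%0x800]
}

func main() {
	b := &Bus{}
	b.ppu.vblank = true
	fmt.Printf("peek: %#02x, peek again: %#02x\n", b.Peek(0x2002), b.Peek(0x2002))
	fmt.Printf("read: %#02x, peek after read: %#02x\n", b.Read(0x2002), b.Peek(0x2002))
}
```

If the debugger used Read here, merely opening a memory panel over $2002 would eat the vblank flag the game is polling for.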

This is one of those distinctions (“read vs. peek”) that you read in a debugger spec and shrug at, and then you build a remote debugger against memory-mapped I/O and you understand why the distinction is load-bearing.

Bug 3 — .dbg source resolution

The third bug is the smallest and the most embarrassing. cc65’s .dbg file references source files by absolute path. The nessy binary had been built from /Users/nkane/dev/.../nessy/roms/demos/hello-bg/hello-bg.s, the chippy binary was running from a different working directory, and the TUI was trying to open the file at the literal absolute path baked into the .dbg. On my own machine this worked fine because the path happened to still be valid. On any other machine, including a CI runner, the source view was blank.

The fix is to try multiple resolution strategies in order: (1) the absolute path from .dbg verbatim; (2) the path relative to the .dbg file’s directory; (3) the path relative to the current working directory; (4) the path relative to the ROM’s directory. First hit wins. None of them are clever. The fix is a tryPaths() function and fifteen lines of test data.
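
The resolution order can be sketched in a few lines. One assumption to flag loudly: I’m interpreting “relative to X” as “the recorded file’s base name joined onto X”, which may not be exactly what chippy’s tryPaths() does. The resolveSource and fileExists names are hypothetical, and the existence check is passed in as a function so the ordering can be exercised without touching the disk.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveSource tries each candidate location in order and returns the first hit.
// recorded is the path stored in the .dbg file.
func resolveSource(recorded, dbgDir, cwd, romDir string, exists func(string) bool) (string, bool) {
	base := filepath.Base(recorded)
	candidates := []string{
		recorded,                    // 1. verbatim path from the .dbg
		filepath.Join(dbgDir, base), // 2. next to the .dbg file
		filepath.Join(cwd, base),    // 3. current working directory
		filepath.Join(romDir, base), // 4. next to the ROM
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists is the production-style check: a regular file that can be stat'ed.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	path, ok := resolveSource("/no/such/dir/hello-bg.s", ".", ".", ".", fileExists)
	fmt.Println(path, ok)
}
```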

Bug 4 — the breakpoint sync I missed

There’s a fourth bug v1.1.1 fixed that I want to mention because it’s the kind of thing you don’t notice until you’re three hours into using your own tool:

When the user typed :bp $C123 to set a breakpoint, the TUI added it to its local set and the panels rendered it. But the remote server didn’t know about it. The next continue ran straight past $C123 and the user was left staring at the wrong frame, wondering why their breakpoint didn’t fire.

The fix is to forward the breakpoint set to the remote on every mutation: :bp sets it, :run (which the user hits before continue) syncs the full set as a clean state, :bd deletes. The DAP server then short-circuits the hit check on its side without needing a TUI round-trip. This becomes important when conditional breakpoints land in v1.3.0 (the condition evaluates on the server, against fresh state), but the wiring was put in here.
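
A minimal sketch of “forward on every mutation”, assuming a hypothetical Remote interface with a single SetBreakpoints call. Sending the complete list each time fits how DAP’s breakpoint requests behave, since they replace the previous set rather than patching it. None of these names are from the chippy source.

```go
package main

import (
	"fmt"
	"sort"
)

// Remote is a stand-in for the DAP client side of breakpoint syncing.
type Remote interface {
	SetBreakpoints(addrs []uint16) error
}

// Breakpoints is the TUI-side set; every mutation is pushed to the remote.
type Breakpoints struct {
	set    map[uint16]bool
	remote Remote
}

func NewBreakpoints(r Remote) *Breakpoints {
	return &Breakpoints{set: map[uint16]bool{}, remote: r}
}

// Add corresponds to ":bp $ADDR".
func (b *Breakpoints) Add(addr uint16) error {
	b.set[addr] = true
	return b.Sync()
}

// Delete corresponds to ":bd".
func (b *Breakpoints) Delete(addr uint16) error {
	delete(b.set, addr)
	return b.Sync()
}

// Sync sends the full, sorted set so the remote always holds a clean copy.
func (b *Breakpoints) Sync() error {
	addrs := make([]uint16, 0, len(b.set))
	for a := range b.set {
		addrs = append(addrs, a)
	}
	sort.Slice(addrs, func(i, j int) bool { return addrs[i] < addrs[j] })
	return b.remote.SetBreakpoints(addrs)
}

// recorder is a fake remote that remembers the last set it was sent.
type recorder struct{ last []uint16 }

func (r *recorder) SetBreakpoints(addrs []uint16) error {
	r.last = addrs
	return nil
}

func main() {
	rec := &recorder{}
	bps := NewBreakpoints(rec)
	bps.Add(0xC123)
	bps.Add(0x8000)
	bps.Delete(0xC123)
	fmt.Printf("remote now holds %d breakpoint(s)\n", len(rec.last))
}
```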

(Figure: chippy TUI, setting a breakpoint and running to it.)

What v1.1.1 actually means

If v1.1.0 is the protocol release, v1.1.1 is the “the protocol is now usable across a process boundary” release. The actual diff is small (maybe two hundred lines once you exclude tests), but every one of the four bugs is a case of “we shipped a thing that worked in tests but didn’t survive a single end-to-end run against a real second process.” That’s a category of bug I want to be slightly paranoid about for the rest of this project.

Specifically, the rule I now hold: for any feature that involves two processes, the first integration test must be a real exec.Command of one binary attaching to another over a real socket, and it must assert state on both sides. Mock pipes are fine for unit tests. They lie about exactly the failure modes you want to catch.

cmd/nessy/wait_for_debugger_test.go is that test now. It spawns a real nessy binary, dials its TCP listener, attaches a real DAP client, and asserts that the PC is at the reset vector — the exact sequence of events that the -wait-for-debugger boot gate guarantees. If anything in the launcher chain breaks, that test goes red.

What v1.1 sets up

Three things in this release are bigger than they look.

The DAP server is now the extension seam. Every later debugger feature lands as a new DAP request or event — conditional breakpoints in v1.3, a custom-request escape hatch in v1.4, an in-process zero-marshal transport plus a live-state streaming event in v1.5, dirty-region memory streaming in v1.6. The protocol was the right place to invest.

The TUI is no longer the only frontend. It is a frontend. The same Source interface that lets the TUI attach over a wire is the lever I’ll pull to migrate panels to read through DAP even in local mode. By v1.5, the Registers panel renders from Source.Registers() over an in-process DAP server, and I have a working prototype of the v2.0 architecture: TUI as a DAP client, full stop, no special-case for local execution.

The NES variant is a doorway. v1.2 will carve nessy out into its own repository. v1.5 will introduce host debug hooks so nessy can layer NES-aware breakpoints (scanline == 30), step granularities (run-to-NMI), and custom debug-state channels (PPU/OAM/mapper inspectors) on top of chippy’s protocol without forking it. That whole arc starts with the one-line VariantNES decision in this release.

Next week: v1.2.0. The library gets promoted out of internal/. The per-cycle CPU↔PPU interleave lands so every Blargg accuracy ROM passes. And nessy moves out of the monorepo and into its own repository.

The 6502 is the easy part. The work around it has its own gravity.