Captures up-front design decisions for RDBMS Playground: stack (Rust + Ratatui + SQLite), input modes, project file format, type vocabulary, undo snapshots and replay log, sharing/export, and testing approach. ADR-0000 establishes the ADR practice itself and mandates index upkeep alongside any ADR change.
ADR-0008: Testing approach
Status
Accepted
Context
The project's working standards (CLAUDE.md, plus the user's
global testing rules) require:
- Test coverage established before changes.
- Bugs reproduced with failing tests before fixes.
- Integration tests that exercise the full stack a user touches.
- "All green, zero skips" as the only acceptable end state.
A TUI application needs a credible testing story for those rules to be enforceable. Naïvely, "TUI testing" sounds like an afterthought of manual screenshots. In practice, the Rust + Ratatui ecosystem supports automation at every level that matters, and we want to commit to that explicitly rather than discover it ad hoc.
Decision
Testing is structured in four tiers. Tiers 1–3 run on every commit in CI; Tier 4 runs on every commit for a focused subset of critical flows and on a nightly schedule for broader coverage.
Tier 1 — Pure-logic unit tests
Standard cargo test against modules with no terminal
dependency:
- DSL parser and command dispatcher
- Type mapping (user-facing → SQLite STRICT types)
- Project I/O (project.yaml + data/<table>.csv round-tripping)
- Database rebuild from authoritative text sources
- Snapshot ring buffer logic (ADR-0006)
- Replay log writer/reader
These are the bulk of the test count. Every behavioural unit has
unit tests; modules with non-trivial logic also have property
tests via proptest where the input space justifies it (parser
inputs, type coercion, CSV escaping, etc.).
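As a rough illustration of the round-trip property such tests would check for CSV escaping, here is a minimal stdlib-only sketch. The function names (escape_field, unescape_field, round_trips) are hypothetical, the RFC 4180-style quoting rule is an assumption rather than the project's confirmed format, and a fixed list of samples stands in for the generated inputs proptest would supply.

```rust
/// Escape one CSV field, assuming RFC 4180-style quoting:
/// fields containing a comma, quote, or line break are wrapped in
/// double quotes, and embedded quotes are doubled.
fn escape_field(field: &str) -> String {
    let needs_quotes = field
        .chars()
        .any(|c| matches!(c, ',' | '"' | '\n' | '\r'));
    if needs_quotes {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Inverse of `escape_field`. Returns None when a quoted field is
/// missing its closing quote.
fn unescape_field(raw: &str) -> Option<String> {
    if !raw.starts_with('"') {
        return Some(raw.to_string());
    }
    if raw.len() < 2 || !raw.ends_with('"') {
        return None;
    }
    // The outer quotes are single-byte ASCII, so slicing by byte is safe.
    let inner = &raw[1..raw.len() - 1];
    Some(inner.replace("\"\"", "\""))
}

/// The property under test: escaping then unescaping is the identity.
fn round_trips(field: &str) -> bool {
    unescape_field(&escape_field(field)).as_deref() == Some(field)
}

#[test]
fn csv_fields_round_trip() {
    // Hand-picked edge cases; a proptest strategy would generate these.
    let samples = ["", "plain", "a,b", "say \"hi\"", "\"", "\"\"", "line\nbreak"];
    for sample in samples {
        assert!(round_trips(sample), "round trip failed for {:?}", sample);
    }
}
```

With proptest in place, the same round_trips predicate could be asserted over arbitrary strings instead of a fixed sample list.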
Tier 2 — Render assertions via Ratatui TestBackend
Ratatui's in-tree TestBackend renders into a Buffer (a 2D
cell grid). Tests build an app state, render a frame, and assert
on the resulting buffer.
- Cell-level assertions for narrow tests ("the status bar shows mode label Simple in the expected style").
- Snapshot tests via insta for whole-frame coverage of representative views (default screen, query result table, schema view, undo confirmation prompt). Snapshots are checked in and reviewed on diff.
Snapshot discipline:
- Snapshots cover stable, intentional UI surfaces; they are not added reflexively to every component.
- A snapshot diff is treated as a real review item, not a rubber-stamp. Reviewers must confirm the change is intended.
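To make the shape of a cell-level assertion concrete, the sketch below uses a hand-rolled Grid of styled cells as a stand-in for the Buffer that TestBackend produces. Everything in it is hypothetical (Grid, Cell, Color, Mode, render_status_bar, the Advanced mode name, and the chosen colours); it is not Ratatui's API, only the assertion pattern a Tier 2 test would follow.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Reset,
    Green,
    Yellow,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Cell {
    ch: char,
    fg: Color,
    bold: bool,
}

/// Stand-in for a rendered frame: a fixed-size grid of styled cells.
struct Grid {
    width: usize,
    height: usize,
    cells: Vec<Cell>,
}

impl Grid {
    fn new(width: usize, height: usize) -> Self {
        let blank = Cell { ch: ' ', fg: Color::Reset, bold: false };
        Grid { width, height, cells: vec![blank; width * height] }
    }

    fn cell(&self, x: usize, y: usize) -> Cell {
        self.cells[y * self.width + x]
    }

    /// Write styled text starting at (x, y), clipping at the grid edge.
    fn put_str(&mut self, x: usize, y: usize, text: &str, fg: Color, bold: bool) {
        for (i, ch) in text.chars().enumerate() {
            if x + i < self.width && y < self.height {
                self.cells[y * self.width + x + i] = Cell { ch, fg, bold };
            }
        }
    }

    fn row_text(&self, y: usize) -> String {
        (0..self.width).map(|x| self.cell(x, y).ch).collect()
    }
}

#[derive(Clone, Copy)]
enum Mode {
    Simple,
    Advanced, // assumed name for the non-simple mode
}

/// Hypothetical status-bar renderer: bold mode label on the bottom row.
fn render_status_bar(grid: &mut Grid, mode: Mode) {
    let (label, fg) = match mode {
        Mode::Simple => ("Simple", Color::Green),
        Mode::Advanced => ("Advanced", Color::Yellow),
    };
    let y = grid.height - 1;
    grid.put_str(1, y, label, fg, true);
}

#[test]
fn status_bar_shows_simple_label_in_expected_style() {
    let mut grid = Grid::new(20, 3);
    render_status_bar(&mut grid, Mode::Simple);
    assert_eq!(grid.row_text(2).trim(), "Simple");
    for x in 1..=6 {
        let cell = grid.cell(x, 2);
        assert_eq!((cell.fg, cell.bold), (Color::Green, true));
    }
}
```

A narrow test like this pins down one label and one style, which keeps it stable when unrelated parts of the frame change; whole-frame coverage is left to snapshots.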
Tier 3 — Synthetic event-loop integration tests
The application's update function consumes
crossterm::event::Event values. Tests feed sequences of
synthetic events (KeyEvent, MouseEvent, resize) to the update
function, then render via TestBackend and assert on both
state and buffer.
This tier is the equivalent of react-testing-library for our
TUI: it exercises the full input → state → render path without a
real terminal, and is where the most valuable behavioural tests
live. Examples:
- Typing a DSL command in simple mode, submitting with Ctrl-Enter, asserting the table list updates and the schema view re-renders.
- Triggering undo, asserting the confirmation prompt appears with the expected timestamp and change summary, confirming, asserting state restoration.
- Switching modes and verifying the prompt label and border colour change.
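The first example above might look roughly like the following sketch. To stay self-contained it uses a hypothetical Event enum in place of crossterm::event::Event, a plain list of strings in place of the TestBackend buffer, and an invented "create table <name>" command in place of the real DSL; App, update, render, and type_text are likewise illustrative names.

```rust
/// Stand-in for crossterm's event type, reduced to what the sketch needs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Char(char),
    Backspace,
    Submit, // stands in for Ctrl-Enter
    Resize(u16, u16),
}

#[derive(Debug, Default)]
struct App {
    input: String,
    tables: Vec<String>,
    size: (u16, u16),
}

/// The update function: the only place where events mutate state.
fn update(app: &mut App, event: Event) {
    match event {
        Event::Char(c) => app.input.push(c),
        Event::Backspace => {
            app.input.pop();
        }
        Event::Resize(w, h) => app.size = (w, h),
        Event::Submit => {
            // Hypothetical DSL command: "create table <name>".
            if let Some(name) = app.input.strip_prefix("create table ") {
                let name = name.trim();
                if !name.is_empty() {
                    app.tables.push(name.to_string());
                }
            }
            app.input.clear();
        }
    }
}

/// Stand-in renderer: one line for the table list, one for the prompt.
fn render(app: &App) -> Vec<String> {
    vec![
        format!("tables: {}", app.tables.join(", ")),
        format!("> {}", app.input),
    ]
}

/// Turn a string into the key events a user would produce typing it.
fn type_text(text: &str) -> Vec<Event> {
    text.chars().map(Event::Char).collect()
}

#[test]
fn submitting_create_table_updates_table_list() {
    let mut app = App::default();
    let mut events = type_text("create table userz");
    events.extend([Event::Backspace, Event::Char('s'), Event::Submit]);
    for event in events {
        update(&mut app, event);
    }
    // Assert on state and on the rendered output, as the tier requires.
    assert_eq!(app.tables, vec!["users"]);
    assert_eq!(render(&app), vec!["tables: users", "> "]);
}
```

Because update takes events as plain values, the test needs no terminal and no timing assumptions, which is why this tier can run on every commit.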
Tier 4 — PTY-based black-box end-to-end
A small number of critical flows are exercised against the actual built binary in a pseudo-terminal:
- Tooling: portable-pty for the PTY, expectrl for expect-style scripting, vt100 to parse the terminal output stream into an inspectable cell grid.
- These tests catch issues the lower tiers miss: TTY setup, signal handling, terminal mode transitions, real I/O timing.
Tier 4 is reserved for the highest-value flows, not blanket coverage. The initial scope is:
- Cold launch → first DDL command → graceful quit.
- Project save → process restart → reopen → identical state.
- Project export → import in a fresh project → rebuilt database matches the source.
- undo immediately after a DROP TABLE, including the confirmation prompt.
Tier 4 tests run in CI on every commit (the focused list above) and on a nightly schedule for any extended coverage.
Tooling commitments
- cargo test: Tier 1, Tier 2, Tier 3.
- proptest: property-based testing for parser and conversion layers.
- Ratatui's TestBackend: frame rendering for tests.
- insta: snapshot testing of rendered buffers.
- portable-pty, expectrl, vt100: Tier 4 PTY-based tests.
- CI matrix covers Linux, macOS, and Windows on stable Rust.
Honest limits
- No cross-terminal-emulator regression coverage. Tier 4 exercises a PTY but not real terminal emulators (xterm, Alacritty, Windows Terminal, etc.). Crossterm abstracts these well in practice; if a real-emulator regression ever surfaces, we will revisit.
- No visual aesthetic checks. Tests assert cell contents and styles, not "this layout is pretty". Visual polish is reviewed manually by humans.
- Snapshot brittleness is a known failure mode. We mitigate by being selective about what gets a snapshot and by treating snapshot diffs as real review items.
Consequences
- Test discipline from CLAUDE.md is enforceable on every layer: parser bugs caught at Tier 1, UI flow bugs caught at Tier 3, real-binary regressions caught at Tier 4.
- Module boundaries are designed for testability from the start (pure logic modules separate from rendering, an explicit update function consuming events).
- CI cost is real but bounded: Tiers 1–3 are fast; Tier 4 is the only slow tier and is kept narrow.
- Adding a feature implies adding tests at the appropriate tier (or tiers); coverage is not retrofitted later.