claude@clouddev1 3a0c03d781 Initial planning docs: CLAUDE.md and ADRs 0000-0008

Captures up-front design decisions for RDBMS Playground:
stack (Rust + Ratatui + SQLite), input modes, project file
format, type vocabulary, undo snapshots and replay log,
sharing/export, and testing approach. ADR-0000 establishes
the ADR practice itself and mandates index upkeep alongside
any ADR change.

2026-05-07 09:27:31 +00:00

5.6 KiB

ADR-0008: Testing approach

Status

Accepted

Context

The project's working standards (CLAUDE.md, plus the user's global testing rules) require:

Test coverage established before changes.
Bugs reproduced with failing tests before fixes.
Integration tests that exercise the full stack a user touches.
"All green, zero skips" as the only acceptable end state.

A TUI application needs a credible testing story for those rules to be enforceable. Naïvely, "TUI testing" sounds like manual screenshots added as an afterthought. In practice, the Rust + Ratatui ecosystem supports automation at every level that matters, and we want to commit to that explicitly rather than discover it ad hoc.

Decision

Testing is structured in four tiers. Tiers 1–3 run on every commit in CI; Tier 4 runs on every commit for a focused subset of critical flows and on a nightly schedule for broader coverage.

Tier 1 — Pure-logic unit tests

Standard cargo test against modules with no terminal dependency:

DSL parser and command dispatcher
Type mapping (user-facing → SQLite STRICT types)
Project I/O (project.yaml + data/<table>.csv round-tripping)
Database rebuild from authoritative text sources
Snapshot ring buffer logic (ADR-0006)
Replay log writer/reader
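As an illustration of what a Tier 1 test looks like, here is a minimal sketch of a bounded snapshot history and its unit test. The `SnapshotRing` type, its method names, and the evict-oldest policy are assumptions made for this example; ADR-0006 defines the real behaviour.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded undo history: keeps the newest `capacity` snapshots.
pub struct SnapshotRing<T> {
    capacity: usize,
    items: VecDeque<T>,
}

impl<T> SnapshotRing<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, items: VecDeque::with_capacity(capacity) }
    }

    /// Stores a snapshot, evicting the oldest one when the ring is full
    /// (assumed policy for this sketch).
    pub fn push(&mut self, snapshot: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(snapshot);
    }

    /// Removes and returns the most recent snapshot (the undo target).
    pub fn pop_latest(&mut self) -> Option<T> {
        self.items.pop_back()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

#[cfg(test)]
mod snapshot_ring_tests {
    use super::*;

    #[test]
    fn oldest_snapshot_is_evicted_when_full() {
        let mut ring = SnapshotRing::new(2);
        ring.push("a");
        ring.push("b");
        ring.push("c"); // evicts "a"
        assert_eq!(ring.len(), 2);
        assert_eq!(ring.pop_latest(), Some("c"));
        assert_eq!(ring.pop_latest(), Some("b"));
        assert_eq!(ring.pop_latest(), None);
    }
}
```

Nothing here touches a terminal or a database, which is what keeps tests of this kind fast enough to run on every commit.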

These make up the bulk of the test count. Every behavioural unit has unit tests; modules with non-trivial logic also have property tests via proptest where the input space justifies it (parser inputs, type coercion, CSV escaping, etc.).
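The round-trip property for CSV escaping can be sketched with the standard library alone. The functions `escape_field`, `unescape_field`, and `all_strings` are hypothetical names for this example, and the exhaustive enumeration over a small alphabet stands in for proptest, which would generate and shrink inputs instead.

```rust
/// Quotes a CSV field when it contains a comma, quote, or line break
/// (RFC 4180 style); embedded quotes are doubled.
pub fn escape_field(raw: &str) -> String {
    if raw.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", raw.replace('"', "\"\""))
    } else {
        raw.to_string()
    }
}

/// Inverse of `escape_field` for a single, complete field.
/// Returns `None` when an opening quote has no closing quote.
pub fn unescape_field(field: &str) -> Option<String> {
    match field.strip_prefix('"') {
        None => Some(field.to_string()),
        Some(rest) => {
            let inner = rest.strip_suffix('"')?;
            Some(inner.replace("\"\"", "\""))
        }
    }
}

/// Every string of length <= `max_len` over `alphabet`.
pub fn all_strings(alphabet: &[char], max_len: usize) -> Vec<String> {
    let mut out = vec![String::new()];
    let mut frontier = vec![String::new()];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for prefix in &frontier {
            for &c in alphabet {
                let mut s = prefix.clone();
                s.push(c);
                next.push(s);
            }
        }
        out.extend(next.iter().cloned());
        frontier = next;
    }
    out
}

#[cfg(test)]
mod csv_roundtrip_tests {
    use super::*;

    #[test]
    fn escape_then_unescape_is_identity() {
        // The alphabet is chosen to hit every escaping rule.
        for raw in all_strings(&['a', ',', '"', '\n'], 4) {
            let escaped = escape_field(&raw);
            assert_eq!(unescape_field(&escaped), Some(raw));
        }
    }
}
```

The property is stated once ("unescape after escape returns the input") and checked against many inputs, rather than listing individual cases by hand.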

Tier 2 — Render assertions via Ratatui `TestBackend`

Ratatui's in-tree TestBackend renders into a Buffer (a 2D cell grid). Tests build an app state, render a frame, and assert on the resulting buffer.

Cell-level assertions for narrow tests ("the status bar shows mode label Simple in the expected style").
Snapshot tests via insta for whole-frame coverage of representative views (default screen, query result table, schema view, undo confirmation prompt). Snapshots are checked in and reviewed on diff.
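The shape of a cell-level assertion can be shown without any dependencies. In this sketch, `Grid` and `Cell` are simplified stand-ins for the buffer that TestBackend exposes, and `render_status_bar` is a hypothetical view function. A real Tier 2 test would render through Ratatui rather than this hand-rolled grid.

```rust
/// Stand-in for a terminal cell: a symbol plus one style flag.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cell {
    pub symbol: char,
    pub bold: bool,
}

/// Stand-in for a render buffer: `width * height` cells, row-major.
pub struct Grid {
    pub width: usize,
    pub height: usize,
    cells: Vec<Cell>,
}

impl Grid {
    pub fn new(width: usize, height: usize) -> Self {
        let blank = Cell { symbol: ' ', bold: false };
        Self { width, height, cells: vec![blank; width * height] }
    }

    pub fn cell(&self, x: usize, y: usize) -> Cell {
        self.cells[y * self.width + x]
    }

    /// Writes `text` starting at (x, y), clipping at the right edge.
    pub fn put_str(&mut self, x: usize, y: usize, text: &str, bold: bool) {
        for (i, symbol) in text.chars().enumerate() {
            if x + i >= self.width {
                break;
            }
            self.cells[y * self.width + x + i] = Cell { symbol, bold };
        }
    }

    pub fn row_text(&self, y: usize) -> String {
        (0..self.width).map(|x| self.cell(x, y).symbol).collect()
    }
}

/// Hypothetical view function: draws the mode label on the bottom row.
pub fn render_status_bar(grid: &mut Grid, mode: &str) {
    let y = grid.height - 1;
    grid.put_str(0, y, "mode: ", false);
    grid.put_str(6, y, mode, true);
}

#[cfg(test)]
mod status_bar_tests {
    use super::*;

    #[test]
    fn status_bar_shows_mode_label_in_bold() {
        let mut grid = Grid::new(20, 3);
        render_status_bar(&mut grid, "Simple");
        assert_eq!(grid.row_text(2).trim_end(), "mode: Simple");
        assert!(grid.cell(6, 2).bold); // first cell of the label
        assert!(!grid.cell(0, 2).bold); // the "mode: " prefix is plain
    }
}
```

The test asserts on both content and style of specific cells, which is what distinguishes a narrow cell-level test from a whole-frame snapshot.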

Snapshot discipline:

Snapshots cover stable, intentional UI surfaces; they are not added reflexively to every component.
A snapshot diff is treated as a real review item, not a rubber stamp. Reviewers must confirm the change is intended.
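To make "reviewed on diff" concrete, the sketch below compares a stored frame with a freshly rendered one and reports only the rows that changed. `diff_rows` is a hypothetical helper written for this example, and the frame contents are made up. In the project itself, insta would handle storing, comparing, and reviewing snapshots.

```rust
/// Returns (row index, expected row, actual row) for each row that differs.
/// A missing row on either side is treated as an empty string.
pub fn diff_rows(expected: &str, actual: &str) -> Vec<(usize, String, String)> {
    let exp: Vec<&str> = expected.lines().collect();
    let act: Vec<&str> = actual.lines().collect();
    (0..exp.len().max(act.len()))
        .filter_map(|i| {
            let e = exp.get(i).copied().unwrap_or("");
            let a = act.get(i).copied().unwrap_or("");
            (e != a).then(|| (i, e.to_string(), a.to_string()))
        })
        .collect()
}

#[cfg(test)]
mod frame_diff_tests {
    use super::*;

    #[test]
    fn only_changed_rows_are_reported() {
        // Illustrative frames, not real application output.
        let stored = "Tables\nusers\n";
        let rendered = "Tables\nusers\norders\n";
        let diff = diff_rows(stored, rendered);
        assert_eq!(diff, vec![(2, String::new(), "orders".to_string())]);
    }
}
```

A reviewer looking at such a diff has to decide whether the new row is an intended UI change or a regression; an empty diff means the surface is unchanged.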

Tier 3 — Synthetic event-loop integration tests

The application's update function consumes crossterm::event::Event values. Tests feed sequences of synthetic events (KeyEvent, MouseEvent, resize) to the update function, then render via TestBackend and assert on both state and buffer.

This tier is the equivalent of react-testing-library for our TUI: it exercises the full input → state → render path without a real terminal, and is where the most valuable behavioural tests live. Examples:

Typing a DSL command in simple mode, submitting with Ctrl-Enter, asserting the table list updates and the schema view re-renders.
Triggering undo, asserting the confirmation prompt appears with the expected timestamp and change summary, confirming, asserting state restoration.
Switching modes and verifying the prompt label and border colour change.
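The input → state → render path can be sketched as follows. The `Event` and `Key` enums are reduced stand-ins for the crossterm event types, and `App`, `update`, `type_text`, and `render_prompt` are hypothetical names for this example. The command text is a placeholder, not the project's DSL.

```rust
/// Stand-ins for crossterm's key and event types, reduced to what this
/// sketch needs.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Key {
    Char(char),
    Backspace,
    CtrlEnter,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Event {
    Key(Key),
    Resize(u16, u16),
}

#[derive(Debug, Default, PartialEq)]
pub struct App {
    pub input: String,
    pub submitted: Vec<String>,
    pub size: (u16, u16),
}

/// Hypothetical update function: the only place events mutate state.
pub fn update(app: &mut App, event: Event) {
    match event {
        Event::Key(Key::Char(c)) => app.input.push(c),
        Event::Key(Key::Backspace) => {
            app.input.pop();
        }
        Event::Key(Key::CtrlEnter) => {
            if !app.input.is_empty() {
                app.submitted.push(std::mem::take(&mut app.input));
            }
        }
        Event::Resize(w, h) => app.size = (w, h),
    }
}

/// Test helper: one key event per character of `text`.
pub fn type_text(text: &str) -> Vec<Event> {
    text.chars().map(|c| Event::Key(Key::Char(c))).collect()
}

/// Hypothetical one-line view, standing in for a TestBackend render.
pub fn render_prompt(app: &App) -> String {
    format!("> {}", app.input)
}

#[cfg(test)]
mod event_loop_tests {
    use super::*;

    #[test]
    fn typed_command_is_submitted_and_prompt_clears() {
        let mut app = App::default();
        let mut events = type_text("create tablx"); // placeholder command
        events.push(Event::Key(Key::Backspace));
        events.push(Event::Key(Key::Char('e')));
        events.push(Event::Key(Key::CtrlEnter));
        for event in events {
            update(&mut app, event);
        }
        assert_eq!(app.submitted, vec!["create table".to_string()]);
        assert_eq!(render_prompt(&app), "> ");
    }
}
```

Because `update` is a plain function over state and events, the test needs no terminal, no threads, and no timing assumptions. It asserts on state and on rendered output in the same place.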

Tier 4 — PTY-based black-box end-to-end

A small number of critical flows are exercised against the actual built binary in a pseudo-terminal:

Tooling: portable-pty for the PTY, expectrl for expect-style scripting, vt100 to parse the terminal output stream into an inspectable cell grid.
These tests catch issues the lower tiers miss: TTY setup, signal handling, terminal mode transitions, real I/O timing.

Tier 4 is reserved for the highest-value flows, not blanket coverage. The initial scope is:

Cold launch → first DDL command → graceful quit.
Project save → process restart → reopen → identical state.
Project export → import in a fresh project → rebuilt database matches the source.
An undo issued immediately after a DROP TABLE, including the confirmation prompt.

Tier 4 tests run in CI on every commit (the focused list above) and on a nightly schedule for any extended coverage.

Tooling commitments

cargo test — Tier 1, Tier 2, Tier 3.
proptest — property-based testing for parser and conversion layers.
Ratatui's TestBackend — frame rendering for tests.
insta — snapshot testing of rendered buffers.
portable-pty, expectrl, vt100 — Tier 4 PTY-based tests.
CI matrix covers Linux, macOS, and Windows on stable Rust.

Honest limits

No cross-terminal-emulator regression coverage. Tier 4 exercises a PTY but not real terminal emulators (xterm, Alacritty, Windows Terminal, etc.). Crossterm abstracts these well in practice; if a real-emulator regression ever surfaces, we will revisit.
No visual aesthetic checks. Tests assert cell contents and styles, not "this layout is pretty". Visual polish is reviewed manually.
Snapshot brittleness is a known failure mode. We mitigate by being selective about what gets a snapshot and by treating snapshot diffs as real review items.

Consequences

Test discipline from CLAUDE.md is enforceable on every layer: parser bugs are caught at Tier 1, rendering regressions at Tier 2, UI flow bugs at Tier 3, and real-binary regressions at Tier 4.
Module boundaries are designed for testability from the start (pure logic modules separate from rendering, an explicit update function consuming events).
CI cost is real but bounded: Tiers 1–3 are fast; Tier 4 is the only slow tier and is kept narrow.
Adding a feature implies adding tests at the appropriate tier (or tiers); coverage is not retrofitted later.
