# ADR-0008: Testing approach

## Status

Accepted

## Context

The project's working standards (`CLAUDE.md`, plus the user's global testing rules) require:

- Test coverage established before changes.
- Bugs reproduced with failing tests before fixes.
- Integration tests that exercise the full stack a user touches.
- "All green, zero skips" as the only acceptable end state.

A TUI application needs a credible testing story for those rules to be enforceable. Naïvely, "TUI testing" sounds like an afterthought: a few manual screenshots. In practice, the Rust + Ratatui ecosystem supports automated testing at every level that matters (pure logic, rendered frames, input handling, and the real binary), and we want to commit to that explicitly rather than discover it ad hoc.

## Decision

Testing is structured in four tiers. Tiers 1–3 run in CI on every commit; Tier 4 runs on every commit for a focused subset of critical flows and on a nightly schedule for broader coverage.

### Tier 1: Pure-logic unit tests

Standard `cargo test` against modules with no terminal dependency:

- DSL parser and command dispatcher
- Type mapping (user-facing → SQLite STRICT types)
- Project I/O (`project.yaml` + `data/.csv` round-tripping)
- Database rebuild from authoritative text sources
- Snapshot ring buffer logic (ADR-0006)
- Replay log writer/reader

These make up the bulk of the test count. Every behavioural unit has unit tests; modules with non-trivial logic also get property tests via `proptest` where the input space justifies it (parser inputs, type coercion, CSV escaping, etc.).

### Tier 2: Render assertions via Ratatui `TestBackend`

Ratatui's built-in `TestBackend` renders into a `Buffer` (a 2D grid of cells). Tests build an app state, render a frame, and assert on the resulting buffer.

- **Cell-level assertions** for narrow tests ("the status bar shows the mode label `Simple` in the expected style").
- **Snapshot tests** via `insta` for whole-frame coverage of representative views (default screen, query result table, schema view, undo confirmation prompt).
  Snapshots are checked in and reviewed whenever they change.

Snapshot discipline:

- Snapshots cover stable, intentional UI surfaces; they are not added reflexively to every component.
- A snapshot diff is treated as a real review item, not a rubber stamp. Reviewers must confirm the change is intended.

### Tier 3: Synthetic event-loop integration tests

The application's update function consumes `crossterm::event::Event` values. Tests feed sequences of synthetic events (`KeyEvent`, `MouseEvent`, `Event::Resize`) to the update function, then render via `TestBackend` and assert on both state and buffer.

This tier is our TUI's equivalent of `react-testing-library`: it exercises the full input → state → render path without a real terminal, and it is where the most valuable behavioural tests live. Examples:

- Typing a DSL command in simple mode, submitting with Ctrl-Enter, and asserting that the table list updates and the schema view re-renders.
- Triggering `undo`, asserting that the confirmation prompt appears with the expected timestamp and change summary, then confirming and asserting that state is restored.
- Switching modes and verifying that the prompt label and border colour change.

### Tier 4: PTY-based black-box end-to-end tests

A small number of critical flows are exercised against the **actual built binary** running in a pseudo-terminal.

- Tooling: `portable-pty` for the PTY, `expectrl` for expect-style scripting, and `vt100` to parse the terminal output stream into an inspectable cell grid.
- These tests catch issues the lower tiers miss: TTY setup, signal handling, terminal mode transitions, and real I/O timing.

Tier 4 is **reserved for the highest-value flows**, not blanket coverage. The initial scope is:

- Cold launch → first DDL command → graceful quit.
- Project save → process restart → reopen → identical state.
- Project export → import in a fresh project → rebuilt database matches the source.
- `undo` immediately after a `DROP TABLE`, including the confirmation prompt.
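The export → import flow relies on the same round-trip guarantee that Tier 1 checks for project I/O and CSV escaping. A minimal std-only sketch of that unit-level check follows, assuming RFC 4180-style quoting. `escape_field`, `write_record`, `parse_record`, and `round_trips` are hypothetical stand-ins for the project's real CSV layer, and a fixed list of awkward inputs stands in for the generators `proptest` would supply.

```rust
// Std-only sketch: the helpers below are hypothetical stand-ins for the
// project's real CSV layer, assuming RFC 4180-style quoting.

/// Quote a field if it contains a comma, quote, CR, or LF; double embedded quotes.
fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Join escaped fields into one record (no trailing newline).
fn write_record(fields: &[&str]) -> String {
    fields.iter().map(|f| escape_field(f)).collect::<Vec<_>>().join(",")
}

/// Parse one record, honouring quoted fields that may contain commas,
/// doubled quotes, and line breaks.
fn parse_record(record: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = record.chars().peekable();
    while let Some(c) = chars.next() {
        if in_quotes {
            if c != '"' {
                current.push(c);
            } else if chars.peek() == Some(&'"') {
                current.push('"'); // doubled quote = one literal quote
                chars.next();
            } else {
                in_quotes = false; // closing quote
            }
        } else if c == '"' {
            in_quotes = true; // opening quote
        } else if c == ',' {
            fields.push(std::mem::take(&mut current)); // field separator
        } else {
            current.push(c);
        }
    }
    fields.push(current);
    fields
}

/// The property: writing then parsing a non-empty record yields the original
/// fields. (An empty record cannot round-trip: "" parses as one empty field.)
fn round_trips(fields: &[&str]) -> bool {
    let parsed = parse_record(&write_record(fields));
    parsed.iter().map(String::as_str).eq(fields.iter().copied())
}

fn main() {
    // Fixed awkward inputs; a real suite would let `proptest` generate them.
    let cases: Vec<Vec<&str>> = vec![
        vec!["plain", "text"],
        vec!["has,comma", "has \"quotes\""],
        vec!["multi\nline", "crlf\r\nend"],
        vec!["", ""],
        vec!["\"", ",", "\"\""],
    ];
    for fields in &cases {
        assert!(round_trips(fields), "round-trip failed for {fields:?}");
    }
    println!("all {} cases round-trip", cases.len());
}
```

With `proptest` in place, the fixed `cases` list would become a strategy over non-empty vectors of arbitrary strings, and `round_trips` would be the property body.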
Tier 4 tests run in CI on every commit (the focused list above) and on a nightly schedule for any extended coverage.

## Tooling commitments

- `cargo test`: Tiers 1, 2, and 3.
- `proptest`: property-based testing for the parser and conversion layers.
- Ratatui's `TestBackend`: frame rendering for tests.
- `insta`: snapshot testing of rendered buffers.
- `portable-pty`, `expectrl`, `vt100`: Tier 4 PTY-based tests.
- The CI matrix covers Linux, macOS, and Windows on stable Rust.

## Honest limits

- **No cross-terminal-emulator regression coverage.** Tier 4 exercises a PTY but not real terminal emulators (xterm, Alacritty, Windows Terminal, etc.). Crossterm abstracts over their differences well in practice; if a real-emulator regression ever surfaces, we will revisit this.
- **No visual aesthetic checks.** Tests assert cell contents and styles, not "this layout is pretty". Visual polish is reviewed manually.
- **Snapshot brittleness** is a known failure mode. We mitigate it by being selective about what gets a snapshot and by treating snapshot diffs as real review items.

## Consequences

- The test discipline from `CLAUDE.md` is enforceable at every layer: parser bugs are caught at Tier 1, rendering regressions at Tier 2, UI flow bugs at Tier 3, and real-binary regressions at Tier 4.
- Module boundaries are designed for testability from the start (pure logic modules separate from rendering, an explicit update function consuming events).
- CI cost is real but bounded: Tiers 1–3 are fast; Tier 4 is the only slow tier and is kept narrow.
- Adding a feature means adding tests at the appropriate tier (or tiers); coverage is not retrofitted later.
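The explicit update function consuming events is what makes Tier 3 cheap. Below is a minimal std-only sketch of that shape, under simplifying assumptions. The `Event` enum stands in for `crossterm::event::Event`. `render` returns one `String` per row instead of filling a ratatui `Buffer` through `TestBackend`. `App`, `Mode::Advanced`, and the key bindings are all hypothetical. The test in `main` follows the Tier 3 recipe: feed synthetic events, render, then assert on both state and cells.

```rust
// Hypothetical stand-ins: real code would consume crossterm events and draw
// into a ratatui `Buffer` via `TestBackend`.

/// Simplified stand-in for `crossterm::event::Event`.
#[derive(Debug, Clone, Copy)]
enum Event {
    Char(char),
    Backspace,
    Submit,     // stands in for Ctrl-Enter
    ToggleMode, // hypothetical mode-switch key
    Resize(u16, u16),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Simple,
    Advanced, // hypothetical name for the second mode
}

struct App {
    mode: Mode,
    input: String,        // text currently in the prompt
    history: Vec<String>, // submitted commands
    size: (u16, u16),     // terminal width and height
}

impl App {
    fn new(width: u16, height: u16) -> Self {
        App { mode: Mode::Simple, input: String::new(), history: Vec::new(), size: (width, height) }
    }

    /// Pure state transition: no terminal I/O, so tests can call it directly.
    fn update(&mut self, event: Event) {
        match event {
            Event::Char(c) => self.input.push(c),
            Event::Backspace => {
                self.input.pop();
            }
            Event::Submit => {
                // Ignore empty submissions; otherwise move the input into history.
                if !self.input.is_empty() {
                    self.history.push(std::mem::take(&mut self.input));
                }
            }
            Event::ToggleMode => {
                self.mode = match self.mode {
                    Mode::Simple => Mode::Advanced,
                    Mode::Advanced => Mode::Simple,
                }
            }
            Event::Resize(width, height) => self.size = (width, height),
        }
    }
}

/// Clip or pad `text` to exactly `width` characters, like a terminal row.
fn fit(text: &str, width: usize) -> String {
    let mut row: String = text.chars().take(width).collect();
    while row.chars().count() < width {
        row.push(' ');
    }
    row
}

/// Stand-in for drawing a frame: row 0 is the prompt, the last row the status bar.
fn render(app: &App) -> Vec<String> {
    let (width, height) = (app.size.0 as usize, app.size.1 as usize);
    let mut rows = vec![" ".repeat(width); height];
    if height == 0 {
        return rows;
    }
    rows[0] = fit(&format!("> {}", app.input), width);
    let label = match app.mode {
        Mode::Simple => "Simple",
        Mode::Advanced => "Advanced",
    };
    rows[height - 1] = fit(&format!("[{}] {} submitted", label, app.history.len()), width);
    rows
}

fn main() {
    let mut app = App::new(30, 3);
    // Feed a synthetic event sequence (the command text is arbitrary).
    for c in "create table tt".chars() {
        app.update(Event::Char(c));
    }
    app.update(Event::Backspace);
    app.update(Event::Submit);
    app.update(Event::ToggleMode);

    // Assert on state and on the rendered cells.
    let frame = render(&app);
    assert_eq!(app.history, vec!["create table t".to_string()]);
    assert_eq!(frame[0].trim_end(), ">");
    assert!(frame[2].starts_with("[Advanced] 1 submitted"));

    // A resize changes the frame geometry and clips the status bar.
    app.update(Event::Resize(12, 2));
    let small = render(&app);
    assert_eq!(small.len(), 2);
    assert_eq!(small[1], "[Advanced] 1");
}
```

Because `update` never touches a terminal, the same event sequence can drive state assertions, cell assertions, or an `insta` snapshot of the rendered frame.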