rdbms-playground/docs/adr/0008-testing-approach.md
claude@clouddev1 3a0c03d781 Initial planning docs: CLAUDE.md and ADRs 0000-0008
Captures up-front design decisions for RDBMS Playground:
stack (Rust + Ratatui + SQLite), input modes, project file
format, type vocabulary, undo snapshots and replay log,
sharing/export, and testing approach. ADR-0000 establishes
the ADR practice itself and mandates index upkeep alongside
any ADR change.
2026-05-07 09:27:31 +00:00

# ADR-0008: Testing approach
## Status
Accepted
## Context
The project's working standards (`CLAUDE.md`, plus the user's
global testing rules) require:
- Test coverage established before changes.
- Bugs reproduced with failing tests before fixes.
- Integration tests that exercise the full stack a user touches.
- "All green, zero skips" as the only acceptable end state.
A TUI application needs a credible testing story for those rules
to be enforceable. Naïvely, "TUI testing" sounds like an
afterthought of manual screenshots. In practice, the Rust +
Ratatui ecosystem supports automation across every level that
matters, and we want to commit to that explicitly rather than
discover it ad hoc.
## Decision
Testing is structured in four tiers. Tiers 1–3 run on every
commit in CI; Tier 4 runs on every commit for a focused subset of
critical flows and on a nightly schedule for broader coverage.
### Tier 1 — Pure-logic unit tests
Standard `cargo test` against modules with no terminal
dependency:
- DSL parser and command dispatcher
- Type mapping (user-facing → SQLite STRICT types)
- Project I/O (`project.yaml` + `data/<table>.csv` round-tripping)
- Database rebuild from authoritative text sources
- Snapshot ring buffer logic (ADR-0006)
- Replay log writer/reader
These are the bulk of the test count. Every behavioural unit has
unit tests; modules with non-trivial logic also have property
tests via `proptest` where the input space justifies it (parser
inputs, type coercion, CSV escaping, etc.).
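As an illustration of a Tier 1 test, here is a minimal sketch of the type-mapping layer with plain `cargo test` functions. The user-facing names (`integer`, `boolean`, `decimal`, `text`, `date`, `binary`) and the identifiers `StrictType` and `map_user_type` are assumptions made for this example; the real vocabulary is defined by the type-vocabulary ADR, not here.

```rust
/// Column types accepted by SQLite STRICT tables (`ANY` is left out of this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrictType {
    Integer,
    Real,
    Text,
    Blob,
}

impl StrictType {
    /// The keyword as it would appear in a `CREATE TABLE ... STRICT` statement.
    pub fn as_sql(self) -> &'static str {
        match self {
            StrictType::Integer => "INTEGER",
            StrictType::Real => "REAL",
            StrictType::Text => "TEXT",
            StrictType::Blob => "BLOB",
        }
    }
}

/// Maps a user-facing type name to its STRICT storage type.
/// The names matched here are hypothetical; unknown names yield `None`.
pub fn map_user_type(name: &str) -> Option<StrictType> {
    match name.trim().to_ascii_lowercase().as_str() {
        "integer" | "boolean" => Some(StrictType::Integer),
        "decimal" => Some(StrictType::Real),
        "text" | "date" => Some(StrictType::Text),
        "binary" => Some(StrictType::Blob),
        _ => None,
    }
}

#[test]
fn maps_known_names_ignoring_case_and_padding() {
    assert_eq!(map_user_type("Integer"), Some(StrictType::Integer));
    assert_eq!(map_user_type("  decimal "), Some(StrictType::Real));
    assert_eq!(map_user_type("date").map(StrictType::as_sql), Some("TEXT"));
}

#[test]
fn rejects_names_outside_the_vocabulary() {
    assert_eq!(map_user_type("uuid"), None);
}
```

Because the mapping is a pure function with no terminal or database dependency, it is also a natural target for a `proptest` property, for example "every accepted name maps to a keyword SQLite STRICT accepts".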
### Tier 2 — Render assertions via Ratatui `TestBackend`
Ratatui's in-tree `TestBackend` renders into a `Buffer` (a 2D
cell grid). Tests build an app state, render a frame, and assert
on the resulting buffer.
- **Cell-level assertions** for narrow tests
("the status bar shows mode label `Simple` in the expected
style").
- **Snapshot tests** via `insta` for whole-frame coverage of
representative views (default screen, query result table,
schema view, undo confirmation prompt). Snapshots are checked
in and reviewed on diff.
Snapshot discipline:
- Snapshots cover stable, intentional UI surfaces; they are not
added reflexively to every component.
- A snapshot diff is treated as a real review item, not a
rubber-stamp. Reviewers must confirm the change is intended.
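The shape of a cell-level assertion can be sketched without pulling in Ratatui. The example below is a std-only stand-in: `Cell`, `Style`, `Mode`, `render_status_bar`, and the `Advanced` mode name are hypothetical, and a `Vec<Cell>` plays the role that Ratatui's `Buffer` would play in a real Tier 2 test. It shows only the pattern of rendering into an in-memory grid and asserting on both content and style.

```rust
/// Hypothetical style tag; a real test would compare Ratatui styles instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Style {
    Plain,
    ModeLabel,
}

/// One terminal cell: a character plus its style.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cell {
    pub ch: char,
    pub style: Style,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced, // assumed name for the non-simple mode
}

/// Renders a one-line status bar of `width` cells: the mode label, then padding.
/// A label longer than `width` is truncated.
pub fn render_status_bar(mode: Mode, width: usize) -> Vec<Cell> {
    let label = match mode {
        Mode::Simple => "Simple",
        Mode::Advanced => "Advanced",
    };
    let mut row = vec![Cell { ch: ' ', style: Style::Plain }; width];
    for (cell, ch) in row.iter_mut().zip(label.chars()) {
        *cell = Cell { ch, style: Style::ModeLabel };
    }
    row
}

/// Collects a row's characters so a test can compare against a string.
pub fn row_text(row: &[Cell]) -> String {
    row.iter().map(|c| c.ch).collect()
}

#[test]
fn status_bar_shows_simple_label_in_label_style() {
    let row = render_status_bar(Mode::Simple, 12);
    assert_eq!(row_text(&row), "Simple      ");
    assert!(row[..6].iter().all(|c| c.style == Style::ModeLabel));
    assert!(row[6..].iter().all(|c| c.style == Style::Plain));
}
```

In the real suite the same assertion would be made against the buffer held by `TestBackend`, and whole-frame variants would be captured with `insta` rather than spelled out cell by cell.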
### Tier 3 — Synthetic event-loop integration tests
The application's update function consumes
`crossterm::event::Event` values. Tests feed sequences of
synthetic events (`KeyEvent`, `MouseEvent`, resize) to the update
function, then render via `TestBackend` and assert on both
state and buffer.
This tier is the equivalent of `react-testing-library` for our
TUI: it exercises the full input → state → render path without a
real terminal, and is where the most valuable behavioural tests
live. Examples:
- Typing a DSL command in simple mode, submitting with
Ctrl-Enter, asserting the table list updates and the schema
view re-renders.
- Triggering `undo`, asserting the confirmation prompt appears
with the expected timestamp and change summary, confirming,
asserting state restoration.
- Switching modes and verifying the prompt label and border
colour change.
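The input → state → render path can be sketched as follows. This is a std-only illustration under stated assumptions: the `Event` enum stands in for `crossterm::event::Event`, `render_prompt` returns a string instead of drawing through `TestBackend`, and `App`, `Mode`, the `Advanced` mode name, and the sample command text are all hypothetical.

```rust
/// Stand-in for `crossterm::event::Event`, reduced to what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Char(char),
    Backspace,
    Submit,     // e.g. the submit key chord
    ToggleMode, // e.g. a mode-switch key
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced, // assumed name for the non-simple mode
}

#[derive(Debug, PartialEq, Eq)]
pub struct App {
    pub mode: Mode,
    pub input: String,
    pub submitted: Vec<String>,
}

impl App {
    pub fn new() -> Self {
        App { mode: Mode::Simple, input: String::new(), submitted: Vec::new() }
    }

    /// The explicit update function: one event in, state mutated in place.
    pub fn update(&mut self, event: Event) {
        match event {
            Event::Char(c) => self.input.push(c),
            Event::Backspace => {
                self.input.pop();
            }
            Event::Submit => {
                // Blank input is ignored; otherwise the command is recorded
                // and the input line is cleared.
                if !self.input.trim().is_empty() {
                    let command = std::mem::take(&mut self.input);
                    self.submitted.push(command);
                }
            }
            Event::ToggleMode => {
                self.mode = match self.mode {
                    Mode::Simple => Mode::Advanced,
                    Mode::Advanced => Mode::Simple,
                };
            }
        }
    }

    /// Renders the prompt line; the label reflects the current mode.
    pub fn render_prompt(&self) -> String {
        let label = match self.mode {
            Mode::Simple => "Simple",
            Mode::Advanced => "Advanced",
        };
        format!("[{}] > {}", label, self.input)
    }
}

#[test]
fn typing_submitting_and_switching_modes() {
    let mut app = App::new();
    for c in "create table t".chars() {
        app.update(Event::Char(c));
    }
    assert_eq!(app.render_prompt(), "[Simple] > create table t");

    app.update(Event::Submit);
    assert_eq!(app.submitted, vec!["create table t".to_string()]);
    assert_eq!(app.render_prompt(), "[Simple] > ");

    app.update(Event::ToggleMode);
    assert_eq!(app.render_prompt(), "[Advanced] > ");
}
```

The test asserts on both the state (`submitted`) and the rendered output after each step, which is the property that distinguishes this tier from pure render assertions.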
### Tier 4 — PTY-based black-box end-to-end
A small number of critical flows are exercised against the
**actual built binary** in a pseudo-terminal:
- Tooling: `portable-pty` for the PTY, `expectrl` for
expect-style scripting, `vt100` to parse the terminal output
stream into an inspectable cell grid.
- These tests catch issues the lower tiers miss: TTY setup,
signal handling, terminal mode transitions, real I/O timing.
Tier 4 is **reserved for the highest-value flows**, not blanket
coverage. The initial scope is:
- Cold launch → first DDL command → graceful quit.
- Project save → process restart → reopen → identical state.
- Project export → import in a fresh project → rebuilt database
matches the source.
- `undo` immediately after a `DROP TABLE`, including the
confirmation prompt.
Tier 4 tests run in CI on every commit (the focused list above)
and on a nightly schedule for any extended coverage.
## Tooling commitments
- `cargo test` — Tier 1, Tier 2, Tier 3.
- `proptest` — property-based testing for parser and conversion
layers.
- Ratatui's `TestBackend` — frame rendering for tests.
- `insta` — snapshot testing of rendered buffers.
- `portable-pty`, `expectrl`, `vt100` — Tier 4 PTY-based tests.
- CI matrix covers Linux, macOS, and Windows on stable Rust.
## Honest limits
- **No cross-terminal-emulator regression coverage.** Tier 4
exercises a PTY but not real terminal emulators (xterm,
Alacritty, Windows Terminal, etc.). Crossterm abstracts these
well in practice; if a real-emulator regression ever surfaces,
we will revisit.
- **No visual aesthetic checks.** Tests assert cell contents and
styles, not "this layout is pretty". Visual polish is reviewed
manually.
- **Snapshot brittleness** is a known failure mode. We mitigate
by being selective about what gets a snapshot and by treating
snapshot diffs as real review items.
## Consequences
- Test discipline from `CLAUDE.md` is enforceable on every
layer: parser bugs caught at Tier 1, UI flow bugs caught at
Tier 3, real-binary regressions caught at Tier 4.
- Module boundaries are designed for testability from the start
(pure logic modules separate from rendering, an explicit update
function consuming events).
- CI cost is real but bounded: Tiers 1–3 are fast; Tier 4 is the
only slow tier and is kept narrow.
- Adding a feature implies adding tests at the appropriate tier
(or tiers); coverage is not retrofitted later.