Initial planning docs: CLAUDE.md and ADRs 0000-0008

Captures up-front design decisions for RDBMS Playground: stack (Rust + Ratatui + SQLite), input modes, project file format, type vocabulary, undo snapshots and replay log, sharing/export, and testing approach. ADR-0000 establishes the ADR practice itself and mandates index upkeep alongside any ADR change.
2026-05-07 09:27:31 +00:00
commit 3a0c03d781
11 changed files with 785 additions and 0 deletions
@@ -0,0 +1,149 @@
+# ADR-0008: Testing approach
+
+## Status
+
+Accepted
+
+## Context
+
+The project's working standards (`CLAUDE.md`, plus the user's
+global testing rules) require:
+
+- Test coverage established before changes.
+- Bugs reproduced with failing tests before fixes.
+- Integration tests that exercise the full stack a user touches.
+- "All green, zero skips" as the only acceptable end state.
+
+A TUI application needs a credible testing story for those rules
+to be enforceable. Naïvely, "TUI testing" sounds like an
+afterthought of manual screenshots. In practice, the Rust +
+Ratatui ecosystem supports automation across every level that
+matters, and we want to commit to that explicitly rather than
+discover it ad hoc.
+
+## Decision
+
+Testing is structured in four tiers. Tiers 1–3 run on every
+commit in CI; Tier 4 runs on every commit for a focused subset of
+critical flows and on a nightly schedule for broader coverage.
+
+### Tier 1 — Pure-logic unit tests
+
+Standard `cargo test` against modules with no terminal
+dependency:
+
+- DSL parser and command dispatcher
+- Type mapping (user-facing → SQLite STRICT types)
+- Project I/O (`project.yaml` + `data/<table>.csv` round-tripping)
+- Database rebuild from authoritative text sources
+- Snapshot ring buffer logic (ADR-0006)
+- Replay log writer/reader
+
+These are the bulk of the test count. Every behavioural unit has
+unit tests; modules with non-trivial logic also have property
+tests via `proptest` where the input space justifies it (parser
+inputs, type coercion, CSV escaping, etc.).
+
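As an illustration of the kind of round-trip property Tier 1 has in mind for CSV escaping, here is a minimal sketch using only the standard library. The helpers `escape_field`, `unescape_field`, and `round_trips` are hypothetical names, not the project's actual API, and the quoting rule (quote on comma, double quote, or line break) is an assumption about how `data/<table>.csv` fields are written.

```rust
/// Hypothetical helper: quote a single CSV field only when it needs quoting
/// (comma, double quote, or line break), doubling any embedded quotes.
pub fn escape_field(raw: &str) -> String {
    let needs_quotes = raw.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    if needs_quotes {
        format!("\"{}\"", raw.replace('"', "\"\""))
    } else {
        raw.to_string()
    }
}

/// Hypothetical inverse for one already-isolated field (no record splitting).
pub fn unescape_field(field: &str) -> String {
    match field.strip_prefix('"').and_then(|rest| rest.strip_suffix('"')) {
        // Quoted field: drop the outer quotes and collapse doubled quotes.
        Some(inner) => inner.replace("\"\"", "\""),
        // Unquoted field: taken verbatim.
        None => field.to_string(),
    }
}

/// The property itself: escaping then unescaping yields the original text.
pub fn round_trips(raw: &str) -> bool {
    unescape_field(&escape_field(raw)) == raw
}
```

Under `proptest`, `round_trips` would be the assertion checked over arbitrary generated strings rather than a handful of hand-picked cases.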
+### Tier 2 — Render assertions via Ratatui `TestBackend`
+
+Ratatui's in-tree `TestBackend` renders into a `Buffer` (a 2D
+cell grid). Tests build an app state, render a frame, and assert
+on the resulting buffer.
+
+- **Cell-level assertions** for narrow tests
+  ("the status bar shows mode label `Simple` in the expected
+  style").
+- **Snapshot tests** via `insta` for whole-frame coverage of
+  representative views (default screen, query result table,
+  schema view, undo confirmation prompt). Snapshots are checked
+  in and reviewed on diff.
+
+Snapshot discipline:
+
+- Snapshots cover stable, intentional UI surfaces; they are not
+  added reflexively to every component.
+- A snapshot diff is treated as a real review item, not a
+  rubber-stamp. Reviewers must confirm the change is intended.
+
+### Tier 3 — Synthetic event-loop integration tests
+
+The application's update function consumes
+`crossterm::event::Event` values. Tests feed sequences of
+synthetic events (`KeyEvent`, `MouseEvent`, resize) to the update
+function, then render via `TestBackend` and assert on both
+state and buffer.
+
+This tier is the equivalent of `react-testing-library` for our
+TUI: it exercises the full input → state → render path without a
+real terminal, and is where the most valuable behavioural tests
+live. Examples:
+
+- Typing a DSL command in simple mode, submitting with
+  Ctrl-Enter, asserting the table list updates and the schema
+  view re-renders.
+- Triggering `undo`, asserting the confirmation prompt appears
+  with the expected timestamp and change summary, confirming,
+  asserting state restoration.
+- Switching modes and verifying the prompt label and border
+  colour change.
+
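The shape of a Tier 3 test can be sketched without any terminal crates. Everything below is a stand-in under stated assumptions: `Event` substitutes for `crossterm::event::Event`, `render` substitutes for drawing through `TestBackend`, and the rule that submitting the input adds a table is a placeholder for the real DSL. The names `App`, `update`, `render`, and `drive` are hypothetical.

```rust
/// Stand-in for `crossterm::event::Event`; the real enum carries key codes,
/// modifiers, mouse and resize data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Char(char),
    Submit, // stands in for Ctrl-Enter
}

/// Hypothetical application state: the prompt text and the known tables.
#[derive(Debug, Default)]
pub struct App {
    pub input: String,
    pub tables: Vec<String>,
}

/// The explicit update function: one event in, state mutated in place.
pub fn update(app: &mut App, event: Event) {
    match event {
        Event::Char(c) => app.input.push(c),
        Event::Submit => {
            // Placeholder for DSL dispatch: treat the input as a table name.
            let name = app.input.trim().to_string();
            if !name.is_empty() {
                app.tables.push(name);
            }
            app.input.clear();
        }
    }
}

/// Stand-in for rendering via `TestBackend`: one fixed-width row per line,
/// padded or truncated to `width` characters.
pub fn render(app: &App, width: usize) -> Vec<String> {
    let mut rows = vec![format!("> {}", app.input)];
    rows.extend(app.tables.iter().map(|t| format!("- {}", t)));
    rows.iter().map(|r| format!("{:<w$.w$}", r, w = width)).collect()
}

/// Feed a synthetic event sequence, then render once for assertions.
pub fn drive(events: &[Event], width: usize) -> (App, Vec<String>) {
    let mut app = App::default();
    for &event in events {
        update(&mut app, event);
    }
    let rows = render(&app, width);
    (app, rows)
}
```

A test built on this calls `drive` with the typed characters followed by `Event::Submit`, then asserts on both `app.tables` and the rendered rows, mirroring the "assert on both state and buffer" rule above.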
+### Tier 4 — PTY-based black-box end-to-end
+
+A small number of critical flows are exercised against the
+**actual built binary** in a pseudo-terminal:
+
+- Tooling: `portable-pty` for the PTY, `expectrl` for
+  expect-style scripting, `vt100` to parse the terminal output
+  stream into an inspectable cell grid.
+- These tests catch issues the lower tiers miss: TTY setup,
+  signal handling, terminal mode transitions, real I/O timing.
+
+Tier 4 is **reserved for the highest-value flows**, not blanket
+coverage. The initial scope is:
+
+- Cold launch → first DDL command → graceful quit.
+- Project save → process restart → reopen → identical state.
+- Project export → import in a fresh project → rebuilt database
+  matches the source.
+- `undo` immediately after a `DROP TABLE`, including the
+  confirmation prompt.
+
+Tier 4 tests run in CI on every commit (the focused list above)
+and on a nightly schedule for any extended coverage.
+
+## Tooling commitments
+
+- `cargo test` — Tier 1, Tier 2, Tier 3.
+- `proptest` — property-based testing for parser and conversion
+  layers.
+- Ratatui's `TestBackend` — frame rendering for tests.
+- `insta` — snapshot testing of rendered buffers.
+- `portable-pty`, `expectrl`, `vt100` — Tier 4 PTY-based tests.
+- CI matrix covers Linux, macOS, and Windows on stable Rust.
+
+## Honest limits
+
+- **No cross-terminal-emulator regression coverage.** Tier 4
+  exercises a PTY but not real terminal emulators (xterm,
+  Alacritty, Windows Terminal, etc.). Crossterm abstracts these
+  well in practice; if a real-emulator regression ever surfaces,
+  we will revisit this decision.
+- **No visual aesthetic checks.** Tests assert cell contents and
+  styles, not "this layout is pretty". Visual polish is reviewed
+  manually.
+- **Snapshot brittleness** is a known failure mode. We mitigate
+  by being selective about what gets a snapshot and by treating
+  snapshot diffs as real review items.
+
+## Consequences
+
+- Test discipline from `CLAUDE.md` is enforceable on every
+  layer: parser bugs caught at Tier 1, rendering regressions at
+  Tier 2, UI flow bugs at Tier 3, real-binary regressions at
+  Tier 4.
+- Module boundaries are designed for testability from the start
+  (pure logic modules separate from rendering, an explicit update
+  function consuming events).
+- CI cost is real but bounded: Tiers 1–3 are fast; Tier 4 is the
+  only slow tier and is kept narrow.
+- Adding a feature implies adding tests at the appropriate tier
+  (or tiers); coverage is not retrofitted later.