# ADR-0006: Undo snapshots and replay log

## Status

Accepted

## Context

Two related features address the same underlying need — making the application safe and reproducible for learners:

1. **Accidental destruction.** A student typing `DROP TABLE Customers` and then realising what they did is a near-certain event in this audience. Without a recovery path, the experience is hostile and the learning moment is lost to panic.
2. **Replay and scripting.** A persistent record of every executed command is useful for tutorials, debugging, sharing reproducible problem reports, and rebuilding a project from a blank slate.

Both features are cheap to implement and high-leverage.

## Decision

### Undo snapshots

Before any destructive operation — `DROP`, `DELETE`, `TRUNCATE`, schema-rebuild migrations, restore, etc. — the application takes a snapshot of the database using SQLite's online backup API into a ring buffer of recent snapshots (size to be tuned; initial target N = 10). An `undo` command (available in both modes as an app-level command per ADR-0003 — no sigil) restores the most recent snapshot. Each undo step is itself snapshotted to keep `redo` possible.

**Undo requires confirmation.** Snapshots are taken only before destructive operations, so the "current" state may include non-destructive work (inserts, updates, schema additions) done since the last snapshot. Restoring a snapshot therefore *can* discard data the user has not been explicitly warned about. Before restoring, the application displays a confirmation prompt that includes:

- The snapshot's timestamp (local time, with a relative form such as "12 minutes ago").
- A short summary of the operation that triggered the snapshot (the command text, e.g. `DROP TABLE Customers`).
- A summary of changes that will be discarded if the undo proceeds — at minimum, counts of rows added/modified/deleted per table and any schema changes since the snapshot.

The user must explicitly confirm.
A keyboard shortcut for "confirm" is provided so power users are not slowed down, but there is no flag to suppress the prompt — undo is rare enough, and consequential enough, that the prompt is always shown.

### Replay log (`history.log`)

Every successfully executed command — DSL or SQL — is appended to `history.log` in the project directory, one command per record, with a timestamp and the resulting status. The format is deliberately simple and human-readable so it can be hand-edited and replayed.

The same format serves three purposes:

- A persistent input history surfaced via the TUI history feature.
- A scripting format: `.commands` files (or `history.log` itself) can be replayed via a `replay` command.
- A reproducible bug-report artifact when a project is shared.

The log is append-only during a session. It is **not** the authoritative state of the project (that lives in `project.yaml` + `data/`, ADR-0004) — it is an audit and replay trail.

## Consequences

- Snapshots add modest overhead per destructive operation. The cost is bounded; learners care about safety far more than microseconds.
- The ring buffer size must be tuned later based on realistic database sizes; an initial value is fine for now.
- The replay log enables a future `replay` / scripting feature with no additional storage commitment.
- Tutorial authors gain a natural "starter script" format for exercises.

## Amendment 1 — Single-step undo: every-mutation snapshots, hybrid storage, batch granularity (2026-05-24)

The replay/journal half of this ADR (U3/U4) shipped via ADR-0034. This amendment settles the **undo/snapshot half (U1/U2)** before implementation, and **supersedes the original Decision's "snapshots only before destructive operations" model** and its confirmation rationale. Written with explicit user approval; the implementation plan is `docs/plans/20260524-adr-0006-undo-snapshots.md`. **Implemented 2026-05-24** (see the Implementation note at the end of this amendment).
### Snapshot scope — every mutation (single-step undo)

The original Decision snapshots "before any destructive operation — `DROP`, `DELETE`, `TRUNCATE`, schema-rebuild migrations, restore" and explicitly treats inserts/updates/schema-additions as *non-destructive work between snapshots*. That is **replaced**: a snapshot is taken before **every** data/schema mutation — insert, update, delete, drop, all DDL, and all SQL DML. Undo therefore behaves like a familiar single-step "undo my last command" (Ctrl-Z), which is the right model for this teaching environment (clarity over micro-optimisation, per the project's "pedagogy wins ties" posture).

A consequence is that the original confirmation clause — "counts of rows added/modified/deleted per table and any schema changes since the snapshot" — **collapses**: with a snapshot per command there is no intervening un-snapshotted work, so undo rolls back exactly one command and the confirmation simply **names that command**. No db-diff machinery is needed.

### Confirmation rationale

The original justified the always-on prompt by "undo is rare and consequential." Under single-step undo, undo is *more frequent*, but the prompt is **kept** anyway, now justified by "the prompt names the exact command being undone." There is still no flag to suppress it. `undo` and `redo` each confirm (`Y` confirms; `N`/`Esc` cancels), mirroring the existing `rebuild` modal. The redo prompt names the command that will be re-applied.

### Snapshot mechanism — hybrid db + text (reconciles ADR-0015)

The original specifies SQLite's online backup API. Since then, ADR-0015 made `playground.db` a *derived* artifact, committed last for crash recovery, with `project.yaml` + `data/*.csv` as the authoritative source. A db-only restore is therefore no longer sufficient on its own: it would leave the authoritative text files describing the post-mutation state.
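A toy model makes the insufficiency concrete. This is a hypothetical sketch, not project code: `Project`, `rebuild`, `restore_db_only`, `restore_hybrid`, and the `Vec<String>` stand-ins for `playground.db` and the text files are all illustrative assumptions. Because a committed mutation writes both the text and the database, restoring only the database leaves the authoritative text ahead of it, and the next rebuild re-derives the very change the user undid.

```rust
/// Hypothetical toy model: `text` stands in for project.yaml + data/*.csv
/// (authoritative), `db` for playground.db (derived from the text).
#[derive(Clone, Debug, PartialEq)]
pub struct Project {
    pub text: Vec<String>,
    pub db: Vec<String>,
}

impl Project {
    pub fn empty() -> Self {
        Project { text: Vec::new(), db: Vec::new() }
    }

    /// A committed mutation updates both the text and the database.
    pub fn insert(&mut self, row: &str) {
        self.text.push(row.to_string());
        self.db.push(row.to_string());
    }

    /// Re-derive the database from the authoritative text (a "rebuild").
    pub fn rebuild(&mut self) {
        self.db = self.text.clone();
    }

    /// True when the derived database agrees with the authoritative text.
    pub fn consistent(&self) -> bool {
        self.db == self.text
    }
}

/// Restore only the database from a snapshot, leaving the text untouched.
pub fn restore_db_only(current: &Project, snapshot: &Project) -> Project {
    Project { text: current.text.clone(), db: snapshot.db.clone() }
}

/// Restore the whole (db, text) pair from a snapshot.
pub fn restore_hybrid(snapshot: &Project) -> Project {
    snapshot.clone()
}

/// Snapshot after "alice", commit "bob", then undo both ways.
/// Returns (db-only result, db-only result after a rebuild, hybrid result).
pub fn undo_scenario() -> (Project, Project, Project) {
    let mut project = Project::empty();
    project.insert("alice");
    let snapshot = project.clone(); // taken before the next mutation
    project.insert("bob");

    let db_only = restore_db_only(&project, &snapshot);
    let mut rebuilt = db_only.clone();
    rebuilt.rebuild(); // the text still holds "bob", so it comes back
    (db_only, rebuilt, restore_hybrid(&snapshot))
}
```

In this model the db-only result is inconsistent (`db` holds `alice`, `text` holds `alice` and `bob`), and rebuilding it resurrects `bob`; only restoring text and database together yields a consistent pair.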
The agreed mechanism is a **hybrid whole-project snapshot**:

- the database is copied via the **online backup API** (honouring this ADR; it is also the only safe way to copy a live database), **and**
- `project.yaml` + `data/*.csv` are copied as inert files.

Undo **restores all three directly** — no rebuild, no re-derivation — re-establishing a consistent `(db, yaml, csv)` triple. This satisfies both this ADR (the backup API *is* used) and ADR-0015 (text remains authoritative). The snapshot is staged *before* the mutation's transaction and finalised into the ring *after* the database commit, preserving ADR-0015 §6's commit-db-last ordering; a rolled-back operation leaves no snapshot.

### Storage and lifetime — persisted ring, N = 50

Snapshots are **persisted on disk** under the project in a `.snapshots/` directory and survive quit (undo works after reopening). The ring keeps the most recent **N = 50** snapshots (the original's N = 10 is raised, since single-step undo means N counts *commands*; still a single tunable constant), evicting the oldest on overflow. `.snapshots/` is added to the `.gitignore` template, **excluded from `export`** (like `playground.db` and `history.log`), and added to the temp-project cleanup allowlist so an otherwise-empty temp project carrying a snapshots directory remains safely deletable.

### Redo

`redo` is supported (as the original states). New semantics are pinned: **the redo stack is discarded on any new mutation** (standard linear undo/redo). Each undo pushes the pre-undo state so redo can restore it.

### Batch operations — one undo step; `import` excluded

A single user command that runs many sub-operations — `replay` today, and any future in-project batch command — records **one** boundary snapshot for the whole batch (not one per sub-command), via a Begin/EndBatch worker primitive that suppresses per-command staging and finalises a single ring entry only if ≥1 mutation actually ran.
This is a performance win (a long `history.log` replay is one database copy, not one per replayed command) and the consistent reading of "one undo step per user command." `import` is **outside** the undo model entirely: per ADR-0015 §11 it creates a *new* project and switches to it, leaving the current project untouched on disk, so there is nothing to snapshot and it takes no undo step (the new project simply starts with an empty ring). Project-switch navigation undo ("go back to the previous project") is a separate, out-of-scope mechanism — the prior project is intact and reachable via `load` / `--resume`.

### Disable switch

A `--no-undo` CLI flag turns snapshotting off entirely (zero per-command overhead), as an escape hatch for constrained hardware should per-command snapshots prove too heavy. When set, `undo` / `redo` report that undo is turned off. CLI-only for v1 (no in-app toggle).

### Consequences

- Per-mutation snapshotting costs one database backup + a text copy per command, so a bulk paste of *k* inserts makes *k* snapshots. Disk use is bounded by the N = 50 ring, and the `--no-undo` escape hatch removes the cost entirely; the ADR-0015 "batch" command remains the future remedy, and a hardlink/copy-on-write dedup of unchanged files between consecutive snapshots is a possible future optimisation (not v1).
- 50 × (database + text) of persisted snapshots can reach tens of MB for larger projects — an accepted, bounded cost.
- `Cargo.toml` gains the `backup` feature on `rusqlite`.
- The Phase-3 N/A matrix row ("auto-snapshot fires for SQL DML the same as DSL") becomes non-vacuous: the snapshot hook lives in the worker dispatch and covers DSL and SQL mutations uniformly.

### Implementation note (2026-05-24)

Shipped across §8 steps 1–7 of the plan.
Details and decisions made during implementation, user-confirmed where they extended the design:

- **Ring storage** is `src/undo.rs` (`SnapshotStore`): per-snapshot payload dirs under `/.snapshots//` plus an **`index.yaml`** manifest (YAML reuses the existing `serde_yml` dependency — no `serde_json` added). Ids are monotonic and reconciled against on-disk dirs so a crash can't reuse one; `cleanup()` on open sweeps `.staging/` and orphan payloads.
- **Worker hook**: `snapshot_then` brackets all 19 mutating dispatch arms in `src/db.rs` (stage → run → finalise/discard); restore (undo/redo) runs in `worker_loop` with `&mut Connection`. Snapshots are **gated on a user command `source`** — internal operations that pass `source = None` (notably the open-time rebuild when `.db` is missing) are not recorded, so `rebuild` is undoable as a user command but opening a project never creates a spurious entry.
- **Snapshot-failure policy** (user-confirmed): staging / finalise / discard failures are **non-fatal** (logged) — the real persistence is the durable state and a backup hiccup must not fail the user's work. Only *restore* failures surface (as `UndoFailed`). A `/runda` review found that this policy left a **data-loss edge**: a committed mutation whose snapshot could not be staged added no undo entry and did not clear the redo stack (clearing was a side effect of `finalize`), so a later `redo` could silently discard the new work. Fixed: any committed user mutation (and any batch with ≥1 committed mutation) now **clears the redo stack even when its snapshot could not be staged**, via an explicit `SnapshotStore::clear_redo` (`src/db.rs` `snapshot_then` / `end_batch`). For the realistic failure (disk full), `clear_redo` — which deletes redo payloads and rewrites a tiny index — succeeds even when a full backup couldn't.
**Residual edge** (accepted): if the *entire* `.snapshots/` directory is unwritable (so `clear_redo` itself fails), a stale redo can survive; but that state means the whole undo subsystem is broken, which the user would already observe. Regression-tested in `tests/undo_snapshots.rs::redo_is_cleared_when_new_work_commits_without_a_snapshot`.
- **Batch** uses `BeginBatch`/`EndBatch` worker requests; `replay` wraps its loop so a multi-command replay is one undo step, finalised only if a mutation committed.
- **Testing**: `src/undo.rs` Tier-1 (ring/redo/eviction/restore), `tests/undo_snapshots.rs` Tier-3 (worker, DSL+SQL, db+csv consistency, persistence across reopen), plus App-level Tier-1 (dispatch/modal) and parse/replay-filter tests. The thin runtime glue (`spawn_prepare_undo` / `spawn_undo` + Action arms) is not loop-tested — the same accepted gap recorded for ADR-0034, with the App side and worker side each tested. No Tier-2 insta render test was added for the confirmation modal: the existing modals (`rebuild` / path / load) are tested at the state level only, and the undo modal matches that.
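The ring, redo, batch, and clear-on-commit rules in this amendment can be modelled in memory. The sketch below is hypothetical and is not `SnapshotStore`: `UndoRing`, `record_commit`, and the generic `S` (standing in for a persisted `(db, yaml, csv)` snapshot) are illustrative names, persistence and crash recovery are omitted, and pushing the pre-redo state back onto the ring is an assumption about how a redone step stays undoable.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory model of linear undo/redo over whole-project
/// snapshots; `S` stands in for a persisted (db, yaml, csv) snapshot.
pub struct UndoRing<S> {
    capacity: usize,   // N (50 in this ADR)
    undo: VecDeque<S>, // oldest at the front, newest at the back
    redo: Vec<S>,      // most recently undone state on top
    batch: Option<Batch<S>>,
}

struct Batch<S> {
    boundary: Option<S>, // snapshot taken before the batch; None if staging failed
    mutated: bool,       // did at least one sub-command commit?
}

impl<S> UndoRing<S> {
    pub fn new(capacity: usize) -> Self {
        UndoRing { capacity, undo: VecDeque::new(), redo: Vec::new(), batch: None }
    }

    /// Called after a user mutation commits. `pre` is the staged pre-mutation
    /// snapshot, or `None` if staging failed. The redo stack is cleared either
    /// way, so a later `redo` cannot silently discard the new work.
    pub fn record_commit(&mut self, pre: Option<S>) {
        self.redo.clear();
        if let Some(batch) = self.batch.as_mut() {
            batch.mutated = true; // one boundary entry is added by end_batch instead
            return;
        }
        if let Some(snapshot) = pre {
            self.push_undo(snapshot);
        }
    }

    /// Append to the ring, evicting the oldest entry on overflow.
    fn push_undo(&mut self, snapshot: S) {
        self.undo.push_back(snapshot);
        if self.undo.len() > self.capacity {
            self.undo.pop_front();
        }
    }

    /// Returns the snapshot to restore; the pre-undo state moves to the redo stack.
    pub fn undo(&mut self, current: S) -> Option<S> {
        let target = self.undo.pop_back()?;
        self.redo.push(current);
        Some(target)
    }

    /// Returns the state to re-apply; the pre-redo state becomes undoable again.
    pub fn redo(&mut self, current: S) -> Option<S> {
        let target = self.redo.pop()?;
        self.push_undo(current);
        Some(target)
    }

    /// Start a batch; per-command snapshots are ignored until `end_batch`.
    pub fn begin_batch(&mut self, boundary: Option<S>) {
        self.batch = Some(Batch { boundary, mutated: false });
    }

    /// Finalise one ring entry for the whole batch, only if something committed.
    pub fn end_batch(&mut self) {
        if let Some(batch) = self.batch.take() {
            if batch.mutated {
                if let Some(snapshot) = batch.boundary {
                    self.push_undo(snapshot);
                }
            }
        }
    }
}
```

With `UndoRing::new(2)`, three commits evict the oldest snapshot, so only two undos succeed; a subsequent `record_commit(None)` (a commit whose snapshot could not be staged) still empties the redo stack, mirroring the `clear_redo` fix. A batch with no committed mutation adds no entry, and one with several adds exactly one.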