diff --git a/docs/adr/0034-history-journal-and-replay-filter.md b/docs/adr/0034-history-journal-and-replay-filter.md new file mode 100644 index 0000000..bebfb7f --- /dev/null +++ b/docs/adr/0034-history-journal-and-replay-filter.md @@ -0,0 +1,228 @@ +# ADR-0034: `history.log` as a complete command journal; replay reads success-only + +## Status + +Accepted + +## Context + +`history.log` serves three stated purposes (ADR-0006 "Replay +log"; ADR-0015 §5/§12): + +1. **Persistent input history** — the in-memory Up/Down recall + ring is hydrated from `history.log` on project open + (ADR-0015 §12, "I2-persist"). +2. **Replay / scripting** — `replay ` re-executes a file + of commands; `history.log` itself was declared + "replay-compatible" (ADR-0015 §13 OOS-2; ADR-0006). +3. **Reproducible bug-report artifact** when a project is + shared. + +Tracing the implementation surfaced three problems that pull +against each other: + +### Problem 1 — failed commands are lost across sessions + +`history.log` records **only successful** commands. The writer +(`src/persistence/history.rs`) hardcodes the status field to +`ok` and never appends a failed command (ADR-0015 §5: "Status +is always `ok` in v1, because failed commands are not +recorded"). + +But the **in-memory** recall ring (`App.history`) records +**every** submission, success or failure — `App::submit` +pushes the trimmed line "regardless of whether it parses, so +users can recall and edit typo'd commands". So: + +- **Within a session**, Up/Down recalls a failed command (type + bad SQL, see it fail, fix the table, Up-arrow twice to bring + the failed command back and edit it). This works. +- **Across sessions**, the ring is re-seeded from + `history.log`, which dropped the failures — so the failed + command the user wanted to fix is gone after a restart. + +This asymmetry is the core defect. The recall surface should +behave the same whether the command came from this session's +in-memory ring or a previous session's hydrated log. + +(Note: ADR-0015 §12's claim that "new successful commands … are +pushed onto the in-memory stack as they are now" is itself +inaccurate about the code — the stack receives *all* +submissions, not only successful ones. This ADR's model makes +the persistent log match that existing in-memory behaviour.) + +### Problem 2 — the two consumers want different inputs + +- **Replay** wants the commands that *built* the current state + — the successful ones. Replaying a command that failed is + pointless (it re-fails, or worse, behaves differently against + a now-different schema). +- **Recall** wants *everything the user typed*, including + failures, so they can fix and resubmit. + +A single success-only log cannot satisfy both; serving recall +forces failures into the file, which then pollutes replay. The +status field — present in the line format precisely so "future +use cases can carry additional values without a format break" +(ADR-0015 §5) — is the lever that lets one journal serve both: +record everything with a status, and let each consumer filter. + +### Problem 3 — `replay history.log` does not actually work + +`run_replay` (`src/runtime.rs`) reads the target file and +parses **each whole trimmed line** through the DSL parser, +skipping only blank lines and `#` comments. It has no +understanding of the `||` record +shape. Feeding it a real `history.log` fails on line 1 (the +leading timestamp is not a command keyword), despite ADR-0006 +listing "`history.log` itself can be replayed" and the +`resolve_replay_path` doc comment promising "`replay +history.log` works without ceremony". + +No test covers this: every replay test feeds a hand-written +plain `.commands` script (one bare command per line). The test +named `replay_history_log_records_subcommands_only` only +verifies that replaying a plain script *writes* the right +entries *into* `history.log`; it never replays the pipe-format +file as input. So the gap between the stated capability and the +implementation is invisible to the suite. + +## Decision + +### 1. `history.log` is a complete command journal, status-tagged + +Every submitted command is recorded, regardless of outcome, +keeping the existing three-field line shape: + +``` +|| +``` + +`status` takes at least these values: + +- `ok` — the command executed successfully (the only value + written today). +- `err` — the command did **not** succeed: it either failed to + parse / validate, or it dispatched and the engine/worker + returned an error. + +Finer-grained values (e.g. distinguishing a parse error from an +execution error, or marking imported vs user-issued commands) +remain a non-breaking future extension — the field already +tolerates additional values, and readers ignore values they do +not recognise. + +This supersedes ADR-0015 §5's "Status is always `ok` … failed +commands are not recorded" and ADR-0006's "Every *successfully +executed* command … is appended" — both are narrowed to: *every +submitted command is appended, tagged with its resulting +status.* + +### 2. Each consumer filters by status + +- **Input-history hydration** (ADR-0015 §12) reads **all** + records — `ok` and `err` — so cross-session recall matches + the in-memory ring's "record everything" behaviour. A failed + command from a previous session is recallable and editable + after restart, just as it is within the session it was typed. +- **Replay** reads **`ok` records only**. A journal carrying + `err` lines stays directly replayable: the failures are + skipped, leaving exactly the sequence that built the state. +- **Bug-report artifact**: the full journal (both statuses) is + the most useful form — it shows what the user tried, not just + what worked. + +### 3. Replay learns the journal format (and keeps plain scripts) + +`replay ` must accept **both** input shapes a user can +hand it: + +- A **journal record** — a line matching the + `||` prefix. Replay + extracts `` (unescaping it per §5), and **skips the + line unless `status == ok`**. +- A **bare command** — any other non-blank, non-`#` line, as in + a hand-written `.commands` script. Replayed as-is (current + behaviour). + +Detection is by the leading `||` prefix, +where `` is one of the known tokens. A bare command +that happens to contain `|` (e.g. `select 'a|b' from t`) is not +misread, because it does not begin with a timestamp-and-status +prefix. This makes `replay history.log` work as ADR-0006 +promised, without breaking `.commands` scripts. + +### 4. Where the writes happen + +- **Successful commands** continue to be journalled by the db + worker, transactionally coupled to the state change + (`finalize_persistence` before `tx.commit()`), exactly as + today — the `ok` line and the committed state land together + or not at all. +- **Failed commands** are journalled with `err` from the + runtime/app error path, *not* the worker: a parse failure + never reaches the worker, and an execution failure has no + committed state to couple to. The `err` append is therefore + best-effort (a failure to record a failure is logged and + ignored — it must never escalate a user error into a fatal). + +This split is an implementation consequence of the journal +decision, recorded here so the implementing sub-task does not +re-derive it. + +### 5. Backward compatibility + +Existing `history.log` files contain only `ok` lines and remain +valid: hydration reads them (all `ok`), replay replays them +(all `ok`), nothing migrates. The malformed-line skipping in +`read_recent_sources` already tolerates unrecognised shapes, so +a mixed-version file degrades gracefully. + +## Consequences + +- The persistence layer gains a status parameter on the history + append; `format_record` stops hardcoding `ok`. +- The runtime/app error paths gain an `err` journal append + (best-effort) for both parse failures and execution failures. +- `read_recent_sources` (hydration) returns all records + regardless of status — no change if it already ignores the + status field, but the intent is now explicit and must be + test-pinned. +- `run_replay` gains journal-record parsing with an `ok`-only + filter, while still accepting bare-command scripts. This is + the fix for Problem 3 and **must ship with a test that feeds + an actual `history.log` to replay** (the missing-coverage gap + that hid the bug). +- The bug-report artifact becomes strictly more useful (it now + shows attempted-and-failed commands). +- A shared/exported project still excludes `history.log` + (ADR-0015 §11 / ADR-0007), so the journal's added failure + records do not leak by default. +- ADR-0006 and ADR-0015 §5/§12 are amended as stated in + Decision §1–§2; they remain canonical for everything outside + the amended clauses. + +## Implementation note + +Per the decision to capture this now and implement later, the +code lands as two tracked sub-tasks (test-first): + +1. **Journal failures + per-consumer filtering** — status + parameter on the append; `err` writes from the error paths; + hydration reads all; replay reads `ok` only. Tests: a failed + command appears in `history.log` with `err`; it survives a + hydration round-trip into the recall ring; replay skips it. +2. **Replay parses the journal format** — dual-shape input + (journal record vs bare command), `ok`-only filter, source + unescaping. Tests: replaying an actual `history.log` (pipe + format, mixed `ok`/`err`) runs the `ok` commands and skips + the `err` ones; a plain `.commands` script still replays + unchanged; a bare command containing `|` is not misparsed. + +## See also + +- ADR-0004 — the project format `history.log` lives in. +- ADR-0006 — undo snapshots + the replay log this ADR amends. +- ADR-0007 — `export` excludes `history.log`. +- ADR-0015 — project storage runtime; §5 (history format) and + §12 (input-history hydration) amended here. diff --git a/docs/adr/README.md b/docs/adr/README.md index 61beb17..551f1b2 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -39,3 +39,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2 - [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses - [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Proposed**, the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery) +- [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `||` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration