docs: ADR-0034 — history.log as a complete journal; replay reads ok-only

Found while implementing 3f: history.log is success-only, but the in-memory Up/Down recall ring records every submission — and the ring is re-seeded from the log on open, so failed commands are recallable in-session yet lost across sessions. Replay and recall also want different inputs (state-builders vs everything-typed), which one success-only file can't serve. And replay never parsed the pipe format (run_replay parses whole lines), so `replay history.log` fails on line 1 with no test covering it. Decision: history.log becomes a complete journal tagged ok/err; hydration reads all, replay reads ok-only and learns the format. Amends ADR-0006 + ADR-0015 §5/§12. Code deferred to two tracked sub-tasks. No migration for existing all-ok logs.
2026-05-22 19:17:52 +00:00
parent 62f09bebc5
commit b935090d7b
2 changed files with 229 additions and 0 deletions
@@ -0,0 +1,228 @@
+# ADR-0034: `history.log` as a complete command journal; replay reads success-only
+
+## Status
+
+Accepted
+
+## Context
+
+`history.log` serves three stated purposes (ADR-0006 "Replay
+log"; ADR-0015 §5/§12):
+
+1. **Persistent input history** — the in-memory Up/Down recall
+   ring is hydrated from `history.log` on project open
+   (ADR-0015 §12, "I2-persist").
+2. **Replay / scripting** — `replay <path>` re-executes a file
+   of commands; `history.log` itself was declared
+   "replay-compatible" (ADR-0015 §13 OOS-2; ADR-0006).
+3. **Reproducible bug-report artifact** when a project is
+   shared.
+
+Tracing the implementation surfaced three problems that pull
+against each other:
+
+### Problem 1 — failed commands are lost across sessions
+
+`history.log` records **only successful** commands. The writer
+(`src/persistence/history.rs`) hardcodes the status field to
+`ok` and never appends a failed command (ADR-0015 §5: "Status
+is always `ok` in v1, because failed commands are not
+recorded").
+
+But the **in-memory** recall ring (`App.history`) records
+**every** submission, success or failure — `App::submit`
+pushes the trimmed line "regardless of whether it parses, so
+users can recall and edit typo'd commands". So:
+
+- **Within a session**, Up/Down recalls a failed command (type
+  bad SQL, see it fail, fix the table, Up-arrow twice to bring
+  the failed command back and edit it). This works.
+- **Across sessions**, the ring is re-seeded from
+  `history.log`, which dropped the failures — so the failed
+  command the user wanted to fix is gone after a restart.
+
+This asymmetry is the core defect. The recall surface should
+behave the same whether the command came from this session's
+in-memory ring or a previous session's hydrated log.
+
+(Note: ADR-0015 §12's claim that "new successful commands … are
+pushed onto the in-memory stack as they are now" is itself
+inaccurate about the code — the stack receives *all*
+submissions, not only successful ones. This ADR's model makes
+the persistent log match that existing in-memory behaviour.)
+
+### Problem 2 — the two consumers want different inputs
+
+- **Replay** wants the commands that *built* the current state
+  — the successful ones. Replaying a command that failed is
+  pointless (it re-fails, or worse, behaves differently against
+  a now-different schema).
+- **Recall** wants *everything the user typed*, including
+  failures, so they can fix and resubmit.
+
+A single success-only log cannot satisfy both; serving recall
+forces failures into the file, which then pollutes replay. The
+status field — present in the line format precisely so "future
+use cases can carry additional values without a format break"
+(ADR-0015 §5) — is the lever that lets one journal serve both:
+record everything with a status, and let each consumer filter.
+
+### Problem 3 — `replay history.log` does not actually work
+
+`run_replay` (`src/runtime.rs`) reads the target file and
+parses **each whole trimmed line** through the DSL parser,
+skipping only blank lines and `#` comments. It has no
+understanding of the `<timestamp>|<status>|<source>` record
+shape. Feeding it a real `history.log` fails on line 1 (the
+leading timestamp is not a command keyword), despite ADR-0006
+listing "`history.log` itself can be replayed" and the
+`resolve_replay_path` doc comment promising "`replay
+history.log` works without ceremony".
+
+No test covers this: every replay test feeds a hand-written
+plain `.commands` script (one bare command per line). The test
+named `replay_history_log_records_subcommands_only` only
+verifies that replaying a plain script *writes* the right
+entries *into* `history.log`; it never replays the pipe-format
+file as input. So the gap between the stated capability and the
+implementation is invisible to the suite.
+
+## Decision
+
+### 1. `history.log` is a complete command journal, status-tagged
+
+Every submitted command is recorded, regardless of outcome,
+keeping the existing three-field line shape:
+
+```
+<timestamp>|<status>|<source>
+```
+
+`status` takes at least these values:
+
+- `ok` — the command executed successfully (the only value
+  written today).
+- `err` — the command did **not** succeed: it either failed to
+  parse / validate, or it dispatched and the engine/worker
+  returned an error.
+
+Finer-grained values (e.g. distinguishing a parse error from an
+execution error, or marking imported vs user-issued commands)
+remain a non-breaking future extension — the field already
+tolerates additional values, and readers ignore values they do
+not recognise.
+
+This supersedes ADR-0015 §5's "Status is always `ok` … failed
+commands are not recorded" and ADR-0006's "Every *successfully
+executed* command … is appended" — both are narrowed to: *every
+submitted command is appended, tagged with its resulting
+status.*
+
+### 2. Each consumer filters by status
+
+- **Input-history hydration** (ADR-0015 §12) reads **all**
+  records — `ok` and `err` — so cross-session recall matches
+  the in-memory ring's "record everything" behaviour. A failed
+  command from a previous session is recallable and editable
+  after restart, just as it is within the session it was typed.
+- **Replay** reads **`ok` records only**. A journal carrying
+  `err` lines stays directly replayable: the failures are
+  skipped, leaving exactly the sequence that built the state.
+- **Bug-report artifact**: the full journal (both statuses) is
+  the most useful form — it shows what the user tried, not just
+  what worked.
+
+### 3. Replay learns the journal format (and keeps plain scripts)
+
+`replay <path>` must accept **both** input shapes a user can
+hand it:
+
+- A **journal record** — a line matching the
+  `<iso8601-timestamp>|<status>|<source>` prefix. Replay
+  extracts `<source>` (unescaping it per §5), and **skips the
+  line unless `status == ok`**.
+- A **bare command** — any other non-blank, non-`#` line, as in
+  a hand-written `.commands` script. Replayed as-is (current
+  behaviour).
+
+Detection is by the leading `<timestamp>|<status>|` prefix,
+where `<status>` is one of the known tokens. A bare command
+that happens to contain `|` (e.g. `select 'a|b' from t`) is not
+misread, because it does not begin with a timestamp-and-status
+prefix. This makes `replay history.log` work as ADR-0006
+promised, without breaking `.commands` scripts.
+
+### 4. Where the writes happen
+
+- **Successful commands** continue to be journalled by the db
+  worker, transactionally coupled to the state change
+  (`finalize_persistence` before `tx.commit()`), exactly as
+  today — the `ok` line and the committed state land together
+  or not at all.
+- **Failed commands** are journalled with `err` from the
+  runtime/app error path, *not* the worker: a parse failure
+  never reaches the worker, and an execution failure has no
+  committed state to couple to. The `err` append is therefore
+  best-effort (a failure to record a failure is logged and
+  ignored — it must never escalate a user error into a fatal).
+
+This split is an implementation consequence of the journal
+decision, recorded here so the implementing sub-task does not
+re-derive it.
+
+### 5. Backward compatibility
+
+Existing `history.log` files contain only `ok` lines and remain
+valid: hydration reads them (all `ok`), replay replays them
+(all `ok`), nothing migrates. The malformed-line skipping in
+`read_recent_sources` already tolerates unrecognised shapes, so
+a mixed-version file degrades gracefully.
+
+## Consequences
+
+- The persistence layer gains a status parameter on the history
+  append; `format_record` stops hardcoding `ok`.
+- The runtime/app error paths gain an `err` journal append
+  (best-effort) for both parse failures and execution failures.
+- `read_recent_sources` (hydration) returns all records
+  regardless of status — no change if it already ignores the
+  status field, but the intent is now explicit and must be
+  test-pinned.
+- `run_replay` gains journal-record parsing with an `ok`-only
+  filter, while still accepting bare-command scripts. This is
+  the fix for Problem 3 and **must ship with a test that feeds
+  an actual `history.log` to replay** (the missing-coverage gap
+  that hid the bug).
+- The bug-report artifact becomes strictly more useful (it now
+  shows attempted-and-failed commands).
+- A shared/exported project still excludes `history.log`
+  (ADR-0015 §11 / ADR-0007), so the journal's added failure
+  records do not leak by default.
+- ADR-0006 and ADR-0015 §5/§12 are amended as stated in
+  Decision §1–§2; they remain canonical for everything outside
+  the amended clauses.
+
+## Implementation note
+
+Per the decision to capture this now and implement later, the
+code lands as two tracked sub-tasks (test-first):
+
+1. **Journal failures + per-consumer filtering** — status
+   parameter on the append; `err` writes from the error paths;
+   hydration reads all; replay reads `ok` only. Tests: a failed
+   command appears in `history.log` with `err`; it survives a
+   hydration round-trip into the recall ring; replay skips it.
+2. **Replay parses the journal format** — dual-shape input
+   (journal record vs bare command), `ok`-only filter, source
+   unescaping. Tests: replaying an actual `history.log` (pipe
+   format, mixed `ok`/`err`) runs the `ok` commands and skips
+   the `err` ones; a plain `.commands` script still replays
+   unchanged; a bare command containing `|` is not misparsed.
+
+## See also
+
+- ADR-0004 — the project format `history.log` lives in.
+- ADR-0006 — undo snapshots + the replay log this ADR amends.
+- ADR-0007 — `export` excludes `history.log`.
+- ADR-0015 — project storage runtime; §5 (history format) and
+  §12 (input-history hydration) amended here.
@@ -39,3 +39,4 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
 - [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses
 - [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Proposed**, the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery)
+- [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `<ts>|<status>|<source>` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration