docs: ADR-0034 — history.log as a complete journal; replay reads ok-only

Found while implementing 3f: history.log is success-only, but the
in-memory Up/Down recall ring records every submission — and the ring
is re-seeded from the log on open, so failed commands are recallable
in-session yet lost across sessions. Replay and recall also want
different inputs (state-builders vs everything-typed), which one
success-only file can't serve. And replay never parsed the pipe
format (run_replay parses whole lines), so `replay history.log` fails
on line 1 with no test covering it.

Decision: history.log becomes a complete journal tagged ok/err;
hydration reads all, replay reads ok-only and learns the format.
Amends ADR-0006 + ADR-0015 §5/§12. Code deferred to two tracked
sub-tasks. No migration for existing all-ok logs.
This commit is contained in:
claude@clouddev1
2026-05-22 19:17:52 +00:00
parent 62f09bebc5
commit b935090d7b
2 changed files with 229 additions and 0 deletions
@@ -0,0 +1,228 @@
# ADR-0034: `history.log` as a complete command journal; replay reads success-only
## Status
Accepted
## Context
`history.log` serves three stated purposes (ADR-0006 "Replay
log"; ADR-0015 §5/§12):
1. **Persistent input history** — the in-memory Up/Down recall
ring is hydrated from `history.log` on project open
(ADR-0015 §12, "I2-persist").
2. **Replay / scripting**`replay <path>` re-executes a file
of commands; `history.log` itself was declared
"replay-compatible" (ADR-0015 §13 OOS-2; ADR-0006).
3. **Reproducible bug-report artifact** when a project is
shared.
Tracing the implementation surfaced three problems that pull
against each other:
### Problem 1 — failed commands are lost across sessions
`history.log` records **only successful** commands. The writer
(`src/persistence/history.rs`) hardcodes the status field to
`ok` and never appends a failed command (ADR-0015 §5: "Status
is always `ok` in v1, because failed commands are not
recorded").
But the **in-memory** recall ring (`App.history`) records
**every** submission, success or failure — `App::submit`
pushes the trimmed line "regardless of whether it parses, so
users can recall and edit typo'd commands". So:
- **Within a session**, Up/Down recalls a failed command (type
bad SQL, see it fail, fix the table, Up-arrow twice to bring
the failed command back and edit it). This works.
- **Across sessions**, the ring is re-seeded from
`history.log`, which dropped the failures — so the failed
command the user wanted to fix is gone after a restart.
This asymmetry is the core defect. The recall surface should
behave the same whether the command came from this session's
in-memory ring or a previous session's hydrated log.
(Note: ADR-0015 §12's claim that "new successful commands … are
pushed onto the in-memory stack as they are now" is itself
inaccurate about the code — the stack receives *all*
submissions, not only successful ones. This ADR's model makes
the persistent log match that existing in-memory behaviour.)
### Problem 2 — the two consumers want different inputs
- **Replay** wants the commands that *built* the current state
— the successful ones. Replaying a command that failed is
pointless (it re-fails, or worse, behaves differently against
a now-different schema).
- **Recall** wants *everything the user typed*, including
failures, so they can fix and resubmit.
A single success-only log cannot satisfy both; serving recall
forces failures into the file, which then pollutes replay. The
status field — present in the line format precisely so "future
use cases can carry additional values without a format break"
(ADR-0015 §5) — is the lever that lets one journal serve both:
record everything with a status, and let each consumer filter.
### Problem 3 — `replay history.log` does not actually work
`run_replay` (`src/runtime.rs`) reads the target file and
parses **each whole trimmed line** through the DSL parser,
skipping only blank lines and `#` comments. It has no
understanding of the `<timestamp>|<status>|<source>` record
shape. Feeding it a real `history.log` fails on line 1 (the
leading timestamp is not a command keyword), despite ADR-0006
listing "`history.log` itself can be replayed" and the
`resolve_replay_path` doc comment promising "`replay
history.log` works without ceremony".
No test covers this: every replay test feeds a hand-written
plain `.commands` script (one bare command per line). The test
named `replay_history_log_records_subcommands_only` only
verifies that replaying a plain script *writes* the right
entries *into* `history.log`; it never replays the pipe-format
file as input. So the gap between the stated capability and the
implementation is invisible to the suite.
## Decision
### 1. `history.log` is a complete command journal, status-tagged
Every submitted command is recorded, regardless of outcome,
keeping the existing three-field line shape:
```
<timestamp>|<status>|<source>
```
`status` takes at least these values:
- `ok` — the command executed successfully (the only value
written today).
- `err` — the command did **not** succeed: it either failed to
parse / validate, or it dispatched and the engine/worker
returned an error.
Finer-grained values (e.g. distinguishing a parse error from an
execution error, or marking imported vs user-issued commands)
remain a non-breaking future extension — the field already
tolerates additional values, and readers ignore values they do
not recognise.
This supersedes ADR-0015 §5's "Status is always `ok` … failed
commands are not recorded" and ADR-0006's "Every *successfully
executed* command … is appended" — both are narrowed to: *every
submitted command is appended, tagged with its resulting
status.*
### 2. Each consumer filters by status
- **Input-history hydration** (ADR-0015 §12) reads **all**
records — `ok` and `err` — so cross-session recall matches
the in-memory ring's "record everything" behaviour. A failed
command from a previous session is recallable and editable
after restart, just as it is within the session it was typed.
- **Replay** reads **`ok` records only**. A journal carrying
`err` lines stays directly replayable: the failures are
skipped, leaving exactly the sequence that built the state.
- **Bug-report artifact**: the full journal (both statuses) is
the most useful form — it shows what the user tried, not just
what worked.
### 3. Replay learns the journal format (and keeps plain scripts)
`replay <path>` must accept **both** input shapes a user can
hand it:
- A **journal record** — a line matching the
`<iso8601-timestamp>|<status>|<source>` prefix. Replay
extracts `<source>` (unescaping it per §5), and **skips the
line unless `status == ok`**.
- A **bare command** — any other non-blank, non-`#` line, as in
a hand-written `.commands` script. Replayed as-is (current
behaviour).
Detection is by the leading `<timestamp>|<status>|` prefix,
where `<status>` is one of the known tokens. A bare command
that happens to contain `|` (e.g. `select 'a|b' from t`) is not
misread, because it does not begin with a timestamp-and-status
prefix. This makes `replay history.log` work as ADR-0006
promised, without breaking `.commands` scripts.
### 4. Where the writes happen
- **Successful commands** continue to be journalled by the db
worker, transactionally coupled to the state change
(`finalize_persistence` before `tx.commit()`), exactly as
today — the `ok` line and the committed state land together
or not at all.
- **Failed commands** are journalled with `err` from the
runtime/app error path, *not* the worker: a parse failure
never reaches the worker, and an execution failure has no
committed state to couple to. The `err` append is therefore
best-effort (a failure to record a failure is logged and
ignored — it must never escalate a user error into a fatal).
This split is an implementation consequence of the journal
decision, recorded here so the implementing sub-task does not
re-derive it.
### 5. Backward compatibility
Existing `history.log` files contain only `ok` lines and remain
valid: hydration reads them (all `ok`), replay replays them
(all `ok`), nothing migrates. The malformed-line skipping in
`read_recent_sources` already tolerates unrecognised shapes, so
a mixed-version file degrades gracefully.
## Consequences
- The persistence layer gains a status parameter on the history
append; `format_record` stops hardcoding `ok`.
- The runtime/app error paths gain an `err` journal append
(best-effort) for both parse failures and execution failures.
- `read_recent_sources` (hydration) returns all records
regardless of status — no change if it already ignores the
status field, but the intent is now explicit and must be
test-pinned.
- `run_replay` gains journal-record parsing with an `ok`-only
filter, while still accepting bare-command scripts. This is
the fix for Problem 3 and **must ship with a test that feeds
an actual `history.log` to replay** (the missing-coverage gap
that hid the bug).
- The bug-report artifact becomes strictly more useful (it now
shows attempted-and-failed commands).
- A shared/exported project still excludes `history.log`
(ADR-0015 §11 / ADR-0007), so the journal's added failure
records do not leak by default.
- ADR-0006 and ADR-0015 §5/§12 are amended as stated in
Decision §1–§2; they remain canonical for everything outside
the amended clauses.
## Implementation note
Per the decision to capture this now and implement later, the
code lands as two tracked sub-tasks (test-first):
1. **Journal failures + per-consumer filtering** — status
parameter on the append; `err` writes from the error paths;
hydration reads all; replay reads `ok` only. Tests: a failed
command appears in `history.log` with `err`; it survives a
hydration round-trip into the recall ring; replay skips it.
2. **Replay parses the journal format** — dual-shape input
(journal record vs bare command), `ok`-only filter, source
unescaping. Tests: replaying an actual `history.log` (pipe
format, mixed `ok`/`err`) runs the `ok` commands and skips
the `err` ones; a plain `.commands` script still replays
unchanged; a bare command containing `|` is not misparsed.
## See also
- ADR-0004 — the project format `history.log` lives in.
- ADR-0006 — undo snapshots + the replay log this ADR amends.
- ADR-0007 — `export` excludes `history.log`.
- ADR-0015 — project storage runtime; §5 (history format) and
§12 (input-history hydration) amended here.
+1
View File
@@ -39,3 +39,4 @@ This directory contains the project's ADRs, recorded per
- [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses
- [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Proposed**, the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery)
- [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `<ts>|<status>|<source>` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration