From 504c24c9961271a97bc921376ddcdbdf9c15381a Mon Sep 17 00:00:00 2001
From: "claude@clouddev1" <claude-clouddev1@oliversturm.ch>
Date: Sun, 24 May 2026 09:29:52 +0000
Subject: [PATCH] docs: ADR-0034 implementation plan (history journal + replay
 filter)

Two-sub-task test-first plan mapping 1:1 to ADR-0034's named sub-tasks: (1) journal failures + per-consumer filtering (status-tagged append, best-effort err writes, hydration reads all), (2) replay parses the journal format (ok-only filter, dual-shape input). Opens with a headline failing test that reproduces the live replay history.log bug.
---
 .../20260524-adr-0034-history-journal.md      | 255 ++++++++++++++++++
 1 file changed, 255 insertions(+)
 create mode 100644 docs/plans/20260524-adr-0034-history-journal.md
diff --git a/docs/plans/20260524-adr-0034-history-journal.md b/docs/plans/20260524-adr-0034-history-journal.md
new file mode 100644
index 0000000..38f8e63
--- /dev/null
+++ b/docs/plans/20260524-adr-0034-history-journal.md
@@ -0,0 +1,255 @@
+# Implementation plan — ADR-0034 (history journal + replay filter)
+
+**Plan author date:** 2026-05-24
+**Driving ADR:** [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](../adr/0034-history-journal-and-replay-filter.md) (**Accepted**)
+**Amends:** ADR-0006 (replay log), ADR-0015 §5 (history format) / §12 (hydration)
+
+## Purpose
+
+ADR-0034 is the *decision*: `history.log` becomes a complete,
+status-tagged journal; each consumer filters (hydration reads
+all, replay reads `ok` only); replay learns the
+`<ts>|<status>|<source>` format while still accepting bare
+`.commands` scripts. This plan is the *mechanics* — the build
+order, the per-sub-task exit gates, and the headline failing test
+that proves the live bug before any fix lands.
+
+It is deliberately lighter than the Phase-2/3 plans: ADR-0034 is
+a focused two-sub-task change to a small, well-isolated surface
+(`src/persistence/history.rs`, its `persistence/mod.rs` wrappers,
+`run_replay` in `src/runtime.rs`, and the runtime/app error
+paths). The ADR already names the two sub-tasks (§"Implementation
+note"); this plan refines them with explicit gates.
+
+## The live bug this fixes (verified 2026-05-23, handoff-34 §4)
+
+`replay history.log` fails on line 1: `run_replay`
+(`src/runtime.rs:1630`) feeds each whole raw line to
+`parse_command_with_schema` and only skips blank/`#` lines — it
+has no understanding of the `<ts>|<status>|<source>` record shape
+that `history::format_record` already writes. The replay tests
+all use hand-written bare-command scripts, so the gap is
+invisible to the suite (ADR-0034 Problem 3).
+
+## Working-mode reminder
+
+Solo with the DA hat, per `CLAUDE.md`. The DA critique is written
+at each sub-task gate before its verdict. Escalate ADR-vs-code
+mismatches rather than drift — this session's `/runda` round and
+the two Phase-3 escalations show the value.
+
+**Test discipline (non-negotiable, global rule).** Test-first:
+the headline failing test (Step 0) is written and confirmed RED
+before any production change. Each bug/behaviour is reproduced
+with a failing test first. "All green, no skips" is the only
+acceptable end state.
+
+**Commits.** Not pre-authorized for this plan (unlike Phases
+2–3). Each commit is proposed for the user's approval at the
+sub-task gate, per the global git-safety rule. No AI attribution.
+Push is the user's step.
+
+## Code surface (grounded — real names)
+
+- `src/persistence/history.rs`:
+  - `format_record(command_text, timestamp_iso) -> String` —
+    **hardcodes `|ok|`** (line ~28). The status-parameterisation
+    point.
+  - `append(path, line)`, `escape_command` / `unescape_command`,
+    `parse_record_source(line)` (splits on `|`, returns the
+    source), `read_recent_sources(path, max_n)` (hydration),
+    `utc_iso8601_now()`.
+- `src/persistence/mod.rs`:
+  - the `ok` journal wrapper (~line 278: `format_record` +
+    `append`) — the worker's success path
+    (`finalize_persistence`, transactionally coupled).
+  - the hydration wrapper (~line 288: `read_recent_sources`).
+- `src/runtime.rs`:
+  - `run_replay` (line 1605) — the dual-shape parsing target.
+  - `read_history_seed` (~598) → `read_recent_history` →
+    `app.seed_history(...)` — the recall-ring hydration path.
+  - the dispatch/error paths that surface parse failures and
+    worker/engine errors — the **best-effort `err` append** sites.
+
+## Step 0 — baseline + headline failing test
+
+1. Confirm the baseline: `cargo test` → **1645 / 0 / 0 / 1**;
+   `cargo clippy --all-targets -- -D warnings` clean.
+2. Author the **headline reproduction test** (Tier-3): build a
+   project, write an actual pipe-format `history.log` with mixed
+   `ok`/`err` lines (real `<iso8601>|ok|…` shape), `run_replay`
+   it, and assert the `ok` commands applied. **Confirm it FAILS
+   today** (replay chokes on the timestamp prefix). This test
+   stays RED until Sub-task 2.
+3. Record the exact failure in the sub-task notes (proves the bug
+   is real, not misdiagnosed).
+
+## Sub-task 1 — Journal failures + per-consumer filtering (ADR §1, §2, §4)
+
+### Scope (in)
+- Add a `status` parameter to the append path: `format_record`
+  stops hardcoding `ok`; the `persistence/mod.rs` wrapper takes
+  the status (worker success path passes `ok`).
+- Best-effort **`err` append** from the runtime/app error paths
+  (parse failures and dispatch/engine failures). A failure to
+  record a failure is logged and ignored — never escalates a user
+  error into a fatal (ADR-0034 §4).
+- Hydration (`read_recent_sources` / `read_history_seed`) reads
+  **all** records regardless of status — make the intent explicit
+  and test-pinned (it likely already ignores the status field;
+  confirm and lock it).
+
+### Scope (out — explicit)
+- Replay's journal-format parsing + `ok`-only filter — that is
+  Sub-task 2.
+- Finer-grained status values (parse-err vs exec-err) — ADR-0034
+  §1 leaves these a non-breaking future extension; not now.
+
+### Build steps
+1. Reproduce: a test asserting a *failed* command appears in
+   `history.log` tagged `err` — RED first.
+2. Thread the status through `format_record` + the
+   `persistence/mod.rs` wrapper; success path passes `ok`
+   (behaviour unchanged for the happy path).
+3. Wire the best-effort `err` append at the runtime/app error
+   path(s). Identify the exact sites (parse-failure surface;
+   dispatch/worker-error surface) and keep the append
+   non-fatal.
+4. Pin hydration-reads-all with a test.
+
+### Exit gate (required green tests)
+- A command that **fails to parse** appears in `history.log` as
+  `…|err|<source>`.
+- A command that **dispatches but the worker rejects** (e.g. a
+  UNIQUE violation) appears as `…|err|<source>`.
+- A successful command still appears as `…|ok|<source>`
+  (regression lock on the happy path).
+- **Hydration round-trip:** an `err` line in `history.log` is
+  recalled into the input-history ring on project open
+  (`read_history_seed` / `seed_history`), matching the
+  in-session "record everything" behaviour.
+- All prior tests green (baseline 1645); clippy clean.
+
+### DA gate (written)
+- Is the `err` append genuinely best-effort? A disk/IO failure
+  while recording a *failure* must not turn a recoverable user
+  error into a crash. Verify the error is swallowed + logged.
+- Does the success path still couple the `ok` line to the
+  committed state transactionally (`finalize_persistence` before
+  `tx.commit()`)? No regression to ADR-0015's crash-recoverable
+  ordering.
+- Are both failure *kinds* (parse-time, execute-time) covered, or
+  only one? The ADR names both.
+- Did anything beyond ADR-0034 §1/§2/§4 land (new status values,
+  format changes)? If so, surface it.
+
+## Sub-task 2 — Replay parses the journal format (ADR §3)
+
+### Scope (in)
+- `run_replay` accepts **both** input shapes:
+  - **journal record** — a line matching the
+    `<iso8601-timestamp>|<status>|` prefix → extract `<source>`
+    (unescape per the `history.rs` helper), **skip unless
+    `status == ok`**.
+  - **bare command** — any other non-blank, non-`#` line →
+    replayed as-is (current behaviour).
+- Detection by the leading timestamp+status prefix only, so a
+  bare command containing `|` (e.g. `select 'a|b' from t`) is
+  **not** misread.
+
+### Scope (out — explicit)
+- Changing the replay execution/dispatch model (mode, schema
+  re-snapshot per line) — unchanged; ADR-0033 Amendment 3 already
+  fixed replay's Advanced-mode parse. This sub-task is purely
+  about line-shape recognition.
+
+### Build steps
+1. The Step-0 headline test is the RED anchor.
+2. Add journal-record detection + source extraction in
+   `run_replay` (reuse `history.rs`'s parse/unescape helpers —
+   expose them at the right visibility rather than duplicating).
+3. Apply the `ok`-only filter (skip `err` records silently, like
+   blank/`#` lines — they are not failures of the replay).
+4. Confirm bare-command scripts are untouched.
+
+### Exit gate (required green tests)
+- **Headline (now GREEN):** replaying an actual `history.log`
+  (pipe format, mixed `ok`/`err`) runs the `ok` commands and
+  skips the `err` ones; final state reflects exactly the `ok`
+  sequence.
+- A plain `.commands` script (bare commands) still replays
+  unchanged (regression lock — re-run the existing
+  `replay_command.rs` suite + the 3k
+  `e2e_replay_phase3_dml_forms_from_a_script`).
+- A bare command containing `|` (`select 'a|b' from t` in a
+  script) is **not** misparsed as a journal record.
+- An `err` record is skipped, not treated as a parse failure (no
+  `ReplayFailed` for a well-formed `err` line).
+- All prior tests green; clippy clean.
+
+### DA gate (written)
+- Is the prefix detection robust? A bare command starting with
+  something that *looks* timestamp-ish but isn't a valid
+  `<iso8601>|<status>|` prefix must fall through to
+  bare-command handling, not be silently dropped. Test the
+  boundary.
+- Does the `ok`-only filter use the same status tokens the writer
+  emits (no drift between writer and reader)? Single source of
+  truth for the token set.
+- Is source unescaping the *same* routine hydration uses
+  (`unescape_command`)? No second, divergent unescape.
+- Round-trip integrity: a command written by the worker and then
+  replayed produces the identical command — escape→unescape is
+  lossless for the awkward cases (embedded `|`, newlines,
+  backslashes; `history.rs` already tests these for hydration —
+  mirror for replay).
+
+## Requirement / ADR mapping
+
+| ADR-0034 | Decision | Proven by (target) |
+|----------|----------|--------------------|
+| §1 | journal records every command, `ok`/`err` | Sub-task 1 writer tests |
+| §2 hydration | recall reads all records | Sub-task 1 hydration round-trip |
+| §2 replay | replay reads `ok` only | Sub-task 2 headline + skip-err |
+| §3 | dual-shape input; prefix detection | Sub-task 2 bare-`|` + boundary tests |
+| §4 | ok=worker/transactional, err=best-effort | Sub-task 1 happy-path + best-effort tests |
+| §5 | backward compat (all-`ok` files) | existing `read_recent_sources` tests stay green; replay-of-all-`ok`-log test |
+
+## Verification (phase-exit)
+
+- `cargo test` — zero failures, zero skips, no regression from
+  the 1645 baseline; report exact counts.
+- `cargo clippy --all-targets -- -D warnings` clean.
+- Every row of the mapping above has a green test; the headline
+  `replay history.log` test is green.
+- Final written DA review (specific critiques first, verdict
+  last; rubber-stamp PASS is a process failure).
+- Update ADR-0034 status note / README index if the ADR's status
+  presentation needs it (it is already Accepted — likely just a
+  pointer to the implementing commits), per the index-upkeep
+  rule.
+
+## Open questions to escalate before/at code time
+
+1. **`err` append sites.** The ADR says "runtime/app error path"
+   but does not pin the exact functions. If the natural site is
+   ambiguous (e.g. parse failures surface in more than one place),
+   surface the choice rather than guessing — especially if
+   threading the source text to the error path is non-trivial.
+2. **Status token set ownership.** Where the `ok`/`err` tokens
+   live (a shared const/enum vs string literals) so writer and
+   reader cannot drift — propose the smallest shared definition;
+   escalate only if it forces a wider refactor.
+3. **Does an `err` line need the un-parsed source verbatim?** A
+   parse failure has no validated command — the journal should
+   record what the user *typed*. Confirm the raw input is
+   available at the err-append site (it is for recall today;
+   verify for the journal).
+
+## What this plan does NOT contain
+
+- Time estimates (milestones, not hours).
+- Re-statements of ADR-0034's decisions — it refers by section;
+  the implementer reads both.
+- The deferred finer-grained status values, M4, ADR-0006
+  auto-snapshot, or Phase 4 — all tracked in handoff-34 §5.