Files

T

claude@clouddev1 504c24c996 docs: ADR-0034 implementation plan (history journal + replay filter)

Two-sub-task test-first plan mapping 1:1 to ADR-0034's named sub-tasks: (1) journal failures + per-consumer filtering (status-tagged append, best-effort err writes, hydration reads all), (2) replay parses the journal format (ok-only filter, dual-shape input). Opens with a headline failing test that reproduces the live replay history.log bug.

2026-05-24 09:29:52 +00:00

12 KiB

Raw Blame History

Implementation plan — ADR-0034 (history journal + replay filter)

Plan author date: 2026-05-24 Driving ADR: ADR-0034 — history.log as a complete command journal; replay reads success-only (Accepted) Amends: ADR-0006 (replay log), ADR-0015 §5 (history format) / §12 (hydration)

Purpose

ADR-0034 is the decision: history.log becomes a complete, status-tagged journal; each consumer filters (hydration reads all, replay reads ok only); replay learns the <ts>|<status>|<source> format while still accepting bare .commands scripts. This plan is the mechanics — the build order, the per-sub-task exit gates, and the headline failing test that proves the live bug before any fix lands.

It is deliberately lighter than the Phase-2/3 plans: ADR-0034 is a focused two-sub-task change to a small, well-isolated surface (src/persistence/history.rs, its persistence/mod.rs wrappers, run_replay in src/runtime.rs, and the runtime/app error paths). The ADR already names the two sub-tasks (§"Implementation note"); this plan refines them with explicit gates.

The live bug this fixes (verified 2026-05-23, handoff-34 §4)

replay history.log fails on line 1: run_replay (src/runtime.rs:1630) feeds each whole raw line to parse_command_with_schema and only skips blank/# lines — it has no understanding of the <ts>|<status>|<source> record shape that history::format_record already writes. The replay tests all use hand-written bare-command scripts, so the gap is invisible to the suite (ADR-0034 Problem 3).

Working-mode reminder

Solo with the DA hat, per CLAUDE.md. The DA critique is written at each sub-task gate before its verdict. Escalate ADR-vs-code mismatches rather than drift — this session's /runda round and the two Phase-3 escalations show the value.

Test discipline (non-negotiable, global rule). Test-first: the headline failing test (Step 0) is written and confirmed RED before any production change. Each bug/behaviour is reproduced with a failing test first. "All green, no skips" is the only acceptable end state.

Commits. Not pre-authorized for this plan (unlike Phases 2–3). Each commit is proposed for the user's approval at the sub-task gate, per the global git-safety rule. No AI attribution. Push is the user's step.

Code surface (grounded — real names)

src/persistence/history.rs:
- format_record(command_text, timestamp_iso) -> String — hardcodes |ok| (line ~28). The status-parameterisation point.
- append(path, line), escape_command / unescape_command, parse_record_source(line) (splits on |, returns the source), read_recent_sources(path, max_n) (hydration), utc_iso8601_now().
src/persistence/mod.rs:
- the ok journal wrapper (~line 278: format_record + append) — the worker's success path (finalize_persistence, transactionally coupled).
- the hydration wrapper (~line 288: read_recent_sources).
src/runtime.rs:
- run_replay (line 1605) — the dual-shape parsing target.
- read_history_seed (~598) → read_recent_history → app.seed_history(...) — the recall-ring hydration path.
- the dispatch/error paths that surface parse failures and worker/engine errors — the best-effort err append sites.

Step 0 — baseline + headline failing test

Confirm the baseline: cargo test → 1645 / 0 / 0 / 1; cargo clippy --all-targets -- -D warnings clean.
Author the headline reproduction test (Tier-3): build a project, write an actual pipe-format history.log with mixed ok/err lines (real <iso8601>|ok|… shape), run_replay it, and assert the ok commands applied. Confirm it FAILS today (replay chokes on the timestamp prefix). This test stays RED until Sub-task 2.
Record the exact failure in the sub-task notes (proves the bug is real, not misdiagnosed).

Sub-task 1 — Journal failures + per-consumer filtering (ADR §1, §2, §4)

Scope (in)

Add a status parameter to the append path: format_record stops hardcoding ok; the persistence/mod.rs wrapper takes the status (worker success path passes ok).
Best-effort err append from the runtime/app error paths (parse failures and dispatch/engine failures). A failure to record a failure is logged and ignored — never escalates a user error into a fatal (ADR-0034 §4).
Hydration (read_recent_sources / read_history_seed) reads all records regardless of status — make the intent explicit and test-pinned (it likely already ignores the status field; confirm and lock it).

Scope (out — explicit)

Replay's journal-format parsing + ok-only filter — that is Sub-task 2.
Finer-grained status values (parse-err vs exec-err) — ADR-0034 §1 leaves these a non-breaking future extension; not now.

Build steps

Reproduce: a test asserting a failed command appears in history.log tagged err — RED first.
Thread the status through format_record + the persistence/mod.rs wrapper; success path passes ok (behaviour unchanged for the happy path).
Wire the best-effort err append at the runtime/app error path(s). Identify the exact sites (parse-failure surface; dispatch/worker-error surface) and keep the append non-fatal.
Pin hydration-reads-all with a test.

Exit gate (required green tests)

A command that fails to parse appears in history.log as …|err|<source>.
A command that dispatches but the worker rejects (e.g. a UNIQUE violation) appears as …|err|<source>.
A successful command still appears as …|ok|<source> (regression lock on the happy path).
Hydration round-trip: an err line in history.log is recalled into the input-history ring on project open (read_history_seed / seed_history), matching the in-session "record everything" behaviour.
All prior tests green (baseline 1645); clippy clean.

DA gate (written)

Is the err append genuinely best-effort? A disk/IO failure while recording a failure must not turn a recoverable user error into a crash. Verify the error is swallowed + logged.
Does the success path still couple the ok line to the committed state transactionally (finalize_persistence before tx.commit())? No regression to ADR-0015's crash-recoverable ordering.
Are both failure kinds (parse-time, execute-time) covered, or only one? The ADR names both.
Did anything beyond ADR-0034 §1/§2/§4 land (new status values, format changes)? If so, surface it.

Sub-task 2 — Replay parses the journal format (ADR §3)

Scope (in)

run_replay accepts both input shapes:
- journal record — a line matching the <iso8601-timestamp>|<status>| prefix → extract <source> (unescape per the history.rs helper), skip unless status == ok.
- bare command — any other non-blank, non-# line → replayed as-is (current behaviour).
Detection by the leading timestamp+status prefix only, so a bare command containing | (e.g. select 'a|b' from t) is not misread.

Scope (out — explicit)

Changing the replay execution/dispatch model (mode, schema re-snapshot per line) — unchanged; ADR-0033 Amendment 3 already fixed replay's Advanced-mode parse. This sub-task is purely about line-shape recognition.

Build steps

The Step-0 headline test is the RED anchor.
Add journal-record detection + source extraction in run_replay (reuse history.rs's parse/unescape helpers — expose them at the right visibility rather than duplicating).
Apply the ok-only filter (skip err records silently, like blank/# lines — they are not failures of the replay).
Confirm bare-command scripts are untouched.

Exit gate (required green tests)

Headline (now GREEN): replaying an actual history.log (pipe format, mixed ok/err) runs the ok commands and skips the err ones; final state reflects exactly the ok sequence.
A plain .commands script (bare commands) still replays unchanged (regression lock — re-run the existing replay_command.rs suite + the 3k e2e_replay_phase3_dml_forms_from_a_script).
A bare command containing | (select 'a|b' from t in a script) is not misparsed as a journal record.
An err record is skipped, not treated as a parse failure (no ReplayFailed for a well-formed err line).
All prior tests green; clippy clean.

DA gate (written)

Is the prefix detection robust? A bare command starting with something that looks timestamp-ish but isn't a valid <iso8601>|<status>| prefix must fall through to bare-command handling, not be silently dropped. Test the boundary.
Does the ok-only filter use the same status tokens the writer emits (no drift between writer and reader)? Single source of truth for the token set.
Is source unescaping the same routine hydration uses (unescape_command)? No second, divergent unescape.
Round-trip integrity: a command written by the worker and then replayed produces the identical command — escape→unescape is lossless for the awkward cases (embedded |, newlines, backslashes; history.rs already tests these for hydration — mirror for replay).

Requirement / ADR mapping

ADR-0034	Decision	Proven by (target)
§1	journal records every command, `ok`/`err`	Sub-task 1 writer tests
§2 hydration	recall reads all records	Sub-task 1 hydration round-trip
§2 replay	replay reads `ok` only	Sub-task 2 headline + skip-err
§3	dual-shape input; prefix detection	Sub-task 2 bare-`
§4	ok=worker/transactional, err=best-effort	Sub-task 1 happy-path + best-effort tests
§5	backward compat (all-`ok` files)	existing `read_recent_sources` tests stay green; replay-of-all-`ok`-log test

Verification (phase-exit)

cargo test — zero failures, zero skips, no regression from the 1645 baseline; report exact counts.
cargo clippy --all-targets -- -D warnings clean.
Every row of the mapping above has a green test; the headline replay history.log test is green.
Final written DA review (specific critiques first, verdict last; rubber-stamp PASS is a process failure).
Update ADR-0034 status note / README index if the ADR's status presentation needs it (it is already Accepted — likely just a pointer to the implementing commits), per the index-upkeep rule.

Open questions to escalate before/at code time

err append sites. The ADR says "runtime/app error path" but does not pin the exact functions. If the natural site is ambiguous (e.g. parse failures surface in more than one place), surface the choice rather than guessing — especially if threading the source text to the error path is non-trivial.
Status token set ownership. Where the ok/err tokens live (a shared const/enum vs string literals) so writer and reader cannot drift — propose the smallest shared definition; escalate only if it forces a wider refactor.
Does an err line need the un-parsed source verbatim? A parse failure has no validated command — the journal should record what the user typed. Confirm the raw input is available at the err-append site (it is for recall today; verify for the journal).

What this plan does NOT contain

Time estimates (milestones, not hours).
Re-statements of ADR-0034's decisions — it refers by section; the implementer reads both.
The deferred finer-grained status values, M4, ADR-0006 auto-snapshot, or Phase 4 — all tracked in handoff-34 §5.

12 KiB Raw Blame History Unescape Escape

Implementation plan — ADR-0034 (history journal + replay filter)

Purpose

The live bug this fixes (verified 2026-05-23, handoff-34 §4)

Working-mode reminder

Code surface (grounded — real names)

Step 0 — baseline + headline failing test

Sub-task 1 — Journal failures + per-consumer filtering (ADR §1, §2, §4)

Scope (in)

Scope (out — explicit)

Build steps

Exit gate (required green tests)

DA gate (written)

Sub-task 2 — Replay parses the journal format (ADR §3)

Scope (in)

Scope (out — explicit)

Build steps

Exit gate (required green tests)

DA gate (written)

Requirement / ADR mapping

Verification (phase-exit)

Open questions to escalate before/at code time

What this plan does NOT contain

12 KiB

Raw Blame History