504c24c996
Two-sub-task test-first plan mapping 1:1 to ADR-0034's named sub-tasks: (1) journal failures + per-consumer filtering (status-tagged append, best-effort err writes, hydration reads all), (2) replay parses the journal format (ok-only filter, dual-shape input). Opens with a headline failing test that reproduces the live replay history.log bug.
256 lines
12 KiB
Markdown
256 lines
12 KiB
Markdown
# Implementation plan — ADR-0034 (history journal + replay filter)
|
||
|
||
**Plan author date:** 2026-05-24
|
||
**Driving ADR:** [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](../adr/0034-history-journal-and-replay-filter.md) (**Accepted**)
|
||
**Amends:** ADR-0006 (replay log), ADR-0015 §5 (history format) / §12 (hydration)
|
||
|
||
## Purpose
|
||
|
||
ADR-0034 is the *decision*: `history.log` becomes a complete,
|
||
status-tagged journal; each consumer filters (hydration reads
|
||
all, replay reads `ok` only); replay learns the
|
||
`<ts>|<status>|<source>` format while still accepting bare
|
||
`.commands` scripts. This plan is the *mechanics* — the build
|
||
order, the per-sub-task exit gates, and the headline failing test
|
||
that proves the live bug before any fix lands.
|
||
|
||
It is deliberately lighter than the Phase-2/3 plans: ADR-0034 is
|
||
a focused two-sub-task change to a small, well-isolated surface
|
||
(`src/persistence/history.rs`, its `persistence/mod.rs` wrappers,
|
||
`run_replay` in `src/runtime.rs`, and the runtime/app error
|
||
paths). The ADR already names the two sub-tasks (§"Implementation
|
||
note"); this plan refines them with explicit gates.
|
||
|
||
## The live bug this fixes (verified 2026-05-23, handoff-34 §4)
|
||
|
||
`replay history.log` fails on line 1: `run_replay`
|
||
(`src/runtime.rs:1630`) feeds each whole raw line to
|
||
`parse_command_with_schema` and only skips blank/`#` lines — it
|
||
has no understanding of the `<ts>|<status>|<source>` record shape
|
||
that `history::format_record` already writes. The replay tests
|
||
all use hand-written bare-command scripts, so the gap is
|
||
invisible to the suite (ADR-0034 Problem 3).
|
||
|
||
## Working-mode reminder
|
||
|
||
Solo with the DA hat, per `CLAUDE.md`. The DA critique is written
|
||
at each sub-task gate before its verdict. Escalate ADR-vs-code
|
||
mismatches rather than drift — this session's `/runda` round and
|
||
the two Phase-3 escalations show the value.
|
||
|
||
**Test discipline (non-negotiable, global rule).** Test-first:
|
||
the headline failing test (Step 0) is written and confirmed RED
|
||
before any production change. Each bug/behaviour is reproduced
|
||
with a failing test first. "All green, no skips" is the only
|
||
acceptable end state.
|
||
|
||
**Commits.** Not pre-authorized for this plan (unlike Phases
|
||
2–3). Each commit is proposed for the user's approval at the
|
||
sub-task gate, per the global git-safety rule. No AI attribution.
|
||
Push is the user's step.
|
||
|
||
## Code surface (grounded — real names)
|
||
|
||
- `src/persistence/history.rs`:
|
||
- `format_record(command_text, timestamp_iso) -> String` —
|
||
**hardcodes `|ok|`** (line ~28). The status-parameterisation
|
||
point.
|
||
- `append(path, line)`, `escape_command` / `unescape_command`,
|
||
`parse_record_source(line)` (splits on `|`, returns the
|
||
source), `read_recent_sources(path, max_n)` (hydration),
|
||
`utc_iso8601_now()`.
|
||
- `src/persistence/mod.rs`:
|
||
- the `ok` journal wrapper (~line 278: `format_record` +
|
||
`append`) — the worker's success path
|
||
(`finalize_persistence`, transactionally coupled).
|
||
- the hydration wrapper (~line 288: `read_recent_sources`).
|
||
- `src/runtime.rs`:
|
||
- `run_replay` (line 1605) — the dual-shape parsing target.
|
||
- `read_history_seed` (~598) → `read_recent_history` →
|
||
`app.seed_history(...)` — the recall-ring hydration path.
|
||
- the dispatch/error paths that surface parse failures and
|
||
worker/engine errors — the **best-effort `err` append** sites.
|
||
|
||
## Step 0 — baseline + headline failing test
|
||
|
||
1. Confirm the baseline: `cargo test` → **1645 / 0 / 0 / 1**;
|
||
`cargo clippy --all-targets -- -D warnings` clean.
|
||
2. Author the **headline reproduction test** (Tier-3): build a
|
||
project, write an actual pipe-format `history.log` with mixed
|
||
`ok`/`err` lines (real `<iso8601>|ok|…` shape), `run_replay`
|
||
it, and assert the `ok` commands applied. **Confirm it FAILS
|
||
today** (replay chokes on the timestamp prefix). This test
|
||
stays RED until Sub-task 2.
|
||
3. Record the exact failure in the sub-task notes (proves the bug
|
||
is real, not misdiagnosed).
|
||
|
||
## Sub-task 1 — Journal failures + per-consumer filtering (ADR §1, §2, §4)
|
||
|
||
### Scope (in)
|
||
- Add a `status` parameter to the append path: `format_record`
|
||
stops hardcoding `ok`; the `persistence/mod.rs` wrapper takes
|
||
the status (worker success path passes `ok`).
|
||
- Best-effort **`err` append** from the runtime/app error paths
|
||
(parse failures and dispatch/engine failures). A failure to
|
||
record a failure is logged and ignored — never escalates a user
|
||
error into a fatal (ADR-0034 §4).
|
||
- Hydration (`read_recent_sources` / `read_history_seed`) reads
|
||
**all** records regardless of status — make the intent explicit
|
||
and test-pinned (it likely already ignores the status field;
|
||
confirm and lock it).
|
||
|
||
### Scope (out — explicit)
|
||
- Replay's journal-format parsing + `ok`-only filter — that is
|
||
Sub-task 2.
|
||
- Finer-grained status values (parse-err vs exec-err) — ADR-0034
|
||
§1 leaves these a non-breaking future extension; not now.
|
||
|
||
### Build steps
|
||
1. Reproduce: a test asserting a *failed* command appears in
|
||
`history.log` tagged `err` — RED first.
|
||
2. Thread the status through `format_record` + the
|
||
`persistence/mod.rs` wrapper; success path passes `ok`
|
||
(behaviour unchanged for the happy path).
|
||
3. Wire the best-effort `err` append at the runtime/app error
|
||
path(s). Identify the exact sites (parse-failure surface;
|
||
dispatch/worker-error surface) and keep the append
|
||
non-fatal.
|
||
4. Pin hydration-reads-all with a test.
|
||
|
||
### Exit gate (required green tests)
|
||
- A command that **fails to parse** appears in `history.log` as
|
||
`…|err|<source>`.
|
||
- A command that **dispatches but the worker rejects** (e.g. a
|
||
UNIQUE violation) appears as `…|err|<source>`.
|
||
- A successful command still appears as `…|ok|<source>`
|
||
(regression lock on the happy path).
|
||
- **Hydration round-trip:** an `err` line in `history.log` is
|
||
recalled into the input-history ring on project open
|
||
(`read_history_seed` / `seed_history`), matching the
|
||
in-session "record everything" behaviour.
|
||
- All prior tests green (baseline 1645); clippy clean.
|
||
|
||
### DA gate (written)
|
||
- Is the `err` append genuinely best-effort? A disk/IO failure
|
||
while recording a *failure* must not turn a recoverable user
|
||
error into a crash. Verify the error is swallowed + logged.
|
||
- Does the success path still couple the `ok` line to the
|
||
committed state transactionally (`finalize_persistence` before
|
||
`tx.commit()`)? No regression to ADR-0015's crash-recoverable
|
||
ordering.
|
||
- Are both failure *kinds* (parse-time, execute-time) covered, or
|
||
only one? The ADR names both.
|
||
- Did anything beyond ADR-0034 §1/§2/§4 land (new status values,
|
||
format changes)? If so, surface it.
|
||
|
||
## Sub-task 2 — Replay parses the journal format (ADR §3)
|
||
|
||
### Scope (in)
|
||
- `run_replay` accepts **both** input shapes:
|
||
- **journal record** — a line matching the
|
||
`<iso8601-timestamp>|<status>|` prefix → extract `<source>`
|
||
(unescape per the `history.rs` helper), **skip unless
|
||
`status == ok`**.
|
||
- **bare command** — any other non-blank, non-`#` line →
|
||
replayed as-is (current behaviour).
|
||
- Detection by the leading timestamp+status prefix only, so a
|
||
bare command containing `|` (e.g. `select 'a|b' from t`) is
|
||
**not** misread.
|
||
|
||
### Scope (out — explicit)
|
||
- Changing the replay execution/dispatch model (mode, schema
|
||
re-snapshot per line) — unchanged; ADR-0033 Amendment 3 already
|
||
fixed replay's Advanced-mode parse. This sub-task is purely
|
||
about line-shape recognition.
|
||
|
||
### Build steps
|
||
1. The Step-0 headline test is the RED anchor.
|
||
2. Add journal-record detection + source extraction in
|
||
`run_replay` (reuse `history.rs`'s parse/unescape helpers —
|
||
expose them at the right visibility rather than duplicating).
|
||
3. Apply the `ok`-only filter (skip `err` records silently, like
|
||
blank/`#` lines — they are not failures of the replay).
|
||
4. Confirm bare-command scripts are untouched.
|
||
|
||
### Exit gate (required green tests)
|
||
- **Headline (now GREEN):** replaying an actual `history.log`
|
||
(pipe format, mixed `ok`/`err`) runs the `ok` commands and
|
||
skips the `err` ones; final state reflects exactly the `ok`
|
||
sequence.
|
||
- A plain `.commands` script (bare commands) still replays
|
||
unchanged (regression lock — re-run the existing
|
||
`replay_command.rs` suite + the 3k
|
||
`e2e_replay_phase3_dml_forms_from_a_script`).
|
||
- A bare command containing `|` (`select 'a|b' from t` in a
|
||
script) is **not** misparsed as a journal record.
|
||
- An `err` record is skipped, not treated as a parse failure (no
|
||
`ReplayFailed` for a well-formed `err` line).
|
||
- All prior tests green; clippy clean.
|
||
|
||
### DA gate (written)
|
||
- Is the prefix detection robust? A bare command starting with
|
||
something that *looks* timestamp-ish but isn't a valid
|
||
`<iso8601>|<status>|` prefix must fall through to
|
||
bare-command handling, not be silently dropped. Test the
|
||
boundary.
|
||
- Does the `ok`-only filter use the same status tokens the writer
|
||
emits (no drift between writer and reader)? Single source of
|
||
truth for the token set.
|
||
- Is source unescaping the *same* routine hydration uses
|
||
(`unescape_command`)? No second, divergent unescape.
|
||
- Round-trip integrity: a command written by the worker and then
|
||
replayed produces the identical command — escape→unescape is
|
||
lossless for the awkward cases (embedded `|`, newlines,
|
||
backslashes; `history.rs` already tests these for hydration —
|
||
mirror for replay).
|
||
|
||
## Requirement / ADR mapping
|
||
|
||
| ADR-0034 | Decision | Proven by (target) |
|
||
|----------|----------|--------------------|
|
||
| §1 | journal records every command, `ok`/`err` | Sub-task 1 writer tests |
|
||
| §2 hydration | recall reads all records | Sub-task 1 hydration round-trip |
|
||
| §2 replay | replay reads `ok` only | Sub-task 2 headline + skip-err |
|
||
| §3 | dual-shape input; prefix detection | Sub-task 2 bare-`|` + boundary tests |
|
||
| §4 | ok=worker/transactional, err=best-effort | Sub-task 1 happy-path + best-effort tests |
|
||
| §5 | backward compat (all-`ok` files) | existing `read_recent_sources` tests stay green; replay-of-all-`ok`-log test |
|
||
|
||
## Verification (phase-exit)
|
||
|
||
- `cargo test` — zero failures, zero skips, no regression from
|
||
the 1645 baseline; report exact counts.
|
||
- `cargo clippy --all-targets -- -D warnings` clean.
|
||
- Every row of the mapping above has a green test; the headline
|
||
`replay history.log` test is green.
|
||
- Final written DA review (specific critiques first, verdict
|
||
last; rubber-stamp PASS is a process failure).
|
||
- Update ADR-0034 status note / README index if the ADR's status
|
||
presentation needs it (it is already Accepted — likely just a
|
||
pointer to the implementing commits), per the index-upkeep
|
||
rule.
|
||
|
||
## Open questions to escalate before/at code time
|
||
|
||
1. **`err` append sites.** The ADR says "runtime/app error path"
|
||
but does not pin the exact functions. If the natural site is
|
||
ambiguous (e.g. parse failures surface in more than one place),
|
||
surface the choice rather than guessing — especially if
|
||
threading the source text to the error path is non-trivial.
|
||
2. **Status token set ownership.** Where the `ok`/`err` tokens
|
||
live (a shared const/enum vs string literals) so writer and
|
||
reader cannot drift — propose the smallest shared definition;
|
||
escalate only if it forces a wider refactor.
|
||
3. **Does an `err` line need the un-parsed source verbatim?** A
|
||
parse failure has no validated command — the journal should
|
||
record what the user *typed*. Confirm the raw input is
|
||
available at the err-append site (it is for recall today;
|
||
verify for the journal).
|
||
|
||
## What this plan does NOT contain
|
||
|
||
- Time estimates (milestones, not hours).
|
||
- Re-statements of ADR-0034's decisions — it refers by section;
|
||
the implementer reads both.
|
||
- The deferred finer-grained status values, M4, ADR-0006
|
||
auto-snapshot, or Phase 4 — all tracked in handoff-34 §5.
|