docs: ADR-0034 implementation plan (history journal + replay filter)
Two-sub-task test-first plan mapping 1:1 to ADR-0034's named sub-tasks: (1) journal failures + per-consumer filtering (status-tagged append, best-effort err writes, hydration reads all), (2) replay parses the journal format (ok-only filter, dual-shape input). Opens with a headline failing test that reproduces the live replay history.log bug.
This commit is contained in:
@@ -0,0 +1,255 @@
|
|||||||
|
# Implementation plan — ADR-0034 (history journal + replay filter)
|
||||||
|
|
||||||
|
**Plan author date:** 2026-05-24
|
||||||
|
**Driving ADR:** [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](../adr/0034-history-journal-and-replay-filter.md) (**Accepted**)
|
||||||
|
**Amends:** ADR-0006 (replay log), ADR-0015 §5 (history format) / §12 (hydration)
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
ADR-0034 is the *decision*: `history.log` becomes a complete,
|
||||||
|
status-tagged journal; each consumer filters (hydration reads
|
||||||
|
all, replay reads `ok` only); replay learns the
|
||||||
|
`<ts>|<status>|<source>` format while still accepting bare
|
||||||
|
`.commands` scripts. This plan is the *mechanics* — the build
|
||||||
|
order, the per-sub-task exit gates, and the headline failing test
|
||||||
|
that proves the live bug before any fix lands.
|
||||||
|
|
||||||
|
It is deliberately lighter than the Phase-2/3 plans: ADR-0034 is
|
||||||
|
a focused two-sub-task change to a small, well-isolated surface
|
||||||
|
(`src/persistence/history.rs`, its `persistence/mod.rs` wrappers,
|
||||||
|
`run_replay` in `src/runtime.rs`, and the runtime/app error
|
||||||
|
paths). The ADR already names the two sub-tasks (§"Implementation
|
||||||
|
note"); this plan refines them with explicit gates.
|
||||||
|
|
||||||
|
## The live bug this fixes (verified 2026-05-23, handoff-34 §4)
|
||||||
|
|
||||||
|
`replay history.log` fails on line 1: `run_replay`
|
||||||
|
(`src/runtime.rs:1630`) feeds each whole raw line to
|
||||||
|
`parse_command_with_schema` and only skips blank/`#` lines — it
|
||||||
|
has no understanding of the `<ts>|<status>|<source>` record shape
|
||||||
|
that `history::format_record` already writes. The replay tests
|
||||||
|
all use hand-written bare-command scripts, so the gap is
|
||||||
|
invisible to the suite (ADR-0034 Problem 3).
|
||||||
|
|
||||||
|
## Working-mode reminder
|
||||||
|
|
||||||
|
Solo with the DA hat, per `CLAUDE.md`. The DA critique is written
|
||||||
|
at each sub-task gate before its verdict. Escalate ADR-vs-code
|
||||||
|
mismatches rather than drift — this session's `/runda` round and
|
||||||
|
the two Phase-3 escalations show the value.
|
||||||
|
|
||||||
|
**Test discipline (non-negotiable, global rule).** Test-first:
|
||||||
|
the headline failing test (Step 0) is written and confirmed RED
|
||||||
|
before any production change. Each bug/behaviour is reproduced
|
||||||
|
with a failing test first. "All green, no skips" is the only
|
||||||
|
acceptable end state.
|
||||||
|
|
||||||
|
**Commits.** Not pre-authorized for this plan (unlike Phases
|
||||||
|
2–3). Each commit is proposed for the user's approval at the
|
||||||
|
sub-task gate, per the global git-safety rule. No AI attribution.
|
||||||
|
Push is the user's step.
|
||||||
|
|
||||||
|
## Code surface (grounded — real names)
|
||||||
|
|
||||||
|
- `src/persistence/history.rs`:
|
||||||
|
- `format_record(command_text, timestamp_iso) -> String` —
|
||||||
|
**hardcodes `|ok|`** (line ~28). The status-parameterisation
|
||||||
|
point.
|
||||||
|
- `append(path, line)`, `escape_command` / `unescape_command`,
|
||||||
|
`parse_record_source(line)` (splits on `|`, returns the
|
||||||
|
source), `read_recent_sources(path, max_n)` (hydration),
|
||||||
|
`utc_iso8601_now()`.
|
||||||
|
- `src/persistence/mod.rs`:
|
||||||
|
- the `ok` journal wrapper (~line 278: `format_record` +
|
||||||
|
`append`) — the worker's success path
|
||||||
|
(`finalize_persistence`, transactionally coupled).
|
||||||
|
- the hydration wrapper (~line 288: `read_recent_sources`).
|
||||||
|
- `src/runtime.rs`:
|
||||||
|
- `run_replay` (line 1605) — the dual-shape parsing target.
|
||||||
|
- `read_history_seed` (~598) → `read_recent_history` →
|
||||||
|
`app.seed_history(...)` — the recall-ring hydration path.
|
||||||
|
- the dispatch/error paths that surface parse failures and
|
||||||
|
worker/engine errors — the **best-effort `err` append** sites.
|
||||||
|
|
||||||
|
## Step 0 — baseline + headline failing test
|
||||||
|
|
||||||
|
1. Confirm the baseline: `cargo test` → **1645 / 0 / 0 / 1**;
|
||||||
|
`cargo clippy --all-targets -- -D warnings` clean.
|
||||||
|
2. Author the **headline reproduction test** (Tier-3): build a
|
||||||
|
project, write an actual pipe-format `history.log` with mixed
|
||||||
|
`ok`/`err` lines (real `<iso8601>|ok|…` shape), `run_replay`
|
||||||
|
it, and assert the `ok` commands applied. **Confirm it FAILS
|
||||||
|
today** (replay chokes on the timestamp prefix). This test
|
||||||
|
stays RED until Sub-task 2.
|
||||||
|
3. Record the exact failure in the sub-task notes (proves the bug
|
||||||
|
is real, not misdiagnosed).
|
||||||
|
|
||||||
|
## Sub-task 1 — Journal failures + per-consumer filtering (ADR §1, §2, §4)
|
||||||
|
|
||||||
|
### Scope (in)
|
||||||
|
- Add a `status` parameter to the append path: `format_record`
|
||||||
|
stops hardcoding `ok`; the `persistence/mod.rs` wrapper takes
|
||||||
|
the status (worker success path passes `ok`).
|
||||||
|
- Best-effort **`err` append** from the runtime/app error paths
|
||||||
|
(parse failures and dispatch/engine failures). A failure to
|
||||||
|
record a failure is logged and ignored — never escalates a user
|
||||||
|
error into a fatal (ADR-0034 §4).
|
||||||
|
- Hydration (`read_recent_sources` / `read_history_seed`) reads
|
||||||
|
**all** records regardless of status — make the intent explicit
|
||||||
|
and test-pinned (it likely already ignores the status field;
|
||||||
|
confirm and lock it).
|
||||||
|
|
||||||
|
### Scope (out — explicit)
|
||||||
|
- Replay's journal-format parsing + `ok`-only filter — that is
|
||||||
|
Sub-task 2.
|
||||||
|
- Finer-grained status values (parse-err vs exec-err) — ADR-0034
|
||||||
|
§1 leaves these a non-breaking future extension; not now.
|
||||||
|
|
||||||
|
### Build steps
|
||||||
|
1. Reproduce: a test asserting a *failed* command appears in
|
||||||
|
`history.log` tagged `err` — RED first.
|
||||||
|
2. Thread the status through `format_record` + the
|
||||||
|
`persistence/mod.rs` wrapper; success path passes `ok`
|
||||||
|
(behaviour unchanged for the happy path).
|
||||||
|
3. Wire the best-effort `err` append at the runtime/app error
|
||||||
|
path(s). Identify the exact sites (parse-failure surface;
|
||||||
|
dispatch/worker-error surface) and keep the append
|
||||||
|
non-fatal.
|
||||||
|
4. Pin hydration-reads-all with a test.
|
||||||
|
|
||||||
|
### Exit gate (required green tests)
|
||||||
|
- A command that **fails to parse** appears in `history.log` as
|
||||||
|
`…|err|<source>`.
|
||||||
|
- A command that **dispatches but the worker rejects** (e.g. a
|
||||||
|
UNIQUE violation) appears as `…|err|<source>`.
|
||||||
|
- A successful command still appears as `…|ok|<source>`
|
||||||
|
(regression lock on the happy path).
|
||||||
|
- **Hydration round-trip:** an `err` line in `history.log` is
|
||||||
|
recalled into the input-history ring on project open
|
||||||
|
(`read_history_seed` / `seed_history`), matching the
|
||||||
|
in-session "record everything" behaviour.
|
||||||
|
- All prior tests green (baseline 1645); clippy clean.
|
||||||
|
|
||||||
|
### DA gate (written)
|
||||||
|
- Is the `err` append genuinely best-effort? A disk/IO failure
|
||||||
|
while recording a *failure* must not turn a recoverable user
|
||||||
|
error into a crash. Verify the error is swallowed + logged.
|
||||||
|
- Does the success path still couple the `ok` line to the
|
||||||
|
committed state transactionally (`finalize_persistence` before
|
||||||
|
`tx.commit()`)? No regression to ADR-0015's crash-recoverable
|
||||||
|
ordering.
|
||||||
|
- Are both failure *kinds* (parse-time, execute-time) covered, or
|
||||||
|
only one? The ADR names both.
|
||||||
|
- Did anything beyond ADR-0034 §1/§2/§4 land (new status values,
|
||||||
|
format changes)? If so, surface it.
|
||||||
|
|
||||||
|
## Sub-task 2 — Replay parses the journal format (ADR §3)
|
||||||
|
|
||||||
|
### Scope (in)
|
||||||
|
- `run_replay` accepts **both** input shapes:
|
||||||
|
- **journal record** — a line matching the
|
||||||
|
`<iso8601-timestamp>|<status>|` prefix → extract `<source>`
|
||||||
|
(unescape per the `history.rs` helper), **skip unless
|
||||||
|
`status == ok`**.
|
||||||
|
- **bare command** — any other non-blank, non-`#` line →
|
||||||
|
replayed as-is (current behaviour).
|
||||||
|
- Detection by the leading timestamp+status prefix only, so a
|
||||||
|
bare command containing `|` (e.g. `select 'a|b' from t`) is
|
||||||
|
**not** misread.
|
||||||
|
|
||||||
|
### Scope (out — explicit)
|
||||||
|
- Changing the replay execution/dispatch model (mode, schema
|
||||||
|
re-snapshot per line) — unchanged; ADR-0033 Amendment 3 already
|
||||||
|
fixed replay's Advanced-mode parse. This sub-task is purely
|
||||||
|
about line-shape recognition.
|
||||||
|
|
||||||
|
### Build steps
|
||||||
|
1. The Step-0 headline test is the RED anchor.
|
||||||
|
2. Add journal-record detection + source extraction in
|
||||||
|
`run_replay` (reuse `history.rs`'s parse/unescape helpers —
|
||||||
|
expose them at the right visibility rather than duplicating).
|
||||||
|
3. Apply the `ok`-only filter (skip `err` records silently, like
|
||||||
|
blank/`#` lines — they are not failures of the replay).
|
||||||
|
4. Confirm bare-command scripts are untouched.
|
||||||
|
|
||||||
|
### Exit gate (required green tests)
|
||||||
|
- **Headline (now GREEN):** replaying an actual `history.log`
|
||||||
|
(pipe format, mixed `ok`/`err`) runs the `ok` commands and
|
||||||
|
skips the `err` ones; final state reflects exactly the `ok`
|
||||||
|
sequence.
|
||||||
|
- A plain `.commands` script (bare commands) still replays
|
||||||
|
unchanged (regression lock — re-run the existing
|
||||||
|
`replay_command.rs` suite + the 3k
|
||||||
|
`e2e_replay_phase3_dml_forms_from_a_script`).
|
||||||
|
- A bare command containing `|` (`select 'a|b' from t` in a
|
||||||
|
script) is **not** misparsed as a journal record.
|
||||||
|
- An `err` record is skipped, not treated as a parse failure (no
|
||||||
|
`ReplayFailed` for a well-formed `err` line).
|
||||||
|
- All prior tests green; clippy clean.
|
||||||
|
|
||||||
|
### DA gate (written)
|
||||||
|
- Is the prefix detection robust? A bare command starting with
|
||||||
|
something that *looks* timestamp-ish but isn't a valid
|
||||||
|
`<iso8601>|<status>|` prefix must fall through to
|
||||||
|
bare-command handling, not be silently dropped. Test the
|
||||||
|
boundary.
|
||||||
|
- Does the `ok`-only filter use the same status tokens the writer
|
||||||
|
emits (no drift between writer and reader)? Single source of
|
||||||
|
truth for the token set.
|
||||||
|
- Is source unescaping the *same* routine hydration uses
|
||||||
|
(`unescape_command`)? No second, divergent unescape.
|
||||||
|
- Round-trip integrity: a command written by the worker and then
|
||||||
|
replayed produces the identical command — escape→unescape is
|
||||||
|
lossless for the awkward cases (embedded `|`, newlines,
|
||||||
|
backslashes; `history.rs` already tests these for hydration —
|
||||||
|
mirror for replay).
|
||||||
|
|
||||||
|
## Requirement / ADR mapping
|
||||||
|
|
||||||
|
| ADR-0034 | Decision | Proven by (target) |
|
||||||
|
|----------|----------|--------------------|
|
||||||
|
| §1 | journal records every command, `ok`/`err` | Sub-task 1 writer tests |
|
||||||
|
| §2 hydration | recall reads all records | Sub-task 1 hydration round-trip |
|
||||||
|
| §2 replay | replay reads `ok` only | Sub-task 2 headline + skip-err |
|
||||||
|
| §3 | dual-shape input; prefix detection | Sub-task 2 bare-`|` + boundary tests |
|
||||||
|
| §4 | ok=worker/transactional, err=best-effort | Sub-task 1 happy-path + best-effort tests |
|
||||||
|
| §5 | backward compat (all-`ok` files) | existing `read_recent_sources` tests stay green; replay-of-all-`ok`-log test |
|
||||||
|
|
||||||
|
## Verification (phase-exit)
|
||||||
|
|
||||||
|
- `cargo test` — zero failures, zero skips, no regression from
|
||||||
|
the 1645 baseline; report exact counts.
|
||||||
|
- `cargo clippy --all-targets -- -D warnings` clean.
|
||||||
|
- Every row of the mapping above has a green test; the headline
|
||||||
|
`replay history.log` test is green.
|
||||||
|
- Final written DA review (specific critiques first, verdict
|
||||||
|
last; rubber-stamp PASS is a process failure).
|
||||||
|
- Update ADR-0034 status note / README index if the ADR's status
|
||||||
|
presentation needs it (it is already Accepted — likely just a
|
||||||
|
pointer to the implementing commits), per the index-upkeep
|
||||||
|
rule.
|
||||||
|
|
||||||
|
## Open questions to escalate before/at code time
|
||||||
|
|
||||||
|
1. **`err` append sites.** The ADR says "runtime/app error path"
|
||||||
|
but does not pin the exact functions. If the natural site is
|
||||||
|
ambiguous (e.g. parse failures surface in more than one place),
|
||||||
|
surface the choice rather than guessing — especially if
|
||||||
|
threading the source text to the error path is non-trivial.
|
||||||
|
2. **Status token set ownership.** Where the `ok`/`err` tokens
|
||||||
|
live (a shared const/enum vs string literals) so writer and
|
||||||
|
reader cannot drift — propose the smallest shared definition;
|
||||||
|
escalate only if it forces a wider refactor.
|
||||||
|
3. **Does an `err` line need the un-parsed source verbatim?** A
|
||||||
|
parse failure has no validated command — the journal should
|
||||||
|
record what the user *typed*. Confirm the raw input is
|
||||||
|
available at the err-append site (it is for recall today;
|
||||||
|
verify for the journal).
|
||||||
|
|
||||||
|
## What this plan does NOT contain
|
||||||
|
|
||||||
|
- Time estimates (milestones, not hours).
|
||||||
|
- Re-statements of ADR-0034's decisions — it refers by section;
|
||||||
|
the implementer reads both.
|
||||||
|
- The deferred finer-grained status values, M4, ADR-0006
|
||||||
|
auto-snapshot, or Phase 4 — all tracked in handoff-34 §5.
|
||||||
Reference in New Issue
Block a user