fix: ADR-0006 — clear redo when new work commits without a snapshot
/runda found silent data loss: with the non-fatal snapshot-failure policy, a committed mutation whose snapshot couldn't be staged left the redo stack stale (redo-clear was only a side effect of finalize), so a later redo silently discarded the new work. Same gap in batches.

- SnapshotStore::clear_redo() drops the redo stack + payloads
- snapshot_then / end_batch call it when committed user work has no staged snapshot; for disk-full it succeeds where a full backup couldn't (tiny index write + payload deletes)
- unit test + integration regression (forced staging failure)
- ADR-0006 implementation note records the fix + residual edge

1698 passed / 0 failed / 1 ignored; clippy clean.
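The following is a minimal in-memory sketch of the fix, not the project's code. It stages a snapshot before each mutation, treats a staging failure as non-fatal (logged), and on commit either finalizes the staged snapshot or, when none was staged, calls `clear_redo`. `SnapshotStore`, `snapshot_then`, `finalize` and `clear_redo` are names taken from the commit, but their signatures here are assumptions. `Db`, `Snapshot`, `stage`, `fail_staging` and `regression_scenario` are hypothetical. The real store persists payloads and an index under `.snapshots/`, while this model keeps everything in memory.

```rust
// Sketch only: an in-memory stand-in for the on-disk `.snapshots/` store.
// `SnapshotStore`, `snapshot_then`, `finalize` and `clear_redo` mirror names
// from the commit; `Db`, `Snapshot`, `stage`, `fail_staging` and
// `regression_scenario` are hypothetical.

/// Full copy of the toy database state, used as an undo/redo payload.
type Snapshot = Vec<String>;

#[derive(Default)]
struct SnapshotStore {
    undo: Vec<Snapshot>,
    redo: Vec<Snapshot>,
    /// Simulates disk full: staging a full backup fails.
    fail_staging: bool,
}

impl SnapshotStore {
    /// Stage a pre-mutation backup; this is the step that can fail.
    fn stage(&self, state: &Snapshot) -> Result<Snapshot, String> {
        if self.fail_staging {
            Err("disk full".to_string())
        } else {
            Ok(state.clone())
        }
    }

    /// Record the staged backup as one undo step; new work invalidates redo.
    fn finalize(&mut self, staged: Snapshot) {
        self.undo.push(staged);
        self.redo.clear();
    }

    /// The fix: drop the redo stack even when there is no undo step to record.
    fn clear_redo(&mut self) {
        self.redo.clear();
    }
}

struct Db {
    state: Snapshot,
    store: SnapshotStore,
}

impl Db {
    /// Run a mutation behind a best-effort snapshot (non-fatal policy).
    fn snapshot_then<F: FnOnce(&mut Snapshot)>(&mut self, mutate: F) {
        let staged = match self.store.stage(&self.state) {
            Ok(s) => Some(s),
            Err(e) => {
                eprintln!("snapshot staging failed (non-fatal): {e}");
                None
            }
        };
        mutate(&mut self.state); // the user's work commits regardless
        match staged {
            Some(s) => self.store.finalize(s),
            // Before the fix this arm did nothing, leaving a stale redo stack.
            None => self.store.clear_redo(),
        }
    }

    fn undo(&mut self) -> bool {
        let Some(prev) = self.store.undo.pop() else { return false };
        let current = std::mem::replace(&mut self.state, prev);
        self.store.redo.push(current);
        true
    }

    fn redo(&mut self) -> bool {
        let Some(next) = self.store.redo.pop() else { return false };
        let current = std::mem::replace(&mut self.state, next);
        self.store.undo.push(current);
        true
    }
}

/// The regression: commit, undo, commit without a snapshot, then redo.
fn regression_scenario() -> (Snapshot, bool) {
    let mut db = Db { state: Vec::new(), store: SnapshotStore::default() };
    db.snapshot_then(|s| s.push("a".to_string()));
    db.undo(); // redo stack now holds ["a"]
    db.store.fail_staging = true;
    db.snapshot_then(|s| s.push("b".to_string())); // no snapshot staged
    let redid = db.redo(); // must not replace ["b"] with the stale ["a"]
    (db.state, redid)
}
```

In this model, removing the `None => self.store.clear_redo()` arm makes the final `redo` in `regression_scenario` succeed and replace `["b"]` with the stale `["a"]`. That is the silent overwrite the commit describes.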
@@ -229,7 +229,22 @@ during implementation, user-confirmed where they extended the design:
 - **Snapshot-failure policy** (user-confirmed): staging / finalise /
 discard failures are **non-fatal** (logged) — the real persistence
 is the durable state and a backup hiccup must not fail the user's
-work. Only *restore* failures surface (as `UndoFailed`).
+work. Only *restore* failures surface (as `UndoFailed`). A `/runda`
+review found that this policy left a **data-loss edge**: a committed
+mutation whose snapshot could not be staged added no undo entry and
+did not clear the redo stack (clearing was a side effect of
+`finalize`), so a later `redo` could silently discard the new work.
+Fixed: any committed user mutation (and any batch with ≥1 committed
+mutation) now **clears the redo stack even when its snapshot could
+not be staged**, via an explicit `SnapshotStore::clear_redo`
+(`src/db.rs` `snapshot_then` / `end_batch`). For the realistic
+failure (disk full), `clear_redo` — which deletes redo payloads and
+rewrites a tiny index — succeeds even when a full backup couldn't.
+**Residual edge** (accepted): if the *entire* `.snapshots/`
+directory is unwritable (so `clear_redo` itself fails), a stale redo
+can survive; but that state means the whole undo subsystem is
+broken, which the user would already observe. Regression-tested in
+`tests/undo_snapshots.rs::redo_is_cleared_when_new_work_commits_without_a_snapshot`.
 - **Batch** uses `BeginBatch`/`EndBatch` worker requests; `replay`
 wraps its loop so a multi-command replay is one undo step,
 finalised only if a mutation committed.
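The batch rules can be modelled in a hedged, in-memory sketch. A `replay` is one undo step, finalised only if a mutation committed, and a batch with at least one committed mutation clears the redo stack even when no snapshot was staged. `end_batch` and `replay` borrow names from the ADR, but their signatures are assumptions. `Cmd`, `Outcome`, `Batch`, `Store` and `begin_batch` are hypothetical stand-ins for the `BeginBatch`/`EndBatch` worker requests, and a boolean `staging_ok` plays the role of a staging failure.

```rust
// Sketch only: `Cmd`, `Outcome`, `Batch`, `Store` and `begin_batch` are
// hypothetical stand-ins for the `BeginBatch`/`EndBatch` worker requests;
// `end_batch` and `replay` borrow names from the ADR, not their signatures.

/// A replayed command either commits a mutation or does not.
enum Cmd {
    Read,
    Write,
}

/// What `end_batch` did with the batch.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// No mutation committed: nothing recorded, redo left alone.
    Nothing,
    /// Mutations committed and a snapshot was staged: one undo step.
    UndoStep,
    /// Mutations committed but no snapshot: no undo step, redo cleared.
    RedoCleared,
}

#[derive(Default)]
struct Store {
    undo_steps: usize,
    redo: Vec<String>,
}

/// An open batch: the snapshot staged at begin (if any) and a commit count.
struct Batch {
    staged: Option<String>,
    committed: usize,
}

impl Store {
    fn begin_batch(&self, staging_ok: bool) -> Batch {
        let staged = staging_ok.then(|| "pre-batch snapshot".to_string());
        Batch { staged, committed: 0 }
    }

    fn end_batch(&mut self, batch: Batch) -> Outcome {
        if batch.committed == 0 {
            return Outcome::Nothing; // any staged snapshot is simply dropped
        }
        self.redo.clear(); // committed work always invalidates redo
        match batch.staged {
            Some(_) => {
                self.undo_steps += 1;
                Outcome::UndoStep
            }
            None => Outcome::RedoCleared,
        }
    }
}

/// Replay a command list as a single undo step.
fn replay(store: &mut Store, commands: &[Cmd], staging_ok: bool) -> Outcome {
    let mut batch = store.begin_batch(staging_ok);
    for cmd in commands {
        if matches!(cmd, Cmd::Write) {
            batch.committed += 1;
        }
    }
    store.end_batch(batch)
}
```

In this model, a replay with two writes yields one undo step, and a read-only replay leaves the redo stack untouched. When staging fails, a replay that commits still clears redo.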