rdbms-playground/docs/handoff/20260611-handoff-65.md
claude@clouddev1 78c38e8b33 docs: ADR-0048 Phase 1 accepted/implemented + handoff 65
- ADR-0048 status -> Accepted; Phase 1 implemented (commits
  202e25a..fbd219b), with the pre-build and post-implementation /runda
  passes and the 2358-test green state recorded; index entry updated.
- requirements.md: SD1 [x] (whole-row seed + FK/junction, both modes,
  --seed reproducibility with no exceptions), SD2 [/] (core generators /
  determinism done; the set override clause + column-fill are Phase 2),
  A1 14/15 (only hint/H2 remains unregistered).
- Handoff 65: the full seed Phase-1 build, the two /runda passes, where
  the code lives, and Phase-2 / next steps.
2026-06-11 21:49:06 +00:00

# Session handoff — 2026-06-11 (65)
Sixty-fifth handover. Continues from handoff-64 (ADR-0047 demo
overlays). This session designed and shipped **ADR-0048 — the `seed`
fake-data generation command (SD1)**, Phase 1, end to end: an ADR with
an extended fork dialogue + two `/runda` passes, then a phased
test-first build.
## §1. State at handoff
**Branch:** `main`. **HEAD will be the doc-wrap-up commit** (see §6) —
all seed work committed, nothing pending. Unpushed (push is the user's
step; normal working state).

**Tests: 2358 passing / 0 failing / 0 skipped / 1 ignored** (the
long-standing `friendly` doctest). **Clippy clean** (nursery, all targets).
+68 over handoff-64's 2290.

**`cargo sweep` run** at wrap-up: `target/` 1.6 G → 183 M.

**This session's commits:**
```
202e25a feat(seed): fake-data generation library + fake dependency (P1.1)
f1e9484 feat(seed): command plumbing + walking skeleton (P1.2)
73493fa feat(seed): FK sampling, empty-parent error, block guard (P1.3a)
9c13501 feat(seed): uniqueness, junction distinct-combos, IN-CHECK (P1.3b)
0b3ab3c feat(seed): SeedResult outcome, capped preview, advisory, count cap (P1.3c)
e6ff63d perf(seed): single-transaction multi-row insert path (P1.3d)
fbd219b feat(seed): --seed flag, ambient wiring, and /runda hardening (P1.4 + DA)
```
(plus the earlier `4d0ae77` multi-tab-scope withdrawal and `0af7f56`
ADR-0048 doc, and the wrap-up doc commit.)
## §2. What `seed` does (Phase 1 — read ADR-0048)
`seed <table> [count] [--seed <n>]` — populate a table with realistic
fake data. **Available in both modes** (A1).
- **Realistic, name-aware generation:** the **`fake` crate** (v5,
English) driven by a **type-gated heuristic catalogue**
(`src/seed/heuristics.rs`) — `email`→email, `first_name`→first name,
`price`→currency, etc., each only firing when the column *type* is
compatible. **Table-context** disambiguates `name`/`title`
(`products.name`→a hand-rolled **product** name, `users.name`→person,
`vendors.name`→company). **Bounded dates** (`dob`/`created_at`/
`date`/`timestamp` → recent windows, never "all of history", anchored
to a fixed reference epoch for reproducibility). Type-based fallback
otherwise.
- **Uniqueness (D10):** the user-fillable PK, compound UNIQUE
constraints, single-column UNIQUE, and identifier-named columns
(`id`/`code`/…) stay distinct across the batch and vs existing rows;
**junction tables** get **distinct FK combinations** (capped at the
available product, reported). Identifier ints get a monotonic
sequence.
- **FK (D14):** every FK column samples an existing parent row (compound
FK reads one consistent parent row); **empty parent → friendly
error**.
- **`IN`-CHECK (D17):** a simple `col IN ('a','b')` CHECK becomes the
value source (enum-as-CHECK just works); complex CHECKs are flagged in
the advisory and best-effort generated (a violation rolls the batch
back).
- **Reproducibility (D4):** `--seed <n>` → identical data on the same DB
state. **Holds with no exceptions** — serial (rowid/MAX+1), FK
(`ORDER BY`), **shortid (seeded RNG)**, all generators.
- **Output:** the seeded-row count, a **capped preview** (first 20
rows), and a **Hint-styled advisory** naming enum-ish /
underivable-CHECK columns filled generically. Count cap 10 000;
`seed t 0` is a no-op.
- **Safety:** one **undo** step (snapshot wraps the whole seed);
**replay** re-runs it as a data write; the insert path is a single
transaction (O(N), atomic, commit-db-last preserved).
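
The type-gating and table-context rules in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `src/seed/heuristics.rs`: the `ColType` and `GenKind` enums and the `choose` function are hypothetical stand-ins, and only the column-name to generator pairings are taken from the bullet above.

```rust
/// Hypothetical column types; the real type model is not shown in this handoff.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Text,
    Int,
    Real,
    Date,
}

/// Hypothetical generator vocabulary, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GenKind {
    Email,
    FirstName,
    Currency,
    RecentDate,
    ProductName,
    PersonName,
    CompanyName,
    TypeFallback,
}

/// Pick a generator from the table name, column name, and column type.
/// A name-based rule fires only when the column type is compatible;
/// anything else falls through to the type-based fallback.
fn choose(table: &str, column: &str, ty: ColType) -> GenKind {
    match (column, ty) {
        ("email", ColType::Text) => GenKind::Email,
        ("first_name", ColType::Text) => GenKind::FirstName,
        ("price", ColType::Real | ColType::Int) => GenKind::Currency,
        ("dob" | "created_at", ColType::Date) => GenKind::RecentDate,
        // `name` / `title` are ambiguous, so the table name disambiguates.
        ("name" | "title", ColType::Text) => match table {
            "products" => GenKind::ProductName,
            "users" => GenKind::PersonName,
            "vendors" => GenKind::CompanyName,
            _ => GenKind::TypeFallback,
        },
        _ => GenKind::TypeFallback,
    }
}
```

In this sketch `choose("users", "email", ColType::Int)` returns `GenKind::TypeFallback`: the column name matches a rule, but the type gate keeps the rule from firing.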
## §3. Where the code lives
- **`src/seed/`** — the pure generation library (no DB): `mod.rs`
(`ColumnSpec`, `Generator`, `SeedRng`, `make_rng`), `heuristics.rs`
(`choose_generator` + the catalogue + `is_enum_ish`), `generators.rs`
(`generate_value` + the `product` generator + bounded dates),
`check.rs` (`parse_in_check_values`). ~40 Tier-1 tests, deterministic.
- **`src/db.rs`** — `do_seed` (+ `SeedColPlan`,
`sample_parent_key_tuples`, `seed_value_list_key`, `seed_max_int`, `SeedResult`,
`DEFAULT_SEED_COUNT`/`MAX_SEED_COUNT`/`SEED_PREVIEW_CAP`), the new
**`insert_one_row`** core extracted from `do_insert` (shared, no
tx/persist — so seed runs N rows in one tx), and the `Request::Seed` /
`Database::seed` / worker wiring.
- **`src/dsl/grammar/data.rs`** — `SEED` `CommandNode`, `build_seed`,
the `--seed` flag grammar (`Seq[Flag("seed"), NumberLit]`, the first
DSL flag with a value). `Command::Seed` in `command.rs`.
- **Runtime/render** — `CommandOutcome::Seed`,
`AppEvent::DslSeedSucceeded`, `App::handle_dsl_seed_success`. Catalog keys
`ok.rows_seeded` / `seed.capped` / `seed.advisory_generic` /
`help.data.seed` / `parse.usage.seed`.
- **Tests** — `tests/it/seed.rs` (25 integration tests),
`tests/typing_surface/mod.rs` (`seed_completion_and_validity`),
`tests/it/parse_error_pedagogy.rs` (bare-`seed` near-miss row),
`src/app.rs` (two render tests), `src/dsl/shortid.rs`
(`generate_with_rng`).
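
To make the `IN`-CHECK value source in `check.rs` concrete, here is a minimal sketch of pulling the literal list out of a simple `col IN ('a','b')` expression. The function name `parse_in_values`, its signature, and its deliberately narrow rules (single-quoted literals only, no commas or quotes inside a literal, nothing after the closing parenthesis) are assumptions for illustration; the real `parse_in_check_values` may behave differently.

```rust
/// Hypothetical helper: return the string literals of a simple
/// `column IN ('a', 'b', ...)` CHECK expression, or `None` when the
/// expression is anything more complex than that.
fn parse_in_values(check: &str, column: &str) -> Option<Vec<String>> {
    // The expression must start with the column name followed by whitespace.
    let after_col = check.trim().strip_prefix(column)?;
    if !after_col.starts_with(char::is_whitespace) {
        return None;
    }
    let rest = after_col.trim_start();

    // Next comes the keyword IN, matched case-insensitively.
    let keyword = rest.get(..2)?;
    if !keyword.eq_ignore_ascii_case("in") {
        return None;
    }
    let rest = rest[2..].trim_start();

    // Then one parenthesised list that spans the rest of the expression.
    let inner = rest.strip_prefix('(')?.strip_suffix(')')?;

    let mut values = Vec::new();
    for item in inner.split(',') {
        // Each item must be a single-quoted literal with no embedded quote.
        let literal = item.trim().strip_prefix('\'')?.strip_suffix('\'')?;
        if literal.contains('\'') {
            return None;
        }
        values.push(literal.to_string());
    }
    Some(values)
}
```

Anything the sketch cannot recognise, such as `price > 0` or `role IN ('a') AND x > 1`, yields `None`, which corresponds to the "complex CHECK" path that is flagged in the advisory rather than used as a value source.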
## §4. Process notes (the two `/runda` passes)
- **Pre-build `/runda`** (on the ADR) found six blockers — undo
integration (D15), replay semantics (D16), `set`-value quoting (D2),
CHECK handling (D17), an advisory phase-ordering bug (D13), auto-show
flooding (D18) — all folded into ADR-0048 before any code; the three
genuine forks re-escalated and user-resolved.
- **Post-implementation `/runda`** (on the whole implementation) found
**eight gaps**, all closed: FK-sampling determinism (→ `ORDER BY`),
**shortid not reproducible** (→ seeded RNG; fixed rather than documented
as an exception, at the user's choice), and six **untested ADR decisions** (D5 advanced
mode, D15 undo, D16 replay, D17 complex-CHECK advisory, atomic
rollback, zero-count) — tests added for each.
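
The FK-sampling determinism gap is easy to see in miniature: indexing into the parent keys with a seeded RNG is only reproducible if the keys arrive in a stable order. The sketch below uses a small hand-written SplitMix64 generator as a stand-in for the project's `SeedRng` (whose implementation is not shown in this handoff) and a sort as a stand-in for SQL `ORDER BY`; `sample_parents` and its signature are hypothetical.

```rust
/// Stand-in seeded RNG (SplitMix64). An assumption for this sketch only;
/// it is not the project's `SeedRng`.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Sample `n` parent keys for an FK column.
/// Sorting first plays the role of `ORDER BY`: the same seed then picks
/// the same keys no matter what order the rows were read in.
fn sample_parents(mut keys: Vec<i64>, n: usize, seed: u64) -> Result<Vec<i64>, String> {
    if keys.is_empty() {
        // Mirrors the "empty parent -> friendly error" rule.
        return Err("parent table has no rows to reference".to_string());
    }
    keys.sort_unstable();
    let mut rng = SplitMix64(seed);
    // Modulo introduces a slight bias; acceptable for a sketch.
    let picks = (0..n)
        .map(|_| keys[(rng.next_u64() % keys.len() as u64) as usize])
        .collect();
    Ok(picks)
}
```

With the sort in place, `sample_parents(vec![3, 1, 2], 5, 42)` and `sample_parents(vec![2, 3, 1], 5, 42)` return the same picks; without it, the result would depend on the incoming row order.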
## §5. Phase 2 (deferred — designed in ADR-0048, NOT built)
These are the only seed pieces left; both have full designs in
ADR-0048:
1. **The `set` override clause (D2)** — `seed t 20 set role in
('a','b'), status = 'x', work_addr as email, price between 10 and
100`. Value / pick-from-list / explicit-generator / range, **quoted
literals** (grammar-consistent). This is the SD2 "override hooks"
core. The `ColumnSpec.check_in_values` → `PickFrom` plumbing and the
`Generator` vocabulary already exist; this adds the grammar + a `set`
clause that overrides the per-column plan.
2. **Column-fill (`seed <table>.<column>`, D1 form 2)** — fill one
column across *existing* rows (an UPDATE). Refuses PK/autogen targets;
empty-table no-op.

`requirements.md`: **SD1 `[x]`**, **SD2 `[/]`** (core done; the two
above open), **A1 14/15** (only `hint`/**H2** unregistered).
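
For orientation only, the four override forms in item 1 (value, pick-from-list, explicit generator, range) could be modelled as a small enum that replaces a column's default plan. Nothing below exists in the codebase: the `Override` type, its variants, and `example_overrides` are hypothetical, and the authoritative design is D2 in ADR-0048.

```rust
/// Hypothetical model of one `set` override; not the ADR's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Override {
    /// `status = 'x'`: a fixed literal.
    Value(String),
    /// `role in ('a','b')`: pick from the listed literals.
    PickFrom(Vec<String>),
    /// `work_addr as email`: force a named generator.
    Generator(String),
    /// `price between 10 and 100`: a numeric range
    /// (bound semantics are whatever the ADR specifies).
    Range { low: i64, high: i64 },
}

/// The example clause from item 1, expressed as (column, override) pairs.
fn example_overrides() -> Vec<(&'static str, Override)> {
    vec![
        ("role", Override::PickFrom(vec!["a".to_string(), "b".to_string()])),
        ("status", Override::Value("x".to_string())),
        ("work_addr", Override::Generator("email".to_string())),
        ("price", Override::Range { low: 10, high: 100 }),
    ]
}
```

A per-column plan would then consult this list before falling back to the heuristic catalogue, which matches the description of the `set` clause as something that overrides the per-column plan.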
## §6. How to take over
1. Read handoffs 63 → 64 → 65, `CLAUDE.md`, `docs/requirements.md`,
`docs/adr/0048-seed-fake-data-generation.md` (the whole thing: D1
through D18 plus the as-built status block).
2. **Seed is feature-complete for Phase 1; nothing pending.** Next
options (user's call): seed **Phase 2** (`set` clause + column-fill);
**H2 `hint`** (closes A1) — own ADR; **TT5 CI**; or the larger
**V4 journal** / **tutorial** ADRs.
3. Two minor, user-deferred observations (non-blocking): the uniqueness
retry cap (`MAX_ATTEMPTS=200`) can cap a *medium* unique domain
slightly below its true size (junction/small domains are exact);
`literal_to_value` doesn't type-check an IN-CHECK literal vs a numeric
column (a malformed `int IN ('a')` CHECK fails cleanly at bind).
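
The first observation in item 3 follows from rejection sampling with a fixed attempt budget: as the pool of unused values shrinks, each draw is more likely to collide, so a capped retry loop can give up while values still remain. A minimal sketch, assuming a uniform draw over `0..domain` and a per-row retry budget; `fill_unique` and the SplitMix64 stand-in RNG are hypothetical, and only the cap of 200 comes from the text above.

```rust
use std::collections::HashSet;

/// Retry cap quoted in the handoff.
const MAX_ATTEMPTS: u32 = 200;

/// Stand-in seeded RNG (SplitMix64); an assumption for this sketch only.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Try to produce `want` distinct values from `0..domain`.
/// Each row gets at most MAX_ATTEMPTS draws; when a row exhausts its
/// budget the batch stops. Returns how many distinct values were produced.
fn fill_unique(domain: u64, want: usize, seed: u64) -> usize {
    if domain == 0 {
        return 0;
    }
    let mut rng = SplitMix64(seed);
    let mut seen = HashSet::new();
    'rows: for _ in 0..want {
        for _ in 0..MAX_ATTEMPTS {
            // `insert` returns true only for a value not seen before.
            if seen.insert(rng.next_u64() % domain) {
                continue 'rows;
            }
        }
        break; // budget exhausted: stop short of the request
    }
    seen.len()
}
```

For a domain of 2000 values, the last unused value is hit by any single draw with probability 1/2000, so 200 attempts find it only about 10% of the time. That is why a request for the full domain usually lands slightly below it, while requests well under the domain size are unaffected.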