rdbms-playground/docs/handoff/20260611-handoff-65.md
claude@clouddev1 78c38e8b33 docs: ADR-0048 Phase 1 accepted/implemented + handoff 65
- ADR-0048 status -> Accepted; Phase 1 implemented (commits
  202e25a..fbd219b), with the pre-build and post-implementation /runda
  passes and the 2358-test green state recorded; index entry updated.
- requirements.md: SD1 [x] (whole-row seed + FK/junction, both modes,
  --seed reproducibility with no exceptions), SD2 [/] (core generators /
  determinism done; the set override clause + column-fill are Phase 2),
  A1 14/15 (only hint/H2 remains unregistered).
- Handoff 65: the full seed Phase-1 build, the two /runda passes, where
  the code lives, and Phase-2 / next steps.
2026-06-11 21:49:06 +00:00

Session handoff — 2026-06-11 (65)

Sixty-fifth handover. Continues from handoff-64 (ADR-0047 demo overlays). This session designed and shipped ADR-0048 — the seed fake-data generation command (SD1), Phase 1, end to end: an ADR with an extended fork dialogue + two /runda passes, then a phased test-first build.

§1. State at handoff

Branch: main. HEAD will be the doc-wrap-up commit (see §6) — all seed work committed, nothing pending. Unpushed (push is the user's step; normal working state).

Tests: 2358 passing / 0 failing / 0 skipped / 1 ignored (the long-standing friendly doctest). Clippy clean (nursery, all targets). +68 over handoff-64's 2290.

cargo sweep run at wrap-up: target/ 1.6 G → 183 M.

This session's commits:

202e25a feat(seed): fake-data generation library + fake dependency (P1.1)
f1e9484 feat(seed): command plumbing + walking skeleton (P1.2)
73493fa feat(seed): FK sampling, empty-parent error, block guard (P1.3a)
9c13501 feat(seed): uniqueness, junction distinct-combos, IN-CHECK (P1.3b)
0b3ab3c feat(seed): SeedResult outcome, capped preview, advisory, count cap (P1.3c)
e6ff63d perf(seed): single-transaction multi-row insert path (P1.3d)
fbd219b feat(seed): --seed flag, ambient wiring, and /runda hardening (P1.4 + DA)

(plus the earlier 4d0ae77 multi-tab-scope withdrawal and 0af7f56 ADR-0048 doc, and the wrap-up doc commit.)

§2. What seed does (Phase 1 — read ADR-0048)

seed <table> [count] [--seed <n>] — populate a table with realistic fake data. Available in both modes (A1).

  • Realistic, name-aware generation: the fake crate (v5, English) driven by a type-gated heuristic catalogue in src/seed/heuristics.rs (email→email, first_name→first name, price→currency, etc.), each rule firing only when the column type is compatible. Table context disambiguates name/title (products.name→a hand-rolled product name, users.name→person, vendors.name→company). Bounded dates (dob/created_at/date/timestamp → recent windows, never "all of history", anchored to a fixed reference epoch for reproducibility). Type-based fallback otherwise.
  • Uniqueness (D10): the user-fillable PK, compound UNIQUE constraints, single-column UNIQUE, and identifier-named columns (id/code/…) stay distinct across the batch and vs existing rows; junction tables get distinct FK combinations (capped at the available product, reported). Identifier ints get a monotonic sequence.
  • FK (D14): every FK column samples an existing parent row (compound FK reads one consistent parent row); empty parent → friendly error.
  • IN-CHECK (D17): a simple col IN ('a','b') CHECK becomes the value source (enum-as-CHECK just works); complex CHECKs are flagged in the advisory and best-effort generated (a violation rolls the batch back).
  • Reproducibility (D4): --seed <n> → identical data on the same DB state. Holds with no exceptions — serial (rowid/MAX+1), FK (ORDER BY), shortid (seeded RNG), all generators.
  • Output: the seeded-row count, a capped preview (first 20 rows), and a Hint-styled advisory naming enum-ish / underivable-CHECK columns filled generically. Count cap 10 000; seed t 0 is a no-op.
  • Safety: one undo step (snapshot wraps the whole seed); replay re-runs it as a data write; the insert path is a single transaction (O(N), atomic, commit-db-last preserved).
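
The type-gated, table-aware choice described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in src/seed/heuristics.rs: the ColType and Generator enums, the three-argument signature, and the default of treating an unrecognised table's name column as a person are all assumptions made for the example.

```rust
/// Simplified column types (assumption; the real ColumnSpec is richer).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Text,
    Int,
    Real,
    Date,
}

/// Illustrative subset of a generator vocabulary (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Generator {
    Email,
    FirstName,
    PersonName,
    CompanyName,
    ProductName,
    Currency,
    RecentDate,
    FallbackForType,
}

/// Pick a generator from the column name, but only when the column type
/// is compatible; otherwise fall back to a purely type-based generator.
fn choose_generator(table: &str, column: &str, ty: ColType) -> Generator {
    let table = table.to_ascii_lowercase();
    let column = column.to_ascii_lowercase();
    match (column.as_str(), ty) {
        ("email", ColType::Text) => Generator::Email,
        ("first_name", ColType::Text) => Generator::FirstName,
        ("price", ColType::Real | ColType::Int) => Generator::Currency,
        ("dob" | "created_at", ColType::Date) => Generator::RecentDate,
        // Table context disambiguates an otherwise ambiguous `name` column.
        ("name", ColType::Text) => match table.as_str() {
            "products" => Generator::ProductName,
            "vendors" => Generator::CompanyName,
            _ => Generator::PersonName, // assumed default for this sketch
        },
        // Name matched nothing, or the type gate rejected the match.
        _ => Generator::FallbackForType,
    }
}

fn main() {
    let cases = [
        ("users", "email", ColType::Text),
        ("users", "email", ColType::Int), // type gate rejects the name match
        ("products", "name", ColType::Text),
        ("vendors", "name", ColType::Text),
        ("orders", "price", ColType::Real),
        ("users", "dob", ColType::Date),
    ];
    for (table, column, ty) in cases {
        println!("{table}.{column} ({ty:?}) -> {:?}", choose_generator(table, column, ty));
    }
}
```

The point of the gate is the second case: an INT column that happens to be called email gets the type fallback rather than a string that would fail at insert time.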

§3. Where the code lives

  • src/seed/: the pure generation library (no DB): mod.rs (ColumnSpec, Generator, SeedRng, make_rng), heuristics.rs (choose_generator + the catalogue + is_enum_ish), generators.rs (generate_value + the product generator + bounded dates), check.rs (parse_in_check_values). ~40 Tier-1 tests, deterministic.
  • src/db.rs: do_seed (+ SeedColPlan, sample_parent_key_tuples, seed_value_list_key, seed_max_int, SeedResult, DEFAULT_SEED_COUNT/MAX_SEED_COUNT/SEED_PREVIEW_CAP), the new insert_one_row core extracted from do_insert (shared, no tx/persist, so seed runs N rows in one tx), and the Request::Seed / Database::seed / worker wiring.
  • src/dsl/grammar/data.rs: SEED CommandNode, build_seed, the --seed flag grammar (Seq[Flag("seed"), NumberLit], the first DSL flag with a value). Command::Seed in command.rs.
  • Runtime/render: CommandOutcome::Seed, AppEvent::DslSeedSucceeded, App::handle_dsl_seed_success. Catalog keys ok.rows_seeded / seed.capped / seed.advisory_generic / help.data.seed / parse.usage.seed.
  • Tests: tests/it/seed.rs (25 integration tests), tests/typing_surface/mod.rs (seed_completion_and_validity), tests/it/parse_error_pedagogy.rs (bare-seed near-miss row), src/app.rs (two render tests), src/dsl/shortid.rs (generate_with_rng).
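
For orientation, the kind of parsing that check.rs performs for a simple col IN ('a','b') CHECK might look like the sketch below. It borrows the name parse_in_check_values from the list above, but the signature, the return type, and the exact set of accepted forms are assumptions; the real function may differ. Anything that is not a plain list of single-quoted literals returns None, which corresponds to the "complex CHECK" case that is flagged in the advisory.

```rust
/// Hypothetical re-implementation for illustration only.
/// Accepts `<column> IN ('a', 'b', ...)` and returns the literal values;
/// returns None for anything else (numeric lists, AND/OR, comparisons...).
fn parse_in_check_values(column: &str, check: &str) -> Option<Vec<String>> {
    let s = check.trim();

    // The expression must start with the column name (case-insensitive),
    // followed by whitespace.
    let head = s.get(..column.len())?;
    if !head.eq_ignore_ascii_case(column) {
        return None;
    }
    let after_col = &s[column.len()..];
    if !after_col.starts_with(char::is_whitespace) {
        return None;
    }

    // Then the IN keyword and a parenthesised list spanning the rest.
    let rest = after_col.trim_start();
    if !rest.get(..2)?.eq_ignore_ascii_case("in") {
        return None;
    }
    let inner = rest[2..].trim_start().strip_prefix('(')?.strip_suffix(')')?;

    let mut values = Vec::new();
    let mut chars = inner.chars().peekable();
    loop {
        while chars.peek().is_some_and(|c| c.is_whitespace()) {
            chars.next();
        }
        // Each item must be a single-quoted literal.
        if chars.next()? != '\'' {
            return None;
        }
        let mut literal = String::new();
        loop {
            match chars.next()? {
                '\'' => {
                    // A doubled quote is an escaped quote inside the literal.
                    if chars.peek() == Some(&'\'') {
                        chars.next();
                        literal.push('\'');
                    } else {
                        break;
                    }
                }
                c => literal.push(c),
            }
        }
        values.push(literal);

        while chars.peek().is_some_and(|c| c.is_whitespace()) {
            chars.next();
        }
        match chars.next() {
            None => return Some(values),
            Some(',') => continue,
            Some(_) => return None, // trailing tokens: treat as complex
        }
    }
}

fn main() {
    println!("{:?}", parse_in_check_values("role", "role IN ('admin', 'user')"));
    println!("{:?}", parse_in_check_values("price", "price > 0"));
}
```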

§4. Process notes (the two /runda passes)

  • Pre-build /runda (on the ADR) found six blockers — undo integration (D15), replay semantics (D16), set-value quoting (D2), CHECK handling (D17), an advisory phase-ordering bug (D13), auto-show flooding (D18) — all folded into ADR-0048 before any code; the three genuine forks re-escalated and user-resolved.
  • Post-implementation /runda (on the whole implementation) found eight gaps, all closed: FK-sampling determinism (→ ORDER BY), shortid not reproducible (→ seeded RNG; fixed rather than documented as an exception, which was the user's choice), and six untested ADR decisions (D5 advanced mode, D15 undo, D16 replay, D17 complex-CHECK advisory, atomic rollback, zero-count). Tests were added for each.
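
Why the ORDER BY fix matters for FK sampling can be shown with a small sketch: a seeded RNG only yields reproducible picks if the candidate list arrives in a stable order. The code below is an illustration under assumptions, not the project's implementation. SplitMix64 stands in for whatever SeedRng wraps, sample_parent_keys is a hypothetical helper, and the sort stands in for ORDER BY on the parent query.

```rust
/// Tiny deterministic PRNG used as a stand-in for the project's seeded RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Sample `n` parent keys for an FK column (hypothetical helper).
/// Sorting first plays the role of ORDER BY: the same seed then picks the
/// same keys no matter what order the rows came back from the database.
fn sample_parent_keys(mut parent_keys: Vec<i64>, seed: u64, n: usize) -> Result<Vec<i64>, String> {
    if parent_keys.is_empty() {
        // Mirrors the "empty parent -> friendly error" behaviour.
        return Err("parent table has no rows to reference".to_string());
    }
    parent_keys.sort_unstable();
    let mut rng = SplitMix64(seed);
    let len = parent_keys.len() as u64;
    // Modulo introduces a negligible bias here; fine for a sketch.
    Ok((0..n)
        .map(|_| parent_keys[(rng.next_u64() % len) as usize])
        .collect())
}

fn main() {
    // Two different physical row orders, same seed.
    let a = sample_parent_keys(vec![30, 10, 20], 42, 5).unwrap();
    let b = sample_parent_keys(vec![10, 20, 30], 42, 5).unwrap();
    println!("{a:?}\n{b:?}\nidentical: {}", a == b);
}
```

Without the sort, the two calls in main could index into differently ordered lists and return different keys for the same seed, which is the kind of nondeterminism the post-implementation pass flagged.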

§5. Phase 2 (deferred — designed in ADR-0048, NOT built)

These are the only seed pieces left; both have full designs in ADR-0048:

  1. The set override clause (D2): seed t 20 set role in ('a','b'), status = 'x', work_addr as email, price between 10 and 100. Value / pick-from-list / explicit-generator / range, with quoted literals (grammar-consistent). This is the SD2 "override hooks" core. The ColumnSpec.check_in_values → PickFrom plumbing and the Generator vocabulary already exist; this adds the grammar + a set clause that overrides the per-column plan.
  2. Column-fill (seed <table>.<column>, D1 form 2): fill one column across existing rows (an UPDATE). Refuses PK/autogen targets; empty-table no-op.
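
As a thinking aid for item 1, the four override forms could be modelled as one enum layered over the heuristic plan. This is purely a sketch of one possible shape, not the ADR-0048 design: Override, Plan, apply_overrides, the string-typed generator names, and the unknown-column error are all hypothetical.

```rust
use std::collections::HashMap;

/// The four override forms named in the set clause (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum Override {
    Value(String),         // status = 'x'
    PickFrom(Vec<String>), // role in ('a','b')
    Generator(String),     // work_addr as email
    Range(f64, f64),       // price between 10 and 100
}

/// Per-column plan after overrides are applied.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Heuristic(String), // generator the heuristics would have picked
    Overridden(Override),
}

/// `columns` pairs each column name with its heuristic generator name.
/// An override replaces the heuristic plan for that column only.
fn apply_overrides(
    columns: &[(&str, &str)],
    overrides: &HashMap<&str, Override>,
) -> Result<Vec<(String, Plan)>, String> {
    // Assumed behaviour: naming a column the table lacks is an error.
    for name in overrides.keys() {
        if !columns.iter().any(|(col, _)| col == name) {
            return Err(format!("unknown column in set clause: {name}"));
        }
    }
    Ok(columns
        .iter()
        .map(|(col, heuristic)| {
            let plan = match overrides.get(col) {
                Some(o) => Plan::Overridden(o.clone()),
                None => Plan::Heuristic((*heuristic).to_string()),
            };
            ((*col).to_string(), plan)
        })
        .collect())
}

fn main() {
    let columns = [
        ("role", "fallback"),
        ("status", "fallback"),
        ("work_addr", "fallback"),
        ("price", "currency"),
        ("name", "person"),
    ];
    let overrides = HashMap::from([
        ("role", Override::PickFrom(vec!["a".to_string(), "b".to_string()])),
        ("status", Override::Value("x".to_string())),
        ("work_addr", Override::Generator("email".to_string())),
        ("price", Override::Range(10.0, 100.0)),
    ]);
    for (col, plan) in apply_overrides(&columns, &overrides).unwrap() {
        println!("{col}: {plan:?}");
    }
}
```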

requirements.md: SD1 [x], SD2 [/] (core done; the two above open), A1 14/15 (only hint/H2 unregistered).

§6. How to take over

  1. Read handoffs 63 → 64 → 65, CLAUDE.md, docs/requirements.md, docs/adr/0048-seed-fake-data-generation.md (the whole thing: D1–D18 + the as-built status block).
  2. Seed is feature-complete for Phase 1; nothing pending. Next options (user's call): seed Phase 2 (set clause + column-fill); H2 hint (closes A1) — own ADR; TT5 CI; or the larger V4 journal / tutorial ADRs.
  3. Two minor, user-deferred observations (non-blocking): the uniqueness retry cap (MAX_ATTEMPTS=200) can cap a medium unique domain slightly below its true size (junction/small domains are exact); literal_to_value doesn't type-check an IN-CHECK literal vs a numeric column (a malformed int IN ('a') CHECK fails cleanly at bind).
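
The first observation in item 3 is easier to see with a toy model of rejection sampling under a retry cap. The sketch below assumes that uniqueness is enforced by redrawing on collision, up to MAX_ATTEMPTS times per row; that mechanism, the fill_unique helper, and the SplitMix64 stand-in RNG are assumptions for illustration, and the real code handles junction and small domains exactly by other means.

```rust
use std::collections::HashSet;

const MAX_ATTEMPTS: usize = 200; // value taken from the note above

/// Tiny deterministic PRNG used as a stand-in for the project's seeded RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw up to `wanted` distinct values from 0..domain_size.
/// Each row redraws on collision, giving up after MAX_ATTEMPTS tries,
/// at which point the batch is capped at what has been produced so far.
fn fill_unique(domain_size: u64, wanted: usize, seed: u64) -> Vec<u64> {
    if domain_size == 0 {
        return Vec::new();
    }
    let mut rng = SplitMix64(seed);
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    'rows: for _ in 0..wanted {
        for _ in 0..MAX_ATTEMPTS {
            let candidate = rng.next_u64() % domain_size;
            if seen.insert(candidate) {
                out.push(candidate);
                continue 'rows;
            }
        }
        break; // retry cap hit: stop short of the true domain size
    }
    out
}

fn main() {
    // Asking for more rows than a 1000-value domain can supply.
    let got = fill_unique(1000, 2000, 7);
    println!("distinct values produced: {} of 1000 possible", got.len());
}
```

With only one free value left in a 1000-value domain, a single draw hits it with probability 1/1000, so 200 attempts miss it roughly 82% of the time (0.999^200 ≈ 0.82). That is why a capped retry loop tends to stop a little short of the full domain on medium-sized domains.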