# ADR-0048: `seed` — fake-data generation command (SD1, opens SD2)

## Status

**Proposed (2026-06-11).** Design settled with the user across an extended fork dialogue (every decision below was escalated and user-chosen), then hardened by a `/runda` Devil's-Advocate pass that found six blockers — undo integration (D15), replay semantics (D16), `set` value quoting (D2), CHECK-constraint handling (D17), a phase-ordering bug in the advisory (D13), and auto-show flooding (D18) — plus refinements (state-relative reproducibility, compound-FK tuple sampling, column-fill constraint rules, the `fake` dependency scan). All are folded in below, the three genuine forks among them re-escalated and user-resolved.

Pending **phased, test-first implementation**; this status flips to *Accepted / implemented* once that lands.

Closes `requirements.md` **SD1** and delivers the core of **SD2** (per-type generators, determinism, the override surface). It also closes one of the two remaining gaps in **A1** ("all canonical app-level commands") — `seed`; the other, `hint` (**H2**), is separate.

Builds on: ADR-0014 (data operations, the `Value`/`Bound` value model, the auto-show pattern, FK-error enrichment), ADR-0005/0011 (the type vocabulary and `Type::fk_target_type()`), ADR-0012/0013 (the column / relationship metadata tables, the rebuild-table primitive — *read* by seed for schema introspection), ADR-0024 (the unified grammar tree / `CommandNode` registration that gives completion, hints, help-id, usage-id for free), ADR-0022 (ambient typing assistance — the `KNOWN_SQL_FUNCTIONS` curated-vocabulary pattern that the generator-name list mirrors), ADR-0026 (the `in (...)` / `between ... and ...` expression grammar the override clause reuses), ADR-0027 (the validity-indicator diagnostics model), and ADR-0038 (the `OutputStyleClass::Hint` styled output used for the post-seed advisory).
Honours ADR-0003 (both modes, no sigil), ADR-0009 (DSL conventions — keyword grammar, `--` flags for opt-in choices, one sigil only), ADR-0002 (no engine name in user-facing strings), and ADR-0015 (per-command write-through persistence).

## Context

`seed [count]` is the last unbuilt **data-authoring** command in the requirements. The pedagogical value is high: a learner who has just modelled a schema wants rows to query against *now*, without hand-typing dozens of `insert`s. A teacher wants a one-liner that fills a demo database with believable data.

SD1 commits to "plausible fake data; junction tables seeded with valid foreign-key references drawn from existing parent rows." SD2 deferred the *how* — "per-type generators, locale, determinism, override hooks" — explicitly pending this ADR.

The design conversation widened the scope deliberately, with the user confirming each step:

- **Realism matters more than minimalism** for a teaching tool. Random `text_a3f9` values teach nothing; `Alice Martinez` / `alice.m@example.com` make queries feel real. → adopt a faker library and make generation **name-aware**.
- **The column *name* is the strongest signal** for what a value should look like, but it is **ambiguous** without the **table** for the `name`/`title` family (`products.name` ≠ `users.name`).
- **Heuristics will miss**, so a **manual override** surface is required, not optional — this is SD2's "override hooks", brought forward.
- **Identifiers and enums** are special: `id`-ish columns want uniqueness; `status`-ish columns have no sensible generic value and should be *flagged*, not guessed.

The novel work is the **generation layer**.
Everything downstream — type validation, autogen autofill (`serial`/`shortid`), FK enforcement, per-command persistence, the auto-show outcome — is reused from the existing insert/update machinery as **shared helper functions**, per the X5 architecture preference (unique commands, with mechanics shared as library functions — *not* by emitting `Command::Insert` to borrow `do_insert`).

## Decision

Add a dedicated **`seed`** command (its own AST variant and its own `do_seed` worker executor) available in **both modes**, with the surface and behaviour below. Generation is realistic, name- and table-aware, type-gated, with a manual override clause and a reproducibility flag.

**Command classification (important, set by the replay decision D16).** Although `requirements.md` A1 lists `seed` among the "app-level commands" (meaning: part of the canonical command surface, no sigil, both modes), `seed` is architecturally a **data-authoring command** — a sibling of `insert`/`update`/`delete`, **not** an app-lifecycle `AppCommand`. It is therefore **not** added to `is_app_lifecycle_entry_word` / completion's `empty_input_offers_app_command_entry_keywords` (those mirror the `AppCommand` set and must match — `seed` belongs in neither): `replay` re-runs it as a data write (D16).

### D1 — Command surface (fork, user-chosen: "whole-row + column-fill")

Two forms:

1. **Whole-row generation** — `seed <table> [count]`
   Generates `count` new rows (an INSERT path). `count` **defaults to 20** (D6) when omitted. Every user-fillable column is filled per the generation rules (D7–D12); `serial`/`shortid` autogen columns are left to the existing autofill helpers.
2. **Column-fill on existing rows** — `seed <table>.<column>`
   Fills `<column>` across the table's **existing** rows (an UPDATE path) — the natural follow-up to `add column`. Combined with the `set` clause (D2) this is also the precise repair for a single mis-guessed column: `seed users.work_addr set work_addr as email`. Column-fill **refuses** PK columns and autogen (`serial`/`shortid`) columns (a friendly error — you don't "fill" an identity column), and **respects** the same UNIQUE / FK / required rules as whole-row generation (a UNIQUE target gets collision-free values; an FK target samples from the parent, D14). On an **empty** table it is a friendly no-op ("no rows to fill").

**Zero / over-cap counts.** `seed <table> 0` is a friendly no-op; `count` over the maximum (D6) is a friendly error.

The column-restricted-*insert* form (`seed t (a, b)` — new rows, only some columns filled) was considered and **rejected** as marginal and constraint-fragile (see Alternatives).

**Required-column block guard (user requirement).** If seed cannot produce a value for a `NOT NULL` column — the only real case is a `NOT NULL blob` column, which has no DSL value path — it **refuses the whole operation with a friendly error** naming the column, rather than attempting a NULL insert that would violate the constraint. The check is a pre-flight over the resolved per-column plan, before any write.

### D2 — Manual override: the `set` clause (fork, user-chosen: "value + list + generator + range")

An optional, comma-separated `set` clause overrides generation per column. Four forms, all reusing existing grammar vocabulary so there is nothing new to learn:

| Form | Example | Meaning |
|---|---|---|
| Fixed value | `set status = 'pending'` | every row gets the constant |
| Pick-from-list | `set role in ('admin', 'editor', 'viewer')` | uniform random choice from the list |
| Explicit generator | `set work_addr as email` | force a named generator (D9) |
| Range | `set price between 10 and 100` | uniform in range; **also dates** — `set signup between 2023-01-01 and 2024-12-31` |

Multiple clauses combine: `seed users 20 set role in ('admin', 'user'), status = 'active', signup between 2023-01-01 and 2024-12-31`.

**Quoting (fork, user-chosen: "quoted, grammar-consistent").** Text values and list items are **quoted string literals** (`'admin'`), exactly as everywhere else in the DSL — numbers and dates stay unquoted. This reuses the ADR-0026 expression grammar **unchanged**: the DA pass confirmed that the `in (...)` form's operands are typed value slots, so a *bare* `admin` would parse as a **column reference** (→ "unknown column"), not a string.
Quoting is therefore not a style preference but a correctness requirement of grammar reuse.

The range form is **type-aware**: numeric bounds for numeric columns, date bounds for date/datetime columns; a type-incompatible bound is a friendly error.

`=`, `in (...)`, and `between ... and ...` are the ADR-0026 expression operators; `set` is the ADR-0014 UPDATE keyword; `as` is borrowed from the SQL alias slot. The `as <generator>` operand is a bare name from the curated generator vocabulary (D9), not a value. The override takes precedence over every heuristic.

### D3 — Generation library: `fake` crate + hand-rolled gaps (fork, user-chosen: "name-aware + realistic")

Add the **`fake`** crate (v5.x at time of writing; English locale for v1 per X2) for realistic values: names, emails, usernames, addresses, companies, phone numbers, lorem text, dates. Generation is driven by a per-column **generator** chosen by the heuristics (D7) or the override (D2), falling back to **type-based** generation (D8).

**Three open implementation-time verifications** (flagged honestly, to be resolved when the dependency is locked, not assumed here):

- **`rand` de-duplication.** The project is on `rand 0.10.1`; `fake` brings its own `rand`. Confirm a single `rand` version resolves (a duplicate is harmless but should be a conscious outcome, and `shortid.rs` + the seed RNG must share the version we standardise on).
- **`fake` module inventory.** Confirm which generators v5 actually ships (strong prior: it has Name/Internet/Address/Company/Lorem/Chrono/Currency/Job/Color but **no commerce/product module** — see D9), and the minimal feature-flag set needed (derive, chrono-backed dates).
- **Security (new-dependency posture).** `fake` and its transitive tree must be scanned (`trivy fs`, `grype`, `osv-scanner`) before merge, per the global new-dependency rule; findings documented, not silently accepted.

### D4 — Determinism: `--seed <n>` (fork, user-chosen: "optional flag")

Generation is **random by default**.
The optional `--seed <n>` flag makes a run **reproducible**: **same database state + same `--seed` → identical data**. The "database state" qualifier matters (DA refinement) — FK sampling (D14), identifier sequencing (D10), and UNIQUE collision-avoidance all *read existing rows*, so reproducibility is relative to the data already present, not absolute.

Value: teachers hand out one dataset; demos are stable; and the feature's own tests can assert **exact** output (against a known starting state). Implemented with a seedable RNG threaded through every generator (no `thread_rng` on the seeded path). `--` flag per ADR-0009 (opt-in choice).

Naming note: the flag `--seed` and the command `seed` share a word but never collide grammatically (`seed users 20 --seed 42` parses unambiguously). This flag is also the determinism lever for **replay** (D16): a recorded `seed … --seed N` line reproduces on replay; a bare `seed …` line regenerates fresh data.

### D5 — Both modes (A1)

`seed` is a canonical app-level command, available in **simple and advanced** mode, no sigil — like `save`/`load`/`export`/`replay`.

### D6 — Default count: 20; bounded maximum

Omitted `count` → **20** rows: enough to make `where`, `group by`, `order by`, and `limit` meaningful without flooding the output pane. A **maximum** is enforced (proposed 10 000) to prevent a typo (`seed t 1000000`) from hanging the app or bloating the project; over the cap → friendly error stating the limit.

### D7 — Name-aware heuristics, type-gated (the catalogue)

A column's **name** selects a generator, but a name rule only fires when the column's **type** is compatible (a column named `email` typed `int` does **not** get a string — it falls through to type-based int). Matching is **case-insensitive**, **token-based** (split on `_`, camelCase, kebab), **most-specific-first**, with documented false-positive guards.
The catalogue (representative; full table lives with the implementation):

| Column name (tokens) | Generator | Type gate |
|---|---|---|
| `first_name`/`fname` · `last_name`/`surname`/`lname` | first / last name | text |
| `name`/`full_name` · `title` | **table-context** name (D11) | text |
| `email`/`*_email` | email | text |
| `username`/`login`/`handle` | username | text |
| `password`/`pwd` | password | text |
| `phone`/`mobile`/`cell`/`tel` | phone number | text |
| `city`/`town` · `country` · `state`/`province` | address parts | text |
| `street`/`address`/`addr` · `zip`/`postcode`/`postal` | address parts | text |
| `company`/`employer`/`org` · `job`/`position`/`profession` | company / job | text |
| `description`/`bio`/`notes`/`summary`/`comment` | sentence / paragraph | text |
| `url`/`website`/`homepage` · `color`/`colour` | URL / hex colour | text |
| `price`/`amount`/`cost`/`salary`/`balance`/`total` | currency-range number | numeric |
| `age` · `quantity`/`qty`/`stock`/`count` | 18–80 · small int | numeric |
| `date`/`*_date` | date, recent ~3 yr window | date |
| `dob`/`birthday` | date, adult window (18–80 yr ago) | date |
| `timestamp`/`datetime` · `created_at`/`updated_at`/`*_at` | datetime, recent window (`updated_at` ≥ `created_at`) | datetime |
| `is_*`/`has_*`/`active`/`enabled` | boolean | bool |
| **identifier family** (D10) | unique sequential | int/text |
| **enum-ish family** (D12) | generic text + flag | (text) |

**False-positive guards (documented):** `username`/`filename`/`table_name`/`*_name` handled before the bare `name` rule so they do **not** resolve to person-name; the bare `name`/`title` rule requires a standalone token or a recognised `*_name` suffix.
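The D7 matching order (tokenise, try the most specific name rule, then apply the type gate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `ColType`, `Generator`, `tokens`, `name_rule`, `gate_ok`, and `choose` are hypothetical names, only a small subset of the catalogue is covered, and matching the price family on the last token is an assumption of this sketch.

```rust
// Hypothetical types for illustration; the real type vocabulary lives in ADR-0005/0011.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType { Text, Int, Real, Bool, Date, DateTime }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Generator {
    FirstName, Email, Username, Price, RecentDate, RecentDateTime, Boolean,
    TableContextName, // resolved later using the table name (D11)
    TypeFallback,     // no usable name rule: generate by type (D8)
}

/// Split a column name into lowercase tokens on `_`, `-`, and camelCase boundaries.
fn tokens(name: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut prev_lower = false;
    for ch in name.chars() {
        if ch == '_' || ch == '-' {
            if !cur.is_empty() { out.push(std::mem::take(&mut cur)); }
            prev_lower = false;
            continue;
        }
        // A lower-to-upper transition starts a new camelCase token.
        if ch.is_uppercase() && prev_lower && !cur.is_empty() {
            out.push(std::mem::take(&mut cur));
        }
        prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
        cur.extend(ch.to_lowercase());
    }
    if !cur.is_empty() { out.push(cur); }
    out
}

/// Most-specific-first: whole-name rules, then prefix rules, then last-token rules.
fn name_rule(toks: &[String]) -> Option<Generator> {
    let joined = toks.join("_");
    let first = toks.first()?.as_str();
    let last = toks.last()?.as_str();
    match joined.as_str() {
        "first_name" | "fname" => return Some(Generator::FirstName),
        "username" | "login" | "handle" => return Some(Generator::Username),
        "active" | "enabled" => return Some(Generator::Boolean),
        // Guard: bare `name`/`title` only fires as a standalone token,
        // so `table_name` or `filename` never reach a person-name generator.
        "name" | "title" => return Some(Generator::TableContextName),
        _ => {}
    }
    if toks.len() > 1 && (first == "is" || first == "has") {
        return Some(Generator::Boolean);
    }
    match last {
        "email" => Some(Generator::Email),
        // Assumption of this sketch: the price family also matches as a suffix.
        "price" | "amount" | "cost" | "salary" | "balance" | "total" => Some(Generator::Price),
        "date" => Some(Generator::RecentDate),
        "at" | "timestamp" | "datetime" => Some(Generator::RecentDateTime),
        _ => None,
    }
}

/// The type gate: a name rule only fires when the column type is compatible.
fn gate_ok(g: Generator, ty: ColType) -> bool {
    match g {
        Generator::FirstName | Generator::Email | Generator::Username
        | Generator::TableContextName => ty == ColType::Text,
        Generator::Price => matches!(ty, ColType::Int | ColType::Real),
        Generator::RecentDate => ty == ColType::Date,
        Generator::RecentDateTime => ty == ColType::DateTime,
        Generator::Boolean => ty == ColType::Bool,
        Generator::TypeFallback => true,
    }
}

fn choose(column: &str, ty: ColType) -> Generator {
    match name_rule(&tokens(column)) {
        Some(g) if gate_ok(g, ty) => g,
        _ => Generator::TypeFallback,
    }
}

fn main() {
    println!("{:?}", choose("workEmail", ColType::Text));
    println!("{:?}", choose("email", ColType::Int));
    println!("{:?}", choose("table_name", ColType::Text));
}
```

Keeping the gate separate from the name rule means a mismatch such as an `int` column named `email` falls through to type-based generation rather than producing an invalid value.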
### D8 — Type-based fallback

When no name rule matches (or to satisfy a name rule's type gate), generate by **type**: `text`→realistic words/short phrase, `int`→bounded random, `real`→random double, `decimal`→formatted number, `bool`→random, `date`/`datetime`→**bounded recent** value (never "any point in all of history" — per the user's date concern), `serial`/`shortid`→omitted (autogen helpers fill them), `blob`→unsupported (nullable→NULL; `NOT NULL`→D1 block guard).

### D9 — Named generators + the `product` generator

The generators addressable via `set ... as <generator>` (D2) and chosen by D7 form a **curated, named vocabulary** — `name`, `first_name`, `last_name`, `email`, `username`, `phone`, `city`, `country`, `street`, `zip`, `company`, `job`, `sentence`, `paragraph`, `url`, `color`, `price`, `age`, `date`, `datetime`, `bool`, `product`, … — the single source of truth shared by the executor, the completion source, and the highlighter (mirroring `KNOWN_SQL_FUNCTIONS`, ADR-0022 Amd6).

**`product`** is **hand-rolled** (the `fake` crate has no commerce/product module — D3): `{adjective} {material} {noun}` from three small baked-in word lists (~20 each) → "Sleek Bamboo Keyboard", "Vintage Leather Backpack". Seedable through the D4 RNG. Always addressable as `set <column> as product`, and auto-selected by D11 for the `name`/`title` family in product-ish tables.

### D10 — Identifier family → unique by name (fork, user-chosen: "unique sequential")

A column in the identifier family — `id`, `*_id` **that is not an FK**, `code`, `sku`, `ref`/`reference`, `number`/`no`, `barcode` — that is **not** a serial/shortid autogen column and **not** the PK is treated as an identifier and gets **unique** values: **int → sequential** (`MAX(col)+1` ascending, reads like real ids, never collides); **text → unique short code** (generate-with-retry).
Precedence: **FK detection wins** over this rule (an FK `user_id` *should* have duplicates — many children per parent), so `*_id` only triggers uniqueness when the column is not a foreign key.

**Constraint-driven uniqueness is independent and mandatory:** any column with a `UNIQUE` constraint (or a user-fillable single-column PK) gets guaranteed-unique generation regardless of name — a correctness requirement, not a heuristic. Generation for such columns uses retry/sequence to guarantee no collision within the batch and against existing rows.

### D11 — Table-context disambiguation for `name`/`title` (fork, user-chosen: "table-context-aware")

For the `name`/`title` family **only**, the heuristic also reads the **table** name token:

- `product`/`item`/`goods`/`merchandise`/`catalog`/`inventory` → `product` generator (D9)
- `company`/`companies`/`vendor`/`supplier`/`manufacturer`/`brand` → company name
- `user`/`customer`/`person`/`people`/`employee`/`member`/`contact`/`author`/`student` → person name
- unrecognised table → generic word

This resolves the real ambiguity (`products.name` → "Sleek Bamboo Keyboard"; `users.name` → "Alice Martinez"; `vendors.name` → "Globex Corp"). It is a deliberately **scoped** use of table context — the only place the table name influences generation.

### D12 — Enum-ish names → generic + post-seed advisory (fork, user-chosen: "flag enum-ish only")

Enum-ish names — `role`, `status`, `type`, `state`, `kind`, `category`, `level`, `tier`, `stage`, `priority`, `gender` — have **no sensible generic generator**, so they are **not guessed**: they fall through to generic text (they must still be filled — a `NOT NULL` status cannot be left empty). Seed then emits a **post-seed advisory** (D13) naming them and pointing at the `set ... in (...)` override.
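The precedence running through D10 to D12 (FK detection first, then the identifier family, then enum-ish names, then the table-context lookup for `name`/`title`) can be sketched as a single resolution function. This is an illustrative sketch only: `Plan`, `Column`, `plan`, `table_family`, and `in_set` are hypothetical names, the whole table name is matched rather than a tokenised one, and plural table names are handled by stripping one trailing `s`, which is an assumption of this example.

```rust
// Word lists copied from D10, D11, and D12.
const IDENT: &[&str] = &["code", "sku", "ref", "reference", "number", "no", "barcode"];
const ENUMISH: &[&str] = &["role", "status", "type", "state", "kind", "category",
                           "level", "tier", "stage", "priority", "gender"];
const PRODUCT: &[&str] = &["product", "item", "goods", "merchandise", "catalog", "inventory"];
const COMPANY: &[&str] = &["company", "companies", "vendor", "supplier", "manufacturer", "brand"];
const PERSON: &[&str] = &["user", "customer", "person", "people", "employee", "member",
                          "contact", "author", "student"];

#[derive(Debug, PartialEq, Eq)]
enum Plan {
    SampleParent,     // FK column: sample existing parent rows (D14)
    UniqueIdentifier, // identifier family (D10)
    ProductName,      // D11 outcomes for `name`/`title`
    CompanyName,
    PersonName,
    GenericWord,
    GenericFlagged,   // enum-ish: generic text plus the advisory (D12/D13)
    Other,            // left to the D7/D8 rules or the autogen helpers
}

// Hypothetical column description; the real metadata comes from ADR-0012/0013.
struct Column<'a> { name: &'a str, is_fk: bool, is_pk: bool, is_autogen: bool }

fn in_set(set: &[&str], word: &str) -> bool { set.contains(&word) }

/// D11: pick a name generator from the table name.
fn table_family(table: &str) -> Plan {
    let t = table.to_lowercase();
    // Assumption: try the name as written, then with one trailing `s` removed.
    let singular = t.strip_suffix('s').unwrap_or(t.as_str());
    let hit = |set: &[&str]| in_set(set, t.as_str()) || in_set(set, singular);
    if hit(PRODUCT) { Plan::ProductName }
    else if hit(COMPANY) { Plan::CompanyName }
    else if hit(PERSON) { Plan::PersonName }
    else { Plan::GenericWord }
}

fn plan(table: &str, col: &Column<'_>) -> Plan {
    let name = col.name.to_lowercase();
    // FK detection wins: an FK `user_id` should repeat parent keys.
    if col.is_fk { return Plan::SampleParent; }
    let id_like = name == "id" || name.ends_with("_id") || in_set(IDENT, &name);
    if id_like && !col.is_pk && !col.is_autogen { return Plan::UniqueIdentifier; }
    if in_set(ENUMISH, &name) { return Plan::GenericFlagged; }
    if name == "name" || name == "title" { return table_family(table); }
    Plan::Other
}

fn col(name: &str) -> Column<'_> {
    Column { name, is_fk: false, is_pk: false, is_autogen: false }
}

fn main() {
    println!("{:?}", plan("orders", &Column { is_fk: true, ..col("user_id") }));
    println!("{:?}", plan("products", &col("name")));
    println!("{:?}", plan("users", &col("status")));
}
```

Constraint-driven uniqueness (a `UNIQUE` constraint or a user-fillable PK) is deliberately left out of this sketch; per D10 it applies regardless of the name-based outcome.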
### D13 — Reporting: post-seed advisory (fork, user-chosen: "flag enum-ish only")

After a successful seed, in addition to the normal auto-show outcome (row count + the affected rows, per ADR-0014), seed appends a **`OutputStyleClass::Hint`** advisory **only** when one or more enum-ish columns (D12) — **or columns guarded by a CHECK that seed could not derive values from** (D17) — were filled generically. The wording is **phase-aware** (DA finding: the advisory must not name features that ship later). In **Phase 1** (no `set` clause yet) it names the columns and explains they were filled generically. From **Phase 2/3** it points at the concrete repair:

```
# Phase 1 wording:
✓ Seeded 20 rows into users
ℹ status, role were filled with generic text — they look like
  fixed value sets you may want to choose deliberately.

# Phase 2/3 wording (set clause + column-fill exist):
✓ Seeded 20 rows into users
ℹ status, role filled generically. Fix existing rows with
  seed users.status set status in ('active','inactive'),
  or pass set … on the next seed.
```

Note the repair for **already-seeded rows** is the **column-fill** form (`seed users.status set …`), not "re-seed" (which would add more rows) — DA correction. This is a **result-time** note (cheap, reusing ADR-0038's hint rendering), not a typing-time warning. The fuller "per-column report" (every column → its generator) was considered and **deferred** (see Alternatives / Out of scope).

### D14 — Foreign keys (SD1; fork on empty-parent, user-chosen: "friendly error")

- **Each FK** is filled by sampling **uniformly** from the **existing rows** of the parent table's referenced column(s). Duplicates are expected and correct (many children per parent). For a **compound FK**, the referenced **tuple is sampled jointly** (a whole existing parent key), never per-column independently — independent sampling could fabricate a `(a, b)` pair that exists in no parent row and would fail FK enforcement (DA refinement).
- **Empty parent** → seed **refuses with a friendly error** naming the parent and the FK column ("seed `users` first — `orders.user_id` references it"). Safe, predictable, teaches FK dependency order. Recursive parent auto-seed is **deferred** to a future `--recursive` opt-in (Out of scope).
- **Junction / compound-PK tables** (SD1's explicit case): sample **distinct combinations** of the parent PK tuples to satisfy the compound PK's uniqueness; if `count` exceeds the number of available distinct combinations, **cap** at the maximum and note it in the outcome.
- **Self-referential FK** (`manager_id → id`): if nullable, leave NULL or point at an earlier row in the same batch; if `NOT NULL` on an otherwise-empty table, friendly error. Documented edge case.
- **Nullable FKs** are **always filled** in v1 (predictable); occasional-NULL injection is deferred.

### D15 — Undo: one snapshot per seed (DA finding; ADR-0006)

Seed is a mutation, so it must participate in undo. The draft omitted this; the DA found the codebase already has the right primitive — `BeginBatch` / `EndBatch` (`db.rs`), used by `replay` so a multi-write run collapses to **one** boundary snapshot. `do_seed` wraps its generated writes in `begin_batch` / `end_batch`, so **`seed users 20` is a single undo step**, not 20 — matching ADR-0006 Amendment 1's batch model. Column-fill's bulk UPDATE is likewise one step. (`import` remains the only data-affecting op outside undo, per ADR-0015 §11; seed is firmly inside it.)

### D16 — Replay: seed re-runs as a data write (fork, user-chosen)

`replay` re-executes a recorded `seed` line as a **data-write command** — it is **not** in the app-lifecycle skip-set (see Command classification, above). Consequence, accepted by the user: a **bare** `seed users 20` regenerates **fresh, divergent** data on each replay; a `seed users 20 --seed 42` line (the determinism lever, D4) **reproduces** the original data.
This keeps seed faithful to its nature as a data write and puts reproducibility exactly where the `--seed` flag already lives. (Seeded *data* is in any case durable independently of replay, via the ADR-0015 CSV store + `rebuild`; replay is the scripting re-run path, U4.) The DA confirmed the wiring trap: because seed is *not* an `AppCommand`, it is correctly absent from `is_app_lifecycle_entry_word` and replay dispatches it through the normal data path rather than aborting.

### D17 — CHECK constraints: derive from simple `IN`, else friendly-fail (fork, user-chosen)

A CHECK on a generically-filled column would otherwise fail the whole batch (DA finding — the block guard only covered `NOT NULL blob`). Two-tier handling, per the user:

1. **Derive from simple `IN`-CHECKs.** When a column's CHECK is the common enum-as-CHECK shape — `col IN ('a', 'b', …)` (the column's own CHECK, single-column, literal list) — seed **parses out the allowed values and uses them as the generator** (uniform choice). The frequent `CHECK (status IN ('active','closed'))` case then "just works" with no override needed.
2. **Best-effort + friendly fail for the rest.** For CHECKs seed cannot interpret (ranges, expressions, multi-column), it generates best-effort; if a generated row violates the CHECK, the insert fails through the existing **H1 friendly-error layer** (ADR-0019) naming the constraint and pointing at `set`. Such CHECK-guarded columns are also **pre-flagged in the advisory** (D13) alongside enum-ish names, so the user is warned before hitting the failure.

No new CHECK engine — tier 1 is a narrow literal-`IN` parse over the CHECK text already stored in metadata; tier 2 is the existing failure path.

### D18 — Auto-show is capped for large seeds (DA finding)

ADR-0014 auto-show renders "the affected rows" — fine for one insert, a wall for a 10 000-row seed. Seed's outcome shows a **capped preview** (proposed first **20** rows) with a `(showing 20 of N)` note, not the full set.
The row **count** is always reported in full; only the rendered table is capped.

## Grammar, AST, and cross-cutting wiring

Per ADR-0024, `seed` is registered as a `CommandNode` so completion, hints, help, and usage flow from one definition. The wiring, as **explicit acceptance criteria** (a `/runda` pass must verify each — ADR-0045 showed "claimed verified" is not verified):

- **AST + executor.** A dedicated command variant (`Seed { table, target_column: Option, count: Option, overrides: Vec, rng_seed: Option }`) and a dedicated `do_seed` worker executor. `do_seed` **reuses shared helpers** (value binding `impl_value_for`, autogen autofill, FK enrichment, the multi-row parameterised-insert pattern of `plan_autogen_autofill`, the UPDATE path for column-fill, per-command persistence, the `begin_batch`/`end_batch` undo primitive of D15) as library functions — it does **not** emit `Command::Insert`/`Command::Update` (X5).
- **Replay / undo classification (D15/D16).** `do_seed` brackets its writes in one batch (one undo step). The `seed` entry word is **deliberately absent** from `is_app_lifecycle_entry_word` and completion's `empty_input_offers_app_command_entry_keywords` (the `AppCommand` mirror) so replay re-runs it as a data write — an explicit acceptance check, since the default for an unlisted recognised command must be "replayed", not "abort".
- **Completion sources:** table-name (existing tables); `.column` and `set`-clause column slots (columns of the named table); the generator-name vocabulary (D9) after `as`; `count` number; `set` / `=` / `in` / `as` / `between` / `and` keywords; `--seed` flag.
- **Syntax highlighting:** `seed` keyword; the generator-name vocabulary highlighted as **`tok_function`** (reuse the existing ADR-0022 Amd6 blue — no new theme colour).
- **Hints:** ambient per-slot "what's next" and usage hints, both modes.
- **Help:** `help seed` topic (`help_id` + per-command block); the general `help` list picks it up automatically via REGISTRY.
- **Parse-error pedagogy (ADR-0042):** near-miss matrix rows for `seed` (bare / missing-table / wrong-token / malformed `set`), both modes.
- **Validity indicator (ADR-0027):** typing-time `[ERR]`/`[WRN]` for unknown table, unknown column (in `.column` or `set`), unknown generator name after `as`.
- **No DSL→SQL teaching echo (ADR-0038).** `seed` is a utility/app command, not a DSL form of a SQL statement, so the echo does not apply. (A future "show the generated INSERTs" is out of scope — it would dump `count` statements.)

## Implementation phasing

Design is whole; the **implementation** is phased into reviewable, test-first commits:

1. **Core whole-row seed** — grammar/AST/executor; type-based generation + the `fake`-backed name heuristics (D7/D8/D11); identifier uniqueness (D10) + constraint uniqueness; FK sampling (joint tuples) + empty-parent error + junction distinct-combos (D14); `--seed` determinism (D4); default count + cap + zero-no-op (D6/D1); required-column block guard (D1); **undo batch (D15)**; **replay-as-data-write classification (D16)**; **CHECK derive / friendly-fail (D17)**; **capped auto-show (D18)**; the enum/CHECK advisory in its **Phase-1 wording** (D12/D13); full ambient wiring; both modes.
2. **The `set` override clause** (D2) — value / list / generator / range, type-aware, with completion + highlight + validity for the generator-name slot.
3. **Column-fill mode** (`seed <table>.<column>`, D1 form 2) — the UPDATE path.

Each phase is independently green before the next.

## Testing (ADR-0008 tiers 1–3; test-first)

- **Tier 1 (unit, deterministic via `--seed`):** generator selection (name × type-gate matrix, including every false-positive guard of D7); table-context disambiguation (D11); identifier uniqueness and the FK-wins-over-`*_id` precedence (D10); bounded-date windows (D8); the `product` generator shape; override resolution + precedence (D2); the required-column block guard (D1); the count cap (D6). Exact-value assertions are possible because `--seed` fixes the RNG.
- **Tier 2 (insta snapshots):** the seeded data table render and the enum advisory (D13) at representative sizes, light + dark.
- **Tier 3 (integration, full event loop):** `seed users 20` end to end (rows land in db + CSV + history, auto-show, persistence); FK sampling against a populated parent (incl. a **compound FK** — every child tuple exists in the parent); **empty-parent friendly error**; **junction** seeding with distinct combinations and the over-cap note; the `set` clause forms (quoted literals); **column-fill** on existing rows (incl. refusal of PK/autogen targets, empty-table no-op); reproducibility (`--seed 42` twice → identical data from a fixed state); both modes. Plus the DA-driven cases: **one-undo-step** (seed then a single `undo` removes all rows); **replay** of a bare `seed` line (divergent) vs a `--seed` line (reproduced); **`IN`-CHECK auto-derivation** ("just works") and a **complex-CHECK friendly failure**; **capped auto-show** on a large seed.

"All green, no skips" is the only acceptable end state; the Phase-1 baseline (2290 passing / 0 failing / 0 skipped / 1 ignored doctest) is the regression floor.

## Out of scope / deferred (future SD2 work)

- **Recursive parent auto-seed** (`--recursive`) — D14 errors instead.
- **NULL injection** for nullable columns (teaching optional relationships / `IS NULL`) — v1 always fills.
- **Multi-locale** generation — English only (X2).
- **User-defined custom generators** (true "override hooks" — register a named generator) — the `set ... as <generator>` surface covers the common need; custom generators are a later SD2 increment.
- **Full per-column seed report** — D13 flags enum-ish only.
- **Column-restricted insert** (`seed t (a, b)`) — rejected (D1).
- **"Show the generated SQL"** teaching echo for seed.

## Alternatives considered

- **Hand-rolled generators only (no `fake`):** minimal dependency, but synthetic-looking data (`text_a3f9`) — rejected on pedagogy (pedagogy wins ties).
- **Type-only generation (no name awareness):** simpler, but misses the biggest UX win (a `users` table that reads like real people) — rejected.
- **Column-name-only `name` (no table context):** leaves `products.name` → person names, requiring a manual override on every product/company table — rejected for the `name`/`title` family (D11).
- **No override clause (heuristics + type only):** could not answer "the heuristic guessed wrong, fix it" or enum columns — rejected; the `set` clause (D2) is the answer to the user's Q3.
- **Recursive auto-seed of empty parents:** powerful but magical and can seed tables the user did not name — deferred behind a future flag (D14).
- **Always-random (no `--seed`):** simplest, but no reproducible datasets and weaker tests — rejected (D4).
- **Full per-column report by default:** a nice teaching artifact but verbose on wide tables — deferred; flag-only advisory chosen (D13).
- **Reuse `Command::Insert`/`do_insert` directly** from seed: tempting for code reuse, but collapses command identity and violates X5 — rejected in favour of a dedicated `do_seed` that calls shared *helpers*.
- **Skip seed on replay** (classify as app-lifecycle, D16): consistent with A1's "app-level" label and avoids divergent data, but seed is a data write and silently skipping it on a scripted re-run is surprising — rejected; `--seed` is the determinism lever instead.
- **Bare-word `set` list items** (`in (admin, …)`, D2): matched the early mockups and reads cleaner, but bare words are column references in the reused grammar (would error) and would force a custom list form — rejected for quoted literals (grammar reuse + DSL consistency).
- **Pre-flight refuse any CHECK-bearing table** (D17): safest but blocks seeding too many legitimate tables — rejected for the derive-`IN`-else-friendly-fail tier.
- **`set`-driven NULL / per-column report / recursive parent seed:** deferred — see Out of scope.