The pure generation half of `seed` — no command wiring yet:
- src/seed/: ColumnSpec + Generator model and a seeded StdRng; the type-gated name-heuristic catalogue (D7) with documented false-positive guards; table-context name disambiguation (D11); identifier (D10) and enum-ish (D12) detection; per-type + bounded-date generators (D8); the hand-rolled product generator (D9); and PickFrom for IN-CHECK / enum lists.
- Adds the `fake` crate (v5, default features). Verified: single rand 0.10.1 (no duplication), determinism via one seeded StdRng driving both fake and the hand-rolled generators, security-clean across osv/grype/trivy.
- ADR-0048 D3 updated to record the dependency verification.

32 Tier-1 tests (exact-value via fixed --seed); 1673 lib tests pass, clippy all-targets clean.
ADR-0048: seed — fake-data generation command (SD1, opens SD2)
Status
Proposed (2026-06-11). Design settled with the user across an
extended fork dialogue (every decision below was escalated and
user-chosen), then hardened by a /runda Devil's-Advocate pass that
found six blockers — undo integration (D15), replay semantics (D16),
set value quoting (D2), CHECK-constraint handling (D17), a
phase-ordering bug in the advisory (D13), and auto-show flooding
(D18) — plus refinements (state-relative reproducibility, compound-FK
tuple sampling, column-fill constraint rules, the fake dependency
scan). All are folded in below, the three genuine forks among them
re-escalated and user-resolved. Pending phased, test-first
implementation; this status flips to Accepted / implemented once
that lands.
Closes requirements.md SD1 and delivers the core of SD2
(per-type generators, determinism, the override surface). It also
closes one of the two remaining gaps in A1 ("all canonical
app-level commands") — seed; the other, hint (H2), is
separate.
Builds on: ADR-0014 (data operations, the Value/Bound value model,
the auto-show pattern, FK-error enrichment), ADR-0005/0011 (the type
vocabulary and Type::fk_target_type()), ADR-0012/0013 (the column /
relationship metadata tables, the rebuild-table primitive — read by
seed for schema introspection), ADR-0024 (the unified grammar tree /
CommandNode registration that gives completion, hints, help-id,
usage-id for free), ADR-0022 (ambient typing assistance — the
KNOWN_SQL_FUNCTIONS curated-vocabulary pattern that the
generator-name list mirrors), ADR-0026 (the in (...) / between ... and ... expression grammar the override clause reuses), ADR-0027 (the
validity-indicator diagnostics model), and ADR-0038 (the
OutputStyleClass::Hint styled output used for the post-seed
advisory). Honours ADR-0003 (both modes, no sigil), ADR-0009 (DSL
conventions — keyword grammar, -- flags for opt-in choices, one
sigil only), ADR-0002 (no engine name in user-facing strings), and
ADR-0015 (per-command write-through persistence).
Context
seed <table> [count] is the last unbuilt data-authoring command
in the requirements. The pedagogical value is high: a learner who has
just modelled a schema wants rows to query against now, without
hand-typing dozens of inserts. A teacher wants a one-liner that
fills a demo database with believable data. SD1 commits to "plausible
fake data; junction tables seeded with valid foreign-key references
drawn from existing parent rows." SD2 deferred the how — "per-type
generators, locale, determinism, override hooks" — explicitly pending
this ADR.
The design conversation widened the scope deliberately, with the user confirming each step:
- Realism matters more than minimalism for a teaching tool. Random text_a3f9 values teach nothing; Alice Martinez / alice.m@example.com make queries feel real. → adopt a faker library and make generation name-aware.
- The column name is the strongest signal for what a value should look like, but it is ambiguous without the table for the name/title family (products.name ≠ users.name).
- Heuristics will miss, so a manual override surface is required, not optional — this is SD2's "override hooks", brought forward.
- Identifiers and enums are special: id-ish columns want uniqueness; status-ish columns have no sensible generic value and should be flagged, not guessed.
The novel work is the generation layer. Everything downstream —
type validation, autogen autofill (serial/shortid), FK
enforcement, per-command persistence, the auto-show outcome — is
reused from the existing insert/update machinery as shared helper
functions, per the X5 architecture preference (unique commands, with
mechanics shared as library functions — not by emitting
Command::Insert to borrow do_insert).
Decision
Add a dedicated seed command (its own AST variant and its own
do_seed worker executor) available in both modes, with the
surface and behaviour below. Generation is realistic, name- and
table-aware, type-gated, with a manual override clause and a
reproducibility flag.
Command classification (important, set by the replay decision
D16). Although requirements.md A1 lists seed among the
"app-level commands" (meaning: part of the canonical command surface,
no sigil, both modes), seed is architecturally a data-authoring
command — a sibling of insert/update/delete, not an
app-lifecycle AppCommand. It is therefore not added to
is_app_lifecycle_entry_word / completion's
empty_input_offers_app_command_entry_keywords (those mirror the
AppCommand set and must match — seed belongs in neither): replay
re-runs it as a data write (D16).
D1 — Command surface (fork, user-chosen: "whole-row + column-fill")
Two forms:
- Whole-row generation — seed <table> [count]
  Generates count new rows (an INSERT path). count defaults to 20 (D6) when omitted. Every user-fillable column is filled per the generation rules (D7–D12); serial/shortid autogen columns are left to the existing autofill helpers.
- Column-fill on existing rows — seed <table>.<column>
  Fills <column> across the table's existing rows (an UPDATE path) — the natural follow-up to add column. Combined with the set clause (D2) this is also the precise repair for a single mis-guessed column: seed users.work_addr set work_addr as email. Column-fill refuses PK columns and autogen (serial/shortid) columns (a friendly error — you don't "fill" an identity column), and respects the same UNIQUE / FK / required rules as whole-row generation (a UNIQUE target gets collision-free values; an FK target samples from the parent, D14). On an empty table it is a friendly no-op ("no rows to fill").
Zero / over-cap counts. seed <table> 0 is a friendly no-op;
count over the maximum (D6) is a friendly error.
The column-restricted-insert form (seed t (a, b) — new rows, only
some columns filled) was considered and rejected as marginal and
constraint-fragile (see Alternatives).
Required-column block guard (user requirement). If seed cannot
produce a value for a NOT NULL column — the only real case is a
NOT NULL blob column, which has no DSL value path — it refuses the
whole operation with a friendly error naming the column, rather than
attempting a NULL insert that would violate the constraint. The check
is a pre-flight over the resolved per-column plan, before any write.
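The block guard can be pictured as a single pass over the resolved plan that runs before the first write. The sketch below is illustrative only: Fill, ColumnPlan and preflight are hypothetical names, and the error wording is an assumption rather than the project's actual message.

```rust
/// How one column will be filled once heuristics and overrides are resolved.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Fill {
    /// A generator (name rule, type fallback or override) supplies the value.
    Generated,
    /// serial / shortid: left to the existing autofill helpers.
    Autogen,
    /// No DSL value path exists for this type (the blob case).
    Unsupported,
}

/// Hypothetical per-column plan entry; field names are illustrative.
struct ColumnPlan {
    name: &'static str,
    not_null: bool,
    fill: Fill,
}

/// Runs before any write: refuse the whole seed if a NOT NULL column
/// cannot be given a value. Nullable unsupported columns pass (they get NULL).
fn preflight(table: &str, plan: &[ColumnPlan]) -> Result<(), String> {
    match plan.iter().find(|c| c.not_null && c.fill == Fill::Unsupported) {
        Some(c) => Err(format!(
            "cannot seed {}: column {} is required but seed cannot generate a value for it",
            table, c.name
        )),
        None => Ok(()),
    }
}
```

Because the check inspects only the plan, a refusal leaves the table untouched; no partial batch has to be rolled back.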
D2 — Manual override: the set clause (fork, user-chosen: "value + list + generator + range")
An optional, comma-separated set clause overrides generation per
column. Four forms, all reusing existing grammar vocabulary so there
is nothing new to learn:
| Form | Example | Meaning |
|---|---|---|
| Fixed value | set status = 'pending' | every row gets the constant |
| Pick-from-list | set role in ('admin', 'editor', 'viewer') | uniform random choice from the list |
| Explicit generator | set work_addr as email | force a named generator (D9) |
| Range | set price between 10 and 100 | uniform in range; also dates — set signup between 2023-01-01 and 2024-12-31 |
Multiple clauses combine: seed users 20 set role in ('admin', 'user'), status = 'active', signup between 2023-01-01 and 2024-12-31.
Quoting (fork, user-chosen: "quoted, grammar-consistent"). Text
values and list items are quoted string literals ('admin'),
exactly as everywhere else in the DSL — numbers and dates stay
unquoted. This reuses the ADR-0026 expression grammar unchanged:
the DA pass confirmed that the in (...) form's operands are typed
value slots, so a bare admin would parse as a column reference
(→ "unknown column"), not a string. Quoting is therefore not a style
preference but a correctness requirement of grammar reuse. The range
form is type-aware: numeric bounds for numeric columns, date
bounds for date/datetime columns; a type-incompatible bound is a
friendly error. =, in (...), and between ... and ... are the
ADR-0026 expression operators; set is the ADR-0014 UPDATE keyword;
as is borrowed from the SQL alias slot. The as <generator> operand
is a bare name from the curated generator vocabulary (D9), not a
value. The override takes precedence over every heuristic.
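One way to model the four forms and the precedence rule is sketched below. SeedOverride is the name used in this ADR's AST description, but its fields, the OverrideKind and Literal enums, and the resolve_source helper are assumptions made for illustration, not the implemented types.

```rust
/// Literal operands as they appear in a set clause (illustrative subset).
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Text(String), // quoted: 'admin'
    Int(i64),     // unquoted: 10
    Date(String), // unquoted: 2023-01-01
}

/// The four override forms of D2.
#[derive(Debug, Clone, PartialEq)]
enum OverrideKind {
    Fixed(Literal),                        // set status = 'pending'
    PickFrom(Vec<Literal>),                // set role in ('admin', 'user')
    Generator(String),                     // set work_addr as email
    Range { low: Literal, high: Literal }, // set price between 10 and 100
}

#[derive(Debug, Clone, PartialEq)]
struct SeedOverride {
    column: String,
    kind: OverrideKind,
}

/// Where a column's values come from, in precedence order.
#[derive(Debug, PartialEq)]
enum Source<'a> {
    Override(&'a OverrideKind),
    Heuristic(&'a str),
    TypeFallback,
}

/// An override always wins; otherwise the name heuristic (if any) applies,
/// and the type-based fallback is the last resort.
fn resolve_source<'a>(
    column: &str,
    overrides: &'a [SeedOverride],
    heuristic: Option<&'a str>,
) -> Source<'a> {
    if let Some(found) = overrides.iter().find(|o| o.column == column) {
        return Source::Override(&found.kind);
    }
    match heuristic {
        Some(generator) => Source::Heuristic(generator),
        None => Source::TypeFallback,
    }
}
```

Keeping the resolution in one function makes the "override beats every heuristic" rule testable in isolation from the generators themselves.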
D3 — Generation library: fake crate + hand-rolled gaps (fork, user-chosen: "name-aware + realistic")
Add the fake crate (v5.x at time of writing; English locale for
v1 per X2) for realistic values: names, emails, usernames, addresses,
companies, phone numbers, lorem text, dates. Generation is driven by a
per-column generator chosen by the heuristics (D7) or the override
(D2), falling back to type-based generation (D8).
Implementation-time verifications (resolved 2026-06-11 when the dependency was added):
- rand de-duplication — clean. fake 5.1.0 depends on rand = "0.10", the same major as the project's rand 0.10.1, so cargo tree -e normal resolves a single rand 0.10.1 (no runtime duplication; the rand 0.8.6 visible to cargo tree -i rand is only fake's own dev-dependency, never compiled for us). Consequence for D4: one seeded rand 0.10 StdRng can drive both fake's fake_with_rng and the hand-rolled generators — determinism is single-RNG, single-version, and shares shortid.rs's rand version.
- fake module inventory / features — confirmed. Default features (["either"]) cover the core string fakers used here (Name/Internet/Address/Company/Lorem/PhoneNumber); fake's chrono feature is deliberately omitted (dates generated in-house for D8's bounded windows). No commerce/product module exists → product is hand-rolled (D9). (The exact faker call sites are pinned when the generation library is built.)
- Security (new-dependency posture) — clean. The fake tree (296 packages total) scanned clean by all three mandated scanners: osv-scanner (no issues), grype (no vulnerabilities), trivy fs --scanners vuln (0). No findings to document or accept.
D4 — Determinism: --seed <n> (fork, user-chosen: "optional flag")
Generation is random by default. The optional --seed <n> flag
makes a run reproducible: same database state + same --seed →
identical data. The "database state" qualifier matters (DA
refinement) — FK sampling (D14), identifier sequencing (D10), and
UNIQUE collision-avoidance all read existing rows, so reproducibility
is relative to the data already present, not absolute. Value: teachers
hand out one dataset; demos are stable; and the feature's own tests
can assert exact output (against a known starting state).
Implemented with a seedable RNG threaded through every generator (no
thread_rng on the seeded path). -- flag per ADR-0009 (opt-in
choice). Naming note: the flag --seed and the command seed share a
word but never collide grammatically (seed users 20 --seed 42 parses
unambiguously). This flag is also the determinism lever for replay
(D16): a recorded seed … --seed N line reproduces on replay; a bare
seed … line regenerates fresh data.
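The single-RNG idea can be illustrated without any dependency. The sketch below swaps in a tiny SplitMix64 generator purely so it runs on std alone; it is a stand-in for the seeded StdRng this ADR specifies, and SeedRng and generate_ages are hypothetical names.

```rust
/// Stand-in for the seeded StdRng (assumption: SplitMix64 is used here only
/// because it fits in a few lines; it is not the project's RNG).
struct SeedRng(u64);

impl SeedRng {
    fn new(seed: u64) -> Self {
        SeedRng(seed)
    }

    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..bound (bound must be non-zero). The slight modulo bias
    /// is acceptable for a sketch.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Every generator borrows the same RNG, so one seed fixes the whole run.
/// Ages use the 18..=80 window from the D7 catalogue (63 possible values).
fn generate_ages(rng: &mut SeedRng, rows: usize) -> Vec<u32> {
    (0..rows).map(|_| 18 + rng.below(63) as u32).collect()
}
```

Two runs constructed with the same seed yield the same column, which is what lets tests assert exact values. Anything that reads existing rows (FK sampling, identifier sequencing) still depends on the starting database state.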
D5 — Both modes (A1)
seed is a canonical app-level command, available in simple and
advanced mode, no sigil — like save/load/export/replay.
D6 — Default count: 20; bounded maximum
Omitted count → 20 rows: enough to make where, group by,
order by, and limit meaningful without flooding the output pane.
A maximum is enforced (proposed 10 000) to prevent a typo
(seed t 1000000) from hanging the app or bloating the project; over
the cap → friendly error stating the limit.
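The default, the zero no-op (D1) and the cap reduce to one small resolution step. A minimal sketch, assuming the proposed 10 000 limit; CountPlan and resolve_count are hypothetical names and the message text is illustrative.

```rust
/// Default row count when `count` is omitted (D6).
const DEFAULT_COUNT: u32 = 20;
/// Proposed upper bound from D6; the final value may differ.
const MAX_COUNT: u32 = 10_000;

/// Outcome of resolving the optional `count` argument.
#[derive(Debug, PartialEq)]
enum CountPlan {
    /// `seed <table> 0`: nothing to do, reported as a friendly no-op.
    NoOp,
    /// Generate this many rows.
    Rows(u32),
}

/// Applies the default, the zero no-op and the cap, in that order.
fn resolve_count(count: Option<u32>) -> Result<CountPlan, String> {
    match count.unwrap_or(DEFAULT_COUNT) {
        0 => Ok(CountPlan::NoOp),
        n if n > MAX_COUNT => Err(format!(
            "seed can create at most {} rows at a time (asked for {})",
            MAX_COUNT, n
        )),
        n => Ok(CountPlan::Rows(n)),
    }
}
```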
D7 — Name-aware heuristics, type-gated (the catalogue)
A column's name selects a generator, but a name rule only fires
when the column's type is compatible (a column named email typed
int does not get a string — it falls through to type-based int).
Matching is case-insensitive, token-based (split on _,
camelCase, kebab), most-specific-first, with documented
false-positive guards. The catalogue (representative; full table lives
with the implementation):
| Column name (tokens) | Generator | Type gate |
|---|---|---|
| first_name/fname · last_name/surname/lname | first / last name | text |
| name/full_name · title | table-context name (D11) | text |
| email/*_email | email | text |
| username/login/handle | username | text |
| password/pwd | password | text |
| phone/mobile/cell/tel | phone number | text |
| city/town · country · state/province | address parts | text |
| street/address/addr · zip/postcode/postal | address parts | text |
| company/employer/org · job/position/profession | company / job | text |
| description/bio/notes/summary/comment | sentence / paragraph | text |
| url/website/homepage · color/colour | URL / hex colour | text |
| price/amount/cost/salary/balance/total | currency-range number | numeric |
| age · quantity/qty/stock/count | 18–80 · small int | numeric |
| date/*_date | date, recent ~3 yr window | date |
| dob/birthday | date, adult window (18–80 yr ago) | date |
| timestamp/datetime · created_at/updated_at/*_at | datetime, recent window (updated_at ≥ created_at) | datetime |
| is_*/has_*/active/enabled | boolean | bool |
| identifier family (D10) | unique sequential | int/text |
| enum-ish family (D12) | generic text + flag | (text) |
False-positive guards (documented): username/filename/
table_name/*_name handled before the bare name rule so they do
not resolve to person-name; the bare name/title rule requires a
standalone token or a recognised *_name suffix.
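The matching rules above (token split, most-specific-first, type gate, guarded bare name) can be sketched as follows. This covers only a handful of catalogue rows; tokens, ColType and pick_generator are hypothetical names, and the real catalogue lives with the implementation.

```rust
/// Splits a column name on `_`, `-` and camelCase boundaries, lower-casing
/// each token so matching is case-insensitive.
fn tokens(name: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut prev_lower = false;
    for ch in name.chars() {
        if ch == '_' || ch == '-' {
            if !cur.is_empty() {
                out.push(std::mem::take(&mut cur));
            }
            prev_lower = false;
            continue;
        }
        // camelCase boundary: an upper-case letter right after a lower-case one.
        if ch.is_uppercase() && prev_lower {
            out.push(std::mem::take(&mut cur));
        }
        prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
        cur.extend(ch.to_lowercase());
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Text,
    Int,
}

/// Returns the generator a name rule selects, or None to fall through to
/// type-based generation (no rule matched, or the type gate failed).
fn pick_generator(column: &str, ty: ColType) -> Option<&'static str> {
    let owned = tokens(column);
    let toks: Vec<&str> = owned.iter().map(String::as_str).collect();
    let (generator, gate) = match toks.as_slice() {
        ["first", "name"] | ["fname"] => ("first_name", ColType::Text),
        ["username"] | ["login"] | ["handle"] => ("username", ColType::Text),
        [.., "email"] => ("email", ColType::Text),
        // Bare-name guard: only a standalone token or full_name qualifies,
        // so table_name or file_name never resolves to a person name.
        ["name"] | ["full", "name"] => ("name", ColType::Text),
        ["age"] => ("age", ColType::Int),
        _ => return None,
    };
    (ty == gate).then_some(generator)
}
```

Ordering the match arms from most to least specific is what keeps username ahead of the bare name rule; the type gate is applied last so a mismatched type silently falls through rather than erroring.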
D8 — Type-based fallback
When no name rule matches (or to satisfy a name rule's type gate),
generate by type: text→realistic words/short phrase, int→
bounded random, real→random double, decimal→formatted number,
bool→random, date/datetime→bounded recent value (never "any
point in all of history" — per the user's date concern), serial/
shortid→omitted (autogen helpers fill them), blob→unsupported
(nullable→NULL; NOT NULL→D1 block guard).
D9 — Named generators + the product generator
The generators addressable via set ... as <generator> (D2) and
chosen by D7 form a curated, named vocabulary — name,
first_name, last_name, email, username, phone, city,
country, street, zip, company, job, sentence, paragraph,
url, color, price, age, date, datetime, bool, product,
… — the single source of truth shared by the executor, the completion
source, and the highlighter (mirroring KNOWN_SQL_FUNCTIONS,
ADR-0022 Amd6).
product is hand-rolled (the fake crate has no
commerce/product module — D3): {adjective} {material} {noun} from
three small baked-in word lists (~20 each) → "Sleek Bamboo Keyboard",
"Vintage Leather Backpack". Seedable through the D4 RNG. Always
addressable as set <col> as product, and auto-selected by D11 for
the name/title family in product-ish tables.
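A minimal sketch of the three-list shape follows. The word lists are shortened to three entries each, and only the words from the two examples above come from this ADR; the remaining entries are placeholders. The RNG is abstracted as a closure returning an index below the given bound, which is an assumption made to keep the sketch dependency-free.

```rust
// Shortened illustrative lists; the real ones hold about 20 words each.
const ADJECTIVES: [&str; 3] = ["Sleek", "Vintage", "Compact"];
const MATERIALS: [&str; 3] = ["Bamboo", "Leather", "Steel"];
const NOUNS: [&str; 3] = ["Keyboard", "Backpack", "Lamp"];

/// Builds "{adjective} {material} {noun}". `pick(n)` must return an index
/// in 0..n; in the real generator it would draw from the seeded RNG (D4).
fn product_name(mut pick: impl FnMut(usize) -> usize) -> String {
    let adjective = ADJECTIVES[pick(ADJECTIVES.len())];
    let material = MATERIALS[pick(MATERIALS.len())];
    let noun = NOUNS[pick(NOUNS.len())];
    format!("{} {} {}", adjective, material, noun)
}
```

With ~20 words per list the generator has roughly 8 000 combinations, so a UNIQUE product name column would still need the collision handling of D10.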
D10 — Identifier family → unique by name (fork, user-chosen: "unique sequential")
A column in the identifier family — id, *_id that is not an FK,
code, sku, ref/reference, number/no, barcode — that is
not a serial/shortid autogen column and not the PK is treated
as an identifier and gets unique values: int → sequential
(MAX(col)+1 ascending, reads like real ids, never collides);
text → unique short code (generate-with-retry). Precedence:
FK detection wins over this rule (an FK user_id should have
duplicates — many children per parent), so *_id only triggers
uniqueness when the column is not a foreign key.
Constraint-driven uniqueness is independent and mandatory: any
column with a UNIQUE constraint (or a user-fillable single-column
PK) gets guaranteed-unique generation regardless of name — a
correctness requirement, not a heuristic. Generation for such columns
uses retry/sequence to guarantee no collision within the batch and
against existing rows.
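The two uniqueness strategies can be sketched as below. Both function names are hypothetical; starting at 1 on an empty table and the bounded attempt budget are assumptions, since the ADR does not fix either.

```rust
use std::collections::HashSet;

/// Sequential int identifiers: continue from MAX(col) + 1.
/// Assumption: an empty table (no MAX) starts at 1.
fn next_int_ids(existing_max: Option<i64>, rows: usize) -> Vec<i64> {
    let start = existing_max.map_or(1, |max| max + 1);
    (0..rows as i64).map(|i| start + i).collect()
}

/// Unique text codes via generate-with-retry: a candidate is kept only if it
/// collides with neither the existing rows nor the current batch.
fn unique_codes(
    existing: &HashSet<String>,
    rows: usize,
    max_attempts: usize,
    mut candidate: impl FnMut() -> String,
) -> Result<Vec<String>, String> {
    let mut taken = existing.clone();
    let mut out = Vec::with_capacity(rows);
    let mut attempts = 0;
    while out.len() < rows {
        if attempts == max_attempts {
            return Err(format!(
                "could not find {} unique values in {} attempts",
                rows, max_attempts
            ));
        }
        attempts += 1;
        let code = candidate();
        // `insert` returns false when the value is already taken.
        if taken.insert(code.clone()) {
            out.push(code);
        }
    }
    Ok(out)
}
```

The attempt budget turns an exhausted value space into an error instead of an endless loop, which matters when a short code format cannot supply the requested number of distinct values.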
D11 — Table-context disambiguation for name/title (fork, user-chosen: "table-context-aware")
For the name/title family only, the heuristic also reads the
table name token:
- product/item/goods/merchandise/catalog/inventory → product generator (D9)
- company/companies/vendor/supplier/manufacturer/brand → company name
- user/customer/person/people/employee/member/contact/author/student → person name
- unrecognised table → generic word
This resolves the real ambiguity (products.name → "Sleek Bamboo
Keyboard"; users.name → "Alice Martinez"; vendors.name → "Globex
Corp"). It is a deliberately scoped use of table context — the only
place the table name influences generation.
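The lookup can be sketched as a plain table-name match. The word lists are taken from the bullets above; the trailing-s singularisation and the generator labels "name" and "word" for the person and generic cases are assumptions for illustration.

```rust
const PRODUCT: &[&str] = &["product", "item", "goods", "merchandise", "catalog", "inventory"];
const COMPANY: &[&str] = &["company", "companies", "vendor", "supplier", "manufacturer", "brand"];
const PERSON: &[&str] = &[
    "user", "customer", "person", "people", "employee", "member", "contact", "author", "student",
];

/// Chooses the generator for a name/title column from the table name.
/// Assumption: plurals are handled by also trying the name without a
/// trailing `s`; the real rule may tokenise differently.
fn name_generator_for_table(table: &str) -> &'static str {
    let lower = table.to_ascii_lowercase();
    let singular = lower.strip_suffix('s').unwrap_or(&lower);
    let matches = |words: &[&str]| words.iter().any(|w| *w == lower || *w == singular);
    if matches(PRODUCT) {
        "product"
    } else if matches(COMPANY) {
        "company"
    } else if matches(PERSON) {
        "name"
    } else {
        "word"
    }
}
```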
D12 — Enum-ish names → generic + post-seed advisory (fork, user-chosen: "flag enum-ish only")
Enum-ish names — role, status, type, state, kind,
category, level, tier, stage, priority, gender — have no
sensible generic generator, so they are not guessed: they fall
through to generic text (they must still be filled — a NOT NULL
status cannot be left empty). Seed then emits a post-seed advisory
(D13) naming them and pointing at the set ... in (...) override.
D13 — Reporting: post-seed advisory (fork, user-chosen: "flag enum-ish only")
After a successful seed, in addition to the normal auto-show outcome
(row count + the affected rows, per ADR-0014), seed appends an
OutputStyleClass::Hint advisory only when one or more
enum-ish columns (D12) — or columns guarded by a CHECK that seed
could not derive values from (D17) — were filled generically.
The wording is phase-aware (DA finding: the advisory must not name
features that ship later). In Phase 1 (no set clause yet) it
names the columns and explains they were filled generically. From
Phase 2/3 it points at the concrete repair:
# Phase 1 wording:
✓ Seeded 20 rows into users
ℹ status, role were filled with generic text — they look like
fixed value sets you may want to choose deliberately.
# Phase 2/3 wording (set clause + column-fill exist):
✓ Seeded 20 rows into users
ℹ status, role filled generically. Fix existing rows with
seed users.status set status in ('active','inactive'),
or pass set … on the next seed.
Note the repair for already-seeded rows is the column-fill
form (seed users.status set …), not "re-seed" (which would add more
rows) — DA correction. This is a result-time note (cheap, reusing
ADR-0038's hint rendering), not a typing-time warning. The fuller
"per-column report" (every column → its generator) was considered and
deferred (see Alternatives / Out of scope).
D14 — Foreign keys (SD1; fork on empty-parent, user-chosen: "friendly error")
- Each FK is filled by sampling uniformly from the existing rows of the parent table's referenced column(s). Duplicates are expected and correct (many children per parent). For a compound FK, the referenced tuple is sampled jointly (a whole existing parent key), never per-column independently — independent sampling could fabricate a (a, b) pair that exists in no parent row and would fail FK enforcement (DA refinement).
- Empty parent → seed refuses with a friendly error naming the parent and the FK column ("seed users first — orders.user_id references it"). Safe, predictable, teaches FK dependency order. Recursive parent auto-seed is deferred to a future --recursive opt-in (Out of scope).
- Junction / compound-PK tables (SD1's explicit case): sample distinct combinations of the parent PK tuples to satisfy the compound PK's uniqueness; if count exceeds the number of available distinct combinations, cap at the maximum and note it in the outcome.
- Self-referential FK (manager_id → id): if nullable, leave NULL or point at an earlier row in the same batch; if NOT NULL on an otherwise-empty table, friendly error. Documented edge case.
- Nullable FKs are always filled in v1 (predictable); occasional-NULL injection is deferred.
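Joint tuple sampling and the junction cap can be sketched as follows. Both function names are hypothetical, the RNG is again abstracted as a closure returning an index below the bound, and materialising every parent combination is a simplification that would not suit large parents.

```rust
/// Samples one whole parent key per child row, so a compound key such as
/// (a, b) is always a tuple that exists in the parent. Duplicates are fine.
fn sample_fk_tuples<K: Clone>(
    parent_keys: &[K],
    rows: usize,
    mut pick: impl FnMut(usize) -> usize,
) -> Result<Vec<K>, String> {
    if parent_keys.is_empty() {
        return Err("the parent table has no rows to reference; seed it first".to_string());
    }
    Ok((0..rows)
        .map(|_| parent_keys[pick(parent_keys.len())].clone())
        .collect())
}

/// Junction table: distinct (left, right) combinations, capped at the number
/// available. The bool reports whether the cap was hit.
fn junction_pairs<A: Clone, B: Clone>(
    left: &[A],
    right: &[B],
    rows: usize,
    mut pick: impl FnMut(usize) -> usize,
) -> (Vec<(A, B)>, bool) {
    // Sketch only: builds the full cartesian product up front.
    let mut all: Vec<(A, B)> = left
        .iter()
        .flat_map(|a| right.iter().map(move |b| (a.clone(), b.clone())))
        .collect();
    // Fisher-Yates shuffle driven by the caller's RNG.
    for i in (1..all.len()).rev() {
        let j = pick(i + 1);
        all.swap(i, j);
    }
    let capped = rows > all.len();
    all.truncate(rows);
    (all, capped)
}
```

Shuffling and truncating guarantees distinct pairs by construction, so no retry loop is needed for the compound PK.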
D15 — Undo: one snapshot per seed (DA finding; ADR-0006)
Seed is a mutation, so it must participate in undo. The draft omitted
this; the DA found the codebase already has the right primitive —
BeginBatch / EndBatch (db.rs), used by replay so a multi-write
run collapses to one boundary snapshot. do_seed wraps its
generated writes in begin_batch / end_batch, so seed users 20
is a single undo step, not 20 — matching ADR-0006 Amendment 1's
batch model. Column-fill's bulk UPDATE is likewise one step. (import
remains the only data-affecting op outside undo, per ADR-0015 §11;
seed is firmly inside it.)
D16 — Replay: seed re-runs as a data write (fork, user-chosen)
replay re-executes a recorded seed line as a data-write
command — it is not in the app-lifecycle skip-set (see Command
classification, above). Consequence, accepted by the user: a bare
seed users 20 regenerates fresh, divergent data on each replay;
a seed users 20 --seed 42 line (the determinism lever, D4)
reproduces the original data. This keeps seed faithful to its
nature as a data write and puts reproducibility exactly where the
--seed flag already lives. (Seeded data is in any case durable
independently of replay, via the ADR-0015 CSV store + rebuild;
replay is the scripting re-run path, U4.) The DA confirmed the wiring
trap: because seed is not an AppCommand, it is correctly absent
from is_app_lifecycle_entry_word and replay dispatches it through
the normal data path rather than aborting.
D17 — CHECK constraints: derive from simple IN, else friendly-fail (fork, user-chosen)
A CHECK on a generically-filled column would otherwise fail the whole
batch (DA finding — the block guard only covered NOT NULL blob).
Two-tier handling, per the user:
- Derive from simple IN-CHECKs. When a column's CHECK is the common enum-as-CHECK shape — col IN ('a', 'b', …) (the column's own CHECK, single-column, literal list) — seed parses out the allowed values and uses them as the generator (uniform choice). The frequent CHECK (status IN ('active','closed')) case then "just works" with no override needed.
- Best-effort + friendly fail for the rest. For CHECKs seed cannot interpret (ranges, expressions, multi-column), it generates best-effort; if a generated row violates the CHECK, the insert fails through the existing H1 friendly-error layer (ADR-0019) naming the constraint and pointing at set. Such CHECK-guarded columns are also pre-flagged in the advisory (D13) alongside enum-ish names, so the user is warned before hitting the failure.
No new CHECK engine — tier 1 is a narrow literal-IN parse over the
CHECK text already stored in metadata; tier 2 is the existing failure
path.
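A deliberately narrow tier-1 parse might look like the sketch below. It assumes the stored CHECK text is the bare expression and that the column name appears exactly as written; parse_in_check is a hypothetical name. Anything it does not recognise returns None, which corresponds to falling through to tier 2.

```rust
/// Extracts the allowed values from a `col IN ('a', 'b', ...)` CHECK.
/// Returns None for any other shape (ranges, expressions, other columns,
/// literals containing commas or quotes), leaving those to tier 2.
fn parse_in_check(column: &str, check: &str) -> Option<Vec<String>> {
    let rest = check.trim().strip_prefix(column)?.trim_start();
    // The keyword is matched case-insensitively.
    if !rest.get(..2)?.eq_ignore_ascii_case("in") {
        return None;
    }
    let inner = rest[2..].trim().strip_prefix('(')?.strip_suffix(')')?;
    let mut values = Vec::new();
    for item in inner.split(',') {
        let literal = item.trim().strip_prefix('\'')?.strip_suffix('\'')?;
        if literal.contains('\'') {
            return None; // escaped quotes are out of scope for the sketch
        }
        values.push(literal.to_string());
    }
    Some(values)
}
```

Refusing on the first unfamiliar token keeps the parse conservative: a wrongly derived value list would be worse than the friendly failure of tier 2.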
D18 — Auto-show is capped for large seeds (DA finding)
ADR-0014 auto-show renders "the affected rows" — fine for one insert,
a wall for a 10 000-row seed. Seed's outcome shows a capped
preview (proposed first 20 rows) with a (showing 20 of N)
note, not the full set. The row count is always reported in full;
only the rendered table is capped.
Grammar, AST, and cross-cutting wiring
Per ADR-0024, seed is registered as a CommandNode so completion,
hints, help, and usage flow from one definition. The wiring, as
explicit acceptance criteria (a /runda pass must verify each —
ADR-0045 showed "claimed verified" is not verified):
- AST + executor. A dedicated command variant (Seed { table, target_column: Option<String>, count: Option<u32>, overrides: Vec<SeedOverride>, rng_seed: Option<u64> }) and a dedicated do_seed worker executor. do_seed reuses shared helpers (value binding impl_value_for, autogen autofill, FK enrichment, the multi-row parameterised-insert pattern of plan_autogen_autofill, the UPDATE path for column-fill, per-command persistence, the begin_batch/end_batch undo primitive of D15) as library functions — it does not emit Command::Insert/Command::Update (X5).
- Replay / undo classification (D15/D16). do_seed brackets its writes in one batch (one undo step). The seed entry word is deliberately absent from is_app_lifecycle_entry_word and completion's empty_input_offers_app_command_entry_keywords (the AppCommand mirror) so replay re-runs it as a data write — an explicit acceptance check, since the default for an unlisted recognised command must be "replayed", not "abort".
- Completion sources: table-name (existing tables); .column and set-clause column slots (columns of the named table); the generator-name vocabulary (D9) after as; count number; set/=/in/as/between/and keywords; --seed flag.
- Syntax highlighting: seed keyword; the generator-name vocabulary highlighted as tok_function (reuse the existing ADR-0022 Amd6 blue — no new theme colour).
- Hints: ambient per-slot "what's next" and usage hints, both modes.
- Help: help seed topic (help_id + per-command block); the general help list picks it up automatically via REGISTRY.
- Parse-error pedagogy (ADR-0042): near-miss matrix rows for seed (bare / missing-table / wrong-token / malformed set), both modes.
- Validity indicator (ADR-0027): typing-time [ERR]/[WRN] for unknown table, unknown column (in .column or set), unknown generator name after as.
- No DSL→SQL teaching echo (ADR-0038). seed is a utility/app command, not a DSL form of a SQL statement, so the echo does not apply. (A future "show the generated INSERTs" is out of scope — it would dump count statements.)
Implementation phasing
Design is whole; the implementation is phased into reviewable, test-first commits:
- Core whole-row seed — grammar/AST/executor; type-based generation + the fake-backed name heuristics (D7/D8/D11); identifier uniqueness (D10) + constraint uniqueness; FK sampling (joint tuples) + empty-parent error + junction distinct-combos (D14); --seed determinism (D4); default count + cap + zero-no-op (D6/D1); required-column block guard (D1); undo batch (D15); replay-as-data-write classification (D16); CHECK derive / friendly-fail (D17); capped auto-show (D18); the enum/CHECK advisory in its Phase-1 wording (D12/D13); full ambient wiring; both modes.
- The set override clause (D2) — value / list / generator / range, type-aware, with completion + highlight + validity for the generator-name slot.
- Column-fill mode (seed <table>.<column>, D1 form 2) — the UPDATE path.
Each phase is independently green before the next.
Testing (ADR-0008 tiers 1–3; test-first)
- Tier 1 (unit, deterministic via --seed): generator selection (name × type-gate matrix, including every false-positive guard of D7); table-context disambiguation (D11); identifier uniqueness and the FK-wins-over-*_id precedence (D10); bounded-date windows (D8); the product generator shape; override resolution + precedence (D2); the required-column block guard (D1); the count cap (D6). Exact-value assertions are possible because --seed fixes the RNG.
- Tier 2 (insta snapshots): the seeded data table render and the enum advisory (D13) at representative sizes, light + dark.
- Tier 3 (integration, full event loop): seed users 20 end to end (rows land in db + CSV + history, auto-show, persistence); FK sampling against a populated parent (incl. a compound FK — every child tuple exists in the parent); empty-parent friendly error; junction seeding with distinct combinations and the over-cap note; the set clause forms (quoted literals); column-fill on existing rows (incl. refusal of PK/autogen targets, empty-table no-op); reproducibility (--seed 42 twice → identical data from a fixed state); both modes. Plus the DA-driven cases: one-undo-step (seed then a single undo removes all rows); replay of a bare seed line (divergent) vs a --seed line (reproduced); IN-CHECK auto-derivation ("just works") and a complex-CHECK friendly failure; capped auto-show on a large seed.
"All green, no skips" is the only acceptable end state; the Phase-1 baseline (2290 passing / 0 failing / 0 skipped / 1 ignored doctest) is the regression floor.
Out of scope / deferred (future SD2 work)
- Recursive parent auto-seed (--recursive) — D14 errors instead.
- NULL injection for nullable columns (teaching optional relationships / IS NULL) — v1 always fills.
- Multi-locale generation — English only (X2).
- User-defined custom generators (true "override hooks" — register a named generator) — the set ... as <builtin> surface covers the common need; custom generators are a later SD2 increment.
- Full per-column seed report — D13 flags enum-ish only.
- Column-restricted insert (seed t (a, b)) — rejected (D1).
- "Show the generated SQL" teaching echo for seed.
Alternatives considered
- Hand-rolled generators only (no fake): minimal dependency, but synthetic-looking data (text_a3f9) — rejected on pedagogy (pedagogy wins ties).
- Type-only generation (no name awareness): simpler, but misses the biggest UX win (a users table that reads like real people) — rejected.
- Column-name-only name (no table context): leaves products.name → person names, requiring a manual override on every product/company table — rejected for the name/title family (D11).
- No override clause (heuristics + type only): could not answer "the heuristic guessed wrong, fix it" or enum columns — rejected; the set clause (D2) is the answer to the user's Q3.
- Recursive auto-seed of empty parents: powerful but magical and can seed tables the user did not name — deferred behind a future flag (D14).
- Always-random (no --seed): simplest, but no reproducible datasets and weaker tests — rejected (D4).
- Full per-column report by default: a nice teaching artifact but verbose on wide tables — deferred; flag-only advisory chosen (D13).
- Reuse Command::Insert/do_insert directly from seed: tempting for code reuse, but collapses command identity and violates X5 — rejected in favour of a dedicated do_seed that calls shared helpers.
- Skip seed on replay (classify as app-lifecycle, D16): consistent with A1's "app-level" label and avoids divergent data, but seed is a data write and silently skipping it on a scripted re-run is surprising — rejected; --seed is the determinism lever instead.
- Bare-word set list items (in (admin, …), D2): matched the early mockups and reads cleaner, but bare words are column references in the reused grammar (would error) and would force a custom list form — rejected for quoted literals (grammar reuse + DSL consistency).
- Pre-flight refuse any CHECK-bearing table (D17): safest but blocks seeding too many legitimate tables — rejected for the derive-IN-else-friendly-fail tier.
- set-driven NULL / per-column report / recursive parent seed: deferred — see Out of scope.