seed: built-in value sets for conventional enum-ish columns (priority, status, …) #34

Closed
opened 2026-06-12 14:55:28 +01:00 by claude-clouddev1 · 3 comments
Collaborator

Summary

seed deliberately does not guess values for choice-like ("enum-ish")
column names — status, role, type, priority, … — because there is
"no sensible generic value" (ADR-0048 D12). They fall to generic lorem
text and trigger the post-seed advisory (D13).

For a few of those names, though, there is a near-canonical small value
set. Filling priority with low/medium/high reads far better than
eveniet consequatur consequuntur, and it is what a learner expects. This
issue proposes built-in value sets for the names where the set is genuinely
conventional, keeping D12's "don't guess" stance for the rest.

Mechanism

A name → default value-set table (a Generator::PickFrom-style list), keyed
by token and type-gated, slotted into the D7 heuristics
(src/seed/heuristics.rs) ahead of the enum-ish fallthrough. It reuses the
existing PickFrom generator, so there is no new generation machinery — only
a curated lookup. Reproducible via the existing --seed RNG.
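
As a rough illustration of the lookup described above, here is a minimal sketch. It is not the code in src/seed/heuristics.rs: `ColType`, `builtin_value_set`, `choose_generator`, `Lcg`, and `pick` are hypothetical names, the `Generator` enum is a two-variant stand-in for the real one, and the small LCG only stands in for whatever RNG `--seed` actually drives. The value sets are the ones proposed in this issue.

```rust
/// Column type used to gate a value set (hypothetical stand-in).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColType {
    Text,
    Int,
}

/// Stand-in for the generator vocabulary: only the variants this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Generator {
    /// Pick from a fixed list of values.
    PickFrom(Vec<String>),
    /// Generic filler text, i.e. the enum-ish fallthrough.
    Generic,
}

const PRIORITY_TEXT: &[&str] = &["low", "medium", "high"];
const PRIORITY_INT: &[&str] = &["1", "2", "3"];
const SEVERITY_TEXT: &[&str] = &["low", "medium", "high", "critical"];
const SEVERITY_INT: &[&str] = &["1", "2", "3", "4"];
const RATING_INT: &[&str] = &["1", "2", "3", "4", "5"];

/// Curated lookup: (name token, column type) -> default value set.
/// A token with no entry for the given type returns `None` (the type gate).
fn builtin_value_set(token: &str, ty: ColType) -> Option<&'static [&'static str]> {
    match (token, ty) {
        ("priority" | "prio", ColType::Text) => Some(PRIORITY_TEXT),
        ("priority" | "prio", ColType::Int) => Some(PRIORITY_INT),
        ("severity", ColType::Text) => Some(SEVERITY_TEXT),
        ("severity", ColType::Int) => Some(SEVERITY_INT),
        ("rating" | "stars", ColType::Int) => Some(RATING_INT),
        _ => None,
    }
}

/// Consult the curated table first; otherwise fall through to generic filler.
pub fn choose_generator(column_name: &str, ty: ColType) -> Generator {
    let token = column_name.to_ascii_lowercase();
    match builtin_value_set(&token, ty) {
        Some(values) => Generator::PickFrom(values.iter().map(|v| v.to_string()).collect()),
        None => Generator::Generic,
    }
}

/// Tiny deterministic RNG (an LCG), standing in for the `--seed` RNG.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Pick one value from the list; returns `None` for an empty list.
pub fn pick<'a>(values: &'a [String], rng: &mut Lcg) -> Option<&'a str> {
    if values.is_empty() {
        return None;
    }
    let idx = (rng.next_u64() % values.len() as u64) as usize;
    Some(values[idx].as_str())
}
```

With this shape, `choose_generator("rating", ColType::Text)` falls through to `Generator::Generic` because the table has no text entry for `rating`, and two `Lcg` instances built from the same seed produce the same sequence of picks.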

Proposed names (confirmed scope, 2026-06-12)

| Name (tokens) | Type | Values |
|---|---|---|
| priority, prio | text | low, medium, high |
| priority, prio | int | 1, 2, 3 (low→high) |
| severity | text | low, medium, high, critical |
| severity | int | 1, 2, 3, 4 (low→critical) |
| rating, stars | int | 1–5 |
| status | text | a generic default set (see open question) |

status is the awkward one: its real values are highly domain-specific
(active/inactive, open/closed/pending, draft/published, …). Options:
(a) a neutral default like active/inactive; (b) leave status
to the advisory (it already names a good repair). Leaning (a) with a short,
clearly-generic set, but flagging for a decision. status is text only
(no sensible int mapping); state is excluded — it already maps to a US
state-name generator, and overloading it would be ambiguous.

Considered and skipped (2026-06-12)

Other names with a fairly canonical small set were considered and left out
of scope for this issue:

  • size (small/medium/large): skipped.
  • tier, plan (free/basic/premium): too app-specific, skipped.

Deliberately not proposed (too domain-specific — keep the D12 advisory):
type, kind, category, state, stage. gender is excluded on
sensitivity grounds unless explicitly requested.

Interaction with the advisory (D13)

Any name that gains a built-in set must be removed from the enum-ish
advisory trigger for that name (it is no longer "filled generically"), so
the advisory keeps pointing only at the names seed still can't guess.
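
A minimal sketch of that interaction, under assumptions: the token list below is illustrative only (the real list is not shown in this issue), and `FilledColumn` and `advisory_names` are hypothetical names. The point it shows is that a name with a built-in set is neither in the enum-ish list nor filled generically, so it drops out of the advisory.

```rust
/// Names seed still cannot guess (contents assumed for illustration).
const ENUM_TOKENS: &[&str] = &["status", "role", "type", "kind", "category"];

/// True when the token is a choice-like name with no built-in value set.
pub fn is_enum_ish(token: &str) -> bool {
    ENUM_TOKENS.iter().any(|t| *t == token)
}

/// One seeded column: its name and whether it was filled with generic text.
pub struct FilledColumn<'a> {
    pub name: &'a str,
    pub filled_generically: bool,
}

/// Columns the post-seed advisory should mention: enum-ish names that
/// ended up with generic filler.
pub fn advisory_names<'a>(columns: &[FilledColumn<'a>]) -> Vec<&'a str> {
    columns
        .iter()
        .filter(|c| c.filled_generically && is_enum_ish(c.name))
        .map(|c| c.name)
        .collect()
}
```

For a table with `priority` (filled from a built-in set), `status`, and `title` (both filled generically), only `status` would be reported.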

Refs

ADR-0048 D7 (name heuristics) / D9 (named-generator vocabulary) / D12–D13
(enum-ish handling + advisory). Sibling of #33 (year-like int columns), but a
distinct mechanism (fixed pick-list vs bounded numeric range).

claude-clouddev1 added the enhancement label 2026-06-12 14:55:28 +01:00
Author
Collaborator

Website cross-reference (cast re-record + visual check).

The website docs already ship a seed cast that exercises a tickets table
with priority and status columns:

  • source: website/casts-src/casts.mjs (the seed cast)
  • recording: website/public/casts/seed.cast
  • embedded on: Reference → Generating sample data
    (website/src/content/docs/reference/generating-sample-data.mdx)

It's authored as if this issue is already done — a plain
seed tickets 8 set status in (...) with priority left to auto-generation.
Until the value sets land, a fresh recording fills priority with lorem
placeholder text, which makes the rendered table wrap by a few characters.

When you implement this: re-record the cast (cd website && pnpm casts seed,
needs a ../target/debug binary) and visually check that the tickets
table tightens to fit once priority collapses to a short value
(low/medium/high). Likely redundant — casts get a full re-record sweep
before publication anyway — but flagging it here so it isn't missed.

Owner

Regarding status - after consideration I think we should skip it. Users will need to specify their own values here for seeding.

Author
Collaborator

Fixed in deb0948.

Added a type-gated PickFrom lookup to the D7 catalogue (src/seed/heuristics.rs), placed ahead of the enum-ish fallthrough and reusing the existing generator (no new machinery):

| Name (tokens) | text | int |
|---|---|---|
| priority / prio | low/medium/high | 1/2/3 |
| severity | low/medium/high/critical | 1/2/3/4 |
| rating / stars | (none) | 1–5 |

A user-declared IN-CHECK still wins (resolved before the heuristics). priority was removed from ENUM_TOKENS; since the D13 advisory only fires on Generator::Generic, a PickFrom name is excluded either way, but the removal keeps is_enum_ish semantically "names seed still can't guess".
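
The resolution order can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `Column`, `Source`, `builtin_text_set`, and `resolve_values` are hypothetical names, and the type gate is left out (text sets only) to keep the example short.

```rust
/// Where a column's candidate values came from.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    /// Values declared by the user in an IN-CHECK constraint.
    DeclaredCheck,
    /// Values from the built-in name lookup.
    BuiltinSet,
    /// No known values: generic filler text.
    Generic,
}

/// Minimal column description (hypothetical).
pub struct Column<'a> {
    pub name: &'a str,
    /// Values from a user-declared IN-CHECK, if any.
    pub check_in: Option<Vec<&'a str>>,
}

const PRIORITY: &[&str] = &["low", "medium", "high"];
const SEVERITY: &[&str] = &["low", "medium", "high", "critical"];

/// Built-in text sets only; int sets are omitted for brevity.
fn builtin_text_set(name: &str) -> Option<&'static [&'static str]> {
    match name {
        "priority" | "prio" => Some(PRIORITY),
        "severity" => Some(SEVERITY),
        _ => None,
    }
}

/// A declared CHECK is resolved first; the name heuristics only run after it.
pub fn resolve_values<'a>(col: &Column<'a>) -> (Source, Vec<&'a str>) {
    if let Some(declared) = &col.check_in {
        return (Source::DeclaredCheck, declared.clone());
    }
    match builtin_text_set(col.name) {
        Some(set) => (Source::BuiltinSet, set.to_vec()),
        None => (Source::Generic, Vec::new()),
    }
}
```

A `priority` column with a declared check of `p1`/`p2` resolves to those two values, while an unconstrained `priority` column gets the built-in set and an unconstrained `status` column stays generic.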

status is excluded, per your decision on this issue (2026-06-12) — its real values are too domain-specific, so it keeps the D12 "don't guess" stance: generic text + the advisory pointing at set status in (...).

Tests: heuristic-selection unit tests (each set, type-gate, CHECK-wins, status-stays-generic), plus two integration tests — a whole-row seed asserting set membership, and a column-fill (seed Tasks.priority) test that also closes a pre-existing integration gap on that path.

Website note: the seed cast (tickets table with priority) lives on the website branch; re-recording so the table tightens once priority collapses to a short value is tracked there (likely subsumed by the pre-publication cast sweep).

Decision recorded in ADR-0048 Amendment 1. Full suite green (2433 pass, 1 ignored).
