feat(seed): year-as-int + conventional choice-set heuristics (#33, #34)

Two additive D7 catalogue rules, surfaced while writing the website seed
docs. No change to the type fallback, executor, or grammar.

#33 — year-like int columns. `published`/`birth_year` were just `int`, so
they fell to the unbounded int path and produced nonsense (`9419`). Add an
int-gated year rule (after the quantity rule, so `year_count` stays a
count): `year`/`*_year`/`published`/`founded` -> a bounded 1950-2025 year
(new `YearRecent`), or the dob-style birth window 1945-2007 for
`birth`/`born`/`dob` (new `YearBirth`). Plain int; not added to the D9
named-generator vocabulary.

#34 — conventional choice sets. A few enum-ish names have a near-canonical
small set that reads far better than lorem text. Add a type-gated PickFrom
lookup (reusing the existing generator): priority/prio, severity,
rating/stars. `status` is deliberately excluded (values too
domain-specific) and keeps the D12 advisory; a user IN-CHECK still wins.
`priority` leaves ENUM_TOKENS.

ADR-0048 Amendment 1; +8 tests (incl. a column-fill integration test that
also closes a pre-existing gap on that path).
claude@clouddev1
2026-06-12 20:36:20 +00:00
parent fde50ce3bf
commit deb0948d6c
7 changed files with 374 additions and 4 deletions
@@ -317,6 +317,8 @@ with the implementation):
| `url`/`website`/`homepage` · `color`/`colour` | URL / hex colour | text |
| `price`/`amount`/`cost`/`salary`/`balance`/`total` | currency-range number | numeric |
| `age` · `quantity`/`qty`/`stock`/`count` | 18–80 · small int | numeric |
| `year`/`*_year`/`published`/`founded` (Amendment 1) | bounded year (birth window for `birth`/`born`/`dob`, else 1950–2025) | int |
| `priority`/`prio` · `severity` · `rating`/`stars` (Amendment 1) | built-in `PickFrom` value set | text/int |
| `date`/`*_date` | date, recent ~3 yr window | date |
| `dob`/`birthday` | date, adult window (18–80 yr ago) | date |
| `timestamp`/`datetime` · `created_at`/`updated_at`/`*_at` | datetime, recent window (`updated_at` ≥ `created_at`) | datetime |
@@ -675,3 +677,66 @@ the regression floor.
derive-`IN`-else-friendly-fail tier.
- **`set`-driven NULL / per-column report / recursive parent seed:**
deferred — see Out of scope.
## Amendment 1 — year-as-int + conventional choice sets (2026-06-12)
Two SD2-style refinements to the D7 catalogue, surfaced while writing
the website `seed` docs. Both are additive name rules; no change to D8
(type fallback), the executor, or the grammar.
### Issue #33 — year-like `int` columns
A column such as `published` or `birth_year` was just an `int`, so it
fell through to the unbounded type-based `int` path (D8) and produced
nonsense like `9419` or `1426` — implausible as years, undercutting the
"realistic data" pedagogy. Added an **`int`-gated** year rule, placed
*after* the quantity rule (so `year_count` stays a count):
- `year` / `*_year` / `published` / `founded` → **`YearRecent`**, a
bounded window of **1950–2025** (75 years relative to the fixed
`REF_YEAR`, wide enough for published books / founding years /
release years; matches the issue's own `between 1950 and 2020`
workaround).
- the same with a `birth` / `born` / `dob` token (e.g. `birth_year`) →
**`YearBirth`**, mirroring the existing `dob → DateAdult` adult birth
window as years (**1945–2007**).
Both emit a plain `int`. `published` / `founded` are included
(user-confirmed): an `int` so named is almost always a year (a flag
would be `is_published`). The generators are **not** added to the D9
named-generator vocabulary — explicit control stays with `set <col>
between <lo> and <hi>`.
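The rule ordering above can be sketched as follows. This is a minimal illustration, not the actual catalogue code: only the names `YearRecent` and `YearBirth` and the two year windows come from this amendment. The `Generator` variants `SmallInt` and `UnboundedInt`, the helpers `has_token`, `int_name_rule`, and `year_bounds`, and the `_`-separated token matching are all assumptions made for the example. It also assumes the caller has already established that the column type is `int`.

```rust
/// Generators relevant to this sketch. `YearRecent` and `YearBirth` are named
/// in the amendment; `SmallInt` and `UnboundedInt` are hypothetical stand-ins.
#[derive(Debug, PartialEq)]
enum Generator {
    SmallInt,
    YearRecent,
    YearBirth,
    UnboundedInt,
}

/// True if any `_`-separated token of `name` appears in `set`.
fn has_token(name: &str, set: &[&str]) -> bool {
    name.split('_').any(|t| set.contains(&t))
}

/// Name rule for a column that is already known to be an `int`.
fn int_name_rule(column: &str) -> Generator {
    let name = column.to_ascii_lowercase();
    // The quantity rule runs first, so `year_count` stays a count.
    if has_token(&name, &["quantity", "qty", "stock", "count"]) {
        return Generator::SmallInt;
    }
    // Year-like names; a birth token selects the birth window instead.
    if has_token(&name, &["year", "published", "founded"]) {
        if has_token(&name, &["birth", "born", "dob"]) {
            return Generator::YearBirth;
        }
        return Generator::YearRecent;
    }
    // Everything else falls through to the type-based path.
    Generator::UnboundedInt
}

/// Inclusive year window for the two year generators, `None` otherwise.
fn year_bounds(generator: &Generator) -> Option<(i32, i32)> {
    match generator {
        Generator::YearRecent => Some((1950, 2025)),
        Generator::YearBirth => Some((1945, 2007)),
        _ => None,
    }
}
```

Under these assumptions `int_name_rule("birth_year")` selects `YearBirth`, `int_name_rule("published")` selects `YearRecent`, and `int_name_rule("year_count")` is claimed by the quantity rule before the year rule is consulted.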
### Issue #34 — built-in value sets for conventional choice names
D12 deliberately does not guess values for enum-ish names. For a few,
though, there is a near-canonical small set that reads far better than
lorem text. Added a **type-gated `PickFrom`** lookup (reusing the
existing generator — no new machinery), placed ahead of the enum-ish
fallthrough:
| Name (tokens) | text | int |
|---|---|---|
| `priority` / `prio` | `low`/`medium`/`high` | `1`/`2`/`3` |
| `severity` | `low`/`medium`/`high`/`critical` | `1`/`2`/`3`/`4` |
| `rating` / `stars` | — | `1`–`5` |
A user-declared `IN`-CHECK (D17) still wins — it is resolved before the
heuristics. Any name that gains a set is **removed from the enum-ish
advisory trigger** (`priority` left `ENUM_TOKENS`); since the advisory
(D13) only fires on `Generator::Generic`, a `PickFrom` name is excluded
either way, but the removal keeps `is_enum_ish` semantically "names seed
still can't guess".
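A rough sketch of the type-gated lookup and its precedence, under stated assumptions: the value sets are taken from the table above, but `ColType`, `conventional_choices`, `resolve_choices`, the constant names, and the representation of `int` choices as strings are hypothetical and chosen only to keep the example short. The real rule hands its values to the existing `PickFrom` generator, which is not modelled here.

```rust
/// Hypothetical column-type tag; only text and int carry built-in sets.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ColType {
    Text,
    Int,
    Other,
}

// Value sets from the amendment's table (int values kept as strings here).
const PRIORITY_TEXT: &[&str] = &["low", "medium", "high"];
const PRIORITY_INT: &[&str] = &["1", "2", "3"];
const SEVERITY_TEXT: &[&str] = &["low", "medium", "high", "critical"];
const SEVERITY_INT: &[&str] = &["1", "2", "3", "4"];
const RATING_INT: &[&str] = &["1", "2", "3", "4", "5"];

/// True if any `_`-separated token of `name` appears in `set`.
fn has_token(name: &str, set: &[&str]) -> bool {
    name.split('_').any(|t| set.contains(&t))
}

/// Built-in choice set for a conventional name, gated on the column type.
fn conventional_choices(column: &str, ty: ColType) -> Option<&'static [&'static str]> {
    let name = column.to_ascii_lowercase();
    if has_token(&name, &["priority", "prio"]) {
        return match ty {
            ColType::Text => Some(PRIORITY_TEXT),
            ColType::Int => Some(PRIORITY_INT),
            ColType::Other => None,
        };
    }
    if has_token(&name, &["severity"]) {
        return match ty {
            ColType::Text => Some(SEVERITY_TEXT),
            ColType::Int => Some(SEVERITY_INT),
            ColType::Other => None,
        };
    }
    if has_token(&name, &["rating", "stars"]) {
        // No text set is defined for rating/stars.
        return if ty == ColType::Int { Some(RATING_INT) } else { None };
    }
    // `status` and other enum-ish names are deliberately not guessed.
    None
}

/// A user-declared IN-CHECK is resolved first; the heuristic is a fallback.
fn resolve_choices<'a>(
    user_in_check: Option<&'a [&'a str]>,
    column: &str,
    ty: ColType,
) -> Option<&'a [&'a str]> {
    user_in_check.or_else(|| conventional_choices(column, ty))
}
```

With this shape, a `priority` text column with no constraint resolves to `low`/`medium`/`high`, while the same column with a user `IN` list keeps the user's values, and `status` resolves to nothing so it still reaches the generic path and its advisory.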
**`status` is deliberately excluded** (user-confirmed on the issue): its
real values are too domain-specific (`active/inactive`,
`open/closed/pending`, `draft/published`, …), so it keeps the D12
"don't guess" stance — generic text + the advisory pointing at `set
status in (…)`. `state` stays its US-state-name generator (D7);
`type`/`kind`/`category`/`stage`/`gender` and `size`/`tier`/`plan` were
considered and left to the advisory.
**Website follow-up** (tracked on the `website` branch, not here): the
`seed` cast exercises a `tickets` table with `priority`; it should be
re-recorded so the table tightens once `priority` collapses to a short
value — likely subsumed by the pre-publication cast sweep.