seed: year-like int columns (*_year, published) get unbounded values
#33
Summary
seed generates realistic values from a column's name, but there is no heuristic for year-like integer columns. A column such as published or birth_year is just an int to the generator, so it falls through to the unbounded type-based int path (ADR-0048 D8) and produces values like 9419, 6187, or 1426, which are nonsensical as years.
This was noticed while writing the website docs for seed: the example library's books.published and authors.birth_year columns produced implausible years, which undercuts the "realistic data" pedagogy.
Reproduce
Observed
birth_year values (seed 7): 1426, 1427, 6187, 9512, 7436.
Suggested fix
Add a name heuristic for the year family, type-gated to int: year, *_year (e.g. birth_year, release_year), and arguably published/founded.
window for birth_year mirroring the existing dob → DateAdult rule), emitted as a plain int.
This slots into the D7 catalogue (src/seed/heuristics.rs) next to the existing date/dob/created_at rules, plus a Generator variant (or a bounded SmallInt-style range) in src/seed/generators.rs. Tier-1 exact-value tests via a fixed --seed.
Workaround (today)
Pin it explicitly with the set clause: seed books 6 set published between 1950 and 2020.
Refs
ADR-0048 D7 (name-aware heuristics) / D8 (type-based fallback).
Scope note: this is an SD2-style refinement, not in the shipped Phase 1/2.
Fixed in deb0948.
Added an int-gated year rule to the D7 catalogue (src/seed/heuristics.rs), placed after the quantity rule so year_count (a count) stays a SmallInt:
year / *_year / published / founded → YearRecent, a bounded 1950–2025 window (75 years relative to the fixed REF_YEAR, matching this issue's own between 1950 and 2020 workaround).
birth/born/dob token (e.g. birth_year) → YearBirth, mirroring the existing dob → DateAdult adult window as years (1945–2007).
Both emit a plain int. published/founded are included (user-confirmed: an int so named is almost always a year; a flag would be is_published). Two new Generator variants (YearRecent/YearBirth); deliberately not added to the D9 named-generator vocabulary, so explicit control stays with set <col> between <lo> and <hi>.
Repro now: the issue's birth_year example produces plausible years instead of 1426/9512.
Tests: heuristic-selection unit tests (birth_year → YearBirth, published/founded/release_year → YearRecent, type-gate, year_count → SmallInt), generator-window + determinism unit tests, and a fixed-seed integration test asserting membership in the bounded windows.
Decision recorded in ADR-0048 Amendment 1. Full suite green (2433 pass, 1 ignored).
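For readers who have not opened the commit, here is a minimal sketch of how rule ordering, the int type-gate, and the bounded windows described above could fit together. It is not the code in src/seed/heuristics.rs or src/seed/generators.rs. The ColType enum, the Fallback variant, the pick_generator, window and sample_year functions, the "count" token used as a stand-in for the quantity rule, the exact-name match on published/founded, and the splitmix64 PRNG are all assumptions made for illustration. REF_YEAR = 2025 is inferred from the 1950–2025 window and the 75-year span, not quoted from the source.

```rust
/// Column types relevant to this sketch (hypothetical, not the project's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Int,
    Text,
}

/// Generators named in the fix; `Fallback` stands in for every other path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Generator {
    SmallInt,
    YearRecent,
    YearBirth,
    Fallback,
}

/// Assumed value: inferred from "1950–2025" and "75 years relative to REF_YEAR".
const REF_YEAR: i64 = 2025;

/// Choose a generator from the column name, gated on the column being an int.
fn pick_generator(name: &str, ty: ColType) -> Generator {
    // Type gate: the year rules only apply to int columns.
    if ty != ColType::Int {
        return Generator::Fallback;
    }
    let lower = name.to_ascii_lowercase();
    let tokens: Vec<&str> = lower.split('_').collect();
    let has = |t: &str| tokens.iter().any(|tok| *tok == t);

    // Stand-in for the quantity rule: it runs first, so `year_count`
    // stays a SmallInt instead of being treated as a year.
    if has("count") {
        return Generator::SmallInt;
    }

    // Assumption: `published` / `founded` match as whole names.
    let year_like = lower == "year"
        || lower.ends_with("_year")
        || lower == "published"
        || lower == "founded";
    if !year_like {
        return Generator::Fallback;
    }

    // A birth/born/dob token narrows the rule to the adult birth window.
    if has("birth") || has("born") || has("dob") {
        Generator::YearBirth
    } else {
        Generator::YearRecent
    }
}

/// Inclusive year window for the two year generators.
fn window(g: Generator) -> Option<(i64, i64)> {
    match g {
        Generator::YearRecent => Some((REF_YEAR - 75, REF_YEAR)), // 1950..=2025
        Generator::YearBirth => Some((1945, 2007)),               // bounds from the fix note
        _ => None,
    }
}

/// SplitMix64: a small deterministic PRNG used here only as a stand-in.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Draw one year inside the generator's window; `None` for non-year generators.
fn sample_year(g: Generator, state: &mut u64) -> Option<i64> {
    let (lo, hi) = window(g)?;
    let span = (hi - lo + 1) as u64;
    Some(lo + (splitmix64(state) % span) as i64)
}
```

Under these assumptions, pick_generator("birth_year", ColType::Int) selects YearBirth and pick_generator("year_count", ColType::Int) selects SmallInt. Two samplers started from the same state yield the same sequence, which is the property a fixed-seed test would rely on.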