feat(seed): fake-data generation library + fake dependency (ADR-0048 P1.1)

The pure generation half of `seed` — no command wiring yet:
- src/seed/: ColumnSpec + Generator model and a seeded StdRng; the
  type-gated name-heuristic catalogue (D7) with documented
  false-positive guards; table-context name disambiguation (D11);
  identifier (D10) and enum-ish (D12) detection; per-type + bounded-date
  generators (D8); the hand-rolled product generator (D9); and PickFrom
  for IN-CHECK / enum lists.
- Adds the `fake` crate (v5, default features). Verified: single rand
  0.10.1 (no duplication), determinism via one seeded StdRng driving
  both fake and the hand-rolled generators, security-clean across
  osv/grype/trivy.
- ADR-0048 D3 updated to record the dependency verification.

32 Tier-1 tests (exact-value via fixed --seed); 1673 lib tests pass,
clippy all-targets clean.
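
A minimal sketch of how a type-gated name heuristic with a false-positive guard might look, plus `PickFrom` selection for IN-CHECK / enum lists. Every identifier below (`SqlType`, `choose_generator`, the field names, the matched column names) is hypothetical and not taken from `src/seed/`; the rule that an explicit value list takes precedence over name heuristics is likewise an assumption.

```rust
// Hypothetical model: illustrative only, not the real src/seed/ types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum SqlType {
    Text,
    Integer,
    Boolean,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Generator {
    Email,
    FullName,
    /// Pick one value from an IN-CHECK / enum list.
    PickFrom(Vec<String>),
    /// Fallback: generate by column type only.
    ByType(SqlType),
}

#[allow(dead_code)]
struct ColumnSpec {
    name: &'static str,
    sql_type: SqlType,
    /// Allowed values, if the column has an IN-CHECK or enum constraint.
    allowed: Option<Vec<String>>,
}

#[allow(dead_code)]
fn choose_generator(col: &ColumnSpec) -> Generator {
    // Assumption: an explicit, non-empty value list wins over name heuristics.
    if let Some(values) = &col.allowed {
        if !values.is_empty() {
            return Generator::PickFrom(values.clone());
        }
    }
    // Type gate: name heuristics apply to text columns only, so a boolean
    // `email_verified` never receives an email generator.
    if col.sql_type == SqlType::Text {
        let name = col.name.to_ascii_lowercase();
        if name == "email" || name.ends_with("_email") {
            return Generator::Email;
        }
        // Exact match guards against false positives such as `filename`.
        if name == "name" || name == "full_name" {
            return Generator::FullName;
        }
    }
    Generator::ByType(col.sql_type)
}
```

Under these assumptions, a text column `email` maps to `Generator::Email`, while a boolean `email_verified` and a text `filename` both fall through to type-based generation.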
claude@clouddev1
2026-06-11 15:35:17 +00:00
parent 0af7f56821
commit 202e25a94f
7 changed files with 1072 additions and 16 deletions
+22 -16
@@ -170,23 +170,29 @@ companies, phone numbers, lorem text, dates. Generation is driven by a
per-column **generator** chosen by the heuristics (D7) or the override
(D2), falling back to **type-based** generation (D8).
-**Two open implementation-time verifications** (flagged honestly, to
-be resolved when the dependency is locked, not assumed here):
+**Implementation-time verifications (resolved 2026-06-11 when the
+dependency was added):**
-- **`rand` de-duplication.** The project is on `rand 0.10.1`; `fake`
-brings its own `rand`. Confirm a single `rand` version resolves (a
-duplicate is harmless but should be a conscious outcome, and
-`shortid.rs` + the seed RNG must share the version we standardise
-on).
-- **`fake` module inventory.** Confirm which generators v5 actually
-ships (strong prior: it has Name/Internet/Address/Company/Lorem/
-Chrono/Currency/Job/Color but **no commerce/product module** — see
-D9), and the minimal feature-flag set needed (derive, chrono-backed
-dates).
-- **Security (new-dependency posture).** `fake` and its transitive
-tree must be scanned (`trivy fs`, `grype`, `osv-scanner`) before
-merge, per the global new-dependency rule; findings documented, not
-silently accepted.
+- **`rand` de-duplication — clean.** `fake` 5.1.0 depends on
+`rand = "0.10"`, the **same major** as the project's `rand 0.10.1`,
+so `cargo tree -e normal` resolves a **single** `rand 0.10.1` (no
+runtime duplication; the `rand 0.8.6` visible to `cargo tree -i
+rand` is only `fake`'s own dev-dependency, never compiled for us).
+Consequence for D4: one seeded `rand 0.10` `StdRng` can drive
+**both** `fake`'s `fake_with_rng` and the hand-rolled generators —
+determinism is single-RNG, single-version, and shares `shortid.rs`'s
+`rand` version.
+- **`fake` module inventory / features — confirmed.** Default features
+(`["either"]`) cover the core string fakers used here
+(Name/Internet/Address/Company/Lorem/PhoneNumber); `fake`'s `chrono`
+feature is **deliberately omitted** (dates generated in-house for
+D8's bounded windows). No commerce/product module exists → `product`
+is hand-rolled (D9). (The exact faker call sites are pinned when the
+generation library is built.)
+- **Security (new-dependency posture) — clean.** The `fake` tree (296
+packages total) scanned clean by **all three** mandated scanners:
+`osv-scanner` (no issues), `grype` (no vulnerabilities), `trivy fs
+--scanners vuln` (0). No findings to document or accept.
### D4 — Determinism: `--seed <n>` (fork, user-chosen: "optional flag")
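
The single-RNG determinism recorded for D4 can be sketched as follows. This is a minimal illustration, not the project's code: `SeededRng` is a hand-written SplitMix64 stand-in for `rand`'s `StdRng` (so the sketch needs no external crates), `library_style_name` stands in for a call into `fake` via `fake_with_rng`, and all function names and word lists are hypothetical.

```rust
/// Stand-in PRNG (SplitMix64). The real design uses one seeded `StdRng`;
/// this type exists only to keep the sketch dependency-free.
#[allow(dead_code)]
struct SeededRng {
    state: u64,
}

#[allow(dead_code)]
impl SeededRng {
    fn from_seed(seed: u64) -> Self {
        SeededRng { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Picks one item; `items` must be non-empty.
    fn pick<'a>(&mut self, items: &[&'a str]) -> &'a str {
        items[(self.next_u64() % items.len() as u64) as usize]
    }
}

// Stand-in for a library-backed generator (hypothetical word list).
#[allow(dead_code)]
fn library_style_name(rng: &mut SeededRng) -> String {
    rng.pick(&["Ada", "Grace", "Linus"]).to_string()
}

// Stand-in for the hand-rolled product generator (hypothetical word lists).
#[allow(dead_code)]
fn hand_rolled_product(rng: &mut SeededRng) -> String {
    let adjective = rng.pick(&["Compact", "Rugged", "Silent"]);
    let noun = rng.pick(&["Kettle", "Lamp", "Backpack"]);
    format!("{} {}", adjective, noun)
}

/// One RNG, created once from the seed, is threaded through both kinds
/// of generator, so the whole output is a function of `seed` alone.
#[allow(dead_code)]
fn generate_rows(seed: u64, count: usize) -> Vec<(String, String)> {
    let mut rng = SeededRng::from_seed(seed);
    (0..count)
        .map(|_| (library_style_name(&mut rng), hand_rolled_product(&mut rng)))
        .collect()
}
```

Calling `generate_rows` twice with the same seed and count yields identical rows. The property holds only because both generators draw from the same RNG in a fixed order; a second, independently seeded RNG for either generator would have to be seeded deterministically as well to preserve it.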