# ADR-0005: Column type vocabulary

## Status

Accepted

## Context

Real RDBMS engines expose many type variants that exist for historical, performance, or platform reasons. A learner does not benefit from picking between `VARCHAR(255)`, `TEXT`, `CHAR(40)`, and `CLOB`. We control the user-facing surface and can present a small, semantically clear set of types that maps cleanly to the chosen backend (SQLite STRICT, ADR-0002).

We also want to teach two distinct lessons about identifiers:

1. The default, easiest path: a simple auto-incrementing integer primary key. Used in 90% of intro examples.
2. Why integers aren't always the right answer: short random identifiers that survive merging data sets, sharing, or migration without collisions. Real UUIDs (36 characters) are too wide to display comfortably in TUI columns and exceed what learners actually need to understand the concept.

## Decision

The user-facing column type vocabulary is:

| User-facing type | SQLite STRICT mapping | Notes                                       |
|------------------|-----------------------|---------------------------------------------|
| `text`           | `TEXT`                | Strings of any length.                      |
| `int`            | `INTEGER`             | Plain integer.                              |
| `real`           | `REAL`                | IEEE-754 double.                            |
| `decimal`        | `TEXT`                | Stored as decimal string; rendered numeric. |
| `bool`           | `INTEGER`             | 0/1 internally; `true`/`false` rendered.    |
| `date`           | `TEXT`                | ISO 8601 (`YYYY-MM-DD`).                    |
| `datetime`       | `TEXT`                | ISO 8601 (`YYYY-MM-DDTHH:MM:SS[.fff][Z]`).  |
| `blob`           | `BLOB`                | Binary data.                                |
| `serial`         | `INTEGER PK AUTOINC.` | Auto-incrementing integer; PK by default.   |
| `shortid`        | `TEXT`                | 10–12 char base58 random; PK by default.    |

`shortid` uses base58 (no ambiguous `0`/`O`/`I`/`l`) and is generated client-side at insert time when the column has no value supplied.

Decimal is stored as text to preserve precision; applications that need numeric comparison must use the engine's casts. This is acceptable for a teaching context, and the constraint is documented.
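
The `shortid` encoding described above can be sketched as follows. This is a minimal illustration, not the project's generator: the function name `shortid_from_entropy`, the fixed length of 11 characters (inside the 10–12 range), and the choice to take 64 bits of randomness from the caller are all assumptions made for the example. The alphabet is the common Bitcoin-style base58 ordering, which omits `0`, `O`, `I`, and `l`.

```rust
/// Base58 alphabet (Bitcoin ordering): omits `0`, `O`, `I`, and `l`.
const BASE58: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Assumed fixed identifier length. 58^11 exceeds 2^64, so 11 digits
/// are enough to hold any u64 without truncation.
const SHORTID_LEN: usize = 11;

/// Hypothetical helper: encode 64 bits of caller-supplied entropy as an
/// 11-character base58 string. Where the entropy comes from is left to
/// the caller and is not specified by this sketch.
fn shortid_from_entropy(mut bits: u64) -> String {
    // Start with every slot set to the zero digit (`1`), then fill from
    // the right so the most significant digit ends up first.
    let mut out = [BASE58[0]; SHORTID_LEN];
    for slot in out.iter_mut().rev() {
        *slot = BASE58[(bits % 58) as usize];
        bits /= 58;
    }
    // Every byte comes from an ASCII alphabet, so each maps to one char.
    out.iter().map(|&b| b as char).collect()
}
```

With this sketch, `shortid_from_entropy(0)` yields `11111111111` and `shortid_from_entropy(58)` yields `11111111121`. Because 2^64 is smaller than 58^11, the leading character is drawn from a narrower range than the others; a real generator might draw each character independently instead.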
**Compound primary keys are supported.** They are essential for junction tables in m:n relationships (e.g. `OrderLines` keyed on `(order_id, product_id)`) and skipping them would teach the wrong lesson. The simplified DSL provides natural syntax for them (specifics in a later ADR).

True UUIDs are intentionally **not** in the type set.

## Consequences

- The type system is small enough to teach in five minutes.
- Mapping to SQLite STRICT is mechanical and lossless for the intended use cases.
- The shortid generator is a small, well-tested utility: bounded scope, no third-party dependency required.
- Junction tables and other compound-key scenarios are first-class, reinforcing relational fundamentals.
- Learners who later need a true UUID column will find that the app does not provide one; this is a deliberate trade-off in favour of TUI legibility.

## Amendment 1: display rounding of coerced doubles (2026-06-12)

Issue #32. The Decision keeps `decimal` exact by storing it as TEXT and notes that numeric comparison requires the engine's casts, because the engine has no native decimal/BCD type (SQLite's storage classes are only NULL / INTEGER / REAL / TEXT / BLOB; `NUMERIC` is an affinity, not a type). What the original wording did not anticipate is that the engine performs that cast **implicitly**: `sum(price * qty)` over TEXT decimals coerces to an IEEE-754 double with no explicit cast, and the computed result carries no playground type (ADR-0030 §6), so it rendered with the double's full noise: `298.59999999999997` instead of `298.60`. For a teaching tool that is a confusing, off-topic lesson about float representation.

### Decision

**Round floating-point values to 15 significant figures for display only.** A double carries ~15–17 significant decimal digits and the noise lives in the last one or two; rounding to 15 and then taking the shortest round-tripping form of the rounded value collapses `298.59999999999997` → `298.6` and `0.30000000000000004` → `0.3`.
A clean value rounds to itself, so the result is never longer than before; non-finite values pass through. Implemented as `format_real_display` in `db.rs`.

The rounding is wired into **exactly one place: `format_cell`, the result-set / `show data` cell formatter**, because that is the only surface where the IEEE-754 noise actually appears: noise arises from *arithmetic/aggregation*, whose results flow through `format_cell`. Every other `f64`-to-string path deliberately keeps full precision, and the distinction is **semantic, not cosmetic**:

- **Persistence stays exact.** The CSV encoder (`persistence::csv_io::format_real`) keeps the shortest round-tripping form so a stored `real` survives save/load byte-for-byte; rounding there would corrupt data.
- **Uniqueness dry-runs key on exact values.** `render_value` (the diagnostic/echo formatter) is reused as a *canonical identity key* by `dry_run_unique` (ADR-0029 §5) and `check_uniqueness_collisions` (ADR-0017 §4.3): they group rows by this string to predict the duplicates the engine would reject. Rounding there would merge two distinct doubles into one key and report a collision that the engine, which compares exact values, would not. So `render_value` keeps `format!("{r}")`. (It also never displays a *computed* value, so it has no noise to trim.)
- **FK-key matching and EXPLAIN-SQL literals keep full precision**; neither is a data-cell display.

Within `format_cell` the rounding applies to **all** REAL cells (stored `real` columns and computed results alike), for one consistent rule; the lost digits are at the double's precision limit, not real information, and a stored `real` typed by the user is itself noise-free so its display is unchanged in practice. Raw `decimal` columns are unaffected: they are TEXT and render verbatim, trailing zeros and all (`100.10`).
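
The rounding rule, and the reason identity keys must not use it, can be illustrated with a short sketch. It is not the code in `db.rs`: the names `round_real_for_display` and `exact_key` are hypothetical, and going through a scientific-notation string is just one plausible way to round to 15 significant figures. It relies on two standard-library behaviours: `{:.14e}` prints one leading digit plus 14 fractional digits, and `Display` for `f64` prints the shortest string that parses back to the same value.

```rust
/// Hypothetical display formatter: round to 15 significant figures,
/// then print the shortest round-tripping form of the rounded value.
fn round_real_for_display(value: f64) -> String {
    // Non-finite values (NaN, inf, -inf) pass through unchanged.
    if !value.is_finite() {
        return format!("{}", value);
    }
    // One leading digit + 14 fractional digits = 15 significant figures.
    let rounded: f64 = format!("{:.14e}", value)
        .parse()
        .expect("scientific notation produced by format! parses as f64");
    // `Display` for f64 emits the shortest string that round-trips.
    format!("{}", rounded)
}

/// Hypothetical identity key: the exact shortest round-tripping form,
/// with no rounding, so distinct doubles always get distinct keys.
fn exact_key(value: f64) -> String {
    format!("{}", value)
}
```

Under these assumptions, `round_real_for_display(0.1 + 0.2)` gives `0.3`, while `exact_key(0.1 + 0.2)` gives `0.30000000000000004` and `exact_key(0.3)` gives `0.3`. Keying uniqueness checks on the rounded form would treat those two distinct doubles as duplicates, which is the false collision the bullet on uniqueness dry-runs describes.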
Exact decimal *arithmetic* (a SQLite extension exposing `decimal_mul`/`decimal_sum`) was considered and rejected: it would require rewriting the user's standard-SQL operators into function calls, defeating both the "validated SQL runs verbatim" model and the goal of teaching ordinary SQL.