rdbms-playground/docs/adr/0005-column-type-vocabulary.md
claude@clouddev1 3d4a0fd45e fix(render): trim IEEE-754 noise from displayed decimal arithmetic (#32)
`decimal` is stored as exact TEXT, but SQLite has no native decimal type,
so arithmetic/aggregation implicitly coerces it to an IEEE-754 double.
The computed result carries no playground type, so `sum(price * qty)`
rendered the double's full noise — `298.59999999999997` for `298.60` — a
confusing, off-topic float lesson for a teaching tool.

Add `format_real_display`: round REAL values to 15 significant figures
(a double's reliable precision) then take the shortest round-tripping
form, collapsing `298.59999999999997` to `298.6`. Wired into `format_cell`
(result-set / `show data` cells) only — the sole surface where the noise
appears, since it arises from arithmetic.

Every other f64->string path keeps full precision for semantic, not
cosmetic, reasons: CSV persistence stays byte-exact for round-trip;
`render_value` is a canonical identity key for the uniqueness dry-runs
(dry_run_unique, check_uniqueness_collisions), where rounding would
report collisions the exact-valued engine wouldn't; FK-key matching and
EXPLAIN-SQL literals likewise stay exact.

ADR-0005 Amendment 1; +7 tests.
2026-06-12 14:42:22 +00:00


ADR-0005: Column type vocabulary

Status

Accepted

Context

Real RDBMS engines expose many type variants that exist for historical, performance, or platform reasons. A learner does not benefit from picking between VARCHAR(255), TEXT, CHAR(40), and CLOB. We control the user-facing surface and can present a small, semantically clear set of types that maps cleanly to the chosen backend (SQLite STRICT, ADR-0002).

We also want to teach two distinct lessons about identifiers:

  1. The default, easiest path: a simple auto-incrementing integer primary key. Used in 90% of intro examples.
  2. Why integers aren't always the right answer: short random identifiers that survive merging data sets, sharing, or migration without collisions.

Real UUIDs (36 characters) are too wide to display comfortably in TUI columns and exceed what learners actually need to understand the concept.

Decision

The user-facing column type vocabulary is:

| User-facing type | SQLite STRICT mapping | Notes |
|---|---|---|
| text | TEXT | Strings of any length. |
| int | INTEGER | Plain integer. |
| real | REAL | IEEE-754 double. |
| decimal | TEXT | Stored as decimal string; rendered numeric. |
| bool | INTEGER | 0/1 internally; true/false rendered. |
| date | TEXT | ISO 8601 (YYYY-MM-DD). |
| datetime | TEXT | ISO 8601 (YYYY-MM-DDTHH:MM:SS[.fff][Z]). |
| blob | BLOB | Binary data. |
| serial | INTEGER PK AUTOINC. | Auto-incrementing integer; PK by default. |
| shortid | TEXT | 10–12 char base58 random; PK by default. |

shortid uses base58 (no ambiguous 0/O/I/l) and is generated client-side at insert time when the column has no value supplied.
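The generation rule above can be sketched roughly as follows. This is a minimal illustration, not the project's generator: the names `BASE58_ALPHABET`, `shortid_sketch`, and `fill_shortid_sketch` are hypothetical, the length is left as a parameter, and the randomness comes from the standard library's randomly keyed `RandomState` hasher purely to avoid a third-party dependency (it is not a cryptographic source).

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Base58 alphabet: digits and letters without the look-alikes 0, O, I, l.
const BASE58_ALPHABET: &[u8] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Assumption: a std-only source of pseudo-random bits. Each `RandomState`
/// is created with fresh random keys, so hashing nothing still yields a
/// different value per call.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Builds a random base58 identifier of `len` characters (hypothetical helper).
fn shortid_sketch(len: usize) -> String {
    (0..len)
        .map(|_| {
            let idx = (random_u64() % BASE58_ALPHABET.len() as u64) as usize;
            BASE58_ALPHABET[idx] as char
        })
        .collect()
}

/// Keeps a supplied value; generates one only when the column was left empty.
fn fill_shortid_sketch(supplied: Option<String>, len: usize) -> String {
    supplied.unwrap_or_else(|| shortid_sketch(len))
}
```

Calling `fill_shortid_sketch(None, 12)` would produce a fresh 12-character identifier, while `fill_shortid_sketch(Some(id), 12)` returns the supplied `id` untouched.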

Decimal is stored as text to preserve precision — applications that need numeric comparison must use the engine's casts; this is acceptable for a teaching context and the constraint is documented.
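A small sketch of why the cast matters: compared as text, decimal strings order character by character, which disagrees with numeric order as soon as the digit counts differ. The helpers `compare_as_text` and `compare_as_number` are hypothetical, and parsing to `f64` merely stands in for an engine-side cast.

```rust
use std::cmp::Ordering;

/// Compares two decimal strings the way an uncast TEXT comparison would:
/// byte by byte.
fn compare_as_text(a: &str, b: &str) -> Ordering {
    a.cmp(b)
}

/// Compares the same strings after converting them to numbers, standing in
/// for an explicit cast. Returns None if either string is not numeric.
fn compare_as_number(a: &str, b: &str) -> Option<Ordering> {
    let x: f64 = a.parse().ok()?;
    let y: f64 = b.parse().ok()?;
    x.partial_cmp(&y)
}
```

With `"9.50"` and `"10.25"`, the text comparison reports the first value as greater (because `'9'` sorts after `'1'`), while the numeric comparison reports it as smaller.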

Compound primary keys are supported. They are essential for junction tables in m:n relationships (e.g. OrderLines keyed on (order_id, product_id)) and skipping them would teach the wrong lesson. The simplified DSL provides natural syntax for them (specifics in a later ADR).
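As an illustration of what a compound key enforces, the sketch below models the OrderLines example in memory: uniqueness applies to the (order_id, product_id) pair, not to either column alone. The types `OrderLineKey` and `OrderLinesSketch` and the `quantity` payload are hypothetical and say nothing about the DSL syntax.

```rust
use std::collections::BTreeMap;

/// Compound key: the pair identifies a row; neither field is unique alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct OrderLineKey {
    order_id: i64,
    product_id: i64,
}

/// In-memory stand-in for a junction table keyed on (order_id, product_id).
struct OrderLinesSketch {
    rows: BTreeMap<OrderLineKey, u32>, // value: hypothetical quantity column
}

impl OrderLinesSketch {
    fn new() -> Self {
        Self { rows: BTreeMap::new() }
    }

    /// Inserts a row, rejecting a duplicate of the full key pair.
    fn insert(&mut self, order_id: i64, product_id: i64, quantity: u32) -> Result<(), String> {
        let key = OrderLineKey { order_id, product_id };
        if self.rows.contains_key(&key) {
            return Err(format!("duplicate key ({order_id}, {product_id})"));
        }
        self.rows.insert(key, quantity);
        Ok(())
    }
}
```

The same order can hold many products and the same product can appear in many orders; only a repeat of the exact pair is rejected.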

True UUIDs are intentionally not in the type set.

Consequences

  • The type system is small enough to teach in five minutes.
  • Mapping to SQLite STRICT is mechanical and lossless for the intended use cases.
  • The shortid generator is a small, well-tested utility — bounded scope, no third-party dependency required.
  • Junction tables and other compound-key scenarios are first-class, reinforcing relational fundamentals.
  • Learners who later need a true UUID column will find that the app does not provide one; this is a deliberate trade-off in favour of TUI legibility.

Amendment 1 — display rounding of coerced doubles (2026-06-12)

Issue #32. The Decision keeps decimal exact by storing it as TEXT, noting that "numeric ops require casts" — the engine has no native decimal/BCD type (SQLite's storage classes are only NULL / INTEGER / REAL / TEXT / BLOB; NUMERIC is an affinity, not a type). What the original wording did not anticipate is that the engine performs that cast implicitly: sum(price * qty) over TEXT decimals coerces to an IEEE-754 double with no explicit cast, and the computed result carries no playground type (ADR-0030 §6), so it rendered with the double's full noise — 298.59999999999997 for 298.60. For a teaching tool that is a confusing, off-topic lesson about float representation.

Decision

Round floating-point values to 15 significant figures for display only. A double carries ~15–17 significant decimal digits and the noise lives in the last one or two; rounding to 15 then taking the shortest round-tripping form of the rounded value collapses 298.59999999999997 → 298.6 and 0.30000000000000004 → 0.3. A clean value rounds to itself, so the result is never longer than before; non-finite values pass through. Implemented as format_real_display in db.rs.
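One way such a formatter could look is sketched below. It is an illustration of the described rule, not the code in db.rs: the name `format_real_display_sketch` is hypothetical, and going through a 15-significant-figure scientific string and back is just one possible way to do the rounding.

```rust
/// Hypothetical sketch of display rounding for REAL values.
fn format_real_display_sketch(r: f64) -> String {
    // Non-finite values (NaN, +/- infinity) pass through unchanged.
    if !r.is_finite() {
        return format!("{r}");
    }
    // `{:.14e}` prints one digit before the decimal point and 14 after it,
    // i.e. the value correctly rounded to 15 significant figures.
    let rounded_text = format!("{r:.14e}");
    // Parsing that text gives the double nearest to the rounded value;
    // fall back to the input if parsing ever failed.
    let rounded: f64 = rounded_text.parse().unwrap_or(r);
    // `Display` for f64 emits the shortest string that round-trips.
    format!("{rounded}")
}
```

Under this sketch, `format_real_display_sketch(0.1 + 0.2)` yields `"0.3"`, and a clean value such as `1.5` comes back as `"1.5"`.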

The rounding is wired into exactly one place — format_cell, the result-set / show data cell formatter — because that is the only surface where the IEEE-754 noise actually appears: noise arises from arithmetic/aggregation, whose results flow through format_cell. Every other f64-to-string path deliberately keeps full precision, and the distinction is semantic, not cosmetic:

  • Persistence stays exact. The CSV encoder (persistence::csv_io::format_real) keeps the shortest round-tripping form so a stored real survives save/load byte-for-byte — rounding there would corrupt data.
  • Uniqueness dry-runs key on exact values. render_value (the diagnostic/echo formatter) is reused as a canonical identity key by dry_run_unique (ADR-0029 §5) and check_uniqueness_collisions (ADR-0017 §4.3): they group rows by this string to predict the duplicates the engine would reject. Rounding there would merge two distinct doubles into one key and report a collision the engine — which compares exact values — would not. So render_value keeps format!("{r}"). (It also never displays a computed value, so it has no noise to trim.)
  • FK-key matching and EXPLAIN-SQL literals keep full precision — neither is a data-cell display.
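The uniqueness point in the list above can be made concrete with a small sketch. It groups values by a string key, as the dry-runs are described as doing, and counts groups with more than one member. The functions `exact_key`, `rounded_key`, and `count_collisions` are hypothetical stand-ins, not the project's render_value or dry_run_unique.

```rust
use std::collections::HashMap;

/// Exact identity key: shortest round-tripping form of the double.
fn exact_key(r: f64) -> String {
    format!("{r}")
}

/// Display-style key: rounded to 15 significant figures first.
fn rounded_key(r: f64) -> String {
    match format!("{r:.14e}").parse::<f64>() {
        Ok(v) => format!("{v}"),
        Err(_) => format!("{r}"),
    }
}

/// Counts how many key groups contain more than one value,
/// i.e. how many collisions a dry-run keyed this way would report.
fn count_collisions(values: &[f64], key: fn(f64) -> String) -> usize {
    let mut groups: HashMap<String, usize> = HashMap::new();
    for &v in values {
        *groups.entry(key(v)).or_insert(0) += 1;
    }
    groups.values().filter(|&&n| n > 1).count()
}
```

For the two distinct doubles `0.1 + 0.2` and `0.3`, keying on `exact_key` reports no collision, whereas keying on `rounded_key` merges them into one group and reports a collision that an exact-valued comparison would not.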

Within format_cell the rounding applies to all REAL cells (stored real columns and computed results alike), for one consistent rule; the lost digits are at the double's precision limit, not real information, and a stored real typed by the user is itself noise-free so its display is unchanged in practice. Raw decimal columns are unaffected — they are TEXT and render verbatim, trailing zeros and all (100.10). Exact decimal arithmetic (a SQLite extension exposing decimal_mul/decimal_sum) was considered and rejected: it would require rewriting the user's standard-SQL operators into function calls, defeating both the "validated SQL runs verbatim" model and the goal of teaching ordinary SQL.
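The per-cell rule can be pictured as a dispatch on the cell's storage class. The sketch below is deliberately simplified and hypothetical (`CellSketch`, `round_for_display`, and `format_cell_sketch` are illustrative names, and NULL and BLOB cells are omitted because their rendering is not described here): REAL cells are rounded for display, while TEXT cells, including raw decimal values, pass through verbatim.

```rust
/// Simplified cell value; only the storage classes relevant to the rule.
enum CellSketch {
    Integer(i64),
    Real(f64),
    Text(String),
}

/// Rounds a finite double to 15 significant figures, then prints the
/// shortest round-tripping form; non-finite values pass through.
fn round_for_display(r: f64) -> String {
    if !r.is_finite() {
        return format!("{r}");
    }
    let rounded: f64 = format!("{r:.14e}").parse().unwrap_or(r);
    format!("{rounded}")
}

/// Formats one result-set cell for display.
fn format_cell_sketch(cell: &CellSketch) -> String {
    match cell {
        CellSketch::Integer(i) => i.to_string(),
        // All REAL cells, stored or computed, follow the same rounding rule.
        CellSketch::Real(r) => round_for_display(*r),
        // TEXT (including decimal columns) renders verbatim, trailing zeros kept.
        CellSketch::Text(s) => s.clone(),
    }
}
```

In this sketch a decimal stored as `"100.10"` is displayed exactly as written, while a computed `0.1 + 0.2` is displayed as `0.3`.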