Specifies the curated per-cell classification (clean / lossy / incompatible) for column type changes, the static transformer matrix (numeric chains, text↔structured types, always-clean stringifications), and the PK / shortid / uniqueness-bearing handling. Replaces the B2/C2 placeholder of "rely on engine STRICT and surface its errors" with a learner-friendly model that: * refuses incompatibles up-front, * refuses lossy conversions by default with a re-run-with- --force-conversion hint, * refines the PK refusal: an inbound-FK PK is only refused when the new type would change the FK target type (so `serial → int` and `shortid → text` on FK-referenced PKs are allowed; `int → text` etc. still refuse), * adds a post-transformation uniqueness check for PK and shortid columns, * uses the pretty-table renderer (ADR-0016) for all diagnostic row lists, * emits a `[client-side] …` note in the success summary whenever the transformer rewrote any cell. `--force-conversion` accepts loss; `--dont-convert` skips the client-side layer entirely; mutually exclusive. Forward-look: a future iteration may add resolution flags (`--default 0`, `--on-incompatible '<value>'`). Also amends ADR-0002 with a new "User-facing posture" section cementing that the database engine choice is an implementation detail and is never named in user-visible strings. Adds a corresponding bullet to CLAUDE.md's working-style rules so every session picks it up. Implementation lands as a follow-up.
21 KiB
ADR-0017: Column type-change compatibility
Status
Accepted
Context
ADR-0013 introduced the rebuild-table primitive that lets us
change a column's type by reading-out, recreating-with-new-shape,
and writing-back. ADR-0014 specified the value model (per-type
literal validators in dsl/value.rs). The B2/C2 work that
landed change column [in] [table] <T>: <col> (<newtype>)
plumbed those together but left a real spec gap: what
conversions are allowed, what happens to data that doesn't
cleanly fit the new type, and how the user is told.
The current behaviour ("rely on SQLite STRICT to reject incompatible cells; surface whatever error it produces") is a placeholder — pedagogically poor (raw STRICT errors are not learner-friendly) and silently permissive in cases where SQLite would coerce values in ways the user might not intend.
This ADR specifies a curated compatibility model that combines a static "is this conversion attempted at all" matrix with a per-cell runtime classification, plus two opt-in flags that give the user control over the trade between safety and force.
Decision
1. Per-cell outcome classification
Every cell of the column being changed is classified into exactly one of three outcomes during a dry-run pass that runs before any SQL writes:
- Clean — a transformer produces a value the new type accepts without information loss. The rebuild proceeds and stores the transformed value.
- Lossy — a transformer produces a valid value, but
some property of the original cell (precision, fractional
part, time component, …) is discarded. By default these
rows refuse the operation; the user opts in to the loss
with
--force-conversion. - Incompatible — no transformer for this pair can produce a valid value for this cell. The operation is refused; no flag overrides this. The user must pre-process the data (or wait for a future syntax extension; see §5 forward-look).
The classifications are properties of (source type, target
type, cell value), not just of the type pair. The same
type pair (real → int) yields clean for cells
storing whole numbers and lossy for cells with
fractional parts.
2. Default decision tree
A change column … invocation without flags:
- Static refusal check. If the type pair has no
transformer at all (see §3 — e.g.
bool → date, anything ↔blob, anything →serial), refuse immediately with a friendly explanation of the incompatibility class. - Per-cell dry run. Apply the transformer to each cell; classify into clean / lossy / incompatible.
- Refuse on incompatibles. If any cell classifies
incompatible, refuse and present a list (capped at
100 rows; tail rendered as "… and N more"). Each
listed row identifies the row index and the offending
value plus a short reason ("not a valid int", "not
'0' or '1'", etc.). The error does NOT mention
--force-conversion: that flag does not help with incompatibles. - Refuse on lossy (default). If all cells are clean
or lossy and at least one is lossy, refuse with a
capped list of the lossy rows (showing the would-be
transformation:
row 5: 3.14 → 3 (truncated)), followed by the message "if you want to execute this conversion in spite of the problems, re-run with--force-conversion." - All clean. Proceed with the rebuild. If the transformer was non-identity (i.e. any cell required actual transformation rather than passing through unchanged), emit the client-side note (§6) alongside the success summary.
3. Static transformer matrix
Pair-wise: a Some(transformer) means "attempt with a
per-cell dry-run." None means "static refusal" (§2 step
1).
The transformers below cover the numeric and text
universe. Cells of serial columns appear only on the
source side (target serial is statically refused);
relationship-involved columns and PK columns are also
statically refused (carried over from B2/C2). Anything ↔
blob is deferred for v1.
Always-clean transformers (no per-cell loss possible)
| Source | Target | Notes |
|---|---|---|
int / serial |
real |
widening; precision caveat for ¦v¦ > 2⁵³ noted in docs but not policed |
int / serial |
decimal |
exact decimal representation |
int / serial |
text |
stringify |
bool |
int |
0/1 |
bool |
real |
0.0/1.0 |
bool |
decimal |
"0"/"1" |
bool |
text |
"true"/"false" — matches the DSL boolean grammar (§5 of ADR-0014), not SQLite's native integer stringification |
decimal |
text |
already text-backed under STRICT |
date |
text |
same |
datetime |
text |
same |
shortid |
text |
same |
real |
text |
shortest-round-trip decimal form |
Per-cell-classified transformers (clean OR lossy OR incompatible per cell)
| Source | Target | Per-cell classification |
|---|---|---|
real |
int |
clean when the value is exactly representable as an integer (e.g. 3.0); lossy when there's a fractional part (e.g. 3.14 → 3); never incompatible |
real |
decimal |
clean when the f64 round-trips through the decimal grammar; lossy otherwise (precision artifacts) |
real |
bool |
clean when value is exactly 0.0 or 1.0; incompatible otherwise |
decimal |
int |
clean when integer-valued; lossy if fractional |
decimal |
real |
clean when it fits f64 exactly; lossy on precision loss |
decimal |
bool |
clean for exact 0 / 1; incompatible otherwise |
int |
bool |
clean for 0 / 1; incompatible otherwise |
text |
int |
narrowest-first chain: try int parse (clean); fall back to real parse and truncate (lossy); else incompatible |
text |
real |
try real parse (clean); else decimal parse if it fits f64 (clean) or doesn't (lossy); else incompatible |
text |
decimal |
try decimal grammar (clean); else real parse (lossy precision); else incompatible |
text |
bool |
"true" / "false" (case-insensitive) → clean; everything else incompatible (no implicit 0 / 1 parse — matches the DSL boolean grammar) |
text |
date |
match YYYY-MM-DD → clean; everything else incompatible |
text |
datetime |
ISO-8601 datetime → clean; bare date → lossy with implicit T00:00:00Z; else incompatible |
text |
shortid |
base58 alphabet, 10–12 chars (per ADR-0014) → clean; else incompatible |
Statically refused (no entry in the matrix)
- Anything →
serial(carried over from B2/C2) - Anything → or from
blob(v1 deferral; encoding ambiguity) - Same-type identity (no-op; carried over from B2/C2)
date↔datetimedirect (deferred for v1; users route viatextif needed)- All cross-domain pairs not listed above (e.g.
bool→date,real→datetime,int→shortid)
The relationship-involvement preconditions (§4) apply before this matrix is consulted.
4. Primary-key and uniqueness-bearing columns
The B2/C2 implementation refused all type changes to PK columns and to any column involved in a declared relationship. That was too coarse: it conflated two concerns that should be split.
4.1 Inbound foreign keys: when does the cascade actually bite?
The cascade only matters when the new type would change
the FK target type that referencing columns must have. Per
ADR-0011's fk_target_type() rule:
serial.fk_target_type() == Intshortid.fk_target_type() == Text- All other types: identity
So serial → int on a PK preserves fk_target_type (both
yield Int); FK columns referencing the PK stay int,
the underlying storage is unchanged, no cascade is needed.
Same for shortid → text and text → shortid (both yield
Text).
The precondition is therefore:
If the column has any inbound FK and
old_type.fk_target_type() != new_type.fk_target_type(), refuse with a friendly cascade message ("<T>.<col>is referenced by N relationship(s); changing its type to<dst>would change the type that referencing columns require — drop those relationships first or pick a target type whose FK shape matches the current one"). Otherwise allow.
This unblocks the most-natural PK conversion (serial → int, removing auto-increment while preserving stored
values) and shortid ↔ text round-trips on PKs that have
real-world relationships.
4.2 Outbound foreign keys: refuse for v1
If the column is itself an FK (the child side of a
relationship), changing its type would either require its
new type to match the parent's fk_target_type (which
typically reduces to a no-op) or break the constraint. v1
refuses outbound-FK type changes outright; the user drops
the relationship first.
4.3 Uniqueness-bearing columns: post-transformation collision check
Some types and constraints carry a uniqueness contract that the per-cell classification can't see — multiple distinct source values can collapse to the same target value under a lossy transformation, violating the contract even though every individual cell transformed "successfully."
In v1 the uniqueness check applies to:
- Primary-key columns (the SQL-level UNIQUE+NOT NULL guarantee).
- shortid columns (the design-level contract that shortids are unique short identifiers, even when not the PK).
- (Future)
UNIQUEconstraint columns when C3's full constraint set lands.
After the per-cell pass produces transformed values, the
transformed values are checked for duplicates. Any
collision is incompatible (cross-row, structural —
no --force-conversion override; the user must clean
the source data). The error reports the colliding rows:
Cannot change
T.colfrom real to int: 2 row(s) would collapse to the same value.row 5 ('3.14') and row 12 ('3.7') would both become '3'.
In practice the only realistic incoming conversion that
exercises both the per-cell shortid-grammar check and
the uniqueness check is text → shortid. Other source
types fail per-cell (int 42 doesn't match the base58 +
length grammar) before uniqueness becomes relevant.
4.4 Combined preconditions
Putting §4.1, §4.2, §4.3 together, a change column …
invocation is refused at the precondition stage when any
of:
- The column is the child side of a relationship (outbound FK on this column).
- The column is the parent side of a relationship and
old_type.fk_target_type() != new_type.fk_target_type().
…and is refused after the per-cell dry run (and so still classifies as incompatible) when:
- The column is uniqueness-bearing (PK, shortid, or future UNIQUE) and the transformed values contain duplicates.
Otherwise the matrix's per-cell classification governs.
5. Override flags
Both flags are opt-in per ADR-0009's --<name> convention.
--force-conversion
Skips the lossy-refusal in step 4 of §2. Does not
change the static refusal (step 1) or the incompatible
refusal (step 3): no flag makes text "abc" → int work.
When invoked, the dry run still classifies cells; lossy cells transform per the matrix; the client-side note (§6) includes both the count of cells that needed transformation and the count that were lossy (split by classification).
Forward-look (not in this ADR's implementation scope). A later iteration may extend the grammar to let the user specify resolutions for incompatibles, e.g.:
change column T: c (int) --default 0
change column T: c (int) --on-incompatible '0'
…which would land cells that fail the parse with the given
default value. This generalises --force-conversion from a
binary "accept loss" toggle into a continuum and gives the
learner practical experience with the kinds of resolutions
that real-world data work needs. The current ADR
deliberately doesn't commit to syntax for that — the
binary --force-conversion is enough for v1, and the
forward-look exists to preserve the design space rather
than constrain a future ADR.
--dont-convert
Skips the entire client-side layer: no transformer, no
dry-run, no per-cell classification. Hands the source
column's raw cells to the rebuild's INSERT INTO new SELECT FROM old step and lets the database's STRICT
typing decide. Engine error text is never surfaced
verbatim; failures are reported via the same
friendly-error layer the rest of the app uses.
This is the escape hatch for users who explicitly want to see what the database itself will do — a pedagogical lever for "what does raw SQL behaviour look like here?" without dropping into advanced mode.
Mutual exclusion
--force-conversion and --dont-convert are mutually
exclusive. Specifying both is a parse error: forcing
client-side conversion while disabling client-side
conversion is contradictory. The error message names both
flags and says "pick one."
6. Reporting: the "client-side conversion was applied" note
When a successful change involves any non-identity transformation (i.e. cells were rewritten before reaching the database), the success summary includes a line of the form:
[client-side] N row(s) were transformed before being stored. In raw SQL this would need an explicit
CASTor application-level code.
When --force-conversion succeeded with lossy rows, the
note adds the lossy count specifically:
[client-side] N row(s) transformed; M of those discarded information (lossy). In raw SQL this would need an explicit
CASTor application-level code.
User-facing strings throughout this ADR — and throughout the application generally — never name the underlying database engine. The engine is an implementation detail; the playground's pedagogical surface is "the database" in the abstract. (ADR-internal prose still references SQLite where technically necessary for the spec writer; that's not user-visible.)
The note's purpose is pedagogical, not diagnostic — it points at the moment where the tool went beyond what bare SQL allows. Without it, a learner would have no way to know that "this just worked" was actually the playground doing them a favour.
7. Error presentation
Tabular detail in both error and success output is
rendered through the pretty-table renderer (ADR-0016) —
no ad-hoc indented-line layouts. The rule is: anywhere
the output describes more than a handful of rows of
structured per-row detail, it goes through
render_data_table (or an equivalent helper). This keeps
the visual identity consistent across DDL, query results,
and these conversion diagnostics.
Lossy refusals
Cannot change `T.col` from real to int: 50 row(s) would
discard information.
┌─────┬───────┬─────┬───────────────────────────────────┐
│ Row │ From │ To │ Reason │
├─────┼───────┼─────┼───────────────────────────────────┤
│ 5 │ 3.14 │ 3 │ truncated; would discard 0.14 │
│ 12 │ 2.71 │ 2 │ truncated; would discard 0.71 │
│ 18 │ 1.5 │ 1 │ truncated; would discard 0.5 │
│ … │ … │ … │ … and 47 more │
└─────┴───────┴─────┴───────────────────────────────────┘
if you want to execute this conversion in spite of the
problems, re-run with `--force-conversion`.
Incompatible refusals
Cannot change `T.col` from text to int: 3 row(s) cannot
be converted.
┌─────┬───────┬───────────────────────┐
│ Row │ Value │ Reason │
├─────┼───────┼───────────────────────┤
│ 3 │ abc │ not a valid int │
│ 7 │ x42 │ not a valid int │
│ 12 │ │ not a valid int │
└─────┴───────┴───────────────────────┘
The trailing --force-conversion hint is omitted for
incompatibles (no flag helps; future syntax — §5
forward-look — would re-introduce one).
Uniqueness collisions
Cannot change `T.col` from real to int: 1 collision(s)
would violate uniqueness.
┌─────────┬─────────────────┬──────────────────┐
│ Becomes │ Source rows │ Source values │
├─────────┼─────────────────┼──────────────────┤
│ 3 │ row 5, row 12 │ 3.14, 3.7 │
└─────────┴─────────────────┴──────────────────┘
Common rules
- Each detail table is capped at 100 rows. Beyond that,
a single trailing row with
…placeholders and the literal text "and N more" inside the row is rendered inside the table — not as a footer line. Keeps the bordered shape intact. - Row indices are 1-based to match how learners count rows; the tool counts internally from 0.
- Numeric "Row" / "Becomes" columns inherit numeric right-alignment from ADR-0016 §2.
- Cells that would render multi-line content (for
text →conversions where source values contain newlines) honour ADR-0016 §3's↵substitution, so the table stays one display row per logical row.
8. Out of scope
- OOS-1. Anything ↔
blobconversion. Encoding ambiguity (base64? raw bytes? UTF-8 attempt?) deserves its own discussion. - OOS-2.
date↔datetimedirect conversion. Format rewriting is small but warrants a per-conversion test matrix; defer until a real user need surfaces. - OOS-3. Resolution-specifying flags (
--default,--on-incompatible '<value>', etc.) per the §5 forward-look. - OOS-4. Bulk conversions across multiple columns in
one command. Each
change columnruns independently. - OOS-5. Cross-row contextual transformations (e.g. "rank by value to fit a smaller numeric range"). The transformer is per-cell, deliberately stateless.
Consequences
- The placeholder behaviour ("rely on SQLite STRICT") is replaced with a documented per-cell model that produces learner-friendly errors and pedagogical client-side notes.
- The transformer matrix is an additional surface to keep
in step with
dsl/value.rs's validators. Each new user-facing type added to the type vocabulary needs its matrix entries reviewed. - The
[client-side] …note is the load-bearing pedagogical artefact: it's how a learner discovers that the tool did them a favour. Future visualisation / styling work (V4) should preserve its prominence. --dont-convertkeeps the door open to a "raw SQL behaviour" learning mode without forcing the user into advanced mode.- The forward-look in §5 means
--force-conversion's semantics may broaden later. Implementations should treat the flag's effect as "accept loss" rather than as the canonical resolution mechanism.
Relationship to earlier ADRs
- ADR-0013 — the rebuild-table primitive remains the mechanism. This ADR adds a per-row transformation step between read-out and write-back; the existing primitive can be parametrised by a row-by-row transformer or paired with a sibling helper.
- ADR-0014 — the per-type validators in
dsl/value.rspower the dry-run classification. No changes to those validators; they're consumed read-only here. - ADR-0011 — FK target-type compatibility is unaffected. Relationship-involved columns are statically refused before this ADR's matrix is consulted.
- ADR-0009 — the
--force-conversionand--dont-convertflags follow the established long-flag opt-in convention.