rdbms-playground/docs/adr/0017-column-type-change-compatibility.md
claude@clouddev1 00947b928c ADR-0017 implementation: per-cell type-change with override flags
Replaces the placeholder "trust STRICT" body of do_change_column_type
with the per-cell transformer matrix from ADR-0017. Adds:

- src/type_change.rs: CellOutcome { Clean / Lossy / Incompatible }
  + transform_cell + static_refusal covering every matrix pair
  from §3 (54 unit tests).
- --force-conversion and --dont-convert flags on `change column`
  (mutually exclusive at parse time per §5).
- Refined PK rule (§4.1): refused only when the column has an
  inbound FK and fk_target_type would change. Outbound-FK columns
  still refused outright (§4.2). PK / shortid uniqueness checked
  post-transformation (§4.3).
- Bordered diagnostic tables (lossy / incompatible / collision)
  via the pretty-table renderer (§7) — uses ADR-0016's primitives.
- [client-side] success note (§6) when any cell was rewritten.
- Friendly wrapper for engine-level errors under --dont-convert
  so no engine vocabulary leaks (ADR-0002 user-facing posture).

ADR-0017 §3 + §7 amended in place (with user sign-off): serial->int
added explicitly to the always-clean matrix, and diagnostic rows
identify themselves by PK value(s) rather than positional indices
(SQLite returns rows unordered without ORDER BY, so positional
"row 5" is unaddressable).

Tests: 449 -> 517 (+68). Clippy clean with nursery lints.
2026-05-08 13:21:07 +00:00


ADR-0017: Column type-change compatibility

Status

Accepted

Context

ADR-0013 introduced the rebuild-table primitive that lets us change a column's type by reading-out, recreating-with-new-shape, and writing-back. ADR-0014 specified the value model (per-type literal validators in dsl/value.rs). The B2/C2 work that landed change column [in] [table] <T>: <col> (<newtype>) plumbed those together but left a real spec gap: what conversions are allowed, what happens to data that doesn't cleanly fit the new type, and how the user is told.

The current behaviour ("rely on SQLite STRICT to reject incompatible cells; surface whatever error it produces") is a placeholder — pedagogically poor (raw STRICT errors are not learner-friendly) and silently permissive in cases where SQLite would coerce values in ways the user might not intend.

This ADR specifies a curated compatibility model that combines a static "is this conversion attempted at all" matrix with a per-cell runtime classification, plus two opt-in flags that give the user control over the trade between safety and force.

Decision

1. Per-cell outcome classification

Every cell of the column being changed is classified into exactly one of three outcomes during a dry-run pass that runs before any SQL writes:

  • Clean — a transformer produces a value the new type accepts without information loss. The rebuild proceeds and stores the transformed value.
  • Lossy — a transformer produces a valid value, but some property of the original cell (precision, fractional part, time component, …) is discarded. By default these rows refuse the operation; the user opts in to the loss with --force-conversion.
  • Incompatible — no transformer for this pair can produce a valid value for this cell. The operation is refused; no flag overrides this. The user must pre-process the data (or wait for a future syntax extension; see §5 forward-look).

The classifications are properties of (source type, target type, cell value), not just of the type pair. The same type pair (real → int) yields clean for cells storing whole numbers and lossy for cells with fractional parts.
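A minimal sketch of the three-way classification, using the real → int pair as the example. The `CellOutcome` variant names follow the commit message; the generic parameter, the payload fields, and the `real_to_int` helper are assumptions made for illustration and are not taken from `src/type_change.rs`. The sketch also assumes finite inputs that fit in an `i64`.

```rust
/// Three-way outcome for one cell. Variant names follow the commit message;
/// the payload shapes are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum CellOutcome<T> {
    /// The new type accepts the value without information loss.
    Clean(T),
    /// A valid value exists, but some property of the original is discarded.
    Lossy { value: T, reason: String },
    /// No valid value can be produced for this cell.
    Incompatible { reason: String },
}

/// Hypothetical real -> int transformer. Assumes a finite input within
/// i64 range; the same type pair yields Clean or Lossy depending on the cell.
pub fn real_to_int(v: f64) -> CellOutcome<i64> {
    let truncated = v.trunc();
    if truncated == v {
        CellOutcome::Clean(truncated as i64)
    } else {
        CellOutcome::Lossy {
            value: truncated as i64,
            reason: "truncated".to_string(),
        }
    }
}
```

Here `real_to_int(3.0)` classifies as clean while `real_to_int(3.14)` classifies as lossy, which is the per-cell distinction the paragraph above describes.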

2. Default decision tree

A change column … invocation without flags:

  1. Static refusal check. If the type pair has no transformer at all (see §3 — e.g. bool → date, anything ↔ blob, anything → serial), refuse immediately with a friendly explanation of the incompatibility class.
  2. Per-cell dry run. Apply the transformer to each cell; classify into clean / lossy / incompatible.
  3. Refuse on incompatibles. If any cell classifies incompatible, refuse and present a list (capped at 100 rows; tail rendered as "… and N more"). Each listed row identifies the row by its primary-key value(s) (see §7), the offending value, and a short reason ("not a valid int", "not '0' or '1'", etc.). The error does NOT mention --force-conversion: that flag does not help with incompatibles.
  4. Refuse on lossy (default). If all cells are clean or lossy and at least one is lossy, refuse with a capped list of the lossy rows (showing the would-be transformation: row 5: 3.14 → 3 (truncated)), followed by the message "if you want to execute this conversion in spite of the problems, re-run with --force-conversion."
  5. All clean. Proceed with the rebuild. If the transformer was non-identity (i.e. any cell required actual transformation rather than passing through unchanged), emit the client-side note (§6) alongside the success summary.
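The decision tree above can be sketched as a single function over the already-classified cells. All names here (`Kind`, `Decision`, `decide`) are hypothetical. The `force_conversion` parameter anticipates the flag described in §5, so passing `false` reproduces the no-flags behaviour of this section.

```rust
/// Classification of a single cell, without payload (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Clean,
    Lossy,
    Incompatible,
}

/// Overall result of a `change column` invocation (hypothetical name).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Step 1: the type pair has no transformer at all.
    RefuseStatic,
    /// Step 3: at least one cell cannot be converted; no flag helps.
    RefuseIncompatible { count: usize },
    /// Step 4: lossy cells exist and the user did not opt in.
    RefuseLossy { count: usize },
    /// Step 5 (or step 4 with the override): the rebuild goes ahead.
    Proceed { lossy: usize },
}

pub fn decide(has_transformer: bool, cells: &[Kind], force_conversion: bool) -> Decision {
    if !has_transformer {
        return Decision::RefuseStatic;
    }
    let incompatible = cells.iter().filter(|k| **k == Kind::Incompatible).count();
    let lossy = cells.iter().filter(|k| **k == Kind::Lossy).count();
    if incompatible > 0 {
        // Incompatibles are checked first, so the override never masks them.
        return Decision::RefuseIncompatible { count: incompatible };
    }
    if lossy > 0 && !force_conversion {
        return Decision::RefuseLossy { count: lossy };
    }
    Decision::Proceed { lossy }
}
```

Checking incompatibles before lossy cells is what keeps the --force-conversion hint out of the incompatible error: that branch returns before the flag is ever consulted.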

3. Static transformer matrix

Pair-wise: a Some(transformer) means "attempt with a per-cell dry-run." None means "static refusal" (§2 step 1).

The transformers below cover the numeric and text universe. Cells of serial columns appear only on the source side (target serial is statically refused). PK columns and relationship-involved columns are governed by the preconditions in §4, which refine the blanket refusal carried over from B2/C2. Anything ↔ blob is deferred for v1.

Always-clean transformers (no per-cell loss possible)

| Source | Target | Notes |
|---|---|---|
| int / serial | real | widening; precision caveat for magnitudes above 2⁵³ noted in docs but not policed |
| int / serial | decimal | exact decimal representation |
| int / serial | text | stringify |
| serial | int | identity at the storage class level (both store as INTEGER); drops the auto-increment metadata. The canonical PK conversion enabled by §4.1's fk_target_type-aware refinement. |
| bool | int | 0/1 |
| bool | real | 0.0/1.0 |
| bool | decimal | "0"/"1" |
| bool | text | "true"/"false" — matches the DSL boolean grammar (§5 of ADR-0014), not SQLite's native integer stringification |
| decimal | text | already text-backed under STRICT |
| date | text | same |
| datetime | text | same |
| shortid | text | same |
| real | text | shortest-round-trip decimal form |

Per-cell-classified transformers (clean OR lossy OR incompatible per cell)

| Source | Target | Per-cell classification |
|---|---|---|
| real | int | clean when the value is exactly representable as an integer (e.g. 3.0); lossy when there's a fractional part (e.g. 3.14 → 3); never incompatible |
| real | decimal | clean when the f64 round-trips through the decimal grammar; lossy otherwise (precision artifacts) |
| real | bool | clean when value is exactly 0.0 or 1.0; incompatible otherwise |
| decimal | int | clean when integer-valued; lossy if fractional |
| decimal | real | clean when it fits f64 exactly; lossy on precision loss |
| decimal | bool | clean for exact 0 / 1; incompatible otherwise |
| int | bool | clean for 0 / 1; incompatible otherwise |
| text | int | narrowest-first chain: try int parse (clean); fall back to real parse and truncate (lossy); else incompatible |
| text | real | try real parse (clean); else decimal parse if it fits f64 (clean) or doesn't (lossy); else incompatible |
| text | decimal | try decimal grammar (clean); else real parse (lossy precision); else incompatible |
| text | bool | "true" / "false" (case-insensitive) → clean; everything else incompatible (no implicit 0 / 1 parse — matches the DSL boolean grammar) |
| text | date | match YYYY-MM-DD → clean; everything else incompatible |
| text | datetime | ISO-8601 datetime → clean; bare date → lossy with implicit T00:00:00Z; else incompatible |
| text | shortid | base58 alphabet, 10–12 chars (per ADR-0014) → clean; else incompatible |
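As an illustration of the narrowest-first chain, here is a hedged sketch of the text → int row. The `IntOutcome` type and `text_to_int` function are hypothetical, and the sketch leans on Rust's standard `str::parse` rather than the ADR-0014 validators in dsl/value.rs, whose accepted grammar may differ (for example around exponents or signs). Treating non-finite or out-of-range reals as incompatible is also an assumption of the sketch.

```rust
/// Outcome of the text -> int chain (hypothetical, payload-only sketch).
#[derive(Debug, PartialEq)]
pub enum IntOutcome {
    Clean(i64),
    Lossy(i64),
    Incompatible,
}

/// Narrowest-first: int parse, then real parse with truncation, else refuse.
/// Uses Rust's std parsers as a stand-in for the project's own validators.
pub fn text_to_int(s: &str) -> IntOutcome {
    // Narrowest interpretation first: an exact integer literal is clean.
    if let Ok(i) = s.parse::<i64>() {
        return IntOutcome::Clean(i);
    }
    // Fall back to a real parse; truncating it is lossy by definition.
    match s.parse::<f64>() {
        // Assumption: non-finite or out-of-range values are refused here.
        Ok(f) if f.is_finite() && f.abs() < 9.2e18 => IntOutcome::Lossy(f.trunc() as i64),
        _ => IntOutcome::Incompatible,
    }
}
```

With this ordering `"42"` is clean, `"3.14"` is lossy (becoming 3), and `"abc"`, `"x42"`, and the empty string are incompatible, matching the sample rows in the incompatible-refusal table of §7.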

Statically refused (no entry in the matrix)

  • Anything → serial (carried over from B2/C2)
  • Anything → or from blob (v1 deferral; encoding ambiguity)
  • Same-type identity (no-op; carried over from B2/C2)
  • date ↔ datetime direct (deferred for v1; users route via text if needed)
  • All cross-domain pairs not listed above (e.g. bool ↔ date, real ↔ datetime, int ↔ shortid)

The relationship-involvement preconditions (§4) apply before this matrix is consulted.
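The static half of the matrix reduces to a yes/no lookup on the type pair. The sketch below encodes only the pairs listed in the two tables of this section; the `ColType` enum and the `has_transformer` name are hypothetical (the commit message mentions a `static_refusal` function, whose real signature is not shown here).

```rust
/// The user-facing column types named in this ADR (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    Int,
    Serial,
    Real,
    Decimal,
    Bool,
    Text,
    Date,
    DateTime,
    ShortId,
    Blob,
}

/// True when the pair has a matrix entry, i.e. a per-cell dry run is attempted.
/// Anything not listed (same-type, blob, target serial, cross-domain) is refused.
pub fn has_transformer(src: ColType, dst: ColType) -> bool {
    use ColType::*;
    matches!(
        (src, dst),
        // Always-clean transformers.
        (Int | Serial, Real | Decimal | Text)
            | (Serial, Int)
            | (Bool, Int | Real | Decimal | Text)
            | (Decimal | Date | DateTime | ShortId | Real, Text)
            // Per-cell-classified transformers.
            | (Real, Int | Decimal | Bool)
            | (Decimal, Int | Real | Bool)
            | (Int, Bool)
            | (Text, Int | Real | Decimal | Bool | Date | DateTime | ShortId)
    )
}
```

Writing the matrix as an allow-list means a newly added type is refused everywhere until its entries are reviewed, which fits the maintenance note in the Consequences section.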

4. Primary-key and uniqueness-bearing columns

The B2/C2 implementation refused all type changes to PK columns and to any column involved in a declared relationship. That was too coarse: it conflated two concerns that should be split.

4.1 Inbound foreign keys: when does the cascade actually bite?

The cascade only matters when the new type would change the FK target type that referencing columns must have. Per ADR-0011's fk_target_type() rule:

  • serial.fk_target_type() == Int
  • shortid.fk_target_type() == Text
  • All other types: identity

So serial → int on a PK preserves fk_target_type (both yield Int); FK columns referencing the PK stay int, the underlying storage is unchanged, no cascade is needed. Same for shortid → text and text → shortid (both yield Text).

The precondition is therefore:

If the column has any inbound FK and old_type.fk_target_type() != new_type.fk_target_type(), refuse with a friendly cascade message ("<T>.<col> is referenced by N relationship(s); changing its type to <dst> would change the type that referencing columns require — drop those relationships first or pick a target type whose FK shape matches the current one"). Otherwise allow.

This unblocks the most-natural PK conversion (serial → int, removing auto-increment while preserving stored values) and shortid ↔ text round-trips on PKs that have real-world relationships.
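A minimal sketch of the §4.1 precondition. The `fk_target_type()` mapping is the one quoted from ADR-0011 above; the `ColType` enum and the `inbound_fk_blocks` helper are hypothetical names introduced for illustration.

```rust
/// The user-facing column types named in this ADR (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    Int,
    Serial,
    Real,
    Decimal,
    Bool,
    Text,
    Date,
    DateTime,
    ShortId,
    Blob,
}

impl ColType {
    /// The type a referencing FK column must have, per the rule quoted above.
    pub fn fk_target_type(self) -> ColType {
        match self {
            ColType::Serial => ColType::Int,
            ColType::ShortId => ColType::Text,
            other => other, // all other types: identity
        }
    }
}

/// True when the change must be refused with the cascade message:
/// the column has inbound FKs and the FK target type would change.
pub fn inbound_fk_blocks(old: ColType, new: ColType, inbound_fk_count: usize) -> bool {
    inbound_fk_count > 0 && old.fk_target_type() != new.fk_target_type()
}
```

Under this rule serial → int and shortid ↔ text pass even with inbound relationships, while serial → text is refused as soon as one referencing column exists.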

4.2 Outbound foreign keys: refuse for v1

If the column is itself an FK (the child side of a relationship), changing its type would either require its new type to match the parent's fk_target_type (which typically reduces to a no-op) or break the constraint. v1 refuses outbound-FK type changes outright; the user drops the relationship first.

4.3 Uniqueness-bearing columns: post-transformation collision check

Some types and constraints carry a uniqueness contract that the per-cell classification can't see — multiple distinct source values can collapse to the same target value under a lossy transformation, violating the contract even though every individual cell transformed "successfully."

In v1 the uniqueness check applies to:

  • Primary-key columns (the SQL-level UNIQUE+NOT NULL guarantee).
  • shortid columns (the design-level contract that shortids are unique short identifiers, even when not the PK).
  • (Future) UNIQUE constraint columns when C3's full constraint set lands.

After the per-cell pass, the transformed values are checked for duplicates. Any collision is treated as incompatible (cross-row, structural — no --force-conversion override; the user must clean the source data). The error reports the colliding rows:

Cannot change T.col from real to int: 2 row(s) would collapse to the same value.

row 5 ('3.14') and row 12 ('3.7') would both become '3'.

In practice the only realistic incoming conversion that exercises both the per-cell shortid-grammar check and the uniqueness check is text → shortid. Other source types fail per-cell (int 42 doesn't match the base58 + length grammar) before uniqueness becomes relevant.
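The post-transformation collision check amounts to grouping rows by their transformed value and reporting any group with more than one member. A hedged sketch follows, assuming single-column integer PKs and integer targets for brevity; `find_collisions` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Input: (primary-key value, transformed value) per row.
/// Output: each target value produced by more than one row, with the PKs
/// of the rows that collapse onto it, ordered by target value.
pub fn find_collisions(rows: &[(i64, i64)]) -> Vec<(i64, Vec<i64>)> {
    let mut by_target: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for &(pk, target) in rows {
        by_target.entry(target).or_default().push(pk);
    }
    by_target
        .into_iter()
        .filter(|(_, pks)| pks.len() > 1)
        .collect()
}
```

For the example above, rows with PKs 5 and 12 holding 3.14 and 3.7 both transform to 3, so the input `[(5, 3), (12, 3), (18, 1)]` yields the single collision `(3, [5, 12])`.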

4.4 Combined preconditions

Putting §4.1, §4.2, §4.3 together, a change column … invocation is refused at the precondition stage when any of:

  • The column is the child side of a relationship (outbound FK on this column).
  • The column is the parent side of a relationship and old_type.fk_target_type() != new_type.fk_target_type().

…and is refused after the per-cell dry run (and so still classifies as incompatible) when:

  • The column is uniqueness-bearing (PK, shortid, or future UNIQUE) and the transformed values contain duplicates.

Otherwise the matrix's per-cell classification governs.

5. Override flags

Both flags are opt-in per ADR-0009's --<name> convention.

--force-conversion

Skips the lossy-refusal in step 4 of §2. Does not change the static refusal (step 1) or the incompatible refusal (step 3): no flag makes text "abc" → int work.

When invoked, the dry run still classifies cells; lossy cells transform per the matrix; the client-side note (§6) includes both the count of cells that needed transformation and the count that were lossy (split by classification).

Forward-look (not in this ADR's implementation scope). A later iteration may extend the grammar to let the user specify resolutions for incompatibles, e.g.:

change column T: c (int) --default 0
change column T: c (int) --on-incompatible '0'

…which would land cells that fail the parse with the given default value. This generalises --force-conversion from a binary "accept loss" toggle into a continuum and gives the learner practical experience with the kinds of resolutions that real-world data work needs. The current ADR deliberately doesn't commit to syntax for that — the binary --force-conversion is enough for v1, and the forward-look exists to preserve the design space rather than constrain a future ADR.

--dont-convert

Skips the entire client-side layer: no transformer, no dry-run, no per-cell classification. Hands the source column's raw cells to the rebuild's INSERT INTO new SELECT FROM old step and lets the database's STRICT typing decide. Engine error text is never surfaced verbatim; failures are reported via the same friendly-error layer the rest of the app uses.

This is the escape hatch for users who explicitly want to see what the database itself will do — a pedagogical lever for "what does raw SQL behaviour look like here?" without dropping into advanced mode.

Mutual exclusion

--force-conversion and --dont-convert are mutually exclusive. Specifying both is a parse error: forcing client-side conversion while disabling client-side conversion is contradictory. The error message names both flags and says "pick one."
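One way to enforce the exclusion at parse time is to fold both flags into a single mode value, so that later stages cannot observe the contradictory combination. The sketch below is an assumption about shape only: `ConversionMode` and `parse_conversion_flags` are hypothetical, the real parser is not shown in this ADR, and the error wording merely satisfies the "names both flags and says pick one" requirement.

```rust
/// How the client-side layer should behave (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum ConversionMode {
    Default,
    ForceConversion,
    DontConvert,
}

/// Scans already-tokenised arguments for the two override flags.
pub fn parse_conversion_flags(args: &[&str]) -> Result<ConversionMode, String> {
    let force = args.contains(&"--force-conversion");
    let raw = args.contains(&"--dont-convert");
    match (force, raw) {
        (true, true) => Err(
            "--force-conversion and --dont-convert cannot be used together; pick one."
                .to_string(),
        ),
        (true, false) => Ok(ConversionMode::ForceConversion),
        (false, true) => Ok(ConversionMode::DontConvert),
        (false, false) => Ok(ConversionMode::Default),
    }
}
```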

6. Reporting: the "client-side conversion was applied" note

When a successful change involves any non-identity transformation (i.e. cells were rewritten before reaching the database), the success summary includes a line of the form:

[client-side] N row(s) were transformed before being stored. In raw SQL this would need an explicit CAST or application-level code.

When --force-conversion succeeded with lossy rows, the note adds the lossy count specifically:

[client-side] N row(s) transformed; M of those discarded information (lossy). In raw SQL this would need an explicit CAST or application-level code.
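The two note variants can be produced from the transformed and lossy counts alone. In this sketch the strings are copied from the templates above; the `client_side_note` function name and its `Option` return type (no note when nothing was rewritten) are assumptions.

```rust
/// Builds the success-summary note, or None when no cell was rewritten.
/// `lossy` is expected to be at most `transformed`.
pub fn client_side_note(transformed: usize, lossy: usize) -> Option<String> {
    const TAIL: &str = "In raw SQL this would need an explicit CAST or application-level code.";
    if transformed == 0 {
        return None; // identity pass-through: nothing to tell the learner
    }
    Some(if lossy == 0 {
        format!(
            "[client-side] {} row(s) were transformed before being stored. {}",
            transformed, TAIL
        )
    } else {
        format!(
            "[client-side] {} row(s) transformed; {} of those discarded information (lossy). {}",
            transformed, lossy, TAIL
        )
    })
}
```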

User-facing strings throughout this ADR — and throughout the application generally — never name the underlying database engine. The engine is an implementation detail; the playground's pedagogical surface is "the database" in the abstract. (ADR-internal prose still references SQLite where technically necessary for the spec writer; that's not user-visible.)

The note's purpose is pedagogical, not diagnostic — it points at the moment where the tool went beyond what bare SQL allows. Without it, a learner would have no way to know that "this just worked" was actually the playground doing them a favour.

7. Error presentation

Tabular detail in both error and success output is rendered through the pretty-table renderer (ADR-0016) — no ad-hoc indented-line layouts. The rule is: anywhere the output describes more than a handful of rows of structured per-row detail, it goes through render_data_table (or an equivalent helper). This keeps the visual identity consistent across DDL, query results, and these conversion diagnostics.

Lossy refusals

Cannot change `T.col` from real to int: 50 row(s) would
discard information.

┌─────────┬───────┬─────┬───────────────────────────────────┐
│ id (PK) │ From  │ To  │ Reason                            │
├─────────┼───────┼─────┼───────────────────────────────────┤
│       5 │ 3.14  │   3 │ truncated; would discard 0.14     │
│      12 │ 2.71  │   2 │ truncated; would discard 0.71     │
│      18 │ 1.5   │   1 │ truncated; would discard 0.5      │
│       … │     … │   … │ … and 47 more                     │
└─────────┴───────┴─────┴───────────────────────────────────┘

if you want to execute this conversion in spite of the
problems, re-run with `--force-conversion`.

Incompatible refusals

Cannot change `T.col` from text to int: 3 row(s) cannot
be converted.

┌─────────┬───────┬───────────────────────┐
│ id (PK) │ Value │ Reason                │
├─────────┼───────┼───────────────────────┤
│       3 │ abc   │ not a valid int       │
│       7 │ x42   │ not a valid int       │
│      12 │       │ not a valid int       │
└─────────┴───────┴───────────────────────┘

The trailing --force-conversion hint is omitted for incompatibles (no flag helps; future syntax — §5 forward-look — would re-introduce one).

Uniqueness collisions

Cannot change `T.col` from real to int: 1 collision(s)
would violate uniqueness.

┌─────────┬──────────────────┬──────────────────┐
│ Becomes │ Source rows (id) │ Source values    │
├─────────┼──────────────────┼──────────────────┤
│       3 │ 5, 12            │ 3.14, 3.7        │
└─────────┴──────────────────┴──────────────────┘

Common rules

  • Each detail table is capped at 100 rows. Beyond that, a single trailing row of placeholders carrying the literal text "… and N more" is rendered inside the table rather than as a footer line, which keeps the bordered shape intact.
  • Rows are identified by their primary-key value(s), not by positional indices. SQLite returns rows in unspecified order without ORDER BY, so a positional "row 5" would not be reproducible or addressable by the user. The PK is the natural row identifier in a relational setting and is what the user would type in a where clause to find or fix the offending cell.
    • Single PK: rendered as one column whose header is the PK column name with a trailing (PK) marker (e.g. id (PK)); cells carry the raw PK value with no column= prefix. The marker appears once per table, in the header.
    • Compound PK: one column per PK component, each header annotated (PK) (e.g. a (PK), b (PK)); cells carry the raw component values.
    • Uniqueness-collision tables list the colliding rows' PK values comma-separated inside a single Source rows cell whose header carries the PK column name(s) in parentheses (e.g. Source rows (id) or Source rows (a, b)). Compound-PK source rows render as tuples: (1,2), (1,3).
  • The change-column command always operates on a table with at least one PK column (every create table in v1 produces a PK; the AST permits PK-less tables, but no grammar produces one today). If a PK-less surface ever lands, this section will be revisited.
  • Numeric PK and "Becomes" columns inherit numeric right-alignment from ADR-0016 §2.
  • Cells that would render multi-line content (for conversions from text where source values contain newlines) honour ADR-0016 §3's substitution, so the table stays one display row per logical row.
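The row cap from the first rule can be sketched independently of the renderer. The helper below prepares the rows that would then be handed to the pretty-table renderer; `cap_rows` and `MAX_DETAIL_ROWS` are hypothetical names, and placing the "and N more" text in the last column is an assumption based on the lossy-refusal example above.

```rust
/// Cap from the common rules; the constant name is hypothetical.
pub const MAX_DETAIL_ROWS: usize = 100;

/// Keeps at most `cap` rows and, when rows were dropped, appends one
/// placeholder row so the overflow notice stays inside the table body.
pub fn cap_rows(rows: &[Vec<String>], cap: usize) -> Vec<Vec<String>> {
    if rows.len() <= cap {
        return rows.to_vec();
    }
    // rows.len() > cap here, so rows is non-empty and rows[0] exists.
    let width = rows[0].len();
    let mut out: Vec<Vec<String>> = rows[..cap].to_vec();
    let mut tail = vec!["…".to_string(); width];
    if let Some(last) = tail.last_mut() {
        *last = format!("… and {} more", rows.len() - cap);
    }
    out.push(tail);
    out
}
```

Because the overflow notice is an ordinary row, the renderer needs no special footer handling and the bordered shape is preserved.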

8. Out of scope

  • OOS-1. Anything ↔ blob conversion. Encoding ambiguity (base64? raw bytes? UTF-8 attempt?) deserves its own discussion.
  • OOS-2. date ↔ datetime direct conversion. Format rewriting is small but warrants a per-conversion test matrix; defer until a real user need surfaces.
  • OOS-3. Resolution-specifying flags (--default, --on-incompatible '<value>', etc.) per the §5 forward-look.
  • OOS-4. Bulk conversions across multiple columns in one command. Each change column runs independently.
  • OOS-5. Cross-row contextual transformations (e.g. "rank by value to fit a smaller numeric range"). The transformer is per-cell, deliberately stateless.

Consequences

  • The placeholder behaviour ("rely on SQLite STRICT") is replaced with a documented per-cell model that produces learner-friendly errors and pedagogical client-side notes.
  • The transformer matrix is an additional surface to keep in step with dsl/value.rs's validators. Each new user-facing type added to the type vocabulary needs its matrix entries reviewed.
  • The [client-side] … note is the load-bearing pedagogical artefact: it's how a learner discovers that the tool did them a favour. Future visualisation / styling work (V4) should preserve its prominence.
  • --dont-convert keeps the door open to a "raw SQL behaviour" learning mode without forcing the user into advanced mode.
  • The forward-look in §5 means --force-conversion's semantics may broaden later. Implementations should treat the flag's effect as "accept loss" rather than as the canonical resolution mechanism.

Relationship to earlier ADRs

  • ADR-0013 — the rebuild-table primitive remains the mechanism. This ADR adds a per-row transformation step between read-out and write-back; the existing primitive can be parametrised by a row-by-row transformer or paired with a sibling helper.
  • ADR-0014 — the per-type validators in dsl/value.rs power the dry-run classification. No changes to those validators; they're consumed read-only here.
  • ADR-0011 — the fk_target_type() rule itself is unchanged and is consumed by §4.1. Relationship-involved columns must pass the §4 preconditions before this ADR's matrix is consulted.
  • ADR-0009 — the --force-conversion and --dont-convert flags follow the established long-flag opt-in convention.