# ADR-0017: Column type-change compatibility ## Status Accepted ## Context ADR-0013 introduced the rebuild-table primitive that lets us change a column's type by reading-out, recreating-with-new-shape, and writing-back. ADR-0014 specified the value model (per-type literal validators in `dsl/value.rs`). The B2/C2 work that landed `change column [in] [table] : ()` plumbed those together but left a real spec gap: what conversions are allowed, what happens to data that doesn't cleanly fit the new type, and how the user is told. The current behaviour ("rely on SQLite STRICT to reject incompatible cells; surface whatever error it produces") is a placeholder — pedagogically poor (raw STRICT errors are not learner-friendly) and silently permissive in cases where SQLite would coerce values in ways the user might not intend. This ADR specifies a curated compatibility model that combines a static "is this conversion attempted at all" matrix with a per-cell runtime classification, plus two opt-in flags that give the user control over the trade between safety and force. ## Decision ### 1. Per-cell outcome classification Every cell of the column being changed is classified into exactly one of three outcomes during a **dry-run pass that runs before any SQL writes**: - **Clean** — a transformer produces a value the new type accepts without information loss. The rebuild proceeds and stores the transformed value. - **Lossy** — a transformer produces a valid value, but some property of the original cell (precision, fractional part, time component, …) is discarded. By default these rows refuse the operation; the user opts in to the loss with `--force-conversion`. - **Incompatible** — no transformer for this pair can produce a valid value for this cell. The operation is refused; no flag overrides this. The user must pre-process the data (or wait for a future syntax extension; see §5 forward-look). The classifications are properties of *(source type, target type, cell value)*, not just of the type pair. The same type pair (`real` → `int`) yields **clean** for cells storing whole numbers and **lossy** for cells with fractional parts. ### 2. Default decision tree A `change column …` invocation without flags: 1. **Static refusal check.** If the type pair has no transformer at all (see §3 — e.g. `bool → date`, anything ↔ `blob`, anything → `serial`), refuse immediately with a friendly explanation of the incompatibility class. 2. **Per-cell dry run.** Apply the transformer to each cell; classify into clean / lossy / incompatible. 3. **Refuse on incompatibles.** If any cell classifies incompatible, refuse and present a list (capped at 100 rows; tail rendered as "… and N more"). Each listed row identifies the row index and the offending value plus a short reason ("not a valid int", "not '0' or '1'", etc.). The error does NOT mention `--force-conversion`: that flag does not help with incompatibles. 4. **Refuse on lossy (default).** If all cells are clean or lossy and at least one is lossy, refuse with a capped list of the lossy rows (showing the would-be transformation: `row 5: 3.14 → 3 (truncated)`), followed by the message *"if you want to execute this conversion in spite of the problems, re-run with `--force-conversion`."* 5. **All clean.** Proceed with the rebuild. If the transformer was non-identity (i.e. any cell required actual transformation rather than passing through unchanged), emit the **client-side note** (§6) alongside the success summary. ### 3. Static transformer matrix Pair-wise: a `Some(transformer)` means "attempt with a per-cell dry-run." `None` means "static refusal" (§2 step 1). The transformers below cover the **numeric and text universe**. Cells of `serial` columns appear only on the source side (target `serial` is statically refused); relationship-involved columns and PK columns are also statically refused (carried over from B2/C2). Anything ↔ `blob` is deferred for v1. #### Always-clean transformers (no per-cell loss possible) | Source | Target | Notes | |------------------|-----------|-------------------------------------------------| | `int` / `serial` | `real` | widening; precision caveat for ¦v¦ > 2⁵³ noted in docs but not policed | | `int` / `serial` | `decimal` | exact decimal representation | | `int` / `serial` | `text` | stringify | | `serial` | `int` | identity at the storage class level (both store as INTEGER); drops the auto-increment metadata. The canonical PK conversion enabled by §4.1's `fk_target_type`-aware refinement. | | `bool` | `int` | 0/1 | | `bool` | `real` | 0.0/1.0 | | `bool` | `decimal` | "0"/"1" | | `bool` | `text` | "true"/"false" — matches the DSL boolean grammar (§5 of ADR-0014), not SQLite's native integer stringification | | `decimal` | `text` | already text-backed under STRICT | | `date` | `text` | same | | `datetime` | `text` | same | | `shortid` | `text` | same | | `real` | `text` | shortest-round-trip decimal form | #### Per-cell-classified transformers (clean OR lossy OR incompatible per cell) | Source | Target | Per-cell classification | |------------|-----------|---------------------------------------------------------------------------------------------------------------| | `real` | `int` | clean when the value is exactly representable as an integer (e.g. `3.0`); lossy when there's a fractional part (e.g. `3.14 → 3`); never incompatible | | `real` | `decimal` | clean when the f64 round-trips through the decimal grammar; lossy otherwise (precision artifacts) | | `real` | `bool` | clean when value is exactly `0.0` or `1.0`; incompatible otherwise | | `decimal` | `int` | clean when integer-valued; lossy if fractional | | `decimal` | `real` | clean when it fits f64 exactly; lossy on precision loss | | `decimal` | `bool` | clean for exact `0` / `1`; incompatible otherwise | | `int` | `bool` | clean for `0` / `1`; incompatible otherwise | | `text` | `int` | narrowest-first chain: try `int` parse (clean); fall back to `real` parse and truncate (lossy); else incompatible | | `text` | `real` | try `real` parse (clean); else `decimal` parse if it fits f64 (clean) or doesn't (lossy); else incompatible | | `text` | `decimal` | try `decimal` grammar (clean); else `real` parse (lossy precision); else incompatible | | `text` | `bool` | "true" / "false" (case-insensitive) → clean; everything else incompatible (no implicit `0` / `1` parse — matches the DSL boolean grammar) | | `text` | `date` | match `YYYY-MM-DD` → clean; everything else incompatible | | `text` | `datetime`| ISO-8601 datetime → clean; bare date → lossy with implicit `T00:00:00Z`; else incompatible | | `text` | `shortid` | base58 alphabet, 10–12 chars (per ADR-0014) → clean; else incompatible | #### Statically refused (no entry in the matrix) - Anything → `serial` (carried over from B2/C2) - Anything → or from `blob` (v1 deferral; encoding ambiguity) - Same-type identity (no-op; carried over from B2/C2) - `date` ↔ `datetime` direct (deferred for v1; users route via `text` if needed) - All cross-domain pairs not listed above (e.g. `bool` → `date`, `real` → `datetime`, `int` → `shortid`) The relationship-involvement preconditions (§4) apply *before* this matrix is consulted. ### 4. Primary-key and uniqueness-bearing columns The B2/C2 implementation refused all type changes to PK columns and to any column involved in a declared relationship. That was too coarse: it conflated two concerns that should be split. #### 4.1 Inbound foreign keys: when does the cascade actually bite? The cascade only matters when the new type would change the FK target type that referencing columns must have. Per ADR-0011's `fk_target_type()` rule: - `serial.fk_target_type() == Int` - `shortid.fk_target_type() == Text` - All other types: identity So `serial → int` on a PK preserves `fk_target_type` (both yield `Int`); FK columns referencing the PK stay `int`, the underlying storage is unchanged, no cascade is needed. Same for `shortid → text` and `text → shortid` (both yield `Text`). The precondition is therefore: > If the column has any inbound FK *and* > `old_type.fk_target_type() != new_type.fk_target_type()`, > refuse with a friendly cascade message > ("`.` is referenced by N relationship(s); changing > its type to `` would change the type that referencing > columns require — drop those relationships first or pick a > target type whose FK shape matches the current one"). > Otherwise allow. This unblocks the most-natural PK conversion (`serial → int`, removing auto-increment while preserving stored values) and `shortid ↔ text` round-trips on PKs that have real-world relationships. #### 4.2 Outbound foreign keys: refuse for v1 If the column is itself an FK (the child side of a relationship), changing its type would either require its new type to match the parent's `fk_target_type` (which typically reduces to a no-op) or break the constraint. v1 refuses outbound-FK type changes outright; the user drops the relationship first. #### 4.3 Uniqueness-bearing columns: post-transformation collision check Some types and constraints carry a uniqueness contract that the per-cell classification can't see — multiple distinct source values can collapse to the same target value under a lossy transformation, violating the contract even though every individual cell transformed "successfully." In v1 the uniqueness check applies to: - **Primary-key columns** (the SQL-level UNIQUE+NOT NULL guarantee). - **shortid columns** (the design-level contract that shortids are unique short identifiers, even when not the PK). - *(Future)* **`UNIQUE` constraint columns** when C3's full constraint set lands. After the per-cell pass produces transformed values, the transformed values are checked for duplicates. Any collision is **incompatible** (cross-row, structural — no `--force-conversion` override; the user must clean the source data). The error reports the colliding rows: > Cannot change `T.col` from real to int: 2 row(s) > would collapse to the same value. > > row 5 ('3.14') and row 12 ('3.7') would both > become '3'. In practice the only realistic incoming conversion that exercises both the per-cell shortid-grammar check *and* the uniqueness check is `text → shortid`. Other source types fail per-cell (`int 42` doesn't match the base58 + length grammar) before uniqueness becomes relevant. #### 4.4 Combined preconditions Putting §4.1, §4.2, §4.3 together, a `change column …` invocation is refused at the precondition stage when any of: - The column is the *child* side of a relationship (outbound FK on this column). - The column is the *parent* side of a relationship and `old_type.fk_target_type() != new_type.fk_target_type()`. …and is refused after the per-cell dry run (and so still classifies as incompatible) when: - The column is uniqueness-bearing (PK, shortid, or future UNIQUE) and the transformed values contain duplicates. Otherwise the matrix's per-cell classification governs. ### 5. Override flags Both flags are opt-in per ADR-0009's `--` convention. #### `--force-conversion` Skips the lossy-refusal in step 4 of §2. **Does not** change the static refusal (step 1) or the incompatible refusal (step 3): no flag makes `text "abc" → int` work. When invoked, the dry run still classifies cells; lossy cells transform per the matrix; the client-side note (§6) includes both the count of cells that needed transformation *and* the count that were lossy (split by classification). **Forward-look (not in this ADR's implementation scope).** A later iteration may extend the grammar to let the user specify resolutions for incompatibles, e.g.: ``` change column T: c (int) --default 0 change column T: c (int) --on-incompatible '0' ``` …which would land cells that fail the parse with the given default value. This generalises `--force-conversion` from a binary "accept loss" toggle into a continuum and gives the learner practical experience with the kinds of resolutions that real-world data work needs. The current ADR deliberately doesn't commit to syntax for that — the binary `--force-conversion` is enough for v1, and the forward-look exists to preserve the design space rather than constrain a future ADR. #### `--dont-convert` Skips the entire client-side layer: no transformer, no dry-run, no per-cell classification. Hands the source column's raw cells to the rebuild's `INSERT INTO new SELECT FROM old` step and lets the database's STRICT typing decide. Engine error text is never surfaced verbatim; failures are reported via the same friendly-error layer the rest of the app uses. This is the escape hatch for users who explicitly want to see what the database itself will do — a pedagogical lever for "what does raw SQL behaviour look like here?" without dropping into advanced mode. #### Mutual exclusion `--force-conversion` and `--dont-convert` are mutually exclusive. Specifying both is a parse error: forcing client-side conversion while disabling client-side conversion is contradictory. The error message names both flags and says "pick one." ### 6. Reporting: the "client-side conversion was applied" note When a successful change involves any non-identity transformation (i.e. cells were rewritten before reaching the database), the success summary includes a line of the form: > [client-side] N row(s) were transformed before being > stored. In raw SQL this would need an explicit `CAST` or > application-level code. When `--force-conversion` succeeded with lossy rows, the note adds the lossy count specifically: > [client-side] N row(s) transformed; M of those discarded > information (lossy). In raw SQL this would need an > explicit `CAST` or application-level code. User-facing strings throughout this ADR — and throughout the application generally — never name the underlying database engine. The engine is an implementation detail; the playground's pedagogical surface is "the database" in the abstract. (ADR-internal prose still references SQLite where technically necessary for the spec writer; that's not user-visible.) The note's purpose is pedagogical, not diagnostic — it points at the moment where the tool went beyond what bare SQL allows. Without it, a learner would have no way to know that "this just worked" was actually the playground doing them a favour. ### 7. Error presentation Tabular detail in both error and success output is rendered through the pretty-table renderer (ADR-0016) — no ad-hoc indented-line layouts. The rule is: anywhere the output describes more than a handful of rows of structured per-row detail, it goes through `render_data_table` (or an equivalent helper). This keeps the visual identity consistent across DDL, query results, and these conversion diagnostics. #### Lossy refusals ``` Cannot change `T.col` from real to int: 50 row(s) would discard information. ┌─────────┬───────┬─────┬───────────────────────────────────┐ │ id (PK) │ From │ To │ Reason │ ├─────────┼───────┼─────┼───────────────────────────────────┤ │ 5 │ 3.14 │ 3 │ truncated; would discard 0.14 │ │ 12 │ 2.71 │ 2 │ truncated; would discard 0.71 │ │ 18 │ 1.5 │ 1 │ truncated; would discard 0.5 │ │ … │ … │ … │ … and 47 more │ └─────────┴───────┴─────┴───────────────────────────────────┘ if you want to execute this conversion in spite of the problems, re-run with `--force-conversion`. ``` #### Incompatible refusals ``` Cannot change `T.col` from text to int: 3 row(s) cannot be converted. ┌─────────┬───────┬───────────────────────┐ │ id (PK) │ Value │ Reason │ ├─────────┼───────┼───────────────────────┤ │ 3 │ abc │ not a valid int │ │ 7 │ x42 │ not a valid int │ │ 12 │ │ not a valid int │ └─────────┴───────┴───────────────────────┘ ``` The trailing `--force-conversion` hint is omitted for incompatibles (no flag helps; future syntax — §5 forward-look — would re-introduce one). #### Uniqueness collisions ``` Cannot change `T.col` from real to int: 1 collision(s) would violate uniqueness. ┌─────────┬──────────────────┬──────────────────┐ │ Becomes │ Source rows (id) │ Source values │ ├─────────┼──────────────────┼──────────────────┤ │ 3 │ 5, 12 │ 3.14, 3.7 │ └─────────┴──────────────────┴──────────────────┘ ``` #### Common rules - Each detail table is **capped at 100 rows**. Beyond that, a single trailing row with `…` placeholders and the literal text "and N more" inside the row is rendered inside the table — not as a footer line. Keeps the bordered shape intact. - Rows are identified by their **primary-key value(s)**, not by positional indices. SQLite returns rows in unspecified order without `ORDER BY`, so a positional "row 5" would not be reproducible or addressable by the user. The PK is the natural row identifier in a relational setting and is what the user would type in a `where` clause to find or fix the offending cell. - Single PK: rendered as one column whose header is the PK column name with a trailing `(PK)` marker (e.g. `id (PK)`); cells carry the raw PK value with no `column=` prefix. The marker appears once per table, in the header. - Compound PK: one column per PK component, each header annotated `(PK)` (e.g. `a (PK)`, `b (PK)`); cells carry the raw component values. - Uniqueness-collision tables list the colliding rows' PK values comma-separated inside a single `Source rows` cell whose header carries the PK column name(s) in parentheses (e.g. `Source rows (id)` or `Source rows (a, b)`). Compound-PK source rows render as tuples: `(1,2), (1,3)`. - The change-column command always operates on a table with at least one PK column (every `create table` in v1 produces a PK; the AST permits PK-less tables, but no grammar produces one today). If a PK-less surface ever lands, this section will be revisited. - Numeric PK and "Becomes" columns inherit numeric right-alignment from ADR-0016 §2. - Cells that would render multi-line content (for `text →` conversions where source values contain newlines) honour ADR-0016 §3's `↵` substitution, so the table stays one display row per logical row. ### 8. Out of scope - **OOS-1.** Anything ↔ `blob` conversion. Encoding ambiguity (base64? raw bytes? UTF-8 attempt?) deserves its own discussion. - **OOS-2.** `date` ↔ `datetime` direct conversion. Format rewriting is small but warrants a per-conversion test matrix; defer until a real user need surfaces. - **OOS-3.** Resolution-specifying flags (`--default`, `--on-incompatible ''`, etc.) per the §5 forward-look. - **OOS-4.** Bulk conversions across multiple columns in one command. Each `change column` runs independently. - **OOS-5.** Cross-row contextual transformations (e.g. "rank by value to fit a smaller numeric range"). The transformer is per-cell, deliberately stateless. ## Consequences - The placeholder behaviour ("rely on SQLite STRICT") is replaced with a documented per-cell model that produces learner-friendly errors and pedagogical client-side notes. - The transformer matrix is an additional surface to keep in step with `dsl/value.rs`'s validators. Each new user-facing type added to the type vocabulary needs its matrix entries reviewed. - The `[client-side] …` note is the load-bearing pedagogical artefact: it's how a learner discovers that the tool did them a favour. Future visualisation / styling work (V4) should preserve its prominence. - `--dont-convert` keeps the door open to a "raw SQL behaviour" learning mode without forcing the user into advanced mode. - The forward-look in §5 means `--force-conversion`'s semantics may broaden later. Implementations should treat the flag's effect as "accept loss" rather than as the canonical resolution mechanism. ## Relationship to earlier ADRs - **ADR-0013** — the rebuild-table primitive remains the mechanism. This ADR adds a per-row transformation step *between* read-out and write-back; the existing primitive can be parametrised by a row-by-row transformer or paired with a sibling helper. - **ADR-0014** — the per-type validators in `dsl/value.rs` power the dry-run classification. No changes to those validators; they're consumed read-only here. - **ADR-0011** — FK target-type compatibility is unaffected. Relationship-involved columns are statically refused before this ADR's matrix is consulted. - **ADR-0009** — the `--force-conversion` and `--dont-convert` flags follow the established long-flag opt-in convention.