# ADR-0017: Column type-change compatibility

## Status

Accepted

## Context

ADR-0013 introduced the rebuild-table primitive that lets us
change a column's type by reading-out, recreating-with-new-shape,
and writing-back. ADR-0014 specified the value model (per-type
literal validators in `dsl/value.rs`). The B2/C2 work that
landed `change column [in] [table] <T>: <col> (<newtype>)`
plumbed those together but left a real spec gap: what
conversions are allowed, what happens to data that doesn't
cleanly fit the new type, and how the user is told.

The current behaviour ("rely on SQLite STRICT to reject
incompatible cells; surface whatever error it produces") is
a placeholder — pedagogically poor (raw STRICT errors are
not learner-friendly) and silently permissive in cases
where SQLite would coerce values in ways the user might not
intend.

This ADR specifies a curated compatibility model that
combines a static "is this conversion attempted at all"
matrix with a per-cell runtime classification, plus
two opt-in flags that give the user control over the trade
between safety and force.

## Decision

### 1. Per-cell outcome classification

Every cell of the column being changed is classified into
exactly one of three outcomes during a **dry-run pass that
runs before any SQL writes**:

- **Clean** — a transformer produces a value the new type
  accepts without information loss. The rebuild proceeds
  and stores the transformed value.
- **Lossy** — a transformer produces a valid value, but
  some property of the original cell (precision, fractional
  part, time component, …) is discarded. By default the
  operation is refused if any cell is lossy; the user opts
  in to the loss with `--force-conversion`.
- **Incompatible** — no transformer for this pair can
  produce a valid value for this cell. The operation is
  refused; no flag overrides this. The user must
  pre-process the data (or wait for a future syntax
  extension; see §5 forward-look).

The classifications are properties of *(source type, target
type, cell value)*, not just of the type pair. The same
type pair (`real` → `int`) yields **clean** for cells
storing whole numbers and **lossy** for cells with
fractional parts.
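
As a minimal sketch of that last point, the classification of a single `real` cell for a `real` → `int` change might look like the following. The `CellOutcome` type and the `classify_real_to_int` function are hypothetical names chosen for illustration, not taken from the codebase, and the sketch assumes the cell is finite and fits in an `i64`.

```rust
/// Hypothetical per-cell result type; the real type in the codebase may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum CellOutcome {
    /// Converted with no information loss.
    Clean(i64),
    /// Converted, but part of the original value was discarded.
    Lossy { value: i64, reason: &'static str },
    /// No valid target value exists for this cell.
    Incompatible { reason: &'static str },
}

/// Classify one `real` cell for a `real -> int` change.
/// Assumption: the value is finite and within the i64 range.
pub fn classify_real_to_int(v: f64) -> CellOutcome {
    let truncated = v.trunc();
    if truncated == v {
        // Whole number: nothing is discarded.
        CellOutcome::Clean(truncated as i64)
    } else {
        // Fractional part would be dropped, so the cell is lossy.
        CellOutcome::Lossy { value: truncated as i64, reason: "truncated" }
    }
}
```

The same type pair therefore produces different outcomes depending only on the cell value: `3.0` classifies clean while `3.14` classifies lossy.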

### 2. Default decision tree

A `change column …` invocation without flags:

1. **Static refusal check.** If the type pair has no
   transformer at all (see §3 — e.g. `bool → date`,
   anything ↔ `blob`, anything → `serial`), refuse
   immediately with a friendly explanation of the
   incompatibility class.
2. **Per-cell dry run.** Apply the transformer to each
   cell; classify into clean / lossy / incompatible.
3. **Refuse on incompatibles.** If any cell classifies
   incompatible, refuse and present a list (capped at
   100 rows; tail rendered as "… and N more"). Each
   listed row shows the row's primary-key value (see §7),
   the offending value, and a short reason ("not a valid
   int", "not '0' or '1'", etc.). The error does NOT
   mention `--force-conversion`: that flag does not help
   with incompatibles.
4. **Refuse on lossy (default).** If all cells are clean
   or lossy and at least one is lossy, refuse with a
   capped list of the lossy rows (showing the would-be
   transformation: `row 5: 3.14 → 3 (truncated)`),
   followed by the message *"if you want to execute this
   conversion in spite of the problems, re-run with
   `--force-conversion`."*
5. **All clean.** Proceed with the rebuild. If the
   transformer was non-identity (i.e. any cell required
   actual transformation rather than passing through
   unchanged), emit the **client-side note** (§6)
   alongside the success summary.
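
The ordering of these steps can be sketched as a single function over the per-cell classifications. This is an illustrative outline only: `Class`, `Decision`, and `decide` are hypothetical names, and the `force_conversion` parameter anticipates the flag described in §5 (pass `false` for the flag-less behaviour above).

```rust
/// Hypothetical per-cell classification (see §1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Clean,
    Lossy,
    Incompatible,
}

/// Hypothetical overall outcome of a `change column` invocation.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    RefuseStatic,
    RefuseIncompatible { rows: usize },
    RefuseLossy { rows: usize },
    Proceed { lossy: usize },
}

pub fn decide(has_transformer: bool, cells: &[Class], force_conversion: bool) -> Decision {
    // Step 1: no transformer for the type pair means static refusal.
    if !has_transformer {
        return Decision::RefuseStatic;
    }
    // Step 3: any incompatible cell refuses, regardless of flags.
    let incompatible = cells.iter().filter(|c| **c == Class::Incompatible).count();
    if incompatible > 0 {
        return Decision::RefuseIncompatible { rows: incompatible };
    }
    // Step 4: lossy cells refuse unless the user opted in.
    let lossy = cells.iter().filter(|c| **c == Class::Lossy).count();
    if lossy > 0 && !force_conversion {
        return Decision::RefuseLossy { rows: lossy };
    }
    // Step 5: all clean, or loss explicitly accepted.
    Decision::Proceed { lossy }
}
```

Checking incompatibles before lossy cells is what keeps the `--force-conversion` hint out of errors it cannot resolve.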

### 3. Static transformer matrix

Pair-wise: a `Some(transformer)` means "attempt with a
per-cell dry-run." `None` means "static refusal" (§2 step
1).

The transformers below cover the **numeric and text
universe**. Cells of `serial` columns appear only on the
source side (target `serial` is statically refused).
Relationship-involved and PK columns are no longer refused
wholesale as they were in B2/C2; they are gated by the
preconditions in §4. Anything ↔ `blob` is deferred for v1.

#### Always-clean transformers (no per-cell loss possible)

| Source           | Target    | Notes                                           |
|------------------|-----------|-------------------------------------------------|
| `int` / `serial` | `real`    | widening; precision caveat for magnitudes above 2⁵³ noted in docs but not policed |
| `int` / `serial` | `decimal` | exact decimal representation                    |
| `int` / `serial` | `text`    | stringify                                       |
| `serial`         | `int`     | identity at the storage class level (both store as INTEGER); drops the auto-increment metadata. The canonical PK conversion enabled by §4.1's `fk_target_type`-aware refinement. |
| `bool`           | `int`     | 0/1                                             |
| `bool`           | `real`    | 0.0/1.0                                         |
| `bool`           | `decimal` | "0"/"1"                                         |
| `bool`           | `text`    | "true"/"false" — matches the DSL boolean grammar (§5 of ADR-0014), not SQLite's native integer stringification |
| `decimal`        | `text`    | already text-backed under STRICT                |
| `date`           | `text`    | same                                            |
| `datetime`       | `text`    | same                                            |
| `shortid`        | `text`    | same                                            |
| `real`           | `text`    | shortest-round-trip decimal form                |

#### Per-cell-classified transformers (clean OR lossy OR incompatible per cell)

| Source     | Target    | Per-cell classification                                                                                       |
|------------|-----------|---------------------------------------------------------------------------------------------------------------|
| `real`     | `int`     | clean when the value is exactly representable as an integer (e.g. `3.0`); lossy when there's a fractional part (e.g. `3.14 → 3`); never incompatible |
| `real`     | `decimal` | clean when the f64 round-trips through the decimal grammar; lossy otherwise (precision artifacts)             |
| `real`     | `bool`    | clean when value is exactly `0.0` or `1.0`; incompatible otherwise                                            |
| `decimal`  | `int`     | clean when integer-valued; lossy if fractional                                                                |
| `decimal`  | `real`    | clean when it fits f64 exactly; lossy on precision loss                                                       |
| `decimal`  | `bool`    | clean for exact `0` / `1`; incompatible otherwise                                                             |
| `int`      | `bool`    | clean for `0` / `1`; incompatible otherwise                                                                   |
| `text`     | `int`     | narrowest-first chain: try `int` parse (clean); fall back to `real` parse and truncate (lossy); else incompatible |
| `text`     | `real`    | try `real` parse (clean); else try `decimal` parse: clean if the value fits f64 exactly, lossy if it doesn't; else incompatible |
| `text`     | `decimal` | try `decimal` grammar (clean); else `real` parse (lossy precision); else incompatible                         |
| `text`     | `bool`    | "true" / "false" (case-insensitive) → clean; everything else incompatible (no implicit `0` / `1` parse — matches the DSL boolean grammar) |
| `text`     | `date`    | match `YYYY-MM-DD` → clean; everything else incompatible                                                      |
| `text`     | `datetime`| ISO-8601 datetime → clean; bare date → lossy with implicit `T00:00:00Z`; else incompatible                    |
| `text`     | `shortid` | base58 alphabet, 10–12 chars (per ADR-0014) → clean; else incompatible                                        |
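
To make the narrowest-first chain concrete, here is a hedged sketch of the `text` → `int` row. It uses Rust's standard `str::parse` as a stand-in for the ADR-0014 validators, whose grammar may accept or reject different inputs. `TextToInt` and `classify_text_to_int` are hypothetical names, and treating out-of-range reals as incompatible is an assumption this ADR does not spell out.

```rust
/// Hypothetical outcome type for the `text -> int` row of the matrix.
#[derive(Debug, PartialEq)]
pub enum TextToInt {
    Clean(i64),
    Lossy(i64),
    Incompatible,
}

/// 2^63 as an f64; values at or beyond this magnitude do not fit in an i64.
const I64_LIMIT: f64 = 9_223_372_036_854_775_808.0;

pub fn classify_text_to_int(s: &str) -> TextToInt {
    // Step 1: narrowest parse first. A valid int is clean.
    if let Ok(i) = s.parse::<i64>() {
        return TextToInt::Clean(i);
    }
    // Step 2: fall back to a real parse and truncate (lossy).
    // Assumption: NaN, infinities, and out-of-range values are incompatible.
    match s.parse::<f64>() {
        Ok(f) if f.trunc() >= -I64_LIMIT && f.trunc() < I64_LIMIT => {
            TextToInt::Lossy(f.trunc() as i64)
        }
        // Step 3: nothing parsed, so no valid int exists for this cell.
        _ => TextToInt::Incompatible,
    }
}
```

Note that under the chain as written, a value such as `"3.0"` fails the int parse and lands in the lossy branch even though no fractional information is discarded; whether that should classify as clean is a question for the validators, not for this sketch.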

#### Statically refused (no entry in the matrix)

- Anything → `serial` (carried over from B2/C2)
- Anything → or from `blob` (v1 deferral; encoding ambiguity)
- Same-type identity (no-op; carried over from B2/C2)
- `date` ↔ `datetime` direct (deferred for v1; users route via `text` if needed)
- All cross-domain pairs not listed above (e.g. `bool` →
  `date`, `real` → `datetime`, `int` → `shortid`)

The relationship-involvement preconditions (§4) apply
*before* this matrix is consulted.
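
One way to picture the static half of the matrix is as a pure lookup over the type pair. The sketch below encodes the two tables above as a boolean predicate; `ColType` and `has_transformer` are hypothetical names, and a real implementation would presumably return the transformer itself rather than `bool`. It deliberately ignores the §4 preconditions.

```rust
/// Hypothetical enum of the user-facing column types named in this ADR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    Int,
    Serial,
    Real,
    Decimal,
    Bool,
    Text,
    Date,
    DateTime,
    ShortId,
    Blob,
}

/// Returns true when the matrix has an entry for (src, dst),
/// i.e. the conversion is attempted with a per-cell dry run.
pub fn has_transformer(src: ColType, dst: ColType) -> bool {
    use ColType::*;
    matches!(
        (src, dst),
        // Always-clean transformers.
        (Int | Serial, Real | Decimal | Text)
            | (Serial, Int)
            | (Bool, Int | Real | Decimal | Text)
            | (Decimal | Date | DateTime | ShortId | Real, Text)
            // Per-cell-classified transformers.
            | (Real, Int | Decimal | Bool)
            | (Decimal, Int | Real | Bool)
            | (Int, Bool)
            | (Text, Int | Real | Decimal | Bool | Date | DateTime | ShortId)
    )
}
```

Everything that falls through the `matches!` arms (same-type identity, any `blob` pair, any `serial` target, `date` ↔ `datetime`, and the remaining cross-domain pairs) is a static refusal.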

### 4. Primary-key and uniqueness-bearing columns

The B2/C2 implementation refused all type changes to PK
columns and to any column involved in a declared
relationship. That was too coarse: it conflated two
concerns that should be split.

#### 4.1 Inbound foreign keys: when does the cascade actually bite?

The cascade only matters when the new type would change
the FK target type that referencing columns must have. Per
ADR-0011's `fk_target_type()` rule:

- `serial.fk_target_type() == Int`
- `shortid.fk_target_type() == Text`
- All other types: identity

So `serial → int` on a PK preserves `fk_target_type` (both
yield `Int`); FK columns referencing the PK stay `int`,
the underlying storage is unchanged, no cascade is needed.
Same for `shortid → text` and `text → shortid` (both yield
`Text`).

The precondition is therefore:

> If the column has any inbound FK *and*
> `old_type.fk_target_type() != new_type.fk_target_type()`,
> refuse with a friendly cascade message
> ("`<T>.<col>` is referenced by N relationship(s); changing
> its type to `<dst>` would change the type that referencing
> columns require — drop those relationships first or pick a
> target type whose FK shape matches the current one").
> Otherwise allow.
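
A minimal sketch of this precondition, assuming a `ColType` enum and an `inbound_fk_allows` helper (both hypothetical names; only the `fk_target_type()` mapping itself comes from ADR-0011 as quoted above):

```rust
/// Hypothetical enum of column types; only the variants needed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType {
    Int,
    Serial,
    Real,
    Decimal,
    Bool,
    Text,
    Date,
    DateTime,
    ShortId,
}

impl ColType {
    /// The type a referencing (child) column must have.
    pub fn fk_target_type(self) -> ColType {
        match self {
            ColType::Serial => ColType::Int,
            ColType::ShortId => ColType::Text,
            other => other, // identity for all other types
        }
    }
}

/// True when the inbound-FK precondition lets the change go ahead.
pub fn inbound_fk_allows(old: ColType, new: ColType, inbound_fk_count: usize) -> bool {
    inbound_fk_count == 0 || old.fk_target_type() == new.fk_target_type()
}
```

With two inbound relationships, `serial → int` passes because both sides map to `Int`, while `int → text` is refused because referencing columns would have to change type.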

This unblocks the most-natural PK conversion (`serial →
int`, removing auto-increment while preserving stored
values) and `shortid ↔ text` round-trips on PKs that have
real-world relationships.

#### 4.2 Outbound foreign keys: refuse for v1

If the column is itself an FK (the child side of a
relationship), changing its type would either require its
new type to match the parent's `fk_target_type` (which
typically reduces to a no-op) or break the constraint. v1
refuses outbound-FK type changes outright; the user drops
the relationship first.

#### 4.3 Uniqueness-bearing columns: post-transformation collision check

Some types and constraints carry a uniqueness contract
that the per-cell classification can't see — multiple
distinct source values can collapse to the same target
value under a lossy transformation, violating the
contract even though every individual cell transformed
"successfully."

In v1 the uniqueness check applies to:

- **Primary-key columns** (the SQL-level UNIQUE+NOT NULL
  guarantee).
- **shortid columns** (the design-level contract that
  shortids are unique short identifiers, even when
  not the PK).
- *(Future)* **`UNIQUE` constraint columns** when C3's
  full constraint set lands.

After the per-cell pass produces transformed values, the
transformed values are checked for duplicates. Any
collision is **incompatible** (cross-row, structural —
no `--force-conversion` override; the user must clean
the source data). The error reports the colliding rows:

> Cannot change `T.col` from real to int: 2 row(s)
> would collapse to the same value.
>
>   row 5  ('3.14') and row 12 ('3.7') would both
>          become '3'.

In practice the only conversion into `shortid` that
exercises both the per-cell shortid-grammar check *and*
the uniqueness check is `text → shortid`: every other
source type is statically refused by the §3 matrix (e.g.
`int → shortid`) before uniqueness becomes relevant.
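
The post-transformation duplicate check can be sketched as a grouping pass over `(primary key, transformed value)` pairs. This is an illustration under simplifying assumptions: `find_collisions` is a hypothetical name, and both the PK and the transformed value are modelled as `i64` (a single integer PK and an integer target type).

```rust
use std::collections::BTreeMap;

/// Group rows by transformed value and return every target value
/// that more than one source row would collapse to.
/// Input pairs are (primary key, transformed value).
pub fn find_collisions(rows: &[(i64, i64)]) -> Vec<(i64, Vec<i64>)> {
    let mut by_target: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for &(pk, transformed) in rows {
        by_target.entry(transformed).or_default().push(pk);
    }
    // Keep only targets with two or more source rows.
    by_target
        .into_iter()
        .filter(|(_, pks)| pks.len() > 1)
        .collect()
}
```

For the example above, rows 5 and 12 both transform to `3`, so the function reports one collision on the value `3` with source rows `[5, 12]`; an empty result means the uniqueness contract survives the conversion.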

#### 4.4 Combined preconditions

Putting §4.1, §4.2, §4.3 together, a `change column …`
invocation is refused at the precondition stage when any
of:

- The column is the *child* side of a relationship
  (outbound FK on this column).
- The column is the *parent* side of a relationship and
  `old_type.fk_target_type() != new_type.fk_target_type()`.

…and is refused after the per-cell dry run, as a
cross-row incompatibility that no flag overrides, when:

- The column is uniqueness-bearing (PK, shortid, or future
  UNIQUE) and the transformed values contain duplicates.

Otherwise the matrix's per-cell classification governs.

### 5. Override flags

Both flags are opt-in per ADR-0009's `--<name>` convention.

#### `--force-conversion`

Skips the lossy-refusal in step 4 of §2. **Does not**
change the static refusal (step 1) or the incompatible
refusal (step 3): no flag makes `text "abc" → int` work.

With the flag set, the dry run still classifies cells and
lossy cells are transformed per the matrix. The client-side
note (§6) then reports both the count of cells that needed
transformation *and* the count that were lossy (split by
classification).

**Forward-look (not in this ADR's implementation scope).** A
later iteration may extend the grammar to let the user
specify resolutions for incompatibles, e.g.:

```
change column T: c (int) --default 0
change column T: c (int) --on-incompatible '0'
```

…which would land cells that fail the parse with the given
default value. This generalises `--force-conversion` from a
binary "accept loss" toggle into a continuum and gives the
learner practical experience with the kinds of resolutions
that real-world data work needs. The current ADR
deliberately doesn't commit to syntax for that — the
binary `--force-conversion` is enough for v1, and the
forward-look exists to preserve the design space rather
than constrain a future ADR.

#### `--dont-convert`

Skips the entire client-side layer: no transformer, no
dry-run, no per-cell classification. The source column's
raw cells are handed to the rebuild's `INSERT INTO new
SELECT … FROM old` step, and the database's STRICT typing
decides. Engine error text is never surfaced verbatim;
failures are reported via the same friendly-error layer
the rest of the app uses.

This is the escape hatch for users who explicitly want to
see what the database itself will do — a pedagogical lever
for "what does raw SQL behaviour look like here?" without
dropping into advanced mode.

#### Mutual exclusion

`--force-conversion` and `--dont-convert` are mutually
exclusive. Specifying both is a parse error: forcing
client-side conversion while disabling client-side
conversion is contradictory. The error message names both
flags and says "pick one."
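
A small sketch of how the mutual-exclusion rule might be enforced at parse time. `ConversionMode` and `parse_conversion_flags` are hypothetical names, the flags are modelled as plain string slices rather than the real DSL token stream, and the error wording is illustrative only.

```rust
/// Hypothetical resolved mode for a `change column` invocation.
#[derive(Debug, PartialEq, Eq)]
pub enum ConversionMode {
    Default,
    ForceConversion,
    DontConvert,
}

/// Resolve the two opt-in flags, rejecting the contradictory combination.
pub fn parse_conversion_flags(flags: &[&str]) -> Result<ConversionMode, String> {
    let force = flags.contains(&"--force-conversion");
    let raw = flags.contains(&"--dont-convert");
    match (force, raw) {
        // Both set: a parse error that names both flags.
        (true, true) => Err(
            "`--force-conversion` and `--dont-convert` cannot be combined; pick one."
                .to_string(),
        ),
        (true, false) => Ok(ConversionMode::ForceConversion),
        (false, true) => Ok(ConversionMode::DontConvert),
        (false, false) => Ok(ConversionMode::Default),
    }
}
```

Resolving the flags into a single enum up front means later stages never have to consider the contradictory state.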

### 6. Reporting: the "client-side conversion was applied" note

When a successful change involves any non-identity
transformation (i.e. cells were rewritten before reaching
the database), the success summary includes a line of the
form:

> [client-side] N row(s) were transformed before being
> stored. In raw SQL this would need an explicit `CAST` or
> application-level code.

When `--force-conversion` succeeded with lossy rows, the
note adds the lossy count specifically:

> [client-side] N row(s) transformed; M of those discarded
> information (lossy). In raw SQL this would need an
> explicit `CAST` or application-level code.

User-facing strings throughout this ADR — and throughout
the application generally — never name the underlying
database engine. The engine is an implementation detail;
the playground's pedagogical surface is "the database" in
the abstract. (ADR-internal prose still references SQLite
where technically necessary for the spec writer; that's
not user-visible.)

The note's purpose is pedagogical, not diagnostic — it
points at the moment where the tool went beyond what bare
SQL allows. Without it, a learner would have no way to
know that "this just worked" was actually the playground
doing them a favour.
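
The two note variants quoted in this section can be produced by one small formatter. The sketch below assumes a hypothetical `client_side_note` function that receives the transformed and lossy counts from the dry run; it returns `None` when no cell needed transformation, matching the "non-identity" condition.

```rust
/// Build the `[client-side]` note, or `None` if nothing was transformed.
/// `transformed` counts cells rewritten before storage; `lossy` counts
/// the subset that discarded information (only possible with
/// `--force-conversion`).
pub fn client_side_note(transformed: usize, lossy: usize) -> Option<String> {
    if transformed == 0 {
        return None;
    }
    let tail = "In raw SQL this would need an explicit `CAST` or application-level code.";
    let note = if lossy == 0 {
        format!(
            "[client-side] {} row(s) were transformed before being stored. {}",
            transformed, tail
        )
    } else {
        format!(
            "[client-side] {} row(s) transformed; {} of those discarded information (lossy). {}",
            transformed, lossy, tail
        )
    };
    Some(note)
}
```

Neither string names the database engine, in keeping with the rule on user-facing text.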

### 7. Error presentation

Tabular detail in both error and success output is
rendered through the pretty-table renderer (ADR-0016) —
no ad-hoc indented-line layouts. The rule is: anywhere
the output describes more than a handful of rows of
structured per-row detail, it goes through
`render_data_table` (or an equivalent helper). This keeps
the visual identity consistent across DDL, query results,
and these conversion diagnostics.

#### Lossy refusals

```
Cannot change `T.col` from real to int: 50 row(s) would
discard information.

┌─────────┬───────┬─────┬───────────────────────────────────┐
│ id (PK) │ From  │ To  │ Reason                            │
├─────────┼───────┼─────┼───────────────────────────────────┤
│       5 │ 3.14  │   3 │ truncated; would discard 0.14     │
│      12 │ 2.71  │   2 │ truncated; would discard 0.71     │
│      18 │ 1.5   │   1 │ truncated; would discard 0.5      │
│       … │     … │   … │ … and 47 more                     │
└─────────┴───────┴─────┴───────────────────────────────────┘

if you want to execute this conversion in spite of the
problems, re-run with `--force-conversion`.
```

#### Incompatible refusals

```
Cannot change `T.col` from text to int: 3 row(s) cannot
be converted.

┌─────────┬───────┬───────────────────────┐
│ id (PK) │ Value │ Reason                │
├─────────┼───────┼───────────────────────┤
│       3 │ abc   │ not a valid int       │
│       7 │ x42   │ not a valid int       │
│      12 │       │ not a valid int       │
└─────────┴───────┴───────────────────────┘
```

The trailing `--force-conversion` hint is omitted for
incompatibles (no flag helps; future syntax — §5
forward-look — would re-introduce one).

#### Uniqueness collisions

```
Cannot change `T.col` from real to int: 1 collision(s)
would violate uniqueness.

┌─────────┬──────────────────┬──────────────────┐
│ Becomes │ Source rows (id) │ Source values    │
├─────────┼──────────────────┼──────────────────┤
│       3 │ 5, 12            │ 3.14, 3.7        │
└─────────┴──────────────────┴──────────────────┘
```

#### Common rules

- Each detail table is **capped at 100 rows**. Beyond that,
  a single trailing row is rendered inside the table, with
  `…` placeholders and the literal text "and N more"
  inside that row, rather than as a footer line. This keeps
  the bordered shape intact.
- Rows are identified by their **primary-key value(s)**, not
  by positional indices. SQLite returns rows in unspecified
  order without `ORDER BY`, so a positional "row 5" would
  not be reproducible or addressable by the user. The PK is
  the natural row identifier in a relational setting and is
  what the user would type in a `where` clause to find or
  fix the offending cell.
  - Single PK: rendered as one column whose header is the
    PK column name with a trailing `(PK)` marker
    (e.g. `id (PK)`); cells carry the raw PK value with no
    `column=` prefix. The marker appears once per table, in
    the header.
  - Compound PK: one column per PK component, each header
    annotated `(PK)` (e.g. `a (PK)`, `b (PK)`); cells carry
    the raw component values.
  - Uniqueness-collision tables list the colliding rows'
    PK values comma-separated inside a single `Source rows`
    cell whose header carries the PK column name(s) in
    parentheses (e.g. `Source rows (id)` or `Source rows
    (a, b)`). Compound-PK source rows render as tuples:
    `(1,2), (1,3)`.
- The change-column command always operates on a table
  with at least one PK column (every `create table` in v1
  produces a PK; the AST permits PK-less tables, but no
  grammar produces one today). If a PK-less surface ever
  lands, this section will be revisited.
- Numeric PK and "Becomes" columns inherit numeric
  right-alignment from ADR-0016 §2.
- Cells that would render multi-line content (for `text →`
  conversions where source values contain newlines) honour
  ADR-0016 §3's `↵` substitution, so the table stays one
  display row per logical row.
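
The row cap from the first rule can be sketched as a pre-processing step applied before the rows reach the table renderer. `cap_detail_rows` and `DETAIL_ROW_CAP` are hypothetical names, rows are modelled as vectors of already-formatted cell strings, and placing the "and N more" text in the last cell follows the lossy-refusal example rather than any stated rule.

```rust
/// Maximum number of detail rows shown before the summary row.
pub const DETAIL_ROW_CAP: usize = 100;

/// Truncate a detail table to the cap and append one in-table
/// summary row made of `…` placeholders plus "… and N more".
pub fn cap_detail_rows(mut rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
    if rows.len() <= DETAIL_ROW_CAP {
        return rows;
    }
    let hidden = rows.len() - DETAIL_ROW_CAP;
    // Safe: rows has more than DETAIL_ROW_CAP entries here.
    let columns = rows[0].len();
    rows.truncate(DETAIL_ROW_CAP);
    let mut tail = vec!["…".to_string(); columns];
    if let Some(last) = tail.last_mut() {
        *last = format!("… and {} more", hidden);
    }
    rows.push(tail);
    rows
}
```

Because the summary is an ordinary row, the renderer needs no special footer handling and the bordered shape is preserved.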

### 8. Out of scope

- **OOS-1.** Anything ↔ `blob` conversion. Encoding
  ambiguity (base64? raw bytes? UTF-8 attempt?) deserves
  its own discussion.
- **OOS-2.** `date` ↔ `datetime` direct conversion. Format
  rewriting is small but warrants a per-conversion test
  matrix; defer until a real user need surfaces.
- **OOS-3.** Resolution-specifying flags (`--default`,
  `--on-incompatible '<value>'`, etc.) per the §5
  forward-look.
- **OOS-4.** Bulk conversions across multiple columns in
  one command. Each `change column` runs independently.
- **OOS-5.** Cross-row contextual transformations (e.g.
  "rank by value to fit a smaller numeric range"). The
  transformer is per-cell, deliberately stateless.

## Consequences

- The placeholder behaviour ("rely on SQLite STRICT") is
  replaced with a documented per-cell model that produces
  learner-friendly errors and pedagogical client-side
  notes.
- The transformer matrix is an additional surface to keep
  in step with `dsl/value.rs`'s validators. Each new
  user-facing type added to the type vocabulary needs its
  matrix entries reviewed.
- The `[client-side] …` note is the load-bearing
  pedagogical artefact: it's how a learner discovers that
  the tool did them a favour. Future visualisation /
  styling work (V4) should preserve its prominence.
- `--dont-convert` keeps the door open to a "raw SQL
  behaviour" learning mode without forcing the user into
  advanced mode.
- The forward-look in §5 means `--force-conversion`'s
  semantics may broaden later. Implementations should
  treat the flag's effect as "accept loss" rather than as
  the canonical resolution mechanism.

## Relationship to earlier ADRs

- **ADR-0013** — the rebuild-table primitive remains the
  mechanism. This ADR adds a per-row transformation step
  *between* read-out and write-back; the existing primitive
  can be parametrised by a row-by-row transformer or paired
  with a sibling helper.
- **ADR-0014** — the per-type validators in `dsl/value.rs`
  power the dry-run classification. No changes to those
  validators; they're consumed read-only here.
- **ADR-0011** — FK target-type compatibility is
  unaffected. Relationship-involved columns are checked
  against the §4 preconditions, which reuse its
  `fk_target_type()` rule, before this ADR's matrix is
  consulted.
- **ADR-0009** — the `--force-conversion` and
  `--dont-convert` flags follow the established
  long-flag opt-in convention.