diff --git a/CLAUDE.md b/CLAUDE.md index ebfc2ac..d16475f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -156,6 +156,14 @@ Key invariants in the code: - **Pedagogy wins ties.** When a design choice trades clarity for raw capability, prefer clarity. Real RDBMS power-user features exist; this app is not the place to teach them. +- **No engine name in user-facing strings.** The choice of + database engine is an implementation detail per ADR-0002 + (User-facing posture). Error messages, success notes, + help text, and any other user-visible string refer to + "the database" or "the engine" in the abstract — never + the specific product (SQLite, STRICT, rusqlite, PRAGMA). + ADR-internal prose and code comments may name it where + technically necessary for precision. - **Confirm commits.** Per the user's global rules, every `git commit` is preceded by an explicit message proposal and user approval. No AI attribution in commit messages. diff --git a/docs/adr/0002-database-engine.md b/docs/adr/0002-database-engine.md index ec3a86f..0c0704f 100644 --- a/docs/adr/0002-database-engine.md +++ b/docs/adr/0002-database-engine.md @@ -39,6 +39,30 @@ Use **SQLite** via the `rusqlite` crate. All tables are created as Simplified user-facing column types (see ADR-0005) are mapped to the underlying SQLite STRICT types at parse time. +## User-facing posture + +The choice of SQLite is an implementation detail. The +playground's user-facing surface — error messages, success +notes, help text, and any other string the user sees — +never names the underlying engine. "The database" or "the +engine" in the abstract is the canonical phrasing; the +specific product (SQLite, STRICT, rusqlite, PRAGMA) is +ADR-internal and code-comment vocabulary only. + +Rationale: students should leave with knowledge that +generalises. Naming the engine in user-visible strings +ties their mental model to a specific product when the +lessons are about relational concepts. The friendly-error +layer (H1) wraps engine error text before surfacing; +advanced mode's SQL surface doesn't expose the engine name +either — the user writes SQL because SQL is the language, +not because SQLite is the engine. + +This commitment is referenced from later ADRs (notably +ADR-0017 §6 on the client-side conversion note); when in +doubt about a user-facing string's wording, this section +governs. + ## Consequences - Tight, low-friction packaging — `rusqlite` bundles SQLite. diff --git a/docs/adr/0017-column-type-change-compatibility.md b/docs/adr/0017-column-type-change-compatibility.md new file mode 100644 index 0000000..f9d3b3b --- /dev/null +++ b/docs/adr/0017-column-type-change-compatibility.md @@ -0,0 +1,477 @@ +# ADR-0017: Column type-change compatibility + +## Status + +Accepted + +## Context + +ADR-0013 introduced the rebuild-table primitive that lets us +change a column's type by reading-out, recreating-with-new-shape, +and writing-back. ADR-0014 specified the value model (per-type +literal validators in `dsl/value.rs`). The B2/C2 work that +landed `change column [in] [table] : ()` +plumbed those together but left a real spec gap: what +conversions are allowed, what happens to data that doesn't +cleanly fit the new type, and how the user is told. + +The current behaviour ("rely on SQLite STRICT to reject +incompatible cells; surface whatever error it produces") is +a placeholder — pedagogically poor (raw STRICT errors are +not learner-friendly) and silently permissive in cases +where SQLite would coerce values in ways the user might not +intend. + +This ADR specifies a curated compatibility model that +combines a static "is this conversion attempted at all" +matrix with a per-cell runtime classification, plus +two opt-in flags that give the user control over the trade +between safety and force. + +## Decision + +### 1. Per-cell outcome classification + +Every cell of the column being changed is classified into +exactly one of three outcomes during a **dry-run pass that +runs before any SQL writes**: + +- **Clean** — a transformer produces a value the new type + accepts without information loss. The rebuild proceeds + and stores the transformed value. +- **Lossy** — a transformer produces a valid value, but + some property of the original cell (precision, fractional + part, time component, …) is discarded. By default these + rows refuse the operation; the user opts in to the loss + with `--force-conversion`. +- **Incompatible** — no transformer for this pair can + produce a valid value for this cell. The operation is + refused; no flag overrides this. The user must + pre-process the data (or wait for a future syntax + extension; see §5 forward-look). + +The classifications are properties of *(source type, target +type, cell value)*, not just of the type pair. The same +type pair (`real` → `int`) yields **clean** for cells +storing whole numbers and **lossy** for cells with +fractional parts. + +### 2. Default decision tree + +A `change column …` invocation without flags: + +1. **Static refusal check.** If the type pair has no + transformer at all (see §3 — e.g. `bool → date`, + anything ↔ `blob`, anything → `serial`), refuse + immediately with a friendly explanation of the + incompatibility class. +2. **Per-cell dry run.** Apply the transformer to each + cell; classify into clean / lossy / incompatible. +3. **Refuse on incompatibles.** If any cell classifies + incompatible, refuse and present a list (capped at + 100 rows; tail rendered as "… and N more"). Each + listed row identifies the row index and the offending + value plus a short reason ("not a valid int", "not + '0' or '1'", etc.). The error does NOT mention + `--force-conversion`: that flag does not help with + incompatibles. +4. **Refuse on lossy (default).** If all cells are clean + or lossy and at least one is lossy, refuse with a + capped list of the lossy rows (showing the would-be + transformation: `row 5: 3.14 → 3 (truncated)`), + followed by the message *"if you want to execute this + conversion in spite of the problems, re-run with + `--force-conversion`."* +5. **All clean.** Proceed with the rebuild. If the + transformer was non-identity (i.e. any cell required + actual transformation rather than passing through + unchanged), emit the **client-side note** (§6) + alongside the success summary. + +### 3. Static transformer matrix + +Pair-wise: a `Some(transformer)` means "attempt with a +per-cell dry-run." `None` means "static refusal" (§2 step +1). + +The transformers below cover the **numeric and text +universe**. Cells of `serial` columns appear only on the +source side (target `serial` is statically refused); +relationship-involved columns and PK columns are also +statically refused (carried over from B2/C2). Anything ↔ +`blob` is deferred for v1. + +#### Always-clean transformers (no per-cell loss possible) + +| Source | Target | Notes | +|------------------|-----------|-------------------------------------------------| +| `int` / `serial` | `real` | widening; precision caveat for ¦v¦ > 2⁵³ noted in docs but not policed | +| `int` / `serial` | `decimal` | exact decimal representation | +| `int` / `serial` | `text` | stringify | +| `bool` | `int` | 0/1 | +| `bool` | `real` | 0.0/1.0 | +| `bool` | `decimal` | "0"/"1" | +| `bool` | `text` | "true"/"false" — matches the DSL boolean grammar (§5 of ADR-0014), not SQLite's native integer stringification | +| `decimal` | `text` | already text-backed under STRICT | +| `date` | `text` | same | +| `datetime` | `text` | same | +| `shortid` | `text` | same | +| `real` | `text` | shortest-round-trip decimal form | + +#### Per-cell-classified transformers (clean OR lossy OR incompatible per cell) + +| Source | Target | Per-cell classification | +|------------|-----------|---------------------------------------------------------------------------------------------------------------| +| `real` | `int` | clean when the value is exactly representable as an integer (e.g. `3.0`); lossy when there's a fractional part (e.g. `3.14 → 3`); never incompatible | +| `real` | `decimal` | clean when the f64 round-trips through the decimal grammar; lossy otherwise (precision artifacts) | +| `real` | `bool` | clean when value is exactly `0.0` or `1.0`; incompatible otherwise | +| `decimal` | `int` | clean when integer-valued; lossy if fractional | +| `decimal` | `real` | clean when it fits f64 exactly; lossy on precision loss | +| `decimal` | `bool` | clean for exact `0` / `1`; incompatible otherwise | +| `int` | `bool` | clean for `0` / `1`; incompatible otherwise | +| `text` | `int` | narrowest-first chain: try `int` parse (clean); fall back to `real` parse and truncate (lossy); else incompatible | +| `text` | `real` | try `real` parse (clean); else `decimal` parse if it fits f64 (clean) or doesn't (lossy); else incompatible | +| `text` | `decimal` | try `decimal` grammar (clean); else `real` parse (lossy precision); else incompatible | +| `text` | `bool` | "true" / "false" (case-insensitive) → clean; everything else incompatible (no implicit `0` / `1` parse — matches the DSL boolean grammar) | +| `text` | `date` | match `YYYY-MM-DD` → clean; everything else incompatible | +| `text` | `datetime`| ISO-8601 datetime → clean; bare date → lossy with implicit `T00:00:00Z`; else incompatible | +| `text` | `shortid` | base58 alphabet, 10–12 chars (per ADR-0014) → clean; else incompatible | + +#### Statically refused (no entry in the matrix) + +- Anything → `serial` (carried over from B2/C2) +- Anything → or from `blob` (v1 deferral; encoding ambiguity) +- Same-type identity (no-op; carried over from B2/C2) +- `date` ↔ `datetime` direct (deferred for v1; users route via `text` if needed) +- All cross-domain pairs not listed above (e.g. `bool` → + `date`, `real` → `datetime`, `int` → `shortid`) + +The relationship-involvement preconditions (§4) apply +*before* this matrix is consulted. + +### 4. Primary-key and uniqueness-bearing columns + +The B2/C2 implementation refused all type changes to PK +columns and to any column involved in a declared +relationship. That was too coarse: it conflated two +concerns that should be split. + +#### 4.1 Inbound foreign keys: when does the cascade actually bite? + +The cascade only matters when the new type would change +the FK target type that referencing columns must have. Per +ADR-0011's `fk_target_type()` rule: + +- `serial.fk_target_type() == Int` +- `shortid.fk_target_type() == Text` +- All other types: identity + +So `serial → int` on a PK preserves `fk_target_type` (both +yield `Int`); FK columns referencing the PK stay `int`, +the underlying storage is unchanged, no cascade is needed. +Same for `shortid → text` and `text → shortid` (both yield +`Text`). + +The precondition is therefore: + +> If the column has any inbound FK *and* +> `old_type.fk_target_type() != new_type.fk_target_type()`, +> refuse with a friendly cascade message +> ("`.` is referenced by N relationship(s); changing +> its type to `` would change the type that referencing +> columns require — drop those relationships first or pick a +> target type whose FK shape matches the current one"). +> Otherwise allow. + +This unblocks the most-natural PK conversion (`serial → +int`, removing auto-increment while preserving stored +values) and `shortid ↔ text` round-trips on PKs that have +real-world relationships. + +#### 4.2 Outbound foreign keys: refuse for v1 + +If the column is itself an FK (the child side of a +relationship), changing its type would either require its +new type to match the parent's `fk_target_type` (which +typically reduces to a no-op) or break the constraint. v1 +refuses outbound-FK type changes outright; the user drops +the relationship first. + +#### 4.3 Uniqueness-bearing columns: post-transformation collision check + +Some types and constraints carry a uniqueness contract +that the per-cell classification can't see — multiple +distinct source values can collapse to the same target +value under a lossy transformation, violating the +contract even though every individual cell transformed +"successfully." + +In v1 the uniqueness check applies to: + +- **Primary-key columns** (the SQL-level UNIQUE+NOT NULL + guarantee). +- **shortid columns** (the design-level contract that + shortids are unique short identifiers, even when + not the PK). +- *(Future)* **`UNIQUE` constraint columns** when C3's + full constraint set lands. + +After the per-cell pass produces transformed values, the +transformed values are checked for duplicates. Any +collision is **incompatible** (cross-row, structural — +no `--force-conversion` override; the user must clean +the source data). The error reports the colliding rows: + +> Cannot change `T.col` from real to int: 2 row(s) +> would collapse to the same value. +> +> row 5 ('3.14') and row 12 ('3.7') would both +> become '3'. + +In practice the only realistic incoming conversion that +exercises both the per-cell shortid-grammar check *and* +the uniqueness check is `text → shortid`. Other source +types fail per-cell (`int 42` doesn't match the base58 + +length grammar) before uniqueness becomes relevant. + +#### 4.4 Combined preconditions + +Putting §4.1, §4.2, §4.3 together, a `change column …` +invocation is refused at the precondition stage when any +of: + +- The column is the *child* side of a relationship + (outbound FK on this column). +- The column is the *parent* side of a relationship and + `old_type.fk_target_type() != new_type.fk_target_type()`. + +…and is refused after the per-cell dry run (and so still +classifies as incompatible) when: + +- The column is uniqueness-bearing (PK, shortid, or future + UNIQUE) and the transformed values contain duplicates. + +Otherwise the matrix's per-cell classification governs. + +### 5. Override flags + +Both flags are opt-in per ADR-0009's `--` convention. + +#### `--force-conversion` + +Skips the lossy-refusal in step 4 of §2. **Does not** +change the static refusal (step 1) or the incompatible +refusal (step 3): no flag makes `text "abc" → int` work. + +When invoked, the dry run still classifies cells; lossy +cells transform per the matrix; the client-side note (§6) +includes both the count of cells that needed transformation +*and* the count that were lossy (split by classification). + +**Forward-look (not in this ADR's implementation scope).** A +later iteration may extend the grammar to let the user +specify resolutions for incompatibles, e.g.: + +``` +change column T: c (int) --default 0 +change column T: c (int) --on-incompatible '0' +``` + +…which would land cells that fail the parse with the given +default value. This generalises `--force-conversion` from a +binary "accept loss" toggle into a continuum and gives the +learner practical experience with the kinds of resolutions +that real-world data work needs. The current ADR +deliberately doesn't commit to syntax for that — the +binary `--force-conversion` is enough for v1, and the +forward-look exists to preserve the design space rather +than constrain a future ADR. + +#### `--dont-convert` + +Skips the entire client-side layer: no transformer, no +dry-run, no per-cell classification. Hands the source +column's raw cells to the rebuild's `INSERT INTO new +SELECT FROM old` step and lets the database's STRICT +typing decide. Engine error text is never surfaced +verbatim; failures are reported via the same +friendly-error layer the rest of the app uses. + +This is the escape hatch for users who explicitly want to +see what the database itself will do — a pedagogical lever +for "what does raw SQL behaviour look like here?" without +dropping into advanced mode. + +#### Mutual exclusion + +`--force-conversion` and `--dont-convert` are mutually +exclusive. Specifying both is a parse error: forcing +client-side conversion while disabling client-side +conversion is contradictory. The error message names both +flags and says "pick one." + +### 6. Reporting: the "client-side conversion was applied" note + +When a successful change involves any non-identity +transformation (i.e. cells were rewritten before reaching +the database), the success summary includes a line of the +form: + +> [client-side] N row(s) were transformed before being +> stored. In raw SQL this would need an explicit `CAST` or +> application-level code. + +When `--force-conversion` succeeded with lossy rows, the +note adds the lossy count specifically: + +> [client-side] N row(s) transformed; M of those discarded +> information (lossy). In raw SQL this would need an +> explicit `CAST` or application-level code. + +User-facing strings throughout this ADR — and throughout +the application generally — never name the underlying +database engine. The engine is an implementation detail; +the playground's pedagogical surface is "the database" in +the abstract. (ADR-internal prose still references SQLite +where technically necessary for the spec writer; that's +not user-visible.) + +The note's purpose is pedagogical, not diagnostic — it +points at the moment where the tool went beyond what bare +SQL allows. Without it, a learner would have no way to +know that "this just worked" was actually the playground +doing them a favour. + +### 7. Error presentation + +Tabular detail in both error and success output is +rendered through the pretty-table renderer (ADR-0016) — +no ad-hoc indented-line layouts. The rule is: anywhere +the output describes more than a handful of rows of +structured per-row detail, it goes through +`render_data_table` (or an equivalent helper). This keeps +the visual identity consistent across DDL, query results, +and these conversion diagnostics. + +#### Lossy refusals + +``` +Cannot change `T.col` from real to int: 50 row(s) would +discard information. + +┌─────┬───────┬─────┬───────────────────────────────────┐ +│ Row │ From │ To │ Reason │ +├─────┼───────┼─────┼───────────────────────────────────┤ +│ 5 │ 3.14 │ 3 │ truncated; would discard 0.14 │ +│ 12 │ 2.71 │ 2 │ truncated; would discard 0.71 │ +│ 18 │ 1.5 │ 1 │ truncated; would discard 0.5 │ +│ … │ … │ … │ … and 47 more │ +└─────┴───────┴─────┴───────────────────────────────────┘ + +if you want to execute this conversion in spite of the +problems, re-run with `--force-conversion`. +``` + +#### Incompatible refusals + +``` +Cannot change `T.col` from text to int: 3 row(s) cannot +be converted. + +┌─────┬───────┬───────────────────────┐ +│ Row │ Value │ Reason │ +├─────┼───────┼───────────────────────┤ +│ 3 │ abc │ not a valid int │ +│ 7 │ x42 │ not a valid int │ +│ 12 │ │ not a valid int │ +└─────┴───────┴───────────────────────┘ +``` + +The trailing `--force-conversion` hint is omitted for +incompatibles (no flag helps; future syntax — §5 +forward-look — would re-introduce one). + +#### Uniqueness collisions + +``` +Cannot change `T.col` from real to int: 1 collision(s) +would violate uniqueness. + +┌─────────┬─────────────────┬──────────────────┐ +│ Becomes │ Source rows │ Source values │ +├─────────┼─────────────────┼──────────────────┤ +│ 3 │ row 5, row 12 │ 3.14, 3.7 │ +└─────────┴─────────────────┴──────────────────┘ +``` + +#### Common rules + +- Each detail table is **capped at 100 rows**. Beyond that, + a single trailing row with `…` placeholders and the + literal text "and N more" inside the row is rendered + inside the table — not as a footer line. Keeps the + bordered shape intact. +- Row indices are 1-based to match how learners count + rows; the tool counts internally from 0. +- Numeric "Row" / "Becomes" columns inherit numeric + right-alignment from ADR-0016 §2. +- Cells that would render multi-line content (for `text →` + conversions where source values contain newlines) honour + ADR-0016 §3's `↵` substitution, so the table stays one + display row per logical row. + +### 8. Out of scope + +- **OOS-1.** Anything ↔ `blob` conversion. Encoding + ambiguity (base64? raw bytes? UTF-8 attempt?) deserves + its own discussion. +- **OOS-2.** `date` ↔ `datetime` direct conversion. Format + rewriting is small but warrants a per-conversion test + matrix; defer until a real user need surfaces. +- **OOS-3.** Resolution-specifying flags (`--default`, + `--on-incompatible ''`, etc.) per the §5 + forward-look. +- **OOS-4.** Bulk conversions across multiple columns in + one command. Each `change column` runs independently. +- **OOS-5.** Cross-row contextual transformations (e.g. + "rank by value to fit a smaller numeric range"). The + transformer is per-cell, deliberately stateless. + +## Consequences + +- The placeholder behaviour ("rely on SQLite STRICT") is + replaced with a documented per-cell model that produces + learner-friendly errors and pedagogical client-side + notes. +- The transformer matrix is an additional surface to keep + in step with `dsl/value.rs`'s validators. Each new + user-facing type added to the type vocabulary needs its + matrix entries reviewed. +- The `[client-side] …` note is the load-bearing + pedagogical artefact: it's how a learner discovers that + the tool did them a favour. Future visualisation / + styling work (V4) should preserve its prominence. +- `--dont-convert` keeps the door open to a "raw SQL + behaviour" learning mode without forcing the user into + advanced mode. +- The forward-look in §5 means `--force-conversion`'s + semantics may broaden later. Implementations should + treat the flag's effect as "accept loss" rather than as + the canonical resolution mechanism. + +## Relationship to earlier ADRs + +- **ADR-0013** — the rebuild-table primitive remains the + mechanism. This ADR adds a per-row transformation step + *between* read-out and write-back; the existing primitive + can be parametrised by a row-by-row transformer or paired + with a sibling helper. +- **ADR-0014** — the per-type validators in `dsl/value.rs` + power the dry-run classification. No changes to those + validators; they're consumed read-only here. +- **ADR-0011** — FK target-type compatibility is + unaffected. Relationship-involved columns are statically + refused before this ADR's matrix is consulted. +- **ADR-0009** — the `--force-conversion` and + `--dont-convert` flags follow the established + long-flag opt-in convention. diff --git a/docs/adr/README.md b/docs/adr/README.md index 4476cb6..f684bd0 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -22,3 +22,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0014 — Data operations, value literals, and the auto-show pattern](0014-data-operations-and-value-model.md) - [ADR-0015 — Project storage runtime](0015-project-storage-runtime.md) - [ADR-0016 — Pretty table rendering for data and structure views](0016-pretty-table-rendering.md) +- [ADR-0017 — Column type-change compatibility](0017-column-type-change-compatibility.md)