diff --git a/docs/adr/0029-column-constraints.md b/docs/adr/0029-column-constraints.md
new file mode 100644
index 0000000..7c2b979
--- /dev/null
+++ b/docs/adr/0029-column-constraints.md
@@ -0,0 +1,450 @@
+# ADR-0029: Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)
+
+## Status
+
+Accepted
+
+## Context
+
+The requirements checklist (`C3`) commits to the full
+schema-constraint surface. Most of it has landed across earlier
+ADRs: primary keys, single and compound, at create-table time;
+foreign keys with `ON DELETE` / `ON UPDATE` actions
+(ADR-0013); indexes (ADR-0025). What remains are the four
+*column-level* constraints every relational schema relies on:
+`NOT NULL`, `UNIQUE`, `CHECK`, and `DEFAULT`.
+
+They are core pedagogy. A learner cannot model a real schema
+without saying "this column is required", "these values must
+be distinct", "this value must be in range", or "use this
+when none is given". Until now the playground can only teach
+the primary key — which bundles NOT NULL + UNIQUE invisibly —
+and the relational shape of foreign keys. The four standalone
+constraints are the missing half of "what a column is".
+
+Three things make this the right moment, and shape the
+decision:
+
+1. **The rebuild-table primitive exists** (ADR-0013). SQLite's
+   `ALTER TABLE` cannot add `NOT NULL` / `UNIQUE` / `CHECK` to
+   an existing column; the rebuild dance is the accepted
+   recipe, and it is already load-bearing for
+   change-column-type and every relationship operation.
+2. **The per-cell dry-run model exists** (ADR-0017). Applying
+   a constraint to a *populated* column is the same shape of
+   problem as a type change: scan the existing data, classify
+   it, and refuse with a learner-friendly table of offending
+   rows *before* any SQL writes. ADR-0029 reuses that ethos.
+3. **The WHERE-expression grammar exists** (ADR-0026). A
+   `CHECK` constraint is a boolean expression over the
+   column.
+   Rather than invent a second expression language,
+   `CHECK` *reuses* the ADR-0026 `Expr` grammar through the
+   `Subgrammar` node — the same expression language the
+   learner already meets in `where` filters.
+
+`PRIMARY KEY` is **not** revisited here. It keeps its dedicated
+`with pk …` clause: it is table-level (compound keys span
+columns), it is declared once, and folding it into the
+per-column constraint suffix would muddy both surfaces. A PK
+column is implicitly `NOT NULL` and `UNIQUE`; this ADR makes
+that explicit in §9 by rejecting redundant declarations.
+
+## Decision
+
+### 1. The constraint vocabulary
+
+Four column-level constraints, each appearing **at most once**
+per column in v1:
+
+| Constraint | Payload                | Meaning                                        |
+|------------|------------------------|------------------------------------------------|
+| `not null` | —                      | the column rejects `NULL`                      |
+| `unique`   | —                      | all non-`NULL` values must be distinct         |
+| `default`  | a value literal        | value used when an `insert` omits the column   |
+| `check`    | a parenthesised `Expr` | every row must satisfy the boolean expression  |
+
+The payload literal for `default` is an ordinary value literal
+per the ADR-0014 value grammar. The `check` expression is an
+ADR-0026 `Expr`, reached through `Subgrammar` — see §2.
+
+`unique` follows SQLite's standard semantics: multiple `NULL`s
+do **not** collide (SQL's "NULLs are distinct" rule). This is
+stated explicitly because it is a common learner surprise and
+the friendly-error and dry-run text both depend on it.
+
+### 2. Grammar — declaring constraints
+
+#### 2.1 The constraint suffix (`create table`, `add column`)
+
+A column spec gains an optional, repeatable constraint suffix
+**after** the `(type)` group:
+
+```
+create table Users with pk
+  id(serial),
+  email(text) not null unique,
+  age(int) default 18 check (age >= 0)
+
+add column to Orders: note (text) default ''
+```
+
+- Standard SQL writes constraints after the data type
+  (`email TEXT NOT NULL UNIQUE`). The playground brackets the
+  type as `email(text)` — a pre-existing convention this ADR
+  does not relitigate — so the faithful adaptation is the same
+  keyword order *after* the bracketed type.
+- Constraint keywords are case-insensitive (ADR-0009) and may
+  appear in any order. Each kind at most once; a repeat
+  (`not null not null`) is a friendly error naming the
+  duplicated constraint.
+- `default <value>` takes one value literal.
+- `check (<expr>)` parenthesises its expression — matching
+  SQL's `CHECK (…)`, and giving the parser an unambiguous end
+  for the expression before the next constraint or the
+  column-separating comma. The expression is the ADR-0026
+  `Expr` grammar via `Node::Subgrammar(&expr::OR_EXPR)`.
+- The column-separating comma of the `with pk` list follows
+  the last constraint of a column, exactly as today.
+
+The constraint suffix is one shared grammar fragment, used by
+both `create table`'s column list and `add column`.
+
+#### 2.2 Constraints on an existing column (`add constraint …` / `drop constraint …`)
+
+Modifying the constraints of an *already-created* column is a
+first-class v1 surface. It reuses the established `add` / `drop`
+verb families and the dotted `<table>.<column>` reference
+(ADR-0013's `add 1:n relationship from <table>.<column>`):
+
+```
+add constraint not null to Users.email
+add constraint unique to Users.email
+add constraint default 18 to Users.age
+add constraint check (age >= 0) to Users.age
+
+drop constraint not null from Users.email
+drop constraint unique from Users.email
+drop constraint default from Users.age
+drop constraint check from Users.age
+```
+
+- `add constraint <constraint> to <table>.<column>` and
+  `drop constraint <kind> from <table>.<column>` join the
+  existing `add` / `drop` command `Choice`s as new forms. The
+  form word `constraint` discriminates them — exactly as
+  `column` / `index` / `relationship` discriminate the other
+  `add` / `drop` forms. Every `add` / `drop` stays uniformly
+  `<verb> <form> …`, which keeps the grammar hierarchy clean.
+- After `add constraint`, the `<constraint>` is exactly the
+  §2.1 suffix vocabulary: `not null`, `unique`,
+  `default <value>`, `check (<expr>)`. The constraint grammar
+  fragment is shared a third time.
+- After `drop constraint`, only the *kind* is named —
+  `not null`, `unique`, `default`, `check` — since at most one
+  of each exists per column.
+
+### 3. AST
+
+`ColumnSpec` (`src/dsl/command.rs`) gains the four constraint
+slots — each kind is at-most-one, so they are fields, not a
+`Vec`:
+
+```rust
+pub struct ColumnSpec {
+    pub name: String,
+    pub ty: Type,
+    pub not_null: bool,
+    pub unique: bool,
+    pub default: Option<Value>,
+    pub check: Option<Expr>,
+}
+```
+
+`Command::AddColumn` carries the same four slots alongside its
+`table` / `column` / `ty`.
+
+Two new commands cover §2.2:
+
+```rust
+Command::AddConstraint { table: String, column: String, constraint: Constraint }
+Command::DropConstraint { table: String, column: String, kind: ConstraintKind }
+
+pub enum Constraint { NotNull, Unique, Default(Value), Check(Expr) }
+pub enum ConstraintKind { NotNull, Unique, Default, Check }
+```
+
+`Constraint` carries the payload (`Default` / `Check`);
+`ConstraintKind` is payload-free, for `drop`.
+
+### 4. SQLite mapping and enforcement
+
+Every constraint maps directly to native SQLite column-DDL —
+nothing is enforced application-side that SQLite can enforce
+itself, consistent with the STRICT-tables posture (ADR-0002):
+
+```sql
+"email" TEXT NOT NULL UNIQUE
+"age" INTEGER DEFAULT 18 CHECK ("age" >= 0)
+```
+
+- `not null` → `NOT NULL`; `unique` → `UNIQUE`;
+  `default v` → `DEFAULT <literal>`.
+- `check` → `CHECK (<sql>)`. The `Expr` is compiled to
+  SQL the same way WHERE expressions are (`compile_expr`),
+  except DDL admits no `?` parameters, so literals are
+  inlined — the inlining helper from ADR-0028 §3
+  (`inline_params_for_display`) already does exactly this.
+  Identifiers are double-quoted; the emitted SQL names no
+  engine product, so the ADR-0002 rule holds.
+
+Constraint DDL flows through the existing column-DDL
+assembly (`schema_to_ddl`), so the rebuild-table primitive and
+every ALTER path pick it up automatically once the constraint
+data reaches `ReadColumn`.
+
+### 5. Applying a constraint to a populated column — the dry-run
+
+`add constraint not null` / `unique` / `check` against a
+column that already holds data can be *violated by that
+data*. Per the ADR-0017 ethos, the violation is caught by a
+**dry-run pass that runs before any SQL writes**, and reported
+as a learner-friendly table — never as a raw engine error.
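
A minimal sketch of the dry-run idea described above, in Rust. It is not the playground's implementation: `Rule`, `dry_run`, and the in-memory `Option<i64>` column are hypothetical stand-ins (a real pass would presumably query the database), and `Check` takes a plain function pointer instead of a compiled `Expr`. It assumes `NULL`s never collide for `unique` (per §1) and that a `NULL` value does not fail a `check`, as in SQL.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the constraints that need a pre-flight scan.
enum Rule {
    NotNull,
    Unique,
    /// Stand-in for a compiled CHECK expression over a non-NULL integer.
    Check(fn(i64) -> bool),
}

/// Returns the indices of rows that would violate `rule`.
/// An empty result means the dry-run passes and the constraint may be applied.
fn dry_run(rule: &Rule, column: &[Option<i64>]) -> Vec<usize> {
    match rule {
        // Every NULL row is a violation.
        Rule::NotNull => column
            .iter()
            .enumerate()
            .filter(|(_, v)| v.is_none())
            .map(|(i, _)| i)
            .collect(),
        // Group non-NULL values; any group with more than one row collides.
        Rule::Unique => {
            let mut groups: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
            for (i, v) in column.iter().enumerate() {
                if let Some(x) = v {
                    groups.entry(*x).or_default().push(i);
                }
            }
            groups
                .into_values()
                .filter(|rows| rows.len() > 1)
                .flatten()
                .collect()
        }
        // Only non-NULL values for which the predicate is false are violations.
        Rule::Check(pred) => column
            .iter()
            .enumerate()
            .filter(|(_, v)| matches!(v, Some(x) if !pred(*x)))
            .map(|(i, _)| i)
            .collect(),
    }
}
```

For a column holding `18, NULL, -3, 18`, this sketch reports row 1 for `NotNull`, rows 0 and 3 for `Unique`, and row 2 for a `Check` of `age >= 0`. Returning row indices rather than a boolean keeps enough information to render the offending rows back to the learner.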
+
+| Command                       | Pre-flight check                                          |
+|-------------------------------|-----------------------------------------------------------|
+| `add constraint not null`     | `SELECT … WHERE "col" IS NULL` — any row is a violation   |
+| `add constraint unique`       | non-`NULL` values grouped; any group of size > 1 collides |
+| `add constraint check (expr)` | `SELECT … WHERE NOT (expr)` — any row is a violation      |
+| `add constraint default`      | none — `DEFAULT` never touches existing rows              |
+| `drop constraint <kind>`      | none — removing a constraint cannot violate data          |
+
+On a violation the command is **refused** and the offending
+rows are rendered through the pretty-table renderer
+(ADR-0016 / ADR-0017 §7) — rows identified by primary-key
+value(s), capped at 100 with a trailing `… and N more` row.
+For `add constraint unique`, colliding rows are grouped by
+their shared value, mirroring ADR-0017 §7's
+uniqueness-collision table.
+
+When the dry-run passes, the constraint is applied through the
+rebuild-table primitive (ADR-0013) — SQLite cannot `ALTER` a
+constraint onto an existing column. An empty table passes the
+dry-run trivially.
+
+There is no `--force` override: a constraint that the data
+violates is not something the learner should be able to wave
+through. They fix the data (with `update` / `delete`) and
+retry. This is stricter than ADR-0017's `--force-conversion`
+because a constraint is a *rule*, not a lossy transform — a
+forced-through violation would leave the schema lying about
+the data.
+
+### 6. DEFAULT semantics
+
+- A `default` lets an `insert` **omit** the column in the
+  explicit-column form (`insert into T (a, b) values …`); the
+  omitted column takes its default. The natural-order short
+  form (`insert into T values (…)`) still requires every
+  non-auto-generated column positionally — there is no
+  positional "skip", exactly as in SQL.
+- `add constraint default` affects only *future* inserts;
+  existing rows are untouched (SQL-standard behaviour, and why
+  §5 needs no dry-run for it).
+- `default` is rejected on a `serial` or `shortid` column:
+  those carry their own auto-fill contract (ADR-0018), and a
+  second source of "the value when none is given" would be
+  ambiguous. The error says so.
+- `add column … not null` on a **non-empty** table requires a
+  `default` in the same spec — the new column is `NULL` in
+  every existing row, which `NOT NULL` forbids. Without a
+  default the command is refused with a message that explains
+  the rule and points at `default`. On an empty table it is
+  fine. (This is SQLite's own `ADD COLUMN` rule, surfaced as a
+  friendly error rather than a raw one.)
+- `add column … unique default <value>` on a table with more
+  than one row is refused: every existing row would receive the
+  same default value, colliding immediately. The error
+  explains the collision.
+
+### 7. Storage and round-trip
+
+`playground.db` is a derived artifact (ADR-0004 / ADR-0015);
+constraints must round-trip through `project.yaml` or they
+vanish on `rebuild` / `export` / `import`.
+
+- **`project.yaml`** — the `ColumnSchema` record gains
+  `not_null: bool`, `default: Option`, and
+  `check: Option`. (`unique: bool` already exists,
+  from ADR-0018's `serial` / `shortid` contract.) The `check`
+  expression is stored as its **canonical DSL text** — see the
+  `Expr` text renderer below.
+- **Metadata table** — `NOT NULL`, `UNIQUE`, and `DEFAULT` are
+  all recoverable from SQLite itself (`pragma_table_info`'s
+  `notnull` and `dflt_value`; `pragma_index_list` origin `u`),
+  so they need no metadata row. `CHECK` is *not* exposed by
+  any pragma — only by the raw `sqlite_master` SQL, which is
+  in engine syntax. So `__rdbms_playground_columns` carries a
+  nullable `check_expr TEXT` column holding the canonical DSL
+  text, keeping `describe` independent of engine-syntax
+  parsing.
+  It is part of the internal table's `CREATE TABLE`
+  definition — there are no existing databases to migrate.
+- **`Expr` → DSL text renderer** — a new `render_expr` (small;
+  the `Expr` tree from ADR-0026 is shallow) produces canonical
+  DSL text for an expression. One renderer, three consumers:
+  `project.yaml` serialization, the `check_expr` metadata
+  column, and the structure view (§8). The check round-trips
+  text → `Expr` (re-parsed on load) → text.
+
+### 8. Structure rendering
+
+`describe_table` already reports `primary_key` and `notnull`;
+`ColumnDescription` gains `unique: bool`,
+`default: Option`, and `check: Option`.
+`constraints_display` (`output_render.rs`) extends to list
+every constraint a column carries:
+
+```
+┌───────┬────────┬───────────────────────────────────┐
+│ Name  │ Type   │ Constraints                       │
+├───────┼────────┼───────────────────────────────────┤
+│ id    │ serial │ PK                                │
+│ email │ text   │ NOT NULL, UNIQUE                  │
+│ age   │ int    │ DEFAULT 18, CHECK age >= 0        │
+└───────┴────────┴───────────────────────────────────┘
+```
+
+The `CHECK` expression renders in DSL form (`render_expr`),
+not engine SQL.
+
+### 9. PK columns — redundant and impossible constraints
+
+A primary-key column is always implicitly `NOT NULL`. A
+*single-column* primary key is additionally `UNIQUE`; the
+columns of a *compound* primary key are **not** individually
+unique — only the combination is.
+
+This makes the rules precise:
+
+- Declaring `not null` on any PK column — in the suffix or via
+  `add constraint … to` — is a **friendly error**: the PK
+  already implies it. The message says so, rather than
+  silently ignoring it, so the learner learns *why*.
+- Declaring `unique` on a single-column-PK column is likewise
+  a friendly error. Declaring `unique` on a column of a
+  *compound* PK is **allowed and meaningful** — it adds an
+  individual-uniqueness rule the compound PK does not provide.
+- `drop constraint not null` from any PK column, and
+  `drop constraint unique` from a single-column-PK column, are
+  friendly errors: the primary key still enforces them, so
+  there is nothing to drop.
+- `default` and `check` on a PK column are always allowed —
+  the PK implies neither.
+
+Friendly errors here are deliberate teaching moments, not
+silent no-ops: the playground favours clarity over
+permissiveness even where SQLite itself would tolerate the
+redundant declaration.
+
+### 10. Friendly errors
+
+The friendly-error layer (ADR-0019) already enriches
+`NOT NULL` and `UNIQUE` violations from `insert` / `update`.
+Two additions:
+
+- **`CHECK` violation** — SQLite reports `CHECK constraint
+  failed`; a new catalog entry renders it as "the value for
+  `<column>` breaks the rule `<expr>`", with the offending
+  value and the DSL-form expression.
+- **The §5 / §6 refusals** — populated-column dry-run
+  failures, the `not null` / `unique` / `serial+default`
+  rejections — are all friendly catalog strings, with the
+  dry-run tables rendered per ADR-0016.
+
+### 11. Out of scope
+
+- **OOS-1.** Table-level `CHECK` constraints spanning multiple
+  columns. v1's `check` is per-column; its expression
+  typically references the column it sits on. A table-level
+  `check (…)` clause is a clean follow-up.
+- **OOS-2.** More than one `CHECK` per column. v1 caps at one;
+  SQL allows several.
+- **OOS-3.** Non-literal `DEFAULT`s — `CURRENT_TIMESTAMP`,
+  expressions. v1 takes a value literal only.
+- **OOS-4.** Named constraints (`CONSTRAINT <name> CHECK …`).
+  The playground's constraints are anonymous; only indexes and
+  relationships carry user names.
+- **OOS-5.** Inline `REFERENCES` foreign keys in the column
+  suffix. Relationships keep their dedicated `add 1:n
+  relationship` command (ADR-0013).
+- **OOS-6.** `--force`-style overrides for §5 — deliberately
+  omitted (see §5).
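
Looking back at §9, the rules for primary-key columns reduce to a small decision table. The sketch below is one hypothetical way to express it in Rust: `Kind`, `pk_conflict`, and the `pk_width` parameter (0 when the column is not part of the primary key, otherwise the number of key columns) are illustrative names, and the message strings are placeholders, not the catalog text.

```rust
/// Payload-free constraint kinds, mirroring the four kinds named in this ADR.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    NotNull,
    Unique,
    Default,
    Check,
}

/// Returns a refusal message when adding or dropping `kind` on the column
/// conflicts with what its primary key already enforces, or `None` when
/// the operation is allowed.
///
/// `pk_width` is an assumed encoding: 0 = not a PK column,
/// 1 = single-column PK, 2 or more = column of a compound PK.
fn pk_conflict(kind: Kind, pk_width: usize) -> Option<&'static str> {
    match (kind, pk_width) {
        // Not a PK column: the PK rules do not apply.
        (_, 0) => None,
        // Any PK column is implicitly NOT NULL.
        (Kind::NotNull, _) => Some("a primary-key column is already NOT NULL"),
        // Only a single-column PK is individually UNIQUE.
        (Kind::Unique, 1) => Some("a single-column primary key is already UNIQUE"),
        // A column of a compound PK may carry its own UNIQUE rule.
        (Kind::Unique, _) => None,
        // The PK implies neither DEFAULT nor CHECK.
        (Kind::Default, _) | (Kind::Check, _) => None,
    }
}
```

The same table covers both directions: a redundant `add constraint` and an impossible `drop constraint` are refused in exactly the cases where the primary key already enforces the rule.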
+
+## Consequences
+
+- `ColumnSpec` and `Command::AddColumn` grow four constraint
+  slots; two new commands (`AddConstraint` / `DropConstraint`)
+  and two new AST enums (`Constraint` / `ConstraintKind`)
+  land. Every exhaustive `match Command` gains arms.
+- A shared constraint-suffix grammar fragment is used in three
+  places: the `create table` column list, `add column`, and
+  the `add constraint … to` command. `CHECK` reuses the
+  ADR-0026 `Expr` grammar through `Subgrammar`.
+- The rebuild-table primitive (ADR-0013) becomes the path for
+  `add constraint …` / `drop constraint …` on populated
+  columns, joining its existing change-column-type and
+  relationship users.
+- A pre-flight dry-run (§5), in the ADR-0017 style, guards
+  every populated-column constraint addition; its violation
+  tables go through the ADR-0016 pretty-table renderer.
+- The internal metadata table `__rdbms_playground_columns`
+  carries a new `check_expr` column — its first change since
+  ADR-0012.
+- A new `Expr` → DSL-text renderer (`render_expr`) is added;
+  it is also reusable by any future feature that needs to
+  show an expression back to the user.
+- `project.yaml`'s `ColumnSchema` grows three fields; the
+  format stays backward-compatible (the new keys default to
+  "absent" — `not_null: false`, no `default`, no `check`).
+- The friendly-error layer gains a `CHECK`-violation entry.
+
+## Implementation notes
+
+A sensible build order, each step test-guarded:
+
+1. **AST + the constraint grammar fragment.** `ColumnSpec` /
+   `Command` changes; the shared constraint-suffix grammar;
+   `create table` and `add column` parse constraints into the
+   AST. No execution yet — but every exhaustive `match`
+   compiles. (Combine with step 2 if the `match` breakage
+   makes a green commit impossible, exactly as ADR-0028 did.)
+2. **Execution at create / add time.** `schema_to_ddl` emits
+   the constraint DDL; `do_create_table` / `do_add_column`
+   honour it, including the §6 `add column` rules.
+   The `Expr` → SQL compile for `CHECK`.
+3. **Storage round-trip.** `ColumnSchema` fields, the
+   `check_expr` metadata column, `render_expr`, and the
+   `project.yaml` read/write paths.
+4. **`add constraint …` / `drop constraint …`.** The two
+   commands, the rebuild-table path, and the §5 dry-run with
+   its pretty-table refusals.
+5. **Rendering + friendly errors.** `constraints_display`,
+   the `CHECK`-violation catalog entry, the structure view.
+6. **Typing-surface matrix cells** for the new grammar.
+
+## See also
+
+- ADR-0002 — STRICT tables; "no engine name in user-facing
+  strings" (the compiled `CHECK` SQL names no product).
+- ADR-0009 — DSL syntax conventions; keyword-led clauses.
+- ADR-0012 — internal column metadata; `__rdbms_playground_columns`.
+- ADR-0013 — the rebuild-table primitive, reused by §5; the
+  dotted `<table>.<column>` reference reused by §2.2.
+- ADR-0014 — the value-literal grammar, reused by `default`.
+- ADR-0016 — pretty-table rendering for the §5 dry-run tables.
+- ADR-0017 — the per-cell dry-run ethos §5 follows.
+- ADR-0018 — `serial` / `shortid` auto-fill, why `default` is
+  rejected on those columns; the existing `unique` field.
+- ADR-0026 — the `Expr` grammar `CHECK` reuses via `Subgrammar`.
+- ADR-0028 — the literal-inlining helper reused to compile
+  `CHECK` expressions into parameter-free DDL.
diff --git a/docs/adr/README.md b/docs/adr/README.md
index 72d33b8..00dc0ad 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -34,3 +34,4 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0026 — Complex WHERE expressions](0026-complex-where-expressions.md) — **Accepted**, stratified recursive expression grammar (`AND`/`OR`/`NOT`, comparisons, `LIKE`, `IS NULL`, `IN`, `BETWEEN`) for `update` / `delete` / `show data` filters; `show data` gains `where` + `limit`; adds the `Subgrammar` node and a recursive `Expr` AST (`C5a`)
 - [ADR-0027 — Input-field validity indicator](0027-input-validity-indicator.md) — **Accepted**, a debounced `[ERR]` / `[WRN]` marker at the input row's right edge, backed by a walker diagnostics-severity model (parse-outcome + schema-existence); advisory, never blocks submission (`S6`); Amendment 1 adds a `LIKE`-on-numeric-column WARNING
 - [ADR-0028 — Query plans (`EXPLAIN QUERY PLAN`)](0028-query-plans.md) — **Accepted**, an `explain` prefix command over `show data` / `update` / `delete`; an annotated, span-styled plan tree; introduces the `OutputLine` styled-runs mechanism (ADR-0016's deferred per-span styling) (`QA1` / `QA2`)
+- [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`)