Files
rdbms-playground/docs/adr/0029-column-constraints.md
claude@clouddev1 a049ff9aa0 docs: handoff 23 — ADR-0029 complete; tick C3
ADR-0029 (column constraints — NOT NULL / UNIQUE / CHECK /
DEFAULT) is fully implemented across the handoff-22 and
handoff-23 sessions. Ticks requirement C3, and corrects
ADR §10's CHECK-error wording to the compiled-SQL form per
the §7 storage deviation.
2026-05-19 18:56:50 +00:00

464 lines
20 KiB
Markdown

# ADR-0029: Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)
## Status
Accepted
## Context
The requirements checklist (`C3`) commits to the full schema-
constraint surface. Most of it has landed across earlier ADRs:
primary keys, single and compound, at create-table time;
foreign keys with `ON DELETE` / `ON UPDATE` actions
(ADR-0013); indexes (ADR-0025). What remains are the four
*column-level* constraints every relational schema relies on:
`NOT NULL`, `UNIQUE`, `CHECK`, and `DEFAULT`.
They are core pedagogy. A learner cannot model a real schema
without saying "this column is required", "these values must
be distinct", "this value must be in range", or "use this
when none is given". Until now the playground can only teach
the primary key — which bundles NOT NULL + UNIQUE invisibly —
and the relational shape of foreign keys. The four standalone
constraints are the missing half of "what a column is".
Three things make this the right moment, and shape the
decision:
1. **The rebuild-table primitive exists** (ADR-0013). SQLite's
`ALTER TABLE` cannot add `NOT NULL` / `UNIQUE` / `CHECK` to
an existing column; the rebuild dance is the accepted
recipe, and it is already load-bearing for change-column-
type and every relationship operation.
2. **The per-cell dry-run model exists** (ADR-0017). Applying
a constraint to a *populated* column is the same shape of
problem as a type change: scan the existing data, classify
it, and refuse with a learner-friendly table of offending
rows *before* any SQL writes. ADR-0029 reuses that ethos.
3. **The WHERE-expression grammar exists** (ADR-0026). A
`CHECK` constraint is a boolean expression over the
column. Rather than invent a second expression language,
`CHECK` *reuses* the ADR-0026 `Expr` grammar through the
`Subgrammar` node — the same expression language the
learner already meets in `where` filters.
`PRIMARY KEY` is **not** revisited here. It keeps its dedicated
`with pk …` clause: it is table-level (compound keys span
columns), it is declared once, and folding it into the
per-column constraint suffix would muddy both surfaces. A PK
column is implicitly `NOT NULL` and `UNIQUE`; this ADR makes
that explicit in §9 by rejecting redundant declarations.
## Decision
### 1. The constraint vocabulary
Four column-level constraints, each appearing **at most once**
per column in v1:
| Constraint | Payload | Meaning |
|--------------|------------------------|-------------------------------------------|
| `not null` | — | the column rejects `NULL` |
| `unique` | — | all non-`NULL` values must be distinct |
| `default` | a value literal | value used when an `insert` omits the column |
| `check` | a parenthesised `Expr` | every row must satisfy the boolean expression |
The payload literal for `default` is an ordinary value literal
per the ADR-0014 value grammar. The `check` expression is an
ADR-0026 `Expr`, reached through `Subgrammar` — see §2.
`unique` follows SQLite's standard semantics: multiple `NULL`s
do **not** collide (SQL's "NULLs are distinct" rule). This is
stated explicitly because it is a common learner surprise and
the friendly-error and dry-run text both depend on it.
### 2. Grammar — declaring constraints
#### 2.1 The constraint suffix (`create table`, `add column`)
A column spec gains an optional, repeatable constraint suffix
**after** the `(type)` group:
```
create table Books with pk isbn(text) check (isbn like '978%')
add column to Books: title (text) not null
add column to Books: stock (int) default 0 check (stock >= 0)
```
**Where each constraint is useful.** The simple-mode `create
table … with pk …` declares *only* primary-key columns —
every column in the `with pk` list is part of the primary key
(non-PK columns are added afterward with `add column`; see
`docs/simple-mode-limitations.md`). Since a PK column is
already `NOT NULL` and `UNIQUE`, §9 rejects those two as
redundant there — so on a `create table` column the suffix is
useful for `default` / `check`. `not null` / `unique` come
into their own on `add column` (non-PK columns) and on `add
constraint`. The suffix grammar is nonetheless shared
verbatim across all three surfaces; §9 does the rejecting.
- Standard SQL writes constraints after the data type
(`email TEXT NOT NULL UNIQUE`). The playground brackets the
type as `email(text)` — a pre-existing convention this ADR
does not relitigate — so the faithful adaptation is the same
keyword order *after* the bracketed type.
- Constraint keywords are case-insensitive (ADR-0009) and may
appear in any order. Each kind at most once; a repeat
(`not null not null`) is a friendly error naming the
duplicated constraint.
- `default <literal>` takes one value literal.
- `check ( <expr> )` parenthesises its expression — matching
SQL's `CHECK (…)`, and giving the parser an unambiguous end
for the expression before the next constraint or the
column-separating comma. The expression is the ADR-0026
`Expr` grammar via `Node::Subgrammar(&expr::OR_EXPR)`.
- The column-separating comma of the `with pk` list follows
the last constraint of a column, exactly as today.
The constraint suffix is one shared grammar fragment, used by
both `create table`'s column list and `add column`.
#### 2.2 Constraints on an existing column (`add constraint …` / `drop constraint …`)
Modifying the constraints of an *already-created* column is a
first-class v1 surface. It reuses the established `add` / `drop`
verb families and the dotted `<Table>.<column>` reference
(ADR-0013's `add 1:n relationship from <P>.<col>`):
```
add constraint not null to Users.email
add constraint unique to Users.email
add constraint default 18 to Users.age
add constraint check (age >= 0) to Users.age
drop constraint not null from Users.email
drop constraint unique from Users.email
drop constraint default from Users.age
drop constraint check from Users.age
```
- `add constraint <constraint> to <T>.<col>` and `drop
constraint <constraint-kind> from <T>.<col>` join the
existing `add` / `drop` command `Choice`s as new forms. The
form word `constraint` discriminates them — exactly as
`column` / `index` / `relationship` discriminate the other
`add` / `drop` forms. Every `add` / `drop` stays uniformly
`<verb> <noun> …`, which keeps the grammar hierarchy clean.
- After `add constraint`, the `<constraint>` is exactly the
§2.1 suffix vocabulary: `not null`, `unique`, `default
<literal>`, `check ( <expr> )`. The constraint grammar
fragment is shared a third time.
- After `drop constraint`, only the *kind* is named — `not
null`, `unique`, `default`, `check` — since at most one of
each exists per column.
### 3. AST
`ColumnSpec` (`src/dsl/command.rs`) gains the four constraint
slots — each kind is at-most-one, so they are fields, not a
`Vec`:
```rust
pub struct ColumnSpec {
pub name: String,
pub ty: Type,
pub not_null: bool,
pub unique: bool,
pub default: Option<Value>,
pub check: Option<Expr>,
}
```
`Command::AddColumn` carries the same four slots alongside its
`table` / `column` / `ty`.
Two new commands cover §2.2:
```rust
Command::AddConstraint { table: String, column: String, constraint: Constraint }
Command::DropConstraint { table: String, column: String, kind: ConstraintKind }
pub enum Constraint { NotNull, Unique, Default(Value), Check(Expr) }
pub enum ConstraintKind { NotNull, Unique, Default, Check }
```
`Constraint` carries the payload (`Default` / `Check`);
`ConstraintKind` is payload-free, for `drop`.
### 4. SQLite mapping and enforcement
Every constraint maps directly to native SQLite column-DDL —
nothing is enforced application-side that SQLite can enforce
itself, consistent with the STRICT-tables posture (ADR-0002):
```sql
"email" TEXT NOT NULL UNIQUE
"age" INTEGER DEFAULT 18 CHECK ("age" >= 0)
```
- `not null` → `NOT NULL`; `unique` → `UNIQUE`;
`default v` → `DEFAULT <literal>`.
- `check` → `CHECK (<sql-expr>)`. The `Expr` is compiled to
SQL the same way WHERE expressions are (`compile_expr`),
except DDL admits no `?` parameters, so literals are
inlined — the inlining helper from ADR-0028 §3
(`inline_params_for_display`) already does exactly this.
Identifiers are double-quoted; the emitted SQL names no
engine product, so the ADR-0002 rule holds.
Constraint DDL flows through the existing column-DDL
assembly (`schema_to_ddl`), so the rebuild-table primitive and
every ALTER path pick it up automatically once the constraint
data reaches `ReadColumn`.
### 5. Applying a constraint to a populated column — the dry-run
`add constraint not null` / `unique` / `check` against a
column that already holds data can be *violated by that
data*. Per
the ADR-0017 ethos, the violation is caught by a **dry-run
pass that runs before any SQL writes**, and reported as a
learner-friendly table — never as a raw engine error.
| Command | Pre-flight check |
|--------------------------------|-----------------------------------------------------------|
| `add constraint not null` | `SELECT … WHERE "col" IS NULL` — any row is a violation |
| `add constraint unique` | non-`NULL` values grouped; any group of size > 1 collides |
| `add constraint check (expr)` | `SELECT … WHERE NOT (expr)` — any row is a violation |
| `add constraint default` | none — `DEFAULT` never touches existing rows |
| `drop constraint <any>` | none — removing a constraint cannot violate data |
On a violation the command is **refused** and the offending
rows are rendered through the pretty-table renderer
(ADR-0016 / ADR-0017 §7) — rows identified by primary-key
value(s), capped at 100 with a trailing `… and N more` row.
For `add constraint unique`, colliding rows are grouped by
their shared value, mirroring ADR-0017 §7's
uniqueness-collision table.
When the dry-run passes, the constraint is applied through the
rebuild-table primitive (ADR-0013) — SQLite cannot `ALTER` a
constraint onto an existing column. An empty table passes the
dry-run trivially.
There is no `--force` override: a constraint that the data
violates is not something the learner should be able to wave
through. They fix the data (with `update` / `delete`) and
retry. This is stricter than ADR-0017's `--force-conversion`
because a constraint is a *rule*, not a lossy transform — a
forced-through violation would leave the schema lying about
the data.
### 6. DEFAULT semantics
- A `default` lets an `insert` **omit** the column in the
explicit-column form (`insert into T (a, b) values …`); the
omitted column takes its default. The natural-order short
form (`insert into T values (…)`) still requires every
non-auto-generated column positionally — there is no
positional "skip", exactly as in SQL.
- `add constraint default` affects only *future* inserts;
existing rows are untouched (SQL-standard behaviour, and why
§5 needs no dry-run for it).
- `default` is rejected on a `serial` or `shortid` column:
those carry their own auto-fill contract (ADR-0018), and a
second source of "the value when none is given" would be
ambiguous. The error says so.
- `add column … not null` on a **non-empty** table requires a
`default` in the same spec — the new column is `NULL` in
every existing row, which `NOT NULL` forbids. Without a
default the command is refused with a message that explains
the rule and points at `default`. On an empty table it is
fine. (This is SQLite's own `ADD COLUMN` rule, surfaced as a
friendly error rather than a raw one.)
- `add column … unique default <v>` on a table with more than
one row is refused: every existing row would receive the
same default value, colliding immediately. The error
explains the collision.
### 7. Storage and round-trip
`playground.db` is a derived artifact (ADR-0004 / ADR-0015);
constraints must round-trip through `project.yaml` or they
vanish on `rebuild` / `export` / `import`.
The `CHECK` expression is stored — everywhere — as the
**compiled SQL** form: the parsed `Expr` run through
`compile_expr` with literals inlined (§4). Identifiers are
double-quoted, exactly as ADR-0028's `explain` display SQL
already renders them, so the form is consistent with what the
learner already meets. (The original plan stored canonical
*DSL text*; that needs both an `Expr`→text renderer and a
text→`Expr` re-parser for the round-trip, while the rebuild
path needs the SQL form regardless — storing SQL once removes
both.)
- **`project.yaml`** — the `ColumnSchema` record gains
`not_null: bool`, `default: Option<String>`, and
`check: Option<String>`. (`unique: bool` already exists,
from ADR-0018's `serial` / `shortid` contract.) `check`
holds the compiled SQL.
- **Metadata table** — `NOT NULL`, `UNIQUE`, and `DEFAULT` are
all recoverable from SQLite itself (`pragma_table_info`'s
`notnull` and `dflt_value`; `pragma_index_list` origin `u`),
so they need no metadata row. `CHECK` is *not* exposed by
any pragma. So `__rdbms_playground_columns` carries a
nullable `check_expr TEXT` column holding the compiled SQL,
which `schema_to_ddl` and `describe` echo verbatim. It is
part of the internal table's `CREATE TABLE` definition —
there are no existing databases to migrate.
### 8. Structure rendering
`describe_table` already reports `primary_key` and `notnull`;
`ColumnDescription` gains `unique: bool`, `default:
Option<String>`, and `check: Option<String>`.
`constraints_display` (`output_render.rs`) extends to list
every constraint a column carries:
```
┌───────┬────────┬─────────────────────────────────┐
│ Name │ Type │ Constraints │
├───────┼────────┼─────────────────────────────────┤
│ id │ serial │ PK │
│ email │ text │ NOT NULL, UNIQUE │
│ age │ int │ DEFAULT 18, CHECK ("age" >= 0) │
└───────┴────────┴─────────────────────────────────┘
```
The `CHECK` renders in its compiled-SQL form (§7) — the same
double-quoted-identifier style as ADR-0028's `explain` SQL.
### 9. PK columns — redundant and impossible constraints
A primary-key column is always implicitly `NOT NULL`. A
*single-column* primary key is additionally `UNIQUE`; the
columns of a *compound* primary key are **not** individually
unique — only the combination is.
This makes the rules precise:
- Declaring `not null` on any PK column — in the suffix or via
`add constraint … to` — is a **friendly error**: the PK
already implies it. The message says so, rather than
silently ignoring it, so the learner learns *why*.
- Declaring `unique` on a single-column-PK column is likewise
a friendly error. Declaring `unique` on a column of a
*compound* PK is **allowed and meaningful** — it adds an
individual-uniqueness rule the compound PK does not provide.
- `drop constraint not null` from any PK column, and `drop
constraint unique` from a single-column-PK column, are
friendly errors: the primary key still enforces them, so
there is nothing to drop.
- `default` and `check` on a PK column are always allowed —
the PK implies neither.
Friendly errors here are deliberate teaching moments, not
silent no-ops: the playground favours clarity over
permissiveness even where SQLite itself would tolerate the
redundant declaration.
### 10. Friendly errors
The friendly-error layer (ADR-0019) already enriches `NOT
NULL` and `UNIQUE` violations from `insert` / `update`. Two
additions:
- **`CHECK` violation** — the engine reports `CHECK constraint
failed: <col>`. The runtime enrichment resolves the column,
the offending value, and the column's `CHECK` expression; the
catalog entry renders "the value `<v>` breaks the rule
`<check-expr>`". The rule is shown in its **compiled-SQL
form** (§7) — the same form stored and displayed everywhere,
since deviation §7 removed the DSL-text representation.
- **The §5 / §6 refusals** — populated-column dry-run
failures, the `not null` / `unique` / `serial+default`
rejections — are all friendly catalog strings, with the
dry-run tables rendered per ADR-0016.
### 11. Out of scope
- **OOS-1.** Table-level `CHECK` constraints spanning multiple
columns. v1's `check` is per-column; its expression
typically references the column it sits on. A table-level
`check (…)` clause is a clean follow-up.
- **OOS-2.** More than one `CHECK` per column. v1 caps at one;
SQL allows several.
- **OOS-3.** Non-literal `DEFAULT`s — `CURRENT_TIMESTAMP`,
expressions. v1 takes a value literal only.
- **OOS-4.** Named constraints (`CONSTRAINT <name> CHECK …`).
The playground's constraints are anonymous; only indexes and
relationships carry user names.
- **OOS-5.** Inline `REFERENCES` foreign keys in the column
suffix. Relationships keep their dedicated `add 1:n
relationship` command (ADR-0013).
- **OOS-6.** `--force`-style overrides for §5 — deliberately
omitted (see §5).
## Consequences
- `ColumnSpec` and `Command::AddColumn` grow four constraint
slots; two new commands (`AddConstraint` / `DropConstraint`)
and two new AST enums (`Constraint` / `ConstraintKind`)
land. Every exhaustive `match Command` gains arms.
- A shared constraint-suffix grammar fragment is used in three
places: the `create table` column list, `add column`, and
the `add constraint … to` command. `CHECK` reuses the
ADR-0026 `Expr` grammar through `Subgrammar`.
- The rebuild-table primitive (ADR-0013) becomes the path for
`add constraint …` / `drop constraint …` on populated
columns, joining its existing change-column-type and
relationship users.
- A pre-flight dry-run (§5), in the ADR-0017 style, guards
every populated-column constraint addition; its violation
tables go through the ADR-0016 pretty-table renderer.
- The internal metadata table `__rdbms_playground_columns`
carries a new `check_expr` column — its first change since
ADR-0012 — holding the compiled-SQL form of the `CHECK`.
- `project.yaml`'s `ColumnSchema` grows three fields; the
format stays backward-compatible (the new keys default to
"absent" — `not_null: false`, no `default`, no `check`).
- The friendly-error layer gains a `CHECK`-violation entry.
## Implementation notes
A sensible build order, each step test-guarded:
1. **AST + the constraint grammar fragment.** `ColumnSpec` /
`Command` changes; the shared constraint-suffix grammar;
`create table` and `add column` parse constraints into the
AST. No execution yet — but every exhaustive `match`
compiles. (Combine with step 2 if the `match` breakage
makes a green commit impossible, exactly as ADR-0028 did.)
2. **Execution at create / add time.** `schema_to_ddl` emits
the constraint DDL; `do_create_table` / `do_add_column`
honour it, including the §6 `add column` rules. The
`Expr` → SQL compile for `CHECK`.
3. **Storage round-trip.** `ColumnSchema` fields, the
`check_expr` metadata column, and the `project.yaml`
read/write paths.
4. **`add constraint …` / `drop constraint …`.** The two
commands, the rebuild-table path, and the §5 dry-run with
its pretty-table refusals.
5. **Rendering + friendly errors.** `constraints_display`,
the `CHECK`-violation catalog entry, the structure view.
6. **Typing-surface matrix cells** for the new grammar.
## See also
- ADR-0002 — STRICT tables; "no engine name in user-facing
strings" (the compiled `CHECK` SQL names no product).
- ADR-0009 — DSL syntax conventions; keyword-led clauses.
- ADR-0012 — internal column metadata; `__rdbms_playground_columns`.
- ADR-0013 — the rebuild-table primitive, reused by §5; the
dotted `<Table>.<column>` reference reused by §2.2.
- ADR-0014 — the value-literal grammar, reused by `default`.
- ADR-0016 — pretty-table rendering for the §5 dry-run tables.
- ADR-0017 — the per-cell dry-run ethos §5 follows.
- ADR-0018 — `serial` / `shortid` auto-fill, why `default` is
rejected on those columns; the existing `unique` field.
- ADR-0026 — the `Expr` grammar `CHECK` reuses via `Subgrammar`.
- ADR-0028 — the literal-inlining helper reused to compile
`CHECK` expressions into parameter-free DDL.