Files
rdbms-playground/docs/adr/0018-auto-fill-contracts-for-serial-and-shortid.md
claude@clouddev1 5bb0a147f0 ADR-0018 implementation: auto-fill contracts for serial and shortid
Generalises serial and shortid beyond their previous restricted
forms:

- `serial` is no longer restricted to single-column PK. Non-PK
  serial columns get an emitted UNIQUE constraint and use
  application-side MAX(col)+1 at INSERT time (rowid alias still
  drives the PK case for free; per ADR-0010 worker-thread
  serialisation, the read-then-insert sequence is safe).
- `shortid` columns auto-fill existing null cells when the
  column is materialised — `add column T: x (shortid)` on a
  non-empty table no longer leaves rows in a not-really-valid
  NULL state.
- `int -> serial` joins the type-change matrix as always-clean
  identity (closes the asymmetry vs `text -> shortid`); other
  sources are refused with a route-via-int hint.
- `change column T: x (serial|shortid)` fills null source
  cells with sequence / generated values in the same rebuild
  transaction.

Internal infrastructure:

- ReadColumn gains `unique: bool`; read_schema detects single-
  column UNIQUE indexes via pragma_index_list /
  pragma_index_info; schema_to_ddl emits inline UNIQUE for
  non-PK columns.
- ColumnSchema (persistence) gains `unique: bool` so the flag
  survives YAML round-trip and rebuild-from-text reconstructs
  it faithfully — preserves the "serial -> int leaves UNIQUE
  in place" promise across save/load cycles.
- ChangeColumnTypeResult.client_side now carries `auto_filled`
  + `auto_fill_kind` alongside `transformed` + `lossy`; the
  app handler renders separate note lines when both apply.
- AddColumnResult is a new return type carrying pre-rendered
  [client-side] note lines for the auto-fill paths.

Tests: 519 -> 534 (+15). Clippy clean.
2026-05-08 14:32:19 +00:00

386 lines
16 KiB
Markdown

# ADR-0018: Auto-fill contracts for `serial` and `shortid` columns
## Status
Accepted.
Amends ADR-0005 (column type vocabulary), ADR-0014 (data
operations and value model), and ADR-0017 (column type-change
compatibility). Pulls part of C3's UNIQUE-constraint emission
forward as internal infrastructure.
## Context
`serial` and `shortid` are the two auto-generated types in
ADR-0005's vocabulary. Today they have asymmetric and
under-specified semantics:
1. **`serial` only on PK.** `Type::Serial.sqlite_strict_extra()`
returns `" PRIMARY KEY"`, and `do_add_column` explicitly
refuses serial. The implicit user-facing model is "serial =
auto-incrementing PK". This is an artefact of SQLite's only
free auto-increment mechanism (the rowid alias on `INTEGER
PRIMARY KEY`); other RDBMS — PostgreSQL's `SEQUENCE`, MySQL's
`AUTO_INCREMENT` — let auto-incrementing columns exist
anywhere. Our pedagogical intent is the broader model; the
restriction is incidental to our backend choice and leaks
that choice into the user-facing surface (against ADR-0002).
2. **`int → serial` is statically refused** in ADR-0017's
transformer matrix, while `text → shortid` is per-cell-
classified. Yet both target types are equally
"auto-generated with a uniqueness contract" — the asymmetry
isn't principled.
3. **`add column T: x (shortid)` on a non-empty table leaves
existing rows NULL.** Per the design contract, shortids are
unique non-null identifiers — so the column ends up in a
not-really-valid state until the user issues UPDATEs. The
auto-fill logic that runs at INSERT time for omitted shortid
values doesn't run at column-materialisation time.
4. **No UNIQUE constraint emission today.** A non-PK serial
column would need a UNIQUE constraint to enforce its contract
(the rowid trick isn't available off the PK). The same
applies to non-PK shortid: today it relies on a probabilistic
"won't collide" argument, not a database-enforced
guarantee. `schema_to_ddl` only emits NOT NULL inline plus PK
inline / table-level. No path emits UNIQUE.
This ADR resolves all four gaps with a single unifying
principle.
## Decision
### 1. The unifying principle
> Auto-generated column types honour their generation contract
> on every path that creates or transitions the column.
Concretely: a column declared (or converted to be) `serial` or
`shortid` always satisfies its contract — non-null,
auto-generated, unique — by the time the operation completes.
The user does not have to issue a follow-up UPDATE. The
mechanism is hidden; the user-facing model is "values appear
automatically".
### 2. `serial`: dual-implementation, single semantic
`serial` is generalised from "auto-incrementing PK" to
"auto-incrementing integer column". The column may be the table
PK or any non-PK column; the user-facing semantic is identical.
The implementation switches transparently:
- **PK case** (single-column PK on this column): rowid alias.
`INTEGER PRIMARY KEY` in DDL; SQLite's free auto-increment
applies. Unchanged from today.
- **Non-PK case**: app-level `MAX(col) + 1` lookup at INSERT
time, plus an emitted UNIQUE constraint on the column. The
worker-thread serialisation (ADR-0010) makes the read-then-
insert sequence safe without explicit locking — only one
INSERT runs at a time on the connection.
User-visible help, error messages, and `[client-side]` notes
refer to `serial` columns as "auto-incrementing" or
"auto-generated". The PK / non-PK distinction is an internal
implementation detail (ADR-0002 user-facing posture).
### 3. `shortid`: tighten the contract at column materialisation
Today: shortid generation runs only when an INSERT omits the
value. Rows existing at the moment a shortid column is created
remain NULL until the user issues an UPDATE.
Going forward: any null cell in a shortid column gets a freshly-
generated value at the operation that creates that condition:
- `add column T: x (shortid)` on a non-empty table fills every
existing row's `x` with a generated shortid before the
operation completes.
- `change column T: x (shortid)` from `text` (or any other
matrix-permitted source) fills any null cells with generated
shortids in the same rebuild transaction.
Generator collisions (vanishingly rare given the 10⁷–10⁸
namespace; see ADR-0014 §"shortid auto-generation") trigger up
to 5 retries per cell. Exhausting retries fails the operation
with a friendly diagnostic; in practice this indicates either a
generator-state bug or a pathological RNG and is not user-
recoverable.
### 4. UNIQUE story
Auto-generated non-PK columns gain an emitted UNIQUE constraint
to enforce their contract:
- Non-PK `serial`: gains UNIQUE on creation / conversion-to-
serial. Required for the contract; the rowid trick isn't
available off the PK.
- Non-PK `shortid`: gains UNIQUE on creation / conversion-to-
shortid. Strengthens today's probabilistic guarantee into a
database-enforced one.
- PK case for either type: PK already implies UNIQUE+NOT NULL.
No additional constraint needed.
The reverse direction (`serial → int`, `shortid → text`) leaves
the UNIQUE constraint **in place**. The user has not signalled
intent to drop the uniqueness guarantee; only the auto-
generation contract was dropped. When constraint-management
lands as a user-facing feature (C3-track), the user can
explicitly drop the UNIQUE if desired.
This ADR pulls forward the **internal infrastructure** to emit
and read UNIQUE constraints — `schema_to_ddl` gains UNIQUE-
column-clause emission; `read_schema` gains UNIQUE detection
via `pragma_index_list` + `pragma_index_info`; `ReadColumn`
gains a `unique: bool` field. The **user-facing constraint
surface** (declaring UNIQUE in `with pk … unique …` or via
`add unique`, dropping UNIQUE, naming UNIQUE constraints) is
not in scope here and remains C3-track work.
### 5. INSERT-path changes
For non-PK `serial` columns, when the column is omitted from
an INSERT (the existing skip-list at db.rs:3111 already covers
serial and shortid identically), the executor:
1. Queries `SELECT COALESCE(MAX(col), 0) + 1 FROM T` inside the
same transaction.
2. Binds the result as the column's value.
The MAX-based seeding mirrors SQLite's rowid behaviour: gaps
left by user-supplied explicit values are jumped over (the next
auto-fill is `MAX + 1`, not "the smallest available integer").
Worker-thread serialisation (ADR-0010) prevents the classic
read-modify-write race; the pattern is safe for our single-
writer model.
### 6. `add_column` changes
`do_add_column` lifts its blanket serial refusal (db.rs:1374).
The new behaviour is determined by the source table's state:
- **`add column T: x (serial)` on an empty table**: emit
`ALTER TABLE T ADD COLUMN x INTEGER UNIQUE`. Every table has
a PK by construction (the parser refuses `create table`
without `with pk`), so the "no PK" branch doesn't arise —
the new column joins as a non-PK serial.
- **`add column T: x (serial)` on a non-empty table**: route
through the rebuild-table primitive (ADR-0013). Create new
table with `x INTEGER UNIQUE`. Copy rows, filling `x` with
values 1..N in declaration order. Emit a `[client-side]` note
(§7).
- **`add column T: x (shortid)` on a non-empty table**: route
through the rebuild-table primitive. Create new table with
`x TEXT UNIQUE`. Copy rows, generating a fresh shortid for
each (collision-retried per §3). Emit a `[client-side]` note.
The empty-table path can stay on `ALTER TABLE ADD COLUMN` for
efficiency; the non-empty path needs the rebuild because we
need to populate the new column atomically with table
creation.
### 7. `change column` to `serial` / `shortid`
`change column T: x (serial)` from any matrix-permitted source
type (today: `int`; future expansions follow the same rule):
1. Run the per-cell dry-run (ADR-0017 §2). For non-null cells,
classify via the transformer matrix: source must produce an
integer (the existing serial pre-condition).
2. Refuse if existing non-null values have duplicates
(uniqueness collision, ADR-0017 §4.3).
3. Auto-fill any null cells with sequential values continuing
from `MAX(non-null values) + 1` (or starting at 1 if none).
4. Refuse if the auto-fill would itself produce a collision —
in practice, this can only happen if the user supplied
non-null values that already overlap the would-be sequence
(e.g., existing values [1, 2, 5] with two nulls — fill would
be 6 and 7, no collision; existing values [1, 2, 6] with
nulls — fill would be 3 and 4, no collision; the sequence
uses MAX+1, not gap-filling, so this case doesn't actually
arise — but state the rule defensively).
5. Rebuild the table with the new column type plus UNIQUE (per
§4) plus the transformed + auto-filled values.
6. Emit `[client-side]` notes (§7).
`change column T: x (shortid)` from `text`:
1. Run the per-cell dry-run. Non-null cells classify via the
text → shortid transformer (ADR-0017 §3) — must match the
shortid grammar.
2. Refuse if existing non-null shortid-valid values have
duplicates.
3. Auto-fill null cells with generated shortids (collision-
retried per §3, including against the existing values).
4. Rebuild with TEXT + UNIQUE + the validated + auto-filled
values.
5. Emit `[client-side]` notes.
### 8. Conversion matrix amendments to ADR-0017
ADR-0017 §3 "Statically refused" is amended:
- `int → serial` is **removed** from the static refusal list and
added as a **per-cell-classified** matrix entry: clean for
non-null integers (with the post-transformation uniqueness
check from §4.3), with null-cell auto-fill per §7 above.
- The general "Anything → `serial`" refusal is replaced with a
more specific list: `text → serial`, `real → serial`, etc.
remain refused for v1 (route via int first); `bool → serial`
remains refused (cross-domain).
- `text → shortid` is unchanged from ADR-0017 (still per-cell-
classified). The contract enforcement at column-materialisation
is new.
ADR-0017 §4.3 (uniqueness check) is amended to apply to
"PK columns and shortid columns and any column that gains a
UNIQUE constraint as part of the operation" — i.e., non-PK
serial / shortid targets are uniqueness-checked.
### 9. Client-side notes
ADR-0017 §6 introduced the `[client-side]` pattern: when the
playground rewrote any cell value, the success summary tells
the learner "the tool did this for you; raw SQL would need a
`CAST` or application-level code." This ADR extends the pattern
to auto-fill operations:
- **`add column T: x (serial)` on non-empty table**:
> [client-side] N row(s) given auto-generated serial values
> 1..N. In raw SQL this would need an explicit UPDATE to
> populate.
- **`add column T: x (shortid)` on non-empty table**:
> [client-side] N row(s) given auto-generated shortid values.
> In raw SQL this would need an explicit UPDATE to populate.
- **`change column T: x (serial)` with M null cells**:
> [client-side] M null cell(s) given auto-generated serial
> values. In raw SQL this would need an explicit UPDATE to
> populate.
- **`change column T: x (shortid)` with M null cells**:
> [client-side] M null cell(s) given auto-generated shortid
> values. In raw SQL this would need an explicit UPDATE to
> populate.
When both an ADR-0017 transformation note AND an ADR-0018
auto-fill note apply to the same operation (e.g., `change
column T: x (shortid)` from text where some cells need
validation and others need auto-fill), both notes are emitted
on separate lines. The success path emits them after the `[ok]`
summary and before the structure-render block.
### 10. Engine-vocabulary cleanup
While here, fix the existing user-facing string in
`do_add_column`'s serial refusal (db.rs:1374): the message
names "SQLite's ALTER TABLE" — an ADR-0002 user-facing posture
violation. This message is being replaced anyway as part of
lifting the refusal; the replacement uses abstract "the
database" / "the engine" phrasing.
## Resolutions
Three points called out as "open" during drafting, resolved
before acceptance:
1. **No-PK empty-table case**: not reachable. Every table has
a PK by construction — the `create table` parser refuses
input that produces an empty PK list. `add column T: x
(serial)` on an empty table therefore always lands on a
table that already has a PK, and the new `x` column is a
non-PK serial (gains UNIQUE per §4).
2. **Serial sequencing under explicit user inserts**: MAX+1.
If the user explicitly inserts `id = 100`, the next auto-
fill yields 101. Gappy sequences are accepted (e.g., if
the user later inserts `id = 200`, the next auto-fill is
201; the gap 102..199 is not back-filled). MAX+1 matches
SQLite's rowid behaviour for the PK case, so both
implementation paths feel uniform to the user, and gap-
detection is more expensive than its pedagogical value.
3. **UNIQUE emission style**: inline column constraint
(`x INTEGER UNIQUE`). Cleaner DDL while we don't have a
user-facing constraint surface that would benefit from
named, separately-managed indexes. Revisitable when C3
lands the user-facing constraint feature; the
`read_schema` detection via `pragma_index_list` works for
either form.
## Out of scope
- **OOS-1.** User-facing UNIQUE constraint surface (`add
unique <T>: <c>`, `drop unique`, naming, multi-column unique).
Stays as C3-track work.
- **OOS-2.** Strict-monotonic AUTOINCREMENT semantics — we
retain plain `INTEGER PRIMARY KEY` for PK serial. Rebuild-
reset of the high-water mark is acceptable for a teaching
tool; users who care can be taught the distinction in a
later iteration.
- **OOS-3.** Custom serial start values, custom step sizes, or
multi-column composite serial.
- **OOS-4.** Non-PK serial when the table has no PK at all
(caught by Open Question 1's resolution).
- **OOS-5.** A `[client-side]` note on the empty-table case
(`add column` on an empty table). No rows means nothing to
auto-fill — the operation is a structural change with no
pedagogical "the tool did this for you" content.
- **OOS-6.** Reading and emitting CHECK constraints — only
UNIQUE is required for this ADR.
## Consequences
- The "serial only on PK" mental model is replaced with
"serial works anywhere". Pedagogically richer: students see
auto-incrementing columns as a general feature, not as a
special PK-only quirk.
- One internal mechanism the user doesn't see (rowid alias vs
application MAX+1). The two paths converge to identical
user-facing behaviour, honouring ADR-0002's posture.
- `schema_to_ddl` and `read_schema` gain UNIQUE handling — a
partial pull-forward of C3 work. The user-facing constraint
surface stays deferred; this ADR only lands the internal
infrastructure required by serial / shortid contracts.
- `[client-side]` notes proliferate to cover auto-fill cases.
Strengthens the pedagogical lens: every place the playground
goes beyond what raw SQL does, the user is told.
- All four user-observed gaps from §Context closed. The
`int → serial → int` round-trip works (matching the existing
`text → shortid → text` round-trip from ADR-0017).
- Add-column-with-shortid producing a "valid" state aligns
with ADR-0005's design contract that shortids are unique
non-null identifiers.
## Relationship to earlier ADRs
- **ADR-0002** — User-facing posture honoured: the dual
serial implementation is hidden; the existing engine-name
leak in `do_add_column`'s refusal message is fixed
opportunistically.
- **ADR-0005** — Type vocabulary unchanged; `serial` definition
generalised. The keyword and the user model stay the same;
the implementation broadens.
- **ADR-0010** — Worker-thread serialisation is what makes the
non-PK serial MAX+1 path safe without explicit locks.
- **ADR-0011** — `fk_target_type` for serial unchanged
(`Serial → Int`); FK target compatibility remains as-is.
- **ADR-0013** — Rebuild-table primitive carries the auto-fill
cases for non-empty `add column` and `change column to
serial/shortid`.
- **ADR-0014** — INSERT-time auto-fill semantics extended to
non-PK serial. ADR-0014's auto-fill skip-list (which already
covers both serial and shortid symmetrically) is reused.
- **ADR-0015** — The text-format round-trip carries the new
UNIQUE constraints in metadata so a rebuild from
`project.yaml` reconstructs the database faithfully. Likely
needs a `__rdbms_playground_columns` schema additon (a
`unique` bool) — to be confirmed during implementation.
- **ADR-0017** — §3 transformer matrix amended: `int → serial`
joins per-cell-classified. §4.3 uniqueness check extended to
cover non-PK serial / shortid targets.