Files
rdbms-playground/docs/adr/0018-auto-fill-contracts-for-serial-and-shortid.md
T
claude@clouddev1 5bb0a147f0 ADR-0018 implementation: auto-fill contracts for serial and shortid
Generalises serial and shortid beyond their previous restricted
forms:

- `serial` is no longer restricted to single-column PK. Non-PK
  serial columns get an emitted UNIQUE constraint and use
  application-side MAX(col)+1 at INSERT time (rowid alias still
  drives the PK case for free; per ADR-0010 worker-thread
  serialisation, the read-then-insert sequence is safe).
- `shortid` columns auto-fill existing null cells when the
  column is materialised — `add column T: x (shortid)` on a
  non-empty table no longer leaves rows in a not-really-valid
  NULL state.
- `int -> serial` joins the type-change matrix as always-clean
  identity (closes the asymmetry vs `text -> shortid`); other
  sources are refused with a route-via-int hint.
- `change column T: x (serial|shortid)` fills null source
  cells with sequence / generated values in the same rebuild
  transaction.

Internal infrastructure:

- ReadColumn gains `unique: bool`; read_schema detects single-
  column UNIQUE indexes via pragma_index_list /
  pragma_index_info; schema_to_ddl emits inline UNIQUE for
  non-PK columns.
- ColumnSchema (persistence) gains `unique: bool` so the flag
  survives YAML round-trip and rebuild-from-text reconstructs
  it faithfully — preserves the "serial -> int leaves UNIQUE
  in place" promise across save/load cycles.
- ChangeColumnTypeResult.client_side now carries `auto_filled`
  + `auto_fill_kind` alongside `transformed` + `lossy`; the
  app handler renders separate note lines when both apply.
- AddColumnResult is a new return type carrying pre-rendered
  [client-side] note lines for the auto-fill paths.

Tests: 519 -> 534 (+15). Clippy clean.
2026-05-08 14:32:19 +00:00

16 KiB

ADR-0018: Auto-fill contracts for serial and shortid columns

Status

Accepted.

Amends ADR-0005 (column type vocabulary), ADR-0014 (data operations and value model), and ADR-0017 (column type-change compatibility). Pulls part of C3's UNIQUE-constraint emission forward as internal infrastructure.

Context

serial and shortid are the two auto-generated types in ADR-0005's vocabulary. Today they have asymmetric and under-specified semantics:

  1. serial only on PK. Type::Serial.sqlite_strict_extra() returns " PRIMARY KEY", and do_add_column explicitly refuses serial. The implicit user-facing model is "serial = auto-incrementing PK". This is an artefact of SQLite's only free auto-increment mechanism (the rowid alias on INTEGER PRIMARY KEY); other RDBMS — PostgreSQL's SEQUENCE, MySQL's AUTO_INCREMENT — let auto-incrementing columns exist anywhere. Our pedagogical intent is the broader model; the restriction is incidental to our backend choice and leaks that choice into the user-facing surface (against ADR-0002).

  2. int → serial is statically refused in ADR-0017's transformer matrix, while text → shortid is per-cell- classified. Yet both target types are equally "auto-generated with a uniqueness contract" — the asymmetry isn't principled.

  3. add column T: x (shortid) on a non-empty table leaves existing rows NULL. Per the design contract, shortids are unique non-null identifiers — so the column ends up in a not-really-valid state until the user issues UPDATEs. The auto-fill logic that runs at INSERT time for omitted shortid values doesn't run at column-materialisation time.

  4. No UNIQUE constraint emission today. A non-PK serial column would need a UNIQUE constraint to enforce its contract (the rowid trick isn't available off the PK). The same applies to non-PK shortid: today it relies on a probabilistic "won't collide" argument, not a database-enforced guarantee. schema_to_ddl only emits NOT NULL inline plus PK inline / table-level. No path emits UNIQUE.

This ADR resolves all four gaps with a single unifying principle.

Decision

1. The unifying principle

Auto-generated column types honour their generation contract on every path that creates or transitions the column.

Concretely: a column declared (or converted to be) serial or shortid always satisfies its contract — non-null, auto-generated, unique — by the time the operation completes. The user does not have to issue a follow-up UPDATE. The mechanism is hidden; the user-facing model is "values appear automatically".

2. serial: dual-implementation, single semantic

serial is generalised from "auto-incrementing PK" to "auto-incrementing integer column". The column may be the table PK or any non-PK column; the user-facing semantic is identical.

The implementation switches transparently:

  • PK case (single-column PK on this column): rowid alias. INTEGER PRIMARY KEY in DDL; SQLite's free auto-increment applies. Unchanged from today.
  • Non-PK case: app-level MAX(col) + 1 lookup at INSERT time, plus an emitted UNIQUE constraint on the column. The worker-thread serialisation (ADR-0010) makes the read-then- insert sequence safe without explicit locking — only one INSERT runs at a time on the connection.

User-visible help, error messages, and [client-side] notes refer to serial columns as "auto-incrementing" or "auto-generated". The PK / non-PK distinction is an internal implementation detail (ADR-0002 user-facing posture).

3. shortid: tighten the contract at column materialisation

Today: shortid generation runs only when an INSERT omits the value. Rows existing at the moment a shortid column is created remain NULL until the user issues an UPDATE.

Going forward: any null cell in a shortid column gets a freshly- generated value at the operation that creates that condition:

  • add column T: x (shortid) on a non-empty table fills every existing row's x with a generated shortid before the operation completes.
  • change column T: x (shortid) from text (or any other matrix-permitted source) fills any null cells with generated shortids in the same rebuild transaction.

Generator collisions (vanishingly rare given the 10⁷–10⁸ namespace; see ADR-0014 §"shortid auto-generation") trigger up to 5 retries per cell. Exhausting retries fails the operation with a friendly diagnostic; in practice this indicates either a generator-state bug or a pathological RNG and is not user- recoverable.

4. UNIQUE story

Auto-generated non-PK columns gain an emitted UNIQUE constraint to enforce their contract:

  • Non-PK serial: gains UNIQUE on creation / conversion-to- serial. Required for the contract; the rowid trick isn't available off the PK.
  • Non-PK shortid: gains UNIQUE on creation / conversion-to- shortid. Strengthens today's probabilistic guarantee into a database-enforced one.
  • PK case for either type: PK already implies UNIQUE+NOT NULL. No additional constraint needed.

The reverse direction (serial → int, shortid → text) leaves the UNIQUE constraint in place. The user has not signalled intent to drop the uniqueness guarantee; only the auto- generation contract was dropped. When constraint-management lands as a user-facing feature (C3-track), the user can explicitly drop the UNIQUE if desired.

This ADR pulls forward the internal infrastructure to emit and read UNIQUE constraints — schema_to_ddl gains UNIQUE- column-clause emission; read_schema gains UNIQUE detection via pragma_index_list + pragma_index_info; ReadColumn gains a unique: bool field. The user-facing constraint surface (declaring UNIQUE in with pk … unique … or via add unique, dropping UNIQUE, naming UNIQUE constraints) is not in scope here and remains C3-track work.

5. INSERT-path changes

For non-PK serial columns, when the column is omitted from an INSERT (the existing skip-list at db.rs:3111 already covers serial and shortid identically), the executor:

  1. Queries SELECT COALESCE(MAX(col), 0) + 1 FROM T inside the same transaction.
  2. Binds the result as the column's value.

The MAX-based seeding mirrors SQLite's rowid behaviour: gaps left by user-supplied explicit values are jumped over (the next auto-fill is MAX + 1, not "the smallest available integer").

Worker-thread serialisation (ADR-0010) prevents the classic read-modify-write race; the pattern is safe for our single- writer model.

6. add_column changes

do_add_column lifts its blanket serial refusal (db.rs:1374). The new behaviour is determined by the source table's state:

  • add column T: x (serial) on an empty table: emit ALTER TABLE T ADD COLUMN x INTEGER UNIQUE. Every table has a PK by construction (the parser refuses create table without with pk), so the "no PK" branch doesn't arise — the new column joins as a non-PK serial.
  • add column T: x (serial) on a non-empty table: route through the rebuild-table primitive (ADR-0013). Create new table with x INTEGER UNIQUE. Copy rows, filling x with values 1..N in declaration order. Emit a [client-side] note (§7).
  • add column T: x (shortid) on a non-empty table: route through the rebuild-table primitive. Create new table with x TEXT UNIQUE. Copy rows, generating a fresh shortid for each (collision-retried per §3). Emit a [client-side] note.

The empty-table path can stay on ALTER TABLE ADD COLUMN for efficiency; the non-empty path needs the rebuild because we need to populate the new column atomically with table creation.

7. change column to serial / shortid

change column T: x (serial) from any matrix-permitted source type (today: int; future expansions follow the same rule):

  1. Run the per-cell dry-run (ADR-0017 §2). For non-null cells, classify via the transformer matrix: source must produce an integer (the existing serial pre-condition).
  2. Refuse if existing non-null values have duplicates (uniqueness collision, ADR-0017 §4.3).
  3. Auto-fill any null cells with sequential values continuing from MAX(non-null values) + 1 (or starting at 1 if none).
  4. Refuse if the auto-fill would itself produce a collision — in practice, this can only happen if the user supplied non-null values that already overlap the would-be sequence (e.g., existing values [1, 2, 5] with two nulls — fill would be 6 and 7, no collision; existing values [1, 2, 6] with nulls — fill would be 3 and 4, no collision; the sequence uses MAX+1, not gap-filling, so this case doesn't actually arise — but state the rule defensively).
  5. Rebuild the table with the new column type plus UNIQUE (per §4) plus the transformed + auto-filled values.
  6. Emit [client-side] notes (§7).

change column T: x (shortid) from text:

  1. Run the per-cell dry-run. Non-null cells classify via the text → shortid transformer (ADR-0017 §3) — must match the shortid grammar.
  2. Refuse if existing non-null shortid-valid values have duplicates.
  3. Auto-fill null cells with generated shortids (collision- retried per §3, including against the existing values).
  4. Rebuild with TEXT + UNIQUE + the validated + auto-filled values.
  5. Emit [client-side] notes.

8. Conversion matrix amendments to ADR-0017

ADR-0017 §3 "Statically refused" is amended:

  • int → serial is removed from the static refusal list and added as a per-cell-classified matrix entry: clean for non-null integers (with the post-transformation uniqueness check from §4.3), with null-cell auto-fill per §7 above.
  • The general "Anything → serial" refusal is replaced with a more specific list: text → serial, real → serial, etc. remain refused for v1 (route via int first); bool → serial remains refused (cross-domain).
  • text → shortid is unchanged from ADR-0017 (still per-cell- classified). The contract enforcement at column-materialisation is new.

ADR-0017 §4.3 (uniqueness check) is amended to apply to "PK columns and shortid columns and any column that gains a UNIQUE constraint as part of the operation" — i.e., non-PK serial / shortid targets are uniqueness-checked.

9. Client-side notes

ADR-0017 §6 introduced the [client-side] pattern: when the playground rewrote any cell value, the success summary tells the learner "the tool did this for you; raw SQL would need a CAST or application-level code." This ADR extends the pattern to auto-fill operations:

  • add column T: x (serial) on non-empty table:

    [client-side] N row(s) given auto-generated serial values 1..N. In raw SQL this would need an explicit UPDATE to populate.

  • add column T: x (shortid) on non-empty table:

    [client-side] N row(s) given auto-generated shortid values. In raw SQL this would need an explicit UPDATE to populate.

  • change column T: x (serial) with M null cells:

    [client-side] M null cell(s) given auto-generated serial values. In raw SQL this would need an explicit UPDATE to populate.

  • change column T: x (shortid) with M null cells:

    [client-side] M null cell(s) given auto-generated shortid values. In raw SQL this would need an explicit UPDATE to populate.

When both an ADR-0017 transformation note AND an ADR-0018 auto-fill note apply to the same operation (e.g., change column T: x (shortid) from text where some cells need validation and others need auto-fill), both notes are emitted on separate lines. The success path emits them after the [ok] summary and before the structure-render block.

10. Engine-vocabulary cleanup

While here, fix the existing user-facing string in do_add_column's serial refusal (db.rs:1374): the message names "SQLite's ALTER TABLE" — an ADR-0002 user-facing posture violation. This message is being replaced anyway as part of lifting the refusal; the replacement uses abstract "the database" / "the engine" phrasing.

Resolutions

Three points called out as "open" during drafting, resolved before acceptance:

  1. No-PK empty-table case: not reachable. Every table has a PK by construction — the create table parser refuses input that produces an empty PK list. add column T: x (serial) on an empty table therefore always lands on a table that already has a PK, and the new x column is a non-PK serial (gains UNIQUE per §4).

  2. Serial sequencing under explicit user inserts: MAX+1. If the user explicitly inserts id = 100, the next auto- fill yields 101. Gappy sequences are accepted (e.g., if the user later inserts id = 200, the next auto-fill is 201; the gap 102..199 is not back-filled). MAX+1 matches SQLite's rowid behaviour for the PK case, so both implementation paths feel uniform to the user, and gap- detection is more expensive than its pedagogical value.

  3. UNIQUE emission style: inline column constraint (x INTEGER UNIQUE). Cleaner DDL while we don't have a user-facing constraint surface that would benefit from named, separately-managed indexes. Revisitable when C3 lands the user-facing constraint feature; the read_schema detection via pragma_index_list works for either form.

Out of scope

  • OOS-1. User-facing UNIQUE constraint surface (add unique <T>: <c>, drop unique, naming, multi-column unique). Stays as C3-track work.
  • OOS-2. Strict-monotonic AUTOINCREMENT semantics — we retain plain INTEGER PRIMARY KEY for PK serial. Rebuild- reset of the high-water mark is acceptable for a teaching tool; users who care can be taught the distinction in a later iteration.
  • OOS-3. Custom serial start values, custom step sizes, or multi-column composite serial.
  • OOS-4. Non-PK serial when the table has no PK at all (caught by Open Question 1's resolution).
  • OOS-5. A [client-side] note on the empty-table case (add column on an empty table). No rows means nothing to auto-fill — the operation is a structural change with no pedagogical "the tool did this for you" content.
  • OOS-6. Reading and emitting CHECK constraints — only UNIQUE is required for this ADR.

Consequences

  • The "serial only on PK" mental model is replaced with "serial works anywhere". Pedagogically richer: students see auto-incrementing columns as a general feature, not as a special PK-only quirk.
  • One internal mechanism the user doesn't see (rowid alias vs application MAX+1). The two paths converge to identical user-facing behaviour, honouring ADR-0002's posture.
  • schema_to_ddl and read_schema gain UNIQUE handling — a partial pull-forward of C3 work. The user-facing constraint surface stays deferred; this ADR only lands the internal infrastructure required by serial / shortid contracts.
  • [client-side] notes proliferate to cover auto-fill cases. Strengthens the pedagogical lens: every place the playground goes beyond what raw SQL does, the user is told.
  • All four user-observed gaps from §Context closed. The int → serial → int round-trip works (matching the existing text → shortid → text round-trip from ADR-0017).
  • Add-column-with-shortid producing a "valid" state aligns with ADR-0005's design contract that shortids are unique non-null identifiers.

Relationship to earlier ADRs

  • ADR-0002 — User-facing posture honoured: the dual serial implementation is hidden; the existing engine-name leak in do_add_column's refusal message is fixed opportunistically.
  • ADR-0005 — Type vocabulary unchanged; serial definition generalised. The keyword and the user model stay the same; the implementation broadens.
  • ADR-0010 — Worker-thread serialisation is what makes the non-PK serial MAX+1 path safe without explicit locks.
  • ADR-0011fk_target_type for serial unchanged (Serial → Int); FK target compatibility remains as-is.
  • ADR-0013 — Rebuild-table primitive carries the auto-fill cases for non-empty add column and change column to serial/shortid.
  • ADR-0014 — INSERT-time auto-fill semantics extended to non-PK serial. ADR-0014's auto-fill skip-list (which already covers both serial and shortid symmetrically) is reused.
  • ADR-0015 — The text-format round-trip carries the new UNIQUE constraints in metadata so a rebuild from project.yaml reconstructs the database faithfully. Likely needs a __rdbms_playground_columns schema additon (a unique bool) — to be confirmed during implementation.
  • ADR-0017 — §3 transformer matrix amended: int → serial joins per-cell-classified. §4.3 uniqueness check extended to cover non-PK serial / shortid targets.