Generalises serial and shortid beyond their previous restricted forms:

- `serial` is no longer restricted to single-column PK. Non-PK serial columns get an emitted UNIQUE constraint and use application-side MAX(col)+1 at INSERT time (rowid alias still drives the PK case for free; per ADR-0010 worker-thread serialisation, the read-then-insert sequence is safe).
- `shortid` columns auto-fill existing null cells when the column is materialised: `add column T: x (shortid)` on a non-empty table no longer leaves rows in a not-really-valid NULL state.
- `int -> serial` joins the type-change matrix as always-clean identity (closes the asymmetry vs `text -> shortid`); other sources are refused with a route-via-int hint.
- `change column T: x (serial|shortid)` fills null source cells with sequence / generated values in the same rebuild transaction.

Internal infrastructure:

- ReadColumn gains `unique: bool`; read_schema detects single-column UNIQUE indexes via pragma_index_list / pragma_index_info; schema_to_ddl emits inline UNIQUE for non-PK columns.
- ColumnSchema (persistence) gains `unique: bool` so the flag survives YAML round-trip and rebuild-from-text reconstructs it faithfully; this preserves the "serial -> int leaves UNIQUE in place" promise across save/load cycles.
- ChangeColumnTypeResult.client_side now carries `auto_filled` + `auto_fill_kind` alongside `transformed` + `lossy`; the app handler renders separate note lines when both apply.
- AddColumnResult is a new return type carrying pre-rendered [client-side] note lines for the auto-fill paths.

Tests: 519 -> 534 (+15). Clippy clean.
ADR-0018: Auto-fill contracts for serial and shortid columns
Status
Accepted.
Amends ADR-0005 (column type vocabulary), ADR-0014 (data operations and value model), and ADR-0017 (column type-change compatibility). Pulls part of C3's UNIQUE-constraint emission forward as internal infrastructure.
Context
serial and shortid are the two auto-generated types in
ADR-0005's vocabulary. Today they have asymmetric and
under-specified semantics:
- serial only on PK. Type::Serial.sqlite_strict_extra() returns " PRIMARY KEY", and do_add_column explicitly refuses serial. The implicit user-facing model is "serial = auto-incrementing PK". This is an artefact of SQLite's only free auto-increment mechanism (the rowid alias on INTEGER PRIMARY KEY); other RDBMSs, such as PostgreSQL with SEQUENCE and MySQL with AUTO_INCREMENT, let auto-incrementing columns exist anywhere. Our pedagogical intent is the broader model; the restriction is incidental to our backend choice and leaks that choice into the user-facing surface (against ADR-0002).
- int → serial is statically refused in ADR-0017's transformer matrix, while text → shortid is per-cell-classified. Yet both target types are equally "auto-generated with a uniqueness contract", so the asymmetry isn't principled.
- add column T: x (shortid) on a non-empty table leaves existing rows NULL. Per the design contract, shortids are unique non-null identifiers, so the column ends up in a not-really-valid state until the user issues UPDATEs. The auto-fill logic that runs at INSERT time for omitted shortid values doesn't run at column-materialisation time.
- No UNIQUE constraint emission today. A non-PK serial column would need a UNIQUE constraint to enforce its contract (the rowid trick isn't available off the PK). The same applies to non-PK shortid: today it relies on a probabilistic "won't collide" argument, not a database-enforced guarantee. schema_to_ddl only emits NOT NULL inline plus PK inline / table-level. No path emits UNIQUE.
This ADR resolves all four gaps with a single unifying principle.
Decision
1. The unifying principle
Auto-generated column types honour their generation contract on every path that creates or transitions the column.
Concretely: a column declared (or converted to be) serial or
shortid always satisfies its contract — non-null,
auto-generated, unique — by the time the operation completes.
The user does not have to issue a follow-up UPDATE. The
mechanism is hidden; the user-facing model is "values appear
automatically".
2. serial: dual-implementation, single semantic
serial is generalised from "auto-incrementing PK" to
"auto-incrementing integer column". The column may be the table
PK or any non-PK column; the user-facing semantic is identical.
The implementation switches transparently:
- PK case (single-column PK on this column): rowid alias. INTEGER PRIMARY KEY in DDL; SQLite's free auto-increment applies. Unchanged from today.
- Non-PK case: app-level MAX(col) + 1 lookup at INSERT time, plus an emitted UNIQUE constraint on the column. The worker-thread serialisation (ADR-0010) makes the read-then-insert sequence safe without explicit locking: only one INSERT runs at a time on the connection.
User-visible help, error messages, and [client-side] notes
refer to serial columns as "auto-incrementing" or
"auto-generated". The PK / non-PK distinction is an internal
implementation detail (ADR-0002 user-facing posture).
3. shortid: tighten the contract at column materialisation
Today: shortid generation runs only when an INSERT omits the value. Rows existing at the moment a shortid column is created remain NULL until the user issues an UPDATE.
Going forward: any null cell in a shortid column gets a freshly generated value at the operation that creates that condition:
- add column T: x (shortid) on a non-empty table fills every existing row's x with a generated shortid before the operation completes.
- change column T: x (shortid) from text (or any other matrix-permitted source) fills any null cells with generated shortids in the same rebuild transaction.
Generator collisions (vanishingly rare given the 10⁷–10⁸ namespace; see ADR-0014 §"shortid auto-generation") trigger up to 5 retries per cell. Exhausting retries fails the operation with a friendly diagnostic; in practice this indicates either a generator-state bug or a pathological RNG and is not user-recoverable.
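The null-cell fill with bounded retries can be sketched as follows. This is a minimal illustration, not the project's code: fill_null_shortids, MAX_RETRIES and the generate closure are hypothetical names, and the sketch assumes the budget means one initial attempt plus up to 5 retries per cell.

```rust
use std::collections::HashSet;

/// Retry budget per cell. Assumption: one initial attempt plus up to 5 retries.
const MAX_RETRIES: usize = 5;

/// Fill every `None` cell with a generated id that is not already in the column.
/// `generate` stands in for the real shortid generator (hypothetical hook).
/// Returns the number of cells filled, or a diagnostic when retries run out.
/// In the real operation a failure would roll back the surrounding transaction;
/// this sketch simply stops and leaves earlier fills in place.
fn fill_null_shortids<F>(cells: &mut [Option<String>], mut generate: F) -> Result<usize, String>
where
    F: FnMut() -> String,
{
    // Existing non-null values count as taken, so fills cannot collide with them.
    let mut taken: HashSet<String> = cells.iter().flatten().cloned().collect();
    let mut filled = 0;

    for cell in cells.iter_mut().filter(|c| c.is_none()) {
        let mut chosen = None;
        for _ in 0..=MAX_RETRIES {
            let candidate = generate();
            // `insert` returns false when the candidate is already present.
            if taken.insert(candidate.clone()) {
                chosen = Some(candidate);
                break;
            }
        }
        match chosen {
            Some(id) => {
                *cell = Some(id);
                filled += 1;
            }
            None => {
                return Err(format!(
                    "could not generate a unique shortid after {} retries",
                    MAX_RETRIES
                ))
            }
        }
    }
    Ok(filled)
}
```

Passing the generator as a closure keeps the sketch testable with a scripted sequence of candidates, including deliberate collisions.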
4. UNIQUE story
Auto-generated non-PK columns gain an emitted UNIQUE constraint to enforce their contract:
- Non-PK serial: gains UNIQUE on creation / conversion to serial. Required for the contract; the rowid trick isn't available off the PK.
- Non-PK shortid: gains UNIQUE on creation / conversion to shortid. Strengthens today's probabilistic guarantee into a database-enforced one.
- PK case for either type: PK already implies UNIQUE+NOT NULL. No additional constraint needed.
The reverse direction (serial → int, shortid → text) leaves
the UNIQUE constraint in place. The user has not signalled
intent to drop the uniqueness guarantee; only the auto-
generation contract was dropped. When constraint-management
lands as a user-facing feature (C3-track), the user can
explicitly drop the UNIQUE if desired.
This ADR pulls forward the internal infrastructure to emit
and read UNIQUE constraints — schema_to_ddl gains UNIQUE-
column-clause emission; read_schema gains UNIQUE detection
via pragma_index_list + pragma_index_info; ReadColumn
gains a unique: bool field. The user-facing constraint
surface (declaring UNIQUE in with pk … unique … or via
add unique, dropping UNIQUE, naming UNIQUE constraints) is
not in scope here and remains C3-track work.
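As an illustration of inline UNIQUE emission, the sketch below renders column clauses for a CREATE TABLE statement. The Column struct, its field names, and both functions are assumptions made for this example; the real schema_to_ddl may differ in signature, identifier quoting, and table options.

```rust
/// Simplified stand-in for a schema column record (field names are assumptions).
struct Column {
    name: String,
    sql_type: String,
    not_null: bool,
    primary_key: bool,
    unique: bool,
}

/// Render one column clause. Identifier quoting is omitted for brevity.
fn column_clause(col: &Column) -> String {
    let mut clause = format!("{} {}", col.name, col.sql_type);
    if col.primary_key {
        // The PK case needs no extra UNIQUE keyword (see the list above).
        clause.push_str(" PRIMARY KEY");
        return clause;
    }
    if col.not_null {
        clause.push_str(" NOT NULL");
    }
    if col.unique {
        clause.push_str(" UNIQUE");
    }
    clause
}

/// Join the column clauses into a single-column-PK CREATE TABLE statement.
fn create_table_ddl(table: &str, cols: &[Column]) -> String {
    let clauses: Vec<String> = cols.iter().map(column_clause).collect();
    format!("CREATE TABLE {} ({})", table, clauses.join(", "))
}
```

For a table with a serial PK id and a non-PK serial x, this yields a statement of the form CREATE TABLE T (id INTEGER PRIMARY KEY, x INTEGER UNIQUE).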
5. INSERT-path changes
For non-PK serial columns, when the column is omitted from
an INSERT (the existing skip-list at db.rs:3111 already covers
serial and shortid identically), the executor:
- Queries SELECT COALESCE(MAX(col), 0) + 1 FROM T inside the same transaction.
- Binds the result as the column's value.
The MAX-based seeding mirrors SQLite's rowid behaviour: gaps
left by user-supplied explicit values are jumped over (the next
auto-fill is MAX + 1, not "the smallest available integer").
Worker-thread serialisation (ADR-0010) prevents the classic read-modify-write race; the pattern is safe for our single-writer model.
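The MAX-based seeding can be modelled in memory to show the gap-jumping behaviour. This is a sketch only: next_serial and insert_row are hypothetical helpers operating on a Vec that stands in for the column, not the executor's actual code path.

```rust
/// In-memory equivalent of `SELECT COALESCE(MAX(col), 0) + 1 FROM T`.
fn next_serial(existing: &[Option<i64>]) -> i64 {
    existing.iter().flatten().copied().max().unwrap_or(0) + 1
}

/// Append a row's value for the serial column: use the supplied value when the
/// INSERT names the column, otherwise auto-fill with MAX + 1.
fn insert_row(column: &mut Vec<Option<i64>>, supplied: Option<i64>) -> i64 {
    let value = match supplied {
        Some(v) => v,
        None => next_serial(column.as_slice()),
    };
    column.push(Some(value));
    value
}
```

Starting from an empty column, an omitted value yields 1; after an explicit insert of 100, the next omitted value yields 101 rather than 2.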
6. add_column changes
do_add_column lifts its blanket serial refusal (db.rs:1374).
The new behaviour is determined by the source table's state:
- add column T: x (serial) on an empty table: emit ALTER TABLE T ADD COLUMN x INTEGER UNIQUE. Every table has a PK by construction (the parser refuses create table without with pk), so the "no PK" branch doesn't arise; the new column joins as a non-PK serial.
- add column T: x (serial) on a non-empty table: route through the rebuild-table primitive (ADR-0013). Create new table with x INTEGER UNIQUE. Copy rows, filling x with values 1..N in declaration order. Emit a [client-side] note (§9).
- add column T: x (shortid) on a non-empty table: route through the rebuild-table primitive. Create new table with x TEXT UNIQUE. Copy rows, generating a fresh shortid for each (collision-retried per §3). Emit a [client-side] note.
The empty-table path can stay on ALTER TABLE ADD COLUMN for
efficiency; the non-empty path needs the rebuild because we
need to populate the new column atomically with table
creation.
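The path choice and the serial fill during the row copy might look like the sketch below. AddColumnPath, choose_path and append_serial are illustrative names, rows are modelled as vectors of strings, and "declaration order" is taken here to mean the order in which rows are handed to the copy step, which is an assumption.

```rust
/// Which mechanism carries the add-column operation (hypothetical enum).
#[derive(Debug, PartialEq)]
enum AddColumnPath {
    /// Empty table: no rows to populate.
    AlterTable,
    /// Non-empty table: rebuild so the new column is populated with the copy.
    RebuildWithFill,
}

/// Pick the path from the number of rows currently in the table.
fn choose_path(row_count: usize) -> AddColumnPath {
    if row_count == 0 {
        AddColumnPath::AlterTable
    } else {
        AddColumnPath::RebuildWithFill
    }
}

/// Pair each copied row with its new serial value: 1..=N in the given order.
fn append_serial(rows: Vec<Vec<String>>) -> Vec<(Vec<String>, i64)> {
    rows.into_iter().zip(1i64..).collect()
}
```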
7. change column to serial / shortid
change column T: x (serial) from any matrix-permitted source
type (today: int; future expansions follow the same rule):
- Run the per-cell dry-run (ADR-0017 §2). For non-null cells, classify via the transformer matrix: source must produce an integer (the existing serial pre-condition).
- Refuse if existing non-null values have duplicates (uniqueness collision, ADR-0017 §4.3).
- Auto-fill any null cells with sequential values continuing from MAX(non-null values) + 1 (or starting at 1 if none).
- Refuse if the auto-fill would itself produce a collision. Because the sequence continues from MAX + 1 rather than filling gaps, this cannot actually arise (e.g., existing values [1, 2, 5] with two nulls fill as 6 and 7; existing values [1, 2, 6] with two nulls fill as 7 and 8), but the rule is stated defensively.
- Rebuild the table with the new column type plus UNIQUE (per §4) plus the transformed + auto-filled values.
- Emit [client-side] notes (§9).
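The duplicate check and the MAX + 1 fill for a conversion to serial can be sketched over an in-memory column. The function and error names are hypothetical, and the dry-run classification and the table rebuild are left out.

```rust
use std::collections::HashSet;

/// Reasons the conversion is refused (only the duplicate case is modelled).
#[derive(Debug, PartialEq)]
enum SerialConversionError {
    DuplicateValue(i64),
}

/// Returns the new column values plus the number of auto-filled cells.
fn convert_to_serial(
    cells: &[Option<i64>],
) -> Result<(Vec<i64>, usize), SerialConversionError> {
    // Refuse when existing non-null values collide.
    let mut seen = HashSet::new();
    for v in cells.iter().flatten() {
        if !seen.insert(*v) {
            return Err(SerialConversionError::DuplicateValue(*v));
        }
    }

    // Continue from MAX + 1, or start at 1 when there are no non-null values.
    let mut next = cells.iter().flatten().copied().max().unwrap_or(0) + 1;
    let mut filled = 0usize;
    let values: Vec<i64> = cells
        .iter()
        .map(|cell| match cell {
            Some(v) => *v,
            None => {
                let v = next;
                next += 1;
                filled += 1;
                v
            }
        })
        .collect();
    Ok((values, filled))
}
```

Because every fill is strictly greater than the existing maximum, the filled values cannot collide with the existing ones or with each other.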
change column T: x (shortid) from text:
- Run the per-cell dry-run. Non-null cells classify via the text → shortid transformer (ADR-0017 §3) — must match the shortid grammar.
- Refuse if existing non-null shortid-valid values have duplicates.
- Auto-fill null cells with generated shortids (collision-retried per §3, including against the existing values).
- Rebuild with TEXT + UNIQUE + the validated + auto-filled values.
- Emit [client-side] notes.
8. Conversion matrix amendments to ADR-0017
ADR-0017 §3 "Statically refused" is amended:
- int → serial is removed from the static refusal list and added as a per-cell-classified matrix entry: clean for non-null integers (with the post-transformation uniqueness check from §4.3), with null-cell auto-fill per §7 above.
- The general "Anything → serial" refusal is replaced with a more specific list: text → serial, real → serial, etc. remain refused for v1 (route via int first); bool → serial remains refused (cross-domain).
- text → shortid is unchanged from ADR-0017 (still per-cell-classified). The contract enforcement at column-materialisation is new.
ADR-0017 §4.3 (uniqueness check) is amended to apply to "PK columns and shortid columns and any column that gains a UNIQUE constraint as part of the operation" — i.e., non-PK serial / shortid targets are uniqueness-checked.
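A compact way to picture the amended serial entries is a match over the source type. The enum below covers only the source types named in this section, and the type names, the Verdict shape, and the hint strings are placeholders invented for the example rather than the project's actual messages.

```rust
/// Source types named in this section (a subset of the full vocabulary).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceType {
    Int,
    Real,
    Text,
    Bool,
}

/// Outcome of looking up "source → serial" in the matrix (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Classified cell by cell, followed by the uniqueness check and null fill.
    PerCell,
    /// Refused before any data is inspected; hint text is a placeholder.
    Refused { hint: &'static str },
}

/// Look up the verdict for converting a column of `source` type to serial.
fn classify_to_serial(source: SourceType) -> Verdict {
    match source {
        SourceType::Int => Verdict::PerCell,
        SourceType::Text | SourceType::Real => Verdict::Refused {
            hint: "convert the column to int first, then to serial",
        },
        SourceType::Bool => Verdict::Refused {
            hint: "cross-domain conversion",
        },
    }
}
```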
9. Client-side notes
ADR-0017 §6 introduced the [client-side] pattern: when the
playground rewrote any cell value, the success summary tells
the learner "the tool did this for you; raw SQL would need a
CAST or application-level code." This ADR extends the pattern
to auto-fill operations:
- add column T: x (serial) on non-empty table:
  [client-side] N row(s) given auto-generated serial values 1..N. In raw SQL this would need an explicit UPDATE to populate.
- add column T: x (shortid) on non-empty table:
  [client-side] N row(s) given auto-generated shortid values. In raw SQL this would need an explicit UPDATE to populate.
- change column T: x (serial) with M null cells:
  [client-side] M null cell(s) given auto-generated serial values. In raw SQL this would need an explicit UPDATE to populate.
- change column T: x (shortid) with M null cells:
  [client-side] M null cell(s) given auto-generated shortid values. In raw SQL this would need an explicit UPDATE to populate.
When both an ADR-0017 transformation note AND an ADR-0018
auto-fill note apply to the same operation (e.g., change column T: x (shortid) from text where some cells need
validation and others need auto-fill), both notes are emitted
on separate lines. The success path emits them after the [ok]
summary and before the structure-render block.
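Rendering the change-column auto-fill note and assembling the success lines could be sketched as below. AutoFillKind, change_column_note and success_lines are illustrative names; the transformation note is passed in pre-rendered because its wording belongs to ADR-0017, and the relative order of the two notes is an assumption.

```rust
/// Which generator produced the filled values (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum AutoFillKind {
    Serial,
    ShortId,
}

/// Render the note for `change column` with `null_cells` auto-filled cells.
fn change_column_note(kind: AutoFillKind, null_cells: usize) -> String {
    let type_name = match kind {
        AutoFillKind::Serial => "serial",
        AutoFillKind::ShortId => "shortid",
    };
    format!(
        "[client-side] {} null cell(s) given auto-generated {} values. \
         In raw SQL this would need an explicit UPDATE to populate.",
        null_cells, type_name
    )
}

/// Assemble the lines shown after a successful operation: the [ok] summary
/// first, then each applicable note on its own line.
fn success_lines(
    ok_summary: &str,
    transform_note: Option<&str>,
    auto_fill_note: Option<&str>,
) -> Vec<String> {
    let mut lines = vec![ok_summary.to_string()];
    lines.extend(transform_note.map(str::to_string));
    lines.extend(auto_fill_note.map(str::to_string));
    lines
}
```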
10. Engine-vocabulary cleanup
While here, fix the existing user-facing string in
do_add_column's serial refusal (db.rs:1374): the message
names "SQLite's ALTER TABLE" — an ADR-0002 user-facing posture
violation. This message is being replaced anyway as part of
lifting the refusal; the replacement uses abstract "the
database" / "the engine" phrasing.
Resolutions
Three points called out as "open" during drafting, resolved before acceptance:
- No-PK empty-table case: not reachable. Every table has a PK by construction: the create table parser refuses input that produces an empty PK list. add column T: x (serial) on an empty table therefore always lands on a table that already has a PK, and the new x column is a non-PK serial (gains UNIQUE per §4).
- Serial sequencing under explicit user inserts: MAX+1. If the user explicitly inserts id = 100, the next auto-fill yields 101. Gappy sequences are accepted (e.g., if the user later inserts id = 200, the next auto-fill is 201; the gap 102..199 is not back-filled). MAX+1 matches SQLite's rowid behaviour for the PK case, so both implementation paths feel uniform to the user, and gap-detection is more expensive than its pedagogical value.
- UNIQUE emission style: inline column constraint (x INTEGER UNIQUE). Cleaner DDL while we don't have a user-facing constraint surface that would benefit from named, separately-managed indexes. Revisitable when C3 lands the user-facing constraint feature; the read_schema detection via pragma_index_list works for either form.
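The detection side can be illustrated without a database by modelling the pragma results as plain data. In this sketch IndexListRow keeps only the name and unique fields of a pragma_index_list row, and the map stands in for the column names reported by pragma_index_info for each index; unique_columns is a hypothetical helper, and any special handling of PK-origin indexes is not modelled.

```rust
use std::collections::{HashMap, HashSet};

/// One row of pragma_index_list, reduced to the fields used here.
struct IndexListRow {
    name: String,
    unique: bool,
}

/// Columns covered by a single-column unique index. An inline UNIQUE
/// constraint and a separately created unique index look the same here.
fn unique_columns(
    index_list: &[IndexListRow],
    index_info: &HashMap<String, Vec<String>>,
) -> HashSet<String> {
    index_list
        .iter()
        .filter(|idx| idx.unique)
        .filter_map(|idx| index_info.get(&idx.name))
        // Multi-column unique indexes do not make any single column unique.
        .filter(|cols| cols.len() == 1)
        .map(|cols| cols[0].clone())
        .collect()
}
```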
Out of scope
- OOS-1. User-facing UNIQUE constraint surface (add unique <T>: <c>, drop unique, naming, multi-column unique). Stays as C3-track work.
- OOS-2. Strict-monotonic AUTOINCREMENT semantics: we retain plain INTEGER PRIMARY KEY for PK serial. Rebuild-reset of the high-water mark is acceptable for a teaching tool; users who care can be taught the distinction in a later iteration.
- OOS-3. Custom serial start values, custom step sizes, or multi-column composite serial.
- OOS-4. Non-PK serial when the table has no PK at all (caught by Open Question 1's resolution).
- OOS-5. A [client-side] note on the empty-table case (add column on an empty table). No rows means nothing to auto-fill; the operation is a structural change with no pedagogical "the tool did this for you" content.
- OOS-6. Reading and emitting CHECK constraints; only UNIQUE is required for this ADR.
Consequences
- The "serial only on PK" mental model is replaced with "serial works anywhere". Pedagogically richer: students see auto-incrementing columns as a general feature, not as a special PK-only quirk.
- One internal mechanism the user doesn't see (rowid alias vs application MAX+1). The two paths converge to identical user-facing behaviour, honouring ADR-0002's posture.
- schema_to_ddl and read_schema gain UNIQUE handling, a partial pull-forward of C3 work. The user-facing constraint surface stays deferred; this ADR only lands the internal infrastructure required by serial / shortid contracts.
- [client-side] notes proliferate to cover auto-fill cases. Strengthens the pedagogical lens: every place the playground goes beyond what raw SQL does, the user is told.
- All four user-observed gaps from §Context closed. The int → serial → int round-trip works (matching the existing text → shortid → text round-trip from ADR-0017).
- Add-column-with-shortid producing a "valid" state aligns with ADR-0005's design contract that shortids are unique non-null identifiers.
Relationship to earlier ADRs
- ADR-0002. User-facing posture honoured: the dual serial implementation is hidden; the existing engine-name leak in do_add_column's refusal message is fixed opportunistically.
- ADR-0005. Type vocabulary unchanged; serial definition generalised. The keyword and the user model stay the same; the implementation broadens.
- ADR-0010. Worker-thread serialisation is what makes the non-PK serial MAX+1 path safe without explicit locks.
- ADR-0011. fk_target_type for serial unchanged (Serial → Int); FK target compatibility remains as-is.
- ADR-0013. Rebuild-table primitive carries the auto-fill cases for non-empty add column and change column to serial/shortid.
- ADR-0014. INSERT-time auto-fill semantics extended to non-PK serial. ADR-0014's auto-fill skip-list (which already covers both serial and shortid symmetrically) is reused.
- ADR-0015. The text-format round-trip carries the new UNIQUE constraints in metadata so a rebuild from project.yaml reconstructs the database faithfully. Likely needs a __rdbms_playground_columns schema addition (a unique bool), to be confirmed during implementation.
- ADR-0017. §3 transformer matrix amended: int → serial joins per-cell-classified. §4.3 uniqueness check extended to cover non-PK serial / shortid targets.