# ADR-0018: Auto-fill contracts for `serial` and `shortid` columns ## Status Accepted. Amends ADR-0005 (column type vocabulary), ADR-0014 (data operations and value model), and ADR-0017 (column type-change compatibility). Pulls part of C3's UNIQUE-constraint emission forward as internal infrastructure. ## Context `serial` and `shortid` are the two auto-generated types in ADR-0005's vocabulary. Today they have asymmetric and under-specified semantics: 1. **`serial` only on PK.** `Type::Serial.sqlite_strict_extra()` returns `" PRIMARY KEY"`, and `do_add_column` explicitly refuses serial. The implicit user-facing model is "serial = auto-incrementing PK". This is an artefact of SQLite's only free auto-increment mechanism (the rowid alias on `INTEGER PRIMARY KEY`); other RDBMS — PostgreSQL's `SEQUENCE`, MySQL's `AUTO_INCREMENT` — let auto-incrementing columns exist anywhere. Our pedagogical intent is the broader model; the restriction is incidental to our backend choice and leaks that choice into the user-facing surface (against ADR-0002). 2. **`int → serial` is statically refused** in ADR-0017's transformer matrix, while `text → shortid` is per-cell- classified. Yet both target types are equally "auto-generated with a uniqueness contract" — the asymmetry isn't principled. 3. **`add column T: x (shortid)` on a non-empty table leaves existing rows NULL.** Per the design contract, shortids are unique non-null identifiers — so the column ends up in a not-really-valid state until the user issues UPDATEs. The auto-fill logic that runs at INSERT time for omitted shortid values doesn't run at column-materialisation time. 4. **No UNIQUE constraint emission today.** A non-PK serial column would need a UNIQUE constraint to enforce its contract (the rowid trick isn't available off the PK). The same applies to non-PK shortid: today it relies on a probabilistic "won't collide" argument, not a database-enforced guarantee. `schema_to_ddl` only emits NOT NULL inline plus PK inline / table-level. No path emits UNIQUE. This ADR resolves all four gaps with a single unifying principle. ## Decision ### 1. The unifying principle > Auto-generated column types honour their generation contract > on every path that creates or transitions the column. Concretely: a column declared (or converted to be) `serial` or `shortid` always satisfies its contract — non-null, auto-generated, unique — by the time the operation completes. The user does not have to issue a follow-up UPDATE. The mechanism is hidden; the user-facing model is "values appear automatically". ### 2. `serial`: dual-implementation, single semantic `serial` is generalised from "auto-incrementing PK" to "auto-incrementing integer column". The column may be the table PK or any non-PK column; the user-facing semantic is identical. The implementation switches transparently: - **PK case** (single-column PK on this column): rowid alias. `INTEGER PRIMARY KEY` in DDL; SQLite's free auto-increment applies. Unchanged from today. - **Non-PK case**: app-level `MAX(col) + 1` lookup at INSERT time, plus an emitted UNIQUE constraint on the column. The worker-thread serialisation (ADR-0010) makes the read-then- insert sequence safe without explicit locking — only one INSERT runs at a time on the connection. User-visible help, error messages, and `[client-side]` notes refer to `serial` columns as "auto-incrementing" or "auto-generated". The PK / non-PK distinction is an internal implementation detail (ADR-0002 user-facing posture). ### 3. `shortid`: tighten the contract at column materialisation Today: shortid generation runs only when an INSERT omits the value. Rows existing at the moment a shortid column is created remain NULL until the user issues an UPDATE. Going forward: any null cell in a shortid column gets a freshly- generated value at the operation that creates that condition: - `add column T: x (shortid)` on a non-empty table fills every existing row's `x` with a generated shortid before the operation completes. - `change column T: x (shortid)` from `text` (or any other matrix-permitted source) fills any null cells with generated shortids in the same rebuild transaction. Generator collisions (vanishingly rare given the 10⁷–10⁸ namespace; see ADR-0014 §"shortid auto-generation") trigger up to 5 retries per cell. Exhausting retries fails the operation with a friendly diagnostic; in practice this indicates either a generator-state bug or a pathological RNG and is not user- recoverable. ### 4. UNIQUE story Auto-generated non-PK columns gain an emitted UNIQUE constraint to enforce their contract: - Non-PK `serial`: gains UNIQUE on creation / conversion-to- serial. Required for the contract; the rowid trick isn't available off the PK. - Non-PK `shortid`: gains UNIQUE on creation / conversion-to- shortid. Strengthens today's probabilistic guarantee into a database-enforced one. - PK case for either type: PK already implies UNIQUE+NOT NULL. No additional constraint needed. The reverse direction (`serial → int`, `shortid → text`) leaves the UNIQUE constraint **in place**. The user has not signalled intent to drop the uniqueness guarantee; only the auto- generation contract was dropped. When constraint-management lands as a user-facing feature (C3-track), the user can explicitly drop the UNIQUE if desired. This ADR pulls forward the **internal infrastructure** to emit and read UNIQUE constraints — `schema_to_ddl` gains UNIQUE- column-clause emission; `read_schema` gains UNIQUE detection via `pragma_index_list` + `pragma_index_info`; `ReadColumn` gains a `unique: bool` field. The **user-facing constraint surface** (declaring UNIQUE in `with pk … unique …` or via `add unique`, dropping UNIQUE, naming UNIQUE constraints) is not in scope here and remains C3-track work. ### 5. INSERT-path changes For non-PK `serial` columns, when the column is omitted from an INSERT (the existing skip-list at db.rs:3111 already covers serial and shortid identically), the executor: 1. Queries `SELECT COALESCE(MAX(col), 0) + 1 FROM T` inside the same transaction. 2. Binds the result as the column's value. The MAX-based seeding mirrors SQLite's rowid behaviour: gaps left by user-supplied explicit values are jumped over (the next auto-fill is `MAX + 1`, not "the smallest available integer"). Worker-thread serialisation (ADR-0010) prevents the classic read-modify-write race; the pattern is safe for our single- writer model. ### 6. `add_column` changes `do_add_column` lifts its blanket serial refusal (db.rs:1374). The new behaviour is determined by the source table's state: - **`add column T: x (serial)` on an empty table**: emit `ALTER TABLE T ADD COLUMN x INTEGER UNIQUE`. Every table has a PK by construction (the parser refuses `create table` without `with pk`), so the "no PK" branch doesn't arise — the new column joins as a non-PK serial. - **`add column T: x (serial)` on a non-empty table**: route through the rebuild-table primitive (ADR-0013). Create new table with `x INTEGER UNIQUE`. Copy rows, filling `x` with values 1..N in declaration order. Emit a `[client-side]` note (§7). - **`add column T: x (shortid)` on a non-empty table**: route through the rebuild-table primitive. Create new table with `x TEXT UNIQUE`. Copy rows, generating a fresh shortid for each (collision-retried per §3). Emit a `[client-side]` note. The empty-table path can stay on `ALTER TABLE ADD COLUMN` for efficiency; the non-empty path needs the rebuild because we need to populate the new column atomically with table creation. ### 7. `change column` to `serial` / `shortid` `change column T: x (serial)` from any matrix-permitted source type (today: `int`; future expansions follow the same rule): 1. Run the per-cell dry-run (ADR-0017 §2). For non-null cells, classify via the transformer matrix: source must produce an integer (the existing serial pre-condition). 2. Refuse if existing non-null values have duplicates (uniqueness collision, ADR-0017 §4.3). 3. Auto-fill any null cells with sequential values continuing from `MAX(non-null values) + 1` (or starting at 1 if none). 4. Refuse if the auto-fill would itself produce a collision — in practice, this can only happen if the user supplied non-null values that already overlap the would-be sequence (e.g., existing values [1, 2, 5] with two nulls — fill would be 6 and 7, no collision; existing values [1, 2, 6] with nulls — fill would be 3 and 4, no collision; the sequence uses MAX+1, not gap-filling, so this case doesn't actually arise — but state the rule defensively). 5. Rebuild the table with the new column type plus UNIQUE (per §4) plus the transformed + auto-filled values. 6. Emit `[client-side]` notes (§7). `change column T: x (shortid)` from `text`: 1. Run the per-cell dry-run. Non-null cells classify via the text → shortid transformer (ADR-0017 §3) — must match the shortid grammar. 2. Refuse if existing non-null shortid-valid values have duplicates. 3. Auto-fill null cells with generated shortids (collision- retried per §3, including against the existing values). 4. Rebuild with TEXT + UNIQUE + the validated + auto-filled values. 5. Emit `[client-side]` notes. ### 8. Conversion matrix amendments to ADR-0017 ADR-0017 §3 "Statically refused" is amended: - `int → serial` is **removed** from the static refusal list and added as a **per-cell-classified** matrix entry: clean for non-null integers (with the post-transformation uniqueness check from §4.3), with null-cell auto-fill per §7 above. - The general "Anything → `serial`" refusal is replaced with a more specific list: `text → serial`, `real → serial`, etc. remain refused for v1 (route via int first); `bool → serial` remains refused (cross-domain). - `text → shortid` is unchanged from ADR-0017 (still per-cell- classified). The contract enforcement at column-materialisation is new. ADR-0017 §4.3 (uniqueness check) is amended to apply to "PK columns and shortid columns and any column that gains a UNIQUE constraint as part of the operation" — i.e., non-PK serial / shortid targets are uniqueness-checked. ### 9. Client-side notes ADR-0017 §6 introduced the `[client-side]` pattern: when the playground rewrote any cell value, the success summary tells the learner "the tool did this for you; raw SQL would need a `CAST` or application-level code." This ADR extends the pattern to auto-fill operations: - **`add column T: x (serial)` on non-empty table**: > [client-side] N row(s) given auto-generated serial values > 1..N. In raw SQL this would need an explicit UPDATE to > populate. - **`add column T: x (shortid)` on non-empty table**: > [client-side] N row(s) given auto-generated shortid values. > In raw SQL this would need an explicit UPDATE to populate. - **`change column T: x (serial)` with M null cells**: > [client-side] M null cell(s) given auto-generated serial > values. In raw SQL this would need an explicit UPDATE to > populate. - **`change column T: x (shortid)` with M null cells**: > [client-side] M null cell(s) given auto-generated shortid > values. In raw SQL this would need an explicit UPDATE to > populate. When both an ADR-0017 transformation note AND an ADR-0018 auto-fill note apply to the same operation (e.g., `change column T: x (shortid)` from text where some cells need validation and others need auto-fill), both notes are emitted on separate lines. The success path emits them after the `[ok]` summary and before the structure-render block. ### 10. Engine-vocabulary cleanup While here, fix the existing user-facing string in `do_add_column`'s serial refusal (db.rs:1374): the message names "SQLite's ALTER TABLE" — an ADR-0002 user-facing posture violation. This message is being replaced anyway as part of lifting the refusal; the replacement uses abstract "the database" / "the engine" phrasing. ## Resolutions Three points called out as "open" during drafting, resolved before acceptance: 1. **No-PK empty-table case**: not reachable. Every table has a PK by construction — the `create table` parser refuses input that produces an empty PK list. `add column T: x (serial)` on an empty table therefore always lands on a table that already has a PK, and the new `x` column is a non-PK serial (gains UNIQUE per §4). 2. **Serial sequencing under explicit user inserts**: MAX+1. If the user explicitly inserts `id = 100`, the next auto- fill yields 101. Gappy sequences are accepted (e.g., if the user later inserts `id = 200`, the next auto-fill is 201; the gap 102..199 is not back-filled). MAX+1 matches SQLite's rowid behaviour for the PK case, so both implementation paths feel uniform to the user, and gap- detection is more expensive than its pedagogical value. 3. **UNIQUE emission style**: inline column constraint (`x INTEGER UNIQUE`). Cleaner DDL while we don't have a user-facing constraint surface that would benefit from named, separately-managed indexes. Revisitable when C3 lands the user-facing constraint feature; the `read_schema` detection via `pragma_index_list` works for either form. ## Out of scope - **OOS-1.** User-facing UNIQUE constraint surface (`add unique : `, `drop unique`, naming, multi-column unique). Stays as C3-track work. - **OOS-2.** Strict-monotonic AUTOINCREMENT semantics — we retain plain `INTEGER PRIMARY KEY` for PK serial. Rebuild- reset of the high-water mark is acceptable for a teaching tool; users who care can be taught the distinction in a later iteration. - **OOS-3.** Custom serial start values, custom step sizes, or multi-column composite serial. - **OOS-4.** Non-PK serial when the table has no PK at all (caught by Open Question 1's resolution). - **OOS-5.** A `[client-side]` note on the empty-table case (`add column` on an empty table). No rows means nothing to auto-fill — the operation is a structural change with no pedagogical "the tool did this for you" content. - **OOS-6.** Reading and emitting CHECK constraints — only UNIQUE is required for this ADR. ## Consequences - The "serial only on PK" mental model is replaced with "serial works anywhere". Pedagogically richer: students see auto-incrementing columns as a general feature, not as a special PK-only quirk. - One internal mechanism the user doesn't see (rowid alias vs application MAX+1). The two paths converge to identical user-facing behaviour, honouring ADR-0002's posture. - `schema_to_ddl` and `read_schema` gain UNIQUE handling — a partial pull-forward of C3 work. The user-facing constraint surface stays deferred; this ADR only lands the internal infrastructure required by serial / shortid contracts. - `[client-side]` notes proliferate to cover auto-fill cases. Strengthens the pedagogical lens: every place the playground goes beyond what raw SQL does, the user is told. - All four user-observed gaps from §Context closed. The `int → serial → int` round-trip works (matching the existing `text → shortid → text` round-trip from ADR-0017). - Add-column-with-shortid producing a "valid" state aligns with ADR-0005's design contract that shortids are unique non-null identifiers. ## Relationship to earlier ADRs - **ADR-0002** — User-facing posture honoured: the dual serial implementation is hidden; the existing engine-name leak in `do_add_column`'s refusal message is fixed opportunistically. - **ADR-0005** — Type vocabulary unchanged; `serial` definition generalised. The keyword and the user model stay the same; the implementation broadens. - **ADR-0010** — Worker-thread serialisation is what makes the non-PK serial MAX+1 path safe without explicit locks. - **ADR-0011** — `fk_target_type` for serial unchanged (`Serial → Int`); FK target compatibility remains as-is. - **ADR-0013** — Rebuild-table primitive carries the auto-fill cases for non-empty `add column` and `change column to serial/shortid`. - **ADR-0014** — INSERT-time auto-fill semantics extended to non-PK serial. ADR-0014's auto-fill skip-list (which already covers both serial and shortid symmetrically) is reused. - **ADR-0015** — The text-format round-trip carries the new UNIQUE constraints in metadata so a rebuild from `project.yaml` reconstructs the database faithfully. Likely needs a `__rdbms_playground_columns` schema additon (a `unique` bool) — to be confirmed during implementation. - **ADR-0017** — §3 transformer matrix amended: `int → serial` joins per-cell-classified. §4.3 uniqueness check extended to cover non-PK serial / shortid targets.