ADR-0018 implementation: auto-fill contracts for serial and shortid
Generalises serial and shortid beyond their previous restricted forms:

- `serial` is no longer restricted to single-column PK. Non-PK serial columns get an emitted UNIQUE constraint and use application-side MAX(col)+1 at INSERT time (rowid alias still drives the PK case for free; per ADR-0010 worker-thread serialisation, the read-then-insert sequence is safe).
- `shortid` columns auto-fill existing null cells when the column is materialised: `add column T: x (shortid)` on a non-empty table no longer leaves rows in a not-really-valid NULL state.
- `int -> serial` joins the type-change matrix as always-clean identity (closes the asymmetry vs `text -> shortid`); other sources are refused with a route-via-int hint.
- `change column T: x (serial|shortid)` fills null source cells with sequence / generated values in the same rebuild transaction.

Internal infrastructure:

- ReadColumn gains `unique: bool`; read_schema detects single-column UNIQUE indexes via pragma_index_list / pragma_index_info; schema_to_ddl emits inline UNIQUE for non-PK columns.
- ColumnSchema (persistence) gains `unique: bool` so the flag survives YAML round-trip and rebuild-from-text reconstructs it faithfully. This preserves the "serial -> int leaves UNIQUE in place" promise across save/load cycles.
- ChangeColumnTypeResult.client_side now carries `auto_filled` + `auto_fill_kind` alongside `transformed` + `lossy`; the app handler renders separate note lines when both apply.
- AddColumnResult is a new return type carrying pre-rendered [client-side] note lines for the auto-fill paths.

Tests: 519 -> 534 (+15). Clippy clean.
@@ -0,0 +1,385 @@
# ADR-0018: Auto-fill contracts for `serial` and `shortid` columns

## Status

Accepted.

Amends ADR-0005 (column type vocabulary), ADR-0014 (data
operations and value model), and ADR-0017 (column type-change
compatibility). Pulls part of C3's UNIQUE-constraint emission
forward as internal infrastructure.

## Context

`serial` and `shortid` are the two auto-generated types in
ADR-0005's vocabulary. Today they have asymmetric and
under-specified semantics:

1. **`serial` only on PK.** `Type::Serial.sqlite_strict_extra()`
   returns `" PRIMARY KEY"`, and `do_add_column` explicitly
   refuses serial. The implicit user-facing model is "serial =
   auto-incrementing PK". This is an artefact of SQLite's only
   free auto-increment mechanism (the rowid alias on `INTEGER
   PRIMARY KEY`); other RDBMS (PostgreSQL's `SEQUENCE`, MySQL's
   `AUTO_INCREMENT`) let auto-incrementing columns exist
   anywhere. Our pedagogical intent is the broader model; the
   restriction is incidental to our backend choice and leaks
   that choice into the user-facing surface (against ADR-0002).

2. **`int → serial` is statically refused** in ADR-0017's
   transformer matrix, while `text → shortid` is
   per-cell-classified. Yet both target types are equally
   "auto-generated with a uniqueness contract", so the asymmetry
   isn't principled.

3. **`add column T: x (shortid)` on a non-empty table leaves
   existing rows NULL.** Per the design contract, shortids are
   unique non-null identifiers, so the column ends up in a
   not-really-valid state until the user issues UPDATEs. The
   auto-fill logic that runs at INSERT time for omitted shortid
   values doesn't run at column-materialisation time.

4. **No UNIQUE constraint emission today.** A non-PK serial
   column would need a UNIQUE constraint to enforce its contract
   (the rowid trick isn't available off the PK). The same
   applies to non-PK shortid: today it relies on a probabilistic
   "won't collide" argument, not a database-enforced
   guarantee. `schema_to_ddl` only emits NOT NULL inline plus PK
   inline / table-level. No path emits UNIQUE.

This ADR resolves all four gaps with a single unifying
principle.

## Decision

### 1. The unifying principle

> Auto-generated column types honour their generation contract
> on every path that creates or transitions the column.

Concretely: a column declared (or converted to be) `serial` or
`shortid` always satisfies its contract (non-null,
auto-generated, unique) by the time the operation completes.
The user does not have to issue a follow-up UPDATE. The
mechanism is hidden; the user-facing model is "values appear
automatically".

### 2. `serial`: dual-implementation, single semantic

`serial` is generalised from "auto-incrementing PK" to
"auto-incrementing integer column". The column may be the table
PK or any non-PK column; the user-facing semantic is identical.

The implementation switches transparently:

- **PK case** (single-column PK on this column): rowid alias.
  `INTEGER PRIMARY KEY` in DDL; SQLite's free auto-increment
  applies. Unchanged from today.
- **Non-PK case**: app-level `MAX(col) + 1` lookup at INSERT
  time, plus an emitted UNIQUE constraint on the column. The
  worker-thread serialisation (ADR-0010) makes the
  read-then-insert sequence safe without explicit locking: only
  one INSERT runs at a time on the connection.

User-visible help, error messages, and `[client-side]` notes
refer to `serial` columns as "auto-incrementing" or
"auto-generated". The PK / non-PK distinction is an internal
implementation detail (ADR-0002 user-facing posture).

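The choice between the two implementations reduces to one question: is this column the table's sole PK column? The minimal Rust sketch below shows that decision. `SerialColumn`, `SerialStrategy`, `serial_strategy`, and `serial_column_clause` are hypothetical names, not the project's actual types. The clause strings only mirror the DDL shapes named in this ADR.

```rust
/// Hypothetical column descriptor (illustrative, not the real `ReadColumn`).
pub struct SerialColumn<'a> {
    pub name: &'a str,
    /// True when this column is the table's single primary-key column.
    pub is_sole_pk: bool,
}

/// How the `serial` contract is honoured for a given column.
#[derive(Debug, PartialEq)]
pub enum SerialStrategy {
    /// `INTEGER PRIMARY KEY` aliases the rowid, so SQLite auto-increments it.
    RowidAlias,
    /// Application-side `MAX(col) + 1` at INSERT time, backed by UNIQUE.
    AppMaxPlusOne,
}

pub fn serial_strategy(col: &SerialColumn<'_>) -> SerialStrategy {
    if col.is_sole_pk {
        SerialStrategy::RowidAlias
    } else {
        SerialStrategy::AppMaxPlusOne
    }
}

/// Column clause a DDL emitter might produce for a `serial` column.
pub fn serial_column_clause(col: &SerialColumn<'_>) -> String {
    match serial_strategy(col) {
        SerialStrategy::RowidAlias => format!("{} INTEGER PRIMARY KEY", col.name),
        SerialStrategy::AppMaxPlusOne => format!("{} INTEGER UNIQUE", col.name),
    }
}
```

Both branches map to the same user-facing word, "auto-incrementing". Only the emitted DDL and the INSERT-time behaviour differ.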
### 3. `shortid`: tighten the contract at column materialisation

Today: shortid generation runs only when an INSERT omits the
value. Rows existing at the moment a shortid column is created
remain NULL until the user issues an UPDATE.

Going forward: any null cell in a shortid column gets a
freshly-generated value at the operation that creates that
condition:

- `add column T: x (shortid)` on a non-empty table fills every
  existing row's `x` with a generated shortid before the
  operation completes.
- `change column T: x (shortid)` from `text` (or any other
  matrix-permitted source) fills any null cells with generated
  shortids in the same rebuild transaction.

Generator collisions (vanishingly rare given the 10⁷–10⁸
namespace; see ADR-0014 §"shortid auto-generation") trigger up
to 5 retries per cell. Exhausting retries fails the operation
with a friendly diagnostic; in practice this indicates either a
generator-state bug or a pathological RNG and is not
user-recoverable.

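The per-cell retry rule can be sketched as follows. This sketch assumes "up to 5 retries" means one initial draw plus five retries. The generator is passed in as a closure because the real shortid generator (ADR-0014) is not shown here. `fill_one` and `MAX_RETRIES` are illustrative names.

```rust
use std::collections::HashSet;

/// Retries allowed after the first draw. Reading "up to 5 retries per cell"
/// as 1 initial attempt + 5 retries is an assumption of this sketch.
pub const MAX_RETRIES: usize = 5;

/// Draw from `next_id` until the value is not already in `taken`, then
/// record it. `next_id` stands in for the real (unshown) shortid generator.
pub fn fill_one<F>(taken: &mut HashSet<String>, next_id: &mut F) -> Result<String, String>
where
    F: FnMut() -> String,
{
    for _ in 0..=MAX_RETRIES {
        let candidate = next_id();
        // `insert` returns false when the value was already present.
        if taken.insert(candidate.clone()) {
            return Ok(candidate);
        }
    }
    Err(format!(
        "could not generate a unique shortid after {} retries",
        MAX_RETRIES
    ))
}
```

Seeding `taken` with the column's existing values makes the same loop serve both the add-column fill and the change-column fill.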
### 4. UNIQUE story

Auto-generated non-PK columns gain an emitted UNIQUE constraint
to enforce their contract:

- Non-PK `serial`: gains UNIQUE on creation /
  conversion-to-serial. Required for the contract; the rowid
  trick isn't available off the PK.
- Non-PK `shortid`: gains UNIQUE on creation /
  conversion-to-shortid. Strengthens today's probabilistic
  guarantee into a database-enforced one.
- PK case for either type: PK already implies UNIQUE+NOT NULL.
  No additional constraint needed.

The reverse direction (`serial → int`, `shortid → text`) leaves
the UNIQUE constraint **in place**. The user has not signalled
intent to drop the uniqueness guarantee; only the
auto-generation contract was dropped. When constraint-management
lands as a user-facing feature (C3-track), the user can
explicitly drop the UNIQUE if desired.

This ADR pulls forward the **internal infrastructure** to emit
and read UNIQUE constraints: `schema_to_ddl` gains
UNIQUE-column-clause emission; `read_schema` gains UNIQUE
detection via `pragma_index_list` + `pragma_index_info`;
`ReadColumn` gains a `unique: bool` field. The **user-facing
constraint surface** (declaring UNIQUE in `with pk … unique …`
or via `add unique`, dropping UNIQUE, naming UNIQUE constraints)
is not in scope here and remains C3-track work.

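Detecting single-column UNIQUE constraints amounts to joining SQLite's two table-valued pragma functions and keeping only the unique indexes that cover exactly one column. The query constant and the Rust helper below are a hypothetical sketch, not `read_schema` itself; `T` is a placeholder table name. Skipping PK-backing indexes (origin `pk`) is an assumption about how PK uniqueness is tracked separately.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One row per (index, indexed column). `T` is a placeholder; real code
/// would bind or quote the table name.
pub const UNIQUE_INDEX_QUERY: &str = "SELECT il.name, il.\"unique\", il.origin, ii.name \
     FROM pragma_index_list('T') AS il, pragma_index_info(il.name) AS ii";

/// (index name, unique flag, origin, indexed column name)
pub type IndexRow<'a> = (&'a str, bool, &'a str, &'a str);

/// Columns covered by a unique, non-PK index that spans exactly one column.
pub fn single_column_unique(rows: &[IndexRow<'_>]) -> BTreeSet<String> {
    // Group indexed columns by index name, keeping only unique non-PK indexes.
    let mut by_index: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(index, unique, origin, column) in rows {
        if unique && origin != "pk" {
            by_index.entry(index).or_default().push(column);
        }
    }
    by_index
        .into_values()
        .filter(|cols| cols.len() == 1)
        .map(|cols| cols[0].to_string())
        .collect()
}
```

An inline `x INTEGER UNIQUE` shows up as an automatic index with origin `u`, and a separate `CREATE UNIQUE INDEX` shows up with origin `c`. Both pass this filter, which is why detection works for either emission style.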
### 5. INSERT-path changes

For non-PK `serial` columns, when the column is omitted from
an INSERT (the existing skip-list at db.rs:3111 already covers
serial and shortid identically), the executor:

1. Queries `SELECT COALESCE(MAX(col), 0) + 1 FROM T` inside the
   same transaction.
2. Binds the result as the column's value.

The MAX-based seeding mirrors SQLite's rowid behaviour: gaps
left by user-supplied explicit values are jumped over (the next
auto-fill is `MAX + 1`, not "the smallest available integer").

Worker-thread serialisation (ADR-0010) prevents the classic
read-modify-write race; the pattern is safe for our
single-writer model.

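The INSERT-time lookup reduces to a pure function over the column's current cells. The in-memory mirror of the `COALESCE(MAX(col), 0) + 1` query below is illustrative only, and `next_serial` is a hypothetical name. The real executor issues the SQL inside the transaction. Overflow at `i64::MAX` is not handled in this sketch.

```rust
/// Mirror of `SELECT COALESCE(MAX(col), 0) + 1 FROM T`: MAX ignores NULLs,
/// and an empty or all-NULL column yields 1. No overflow handling.
pub fn next_serial(cells: &[Option<i64>]) -> i64 {
    cells.iter().flatten().copied().max().unwrap_or(0) + 1
}
```

Calling it once per INSERT, with the result appended before the next call, reproduces the "jump over gaps" behaviour. After an explicit `100`, the next two auto-fills are `101` and `102`.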
### 6. `add_column` changes

`do_add_column` lifts its blanket serial refusal (db.rs:1374).
The new behaviour is determined by the source table's state:

- **`add column T: x (serial)` on an empty table**: emit
  `ALTER TABLE T ADD COLUMN x INTEGER UNIQUE`. Every table has
  a PK by construction (the parser refuses `create table`
  without `with pk`), so the "no PK" branch doesn't arise; the
  new column joins as a non-PK serial.
- **`add column T: x (serial)` on a non-empty table**: route
  through the rebuild-table primitive (ADR-0013). Create the new
  table with `x INTEGER UNIQUE`. Copy rows, filling `x` with
  values 1..N in declaration order. Emit a `[client-side]` note
  (§9).
- **`add column T: x (shortid)` on a non-empty table**: route
  through the rebuild-table primitive. Create the new table with
  `x TEXT UNIQUE`. Copy rows, generating a fresh shortid for
  each (collision-retried per §3). Emit a `[client-side]` note
  (§9).

The empty-table path can stay on `ALTER TABLE ADD COLUMN` for
efficiency; the non-empty path needs the rebuild because the
new column must be populated atomically with table creation.
Caveat: SQLite's `ALTER TABLE ADD COLUMN` rejects a column that
carries a UNIQUE constraint, so the statement in the first
bullet is refused as written. The empty-table path therefore
needs either a separate `CREATE UNIQUE INDEX` on `x` or the
rebuild primitive as well.

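The table-state split can be sketched as a small planning function plus the 1..N serial fill used during the rebuild copy. All names here are hypothetical. One case is not spelled out in the list above: `shortid` on an empty table. This sketch assumes it is handled like the empty-table serial case, as a structural change with no note (consistent with OOS-5).

```rust
/// Which auto-generated type is being added (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AutoKind {
    Serial,
    Shortid,
}

/// How `add column` is carried out for an auto-generated type.
#[derive(Debug, PartialEq)]
pub enum AddColumnPlan {
    /// Empty table: structural change only; nothing to fill, no note.
    Structural,
    /// Non-empty table: rebuild, fill `rows` existing rows, emit a note.
    RebuildAndFill { kind: AutoKind, rows: usize },
}

pub fn plan_add_column(kind: AutoKind, row_count: usize) -> AddColumnPlan {
    if row_count == 0 {
        AddColumnPlan::Structural
    } else {
        AddColumnPlan::RebuildAndFill { kind, rows: row_count }
    }
}

/// Serial values for the rebuild copy: 1..=N, one per row in copy order.
pub fn serial_fill(rows: usize) -> Vec<i64> {
    (1..=rows as i64).collect()
}
```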
### 7. `change column` to `serial` / `shortid`

`change column T: x (serial)` from any matrix-permitted source
type (today: `int`; future expansions follow the same rule):

1. Run the per-cell dry-run (ADR-0017 §2). For non-null cells,
   classify via the transformer matrix: the source must produce
   an integer (the existing serial pre-condition).
2. Refuse if existing non-null values have duplicates
   (uniqueness collision, ADR-0017 §4.3).
3. Auto-fill any null cells with sequential values continuing
   from `MAX(non-null values) + 1` (or starting at 1 if none).
4. Refuse if the auto-fill would itself produce a collision.
   Because the fill starts above the largest existing value
   rather than filling gaps, this cannot actually happen:
   existing values [1, 2, 5] with two nulls fill 6 and 7;
   existing values [1, 2, 6] with two nulls fill 7 and 8. The
   rule is stated defensively.
5. Rebuild the table with the new column type plus UNIQUE (per
   §4) plus the transformed + auto-filled values.
6. Emit `[client-side]` notes (§9).

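Steps 2 to 4 of the serial path can be sketched over the cells that survive the dry-run, with `None` standing for NULL. The `change_to_serial` function and its error strings are hypothetical. Step 4 is implemented as a real refusal, but the MAX + 1 starting point means it never triggers.

```rust
use std::collections::HashSet;

/// Steps 2-4 for `change column T: x (serial)` on already-classified cells.
/// Returns the final column values and the number of auto-filled cells.
pub fn change_to_serial(cells: &[Option<i64>]) -> Result<(Vec<i64>, usize), String> {
    // Step 2: refuse duplicates among the existing non-null values.
    let mut seen = HashSet::new();
    for &v in cells.iter().flatten() {
        if !seen.insert(v) {
            return Err(format!("duplicate value {v} blocks the change to serial"));
        }
    }
    // Step 3: continue from MAX(non-null values) + 1, or start at 1.
    let mut next = seen.iter().copied().max().unwrap_or(0) + 1;
    let mut filled = 0;
    let mut out = Vec::with_capacity(cells.len());
    for cell in cells {
        match *cell {
            Some(v) => out.push(v),
            None => {
                // Step 4 (defensive): refuse if a fill value is already taken.
                if !seen.insert(next) {
                    return Err(format!("auto-fill value {next} collides"));
                }
                out.push(next);
                next += 1;
                filled += 1;
            }
        }
    }
    Ok((out, filled))
}
```

The filled count is what the `[client-side]` note reports as "M null cell(s)".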
`change column T: x (shortid)` from `text`:

1. Run the per-cell dry-run. Non-null cells classify via the
   text → shortid transformer (ADR-0017 §3) and must match the
   shortid grammar.
2. Refuse if existing non-null shortid-valid values have
   duplicates.
3. Auto-fill null cells with generated shortids
   (collision-retried per §3, including against the existing
   values).
4. Rebuild with TEXT + UNIQUE + the validated + auto-filled
   values.
5. Emit `[client-side]` notes.

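The shortid path follows the same shape as the serial one, except that fill values come from a generator rather than a counter. In this sketch, `is_valid` stands in for the shortid grammar and `next_id` for the generator. Both parameters, the `change_to_shortid` name, and the one-draw-plus-five-retries reading of §3 are assumptions.

```rust
use std::collections::HashSet;

/// Steps 1-3 for `change column T: x (shortid)` from text (`None` = NULL).
/// Returns the final values and the number of auto-filled cells.
pub fn change_to_shortid(
    cells: &[Option<String>],
    is_valid: impl Fn(&str) -> bool,
    mut next_id: impl FnMut() -> String,
) -> Result<(Vec<String>, usize), String> {
    // Steps 1-2: every non-null cell must match the grammar and be unique.
    let mut taken: HashSet<String> = HashSet::new();
    for v in cells.iter().flatten() {
        if !is_valid(v.as_str()) {
            return Err(format!("{v:?} is not a valid shortid"));
        }
        if !taken.insert(v.clone()) {
            return Err(format!("duplicate value {v:?} blocks the change to shortid"));
        }
    }
    // Step 3: fill nulls, retrying against existing and already-filled values.
    let mut filled = 0;
    let mut out = Vec::with_capacity(cells.len());
    for cell in cells {
        let value = match cell {
            Some(v) => v.clone(),
            None => {
                // One initial draw plus up to 5 retries (assumed reading of §3).
                let fresh = (0..=5)
                    .map(|_| next_id())
                    .find(|c| !taken.contains(c))
                    .ok_or("shortid generation kept colliding")?;
                taken.insert(fresh.clone());
                filled += 1;
                fresh
            }
        };
        out.push(value);
    }
    Ok((out, filled))
}
```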
### 8. Conversion matrix amendments to ADR-0017

ADR-0017 §3 "Statically refused" is amended:

- `int → serial` is **removed** from the static refusal list and
  added as a **per-cell-classified** matrix entry: clean for
  non-null integers (with the post-transformation uniqueness
  check from ADR-0017 §4.3), with null-cell auto-fill per §7
  above.
- The general "Anything → `serial`" refusal is replaced with a
  more specific list: `text → serial`, `real → serial`, etc.
  remain refused for v1 (route via int first); `bool → serial`
  remains refused (cross-domain).
- `text → shortid` is unchanged from ADR-0017 (still
  per-cell-classified). The contract enforcement at column
  materialisation is new.

ADR-0017 §4.3 (uniqueness check) is amended to apply to
"PK columns and shortid columns and any column that gains a
UNIQUE constraint as part of the operation"; i.e., non-PK
serial / shortid targets are uniqueness-checked.

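The amended `→ serial` row of the matrix can be expressed as an exhaustive match. The `Source` and `Verdict` enums and the hint strings below are hypothetical, and only four source types are modelled. The ADR's "etc." covers further refused sources.

```rust
/// Source types relevant to the `→ serial` row (hypothetical, partial).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    Int,
    Text,
    Real,
    Bool,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Classified cell by cell: clean for non-null integers, nulls auto-filled.
    PerCell,
    /// Refused before touching any data, with a hint for the user.
    Refused(&'static str),
}

/// The amended `→ serial` entries (ADR-0017 §3 as amended here).
pub fn to_serial(source: Source) -> Verdict {
    match source {
        Source::Int => Verdict::PerCell,
        Source::Text | Source::Real => Verdict::Refused("change the column to int first"),
        Source::Bool => Verdict::Refused("bool and serial are different domains"),
    }
}
```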
### 9. Client-side notes

ADR-0017 §6 introduced the `[client-side]` pattern: when the
playground rewrote any cell value, the success summary tells
the learner "the tool did this for you; raw SQL would need a
`CAST` or application-level code." This ADR extends the pattern
to auto-fill operations:

- **`add column T: x (serial)` on non-empty table**:

  > [client-side] N row(s) given auto-generated serial values
  > 1..N. In raw SQL this would need an explicit UPDATE to
  > populate.

- **`add column T: x (shortid)` on non-empty table**:

  > [client-side] N row(s) given auto-generated shortid values.
  > In raw SQL this would need an explicit UPDATE to populate.

- **`change column T: x (serial)` with M null cells**:

  > [client-side] M null cell(s) given auto-generated serial
  > values. In raw SQL this would need an explicit UPDATE to
  > populate.

- **`change column T: x (shortid)` with M null cells**:

  > [client-side] M null cell(s) given auto-generated shortid
  > values. In raw SQL this would need an explicit UPDATE to
  > populate.

When both an ADR-0017 transformation note AND an ADR-0018
auto-fill note apply to the same operation (e.g., `change
column T: x (shortid)` from text where some cells need
validation and others need auto-fill), both notes are emitted
on separate lines. The success path emits them after the `[ok]`
summary and before the structure-render block.

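The four note shapes, and the rule for stacking them with a transformation note, can be rendered by one small function. `FillSite`, `FillKind`, `auto_fill_note`, and `note_lines` are hypothetical names. They are not the project's `ChangeColumnTypeResult` / `AddColumnResult` fields.

```rust
/// Which operation did the filling (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum FillSite {
    AddColumn,
    ChangeColumn,
}

/// Which auto-generated type was filled (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum FillKind {
    Serial,
    Shortid,
}

/// Render the `[client-side]` auto-fill note for `n` filled cells.
pub fn auto_fill_note(site: FillSite, kind: FillKind, n: usize) -> String {
    let what = match kind {
        FillKind::Serial => "serial",
        FillKind::Shortid => "shortid",
    };
    let head = match (site, kind) {
        (FillSite::AddColumn, FillKind::Serial) => {
            format!("{n} row(s) given auto-generated {what} values 1..{n}.")
        }
        (FillSite::AddColumn, FillKind::Shortid) => {
            format!("{n} row(s) given auto-generated {what} values.")
        }
        (FillSite::ChangeColumn, _) => {
            format!("{n} null cell(s) given auto-generated {what} values.")
        }
    };
    format!("[client-side] {head} In raw SQL this would need an explicit UPDATE to populate.")
}

/// The transformation note (if any) comes first; each note is its own line.
pub fn note_lines(transform: Option<String>, auto_fill: Option<String>) -> Vec<String> {
    transform.into_iter().chain(auto_fill).collect()
}
```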
### 10. Engine-vocabulary cleanup

While here, fix the existing user-facing string in
`do_add_column`'s serial refusal (db.rs:1374): the message
names "SQLite's ALTER TABLE", an ADR-0002 user-facing posture
violation. This message is being replaced anyway as part of
lifting the refusal; the replacement uses abstract "the
database" / "the engine" phrasing.

## Resolutions

Three points called out as "open" during drafting, resolved
before acceptance:

1. **No-PK empty-table case**: not reachable. Every table has
   a PK by construction: the `create table` parser refuses
   input that produces an empty PK list. `add column T: x
   (serial)` on an empty table therefore always lands on a
   table that already has a PK, and the new `x` column is a
   non-PK serial (gains UNIQUE per §4).

2. **Serial sequencing under explicit user inserts**: MAX+1.
   If the user explicitly inserts `id = 100`, the next
   auto-fill yields 101. Gappy sequences are accepted (e.g., if
   the user later inserts `id = 200`, the next auto-fill is
   201; the gap 102..199 is not back-filled). MAX+1 matches
   SQLite's rowid behaviour for the PK case, so both
   implementation paths feel uniform to the user, and
   gap-detection is more expensive than its pedagogical value.

3. **UNIQUE emission style**: inline column constraint
   (`x INTEGER UNIQUE`). Cleaner DDL while we don't have a
   user-facing constraint surface that would benefit from
   named, separately-managed indexes. Revisitable when C3
   lands the user-facing constraint feature; the
   `read_schema` detection via `pragma_index_list` works for
   either form.

## Out of scope

- **OOS-1.** User-facing UNIQUE constraint surface (`add
  unique <T>: <c>`, `drop unique`, naming, multi-column unique).
  Stays as C3-track work.
- **OOS-2.** Strict-monotonic AUTOINCREMENT semantics: we
  retain plain `INTEGER PRIMARY KEY` for PK serial.
  Rebuild-reset of the high-water mark is acceptable for a
  teaching tool; users who care can be taught the distinction
  in a later iteration.
- **OOS-3.** Custom serial start values, custom step sizes, or
  multi-column composite serial.
- **OOS-4.** Non-PK serial when the table has no PK at all
  (unreachable per Resolution 1).
- **OOS-5.** A `[client-side]` note on the empty-table case
  (`add column` on an empty table). No rows means nothing to
  auto-fill; the operation is a structural change with no
  pedagogical "the tool did this for you" content.
- **OOS-6.** Reading and emitting CHECK constraints; only
  UNIQUE is required for this ADR.

## Consequences

- The "serial only on PK" mental model is replaced with
  "serial works anywhere". Pedagogically richer: students see
  auto-incrementing columns as a general feature, not as a
  special PK-only quirk.
- Two internal mechanisms (rowid alias vs application MAX+1)
  sit behind one user-facing behaviour. The user never sees
  which one is in play, honouring ADR-0002's posture.
- `schema_to_ddl` and `read_schema` gain UNIQUE handling, a
  partial pull-forward of C3 work. The user-facing constraint
  surface stays deferred; this ADR only lands the internal
  infrastructure required by serial / shortid contracts.
- `[client-side]` notes proliferate to cover auto-fill cases.
  This strengthens the pedagogical lens: wherever the
  playground goes beyond what raw SQL does, the user is told.
- All four user-observed gaps from §Context are closed. The
  `int → serial → int` round-trip works (matching the existing
  `text → shortid → text` round-trip from ADR-0017).
- Add-column-with-shortid producing a "valid" state aligns
  with ADR-0005's design contract that shortids are unique
  non-null identifiers.

## Relationship to earlier ADRs

- **ADR-0002.** User-facing posture honoured: the dual
  serial implementation is hidden, and the existing engine-name
  leak in `do_add_column`'s refusal message is fixed
  opportunistically.
- **ADR-0005.** Type vocabulary unchanged; `serial` definition
  generalised. The keyword and the user model stay the same;
  the implementation broadens.
- **ADR-0010.** Worker-thread serialisation is what makes the
  non-PK serial MAX+1 path safe without explicit locks.
- **ADR-0011.** `fk_target_type` for serial unchanged
  (`Serial → Int`); FK target compatibility remains as-is.
- **ADR-0013.** Rebuild-table primitive carries the auto-fill
  cases for non-empty `add column` and `change column to
  serial/shortid`.
- **ADR-0014.** INSERT-time auto-fill semantics extended to
  non-PK serial. ADR-0014's auto-fill skip-list (which already
  covers both serial and shortid symmetrically) is reused.
- **ADR-0015.** The text-format round-trip carries the new
  UNIQUE constraints in metadata so a rebuild from
  `project.yaml` reconstructs the database faithfully. Likely
  needs a `__rdbms_playground_columns` schema addition (a
  `unique` bool), to be confirmed during implementation.
- **ADR-0017.** §3 transformer matrix amended: `int → serial`
  joins per-cell-classified. §4.3 uniqueness check extended to
  cover non-PK serial / shortid targets.

@@ -23,3 +23,4 @@ This directory contains the project's ADRs, recorded per
- [ADR-0015 — Project storage runtime](0015-project-storage-runtime.md)
- [ADR-0016 — Pretty table rendering for data and structure views](0016-pretty-table-rendering.md)
- [ADR-0017 — Column type-change compatibility](0017-column-type-change-compatibility.md)
- [ADR-0018 — Auto-fill contracts for `serial` and `shortid` columns](0018-auto-fill-contracts-for-serial-and-shortid.md)