5bb0a147f0
Generalises serial and shortid beyond their previous restricted forms:

- `serial` is no longer restricted to single-column PK. Non-PK serial columns get an emitted UNIQUE constraint and use application-side MAX(col)+1 at INSERT time (rowid alias still drives the PK case for free; per ADR-0010 worker-thread serialisation, the read-then-insert sequence is safe).
- `shortid` columns auto-fill existing null cells when the column is materialised: `add column T: x (shortid)` on a non-empty table no longer leaves rows in a not-really-valid NULL state.
- `int -> serial` joins the type-change matrix as always-clean identity (closes the asymmetry vs `text -> shortid`); other sources are refused with a route-via-int hint.
- `change column T: x (serial|shortid)` fills null source cells with sequence / generated values in the same rebuild transaction.

Internal infrastructure:

- ReadColumn gains `unique: bool`; read_schema detects single-column UNIQUE indexes via pragma_index_list / pragma_index_info; schema_to_ddl emits inline UNIQUE for non-PK columns.
- ColumnSchema (persistence) gains `unique: bool` so the flag survives YAML round-trip and rebuild-from-text reconstructs it faithfully, which preserves the "serial -> int leaves UNIQUE in place" promise across save/load cycles.
- ChangeColumnTypeResult.client_side now carries `auto_filled` + `auto_fill_kind` alongside `transformed` + `lossy`; the app handler renders separate note lines when both apply.
- AddColumnResult is a new return type carrying pre-rendered [client-side] note lines for the auto-fill paths.

Tests: 519 -> 534 (+15). Clippy clean.
# ADR-0018: Auto-fill contracts for `serial` and `shortid` columns

## Status

Accepted.

Amends ADR-0005 (column type vocabulary), ADR-0014 (data
operations and value model), and ADR-0017 (column type-change
compatibility). Pulls part of C3's UNIQUE-constraint emission
forward as internal infrastructure.

## Context

`serial` and `shortid` are the two auto-generated types in
ADR-0005's vocabulary. Today they have asymmetric and
under-specified semantics:

1. **`serial` only on PK.** `Type::Serial.sqlite_strict_extra()`
   returns `" PRIMARY KEY"`, and `do_add_column` explicitly
   refuses serial. The implicit user-facing model is "serial =
   auto-incrementing PK". This is an artefact of SQLite's only
   free auto-increment mechanism (the rowid alias on
   `INTEGER PRIMARY KEY`); other RDBMS (PostgreSQL's `SEQUENCE`,
   MySQL's `AUTO_INCREMENT`) let auto-incrementing columns exist
   anywhere. Our pedagogical intent is the broader model; the
   restriction is incidental to our backend choice and leaks
   that choice into the user-facing surface (against ADR-0002).

2. **`int → serial` is statically refused** in ADR-0017's
   transformer matrix, while `text → shortid` is
   per-cell-classified. Yet both target types are equally
   "auto-generated with a uniqueness contract"; the asymmetry
   isn't principled.

3. **`add column T: x (shortid)` on a non-empty table leaves
   existing rows NULL.** Per the design contract, shortids are
   unique non-null identifiers, so the column ends up in a
   not-really-valid state until the user issues UPDATEs. The
   auto-fill logic that runs at INSERT time for omitted shortid
   values doesn't run at column-materialisation time.

4. **No UNIQUE constraint emission today.** A non-PK serial
   column would need a UNIQUE constraint to enforce its contract
   (the rowid trick isn't available off the PK). The same
   applies to non-PK shortid: today it relies on a probabilistic
   "won't collide" argument, not a database-enforced
   guarantee. `schema_to_ddl` only emits NOT NULL inline plus PK
   inline / table-level. No path emits UNIQUE.

This ADR resolves all four gaps with a single unifying
principle.

## Decision

### 1. The unifying principle

> Auto-generated column types honour their generation contract
> on every path that creates or transitions the column.

Concretely: a column declared (or converted to be) `serial` or
`shortid` always satisfies its contract (non-null,
auto-generated, unique) by the time the operation completes.
The user does not have to issue a follow-up UPDATE. The
mechanism is hidden; the user-facing model is "values appear
automatically".

### 2. `serial`: dual-implementation, single semantic

`serial` is generalised from "auto-incrementing PK" to
"auto-incrementing integer column". The column may be the table
PK or any non-PK column; the user-facing semantic is identical.

The implementation switches transparently:

- **PK case** (single-column PK on this column): rowid alias.
  `INTEGER PRIMARY KEY` in DDL; SQLite's free auto-increment
  applies. Unchanged from today.
- **Non-PK case**: app-level `MAX(col) + 1` lookup at INSERT
  time, plus an emitted UNIQUE constraint on the column. The
  worker-thread serialisation (ADR-0010) makes the
  read-then-insert sequence safe without explicit locking: only
  one INSERT runs at a time on the connection.

User-visible help, error messages, and `[client-side]` notes
refer to `serial` columns as "auto-incrementing" or
"auto-generated". The PK / non-PK distinction is an internal
implementation detail (ADR-0002 user-facing posture).

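
The PK / non-PK switch described in this section can be pictured as a small decision function. This is only a sketch: `SerialStrategy` and `serial_strategy` are hypothetical names, and treating a serial column inside a composite PK as the non-PK case is an assumption drawn from the "single-column PK on this column" wording, not something the ADR states outright.

```rust
/// How a `serial` column obtains its next value (illustrative names only).
#[derive(Debug, PartialEq, Eq)]
enum SerialStrategy {
    /// The column is the sole PK column: the engine's rowid alias assigns values.
    RowidAlias,
    /// Any other column: the application computes MAX(col) + 1 and the
    /// column carries an emitted UNIQUE constraint.
    AppMaxPlusOne,
}

/// Pick the strategy from the table's primary-key column list.
/// Assumption: a column that is only one part of a composite PK takes the
/// application-side path.
fn serial_strategy(pk_columns: &[&str], column: &str) -> SerialStrategy {
    if pk_columns.len() == 1 && pk_columns[0] == column {
        SerialStrategy::RowidAlias
    } else {
        SerialStrategy::AppMaxPlusOne
    }
}
```

Either branch yields the same user-facing behaviour; only the source of the next value differs.
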
### 3. `shortid`: tighten the contract at column materialisation

Today: shortid generation runs only when an INSERT omits the
value. Rows existing at the moment a shortid column is created
remain NULL until the user issues an UPDATE.

Going forward: any null cell in a shortid column gets a
freshly-generated value at the operation that creates that
condition:

- `add column T: x (shortid)` on a non-empty table fills every
  existing row's `x` with a generated shortid before the
  operation completes.
- `change column T: x (shortid)` from `text` (or any other
  matrix-permitted source) fills any null cells with generated
  shortids in the same rebuild transaction.

Generator collisions (vanishingly rare given the 10⁷–10⁸
namespace; see ADR-0014 §"shortid auto-generation") trigger up
to 5 retries per cell. Exhausting retries fails the operation
with a friendly diagnostic; in practice this indicates either a
generator-state bug or a pathological RNG and is not
user-recoverable.

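
A minimal sketch of the fill-with-retry loop, under stated assumptions: the generator is passed in as a closure (the real shortid generator and its grammar live elsewhere), `fill_null_shortids` and `MAX_RETRIES` are hypothetical names, and "up to 5 retries" is read here as one initial attempt plus five retries.

```rust
use std::collections::HashSet;

/// Retry budget per cell, taken from the text above.
const MAX_RETRIES: usize = 5;

/// Fill every `None` cell with a generated id that is not already present
/// in the column. Returns the number of cells filled, or an error message
/// if one cell could not be given a unique value within the retry budget.
fn fill_null_shortids<G>(cells: &mut [Option<String>], mut generate: G) -> Result<usize, String>
where
    G: FnMut() -> String,
{
    // Existing non-null values count as taken.
    let mut seen: HashSet<String> = cells.iter().flatten().cloned().collect();
    let mut filled = 0;
    for cell in cells.iter_mut().filter(|c| c.is_none()) {
        let mut chosen = None;
        // One initial attempt plus up to MAX_RETRIES retries (an assumption).
        for _ in 0..=MAX_RETRIES {
            let candidate = generate();
            if seen.insert(candidate.clone()) {
                chosen = Some(candidate);
                break;
            }
        }
        match chosen {
            Some(id) => {
                *cell = Some(id);
                filled += 1;
            }
            None => return Err("could not generate a unique shortid".to_string()),
        }
    }
    Ok(filled)
}
```

On failure this sketch leaves earlier cells already filled in memory; in the design above the whole operation runs inside one transaction, so a failure would discard the partial work.
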
### 4. UNIQUE story

Auto-generated non-PK columns gain an emitted UNIQUE constraint
to enforce their contract:

- Non-PK `serial`: gains UNIQUE on creation /
  conversion-to-serial. Required for the contract; the rowid
  trick isn't available off the PK.
- Non-PK `shortid`: gains UNIQUE on creation /
  conversion-to-shortid. Strengthens today's probabilistic
  guarantee into a database-enforced one.
- PK case for either type: PK already implies UNIQUE+NOT NULL.
  No additional constraint needed.

The reverse direction (`serial → int`, `shortid → text`) leaves
the UNIQUE constraint **in place**. The user has not signalled
intent to drop the uniqueness guarantee; only the
auto-generation contract was dropped. When constraint-management
lands as a user-facing feature (C3-track), the user can
explicitly drop the UNIQUE if desired.

This ADR pulls forward the **internal infrastructure** to emit
and read UNIQUE constraints: `schema_to_ddl` gains
UNIQUE-column-clause emission; `read_schema` gains UNIQUE
detection via `pragma_index_list` + `pragma_index_info`;
`ReadColumn` gains a `unique: bool` field. The **user-facing
constraint surface** (declaring UNIQUE in `with pk … unique …`
or via `add unique`, dropping UNIQUE, naming UNIQUE constraints)
is not in scope here and remains C3-track work.

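
The emission rule can be sketched as a function that renders one column clause. The `Column` struct below is a hypothetical stand-in, far simpler than the project's `ReadColumn`, and the function is not `schema_to_ddl` itself; it only illustrates "PK implies uniqueness, otherwise emit inline UNIQUE when the flag is set".

```rust
/// Minimal stand-in for a column description (hypothetical fields).
struct Column<'a> {
    name: &'a str,
    /// Storage type as it appears in DDL, e.g. "INTEGER" or "TEXT".
    sql_type: &'a str,
    /// True when this column is the table's single-column PK.
    is_single_pk: bool,
    /// True when the column carries a uniqueness contract.
    unique: bool,
}

/// Render the inline column clause. A PK column already implies uniqueness,
/// so UNIQUE is only emitted for non-PK columns.
fn column_clause(col: &Column) -> String {
    let mut clause = format!("{} {}", col.name, col.sql_type);
    if col.is_single_pk {
        clause.push_str(" PRIMARY KEY");
    } else if col.unique {
        clause.push_str(" UNIQUE");
    }
    clause
}
```

Because the flag lives on the column rather than on its type, a later `serial → int` change can keep `unique` set and still render `x INTEGER UNIQUE`.
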
### 5. INSERT-path changes

For non-PK `serial` columns, when the column is omitted from
an INSERT (the existing skip-list at db.rs:3111 already covers
serial and shortid identically), the executor:

1. Queries `SELECT COALESCE(MAX(col), 0) + 1 FROM T` inside the
   same transaction.
2. Binds the result as the column's value.

The MAX-based seeding mirrors SQLite's rowid behaviour: gaps
left by user-supplied explicit values are jumped over (the next
auto-fill is `MAX + 1`, not "the smallest available integer").

Worker-thread serialisation (ADR-0010) prevents the classic
read-modify-write race; the pattern is safe for our
single-writer model.

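
The arithmetic of `COALESCE(MAX(col), 0) + 1` can be illustrated in plain Rust over an in-memory column. `next_serial` is a hypothetical helper written for this example; the executor itself issues the SQL query shown above.

```rust
/// In-memory equivalent of `SELECT COALESCE(MAX(col), 0) + 1 FROM T`.
/// `None` models a NULL cell, which MAX ignores.
fn next_serial(existing: &[Option<i64>]) -> i64 {
    existing.iter().flatten().copied().max().unwrap_or(0) + 1
}
```

With existing values 1, 2 and 5 the next value is 6, not 3: gaps are jumped over rather than back-filled, and an empty column starts at 1.
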
### 6. `add_column` changes

`do_add_column` lifts its blanket serial refusal (db.rs:1374).
The new behaviour is determined by the source table's state:

- **`add column T: x (serial)` on an empty table**: emit
  `ALTER TABLE T ADD COLUMN x INTEGER UNIQUE`. Every table has
  a PK by construction (the parser refuses `create table`
  without `with pk`), so the "no PK" branch doesn't arise;
  the new column joins as a non-PK serial.
- **`add column T: x (serial)` on a non-empty table**: route
  through the rebuild-table primitive (ADR-0013). Create new
  table with `x INTEGER UNIQUE`. Copy rows, filling `x` with
  values 1..N in declaration order. Emit a `[client-side]` note
  (§9).
- **`add column T: x (shortid)` on a non-empty table**: route
  through the rebuild-table primitive. Create new table with
  `x TEXT UNIQUE`. Copy rows, generating a fresh shortid for
  each (collision-retried per §3). Emit a `[client-side]` note.

The empty-table path can stay on `ALTER TABLE ADD COLUMN` for
efficiency; the non-empty path needs the rebuild because we
need to populate the new column atomically with table
creation.

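
The branch on table state can be sketched as follows. `AddColumnPath`, `add_column_path`, and `serial_backfill` are hypothetical names for illustration, and the sketch deliberately models only the path choice and the 1..N back-fill, not the DDL or the rebuild itself.

```rust
/// Which route an auto-generated `add column` takes (illustrative names).
#[derive(Debug, PartialEq, Eq)]
enum AddColumnPath {
    /// Empty table: a structural change only, with nothing to fill.
    InPlace,
    /// Non-empty table: rebuild the table and fill `rows` cells.
    RebuildAndFill { rows: usize },
}

/// Choose the route from the current row count.
fn add_column_path(row_count: usize) -> AddColumnPath {
    if row_count == 0 {
        AddColumnPath::InPlace
    } else {
        AddColumnPath::RebuildAndFill { rows: row_count }
    }
}

/// Serial values given to existing rows during the rebuild: 1..=N, in row order.
fn serial_backfill(row_count: usize) -> Vec<i64> {
    (1..=row_count as i64).collect()
}
```

For a three-row table, `serial_backfill(3)` yields 1, 2, 3, which matches the "values 1..N" wording of the note.
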
### 7. `change column` to `serial` / `shortid`

`change column T: x (serial)` from any matrix-permitted source
type (today: `int`; future expansions follow the same rule):

1. Run the per-cell dry-run (ADR-0017 §2). For non-null cells,
   classify via the transformer matrix: source must produce an
   integer (the existing serial pre-condition).
2. Refuse if existing non-null values have duplicates
   (uniqueness collision, ADR-0017 §4.3).
3. Auto-fill any null cells with sequential values continuing
   from `MAX(non-null values) + 1` (or starting at 1 if none).
4. Refuse if the auto-fill would itself produce a collision.
   Because the fill sequence starts at MAX + 1 rather than
   filling gaps, this cannot arise in practice (e.g., existing
   values [1, 2, 5] with two nulls fill as 6 and 7; existing
   values [1, 2, 6] with two nulls fill as 7 and 8); the rule
   is stated defensively.
5. Rebuild the table with the new column type plus UNIQUE (per
   §4) plus the transformed + auto-filled values.
6. Emit `[client-side]` notes (§9).

`change column T: x (shortid)` from `text`:

1. Run the per-cell dry-run. Non-null cells classify via the
   text → shortid transformer (ADR-0017 §3); they must match
   the shortid grammar.
2. Refuse if existing non-null shortid-valid values have
   duplicates.
3. Auto-fill null cells with generated shortids
   (collision-retried per §3, including against the existing
   values).
4. Rebuild with TEXT + UNIQUE + the validated + auto-filled
   values.
5. Emit `[client-side]` notes.

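
Steps 2 and 3 of the `serial` conversion can be sketched over an in-memory column of already-classified integers. `convert_to_serial` and `SerialConversionError` are hypothetical names; the dry-run, the rebuild, and the notes are out of scope for this sketch.

```rust
use std::collections::HashSet;

/// Why a conversion to `serial` is refused (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum SerialConversionError {
    /// An existing non-null value appears more than once.
    DuplicateValue(i64),
}

/// Returns the converted column plus the number of auto-filled cells.
/// `None` models a NULL source cell.
fn convert_to_serial(
    cells: &[Option<i64>],
) -> Result<(Vec<i64>, usize), SerialConversionError> {
    // Step 2: refuse duplicates among the existing non-null values.
    let mut seen = HashSet::new();
    for v in cells.iter().flatten() {
        if !seen.insert(*v) {
            return Err(SerialConversionError::DuplicateValue(*v));
        }
    }
    // Step 3: continue from MAX + 1, or start at 1 when every cell is null.
    let mut next = seen.iter().copied().max().unwrap_or(0) + 1;
    let mut filled = 0;
    let values = cells
        .iter()
        .map(|cell| match cell {
            Some(v) => *v,
            None => {
                let v = next;
                next += 1;
                filled += 1;
                v
            }
        })
        .collect();
    Ok((values, filled))
}
```

Since every filled value is strictly greater than the existing maximum, the defensive collision rule in step 4 never fires in this model.
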
### 8. Conversion matrix amendments to ADR-0017

ADR-0017 §3 "Statically refused" is amended:

- `int → serial` is **removed** from the static refusal list and
  added as a **per-cell-classified** matrix entry: clean for
  non-null integers (with the post-transformation uniqueness
  check from ADR-0017 §4.3), with null-cell auto-fill per §7
  above.
- The general "Anything → `serial`" refusal is replaced with a
  more specific list: `text → serial`, `real → serial`, etc.
  remain refused for v1 (route via int first); `bool → serial`
  remains refused (cross-domain).
- `text → shortid` is unchanged from ADR-0017 (still
  per-cell-classified). The contract enforcement at
  column-materialisation is new.

ADR-0017 §4.3 (uniqueness check) is amended to apply to
"PK columns and shortid columns and any column that gains a
UNIQUE constraint as part of the operation", i.e., non-PK
serial / shortid targets are uniqueness-checked.

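
The amended `→ serial` row of the matrix can be pictured as a lookup. This is a sketch only: `SourceType` lists just four source types (the "etc." above is not enumerated), `ToSerialVerdict` is a hypothetical name, and the hint strings are invented for illustration rather than taken from the real messages.

```rust
/// A subset of source column types (the full vocabulary is in ADR-0005).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceType {
    Int,
    Text,
    Real,
    Bool,
}

/// Outcome of asking "may this source type change to serial?".
#[derive(Debug, PartialEq, Eq)]
enum ToSerialVerdict {
    /// Each cell is classified individually, then uniqueness-checked.
    PerCellClassified,
    /// Refused up front; `hint` wording is illustrative.
    Refused { hint: &'static str },
}

fn classify_to_serial(source: SourceType) -> ToSerialVerdict {
    match source {
        SourceType::Int => ToSerialVerdict::PerCellClassified,
        SourceType::Text | SourceType::Real => ToSerialVerdict::Refused {
            hint: "change the column to int first",
        },
        SourceType::Bool => ToSerialVerdict::Refused {
            hint: "bool and serial belong to different domains",
        },
    }
}
```
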
### 9. Client-side notes

ADR-0017 §6 introduced the `[client-side]` pattern: when the
playground rewrote any cell value, the success summary tells
the learner "the tool did this for you; raw SQL would need a
`CAST` or application-level code." This ADR extends the pattern
to auto-fill operations:

- **`add column T: x (serial)` on non-empty table**:
  > [client-side] N row(s) given auto-generated serial values
  > 1..N. In raw SQL this would need an explicit UPDATE to
  > populate.

- **`add column T: x (shortid)` on non-empty table**:
  > [client-side] N row(s) given auto-generated shortid values.
  > In raw SQL this would need an explicit UPDATE to populate.

- **`change column T: x (serial)` with M null cells**:
  > [client-side] M null cell(s) given auto-generated serial
  > values. In raw SQL this would need an explicit UPDATE to
  > populate.

- **`change column T: x (shortid)` with M null cells**:
  > [client-side] M null cell(s) given auto-generated shortid
  > values. In raw SQL this would need an explicit UPDATE to
  > populate.

When both an ADR-0017 transformation note AND an ADR-0018
auto-fill note apply to the same operation (e.g.,
`change column T: x (shortid)` from text where some cells need
validation and others need auto-fill), both notes are emitted
on separate lines. The success path emits them after the `[ok]`
summary and before the structure-render block.

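
The four note texts can be rendered from a single description of what was filled. `AutoFill` and `auto_fill_note` are hypothetical names, not the project's `AddColumnResult` or `ChangeColumnTypeResult`; returning no note when the count is zero is an assumption that extends the empty-table rule (OOS-5) to `change column` with no null cells.

```rust
/// Which auto-fill happened and how many cells it touched (illustrative type).
enum AutoFill {
    AddSerial { rows: usize },
    AddShortId { rows: usize },
    ChangeSerial { nulls: usize },
    ChangeShortId { nulls: usize },
}

const RAW_SQL_HINT: &str = "In raw SQL this would need an explicit UPDATE to populate.";

/// Render the `[client-side]` note, or `None` when nothing was filled.
fn auto_fill_note(fill: &AutoFill) -> Option<String> {
    let body = match *fill {
        // Assumption: zero filled cells means there is nothing to report.
        AutoFill::AddSerial { rows: 0 }
        | AutoFill::AddShortId { rows: 0 }
        | AutoFill::ChangeSerial { nulls: 0 }
        | AutoFill::ChangeShortId { nulls: 0 } => return None,
        AutoFill::AddSerial { rows } => format!(
            "{} row(s) given auto-generated serial values 1..{}.",
            rows, rows
        ),
        AutoFill::AddShortId { rows } => {
            format!("{} row(s) given auto-generated shortid values.", rows)
        }
        AutoFill::ChangeSerial { nulls } => {
            format!("{} null cell(s) given auto-generated serial values.", nulls)
        }
        AutoFill::ChangeShortId { nulls } => {
            format!("{} null cell(s) given auto-generated shortid values.", nulls)
        }
    };
    Some(format!("[client-side] {} {}", body, RAW_SQL_HINT))
}
```
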
### 10. Engine-vocabulary cleanup

While here, fix the existing user-facing string in
`do_add_column`'s serial refusal (db.rs:1374): the message
names "SQLite's ALTER TABLE", an ADR-0002 user-facing posture
violation. This message is being replaced anyway as part of
lifting the refusal; the replacement uses abstract "the
database" / "the engine" phrasing.

## Resolutions

Three points called out as "open" during drafting, resolved
before acceptance:

1. **No-PK empty-table case**: not reachable. Every table has
   a PK by construction: the `create table` parser refuses
   input that produces an empty PK list.
   `add column T: x (serial)` on an empty table therefore
   always lands on a table that already has a PK, and the new
   `x` column is a non-PK serial (gains UNIQUE per §4).

2. **Serial sequencing under explicit user inserts**: MAX+1.
   If the user explicitly inserts `id = 100`, the next
   auto-fill yields 101. Gappy sequences are accepted (e.g., if
   the user later inserts `id = 200`, the next auto-fill is
   201; the gap 102..199 is not back-filled). MAX+1 matches
   SQLite's rowid behaviour for the PK case, so both
   implementation paths feel uniform to the user, and
   gap-detection is more expensive than its pedagogical value.

3. **UNIQUE emission style**: inline column constraint
   (`x INTEGER UNIQUE`). Cleaner DDL while we don't have a
   user-facing constraint surface that would benefit from
   named, separately-managed indexes. Revisitable when C3
   lands the user-facing constraint feature; the
   `read_schema` detection via `pragma_index_list` works for
   either form.

## Out of scope

- **OOS-1.** User-facing UNIQUE constraint surface
  (`add unique <T>: <c>`, `drop unique`, naming, multi-column
  unique). Stays as C3-track work.
- **OOS-2.** Strict-monotonic AUTOINCREMENT semantics: we
  retain plain `INTEGER PRIMARY KEY` for PK serial.
  Rebuild-reset of the high-water mark is acceptable for a
  teaching tool; users who care can be taught the distinction
  in a later iteration.
- **OOS-3.** Custom serial start values, custom step sizes, or
  multi-column composite serial.
- **OOS-4.** Non-PK serial when the table has no PK at all
  (ruled out by Resolution 1).
- **OOS-5.** A `[client-side]` note on the empty-table case
  (`add column` on an empty table). No rows means nothing to
  auto-fill; the operation is a structural change with no
  pedagogical "the tool did this for you" content.
- **OOS-6.** Reading and emitting CHECK constraints: only
  UNIQUE is required for this ADR.

## Consequences

- The "serial only on PK" mental model is replaced with
  "serial works anywhere". Pedagogically richer: students see
  auto-incrementing columns as a general feature, not as a
  special PK-only quirk.
- One internal implementation split that the user doesn't see
  (rowid alias vs application MAX+1). The two paths converge to
  identical user-facing behaviour, honouring ADR-0002's posture.
- `schema_to_ddl` and `read_schema` gain UNIQUE handling, a
  partial pull-forward of C3 work. The user-facing constraint
  surface stays deferred; this ADR only lands the internal
  infrastructure required by serial / shortid contracts.
- `[client-side]` notes proliferate to cover auto-fill cases.
  Strengthens the pedagogical lens: every place the playground
  goes beyond what raw SQL does, the user is told.
- All four user-observed gaps from §Context are closed. The
  `int → serial → int` round-trip works (matching the existing
  `text → shortid → text` round-trip from ADR-0017).
- Add-column-with-shortid producing a "valid" state aligns
  with ADR-0005's design contract that shortids are unique
  non-null identifiers.

## Relationship to earlier ADRs

- **ADR-0002**: User-facing posture honoured: the dual
  serial implementation is hidden; the existing engine-name
  leak in `do_add_column`'s refusal message is fixed
  opportunistically.
- **ADR-0005**: Type vocabulary unchanged; `serial` definition
  generalised. The keyword and the user model stay the same;
  the implementation broadens.
- **ADR-0010**: Worker-thread serialisation is what makes the
  non-PK serial MAX+1 path safe without explicit locks.
- **ADR-0011**: `fk_target_type` for serial unchanged
  (`Serial → Int`); FK target compatibility remains as-is.
- **ADR-0013**: Rebuild-table primitive carries the auto-fill
  cases for non-empty `add column` and
  `change column to serial/shortid`.
- **ADR-0014**: INSERT-time auto-fill semantics extended to
  non-PK serial. ADR-0014's auto-fill skip-list (which already
  covers both serial and shortid symmetrically) is reused.
- **ADR-0015**: The text-format round-trip carries the new
  UNIQUE constraints in metadata so a rebuild from
  `project.yaml` reconstructs the database faithfully. Likely
  needs a `__rdbms_playground_columns` schema addition (a
  `unique` bool), to be confirmed during implementation.
- **ADR-0017**: §3 transformer matrix amended: `int → serial`
  joins per-cell-classified. §4.3 uniqueness check extended to
  cover non-PK serial / shortid targets.