rdbms-playground/docs/adr/0043-compound-pk-foreign-key-references.md
claude@clouddev1 274e2b17b7 docs: ADR-0043 compound-PK foreign-key references (T3); accepted
Audit found single-column FK woven through ~15-20 sites; earns an
ADR. Decision: reference the parent's full compound PK, matched
positionally to an equal-length child list, per-pair type compat.
DSL `from P.(a,b) to C.(x,y)`; SQL `FOREIGN KEY (x,y) REFERENCES
P(a,b)` with bare-FK auto-expansion. Storage follows the existing
primary_key: [...] list convention (yaml columns: [a,b], uniform
JSON in unchanged metadata TEXT cols); back-compat not required,
so no migration. Also marks T3's verified scope.
2026-06-09 17:01:38 +00:00

# ADR-0043: Compound-primary-key foreign-key references (T3)
## Status
**Accepted** — 2026-06-09. All four genuine forks confirmed by the
user at the recommended option: **F-A** full PK in order, **F-B**
house-style uniform column lists (no migration; back-compat not
required), **F-C** parenthesized DSL lists, **F-D** bare table-level
SQL FK auto-expands to the parent's full PK. Closes the one open
leg of
`requirements.md` **T3** ("compound primary keys handled
end-to-end (DSL, storage, display, **FK reference**)"): a foreign
key that *references* a compound (multi-column) primary key.
Cross-references **ADR-0011** (FK column type compatibility —
`Type::fk_target_type`), **ADR-0013** (relationships, naming, the
rebuild-table strategy, and the `__rdbms_playground_relationships`
metadata table), **ADR-0035 §4b** (the SQL `FOREIGN KEY` surface),
**ADR-0004 / ADR-0015** (`project.yaml` as the authoritative
format; `playground.db` is a derived artifact), and **ADR-0009**
(DSL surface conventions).
## Context
Compound PRIMARY KEYs are declared, stored, and displayed today
(`create table T with pk a(int), b(int)` → `primary_key:
Vec<String>`). The missing leg is the *reference*: a child table
whose foreign key points at a parent's compound PK. A 2026-06-09
codebase audit found single-column FK is a pervasive assumption —
~15-20 sites across 6+ files:
- **Metadata** — `__rdbms_playground_relationships` stores scalar
`parent_column TEXT` / `child_column TEXT`
(`PRIMARY KEY (child_table, child_column)`).
- **Persistence** — `RelationshipSchema { parent_column: String,
child_column: String }`; `project.yaml` `RawEndpoint { table,
column }`.
- **Grammar** — `add 1:n relationship … from <P>.<col> to
<C>.<col>` (one ident per side); SQL `FOREIGN KEY (<col>)
REFERENCES <P>(<col>)` (parens that hold exactly one ident).
- **AST** — `Command::AddRelationship { parent_column: String,
child_column: String }`; `SqlForeignKey { child_column: String,
parent_column: Option<String> }`.
- **Executor** — `schema_to_ddl` emits a single-column
`FOREIGN KEY (c) REFERENCES P(p)`; `check_fk_type_compat`
compares one parent type to one child type; bare
`REFERENCES <P>` on a compound-PK parent is refused as
ambiguous (`resolve_create_table_fks`,
`do_alter_add_foreign_key`).
- **Display** — `RelationshipEnd { other_column: String,
local_column: String }`.
This is not a sweep-sized change, which is why it earns an ADR
rather than an inline build. The decisions below also turn the
audit's worst-case framing (a metadata-schema + yaml-format
migration via the F3 framework) into a **no-migration** change.
### Why no migration is needed
**Decision input (user, 2026-06-09): back-compatibility with
existing saved projects is not required.** The project is
pre-release; there is no installed base of `project.yaml` /
`playground.db` files to preserve. This removes the only force
that would have demanded an F3 migrator or a version bump, and —
more importantly — it lets the representation be chosen for
*cleanliness and consistency* rather than for byte-identical
back-compat. The consequence is explicit and accepted: a
`project.yaml` written before this change that contains
relationships will not load under the new format.
Freed of back-compat, the storage follows the convention the file
**already uses** for ordered column lists rather than inventing a
new one:
- `project.yaml` already writes `primary_key: [id]` (a compound PK
is `primary_key: [a, b]`) and index `columns: [a, b]`
(`RawIndex { columns: Vec<String> }`). The relationship endpoint
is the lone multi-column-capable slot still using a scalar
`column:`. It joins the house style (D5).
- The metadata columns are `TEXT`; SQLite has no array type, so a
list lives in a text cell as JSON regardless. That JSON is now a
*uniform* encoding (a one-element array for the single-column
case), not a "bare-name-or-JSON, sniff which" fallback — the
fallback only existed to keep old rows identical, which is no
longer a goal.
So this is not a clever back-compat dodge; it is "use the existing
list convention, uniformly." No version bump, no F3 migrator.
## Decision
Support a foreign key that references a parent's **full** compound
primary key, matched **positionally** to an equal-length child
column list, with per-pair type compatibility. This applies across
both the DSL and SQL surfaces and uses uniform column-list storage
(D5) that needs no migration.
### D1 — Matching policy: the full PK, in order
A compound-PK FK references **all** columns of the parent's
primary key, in PK declaration order, matched 1:1 to the child's
column list (same length). Referencing a *subset* of a compound PK
is **out of scope**: SQL/SQLite require FK parent columns to form a
PK or UNIQUE key, and a strict subset of a compound PK is not
itself unique unless separately constrained. Teaching-clean rule:
*a foreign key to a compound key names every column of that key.*
A length mismatch (child supplies N columns, parent PK has M ≠ N)
is a friendly error naming both counts.
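
The length rule in D1 could be checked along these lines. This is a minimal sketch, not the executor's actual code: `ArityMismatch`, `check_fk_arity`, and the message wording are hypothetical names chosen for illustration.

```rust
/// Hypothetical error type; the real executor's error enum is not shown in this ADR.
#[derive(Debug, PartialEq)]
pub struct ArityMismatch {
    pub child_count: usize,
    pub parent_pk_count: usize,
}

impl std::fmt::Display for ArityMismatch {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // The message names both counts, as D1 requires. The wording is an assumption.
        write!(
            f,
            "foreign key lists {} column(s), but the parent's primary key has {}",
            self.child_count, self.parent_pk_count
        )
    }
}

/// D1: the child list must be exactly as long as the parent's primary key.
pub fn check_fk_arity(
    child_columns: &[String],
    parent_pk: &[String],
) -> Result<(), ArityMismatch> {
    if child_columns.len() == parent_pk.len() {
        Ok(())
    } else {
        Err(ArityMismatch {
            child_count: child_columns.len(),
            parent_pk_count: parent_pk.len(),
        })
    }
}
```

A child list of one column against a two-column PK would yield an error whose text mentions both 1 and 2.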
### D2 — Type compatibility: per pair, positional
Each child column's type must satisfy
`parent_pk_col.fk_target_type() == child_col` for the
corresponding pair (the existing ADR-0011 rule, applied
element-wise in order). `check_fk_type_compat` generalises to walk
the pairs and report the **first** offending pair with the same
wording it uses today.
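
As an illustration of the element-wise walk, the sketch below checks pairs in order and stops at the first offender. The `Type` enum here is a toy stand-in with assumed variants, and its `fk_target_type` mapping (`Serial` to `Int`) is an assumption made for the example; the real rule is whatever ADR-0011 defines. `TypeMismatch` is likewise hypothetical.

```rust
/// Toy stand-in for the project's `Type`; variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Int,
    Text,
    Serial,
}

impl Type {
    /// Assumed shape of the ADR-0011 rule: the type a child column
    /// needs in order to reference a parent column of this type.
    pub fn fk_target_type(self) -> Type {
        match self {
            Type::Serial => Type::Int, // assumption for illustration only
            other => other,
        }
    }
}

/// Position and types of the first incompatible pair (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct TypeMismatch {
    pub position: usize,
    pub expected: Type,
    pub found: Type,
}

/// Walks parent/child pairs in order and reports the first offender.
/// Assumes the D1 arity check already passed; `zip` stops at the shorter list.
pub fn check_fk_type_compat(
    parent_pk_types: &[Type],
    child_types: &[Type],
) -> Result<(), TypeMismatch> {
    for (position, (parent, child)) in parent_pk_types.iter().zip(child_types).enumerate() {
        let expected = parent.fk_target_type();
        if expected != *child {
            return Err(TypeMismatch { position, expected, found: *child });
        }
    }
    Ok(())
}
```

Reporting only the first bad pair keeps the error as short as the single-column message it generalises.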
### D3 — DSL syntax: parenthesized column lists
`add 1:n relationship [as <name>]
from <P>.(<a>, <b>) to <C>.(<x>, <y>)
[on delete …] [on update …] [--create-fk]`
The single-column form `from <P>.<col> to <C>.<col>` is unchanged
(no parens) — back-compatible and the common case. The
parenthesized list is the multi-column form. Both sides must use
the same arity (enforced as a D1 length check). Parentheses mirror
the existing compound-PK *declaration* syntax (`with pk a(int),
b(int)` uses parens around the per-column type; the FK list uses
parens around the column names) and the SQL `FOREIGN KEY (…)`
shape, so the surface stays internally consistent.
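
To make the two endpoint shapes concrete, here is a small stand-alone sketch that accepts both `P.col` and `P.(a, b)` and returns a table name plus a column list. It is a hand-rolled string parser for illustration only; the real grammar lives in `ddl.rs` and is not shown here, and `parse_endpoint` and `is_ident` are hypothetical helpers.

```rust
/// Parses `P.col` or `P.(a, b)` into (table, columns).
/// Hypothetical helper; identifiers are assumed to be ASCII alphanumerics or `_`.
pub fn parse_endpoint(input: &str) -> Result<(String, Vec<String>), String> {
    let (table, rest) = input
        .trim()
        .split_once('.')
        .ok_or_else(|| format!("expected <table>.<column>, got `{}`", input))?;
    let table = table.trim();
    let rest = rest.trim();

    // A leading `(` selects the multi-column form; otherwise it is one bare name.
    let names: Vec<&str> = match rest.strip_prefix('(') {
        Some(inner) => {
            let inner = inner
                .strip_suffix(')')
                .ok_or_else(|| "missing `)` in column list".to_string())?;
            inner.split(',').map(str::trim).collect()
        }
        None => vec![rest],
    };

    if !is_ident(table) || names.iter().any(|n| !is_ident(n)) {
        return Err(format!("invalid endpoint `{}`", input));
    }
    Ok((table.to_string(), names.into_iter().map(String::from).collect()))
}

fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

Both forms produce the same list-shaped result, so the single-column form is just the one-element case downstream.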
### D4 — SQL syntax: extend the existing lists
`FOREIGN KEY (<x>, <y>) REFERENCES <P> (<a>, <b>)` — the grammar's
child and parent column slots become comma-separated **lists**
(today capped at one). Inline `<col> <type> REFERENCES <P>(<a>,
<b>)` stays single-child-column (one inline column can't match a
2-column key) — a compound FK uses the table-level form. Bare
table-level `FOREIGN KEY (x, y) REFERENCES <P>` (no parent
columns) **auto-expands to the parent's full PK** when the arities
match; bare inline `<col> REFERENCES <P>` on a compound-PK parent
keeps today's friendly refusal, with the message pointing at the
table-level multi-column form.
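
The bare-reference rules in D4 can be summarised as a single resolution step. The sketch below is one possible reading, under stated assumptions: `resolve_parent_columns`, its `inline` flag, and the error wording are hypothetical, and an explicit parent list is passed through unchanged on the assumption that the D1 checks run elsewhere.

```rust
/// Resolves the parent column list for a SQL foreign key.
/// `explicit_parent` is `None` for a bare `REFERENCES <P>`.
/// `inline` is true for the `<col> <type> REFERENCES ...` form.
pub fn resolve_parent_columns(
    child_columns: &[String],
    explicit_parent: Option<&[String]>,
    parent_pk: &[String],
    inline: bool,
) -> Result<Vec<String>, String> {
    if let Some(columns) = explicit_parent {
        // Explicit list: taken as written (validated by the D1 checks, not here).
        return Ok(columns.to_vec());
    }
    if inline && parent_pk.len() > 1 {
        // Bare inline reference to a compound PK keeps the friendly refusal.
        return Err(format!(
            "the parent's primary key has {} columns; use the table-level form \
             FOREIGN KEY (...) REFERENCES <P> (...)",
            parent_pk.len()
        ));
    }
    if child_columns.len() != parent_pk.len() {
        return Err(format!(
            "foreign key lists {} column(s), but the parent's primary key has {}",
            child_columns.len(),
            parent_pk.len()
        ));
    }
    // Bare table-level reference with matching arity: expand to the full PK.
    Ok(parent_pk.to_vec())
}
```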
### D5 — Storage: uniform column lists, matching the house style
Both stores hold an **ordered column list**, uniformly (a
one-element list for the single-column case), following the
convention `project.yaml` already uses for `primary_key` and index
`columns`.
- **`project.yaml`**: `RawEndpoint` becomes `{ table, columns:
Vec<String> }` and writes `columns: [a, b]` (single-column →
`columns: [id]`), exactly parallel to `primary_key: [id]`. No
scalar `column:` form, no dual-shape reader.
- **Metadata** (`__rdbms_playground_relationships`): no
`CREATE TABLE` change (the `TEXT` columns and
`PRIMARY KEY (child_table, child_column)` are untouched).
`parent_column` / `child_column` store the list as a JSON array
string — uniformly, including `["id"]` for a single column
(SQLite has no array type, so a text cell is where a list lives).
The actual enforced FK lives on the rebuilt child table's DDL
(`FOREIGN KEY (x, y) REFERENCES P(a, b)`), emitted by
`schema_to_ddl`, exactly as the single-column FK is today via the
rebuild-table primitive (ADR-0013): one relationship, one undo
step.
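
A rough sketch of the two string shapes D5 describes: the JSON array stored in a metadata `TEXT` cell, and the multi-column clause in the child table's DDL. Both helpers are hypothetical (the real code is in `db.rs` and may well use a JSON library instead). The encoder assumes column names contain no control characters and escapes only quotes and backslashes.

```rust
/// Encodes column names as a compact JSON array string, e.g. ["id"].
/// Assumption: names contain no control characters.
pub fn encode_columns(columns: &[String]) -> String {
    let items: Vec<String> = columns
        .iter()
        .map(|name| {
            let mut out = String::from("\"");
            for ch in name.chars() {
                match ch {
                    '"' => out.push_str("\\\""),
                    '\\' => out.push_str("\\\\"),
                    other => out.push(other),
                }
            }
            out.push('"');
            out
        })
        .collect();
    format!("[{}]", items.join(","))
}

/// Builds the table-level clause: child columns first, then the parent's PK.
/// Identifier quoting is omitted for brevity.
pub fn fk_clause(
    child_columns: &[String],
    parent_table: &str,
    parent_columns: &[String],
) -> String {
    format!(
        "FOREIGN KEY ({}) REFERENCES {}({})",
        child_columns.join(", "),
        parent_table,
        parent_columns.join(", ")
    )
}
```

The single-column case goes through the same path and comes out as a one-element array, which is the uniformity D5 asks for.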
### D6 — In-memory model: `Vec<String>` column lists
`Command::AddRelationship`, `SqlForeignKey`, `RelationshipSchema`,
the internal `ReadForeignKey`, and `RelationshipEnd` (display) all
carry `parent_columns: Vec<String>` / `child_columns: Vec<String>`
(or `Option<Vec<String>>` for the bare-SQL parent case). A
one-element vec is the single-column case; nothing about the
single-column UX changes.
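
One way to picture the list-based model is the sketch below. Only `parent_columns` and `child_columns` come from this ADR; the table fields, the `single` constructor, and `render_columns` are assumptions for illustration. Rendering a one-element list without parentheses is also an assumed choice, made to keep single-column output looking as it does today.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RelationshipSchema {
    pub parent_table: String,        // assumed field name
    pub parent_columns: Vec<String>, // named in D6
    pub child_table: String,         // assumed field name
    pub child_columns: Vec<String>,  // named in D6
}

impl RelationshipSchema {
    /// Convenience constructor for the single-column case (one-element vecs).
    pub fn single(
        parent_table: &str,
        parent_column: &str,
        child_table: &str,
        child_column: &str,
    ) -> Self {
        RelationshipSchema {
            parent_table: parent_table.to_string(),
            parent_columns: vec![parent_column.to_string()],
            child_table: child_table.to_string(),
            child_columns: vec![child_column.to_string()],
        }
    }

    /// Renders the column pairing, parent side first.
    pub fn render_columns(&self) -> String {
        format!(
            "{} → {}",
            render_list(&self.parent_columns),
            render_list(&self.child_columns)
        )
    }
}

fn render_list(columns: &[String]) -> String {
    match columns {
        [only] => only.clone(),                   // bare name for a single column
        many => format!("({})", many.join(", ")), // parenthesized list otherwise
    }
}
```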
## Genuine forks (escalated for sign-off)
These are decisions, not facts. Recommendations are marked; the
user confirmed all four at the recommended option on 2026-06-09
(see Status).
- **F-A — matching policy.** Full PK only (D1, *recommended*) vs.
allow a subset (needs a separate UNIQUE key; larger, less
teaching-clean).
- **F-B — storage encoding.** Uniform column lists in the existing
house style — `columns: [a, b]` in yaml (like `primary_key`),
JSON-array in the unchanged metadata `TEXT` columns; no
back-compat, no migration (D5, *recommended*) vs. a normalized
relationship-columns child table (more "correct" but a schema
change with joins on read, no learner-visible payoff). Premise:
no existing projects to preserve (confirmed).
- **F-C — DSL multi-column syntax.** `from P.(a, b) to C.(x, y)`
parenthesized (D3, *recommended*) vs. a repeated-dotted form
(`from P.a, P.b to C.x, C.y`, more ambiguous to parse and read).
- **F-D — bare table-level SQL FK auto-expansion.** Auto-expand
`FOREIGN KEY (x,y) REFERENCES P` to P's full PK when arities
match (D4, *recommended*) vs. always require explicit parent
columns.
## Implementation sketch (change sites)
Grouped; each lands behind tests. No migration step.
1. **AST** — `AddRelationship` + `SqlForeignKey` column fields →
`Vec<String>` / `Option<Vec<String>>` (`command.rs`).
2. **Grammar** — DSL endpoint column slot → optional
parenthesized list (`ddl.rs`); SQL child/parent column slots →
comma lists (`sql_create_table.rs`). Builders collect lists.
3. **Metadata** — `insert_relationship_metadata` /
`read_all_relationships` encode/decode the uniform JSON array
(`db.rs`); no `CREATE TABLE` change.
4. **Persistence** — `RelationshipSchema` → `Vec<String>`;
`RawEndpoint` becomes `{ table, columns: Vec<String> }`, written
`columns: [a, b]` like `primary_key`
(`persistence/mod.rs`, `persistence/yaml.rs`).
5. **Executor** — `do_add_relationship` /
`resolve_create_table_fks` / `do_alter_add_foreign_key` walk
column lists; `schema_to_ddl` emits multi-column `FOREIGN KEY
(…) REFERENCES P(…)`; `check_fk_type_compat` loops pairs;
bare-reference paths auto-expand to the full PK (D4) or refuse
with the improved message (`db.rs`).
6. **Display** — `RelationshipEnd` → column lists; `describe` /
echo render `(a, b) → (x, y)` (`db.rs`, `echo.rs`).
7. **Tests** — parse (DSL + SQL, single still works, multi parses,
arity mismatch errors); worker round-trip (declare a 2-col FK,
rebuild, FK enforced, type-mismatch refused); persistence
round-trip (yaml `columns:` reads + writes; a single-column
`columns: [id]` yaml still loads); display.
## Consequences
- T3 closes; a learner can model a real composite-key relationship
end to end.
- No migration, and the on-disk representation gets *more*
consistent: the relationship endpoint joins the `primary_key:
[...]` / index `columns: [...]` list convention. The in-app
single-column UX is untouched (one-element vecs).
- Accepted trade-off (user, 2026-06-09): a `project.yaml` written
before this change that contains relationships will not load
under the new format. There is no installed base to preserve, so
this is a clean cutover, not data loss.
- The relationship model becomes list-based throughout, which is
the natural foundation if subset/UNIQUE-targeted FKs are ever
wanted (explicitly OOS here).
- A modest, broad refactor (the `Vec` field change ripples through
the 6 layers) — methodical, not deep; locked by tests at each
layer.
## Out of scope
- Subset/non-PK FK targets (referencing a UNIQUE key that isn't
the PK) — possible later on this list-based foundation.
- Any change to single-column behaviour, the rebuild-table
primitive, or the undo model (one relationship = one undo step
stands).
- A `project.yaml` version bump or F3 migrator (not needed —
no installed base to migrate; clean cutover per D5).