rdbms-playground/docs/adr/0043-compound-pk-foreign-key-references.md
claude@clouddev1 274e2b17b7 docs: ADR-0043 compound-PK foreign-key references (T3); accepted
Audit found single-column FK woven through ~15-20 sites; earns an
ADR. Decision: reference the parent's full compound PK, matched
positionally to an equal-length child list, per-pair type compat.
DSL `from P.(a,b) to C.(x,y)`; SQL `FOREIGN KEY (x,y) REFERENCES
P(a,b)` with bare-FK auto-expansion. Storage follows the existing
primary_key: [...] list convention (yaml columns: [a,b], uniform
JSON in unchanged metadata TEXT cols); back-compat not required,
so no migration. Also marks T3's verified scope.
2026-06-09 17:01:38 +00:00


ADR-0043: Compound-primary-key foreign-key references (T3)

Status

Accepted — 2026-06-09. All four genuine forks confirmed by the user at the recommended option: F-A full PK in order, F-B house-style uniform column lists (no migration; back-compat not required), F-C parenthesized DSL lists, F-D bare table-level SQL FK auto-expands to the parent's full PK. Closes the one open leg of requirements.md T3 ("compound primary keys handled end-to-end (DSL, storage, display, FK reference)"): a foreign key that references a compound (multi-column) primary key.

Cross-references ADR-0011 (FK column type compatibility — Type::fk_target_type), ADR-0013 (relationships, naming, the rebuild-table strategy, and the __rdbms_playground_relationships metadata table), ADR-0035 §4b (the SQL FOREIGN KEY surface), ADR-0004 / ADR-0015 (project.yaml as the authoritative format; playground.db is a derived artifact), and ADR-0009 (DSL surface conventions).

Context

Compound PRIMARY KEYs are declared, stored, and displayed today (create table T with pk a(int), b(int) → primary_key: Vec<String>). The missing leg is the reference: a child table whose foreign key points at a parent's compound PK. A 2026-06-09 codebase audit found single-column FK to be a pervasive assumption, woven through ~15-20 sites across 6+ files:

  • Metadata: __rdbms_playground_relationships stores scalar parent_column TEXT / child_column TEXT (PRIMARY KEY (child_table, child_column)).
  • Persistence: RelationshipSchema { parent_column: String, child_column: String }; project.yaml RawEndpoint { table, column }.
  • Grammar: add 1:n relationship … from <P>.<col> to <C>.<col> (one ident per side); SQL FOREIGN KEY (<col>) REFERENCES <P>(<col>) (parens that hold exactly one ident).
  • AST: Command::AddRelationship { parent_column: String, child_column: String }; SqlForeignKey { child_column: String, parent_column: Option<String> }.
  • Executor: schema_to_ddl emits a single-column FOREIGN KEY (c) REFERENCES P(p); check_fk_type_compat compares one parent type to one child type; bare REFERENCES <P> on a compound-PK parent is refused as ambiguous (resolve_create_table_fks, do_alter_add_foreign_key).
  • Display: RelationshipEnd { other_column: String, local_column: String }.

This is not a sweep-sized change, which is why it earns an ADR rather than an inline build. The decisions below also turn the audit's worst-case framing (a metadata-schema + yaml-format migration via the F3 framework) into a no-migration change.

Why no migration is needed

Decision input (user, 2026-06-09): back-compatibility with existing saved projects is not required. The project is pre-release; there is no installed base of project.yaml / playground.db files to preserve. This removes the only force that would have demanded an F3 migrator or a version bump, and — more importantly — it lets the representation be chosen for cleanliness and consistency rather than for byte-identical back-compat. The consequence is explicit and accepted: a project.yaml written before this change that contains relationships will not load under the new format.

Freed of back-compat, the storage follows the convention the file already uses for ordered column lists rather than inventing a new one:

  • project.yaml already writes primary_key: [id] (a compound PK is primary_key: [a, b]) and index columns: [a, b] (RawIndex { columns: Vec<String> }). The relationship endpoint is the lone multi-column-capable slot still using a scalar column:. It joins the house style (D5).
  • The metadata columns are TEXT; SQLite has no array type, so a list lives in a text cell as JSON regardless. That JSON is now a uniform encoding (a one-element array for the single-column case), not a "bare-name-or-JSON, sniff which" fallback — the fallback only existed to keep old rows identical, which is no longer a goal.

So this is not a clever back-compat dodge; it is "use the existing list convention, uniformly." No version bump, no F3 migrator.

Decision

Support a foreign key that references a parent's full compound primary key, matched positionally to an equal-length child column list, with per-pair type compatibility, across both the DSL and SQL surfaces, using uniform column-list storage that needs no migration.

D1 — Matching policy: the full PK, in order

A compound-PK FK references all columns of the parent's primary key, in PK declaration order, matched 1:1 to the child's column list (same length). Referencing a subset of a compound PK is out of scope: SQL/SQLite require FK parent columns to form a PK or UNIQUE key, and a strict subset of a compound PK is not itself unique unless separately constrained. Teaching-clean rule: a foreign key to a compound key names every column of that key.

A length mismatch (child supplies N columns, parent PK has M ≠ N) is a friendly error naming both counts.
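The positional pairing and the length check can be sketched as follows. This is a minimal illustration, not the executor's code: `ArityMismatch`, `pair_columns`, and the message wording are hypothetical names and text invented for this example.

```rust
use std::fmt;

/// Hypothetical error for this sketch; the real executor's error type is not shown in this ADR.
#[derive(Debug, PartialEq)]
pub struct ArityMismatch {
    pub child_count: usize,
    pub parent_pk_count: usize,
}

impl fmt::Display for ArityMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Names both counts, as D1 asks; the exact wording is an assumption.
        write!(
            f,
            "foreign key lists {} column(s) but the parent's primary key has {}; \
             a foreign key to a compound key names every column of that key",
            self.child_count, self.parent_pk_count
        )
    }
}

/// Pair each parent PK column with the child column at the same position.
/// `parent_pk` is expected in PK declaration order.
pub fn pair_columns<'a>(
    parent_pk: &'a [String],
    child_columns: &'a [String],
) -> Result<Vec<(&'a str, &'a str)>, ArityMismatch> {
    if parent_pk.len() != child_columns.len() {
        return Err(ArityMismatch {
            child_count: child_columns.len(),
            parent_pk_count: parent_pk.len(),
        });
    }
    Ok(parent_pk
        .iter()
        .map(String::as_str)
        .zip(child_columns.iter().map(String::as_str))
        .collect())
}
```

With a parent PK of [a, b] and child columns [x, y], this yields the pairs (a, x) and (b, y); a one-column child list against the same parent returns the error carrying counts 1 and 2.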

D2 — Type compatibility: per pair, positional

Each child column's type must equal parent_pk_col.fk_target_type() for the parent PK column at the same position (the existing ADR-0011 rule, applied element-wise in order). check_fk_type_compat generalises to walk the pairs and report the first offending pair with the same wording it uses today.
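A rough sketch of the generalised check, under stated assumptions: the `Type` enum here is a three-variant stand-in, the `Serial → Int` mapping inside `fk_target_type` is assumed purely for illustration (ADR-0011 defines the real rule), and `TypeMismatch` is a hypothetical error shape. Only the element-wise walk and the first-offender reporting are taken from the text above.

```rust
/// Stand-in for the project's column type; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Serial,
    Int,
    Text,
}

impl Type {
    /// ASSUMED mapping for this sketch only: a serial key is referenced by a plain int.
    pub fn fk_target_type(self) -> Type {
        match self {
            Type::Serial => Type::Int,
            other => other,
        }
    }
}

/// Hypothetical error describing the first pair that fails the rule.
#[derive(Debug, PartialEq)]
pub struct TypeMismatch {
    pub position: usize,
    pub parent_column: String,
    pub child_column: String,
    pub expected: Type,
    pub found: Type,
}

/// Walk the (parent, child) pairs in order and stop at the first offender.
/// Assumes the D1 length check has already passed; `zip` would otherwise
/// silently ignore the extra columns of the longer list.
pub fn check_fk_type_compat(
    parent: &[(String, Type)],
    child: &[(String, Type)],
) -> Result<(), TypeMismatch> {
    for (position, ((p_name, p_ty), (c_name, c_ty))) in
        parent.iter().zip(child.iter()).enumerate()
    {
        let expected = p_ty.fk_target_type();
        if expected != *c_ty {
            return Err(TypeMismatch {
                position,
                parent_column: p_name.clone(),
                child_column: c_name.clone(),
                expected,
                found: *c_ty,
            });
        }
    }
    Ok(())
}
```

Returning the position along with both column names lets the caller reuse the existing single-column wording for whichever pair failed.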

D3 — DSL syntax: parenthesized column lists

add 1:n relationship [as <name>] from <P>.(<a>, <b>) to <C>.(<x>, <y>) [on delete …] [on update …] [--create-fk]

The single-column form from <P>.<col> to <C>.<col> is unchanged (no parens) — back-compatible and the common case. The parenthesized list is the multi-column form. Both sides must use the same arity (enforced as a D1 length check). Parentheses mirror the existing compound-PK declaration syntax (with pk a(int), b(int) uses parens around the per-column type; the FK list uses parens around the column names) and the SQL FOREIGN KEY (…) shape, so the surface stays internally consistent.
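To make the two endpoint shapes concrete, here is a small standalone parser for one endpoint. It is an illustrative sketch only: the real grammar lives in ddl.rs and is not shown in this ADR, and `parse_endpoint`, `is_ident`, and the identifier rule (ASCII letters, digits, underscore) are assumptions made for the example.

```rust
/// Parse one relationship endpoint: `T.col` or `T.(a, b)`.
/// Returns the table name and an ordered column list (one element for the
/// unparenthesized form). Hypothetical helper, not the project's grammar.
pub fn parse_endpoint(src: &str) -> Result<(String, Vec<String>), String> {
    let (table, rest) = src
        .trim()
        .split_once('.')
        .ok_or_else(|| format!("expected <table>.<column> in `{src}`"))?;
    let table = table.trim();
    let rest = rest.trim();

    let names: Vec<&str> = match rest.strip_prefix('(') {
        // Multi-column form: a parenthesized, comma-separated list.
        Some(inner) => {
            let inner = inner
                .strip_suffix(')')
                .ok_or_else(|| format!("missing `)` in `{src}`"))?;
            inner.split(',').map(str::trim).collect()
        }
        // Single-column form: one bare identifier, no parentheses.
        None => vec![rest],
    };

    if !is_ident(table) || names.iter().any(|n| !is_ident(n)) {
        return Err(format!("invalid identifier in `{src}`"));
    }
    Ok((
        table.to_string(),
        names.into_iter().map(String::from).collect(),
    ))
}

/// Assumed identifier rule: a letter or underscore, then letters, digits, underscores.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

Both forms produce the same (table, columns) shape, so the arity check from D1 can run on the two endpoints without caring which syntax the learner typed.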

D4 — SQL syntax: extend the existing lists

FOREIGN KEY (<x>, <y>) REFERENCES <P> (<a>, <b>) — the grammar's child and parent column slots become comma-separated lists (today capped at one). Inline <col> <type> REFERENCES <P>(<a>, <b>) stays single-child-column (one inline column can't match a 2-column key) — a compound FK uses the table-level form. Bare table-level FOREIGN KEY (x, y) REFERENCES <P> (no parent columns) auto-expands to the parent's full PK when the arities match; bare inline <col> REFERENCES <P> on a compound-PK parent keeps today's friendly refusal, with the message pointing at the table-level multi-column form.
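The auto-expansion rule can be sketched as a single resolution step. The function name `resolve_parent_columns`, its signature, and the error text are hypothetical; the sketch covers only the bare-versus-explicit choice and the arity check, not the check that an explicit list really is the parent's PK.

```rust
/// Resolve the parent column list for a SQL foreign key.
///
/// `explicit_parent` is `None` for the bare form `REFERENCES <P>`; in that
/// case the parent's full PK (in declaration order) is used. An inline
/// `<col> REFERENCES <P>` is the same call with a one-element child list,
/// so a compound-PK parent fails the arity check and is refused.
pub fn resolve_parent_columns(
    parent_table: &str,
    parent_pk: &[String],
    child_columns: &[String],
    explicit_parent: Option<Vec<String>>,
) -> Result<Vec<String>, String> {
    let parent_columns = explicit_parent.unwrap_or_else(|| parent_pk.to_vec());
    if parent_columns.len() != child_columns.len() {
        // Message wording is an assumption; it points at the table-level form.
        return Err(format!(
            "foreign key names {} child column(s) but references {} column(s) of {}; \
             try the table-level form FOREIGN KEY (...) REFERENCES {}({})",
            child_columns.len(),
            parent_columns.len(),
            parent_table,
            parent_table,
            parent_columns.join(", ")
        ));
    }
    Ok(parent_columns)
}
```

For a parent P with PK [a, b], a bare table-level FK over (x, y) resolves to [a, b], while a bare inline reference from a single column returns the refusal message.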

D5 — Storage: uniform column lists, matching the house style

Both stores hold an ordered column list, uniformly (a one-element list for the single-column case), following the convention project.yaml already uses for primary_key and index columns.

  • project.yaml: RawEndpoint becomes { table, columns: Vec<String> } and writes columns: [a, b] (single-column → columns: [id]), exactly parallel to primary_key: [id]. No scalar column: form, no dual-shape reader.
  • Metadata (__rdbms_playground_relationships): no CREATE TABLE change (the TEXT columns and PRIMARY KEY (child_table, child_column) are untouched). parent_column / child_column store the list as a JSON array string, uniformly, including ["id"] for a single column (SQLite has no array type, so a text cell is where a list lives). The actual enforced FK lives on the rebuilt child table's DDL (FOREIGN KEY (x, y) REFERENCES P(a, b)), emitted by schema_to_ddl, exactly as the single-column FK is today via the rebuild-table primitive (ADR-0013): one relationship, one undo step.
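As an illustration of the uniform metadata encoding, the sketch below hand-rolls the JSON array so that it depends only on the standard library; the project may well use a JSON crate instead. `encode_columns` and `decode_columns` are hypothetical names, and the code assumes column names contain no control characters, so only `"` and `\` are escaped.

```rust
/// Encode an ordered column list as a JSON array string, e.g. ["a","b"].
/// ASSUMPTION: names contain no control characters (only `"` and `\` are escaped).
pub fn encode_columns(columns: &[String]) -> String {
    let mut out = String::from("[");
    for (i, name) in columns.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push('"');
        for ch in name.chars() {
            if ch == '"' || ch == '\\' {
                out.push('\\');
            }
            out.push(ch);
        }
        out.push('"');
    }
    out.push(']');
    out
}

/// Decode what `encode_columns` writes. Returns `None` for anything else,
/// including a bare (non-JSON) column name: there is no dual-shape fallback.
pub fn decode_columns(cell: &str) -> Option<Vec<String>> {
    let inner = cell.trim().strip_prefix('[')?.strip_suffix(']')?;
    let mut columns = Vec::new();
    let mut chars = inner.chars();
    loop {
        // Each element opens with a quote; an empty array ends here.
        match chars.find(|c| !c.is_whitespace()) {
            Some('"') => {}
            None if columns.is_empty() => return Some(columns),
            _ => return None,
        }
        let mut name = String::new();
        loop {
            match chars.next()? {
                '"' => break,
                '\\' => match chars.next()? {
                    c @ ('"' | '\\') => name.push(c),
                    _ => return None, // other escapes are out of scope here
                },
                c => name.push(c),
            }
        }
        columns.push(name);
        // After an element: a comma continues, end of input finishes.
        match chars.find(|c| !c.is_whitespace()) {
            Some(',') => continue,
            None => return Some(columns),
            _ => return None,
        }
    }
}
```

A single-column relationship encodes as ["id"], and a legacy bare value such as id decodes to `None`, which matches the clean-cutover stance of having no sniffing reader.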

D6 — In-memory model: Vec<String> column lists

Command::AddRelationship, SqlForeignKey, RelationshipSchema, the internal ReadForeignKey, and RelationshipEnd (display) all carry parent_columns: Vec<String> / child_columns: Vec<String> (or Option<Vec<String>> for the bare-SQL parent case). A one-element vec is the single-column case; nothing about the single-column UX changes.
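A simplified sketch of the list-based model and the table-level clause it would feed into the child table's DDL. The struct below keeps only four fields and is not the project's actual `RelationshipSchema` (which presumably also carries a name and the on delete / on update actions); `fk_clause` is a hypothetical helper, and identifier quoting is deliberately left out.

```rust
/// Reduced, illustrative version of the relationship model with column lists.
#[derive(Debug, Clone, PartialEq)]
pub struct RelationshipSchema {
    pub parent_table: String,
    pub parent_columns: Vec<String>,
    pub child_table: String,
    pub child_columns: Vec<String>,
}

impl RelationshipSchema {
    /// Table-level constraint clause for the child table's CREATE TABLE.
    /// A one-element list yields the familiar single-column form.
    /// NOTE: no identifier quoting; a real emitter would need to handle that.
    pub fn fk_clause(&self) -> String {
        format!(
            "FOREIGN KEY ({}) REFERENCES {}({})",
            self.child_columns.join(", "),
            self.parent_table,
            self.parent_columns.join(", ")
        )
    }
}
```

Because the single-column case is just a one-element vec, the same code path emits both FOREIGN KEY (x, y) REFERENCES P(a, b) and FOREIGN KEY (cust_id) REFERENCES customers(id).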

Genuine forks (escalated for sign-off)

These are decisions, not facts. Recommendations are marked; the user confirmed all four at the recommended option before this ADR moved to Accepted (see Status).

  • F-A — matching policy. Full PK only (D1, recommended) vs. allow a subset (needs a separate UNIQUE key; larger, less teaching-clean).
  • F-B — storage encoding. Uniform column lists in the existing house style — columns: [a, b] in yaml (like primary_key), JSON-array in the unchanged metadata TEXT columns; no back-compat, no migration (D5, recommended) vs. a normalized relationship-columns child table (more "correct" but a schema change with joins on read, no learner-visible payoff). Premise: no existing projects to preserve (confirmed).
  • F-C — DSL multi-column syntax. from P.(a, b) to C.(x, y) parenthesized (D3, recommended) vs. a repeated-dotted form (from P.a, P.b to C.x, C.y, more ambiguous to parse and read).
  • F-D — bare table-level SQL FK auto-expansion. Auto-expand FOREIGN KEY (x,y) REFERENCES P to P's full PK when arities match (D4, recommended) vs. always require explicit parent columns.

Implementation sketch (change sites)

Grouped; each lands behind tests. No migration step.

  1. AST: AddRelationship + SqlForeignKey column fields → Vec<String> / Option<Vec<String>> (command.rs).
  2. Grammar: DSL endpoint column slot → optional parenthesized list (ddl.rs); SQL child/parent column slots → comma lists (sql_create_table.rs). Builders collect lists.
  3. Metadata: insert_relationship_metadata / read_all_relationships encode/decode the uniform JSON array (db.rs); no CREATE TABLE change.
  4. Persistence: RelationshipSchema → Vec<String>; RawEndpoint becomes { table, columns: Vec<String> }, written columns: [a, b] like primary_key (persistence/mod.rs, persistence/yaml.rs).
  5. Executor: do_add_relationship / resolve_create_table_fks / do_alter_add_foreign_key walk column lists; schema_to_ddl emits multi-column FOREIGN KEY (…) REFERENCES P(…); check_fk_type_compat loops pairs; bare-reference paths auto-expand to the full PK (D4) or refuse with the improved message (db.rs).
  6. Display: RelationshipEnd → column lists; describe / echo render (a, b) → (x, y) (db.rs, echo.rs).
  7. Tests: parse (DSL + SQL, single still works, multi parses, arity mismatch errors); worker round-trip (declare a 2-col FK, rebuild, FK enforced, type-mismatch refused); persistence round-trip (yaml columns: reads + writes; a single-column relationship round-trips as columns: [id], and a pre-change scalar column: endpoint does not load, per D5); display.

Consequences

  • T3 closes; a learner can model a real composite-key relationship end to end.
  • No migration, and the on-disk representation gets more consistent: the relationship endpoint joins the primary_key: [...] / index columns: [...] list convention. The in-app single-column UX is untouched (one-element vecs).
  • Accepted trade-off (user, 2026-06-09): a project.yaml written before this change that contains relationships will not load under the new format. There is no installed base to preserve, so this is a clean cutover, not data loss.
  • The relationship model becomes list-based throughout, which is the natural foundation if subset/UNIQUE-targeted FKs are ever wanted (explicitly OOS here).
  • A modest, broad refactor (the Vec field change ripples through the 6 layers) — methodical, not deep; locked by tests at each layer.

Out of scope

  • Subset/non-PK FK targets (referencing a UNIQUE key that isn't the PK) — possible later on this list-based foundation.
  • Any change to single-column behaviour, the rebuild-table primitive, or the undo model (one relationship = one undo step stands).
  • A project.yaml version bump or F3 migrator (not needed — no installed base to migrate; clean cutover per D5).