rdbms-playground/docs/adr/0043-compound-pk-foreign-key-references.md
claude@clouddev1 b688592b4c docs: ADR-0043 implementation-readiness notes from /runda DA pass
DA pass found three change sites the first sketch missed
(teaching-echo renderers, --create-fk per-column creation, the
auto-name generator) and made explicit the rules the forks left
implicit: SQLite FK precondition (compound PK provides the unique
index), explicit parent cols must be the PK set (any order,
positional), arity/empty/inline-rejection wording, single-in-parens
accepted, --create-fk per-column typed to fk_target_type. Expanded
the test plan to cover enforcement, auto-expand, undo, round-trip.
Fixed a stale 'legacy yaml loads' test line (no back-compat).
2026-06-09 17:11:01 +00:00


ADR-0043: Compound-primary-key foreign-key references (T3)

Status

Accepted — 2026-06-09. All four genuine forks confirmed by the user at the recommended option: F-A full PK in order, F-B house-style uniform column lists (no migration; back-compat not required), F-C parenthesized DSL lists, F-D bare table-level SQL FK auto-expands to the parent's full PK. Closes the one open leg of requirements.md T3 ("compound primary keys handled end-to-end (DSL, storage, display, FK reference)"): a foreign key that references a compound (multi-column) primary key.

Cross-references ADR-0011 (FK column type compatibility — Type::fk_target_type), ADR-0013 (relationships, naming, the rebuild-table strategy, and the __rdbms_playground_relationships metadata table), ADR-0035 §4b (the SQL FOREIGN KEY surface), ADR-0004 / ADR-0015 (project.yaml as the authoritative format; playground.db is a derived artifact), and ADR-0009 (DSL surface conventions).

Context

Compound PRIMARY KEYs are declared, stored, and displayed today (create table T with pk a(int), b(int) → primary_key: Vec<String>). The missing leg is the reference: a child table whose foreign key points at a parent's compound PK. A 2026-06-09 codebase audit found that the single-column FK is a pervasive assumption, at ~15-20 sites across 6+ files:

  • Metadata: __rdbms_playground_relationships stores scalar parent_column TEXT / child_column TEXT (PRIMARY KEY (child_table, child_column)).
  • Persistence: RelationshipSchema { parent_column: String, child_column: String }; project.yaml RawEndpoint { table, column }.
  • Grammar: add 1:n relationship … from <P>.<col> to <C>.<col> (one ident per side); SQL FOREIGN KEY (<col>) REFERENCES <P>(<col>) (parens that hold exactly one ident).
  • AST: Command::AddRelationship { parent_column: String, child_column: String }; SqlForeignKey { child_column: String, parent_column: Option<String> }.
  • Executor: schema_to_ddl emits a single-column FOREIGN KEY (c) REFERENCES P(p); check_fk_type_compat compares one parent type to one child type; bare REFERENCES <P> on a compound-PK parent is refused as ambiguous (resolve_create_table_fks, do_alter_add_foreign_key).
  • Display: RelationshipEnd { other_column: String, local_column: String }.

This is not a sweep-sized change, which is why it earns an ADR rather than an inline build. The decisions below also turn the audit's worst-case framing (a metadata-schema + yaml-format migration via the F3 framework) into a no-migration change.

Why no migration is needed

Decision input (user, 2026-06-09): back-compatibility with existing saved projects is not required. The project is pre-release; there is no installed base of project.yaml / playground.db files to preserve. This removes the only force that would have demanded an F3 migrator or a version bump, and — more importantly — it lets the representation be chosen for cleanliness and consistency rather than for byte-identical back-compat. The consequence is explicit and accepted: a project.yaml written before this change that contains relationships will not load under the new format.

Freed of back-compat, the storage follows the convention the file already uses for ordered column lists rather than inventing a new one:

  • project.yaml already writes primary_key: [id] (a compound PK is primary_key: [a, b]) and index columns: [a, b] (RawIndex { columns: Vec<String> }). The relationship endpoint is the lone multi-column-capable slot still using a scalar column:. It joins the house style (D5).
  • The metadata columns are TEXT; SQLite has no array type, so a list lives in a text cell as JSON regardless. That JSON is now a uniform encoding (a one-element array for the single-column case), not a "bare-name-or-JSON, sniff which" fallback — the fallback only existed to keep old rows identical, which is no longer a goal.

So this is not a clever back-compat dodge; it is "use the existing list convention, uniformly." No version bump, no F3 migrator.

Decision

Support a foreign key that references a parent's full compound primary key, matched positionally to an equal-length child column list with per-pair type compatibility, across both the DSL and SQL surfaces, using uniform column-list storage (D5) that needs no migration.

D1 — Matching policy: the full PK, in order

A compound-PK FK references all columns of the parent's primary key, in PK declaration order, matched 1:1 to the child's column list (same length). Referencing a subset of a compound PK is out of scope: SQL/SQLite require FK parent columns to form a PK or UNIQUE key, and a strict subset of a compound PK is not itself unique unless separately constrained. Teaching-clean rule: a foreign key to a compound key names every column of that key.

A length mismatch (child supplies N columns, parent PK has M ≠ N) is a friendly error naming both counts.
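The arity rule can be sketched as a small check. This is only an illustration: the function name `check_fk_arity` and its signature are hypothetical, and the message for the empty case is a placeholder; only the "N child column(s) but M in P's key" wording comes from this ADR.

```rust
/// Hypothetical helper illustrating the D1 arity rule.
/// `parent_pk` is the parent's primary key in declaration order.
fn check_fk_arity(
    parent_table: &str,
    parent_pk: &[String],
    child_cols: &[String],
) -> Result<(), String> {
    // Placeholder wording: the ADR only says an empty list is rejected.
    if child_cols.is_empty() {
        return Err("a foreign key needs at least one child column".to_string());
    }
    // A mismatch names both counts so the learner can see what is missing.
    if child_cols.len() != parent_pk.len() {
        return Err(format!(
            "{} child column(s) but {} in {}'s key",
            child_cols.len(),
            parent_pk.len(),
            parent_table
        ));
    }
    Ok(())
}
```

With a two-column parent key, a one-column child list would yield "1 child column(s) but 2 in P's key".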

D2 — Type compatibility: per pair, positional

Each child column's type must equal the fk_target_type() of the parent PK column at the same position, i.e. parent_pk_col.fk_target_type() == child_col's type for every pair (the existing ADR-0011 rule, applied element-wise in order). check_fk_type_compat generalises to walk the pairs and report the first offending pair with the same wording it uses today.
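A minimal sketch of the pairwise walk, under stated assumptions: the `Type` enum below is a toy stand-in, the `Serial → Int` mapping is invented for illustration (ADR-0011 defines the real rule), and the error wording is a placeholder rather than the project's existing message.

```rust
/// Toy stand-in for the project's column type; the variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type {
    Int,
    Text,
    Serial,
}

impl Type {
    /// Assumed mapping for illustration: an auto-incrementing key is
    /// referenced by a plain integer; every other type references itself.
    fn fk_target_type(self) -> Type {
        match self {
            Type::Serial => Type::Int,
            other => other,
        }
    }
}

/// Hypothetical shape of the generalised check: walk (parent, child) pairs
/// in order and report the first pair whose types do not line up.
/// Assumes the two lists already have equal length (the D1 check).
fn check_fk_type_compat_pairs(
    parent: &[(String, Type)],
    child: &[(String, Type)],
) -> Result<(), String> {
    for ((p_name, p_ty), (c_name, c_ty)) in parent.iter().zip(child.iter()) {
        let expected = p_ty.fk_target_type();
        if expected != *c_ty {
            return Err(format!(
                "column {} has type {:?} but referencing {} requires {:?}",
                c_name, c_ty, p_name, expected
            ));
        }
    }
    Ok(())
}
```

Stopping at the first offending pair keeps the message as short as the single-column one.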

D3 — DSL syntax: parenthesized column lists

add 1:n relationship [as <name>] from <P>.(<a>, <b>) to <C>.(<x>, <y>) [on delete …] [on update …] [--create-fk]

The single-column form from <P>.<col> to <C>.<col> is unchanged (no parens) — back-compatible and the common case. The parenthesized list is the multi-column form. Both sides must use the same arity (enforced as a D1 length check). Parentheses mirror the existing compound-PK declaration syntax (with pk a(int), b(int) uses parens around the per-column type; the FK list uses parens around the column names) and the SQL FOREIGN KEY (…) shape, so the surface stays internally consistent.
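To make the two endpoint spellings concrete, here is a rough sketch of parsing one endpoint into a table name and a column list. It is not the project's grammar code: `parse_endpoint` and `is_ident` are hypothetical names, identifiers are assumed to be ASCII letters, digits and underscores, and the error strings are placeholders.

```rust
/// Assumed identifier rule for this sketch only.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parses `T.col` or `T.(c1, c2, ...)` into (table, columns).
/// A bare column and a single column in parentheses give the same result.
fn parse_endpoint(src: &str) -> Result<(String, Vec<String>), String> {
    let (table, rest) = src
        .trim()
        .split_once('.')
        .ok_or_else(|| format!("expected <table>.<column> in '{}'", src))?;
    let table = table.trim();
    let rest = rest.trim();

    // Parenthesized form: split the inside on commas. Bare form: one name.
    let names: Vec<&str> = match rest.strip_prefix('(') {
        Some(inner) => {
            let inner = inner
                .strip_suffix(')')
                .ok_or_else(|| "missing closing ')'".to_string())?;
            inner.split(',').map(str::trim).collect()
        }
        None => vec![rest],
    };

    if !is_ident(table) {
        return Err(format!("invalid table name '{}'", table));
    }
    // An empty `()` list produces one empty name and is rejected here.
    if names.iter().any(|n| !is_ident(n)) {
        return Err("empty or invalid column name in endpoint".to_string());
    }
    Ok((
        table.to_string(),
        names.into_iter().map(String::from).collect(),
    ))
}
```

Both sides of a relationship would go through the same routine, after which the arity check compares the two list lengths.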

D4 — SQL syntax: extend the existing lists

FOREIGN KEY (<x>, <y>) REFERENCES <P> (<a>, <b>) — the grammar's child and parent column slots become comma-separated lists (today capped at one). Inline <col> <type> REFERENCES <P>(<a>, <b>) stays single-child-column (one inline column can't match a 2-column key) — a compound FK uses the table-level form. Bare table-level FOREIGN KEY (x, y) REFERENCES <P> (no parent columns) auto-expands to the parent's full PK when the arities match; bare inline <col> REFERENCES <P> on a compound-PK parent keeps today's friendly refusal, with the message pointing at the table-level multi-column form.
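The auto-expansion of a bare table-level FK might look roughly like this. The function name and signature are hypothetical; it covers only the "fill in the parent's full PK, then check arity" step and deliberately leaves out validation of an explicit parent list.

```rust
/// Hypothetical sketch: resolve the parent column list for a table-level
/// FOREIGN KEY. `explicit` is `None` for the bare `REFERENCES P` form.
fn resolve_parent_columns(
    parent_table: &str,
    parent_pk: &[String],
    child_cols: &[String],
    explicit: Option<Vec<String>>,
) -> Result<Vec<String>, String> {
    // Bare form: auto-expand to the parent's full primary key, in PK order.
    let parent_cols = explicit.unwrap_or_else(|| parent_pk.to_vec());

    // Expansion only succeeds when the arities match.
    if parent_cols.len() != child_cols.len() {
        return Err(format!(
            "{} child column(s) but {} in {}'s key",
            child_cols.len(),
            parent_cols.len(),
            parent_table
        ));
    }
    Ok(parent_cols)
}
```

For example, FOREIGN KEY (x, y) REFERENCES P against a parent with PRIMARY KEY (a, b) would resolve to the parent list [a, b].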

D5 — Storage: uniform column lists, matching the house style

Both stores hold an ordered column list, uniformly (a one-element list for the single-column case), following the convention project.yaml already uses for primary_key and index columns.

  • project.yaml: RawEndpoint becomes { table, columns: Vec<String> } and writes columns: [a, b] (single-column → columns: [id]), exactly parallel to primary_key: [id]. No scalar column: form, no dual-shape reader.
  • Metadata (__rdbms_playground_relationships): no CREATE TABLE change (the TEXT columns and PRIMARY KEY (child_table, child_column) are untouched). parent_column / child_column store the list as a JSON array string, uniformly, including ["id"] for a single column (SQLite has no array type, so a text cell is where a list lives). The actual enforced FK lives on the rebuilt child table's DDL (FOREIGN KEY (x, y) REFERENCES P(a, b)), emitted by schema_to_ddl, exactly as the single-column FK is today via the rebuild-table primitive (ADR-0013): one relationship, one undo step.
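As an illustration of the uniform metadata encoding, the sketch below writes a column list as a JSON array string and reads it back. The function names are hypothetical, and the hand-rolled decoder handles only what the encoder produces (strings with `\"` and `\\` escapes); a real implementation would more likely use a JSON library, and this ADR does not say which one.

```rust
/// Encodes a column list as a JSON array string, e.g. ["a","b"].
/// A single column is still an array: ["id"].
fn encode_columns(cols: &[String]) -> String {
    let items: Vec<String> = cols
        .iter()
        .map(|c| {
            let escaped = c.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{}\"", escaped)
        })
        .collect();
    format!("[{}]", items.join(","))
}

/// Decodes the output of `encode_columns`. Deliberately minimal: it is
/// lenient about separators and supports only the two escapes above.
fn decode_columns(text: &str) -> Result<Vec<String>, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("not a JSON array: {}", text))?;
    let mut cols = Vec::new();
    let mut chars = inner.chars().peekable();
    loop {
        // Skip whitespace and commas between items.
        while matches!(chars.peek(), Some(c) if c.is_whitespace() || *c == ',') {
            chars.next();
        }
        match chars.next() {
            None => break,
            Some('"') => {
                let mut name = String::new();
                loop {
                    match chars.next() {
                        Some('\\') => match chars.next() {
                            Some(c @ ('"' | '\\')) => name.push(c),
                            _ => return Err("unsupported escape".to_string()),
                        },
                        Some('"') => break,
                        Some(c) => name.push(c),
                        None => return Err("unterminated string".to_string()),
                    }
                }
                cols.push(name);
            }
            Some(other) => return Err(format!("unexpected character '{}'", other)),
        }
    }
    if cols.is_empty() {
        return Err("empty column list".to_string());
    }
    Ok(cols)
}
```

Because a bare name such as id is rejected rather than sniffed, there is exactly one accepted shape per cell, which is the point of the uniform encoding.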

D6 — In-memory model: Vec<String> column lists

Command::AddRelationship, SqlForeignKey, RelationshipSchema, the internal ReadForeignKey, and RelationshipEnd (display) all carry parent_columns: Vec<String> / child_columns: Vec<String> (or Option<Vec<String>> for the bare-SQL parent case). A one-element vec is the single-column case; nothing about the single-column UX changes.
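One possible shape for the list-based SQL foreign-key node, shown only to make the `Vec<String>` / `Option<Vec<String>>` split concrete. The struct name, the `parent_table` field and both helper methods are assumptions; the ADR specifies only the two column-list fields.

```rust
/// Sketch of a list-based foreign-key node. `parent_table` is assumed,
/// since the ADR lists only the column fields.
#[derive(Debug, Clone, PartialEq)]
struct SqlForeignKeySketch {
    child_columns: Vec<String>,
    parent_table: String,
    /// `None` is the bare `REFERENCES P` form with no parent columns.
    parent_columns: Option<Vec<String>>,
}

impl SqlForeignKeySketch {
    /// Single-column FK: the same shape with one-element lists.
    fn single(child: &str, parent_table: &str, parent: &str) -> Self {
        Self {
            child_columns: vec![child.to_string()],
            parent_table: parent_table.to_string(),
            parent_columns: Some(vec![parent.to_string()]),
        }
    }

    /// True when the FK spans more than one child column.
    fn is_compound(&self) -> bool {
        self.child_columns.len() > 1
    }
}
```

Keeping the single-column case as a one-element list means downstream code has one path to walk instead of a scalar path and a list path.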

Genuine forks (escalated for sign-off)

These are decisions, not facts. Recommendations are marked; the user confirmed all four at the recommended option (see Status).

  • F-A — matching policy. Full PK only (D1, recommended) vs. allow a subset (needs a separate UNIQUE key; larger, less teaching-clean).
  • F-B — storage encoding. Uniform column lists in the existing house style — columns: [a, b] in yaml (like primary_key), JSON-array in the unchanged metadata TEXT columns; no back-compat, no migration (D5, recommended) vs. a normalized relationship-columns child table (more "correct" but a schema change with joins on read, no learner-visible payoff). Premise: no existing projects to preserve (confirmed).
  • F-C — DSL multi-column syntax. from P.(a, b) to C.(x, y) parenthesized (D3, recommended) vs. a repeated-dotted form (from P.a, P.b to C.x, C.y, more ambiguous to parse and read).
  • F-D — bare table-level SQL FK auto-expansion. Auto-expand FOREIGN KEY (x,y) REFERENCES P to P's full PK when arities match (D4, recommended) vs. always require explicit parent columns.

Implementation sketch (change sites)

Grouped; each lands behind tests. No migration step.

  1. AST: AddRelationship + SqlForeignKey column fields → Vec<String> / Option<Vec<String>> (command.rs).
  2. Grammar: DSL endpoint column slot → optional parenthesized list (ddl.rs); SQL child/parent column slots → comma lists (sql_create_table.rs). Builders collect lists.
  3. Metadata: insert_relationship_metadata / read_all_relationships encode/decode the column lists as uniform JSON arrays, per D5 (db.rs); no CREATE TABLE change.
  4. Persistence: RelationshipSchema → Vec<String>; RawEndpoint becomes { table, columns: Vec<String> }, written columns: [a, b] like primary_key (persistence/mod.rs, persistence/yaml.rs).
  5. Executor: do_add_relationship / resolve_create_table_fks / do_alter_add_foreign_key walk column lists; schema_to_ddl emits multi-column FOREIGN KEY (…) REFERENCES P(…); check_fk_type_compat loops pairs; bare-reference paths auto-expand to the full PK (D4) or refuse with the improved message; the default relationship-name generator (db.rs:6850) joins the column lists; --create-fk creates one child column per parent PK column (db.rs).
  6. Display: RelationshipEnd → column lists; describe renders (a, b) → (x, y) symmetrically (outbound + inbound, ADR-0013) (db.rs, output_render.rs).
  7. Teaching echo (ADR-0038): render_add_relationship and render_add_relationship_create_fk (echo.rs) go multi-column. The FK line emits FOREIGN KEY (x, y) REFERENCES P (a, b), and --create-fk emits one ADD COLUMN line per newly-created child column (each typed to the matching parent PK column's fk_target_type) before the FK line. Copy-paste contract (ADR-0038) holds: every echoed line is runnable advanced SQL.
  8. Tests: parse (DSL + SQL: single-col still works; multi parses; arity mismatch errors; empty () rejected; inline col REFERENCES P(a,b) rejected with the table-level pointer); worker round-trip (declare a 2-col FK, rebuild, the FK is enforced, so an insert violating it is refused; per-pair type-mismatch refused; bare-FK auto-expand to the parent PK; --create-fk creates both child columns); persistence round-trip (a single-col relationship writes columns: [id] and reads back; a 2-col writes columns: [a, b] and reads back; full save→rebuild reconstructs the FK); undo (add a 2-col relationship, undo, and it is gone in one step); display (describe shows (a, b) → (x, y) both directions).
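The DDL emission in the executor step reduces to joining two column lists. A minimal sketch, assuming identifiers need no quoting (the real schema_to_ddl may quote or otherwise differ); `render_fk_clause` is a hypothetical name:

```rust
/// Hypothetical sketch of the multi-column FK clause. Child columns come
/// first, then the parent table and its columns, paired by position.
fn render_fk_clause(
    child_cols: &[String],
    parent_table: &str,
    parent_cols: &[String],
) -> String {
    format!(
        "FOREIGN KEY ({}) REFERENCES {}({})",
        child_cols.join(", "),
        parent_table,
        parent_cols.join(", ")
    )
}
```

With one-element lists this yields the same single-column clause as today, e.g. FOREIGN KEY (c) REFERENCES P(p), so no separate single-column path is needed.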

Implementation-readiness notes (DA pass, 2026-06-09)

Verified against the code before build; folded in so the plan is complete.

  • SQLite precondition holds. A FK's parent columns must be a PK or a UNIQUE-indexed set. A SQLite PRIMARY KEY (a, b) creates the requisite unique index, so FOREIGN KEY (x, y) REFERENCES P(a, b) is valid against a compound PK with no extra index. STRICT tables do not change FK rules. (F-A's "full PK" therefore always targets a valid key; a subset would not be unique — the reason F-A excludes it.)
  • Explicit parent columns must be exactly the PK set. Under F-A, REFERENCES P(<cols>) is accepted iff <cols> is the parent's PK column set; any ordering is accepted and maps positionally to the child list (SQLite matches the set to the unique index; the child↔parent pairing is positional). A non-PK, partial, or super-set list is refused with a friendly message naming the parent's actual PK (subset/UNIQUE targets are OOS).
  • Arity + emptiness. Child and parent lists must be equal, non-zero length; a mismatch reports both counts ("N child column(s) but M in P's key"). An empty () list is a parse error. Inline single-column col REFERENCES P(a, b) is refused (one inline column can't satisfy a 2-column key) with a pointer to the table-level FOREIGN KEY (…) form (D4).
  • DSL from P.(a) (single in parens) is accepted — equivalent to bare from P.a — so the parenthesized form is uniform across arities; the bare form stays the idiomatic single-column spelling.
  • --create-fk is per-column. When child columns are missing, one is created per parent PK column, each typed to that parent column's fk_target_type (ADR-0011) — generalising today's single-column behaviour; the echo mirrors this (sketch step 7).
  • Metadata identity unchanged. PRIMARY KEY (child_table, child_column) still holds with the JSON-array string as the key — so a child column set still participates in at most one relationship (pre-existing behaviour, now per-set). Distinct sets on the same child table are distinct keys.
  • Auto-name generation (db.rs:6850, the [as <name>]-less default) is single-column today ({parent_table}_{parent_column}_to_{child_table}_{child_column}); it must join the column lists (e.g. Orders_a_b_to_Customers_x_y). This is a change site the first sketch missed; it was added to the executor step.
  • Undo / batch unchanged. One add 1:n relationship is one rebuild = one undo step (ADR-0013/0006), independent of arity.
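Two of the rules above lend themselves to a short sketch: the "explicit parent columns must be exactly the PK set, in any order" check and the joined default name. Both function names are hypothetical, the refusal wording is a placeholder, and the underscore join is inferred from the Orders_a_b_to_Customers_x_y example.

```rust
use std::collections::BTreeSet;

/// Accepts an explicit parent column list only if it is exactly the
/// parent's PK column set. Order is free; duplicates, subsets and
/// super-sets are refused with a message naming the actual PK.
fn validate_explicit_parent_cols(
    parent_table: &str,
    parent_pk: &[String],
    explicit: &[String],
) -> Result<(), String> {
    let pk_set: BTreeSet<&str> = parent_pk.iter().map(String::as_str).collect();
    let given: BTreeSet<&str> = explicit.iter().map(String::as_str).collect();
    // The length check rejects duplicates such as (a, a, b).
    if explicit.len() == parent_pk.len() && given == pk_set {
        Ok(())
    } else {
        Err(format!(
            "{}'s primary key is ({}); a foreign key must reference exactly those columns",
            parent_table,
            parent_pk.join(", ")
        ))
    }
}

/// Default relationship name with each column list joined by underscores.
fn default_relationship_name(
    parent_table: &str,
    parent_cols: &[String],
    child_table: &str,
    child_cols: &[String],
) -> String {
    format!(
        "{}_{}_to_{}_{}",
        parent_table,
        parent_cols.join("_"),
        child_table,
        child_cols.join("_")
    )
}
```

For one-element lists the generated name is identical to the current single-column pattern, so existing single-column names would not change.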

Consequences

  • T3 closes; a learner can model a real composite-key relationship end to end.
  • No migration, and the on-disk representation gets more consistent: the relationship endpoint joins the primary_key: [...] / index columns: [...] list convention. The in-app single-column UX is untouched (one-element vecs).
  • Accepted trade-off (user, 2026-06-09): a project.yaml written before this change that contains relationships will not load under the new format. There is no installed base to preserve, so this is a clean cutover, not data loss.
  • The relationship model becomes list-based throughout, which is the natural foundation if subset/UNIQUE-targeted FKs are ever wanted (explicitly OOS here).
  • A modest, broad refactor (the Vec field change ripples through the 6 layers) — methodical, not deep; locked by tests at each layer.

Out of scope

  • Subset/non-PK FK targets (referencing a UNIQUE key that isn't the PK) — possible later on this list-based foundation.
  • Any change to single-column behaviour, the rebuild-table primitive, or the undo model (one relationship = one undo step stands).
  • A project.yaml version bump or F3 migrator (not needed — no installed base to migrate; clean cutover per D5).