rdbms-playground/docs/adr/0043-compound-pk-foreign-key-references.md

# ADR-0043: Compound-primary-key foreign-key references (T3)

## Status

**Accepted** — 2026-06-09. All four genuine forks confirmed by the
user at the recommended option: **F-A** full PK in order, **F-B**
house-style uniform column lists (no migration; back-compat not
required), **F-C** parenthesized DSL lists, **F-D** bare table-level
SQL FK auto-expands to the parent's full PK. Closes the one open
leg of `requirements.md` **T3** ("compound primary keys handled
end-to-end (DSL, storage, display, **FK reference**)"): a foreign
key that *references* a compound (multi-column) primary key.

Cross-references **ADR-0011** (FK column type compatibility —
`Type::fk_target_type`), **ADR-0013** (relationships, naming, the
rebuild-table strategy, and the `__rdbms_playground_relationships`
metadata table), **ADR-0035 §4b** (the SQL `FOREIGN KEY` surface),
**ADR-0004 / ADR-0015** (`project.yaml` as the authoritative
format; `playground.db` is a derived artifact), and **ADR-0009**
(DSL surface conventions).

## Context

Compound PRIMARY KEYs are declared, stored, and displayed today
(`create table T with pk a(int), b(int)` → `primary_key:
Vec<String>`). The missing leg is the *reference*: a child table
whose foreign key points at a parent's compound PK. A 2026-06-09
codebase audit found single-column FK is a pervasive assumption —
~15–20 sites across 6+ files:

- **Metadata** — `__rdbms_playground_relationships` stores scalar
  `parent_column TEXT` / `child_column TEXT`
  (`PRIMARY KEY (child_table, child_column)`).
- **Persistence** — `RelationshipSchema { parent_column: String,
  child_column: String }`; `project.yaml` `RawEndpoint { table,
  column }`.
- **Grammar** — `add 1:n relationship … from <P>.<col> to
  <C>.<col>` (one ident per side); SQL `FOREIGN KEY (<col>)
  REFERENCES <P>(<col>)` (parens that hold exactly one ident).
- **AST** — `Command::AddRelationship { parent_column: String,
  child_column: String }`; `SqlForeignKey { child_column: String,
  parent_column: Option<String> }`.
- **Executor** — `schema_to_ddl` emits a single-column
  `FOREIGN KEY (c) REFERENCES P(p)`; `check_fk_type_compat`
  compares one parent type to one child type; bare
  `REFERENCES <P>` on a compound-PK parent is refused as
  ambiguous (`resolve_create_table_fks`,
  `do_alter_add_foreign_key`).
- **Display** — `RelationshipEnd { other_column: String,
  local_column: String }`.

This is not a sweep-sized change, which is why it earns an ADR
rather than an inline build. The decisions below also turn the
audit's worst-case framing (a metadata-schema + yaml-format
migration via the F3 framework) into a **no-migration** change.

### Why no migration is needed

**Decision input (user, 2026-06-09): back-compatibility with
existing saved projects is not required.** The project is
pre-release; there is no installed base of `project.yaml` /
`playground.db` files to preserve. This removes the only force
that would have demanded an F3 migrator or a version bump, and —
more importantly — it lets the representation be chosen for
*cleanliness and consistency* rather than for byte-identical
back-compat. The consequence is explicit and accepted: a
`project.yaml` written before this change that contains
relationships will not load under the new format.

Freed of back-compat, the storage follows the convention the file
**already uses** for ordered column lists rather than inventing a
new one:

- `project.yaml` already writes `primary_key: [id]` (a compound PK
  is `primary_key: [a, b]`) and index `columns: [a, b]`
  (`RawIndex { columns: Vec<String> }`). The relationship endpoint
  is the lone multi-column-capable slot still using a scalar
  `column:`. It joins the house style (D5).
- The metadata columns are `TEXT`; SQLite has no array type, so a
  list lives in a text cell as an encoded string regardless (a JSON
  array in the original proposal, comma-joined as built; see D5).
  That encoding is now *uniform* (a one-element list for the
  single-column case), not a "bare-name-or-JSON, sniff which"
  fallback. The fallback only existed to keep old rows identical,
  which is no longer a goal.

So this is not a clever back-compat dodge; it is "use the existing
list convention, uniformly." No version bump, no F3 migrator.

## Decision

Support a foreign key that references a parent's **full** compound
primary key, matched **positionally** to an equal-length child
column list, with per-pair type compatibility. This applies to both
the DSL and SQL surfaces and uses uniform column-list storage (D5)
that needs no migration.

### D1 — Matching policy: the full PK, in order

A compound-PK FK references **all** columns of the parent's
primary key, in PK declaration order, matched 1:1 to the child's
column list (same length). Referencing a *subset* of a compound PK
is **out of scope**: SQL/SQLite require FK parent columns to form a
PK or UNIQUE key, and a strict subset of a compound PK is not
itself unique unless separately constrained. Teaching-clean rule:
*a foreign key to a compound key names every column of that key.*

A length mismatch (child supplies N columns, parent PK has M ≠ N)
is a friendly error naming both counts.
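
A minimal sketch of the length check, assuming a hypothetical helper
named `check_fk_arity` (not a function in the codebase). The error
wording follows the "N child column(s) but M in `P`'s key" message
quoted in the readiness notes.

```rust
/// Hypothetical helper: verifies that the child column list has the
/// same length as the parent's primary key (D1).
fn check_fk_arity(
    parent_table: &str,
    parent_pk: &[String],
    child_columns: &[String],
) -> Result<(), String> {
    // An empty list can never reference a key.
    if child_columns.is_empty() {
        return Err("a foreign key needs at least one child column".to_string());
    }
    // Name both counts so the learner can see which side is short.
    if child_columns.len() != parent_pk.len() {
        return Err(format!(
            "{} child column(s) but {} in `{}`'s key",
            child_columns.len(),
            parent_pk.len(),
            parent_table
        ));
    }
    Ok(())
}
```

The check is deliberately independent of column names and types;
those are handled separately by the PK-set and type rules.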

### D2 — Type compatibility: per pair, positional

Each child column's type must satisfy
`parent_pk_col.fk_target_type() == child_col` for the
corresponding pair (the existing ADR-0011 rule, applied
element-wise in order). `check_fk_type_compat` generalises to walk
the pairs and report the **first** offending pair with the same
wording it uses today.
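
The pairwise walk can be sketched as follows. `ColType` and its
`Serial` to `Int` mapping are hypothetical stand-ins for the
project's `Type::fk_target_type` (ADR-0011), whose real mapping is
not reproduced here; `first_incompatible_pair` is likewise an
illustrative name.

```rust
/// Stand-in column type; the real enum lives in the project.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Serial,
    Int,
    Text,
}

impl ColType {
    /// ASSUMED rule, for illustration only: an auto-incrementing key
    /// is referenced by a plain integer; other types map to themselves.
    fn fk_target_type(self) -> ColType {
        match self {
            ColType::Serial => ColType::Int,
            other => other,
        }
    }
}

/// Returns the index of the first pair whose child type differs from
/// the parent's FK target type, or `None` when every pair is compatible.
/// Both slices are assumed to have equal length (checked under D1).
fn first_incompatible_pair(
    parent: &[(String, ColType)],
    child: &[(String, ColType)],
) -> Option<usize> {
    parent
        .iter()
        .zip(child.iter())
        .position(|((_, p), (_, c))| p.fk_target_type() != *c)
}
```

Returning the index rather than a boolean lets the caller name the
offending parent and child columns in the error message.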

### D3 — DSL syntax: parenthesized column lists

`add 1:n relationship [as <name>]
   from <P>.(<a>, <b>) to <C>.(<x>, <y>)
   [on delete …] [on update …] [--create-fk]`

The single-column form `from <P>.<col> to <C>.<col>` is unchanged
(no parens) — back-compatible and the common case. The
parenthesized list is the multi-column form. Both sides must use
the same arity (enforced as a D1 length check). Parentheses mirror
the existing compound-PK *declaration* syntax (`with pk a(int),
b(int)` uses parens around the per-column type; the FK list uses
parens around the column names) and the SQL `FOREIGN KEY (…)`
shape, so the surface stays internally consistent.
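
To illustrate the two endpoint spellings, here is a small
hand-written parser sketch. It is not the project's grammar
(`ddl.rs`); `parse_endpoint` and `is_ident` are hypothetical names,
and the identifier rule assumed is the `[A-Za-z0-9_]+` shape
mentioned under D5.

```rust
/// True for identifiers made only of ASCII letters, digits, and `_`.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical sketch: parses `P.col` or `P.(a, b)` into
/// (table, ordered column list).
fn parse_endpoint(input: &str) -> Result<(String, Vec<String>), String> {
    let (table, rest) = input
        .trim()
        .split_once('.')
        .ok_or_else(|| format!("expected <table>.<column>, got `{input}`"))?;
    let table = table.trim();
    let rest = rest.trim();
    if !is_ident(table) {
        return Err(format!("`{table}` is not a valid table name"));
    }
    // Parenthesized list: strip the parens and split on commas.
    // Anything else is treated as a single bare column name.
    let names: Vec<&str> = match rest.strip_prefix('(').and_then(|r| r.strip_suffix(')')) {
        Some(inner) => inner.split(',').map(str::trim).collect(),
        None => vec![rest],
    };
    // An empty `()` yields one empty name and is rejected here.
    if names.iter().any(|n| !is_ident(n)) {
        return Err(format!("invalid or empty column list in `{input}`"));
    }
    Ok((table.to_string(), names.into_iter().map(String::from).collect()))
}
```

Under this sketch `P.(a)` and `P.a` produce the same result, so the
bare form is simply the one-element case of the list form.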

### D4 — SQL syntax: extend the existing lists

`FOREIGN KEY (<x>, <y>) REFERENCES <P> (<a>, <b>)` — the grammar's
child and parent column slots become comma-separated **lists**
(today capped at one). Inline `<col> <type> REFERENCES <P>(<a>,
<b>)` stays single-child-column (one inline column can't match a
2-column key) — a compound FK uses the table-level form. Bare
table-level `FOREIGN KEY (x, y) REFERENCES <P>` (no parent
columns) **auto-expands to the parent's full PK** when the arities
match; bare inline `<col> REFERENCES <P>` on a compound-PK parent
keeps today's friendly refusal, with the message pointing at the
table-level multi-column form.
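
The table-level resolution can be sketched as one function that
either auto-expands a bare reference or validates an explicit list
against the parent's PK set. `resolve_parent_columns` is a
hypothetical name and the error texts are illustrative, not the
executor's actual wording.

```rust
/// Hypothetical sketch: resolves the parent column list for a
/// table-level FK. `explicit` is `None` for bare `REFERENCES <P>`.
fn resolve_parent_columns(
    parent_table: &str,
    parent_pk: &[String],
    explicit: Option<Vec<String>>,
    child_len: usize,
) -> Result<Vec<String>, String> {
    let resolved = match explicit {
        // Bare reference: auto-expand to the full PK in declaration order.
        None => parent_pk.to_vec(),
        // Explicit list: must be exactly the PK column set (any order).
        Some(cols) => {
            let mut given = cols.clone();
            let mut pk = parent_pk.to_vec();
            given.sort();
            pk.sort();
            if given != pk {
                return Err(format!(
                    "`{parent_table}`'s primary key is ({}); a foreign key must reference exactly those columns",
                    parent_pk.join(", ")
                ));
            }
            cols
        }
    };
    // Arity check against the child list, naming both counts.
    if resolved.len() != child_len {
        return Err(format!(
            "{child_len} child column(s) but {} in `{parent_table}`'s key",
            resolved.len()
        ));
    }
    Ok(resolved)
}
```

An explicit list keeps the order the user wrote, since the pairing
with the child list is positional.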

### D5 — Storage: uniform column lists, matching the house style

Both stores hold an **ordered column list**, uniformly (a
one-element list for the single-column case), following the
convention `project.yaml` already uses for `primary_key` and index
`columns`.

- **`project.yaml`**: `RawEndpoint` becomes `{ table, columns:
  Vec<String> }` and writes `columns: [a, b]` (single-column →
  `columns: [id]`), exactly parallel to `primary_key: [id]`. No
  scalar `column:` form, no dual-shape reader.
- **Metadata** (`__rdbms_playground_relationships`): no
  `CREATE TABLE` change (the `TEXT` columns and
  `PRIMARY KEY (child_table, child_column)` are untouched).
  `parent_column` / `child_column` store the list **comma-joined**
  in the same text cell (`a,b`; a single column is just its bare
  name). *As-built note:* the ADR first said "JSON array"; the
  implementation uses a comma delimiter, which is safe because
  column identifiers are `[A-Za-z0-9_]+` (no commas — `parser.rs`)
  and simpler (no `serde_json` dependency). This is an internal
  encoding detail below fork F-B — the user-visible `project.yaml`
  is still the `columns: [a, b]` list.
  The actual enforced FK lives on the rebuilt child table's DDL
  (`FOREIGN KEY (x, y) REFERENCES P(a, b)`), emitted by
  `schema_to_ddl`, exactly as the single-column FK is today via the
  rebuild-table primitive (ADR-0013): one relationship, one undo
  step.
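
A minimal sketch of the comma-joined metadata encoding described
above. `encode_columns` and `decode_columns` are hypothetical names;
the sketch relies on the stated assumption that identifiers never
contain commas.

```rust
/// Encodes an ordered column list into one TEXT cell.
/// A single column encodes to its bare name.
fn encode_columns(columns: &[String]) -> String {
    columns.join(",")
}

/// Decodes a TEXT cell back into the ordered column list.
/// Assumes the cell was written by `encode_columns` with a
/// non-empty list of comma-free identifiers.
fn decode_columns(cell: &str) -> Vec<String> {
    cell.split(',').map(String::from).collect()
}
```

Because a one-element list encodes to the bare column name, the
reader needs no separate single-column branch.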

### D6 — In-memory model: `Vec<String>` column lists

`Command::AddRelationship`, `SqlForeignKey`, `RelationshipSchema`,
the internal `ReadForeignKey`, and `RelationshipEnd` (display) all
carry `parent_columns: Vec<String>` / `child_columns: Vec<String>`
(or `Option<Vec<String>>` for the bare-SQL parent case). A
one-element vec is the single-column case; nothing about the
single-column UX changes.
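
As an illustration of the list-based model, the sketch below uses a
hypothetical `RelationshipSketch` struct (the real
`RelationshipSchema` has fields not shown in this ADR) and renders
the FK clause in the shape D5 attributes to `schema_to_ddl`.

```rust
/// Hypothetical, reduced stand-in for the list-based relationship model.
#[derive(Debug, Clone, PartialEq)]
struct RelationshipSketch {
    parent_table: String,
    parent_columns: Vec<String>,
    child_table: String,
    child_columns: Vec<String>,
}

impl RelationshipSketch {
    /// Renders the table-level FK clause for the child table's DDL.
    /// One-element lists produce the familiar single-column form.
    fn fk_clause(&self) -> String {
        format!(
            "FOREIGN KEY ({}) REFERENCES {}({})",
            self.child_columns.join(", "),
            self.parent_table,
            self.parent_columns.join(", ")
        )
    }
}
```

The same code path serves both arities, which is the point of
carrying a `Vec<String>` even for single-column relationships.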

## Genuine forks (escalated for sign-off)

These are decisions, not facts. Recommendations are marked; the
user confirmed all four at the recommended option on 2026-06-09
(see Status).

- **F-A — matching policy.** Full PK only (D1, *recommended*) vs.
  allow a subset (needs a separate UNIQUE key; larger, less
  teaching-clean).
- **F-B — storage encoding.** Uniform column lists in the existing
  house style: `columns: [a, b]` in yaml (like `primary_key`), an
  encoded list in the unchanged metadata `TEXT` columns (JSON array
  as proposed, comma-joined as built; see D5); no back-compat, no
  migration (D5, *recommended*) vs. a normalized
  relationship-columns child table (more "correct" but a schema
  change with joins on read, no learner-visible payoff). Premise:
  no existing projects to preserve (confirmed).
- **F-C — DSL multi-column syntax.** `from P.(a, b) to C.(x, y)`
  parenthesized (D3, *recommended*) vs. a repeated-dotted form
  (`from P.a, P.b to C.x, C.y`, more ambiguous to parse and read).
- **F-D — bare table-level SQL FK auto-expansion.** Auto-expand
  `FOREIGN KEY (x,y) REFERENCES P` to P's full PK when arities
  match (D4, *recommended*) vs. always require explicit parent
  columns.

## Implementation sketch (change sites)

Grouped; each lands behind tests. No migration step.

1. **AST** — `AddRelationship` + `SqlForeignKey` column fields →
   `Vec<String>` / `Option<Vec<String>>` (`command.rs`).
2. **Grammar** — DSL endpoint column slot → optional
   parenthesized list (`ddl.rs`); SQL child/parent column slots →
   comma lists (`sql_create_table.rs`). Builders collect lists.
3. **Metadata** — `insert_relationship_metadata` /
   `read_all_relationships` encode/decode the comma-joined column
   list of D5 (`db.rs`); no `CREATE TABLE` change.
4. **Persistence** — `RelationshipSchema` → `Vec<String>`;
   `RawEndpoint` becomes `{ table, columns: Vec<String> }`, written
   `columns: [a, b]` like `primary_key`
   (`persistence/mod.rs`, `persistence/yaml.rs`).
5. **Executor** — `do_add_relationship` /
   `resolve_create_table_fks` / `do_alter_add_foreign_key` walk
   column lists; `schema_to_ddl` emits multi-column `FOREIGN KEY
   (…) REFERENCES P(…)`; `check_fk_type_compat` loops pairs;
   bare-reference paths auto-expand to the full PK (D4) or refuse
   with the improved message; the default relationship-name
   generator (`db.rs:6850`) joins the column lists; `--create-fk`
   creates one child column per parent PK column (`db.rs`).
6. **Display** — `RelationshipEnd` → column lists; `describe`
   renders `(a, b) → (x, y)` symmetrically (outbound + inbound,
   ADR-0013) (`db.rs`, `output_render.rs`).
7. **Teaching echo (ADR-0038)** — `render_add_relationship` and
   `render_add_relationship_create_fk` (`echo.rs`) go multi-column:
   the FK line emits `FOREIGN KEY (x, y) REFERENCES P (a, b)`, and
   `--create-fk` emits **one `ADD COLUMN` line per newly-created
   child column** (each typed to the matching parent PK column's
   `fk_target_type`) before the FK line. Copy-paste contract
   (ADR-0038) holds: every echoed line is runnable advanced SQL.
8. **Tests** — parse (DSL + SQL: single-col still works; multi
   parses; arity mismatch errors; empty `()` rejected; inline
   `col REFERENCES P(a,b)` rejected with the table-level pointer);
   worker round-trip (declare a 2-col FK, rebuild, the FK is
   **enforced** — an insert violating it is refused; per-pair
   type-mismatch refused; bare-FK **auto-expand** to the parent PK;
   `--create-fk` creates both child columns); persistence
   round-trip (a single-col relationship writes `columns: [id]` and
   reads back; a 2-col writes `columns: [a, b]` and reads back;
   full save→rebuild reconstructs the FK); **undo** (add a 2-col
   relationship, undo, it is gone — one step); display
   (`describe` shows `(a, b) → (x, y)` both directions).
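
Step 7's echo for `--create-fk` could look roughly like the sketch
below. The function name `render_create_fk_echo`, the exact
`ALTER TABLE` statement shapes, and the SQL type names passed in
are all assumptions for illustration; the real renderers live in
`echo.rs` and may word the lines differently.

```rust
/// Hypothetical sketch: one ADD COLUMN line per newly created child
/// column, followed by a single FK line.
/// `new_child_columns` holds (column name, SQL type name) pairs, where
/// the type is assumed to be the parent column's FK target type.
fn render_create_fk_echo(
    child_table: &str,
    new_child_columns: &[(String, String)],
    parent_table: &str,
    parent_columns: &[String],
) -> Vec<String> {
    // One line per created column, in child-list order.
    let mut lines: Vec<String> = new_child_columns
        .iter()
        .map(|(name, ty)| format!("ALTER TABLE {child_table} ADD COLUMN {name} {ty}"))
        .collect();
    let child_names: Vec<&str> = new_child_columns
        .iter()
        .map(|(name, _)| name.as_str())
        .collect();
    // The FK line comes last, after every column it references exists.
    lines.push(format!(
        "ALTER TABLE {child_table} ADD FOREIGN KEY ({}) REFERENCES {parent_table} ({})",
        child_names.join(", "),
        parent_columns.join(", ")
    ));
    lines
}
```

Emitting the column lines before the FK line keeps the echoed
sequence runnable top to bottom, in the spirit of the copy-paste
contract.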

## Implementation-readiness notes (DA pass, 2026-06-09)

Verified against the code before build; folded in so the plan is
complete.

- **SQLite precondition holds.** A FK's parent columns must be a
  PK or a UNIQUE-indexed set. A SQLite `PRIMARY KEY (a, b)` creates
  the requisite unique index, so `FOREIGN KEY (x, y) REFERENCES
  P(a, b)` is valid against a compound PK with no extra index.
  STRICT tables do not change FK rules. (F-A's "full PK" therefore
  always targets a valid key; a subset would not be unique — the
  reason F-A excludes it.)
- **Explicit parent columns must be exactly the PK set.** Under
  F-A, `REFERENCES P(<cols>)` is accepted iff `<cols>` is the
  parent's PK column **set**; any ordering is accepted and maps
  positionally to the child list (SQLite matches the set to the
  unique index; the child↔parent pairing is positional). A
  non-PK, partial, or super-set list is refused with a friendly
  message naming the parent's actual PK (subset/UNIQUE targets are
  out of scope).
- **Arity + emptiness.** Child and parent lists must be equal,
  non-zero length; a mismatch reports both counts
  ("N child column(s) but M in `P`'s key"). An empty `()` list is
  a parse error. Inline single-column `col REFERENCES P(a, b)` is
  refused (one inline column can't satisfy a 2-column key) with a
  pointer to the table-level `FOREIGN KEY (…)` form (D4).
- **DSL `from P.(a)` (single in parens)** is accepted — equivalent
  to bare `from P.a` — so the parenthesized form is uniform across
  arities; the bare form stays the idiomatic single-column
  spelling.
- **`--create-fk` is per-column.** When child columns are missing,
  one is created per parent PK column, each typed to that parent
  column's `fk_target_type` (ADR-0011) — generalising today's
  single-column behaviour; the echo mirrors this (sketch step 7).
- **Metadata identity unchanged.** `PRIMARY KEY (child_table,
  child_column)` still holds with the comma-joined string (D5) as the
  key — so a child column **set** still participates in at most one
  relationship (pre-existing behaviour, now per-set). Distinct
  sets on the same child table are distinct keys.
- **Auto-name generation** (`db.rs:6850`, the `[as <name>]`-less
  default) is single-column today
  (`{parent_table}_{parent_column}_to_{child_table}_{child_column}`)
  — it must join the column lists (e.g.
  `Orders_a_b_to_Customers_x_y`). The first sketch missed this
  change site; it is now in the executor step.
- **Undo / batch unchanged.** One `add 1:n relationship` is one
  rebuild = one undo step (ADR-0013/0006), independent of arity.
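
The auto-name note above can be sketched as a small function that
joins each column list with `_`. `default_relationship_name` is a
hypothetical name; the pattern is the
`{parent_table}_{parent_column}_to_{child_table}_{child_column}`
template quoted in that note.

```rust
/// Hypothetical sketch of the default relationship name, generalised
/// from one column per side to an ordered list per side.
fn default_relationship_name(
    parent_table: &str,
    parent_columns: &[String],
    child_table: &str,
    child_columns: &[String],
) -> String {
    format!(
        "{}_{}_to_{}_{}",
        parent_table,
        parent_columns.join("_"),
        child_table,
        child_columns.join("_")
    )
}
```

With one-element lists this yields the same name as the
single-column template, so existing default names are unaffected.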

## Consequences

- T3 closes; a learner can model a real composite-key relationship
  end to end.
- No migration, and the on-disk representation gets *more*
  consistent: the relationship endpoint joins the `primary_key:
  [...]` / index `columns: [...]` list convention. The in-app
  single-column UX is untouched (one-element vecs).
- Accepted trade-off (user, 2026-06-09): a `project.yaml` written
  before this change that contains relationships will not load
  under the new format. There is no installed base to preserve, so
  this is a clean cutover, not data loss.
- The relationship model becomes list-based throughout, which is
  the natural foundation if subset/UNIQUE-targeted FKs are ever
  wanted (explicitly out of scope here).
- A modest, broad refactor (the `Vec` field change ripples through
  the 6 layers) — methodical, not deep; locked by tests at each
  layer.

## Out of scope

- Subset/non-PK FK targets (referencing a UNIQUE key that isn't
  the PK) — possible later on this list-based foundation.
- Any change to single-column behaviour, the rebuild-table
  primitive, or the undo model (one relationship = one undo step
  stands).
- A `project.yaml` version bump or F3 migrator (not needed —
  no installed base to migrate; clean cutover per D5).