rdbms-playground/docs/adr/0043-compound-pk-foreign-key-references.md
claude@clouddev1 b14f0199e9 refactor: relationship model to column lists for compound FK (ADR-0043)
Move the FK column fields String->Vec<String> through all six
layers (AddRelationship/SqlForeignKey AST, RelationshipSchema,
metadata, project.yaml, ReadForeignKey, RelationshipEnd). Metadata
stores comma-joined lists in the existing TEXT cells; project.yaml
endpoints now columns: [a, b] (house style). Executor logic is
multi-column ready: resolve_fk_parent_columns (full-PK F-A +
auto-expand F-D), per-pair type-compat, schema_to_ddl multi-column
emission, pragma FK read grouped by id, auto-name + --create-fk
per-column, multi-column teaching echo. Single-column behaviour
preserved (one-element vecs); all 2181 tests green. The grammar to
parse multi-column input lands next.
2026-06-09 18:25:40 +00:00

# ADR-0043: Compound-primary-key foreign-key references (T3)
## Status
**Accepted** — 2026-06-09. All four genuine forks confirmed by the
user at the recommended option: **F-A** full PK in order, **F-B**
house-style uniform column lists (no migration; back-compat not
required), **F-C** parenthesized DSL lists, **F-D** bare table-level
SQL FK auto-expands to the parent's full PK. Closes the one open
leg of
`requirements.md` **T3** ("compound primary keys handled
end-to-end (DSL, storage, display, **FK reference**)"): a foreign
key that *references* a compound (multi-column) primary key.
Cross-references **ADR-0011** (FK column type compatibility —
`Type::fk_target_type`), **ADR-0013** (relationships, naming, the
rebuild-table strategy, and the `__rdbms_playground_relationships`
metadata table), **ADR-0035 §4b** (the SQL `FOREIGN KEY` surface),
**ADR-0004 / ADR-0015** (`project.yaml` as the authoritative
format; `playground.db` is a derived artifact), and **ADR-0009**
(DSL surface conventions).
## Context
Compound PRIMARY KEYs are declared, stored, and displayed today
(`create table T with pk a(int), b(int)` → `primary_key:
Vec<String>`). The missing leg is the *reference*: a child table
whose foreign key points at a parent's compound PK. A 2026-06-09
codebase audit found that single-column FK is a pervasive assumption,
at roughly 15 to 20 sites across 6+ files:
- **Metadata** — `__rdbms_playground_relationships` stores scalar
`parent_column TEXT` / `child_column TEXT`
(`PRIMARY KEY (child_table, child_column)`).
- **Persistence** — `RelationshipSchema { parent_column: String,
child_column: String }`; `project.yaml` `RawEndpoint { table,
column }`.
- **Grammar** — `add 1:n relationship … from <P>.<col> to
<C>.<col>` (one ident per side); SQL `FOREIGN KEY (<col>)
REFERENCES <P>(<col>)` (parens that hold exactly one ident).
- **AST** — `Command::AddRelationship { parent_column: String,
child_column: String }`; `SqlForeignKey { child_column: String,
parent_column: Option<String> }`.
- **Executor** — `schema_to_ddl` emits a single-column
`FOREIGN KEY (c) REFERENCES P(p)`; `check_fk_type_compat`
compares one parent type to one child type; bare
`REFERENCES <P>` on a compound-PK parent is refused as
ambiguous (`resolve_create_table_fks`,
`do_alter_add_foreign_key`).
- **Display** — `RelationshipEnd { other_column: String,
local_column: String }`.
This is not a sweep-sized change, which is why it earns an ADR
rather than an inline build. The decisions below also turn the
audit's worst-case framing (a metadata-schema + yaml-format
migration via the F3 framework) into a **no-migration** change.
### Why no migration is needed
**Decision input (user, 2026-06-09): back-compatibility with
existing saved projects is not required.** The project is
pre-release; there is no installed base of `project.yaml` /
`playground.db` files to preserve. This removes the only force
that would have demanded an F3 migrator or a version bump, and —
more importantly — it lets the representation be chosen for
*cleanliness and consistency* rather than for byte-identical
back-compat. The consequence is explicit and accepted: a
`project.yaml` written before this change that contains
relationships will not load under the new format.
Freed of back-compat, the storage follows the convention the file
**already uses** for ordered column lists rather than inventing a
new one:
- `project.yaml` already writes `primary_key: [id]` (a compound PK
is `primary_key: [a, b]`) and index `columns: [a, b]`
(`RawIndex { columns: Vec<String> }`). The relationship endpoint
is the lone multi-column-capable slot still using a scalar
`column:`. It joins the house style (D5).
- The metadata columns are `TEXT`; SQLite has no array type, so a
list lives in a text cell as an encoded string regardless (first
specified as JSON, comma-joined as built; see D5). That encoding
is *uniform* (the single-column case is a one-element list), not a
"bare-name-or-JSON, sniff which" fallback; the fallback only
existed to keep old rows identical, which is no longer a goal.
So this is not a clever back-compat dodge; it is "use the existing
list convention, uniformly." No version bump, no F3 migrator.
## Decision
Support a foreign key that references a parent's **full** compound
primary key, matched **positionally** to an equal-length child
column list, with per-pair type compatibility, across both the
DSL and SQL surfaces, using uniform column-list storage (D5) that
needs no migration.
### D1 — Matching policy: the full PK, in order
A compound-PK FK references **all** columns of the parent's
primary key, in PK declaration order, matched 1:1 to the child's
column list (same length). Referencing a *subset* of a compound PK
is **out of scope**: SQL/SQLite require FK parent columns to form a
PK or UNIQUE key, and a strict subset of a compound PK is not
itself unique unless separately constrained. Teaching-clean rule:
*a foreign key to a compound key names every column of that key.*
A length mismatch (child supplies N columns, parent PK has M ≠ N)
is a friendly error naming both counts.
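The length rule can be sketched as a small guard. This is a minimal illustration, not the project's code: `check_fk_arity` and its signature are hypothetical, the wording of the empty-list error is invented, and only the "N child column(s) but M in `P`'s key" message shape is taken from this ADR.

```rust
/// Hypothetical helper (not from the codebase): enforces the D1 rule
/// that the child column list has the same length as the parent's
/// full primary key.
fn check_fk_arity(
    parent_table: &str,
    parent_pk: &[String],
    child_columns: &[String],
) -> Result<(), String> {
    // An empty list can never reference a key. The message text here
    // is an assumption made for this sketch.
    if child_columns.is_empty() {
        return Err("a foreign key needs at least one column".to_string());
    }
    // A mismatch names both counts so the learner sees which side is off.
    if child_columns.len() != parent_pk.len() {
        return Err(format!(
            "{} child column(s) but {} in `{}`'s key",
            child_columns.len(),
            parent_pk.len(),
            parent_table
        ));
    }
    Ok(())
}
```

Under these assumptions, a one-column child list against a two-column key on `P` yields ``1 child column(s) but 2 in `P`'s key``.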
### D2 — Type compatibility: per pair, positional
Each child column's type must satisfy
`parent_pk_col.fk_target_type() == child_col` for the
corresponding pair (the existing ADR-0011 rule, applied
element-wise in order). `check_fk_type_compat` generalises to walk
the pairs and report the **first** offending pair with the same
wording it uses today.
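The element-wise walk might look like the sketch below. The `Type` enum and the body of `fk_target_type` are stand-ins invented for illustration (the real mapping is defined by ADR-0011), and `first_incompatible_pair` is a hypothetical name. The sketch assumes the arity check of D1 has already passed.

```rust
/// Stand-in column type for illustration only; the real `Type` lives
/// in the codebase and may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type {
    Int,
    Text,
    Serial,
}

impl Type {
    /// Assumed mapping for this sketch: an auto-numbered key is
    /// referenced by a plain integer; every other type maps to itself.
    fn fk_target_type(self) -> Type {
        match self {
            Type::Serial => Type::Int,
            other => other,
        }
    }
}

/// Hypothetical helper: returns the index of the first pair whose
/// child type differs from the parent's FK target type, or `None`
/// when every pair is compatible. Both slices are assumed to have
/// equal length (`zip` would silently stop at the shorter one).
fn first_incompatible_pair(parent_pk_types: &[Type], child_types: &[Type]) -> Option<usize> {
    parent_pk_types
        .iter()
        .zip(child_types)
        .position(|(parent, child)| parent.fk_target_type() != *child)
}
```

Returning the index rather than a boolean lets the caller name the offending parent and child columns in the error message.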
### D3 — DSL syntax: parenthesized column lists
`add 1:n relationship [as <name>]
from <P>.(<a>, <b>) to <C>.(<x>, <y>)
[on delete …] [on update …] [--create-fk]`
The single-column form `from <P>.<col> to <C>.<col>` is unchanged
(no parens) — back-compatible and the common case. The
parenthesized list is the multi-column form. Both sides must use
the same arity (enforced as a D1 length check). Parentheses mirror
the existing compound-PK *declaration* syntax (`with pk a(int),
b(int)` uses parens around the per-column type; the FK list uses
parens around the column names) and the SQL `FOREIGN KEY (…)`
shape, so the surface stays internally consistent.
### D4 — SQL syntax: extend the existing lists
`FOREIGN KEY (<x>, <y>) REFERENCES <P> (<a>, <b>)` — the grammar's
child and parent column slots become comma-separated **lists**
(today capped at one). Inline `<col> <type> REFERENCES <P>(<a>,
<b>)` stays single-child-column (one inline column can't match a
2-column key) — a compound FK uses the table-level form. Bare
table-level `FOREIGN KEY (x, y) REFERENCES <P>` (no parent
columns) **auto-expands to the parent's full PK** when the arities
match; bare inline `<col> REFERENCES <P>` on a compound-PK parent
keeps today's friendly refusal, with the message pointing at the
table-level multi-column form.
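A minimal sketch of the auto-expansion rule for the table-level form, assuming the parent column slot is modelled as `Option<Vec<String>>` (see D6). The function name and return shape are hypothetical; validation of an explicit list against the parent's PK is deliberately left out.

```rust
/// Hypothetical resolver: decides which parent columns a table-level
/// `FOREIGN KEY (...) REFERENCES <P> [(...)]` refers to.
///
/// - `Some(cols)`: the user wrote the parent columns; pass them on
///   (checking them against the PK is out of scope for this sketch).
/// - `None`: bare `REFERENCES <P>`; expand to the parent's full PK in
///   declaration order, but only when the arities match.
fn resolve_parent_columns(
    explicit: Option<Vec<String>>,
    parent_pk: &[String],
    child_columns: &[String],
) -> Option<Vec<String>> {
    match explicit {
        Some(cols) => Some(cols),
        None if parent_pk.len() == child_columns.len() => Some(parent_pk.to_vec()),
        // Arity mismatch: the caller turns this into the friendly error.
        None => None,
    }
}
```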
### D5 — Storage: uniform column lists, matching the house style
Both stores hold an **ordered column list**, uniformly (a
one-element list for the single-column case), following the
convention `project.yaml` already uses for `primary_key` and index
`columns`.
- **`project.yaml`**: `RawEndpoint` becomes `{ table, columns:
Vec<String> }` and writes `columns: [a, b]` (single-column →
`columns: [id]`), exactly parallel to `primary_key: [id]`. No
scalar `column:` form, no dual-shape reader.
- **Metadata** (`__rdbms_playground_relationships`): no
`CREATE TABLE` change (the `TEXT` columns and
`PRIMARY KEY (child_table, child_column)` are untouched).
`parent_column` / `child_column` store the list **comma-joined**
in the same text cell (`a,b`; a single column is just its bare
name). *As-built note:* the ADR first said "JSON array"; the
implementation uses a comma delimiter, which is safe because
column identifiers are `[A-Za-z0-9_]+` (no commas — `parser.rs`)
and simpler (no `serde_json` dependency). This is an internal
encoding detail below fork F-B — the user-visible `project.yaml`
is still the `columns: [a, b]` list.
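The comma-joined metadata encoding can be sketched in a few lines. The function names are hypothetical; the sketch relies on the stated guarantee that column identifiers contain no commas and assumes the stored cell is never empty.

```rust
/// Hypothetical encoder: joins an ordered column list into the single
/// TEXT cell. A one-element list encodes as the bare column name.
fn encode_columns(columns: &[String]) -> String {
    columns.join(",")
}

/// Hypothetical decoder: splits the TEXT cell back into the ordered
/// list. Assumes a non-empty cell; an empty string would decode to a
/// list holding one empty name.
fn decode_columns(cell: &str) -> Vec<String> {
    cell.split(',').map(str::to_string).collect()
}
```

With this encoding a single-column row reads the same as a bare column name, and a compound row such as `a,b` round-trips to `["a", "b"]` in order.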
The actual enforced FK lives on the rebuilt child table's DDL
(`FOREIGN KEY (a, b) REFERENCES P(x, y)`), emitted by
`schema_to_ddl`, exactly as the single-column FK is today via the
rebuild-table primitive (ADR-0013) — one relationship, one undo
step.
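The multi-column clause carried by the rebuilt child table could be produced along these lines. This is a sketch under assumptions: `fk_clause` is a hypothetical name, not part of `schema_to_ddl`, and identifier quoting and the `ON DELETE` / `ON UPDATE` actions are omitted.

```rust
/// Hypothetical formatter for the table-level FK clause. The child
/// and parent lists are assumed to be the same length and already
/// validated; pairing is positional.
fn fk_clause(child_columns: &[String], parent_table: &str, parent_columns: &[String]) -> String {
    format!(
        "FOREIGN KEY ({}) REFERENCES {}({})",
        child_columns.join(", "),
        parent_table,
        parent_columns.join(", ")
    )
}
```

A one-element list on each side produces the same single-column clause shape as before.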
### D6 — In-memory model: `Vec<String>` column lists
`Command::AddRelationship`, `SqlForeignKey`, `RelationshipSchema`,
the internal `ReadForeignKey`, and `RelationshipEnd` (display) all
carry `parent_columns: Vec<String>` / `child_columns: Vec<String>`
(or `Option<Vec<String>>` for the bare-SQL parent case). A
one-element vec is the single-column case; nothing about the
single-column UX changes.
## Genuine forks (escalated for sign-off)
These are decisions, not facts. Recommendations are marked; the
user confirmed all four at the recommended option (see Status).
- **F-A — matching policy.** Full PK only (D1, *recommended*) vs.
allow a subset (needs a separate UNIQUE key; larger, less
teaching-clean).
- **F-B — storage encoding.** Uniform column lists in the existing
house style — `columns: [a, b]` in yaml (like `primary_key`),
a comma-joined list (as built; first specified as a JSON array) in
the unchanged metadata `TEXT` columns; no
back-compat, no migration (D5, *recommended*) vs. a normalized
relationship-columns child table (more "correct" but a schema
change with joins on read, no learner-visible payoff). Premise:
no existing projects to preserve (confirmed).
- **F-C — DSL multi-column syntax.** `from P.(a, b) to C.(x, y)`
parenthesized (D3, *recommended*) vs. a repeated-dotted form
(`from P.a, P.b to C.x, C.y`, more ambiguous to parse and read).
- **F-D — bare table-level SQL FK auto-expansion.** Auto-expand
`FOREIGN KEY (x,y) REFERENCES P` to P's full PK when arities
match (D4, *recommended*) vs. always require explicit parent
columns.
## Implementation sketch (change sites)
Grouped; each lands behind tests. No migration step.
1. **AST** — `AddRelationship` + `SqlForeignKey` column fields →
`Vec<String>` / `Option<Vec<String>>` (`command.rs`).
2. **Grammar** — DSL endpoint column slot → optional
parenthesized list (`ddl.rs`); SQL child/parent column slots →
comma lists (`sql_create_table.rs`). Builders collect lists.
3. **Metadata** — `insert_relationship_metadata` /
`read_all_relationships` encode/decode the comma-joined column
list (D5) (`db.rs`); no `CREATE TABLE` change.
4. **Persistence** — `RelationshipSchema` → `Vec<String>`;
`RawEndpoint` becomes `{ table, columns: Vec<String> }`, written
`columns: [a, b]` like `primary_key`
(`persistence/mod.rs`, `persistence/yaml.rs`).
5. **Executor** — `do_add_relationship` /
`resolve_create_table_fks` / `do_alter_add_foreign_key` walk
column lists; `schema_to_ddl` emits multi-column `FOREIGN KEY
(…) REFERENCES P(…)`; `check_fk_type_compat` loops pairs;
bare-reference paths auto-expand to the full PK (D4) or refuse
with the improved message; the default relationship-name
generator (`db.rs:6850`) joins the column lists; `--create-fk`
creates one child column per parent PK column (`db.rs`).
6. **Display** — `RelationshipEnd` → column lists; `describe`
renders `(a, b) → (x, y)` symmetrically (outbound + inbound,
ADR-0013) (`db.rs`, `output_render.rs`).
7. **Teaching echo (ADR-0038)** — `render_add_relationship` and
`render_add_relationship_create_fk` (`echo.rs`) go multi-column:
the FK line emits `FOREIGN KEY (a, b) REFERENCES P (x, y)`, and
`--create-fk` emits **one `ADD COLUMN` line per newly-created
child column** (each typed to the matching parent PK column's
`fk_target_type`) before the FK line. Copy-paste contract
(ADR-0038) holds: every echoed line is runnable advanced SQL.
8. **Tests** — parse (DSL + SQL: single-col still works; multi
parses; arity mismatch errors; empty `()` rejected; inline
`col REFERENCES P(a,b)` rejected with the table-level pointer);
worker round-trip (declare a 2-col FK, rebuild, the FK is
**enforced** — an insert violating it is refused; per-pair
type-mismatch refused; bare-FK **auto-expand** to the parent PK;
`--create-fk` creates both child columns); persistence
round-trip (a single-col relationship writes `columns: [id]` and
reads back; a 2-col writes `columns: [a, b]` and reads back;
full save→rebuild reconstructs the FK); **undo** (add a 2-col
relationship, undo, it is gone — one step); display
(`describe` shows `(a, b) → (x, y)` both directions).
## Implementation-readiness notes (DA pass, 2026-06-09)
Verified against the code before build; folded in so the plan is
complete.
- **SQLite precondition holds.** An FK's parent columns must be a
PK or a UNIQUE-indexed set. A SQLite `PRIMARY KEY (a, b)` creates
the requisite unique index, so `FOREIGN KEY (x, y) REFERENCES
P(a, b)` is valid against a compound PK with no extra index.
STRICT tables do not change FK rules. (F-A's "full PK" therefore
always targets a valid key; a subset would not be unique — the
reason F-A excludes it.)
- **Explicit parent columns must be exactly the PK set.** Under
F-A, `REFERENCES P(<cols>)` is accepted iff `<cols>` is the
parent's PK column **set**; any ordering is accepted and maps
positionally to the child list (SQLite matches the set to the
unique index; the child↔parent pairing is positional). A
non-PK, partial, or super-set list is refused with a friendly
message naming the parent's actual PK (subset/UNIQUE targets are
OOS).
- **Arity + emptiness.** Child and parent lists must be equal,
non-zero length; a mismatch reports both counts
("N child column(s) but M in `P`'s key"). An empty `()` list is
a parse error. Inline single-column `col REFERENCES P(a, b)` is
refused (one inline column can't satisfy a 2-column key) with a
pointer to the table-level `FOREIGN KEY (…)` form (D4).
- **DSL `from P.(a)` (single in parens)** is accepted — equivalent
to bare `from P.a` — so the parenthesized form is uniform across
arities; the bare form stays the idiomatic single-column
spelling.
- **`--create-fk` is per-column.** When child columns are missing,
one is created per parent PK column, each typed to that parent
column's `fk_target_type` (ADR-0011) — generalising today's
single-column behaviour; the echo mirrors this (sketch step 7).
- **Metadata identity unchanged.** `PRIMARY KEY (child_table,
child_column)` still holds with the comma-joined list string as the
key, so a child column **set** still participates in at most one
relationship (pre-existing behaviour, now per-set). Distinct
sets on the same child table are distinct keys.
- **Auto-name generation** (`db.rs:6850`, the `[as <name>]`-less
default) is single-column today
(`{parent_table}_{parent_column}_to_{child_table}_{child_column}`)
— it must join the column lists (e.g.
`Orders_a_b_to_Customers_x_y`). The first sketch missed this
change site; it is now in the executor step.
- **Undo / batch unchanged.** One `add 1:n relationship` is one
rebuild = one undo step (ADR-0013/0006), independent of arity.
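The auto-name generalisation from the notes above can be sketched as follows. The function name is hypothetical; the `_` separator inside each column list is inferred from the `Orders_a_b_to_Customers_x_y` example rather than from the code.

```rust
/// Hypothetical default-name generator for a relationship declared
/// without `as <name>`: the single-column pattern
/// `{parent_table}_{parent_column}_to_{child_table}_{child_column}`
/// with each column slot replaced by the `_`-joined column list.
fn default_relationship_name(
    parent_table: &str,
    parent_columns: &[String],
    child_table: &str,
    child_columns: &[String],
) -> String {
    format!(
        "{}_{}_to_{}_{}",
        parent_table,
        parent_columns.join("_"),
        child_table,
        child_columns.join("_")
    )
}
```

A one-element list on each side reproduces the single-column name unchanged, which keeps existing single-column behaviour intact.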
## Consequences
- T3 closes; a learner can model a real composite-key relationship
end to end.
- No migration, and the on-disk representation gets *more*
consistent: the relationship endpoint joins the `primary_key:
[...]` / index `columns: [...]` list convention. The in-app
single-column UX is untouched (one-element vecs).
- Accepted trade-off (user, 2026-06-09): a `project.yaml` written
before this change that contains relationships will not load
under the new format. There is no installed base to preserve, so
this is a clean cutover, not data loss.
- The relationship model becomes list-based throughout, which is
the natural foundation if subset/UNIQUE-targeted FKs are ever
wanted (explicitly OOS here).
- A modest, broad refactor (the `Vec` field change ripples through
the 6 layers) — methodical, not deep; locked by tests at each
layer.
## Out of scope
- Subset/non-PK FK targets (referencing a UNIQUE key that isn't
the PK) — possible later on this list-based foundation.
- Any change to single-column behaviour, the rebuild-table
primitive, or the undo model (one relationship = one undo step
stands).
- A `project.yaml` version bump or F3 migrator (not needed —
no installed base to migrate; clean cutover per D5).