docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata

§12 was written conservatively, classifying projection items
structurally and listing "subquery expressions" alongside
arithmetic / CASE as cases that stay None. The Phase-2 plan's
Open Question 1 captured the matching uncertainty about CTEs
and scalar subqueries.

A throwaway probe against the pinned bundled SQLite +
rusqlite 0.39.0 (with the `column_metadata` feature) settles
the question across 20 representative query shapes. The
engine's column_table_name / column_origin_name metadata
follows through non-recursive CTEs (SELECT *, bare-ref,
qualified-ref, and (col-list)-renamed bodies; CTE chains),
scalar subqueries (aliased and unaliased), derived tables
(out of scope per §13 OOS-1 but useful to note), all four
set ops, multi-table JOIN projections, and IN-subquery
WHERE clauses (the inner subquery does not affect the
outer projection's origin).

The structural-None classes reduce to computed projections
(function calls, arithmetic, CASE, literals, wildcards —
expected and pedagogically obvious) and recursive CTE result
columns (the one structural surprise — the recursive
temporary table has no base-column origin to point at).

Amendment 1 supersedes §12's "Resolution rule" with a simpler
engine-driven rule: trust column_table_name(i) /
column_origin_name(i) verbatim, with no grammar-side
structural classification. The speculative MatchedPath-walk
fallback is moot. The Phase-2 plan's sub-phase 2f exit gate
gains explicit positive assertions for CTE pass-through and
scalar-subquery type recovery, and a new explicit negative
assertion for the recursive-CTE limitation.

README.md index entry extended in the same style as ADR-0027's
Amendment-1 line. Closes Plan §Open-1.
claude@clouddev1
2026-05-20 11:04:48 +00:00
parent 3db292c795
commit e032f01b2d
2 changed files with 106 additions and 1 deletions
@@ -1285,6 +1285,111 @@ allowlist, quoted identifiers) remain tracked separately and are
authored if and when they are taken up; they are not implicit
follow-ups of Phase 2.
## Amendment 1 — Empirical scope of column-origin metadata (2026-05-20)
§12 was written conservatively: it constrained type recovery to
projection items "structurally a single column reference" and
listed "subquery expressions" alongside arithmetic and `CASE` as
cases that stay `None`. The implementation plan's Open Question 1
(`docs/plans/20260520-adr-0032-phase-2.md`) captured the matching
uncertainty about CTEs and scalar subqueries, leaving the test in
sub-phase 2f to "assert the actual behaviour (not the wished-for
behaviour)".

A throwaway probe against the pinned bundled SQLite (run
2026-05-20, with `rusqlite` 0.39.0 + `column_metadata`) settles
the question. Across twenty representative query shapes, the
engine's `sqlite3_column_table_name` / `sqlite3_column_origin_name`
metadata follows through:
- direct bare column refs (the baseline);
- `AS alias` projections (the alias remaps the output name but
the origin pair stays the source `(table, column)`);
- table-alias qualified refs (`u.name` → `(users, name)`);
- non-recursive CTEs, including `SELECT *` bodies, bare-ref
bodies, qualified-ref bodies, and `(col-list)`-renamed
bodies (the rename remaps the output name; origin stays the
underlying column);
- CTE chains (a CTE that selects from a prior CTE — origin
traces back to the base table);
- derived tables in `FROM (SELECT …) AS sub` (out of scope for
Phase 2 per §13 OOS-1, but useful to note: if ever admitted,
type recovery comes for free);
- scalar subqueries used as a projection primary (`SELECT
(SELECT name FROM users WHERE id = 1)` — origin is preserved
whether the subquery has an outer alias or not);
- `UNION` / `UNION ALL` / `INTERSECT` / `EXCEPT` compound
queries (result columns carry the first leg's origin);
- multi-table `JOIN` projections (per-column origin per leg);
- `IN (SELECT …)` subqueries in `WHERE` (the inner subquery
does not affect the outer projection's origin).

The metadata returns `None` for exactly two structural classes:
- **Computed projections** — function calls, arithmetic
expressions, string concatenation, `CASE` expressions,
literals, the `*` and `t.*` wildcards. Expected; pedagogically
obvious; no surprise for the learner.
- **Recursive CTE result columns** (`WITH RECURSIVE r(n) AS
(SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n
FROM r`). The recursion materialises through an internal
temporary table that has no base-column origin to point at.
This is the one structural surprise — a recursive-CTE result
column is typeless even when it is structurally a bare name
reference, because the engine cannot trace the column back
past the recursion.
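
The distinction the two lists draw between a result column's output name and its origin pair can be sketched as a small fixture. This is an illustration only: `ProbedColumn` and `probe_examples` are hypothetical names, and the first SQL string is an assumed example (the other two shapes are quoted from the lists above, with an alias added to the scalar subquery).

```rust
/// Hypothetical descriptor for one result column as the probe observed it.
/// Not a type from the codebase; it only separates the two pieces of metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProbedColumn {
    /// Name the result set exposes (after any `AS` alias or CTE column-list rename).
    pub output_name: &'static str,
    /// `(table, column)` origin reported by the engine, or `None` when there is none.
    pub origin: Option<(&'static str, &'static str)>,
}

/// Three illustrative shapes paired with the outcome the amendment describes.
/// The SQL text of the first entry is an assumption made for this sketch.
pub fn probe_examples() -> Vec<(&'static str, ProbedColumn)> {
    vec![
        (
            // Table-alias qualified ref with an output alias: the alias changes
            // the output name, the origin stays the source (table, column).
            "SELECT u.name AS display_name FROM users AS u",
            ProbedColumn { output_name: "display_name", origin: Some(("users", "name")) },
        ),
        (
            // Scalar subquery as a projection primary: origin is preserved.
            "SELECT (SELECT name FROM users WHERE id = 1) AS who",
            ProbedColumn { output_name: "who", origin: Some(("users", "name")) },
        ),
        (
            // Recursive CTE result column: no base-column origin to point at.
            "WITH RECURSIVE r(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n FROM r",
            ProbedColumn { output_name: "n", origin: None },
        ),
    ]
}
```

Keeping the output name and the origin pair as separate fields mirrors the point made in the list: aliases and `(col-list)` renames only ever touch the former.
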
### What §12's resolution rule becomes
The original §12 rule classifies projection items structurally
(unqualified ident / qualified ref → recover; everything else →
None). The empirical finding makes that classification redundant
and slightly wrong: it misses scalar subqueries and CTE-routed
refs that the engine does carry through, and it would have
needed extending for `(col-list)`-renamed CTEs.

The amended posture: **trust the engine's column-origin metadata
verbatim**. For each result column, call
`column_table_name(i)` / `column_origin_name(i)`. If both return
`Some`, look the pair up in the active `SchemaCache` and use the
playground type. If either is `None`, the slot stays `None` and
the renderer falls back to neutral alignment. No structural
classification of the projection item is needed; the grammar tier
stays uninvolved (preserving ADR-0031 §2's "no AST" decision and
ADR-0030's "one source of truth" rule, both as before).
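
A minimal sketch of the engine-driven rule, assuming the origin pairs have already been read from the statement. The shape of `SchemaCache`, the two `PlaygroundType` variants, and the signature given to `resolve_select_column_types` are all assumptions made for illustration; treating a cache miss as `None` is likewise an assumption, since the rule above only covers the case where the pair is found.

```rust
use std::collections::HashMap;

/// Stand-in for the playground type vocabulary (only two variants shown here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaygroundType {
    Text,
    Int,
}

/// Hypothetical schema cache keyed by `(table, column)`.
#[derive(Debug, Default)]
pub struct SchemaCache {
    columns: HashMap<(String, String), PlaygroundType>,
}

impl SchemaCache {
    pub fn insert(&mut self, table: &str, column: &str, ty: PlaygroundType) {
        self.columns.insert((table.to_owned(), column.to_owned()), ty);
    }

    pub fn lookup(&self, table: &str, column: &str) -> Option<PlaygroundType> {
        self.columns
            .get(&(table.to_owned(), column.to_owned()))
            .copied()
    }
}

/// One slot per result column. `origins[i]` holds what the engine reported
/// for column `i`: (table name, origin column name), each possibly `None`.
/// No inspection of the projection item itself takes place.
pub fn resolve_select_column_types(
    cache: &SchemaCache,
    origins: &[(Option<&str>, Option<&str>)],
) -> Vec<Option<PlaygroundType>> {
    origins
        .iter()
        .map(|&(table, column)| match (table, column) {
            // Both present: trust the pair and look it up.
            (Some(t), Some(c)) => cache.lookup(t, c),
            // Either missing: the slot stays None (neutral alignment downstream).
            _ => None,
        })
        .collect()
}
```

With a cache holding `(users, name) → Text`, the origins `[(Some("users"), Some("name")), (None, None)]` would resolve to `[Some(Text), None]` under this sketch.
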
The "structurally a single column reference" definition in §12's
**Resolution rule** is superseded by the engine-driven rule
above. The §12 **Implementation seam** is unchanged in approach
(engine-side column-origin lookup is still the mechanism), but
the speculative fallback paragraph ("If exposure turns out to be
awkward, the fallback is a small post-parse walk over the
projection-item subtrees in the `MatchedPath`") is moot — the
exposure works, and the engine's metadata is broader than a
grammar-side walk could be without re-implementing SQLite's
query-planner traceback. The fallback path is removed.
### Effect on the Phase-2 plan's sub-phase 2f
The 2f exit gate's "CTE pass-through" row should be asserted
positive (recovers `Some(text)`). The "Subquery result" row,
which the plan left as "assert whichever behaviour the engine
exhibits", should be asserted positive as well. A new explicit
2f test row covers the named limitation: a recursive CTE result
column must produce `column_types[0] = None` and the renderer
must fall back to neutral alignment without panicking.
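
The three 2f rows and the renderer fallback might be expressed along these lines. Everything named here (`Expectation`, `EXIT_GATE_ROWS`, `row_holds`, `Alignment`, `alignment_for`) is hypothetical, and the left/right mapping for typed columns is an assumption; only the `None` → neutral fallback comes from the text above.

```rust
/// Stand-in for the playground type vocabulary (only two variants shown here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaygroundType {
    Text,
    Int,
}

/// What a 2f row asserts about `column_types[0]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expectation {
    /// The slot must be `Some(_)`.
    Recovered,
    /// The slot must be `None`.
    Typeless,
}

/// The two existing rows, now asserted positive, plus the new negative row.
pub const EXIT_GATE_ROWS: [(&str, Expectation); 3] = [
    ("CTE pass-through", Expectation::Recovered),
    ("Subquery result", Expectation::Recovered),
    ("Recursive CTE result column", Expectation::Typeless),
];

/// True when the observed slot matches what the row asserts.
pub fn row_holds(expect: Expectation, slot: Option<PlaygroundType>) -> bool {
    match expect {
        Expectation::Recovered => slot.is_some(),
        Expectation::Typeless => slot.is_none(),
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Alignment {
    Left,
    Right,
    Neutral,
}

/// Total over `Option<PlaygroundType>`, so a typeless column cannot panic here.
/// The Left/Right choices are assumed; only `None` → `Neutral` is from the ADR.
pub fn alignment_for(slot: Option<PlaygroundType>) -> Alignment {
    match slot {
        Some(PlaygroundType::Text) => Alignment::Left,
        Some(PlaygroundType::Int) => Alignment::Right,
        None => Alignment::Neutral,
    }
}
```

Making `alignment_for` an exhaustive `match` is one way to keep the "without panicking" requirement a compile-time property rather than something only the test catches.
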
The catalog and grammar-side work in 2a–2e is unaffected by this
amendment. Only 2f's test list and the worker's
`resolve_select_column_types` helper change shape (the helper
becomes simpler — no structural classification, just a direct
metadata lookup per result column).

This amendment narrows the honest limitation in §12 from
"computed / non-direct projection items" to "computed projections
and recursive CTE result columns" — a tighter, factually
verified carve-out.
## See also
- ADR-0005 — the ten-type vocabulary §10 resolves back to.