docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata
§12 was written conservatively, classifying projection items structurally and listing "subquery expressions" alongside arithmetic / CASE as cases that stay None. The Phase-2 plan's Open Question 1 captured the matching uncertainty about CTEs and scalar subqueries. A throwaway probe against the pinned bundled SQLite + rusqlite 0.39.0 (with the `column_metadata` feature) settles the question across 20 representative query shapes. The engine's column_table_name / column_origin_name metadata follows through non-recursive CTEs (SELECT *, bare-ref, qualified-ref, and (col-list)-renamed bodies; CTE chains), scalar subqueries (aliased and unaliased), derived tables (out of scope per §13 OOS-1 but useful to note), all four set ops, multi-table JOIN projections, and IN-subquery WHERE clauses (the inner subquery does not affect the outer projection's origin). The structural-None classes reduce to computed projections (function calls, arithmetic, CASE, literals, wildcards — expected and pedagogically obvious) and recursive CTE result columns (the one structural surprise — the recursive temporary table has no base-column origin to point at). Amendment 1 supersedes §12's "Resolution rule" with a simpler engine-driven rule: trust column_table_name(i) / column_origin_name(i) verbatim, with no grammar-side structural classification. The speculative MatchedPath-walk fallback is moot. The Phase-2 plan's sub-phase 2f exit gate gains explicit positive assertions for CTE pass-through and scalar-subquery type recovery, and a new explicit negative assertion for the recursive-CTE limitation. README.md index entry extended in the same style as ADR-0027's Amendment-1 line. Closes Plan §Open-1.
This commit is contained in:
@@ -1285,6 +1285,111 @@ allowlist, quoted identifiers) remain tracked separately and are
|
||||
authored if and when they are taken up; they are not implicit
|
||||
follow-ups of Phase 2.
|
||||
|
||||
## Amendment 1 — Empirical scope of column-origin metadata (2026-05-20)
|
||||
|
||||
§12 was written conservatively: it constrained type recovery to
|
||||
projection items "structurally a single column reference" and
|
||||
listed "subquery expressions" alongside arithmetic and `CASE` as
|
||||
cases that stay `None`. The implementation plan's Open Question 1
|
||||
(`docs/plans/20260520-adr-0032-phase-2.md`) captured the matching
|
||||
uncertainty about CTEs and scalar subqueries, leaving the test in
|
||||
sub-phase 2f to "assert the actual behaviour (not the wished-for
|
||||
behaviour)".
|
||||
|
||||
A throwaway probe against the pinned bundled SQLite (run
|
||||
2026-05-20, with `rusqlite` 0.39.0 + `column_metadata`) settles
|
||||
the question. Across twenty representative query shapes, the
|
||||
engine's `sqlite3_column_table_name` / `sqlite3_column_origin_name`
|
||||
metadata follows through:
|
||||
|
||||
- direct bare column refs (the baseline);
|
||||
- `AS alias` projections (the alias remaps the output name but
|
||||
the origin pair stays the source `(table, column)`);
|
||||
- table-alias qualified refs (`u.name` → `(users, name)`);
|
||||
- non-recursive CTEs, including `SELECT *` bodies, bare-ref
|
||||
bodies, qualified-ref bodies, and `(col-list)`-renamed
|
||||
bodies (the rename remaps the output name; origin stays the
|
||||
underlying column);
|
||||
- CTE chains (a CTE that selects from a prior CTE — origin
|
||||
traces back to the base table);
|
||||
- derived tables in `FROM (SELECT …) AS sub` (out-of-scope for
|
||||
Phase 2 per §13 OOS-1, but useful to note: if ever admitted,
|
||||
type recovery comes for free);
|
||||
- scalar subqueries used as a projection primary (`SELECT
|
||||
(SELECT name FROM users WHERE id = 1)` — origin is preserved
|
||||
whether the subquery has an outer alias or not);
|
||||
- `UNION` / `UNION ALL` / `INTERSECT` / `EXCEPT` compound
|
||||
queries (result columns carry the first leg's origin);
|
||||
- multi-table `JOIN` projections (per-column origin per leg);
|
||||
- `IN (SELECT …)` subqueries in `WHERE` (the inner subquery
|
||||
does not affect the outer projection's origin).
|
||||
|
||||
The metadata returns `None` for exactly two structural classes:
|
||||
|
||||
- **Computed projections** — function calls, arithmetic
|
||||
expressions, string concatenation, `CASE` expressions,
|
||||
literals, the `*` and `t.*` wildcards. Expected; pedagogically
|
||||
obvious; no surprise for the learner.
|
||||
- **Recursive CTE result columns** (`WITH RECURSIVE r(n) AS
|
||||
(SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n
|
||||
FROM r`). The recursion materialises through an internal
|
||||
temporary table that has no base-column origin to point at.
|
||||
This is the one structural surprise — a recursive-CTE result
|
||||
column is typeless even when it is structurally a bare name
|
||||
reference, because the engine cannot trace the column back
|
||||
past the recursion.
|
||||
|
||||
### What §12's resolution rule becomes
|
||||
|
||||
The original §12 rule classifies projection items structurally
|
||||
(unqualified ident / qualified ref → recover; everything else →
|
||||
None). The empirical finding makes that classification redundant
|
||||
and slightly wrong: it misses scalar subqueries and CTE-routed
|
||||
refs that the engine does carry through, and it would have
|
||||
needed extending for `(col-list)`-renamed CTEs.
|
||||
|
||||
The amended posture: **trust the engine's column-origin metadata
|
||||
verbatim**. For each result column, call
|
||||
`column_table_name(i)` / `column_origin_name(i)`. If both return
|
||||
`Some`, look the pair up in the active `SchemaCache` and use the
|
||||
playground type. If either is `None`, the slot stays `None` and
|
||||
the renderer falls back to neutral alignment. No structural
|
||||
classification of the projection item is needed; the grammar tier
|
||||
stays uninvolved (preserving ADR-0031 §2's "no AST" decision and
|
||||
ADR-0030's "one source of truth" rule, both as before).
|
||||
|
||||
The "structurally a single column reference" definition in §12's
|
||||
**Resolution rule** is superseded by the engine-driven rule
|
||||
above. The §12 **Implementation seam** is unchanged in approach
|
||||
(engine-side column-origin lookup is still the mechanism), but
|
||||
the speculative fallback paragraph ("If exposure turns out to be
|
||||
awkward, the fallback is a small post-parse walk over the
|
||||
projection-item subtrees in the `MatchedPath`") is moot — the
|
||||
exposure works, and the engine's metadata is broader than a
|
||||
grammar-side walk could be without re-implementing SQLite's
|
||||
query-planner traceback. The fallback path is removed.
|
||||
|
||||
### Effect on the Phase-2 plan's sub-phase 2f
|
||||
|
||||
The 2f exit gate's "CTE pass-through" row should be asserted
|
||||
positive (recovers `Some(text)`). The "Subquery result" row,
|
||||
which the plan left as "assert whichever behaviour the engine
|
||||
exhibits", should be asserted positive as well. A new explicit
|
||||
2f test row covers the named limitation: a recursive CTE result
|
||||
column must produce `column_types[0] = None` and the renderer
|
||||
must fall back to neutral alignment without panicking.
|
||||
|
||||
The catalog and grammar-side work in 2a–2e is unaffected by this
|
||||
amendment. Only 2f's test list and the worker's
|
||||
`resolve_select_column_types` helper change shape (the helper
|
||||
becomes simpler — no structural classification, just a direct
|
||||
metadata lookup per result column).
|
||||
|
||||
This amendment narrows the honest limitation in §12 from
|
||||
"computed / non-direct projection items" to "computed projections
|
||||
and recursive CTE result columns" — a tighter, factually
|
||||
verified carve-out.
|
||||
|
||||
## See also
|
||||
|
||||
- ADR-0005 — the ten-type vocabulary §10 resolves back to.
|
||||
|
||||
+1
-1
@@ -37,4 +37,4 @@ This directory contains the project's ADRs, recorded per
|
||||
- [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`)
|
||||
- [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`)
|
||||
- [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
|
||||
- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot
|
||||
- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless
|
||||
|
||||
Reference in New Issue
Block a user