diff --git a/docs/adr/0032-sql-select-grammar.md b/docs/adr/0032-sql-select-grammar.md index 74fc265..fd511eb 100644 --- a/docs/adr/0032-sql-select-grammar.md +++ b/docs/adr/0032-sql-select-grammar.md @@ -1285,6 +1285,111 @@ allowlist, quoted identifiers) remain tracked separately and are authored if and when they are taken up; they are not implicit follow-ups of Phase 2. +## Amendment 1 — Empirical scope of column-origin metadata (2026-05-20) + +§12 was written conservatively: it constrained type recovery to +projection items "structurally a single column reference" and +listed "subquery expressions" alongside arithmetic and `CASE` as +cases that stay `None`. The implementation plan's Open Question 1 +(`docs/plans/20260520-adr-0032-phase-2.md`) captured the matching +uncertainty about CTEs and scalar subqueries, leaving the test in +sub-phase 2f to "assert the actual behaviour (not the wished-for +behaviour)". + +A throwaway probe against the pinned bundled SQLite (run +2026-05-20, with `rusqlite` 0.39.0 + `column_metadata`) settles +the question. Across twenty representative query shapes, the +engine's `sqlite3_column_table_name` / `sqlite3_column_origin_name` +metadata follows through: + +- direct bare column refs (the baseline); +- `AS alias` projections (the alias remaps the output name but + the origin pair stays the source `(table, column)`); +- table-alias qualified refs (`u.name` → `(users, name)`); +- non-recursive CTEs, including `SELECT *` bodies, bare-ref + bodies, qualified-ref bodies, and `(col-list)`-renamed + bodies (the rename remaps the output name; origin stays the + underlying column); +- CTE chains (a CTE that selects from a prior CTE — origin + traces back to the base table); +- derived tables in `FROM (SELECT …) AS sub` (out-of-scope for + Phase 2 per §13 OOS-1, but useful to note: if ever admitted, + type recovery comes for free); +- scalar subqueries used as a projection primary (`SELECT + (SELECT name FROM users WHERE id = 1)` — origin is preserved + whether the subquery has an outer alias or not); +- `UNION` / `UNION ALL` / `INTERSECT` / `EXCEPT` compound + queries (result columns carry the first leg's origin); +- multi-table `JOIN` projections (per-column origin per leg); +- `IN (SELECT …)` subqueries in `WHERE` (the inner subquery + does not affect the outer projection's origin). + +The metadata returns `None` for exactly two structural classes: + +- **Computed projections** — function calls, arithmetic + expressions, string concatenation, `CASE` expressions, + literals, the `*` and `t.*` wildcards. Expected; pedagogically + obvious; no surprise for the learner. +- **Recursive CTE result columns** (`WITH RECURSIVE r(n) AS + (SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n + FROM r`). The recursion materialises through an internal + temporary table that has no base-column origin to point at. + This is the one structural surprise — a recursive-CTE result + column is typeless even when it is structurally a bare name + reference, because the engine cannot trace the column back + past the recursion. + +### What §12's resolution rule becomes + +The original §12 rule classifies projection items structurally +(unqualified ident / qualified ref → recover; everything else → +None). The empirical finding makes that classification redundant +and slightly wrong: it misses scalar subqueries and CTE-routed +refs that the engine does carry through, and it would have +needed extending for `(col-list)`-renamed CTEs. + +The amended posture: **trust the engine's column-origin metadata +verbatim**. For each result column, call +`column_table_name(i)` / `column_origin_name(i)`. If both return +`Some`, look the pair up in the active `SchemaCache` and use the +playground type. If either is `None`, the slot stays `None` and +the renderer falls back to neutral alignment. No structural +classification of the projection item is needed; the grammar tier +stays uninvolved (preserving ADR-0031 §2's "no AST" decision and +ADR-0030's "one source of truth" rule, both as before). + +The "structurally a single column reference" definition in §12's +**Resolution rule** is superseded by the engine-driven rule +above. The §12 **Implementation seam** is unchanged in approach +(engine-side column-origin lookup is still the mechanism), but +the speculative fallback paragraph ("If exposure turns out to be +awkward, the fallback is a small post-parse walk over the +projection-item subtrees in the `MatchedPath`") is moot — the +exposure works, and the engine's metadata is broader than a +grammar-side walk could be without re-implementing SQLite's +query-planner traceback. The fallback path is removed. + +### Effect on the Phase-2 plan's sub-phase 2f + +The 2f exit gate's "CTE pass-through" row should be asserted +positive (recovers `Some(text)`). The "Subquery result" row, +which the plan left as "assert whichever behaviour the engine +exhibits", should be asserted positive as well. A new explicit +2f test row covers the named limitation: a recursive CTE result +column must produce `column_types[0] = None` and the renderer +must fall back to neutral alignment without panicking. + +The catalog and grammar-side work in 2a–2e is unaffected by this +amendment. Only 2f's test list and the worker's +`resolve_select_column_types` helper change shape (the helper +becomes simpler — no structural classification, just a direct +metadata lookup per result column). + +This amendment narrows the honest limitation in §12 from +"computed / non-direct projection items" to "computed projections +and recursive CTE result columns" — a tighter, factually +verified carve-out. + ## See also - ADR-0005 — the ten-type vocabulary §10 resolves back to. diff --git a/docs/adr/README.md b/docs/adr/README.md index c18a213..d4fb2d2 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -37,4 +37,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`) - [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`) - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2 -- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot +- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless