From a7db7dd2da21fbec5632aed7f28232b665bc186f Mon Sep 17 00:00:00 2001 From: "claude@clouddev1" Date: Wed, 20 May 2026 10:25:43 +0000 Subject: [PATCH] =?UTF-8?q?docs:=20ADR-0032=20+=20Phase=202=20plan=20?= =?UTF-8?q?=E2=80=94=20full=20SQL=20SELECT=20grammar?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-0030 §3 commissioned a focused ADR for the full SELECT grammar (the "SELECT — full" phase). ADR-0032 records the decisions; docs/plans/20260520-adr-0032-phase-2.md is the implementation plan walking the work. Phase 2's grammar surface: - Five JOIN flavours (INNER, LEFT, RIGHT, FULL OUTER, CROSS). NATURAL/USING/comma-FROM explicitly OOS. - All four set ops (UNION, UNION ALL, INTERSECT, EXCEPT). - WITH and WITH RECURSIVE CTEs, with optional (col-list) renaming. - Scalar subqueries, IN (SELECT …), [NOT] EXISTS as additive primary branches in sql_expr (redeems ADR-0031 §7 OOS-1). - Qualified column refs t.c / alias.c as a name_or_call tail (redeems ADR-0031 §7 OOS-2). - LIMIT n [OFFSET m]; legacy `LIMIT m, n` OOS. - DISTINCT/ALL, t.* projection, bare-alias projection (lifts Phase-1 §4.2's autonomous decision). Walker-capability honesty (§10): ADR-0030 §8's "ambient assistance comes for free" holds for grammar recursion (reuses ADR-0026's Subgrammar + depth cap unchanged) but not for completion scope. Phase 2 adds a new Node::ScopedSubgrammar variant alongside the existing Node::Subgrammar (DSL Expr and sql_expr recursion untouched), a from_scope_stack of ScopeFrames holding from_scope / cte_bindings / projection_aliases, qualified-prefix completion narrowing, and a post-walk fixup pass that re-resolves projection-list identifier highlighting/validity once FROM is parsed (the projection-before-FROM problem). CTE column resolution (§10.3): SELECT * and explicit-projection CTE bodies both yield real column completion past cte_alias.| via a body-projection derivation rule that runs at the body's ScopedSubgrammar exit and writes derived columns back into the binding. Diagnostics (§11): every Phase-2 validation case classified against ADR-0027's ERROR/WARNING guideline. Five new diagnostic.* catalog keys for parse-time-detectable cases (unknown_qualifier, ambiguous_column, projection_alias_misplaced, cte_arity_mismatch, compound_arity_mismatch) plus eight engine.* translation keys. A MatchedPath-walking predicate-warnings variant closes the Phase-1 carry-over gap where SQL WHERE expressions emitted no LIKE-on-numeric / = NULL / type-mismatch warnings — ADR-0027 Amendment 1 finally extends to the SQL surface. Result-column type resolution (§12): rusqlite 0.39.0 exposes column_table_name / column_origin_name / column_database_name behind a `column_metadata` feature; verified. Bare column refs recover their playground type — partially lifts Phase-1 §4.5's bool→0/1 deferral. The implementation plan breaks Phase 2 into seven sub-phases (2a–2g) with explicit exit gates per sub-phase and a cross-cut verification matrix that names every "X comes for free" claim from ADR-0030/0031/0032. The Phase-1 SQL-expression predicate-warning gap is a named row, preventing an analogous silent gap from shipping. The plan encodes the user's standing authorization for the implementer to walk uninterrupted between gates and commit with standard messages — escalation discipline preserved for design ambiguities and real blockers. Pushes remain user-only. New docs/plans/ directory sets a pattern for future phase plans. Status: Accepted. --- docs/adr/0032-sql-select-grammar.md | 1322 +++++++++++++++++++++++ docs/adr/README.md | 1 + docs/plans/20260520-adr-0032-phase-2.md | 881 +++++++++++++++ 3 files changed, 2204 insertions(+) create mode 100644 docs/adr/0032-sql-select-grammar.md create mode 100644 docs/plans/20260520-adr-0032-phase-2.md diff --git a/docs/adr/0032-sql-select-grammar.md b/docs/adr/0032-sql-select-grammar.md new file mode 100644 index 0000000..74fc265 --- /dev/null +++ b/docs/adr/0032-sql-select-grammar.md @@ -0,0 +1,1322 @@ +# ADR-0032: The full SQL `SELECT` grammar + +## Status + +Accepted + +## Context + +ADR-0030 commissions advanced mode as a body of **SQL grammar +inside the unified grammar tree** (ADR-0023/0024), phased. Phase 1 +("Foundations + first `SELECT`") shipped: a single-table `SELECT` +with projection, `WHERE`, `ORDER BY`, and `LIMIT`, executed as +validated SQL text through the existing data-table renderer. +ADR-0031 authored the SQL **expression grammar** the Phase-1 +`SELECT` consumed. + +Phase 2 — "`SELECT` — full" — is the next slice. ADR-0030 §3 lists +it: `JOIN`s, `GROUP BY` / `HAVING`, aggregates, subquery +expressions, `UNION`/`INTERSECT`/`EXCEPT`, common table +expressions, `LIMIT … OFFSET`, qualified column references. +ADR-0030 §3 also says the full `SELECT` grammar "is each large +enough to warrant their own focused ADR when implemented — the +precedent is ADR-0026 for the `WHERE` grammar." This is that ADR. + +The architecture is fixed (ADR-0030 §1, §4, §6, §8): one walker, +grammar-as-text execution, ambient assistance for free. This ADR +fixes the **shape** of the grammar — the productions, the +recursion, the additive extensions to ADR-0031's expression +fragment, and the few execution-path implications (worker-side +column-origin lookup so result columns recover their playground +type). It deliberately does **not** revisit ADR-0030's structural +decisions; references in this ADR's text to ADR-0030 §X mean +"that decision is the controlling one." + +### What ADR-0030 and ADR-0031 already fix + +- **No batch parser; SQL is grammar in the unified tree.** + Subquery recursion is a `Node::Subgrammar(&NAMED)` reference, + exactly as the expression ladder uses it (ADR-0031 §3). +- **No AST builder for the parts that execute as text.** + `Command::Select { sql: String }` carries the validated source; + the worker prepares and runs it (ADR-0030 §4/§6, ADR-0031 §2). +- **The `__rdbms_*` rejection** at every table-name slot + (ADR-0030 §6) — re-applied to every Phase-2 table-source slot + (`FROM`, `JOIN`, CTE-name). +- **No allowlist for function names** (ADR-0030 §13 OOS-3, + ADR-0031 §6). Aggregates (`count`, `sum`, `avg`, `min`, `max`) + parse through the generic `name_or_call` path — the grammar is + structurally aggregate-blind, by design. +- **No quoted identifiers** (ADR-0031 §7 OOS-3) — unchanged. +- **`MAX_SUBGRAMMAR_DEPTH = 64`** (ADR-0026) is the shared + recursion budget across DSL `Expr`, SQL expression, and (added + here) SQL `SELECT` recursion. No new walker capability is + introduced (§9). + +### The boundary with ADR-0031 + +ADR-0031 §7 named two additive extensions deferred to this ADR: + +- **OOS-1: subquery expressions** — `( SELECT … )` as a `primary`, + `IN ( SELECT … )`, `EXISTS ( SELECT … )`. Their grammar is fixed + in §6; they are additive `Choice` branches in `sql_expr.rs`, + recursing into the named `SELECT` fragment authored here. +- **OOS-2: qualified column references** — `t.c` / `alias.c`. + Their grammar is fixed in §5; they are an additive tail on + `name_or_call` in `sql_expr.rs`. + +`sql_expr.rs` was shaped to receive both branches without +restructuring (ADR-0031 §7 promise). This ADR redeems that +promise; the changes there are strictly additive. + +## Decision + +### 1. The top-level `SELECT` grammar + +The full statement decomposes into a top-level *compound query* +(set-operator chains around per-leg *core selects*), wrapped by +an optional `WITH` prefix and trailing `ORDER BY` / `LIMIT`: + +``` +select_statement := [ with_clause ] compound_select +compound_select := select_core ( set_op select_core )* + [ order_by_clause ] + [ limit_clause ] +set_op := UNION [ ALL ] | INTERSECT | EXCEPT +select_core := SELECT [ DISTINCT | ALL ] + projection_list + [ from_clause ] + [ where_clause ] + [ group_by_clause ] + [ having_clause ] +with_clause := WITH [ RECURSIVE ] cte_def + ( ',' cte_def )* +cte_def := identifier [ '(' column_name_list ')' ] + AS '(' compound_select ')' +projection_list := projection_item ( ',' projection_item )* +projection_item := '*' + | identifier '.' '*' + | sql_expr [ [ AS ] identifier ] +from_clause := FROM table_source ( join_clause )* +table_source := identifier [ [ AS ] identifier ] +join_clause := [ INNER ] JOIN table_source ON sql_expr + | LEFT [ OUTER ] JOIN table_source ON sql_expr + | RIGHT [ OUTER ] JOIN table_source ON sql_expr + | FULL [ OUTER ] JOIN table_source ON sql_expr + | CROSS JOIN table_source +where_clause := WHERE sql_expr +group_by_clause := GROUP BY sql_expr ( ',' sql_expr )* +having_clause := HAVING sql_expr +order_by_clause := ORDER BY order_item ( ',' order_item )* +order_item := sql_expr [ ASC | DESC ] +limit_clause := LIMIT sql_expr [ OFFSET sql_expr ] +``` + +`sql_expr` is ADR-0031's `SQL_OR_EXPR`, extended additively per +§5 and §6. `column_name_list` is `identifier (, identifier)*`. + +The named `static Node` exported by the new +`src/dsl/grammar/sql_select.rs` is `SQL_SELECT_STATEMENT` +(matching the full statement) and `SQL_SELECT_COMPOUND` (the +embedded form, omitting the outer `WITH`; this is what subqueries +recurse into — see §6, §9). + +Notes on specific productions: + +- **`FROM` stays optional.** Phase 1's autonomous decision §4.1 + is upheld: `SELECT 1` and `SELECT upper('x')` continue to + parse. With JOINs landing, the absence of a `FROM` simply + means no `from_clause`/`join_clause` was matched; no extra + shape is needed. +- **Bare-alias projection (`select a x`) is admitted.** Phase 1's + autonomous decision §4.2 deliberately rejected it as + structurally ambiguous. With Phase 2's grammar — `FROM` is + the only word that can legitimately follow a projection list, + and it is a keyword in the walker's expected-set — the + ambiguity dissolves: an identifier following the last + projection expression that is not `FROM`, `,`, `WHERE`, + `GROUP`, `ORDER`, `LIMIT`, or a set-op keyword is a bare + alias, and is so admitted. This lifts a small but visible + Phase-1 limitation. +- **`SELECT [ DISTINCT | ALL ]`.** `ALL` is the default and is + admitted for symmetry; `DISTINCT` is the meaningful case. They + are mutually exclusive at this position (a `Choice`, not two + `Optional`s). +- **`identifier '.' '*'`** lives only in `projection_item`, never + in `sql_expr`. This is intentional: `t.*` is *projection + syntax*, not an expression, and admitting it as an expression + primary would let it appear in `WHERE` / `ORDER BY` / etc. + where the engine would reject it and the engine-neutral error + would be hard to phrase. The grammar simply refuses it + structurally outside projection. +- **`UNION ALL` is a single set-op,** not `UNION` followed by an + `ALL` modifier on the next leg. `set_op` is a `Choice` of the + four atoms (with `UNION` and `UNION ALL` as separate branches); + factoring `UNION [ ALL ]` is also valid but the explicit four + branches keep the matched-path classes cleaner for + highlighting. + +### 2. JOIN flavours admitted + +The grammar admits exactly the flavours the user picked: + +- `INNER JOIN` / bare `JOIN` +- `LEFT [ OUTER ] JOIN` +- `RIGHT [ OUTER ] JOIN` +- `FULL [ OUTER ] JOIN` +- `CROSS JOIN` + +The first four take a mandatory `ON sql_expr`; `CROSS JOIN` +takes none. `OUTER` is the optional explicit modifier on +`LEFT` / `RIGHT` / `FULL`. + +**Explicitly out (§11):** `NATURAL JOIN`, `JOIN … USING (col)`, +and comma-list `FROM t1, t2` (the legacy implicit cross join). +The first two add grammar weight for limited teaching value; +comma-FROM teaches habits we do not want to encourage — +`CROSS JOIN` covers the same shape explicitly. + +JOIN chains are admitted as a flat `( join_clause )*`. Standard +SQL is left-associative; since the grammar builds no AST and the +engine receives the source text verbatim (ADR-0030 §4), the +engine resolves the associativity. The grammar's job ends at "the +chain parses". + +### 3. Set operators and compound queries + +`UNION`, `UNION ALL`, `INTERSECT`, `EXCEPT` all admitted — +ADR-0030 §3's full set. + +The compound shape (§1) is `select_core (set_op select_core)*`, +flat. Standard SQL gives `INTERSECT` higher precedence than +`UNION` / `EXCEPT`; the engine resolves this — the grammar admits +the chain as written. This mirrors §2's JOIN-chain decision. +A user who wants explicit grouping writes +`(SELECT … INTERSECT SELECT …) UNION SELECT …`, which falls out +of the subquery-`primary` branch (§6) — though for a top-level +statement this requires an extra `SELECT` wrapping. In practice +the engine's precedence is what learners encounter; calling it +out in the `help sql` page (ADR-0030 Phase 6) is sufficient. + +`ORDER BY` / `LIMIT` on a compound apply to the whole compound, +not to a leg — fixed by the position of `order_by_clause` and +`limit_clause` in §1's `compound_select`. + +### 4. CTEs (`WITH` and `WITH RECURSIVE`) + +The full `with_clause` per §1. Both forms admitted: non-recursive +`WITH` for naming intermediate results, and `WITH RECURSIVE` for +recursive queries (tree traversals, transitive closure, +generated sequences). + +The `cte_def` body is a parenthesised `compound_select`, so the +recursion is into `SQL_SELECT_COMPOUND` via `Subgrammar` — the +same recursion mechanism subqueries use (§9). + +**CTE-name collisions.** A CTE name shares the table-name +namespace at the engine. Standard SQL: the CTE shadows a +same-named base table within the statement. The grammar is +agnostic — both are identifiers in a table-source slot — so the +shadowing falls out of engine resolution. The +`reject_internal_table` validator still rejects any `__rdbms_*` +identifier in any table-source slot, **including** CTE-name +slots and the `FROM`s inside CTE bodies. That is the right +posture: the reserved namespace is reserved everywhere. + +Recursive CTEs use the standard `cte_name AS ( base_case UNION +[ALL] recursive_case )` shape — already admitted by §1's +`compound_select` body. No grammar branch specific to recursion +is needed; the `RECURSIVE` keyword is a hint to the engine, not +a grammar gate. + +### 5. Qualified column references + +Additive extension to `sql_expr.rs` (ADR-0031 §7 OOS-2). +`name_or_call`'s identifier prefix gains a `Choice` tail: + +``` +name_or_call := identifier + ( '.' identifier + | '(' call_args? ')' + )? +``` + +The leading identifier is matched once (preserving ADR-0031 §1's +factoring — no `Choice` branch begins with an identifier). The +optional tail is *either* a qualified-reference suffix +(`. identifier`) *or* a function-call argument list (`( … )`), +not both. A bare identifier with no tail remains a plain column +reference. + +A function call with a qualified name — `schema.f(…)` — is not in +scope (we have no schemas) and is structurally inadmissible by +construction: there is no production that admits both a `.`-tail +and a `(`-tail. + +Completion for the qualified form: when the cursor is past +`identifier '.'`, the completion source is "columns of the table +or alias named by the leading identifier", resolved from the +active `SchemaCache` (the same source the DSL completion uses, +ADR-0030 §8). This is a small extension to the existing +`IdentSource::Columns` machinery — when in scope, column +completion is scoped to the named source. + +### 6. Subquery expressions + +Additive extensions to `sql_expr.rs` (ADR-0031 §7 OOS-1): + +- **Scalar subquery as `primary`.** A `Choice` branch + `'(' compound_select ')'`. The existing `'(' or_expr ')'` + branch handles parenthesised expressions. Both start with + `'('`, so per ADR-0031 §1's factoring principle, the `'('` is + matched once and the inside is a `Choice` between + `compound_select` and `or_expr`. The first inside token + disambiguates: `SELECT` or `WITH` → subquery; anything else → + expression. The two `Choice` branches have non-overlapping + first-token sets, so the walker's expected-set at the + ambiguity point merges naturally without `Optional`-first + hazards. + +- **`IN ( subquery )`.** The existing `predicate_tail`'s + `IN '(' additive (',' additive)* ')'` branch gains a sibling + `IN '(' compound_select ')'`. Same `'('` factoring as the + scalar case: after `'('`, branch on `SELECT`/`WITH` (subquery) + vs additive-first-token (literal list). `NOT IN` follows from + the existing `[ NOT ]` factoring on the predicate tail. + +- **`[ NOT ] EXISTS ( subquery )`.** Added as a `primary` + `Choice` branch: + + ``` + primary := … | EXISTS '(' compound_select ')' + ``` + + The bare `EXISTS` form lives in `primary`; `NOT EXISTS` falls + out of the existing `not_expr := NOT not_expr` tier above + `primary` in the precedence ladder. This is structurally + cleaner than putting `[ NOT ] EXISTS` inside `primary`: there + is only one place `NOT` is admitted, and it composes uniformly. + +All three branches recurse through `Subgrammar(&SQL_SELECT_COMPOUND)`. +Correlated subqueries fall out for free — a subquery's +`sql_expr` reaches identifiers, which the engine resolves +against outer scopes. The grammar imposes no correlation +constraint; correlation is engine-side semantics. + +### 7. `GROUP BY` and `HAVING` + +`GROUP BY` takes a comma-separated list of `sql_expr`s. +Standard SQL admits any expression as a grouping key (not just +bare columns) — e.g. `GROUP BY date(created_at)`. The grammar +admits this without special-casing. + +`HAVING` is a single `sql_expr`. Its semantics is "boolean over +grouped rows"; the grammar does not enforce that — the +expression's typing is the engine's concern. + +**Aggregate correctness is not grammar-checked.** Whether a +projection's non-aggregated columns are valid given the +`GROUP BY` keys is a semantic question. ADR-0030 §9 settled this: +the grammar admits structurally, the engine rejects semantically, +and the friendly-error layer renders engine-neutral wording +(ADR-0019). A learner who writes `SELECT Name, COUNT(*) FROM t` +sees an engine-neutral "Name must appear in a GROUP BY clause or +be wrapped in an aggregate function"-style message, not a raw +engine string and not a parse error. This is the project's +honest limitation (ADR-0030 §7) and remains so. + +### 8. `LIMIT` / `OFFSET` and `ORDER BY` extras + +`LIMIT n [ OFFSET m ]` — the standard form. Both `n` and `m` are +`sql_expr`s (in practice integer literals, but the grammar +admits the general form so e.g. `LIMIT max(10, x) OFFSET 0` is +structurally accepted; the engine constrains values). + +The MySQL/SQLite legacy comma form `LIMIT m, n` is **out** (§11). +Its argument order (offset first, then count) inverts the +keyword form — a needless source of confusion. + +`ORDER BY` already admits `sql_expr` items with optional +`ASC` / `DESC` (Phase 1). With Phase 2: + +- **Column-position references** (`ORDER BY 1, 3 DESC`) fall out + for free — an integer literal is a valid `sql_expr`, and the + engine interprets a bare positive integer in `ORDER BY` as a + column position. The grammar does not distinguish the case; + rendering interprets the position. Document in `help sql`. +- **Qualified refs** in `ORDER BY` (e.g. `ORDER BY t.c`) fall + out of §5 — the grammar uses the same `sql_expr` body. + +### 9. Recursion, the depth budget, and the walker + +`SELECT` recurses into itself at four points: + +- A subquery `primary` in `sql_expr` (§6). +- An `IN ( subquery )` predicate tail (§6). +- An `EXISTS ( subquery )` primary (§6). +- A CTE body (§4). + +Every recursion is wired through +`Node::Subgrammar(&SQL_SELECT_COMPOUND)` — the named `static` Node +exported by `sql_select.rs`. The recursion is token-guarded in +every case: a subquery `primary` is preceded by `'('`; an +`IN ( subquery )` by `IN (`; an `EXISTS ( subquery )` by +`EXISTS (`; a CTE body by `AS (`. There is no left recursion; +the walker always makes progress. + +`MAX_SUBGRAMMAR_DEPTH = 64` (ADR-0026, reused by ADR-0031) is +**shared**: DSL `Expr` recursion, SQL expression recursion, and +SQL `SELECT` recursion all increment the same +`WalkContext::subgrammar_depth`. A worst-case learner query +might be `SELECT … WHERE id IN (SELECT … WHERE id IN (SELECT …))` +with each inner select carrying a few-deep expression — well +below the cap. The cap remains purely a stack-overflow guard; +**this ADR does not raise it**. If pathological-but-realistic +learner queries reach 64 in practice, a focused ADR lifts it +with measurements. Speculative raising would weaken the guard +without evidence. + +**No new walker capability is introduced.** `Subgrammar`, the +depth counter, the cap, and the friendly depth-exceeded error +all carry over from ADR-0026 unchanged — the same posture +ADR-0031 took. This is a non-trivial property: Phase 2 is the +biggest single grammar slice in the project, and it lands +without changing the walker's contract. + +### 10. Completion scope and the `WalkContext` extension + +ADR-0030 §8 promises that "ambient assistance comes for free" +because SQL is grammar in the unified tree. For Phase 1's +single-table `SELECT` this was substantially true: the existing +`WalkContext::current_table` mechanism (populated via the +`writes_table: true` flag on the `FROM` table-name slot) gave +`WHERE` and `ORDER BY` column-name completion against the right +table at no incremental cost. + +Phase 2 breaks the "free" claim. Multiple `FROM` tables via +`JOIN`s, aliases, CTE-defined table sources, subqueries with their +own `FROM` scope, qualified `t.c` references, projection aliases +referenced in `ORDER BY` — every Phase-2 surface needs **scope +information that `WalkContext` does not currently carry**. §9's +"no new walker capability" claim holds for grammar recursion +(`Subgrammar` and the depth cap suffice); for completion scope it +is too strong, and is softened here to an honest split. + +The current `WalkContext` carries one table at a time +(`current_table: Option` + `current_table_columns`), set +by `writes_table: true` on a `Tables` identifier. DSL paths +(`update T`, `delete from T`, `insert into T`) rely on this +single-table contract and continue to work unchanged. Phase 2 +adds layered accumulators alongside, not in place of. + +#### 10.1. The from-scope accumulator + +A new `WalkContext` field: + +``` +from_scope: Vec +TableBinding { table: String, alias: Option, + columns: Vec } +``` + +Populated incrementally as the walker descends through +`from_clause` and each `join_clause` (§1). The first table-source +slot pushes a binding; every subsequent `JOIN` pushes another. +`Ident` slots whose `IdentSource` is `Columns` now resolve against +the union of every binding's columns, with deduplication. + +`current_table` / `current_table_columns` remain as derived +helpers: when `from_scope.len() == 1`, they expose that single +binding's data, preserving the contract every existing DSL path +relies on. DSL `UPDATE` / `DELETE` / `INSERT` continue to push +exactly one binding via the existing `writes_table: true` +mechanism, unchanged. + +#### 10.2. Scope-stack discipline at `Subgrammar` boundaries + +Subqueries (§6) and CTE bodies (§4) introduce new lexical scopes. +A column reference inside `SELECT … WHERE id IN (SELECT id FROM +u)` resolves first against the inner `SELECT`'s `FROM` (`u`), and +— for correlation — also against the outer scope. +`subgrammar_depth` is a counter; it suffices for §9's depth cap +but not for scope. + +Phase 2 layers a stack on top. A new field: + +``` +from_scope_stack: Vec +ScopeFrame { + from_scope: Vec, + cte_bindings: Vec, + projection_aliases: Vec, +} +``` + +The new walker node variant — `Node::ScopedSubgrammar(&Node)` — +is what triggers a scope push. It is a sibling of the existing +`Node::Subgrammar(&Node)`, with the same recursion semantics +(reference-following, depth-counted) and one additional driver +behaviour: on entry, push the current `ScopeFrame` onto +`from_scope_stack` and start a fresh empty frame; on exit, pop +back. The existing `Node::Subgrammar` variant is unchanged — DSL +`Expr` recursion (ADR-0026) and the `sql_expr.rs` precedence- +ladder recursion (ADR-0031) keep using it and never push a scope. + +The grammar source spells the choice explicitly at each call +site: subqueries in `sql_expr.rs` and CTE bodies in +`sql_select.rs` reference the compound-SELECT through +`Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND)`; predicate-ladder +recursion in `sql_expr.rs` continues to use +`Node::Subgrammar(&SQL_OR_EXPR)`. Self-documenting, no flag +bookkeeping, and the walker change is localised to one extra arm +in the driver's `match` over `Node` variants. + +Column-completion candidates inside a scope frame are the union +of the current frame's `from_scope` and (for correlated refs) +all outer frames; outer-frame columns are admitted as additional +candidates so correlated references work. Ordering or visual +differentiation between current-frame and outer-frame candidates +is completion-tier polish and is not specified by this ADR — the +current completion API (`candidates_at_cursor*`) returns a flat +`Vec`, and adding a priority dimension is a separate concern. +CTE bindings resolve the same way (outward-walking) — a CTE +defined in an outer query is visible inside an inner subquery as +a table source, unless the inner subquery defines a CTE of the +same name and shadows it. + +This is the one explicit walker-capability extension Phase 2 +makes. It is scoped: one new node variant, no new walker entry +point, no change to how Subgrammar bodies are entered +structurally. The depth cap (§9) applies to both variants +uniformly through the shared `subgrammar_depth` counter. + +#### 10.3. CTE bindings + +A frame-local accumulator carries CTE definitions visible in the +current scope: + +``` +cte_bindings: Vec +CteBinding { + name: String, + columns: Vec, +} +CteColumn { + name: Option, // None for unnamed + // computed projections + type_: Option, // resolved playground type + // if derivable +} +``` + +A CTE definition `cte_name [(col-list)] AS (compound_select)` +produces a binding in two stages: + +1. **Pre-body push** (so `WITH RECURSIVE` self-references resolve). + When the walker reaches `AS` and is about to enter the body's + `Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND)`, it pushes a + placeholder binding into the *outer* frame's `cte_bindings` + with `columns = []` (an empty stand-in). The CTE name is now + visible as a table source from inside the body. +2. **Body-finalised harvest** (when the body's scope frame + completes). On `ScopedSubgrammar` exit, before popping the + frame, the driver derives the body's projection-list output + columns (rules below) and rewrites the placeholder binding in + the outer frame. + +**Output-column derivation rules.** Walking the body's +projection items: + +| Projection item | Derived CTE column(s) | +|---------------------------------------|----------------------------------------------------------------------------------------| +| `*` | Every column from the body frame's `from_scope`, in order, with their resolved types | +| `t.*` (qualified wildcard) | Every column from binding `t` in the body frame's `from_scope`, with their types | +| `col` (bare ref, resolves uniquely) | One column: name = `col`, type = the resolved column's playground type | +| `t.col` (qualified ref) | One column: name = `col`, type = `t`'s column's type | +| `expr AS alias` or bare `expr alias` | One column: name = `alias`, type = the underlying type if `expr` is a single column ref, else `None` | +| `expr` (computed, no alias) | One column: name = `None`, type = `None` — engine assigns an implementation-defined name | + +For compound bodies (`UNION` / `INTERSECT` / `EXCEPT`) the columns +come from the **first leg** per standard SQL. For recursive CTE +bodies (`WITH RECURSIVE`) the same rule — the non-recursive leg +dictates. + +If a `(col-list)` was supplied on the CTE name, it **renames** the +derived columns positionally and overrides their names; types are +preserved from the derivation. If the column-count of `(col-list)` +disagrees with the body's projection arity, the grammar admits +this and the engine surfaces the mismatch — `do_run_select`'s +engine-neutral error layer carries the message (ADR-0030 §9, +ADR-0019). + +**Completion past `cte_alias.|`.** Where the derivation produced +named columns (every form above except computed-no-alias), they +complete with their names and (where typed) participate in §11's +result-type resolution if the CTE's columns are projected +upstream. Where the derivation produced an unnamed column slot, +that slot is silently skipped from the qualified-prefix candidate +list — the user typing `cte.|` past it sees only the nameable +columns. The cure for "I want my expression to be referenceable +from outside the CTE" is to add an alias, which is the same cure +the engine itself enforces at execution time. + +This is substantially better than the earlier "honest limitation" +posture: the common `SELECT *` body is fully resolvable; explicit +projections are resolvable; only un-aliased computed columns +elude us, and the right learner response there is the same as +the engine's right learner response — write an alias. + +`cte_bindings` lives on the scope frame, so a CTE defined in an +outer query is visible inside an inner subquery as a table source +unless that subquery defines a CTE of the same name (which +shadows it, per standard SQL). + +#### 10.4. Projection-alias bindings + +Standard SQL admits `ORDER BY` referencing a SELECT-list alias: +`SELECT a + b AS total FROM t ORDER BY total`. A third frame-local +accumulator: + +``` +projection_aliases: Vec +``` + +Each `projection_item`'s optional alias (whether `AS x` or bare +`x` — see §1) appends its name. `Ident` slots inside the trailing +`ORDER BY`'s `sql_expr`s offer projection aliases as additional +candidates alongside column names. This addresses §1's bare-alias +admission's completion behaviour at the same time. + +The accumulator is not consulted inside `WHERE`, `GROUP BY`, or +`HAVING` — standard SQL forbids alias references there +(aliases are not yet bound at evaluation time). The grammar +admits them structurally regardless; the engine rejects; ADR-0019 +renders the engine-neutral error. + +#### 10.5. Qualified-prefix completion + +§5 fixed the grammar for `t.c` references. The completion +behaviour at qualified positions: + +- At an `Ident` cursor with **no prefix**, candidates are the + union of every `from_scope` binding's columns, plus + `projection_aliases` when in `ORDER BY`, deduplicated. CTE-name + candidates apply only in table-source slots, not column slots. +- At an `Ident` cursor immediately after `prefix '.'`, candidates + are **scoped**: resolve `prefix` against the active `from_scope` + (preferring alias matches over table matches, since aliases + shadow), and offer that binding's columns alone. If `prefix` + doesn't resolve to a binding, the candidate list is empty — the + walker's expected-set still surfaces the syntactic alternatives + (the user sees no column candidates but the structural error + message reports the unresolved prefix). + +The qualified-prefix narrowing is a small extension to the +existing `IdentSource::Columns` handling: when the matched-path +immediately preceding the `Ident` ends with `Ident '.'`, the +completer is told the prefix and narrows accordingly. This is the +only completion-source-level change; the rest is data flowing +through the new accumulators. + +#### 10.6. The projection-before-FROM problem + +Standard SQL writes projection **before** `FROM`. A user typing +`select col1, col2 from mytable` produces, mid-typing, a state +where the projection list has been parsed but the `FROM` has not. +At that point the column-name completer cannot scope to +`mytable` — it does not know `mytable` is coming. Validation and +highlighting face the same problem: `col1` and `col2` cannot be +checked as belonging to `mytable` until the user types `from +mytable`. The debounced re-walk on every keystroke (ADR-0027) is +**not** sufficient on its own to fix this in a single-pass walker, +because by the time the FROM is parsed, the projection +identifiers have already been resolved (left-to-right) against +the only scope information available at that moment — the empty +`from_scope`. + +There is no fully satisfying single-pass answer. Phase 2's +posture is therefore explicit: + +1. **During-typing completion** of projection-list column names, + when `from_scope` is empty (no `FROM` yet), uses the unioned + `SchemaCache.columns` — every column known to the schema — + as the candidate set. This is the same global fallback Phase 1 + uses and remains the right behaviour: a noisier-but-useful + completion is better than no completion. +2. **A post-walk fixup pass** re-evaluates projection-list column + refs against the *final* `from_scope` after the walk + completes. The walk records each projection `Ident`'s + span and matched-path location; once the walk reaches end-of- + input (or end-of-statement), the fixup walks the recorded + list, looks up each identifier against the final `from_scope`, + and: + - **Rewrites the highlight class** on that terminal — + downgrading "column" → "unknown identifier" when the + identifier doesn't belong to any in-scope binding, + upgrading "unknown identifier" → "column" when it does. + - **Updates the diagnostic** for the validity indicator + (ADR-0027) — a column-not-found ERROR either appears or + disappears based on the post-walk scope. + + **Integration point.** The fixup runs as the **final stage of + the walk itself**, after all grammar nodes have been processed + but before `WalkResult` is returned to the caller. It mutates + the walker's accumulated highlight runs and diagnostics vector + in place, so the consumer (the renderer, the validity + indicator) sees a single coherent snapshot. This keeps the + walker the single source of truth for what reaches the + renderer — the fixup is conceptually part of "what the walker + produces", not a separate post-processing layer. The same + convention applies to the §11.6 SQL-expression predicate + warnings, which also run as a final walk stage. +3. The fixup runs on every debounced re-walk (ADR-0027 already + triggers the full walk per keystroke), so the user observes: + typing `col1, col2 from mytable`, the `col1` / `col2` + initially highlight as generic identifiers (with a soft + warning if not found anywhere in the schema); the moment + `mytable` is typed, the highlight snaps to the column class + if `col1` / `col2` belong to `mytable`, or to the + unknown-identifier diagnostic if they don't — within one + debounce cycle. + +The fixup pass does not re-parse; it only re-resolves +identifiers against the final `from_scope`. + +`ORDER BY` alias resolution needs no fixup. Projection precedes +`ORDER BY` in walk order, so `projection_aliases` is fully +populated by the time the walker reaches an `ORDER BY` `Ident`; +the alias-as-column-candidate is resolved in the single forward +pass. + +This is the answer to the user's "I think this may be automatic" +intuition: the debounced re-walk is automatic; the +post-walk fixup pass is the new infrastructure that makes the +re-walk produce *correct* results. Without it, projection-list +column refs would forever validate against the global column set +even after the `FROM` is typed. + +#### 10.7. The honest split + +§9 still holds for **grammar recursion**: `Subgrammar` and the +depth cap are reused unchanged. For **completion scope**, this +section introduces: + +- New `WalkContext` fields: `from_scope`, `from_scope_stack`, + `cte_bindings`, `projection_aliases`. +- Scope push/pop discipline at `SQL_SELECT_COMPOUND` `Subgrammar` + boundaries — driven by a marker on the Subgrammar target so DSL + Subgrammars are unaffected. +- A qualified-prefix narrowing in the `IdentSource::Columns` + completion path. +- A post-walk fixup pass for projection-list identifier + highlighting and validity (§10.6). + +These are real walker-contract extensions. They are scoped: no +new node kinds, no new walk-driver entry points, no changes to +how Subgrammar bodies are entered structurally. The existing DSL +paths are unaffected — their grammars never push a SELECT scope, +never define a CTE, never carry projection aliases — and the +single-table `current_table` / `current_table_columns` view is +preserved as a derived helper. + +§9's claim is therefore restated honestly: **grammar recursion +needs no new walker capability; completion scope needs the +additions above.** + +### 11. Diagnostics for Phase-2 validation cases + +ADR-0027 fixes the warning-vs-error guideline verbatim: + +> **ERROR** — the input is *known* to fail. Either it does not +> parse (incomplete, or a mismatched / invalid token), or it +> parses but names something that does not exist (an unknown +> table or column). +> +> **WARNING** — the input is valid and *will* run, but is very +> likely not what a knowledgeable user wants: a type-mismatched +> comparison, or `= NULL` (both from ADR-0026 §7). Amendment 1 +> adds a third trigger — `LIKE` against a numeric column. +> +> The split is *certainty of failure* versus *likely misleading*. + +This section walks the Phase-2 surface case-by-case, classifies +each against that guideline, and identifies the diagnostic +machinery additions needed. It also flags a Phase-1 carry-over +gap (§11.6) that Phase 2 closes. + +#### 11.1. Existing diagnostics, briefly + +Two post-walk passes today (`src/dsl/walker/mod.rs`): + +- **Schema-existence pass** (ERROR). Walks the `MatchedPath`, + checks every `IdentSource::Tables` / `IdentSource::Columns` + ident against `SchemaCache`. Emits `diagnostic.unknown_table` + and `diagnostic.unknown_column`. Today this assumes a single + `current_table` for column resolution. +- **Expression predicate-warnings pass** (WARNING). Walks the + parsed DSL `Expr` AST emitted by `expr.rs`'s builder. + Emits `diagnostic.eq_null`, `diagnostic.type_mismatch`, + `diagnostic.like_numeric`. Runs only on WHERE expressions in + the DSL. + +Phase 2 extends both, and §11.6 fills a SQL-side gap. + +#### 11.2. Phase-2 new ERROR cases + +Every case below is "known to fail on the engine" — the engine +would surface a message the friendly-error layer would translate +(ADR-0019). Surfacing them as pre-flight ERROR diagnostics gives +the learner the answer one debounce cycle faster, with the +walker as the single source of truth. + +- **Unknown table in any `FROM`/`JOIN` slot.** The existing + schema-existence pass extends from "the one + `current_table`" to walking every `from_scope` binding's + `table` and emitting `diagnostic.unknown_table` per + unresolved name. CTE-name slots in the active + `cte_bindings` are valid table sources and exempt from + this check. +- **Unknown CTE-as-table.** A table-source slot whose name is + not in `SchemaCache.tables` *and* not in the active + `cte_bindings` chain emits `diagnostic.unknown_table` (same + catalog key — from the learner's perspective the engine + message is the same; the slot is a "table that doesn't + exist", whether they meant a CTE or a base table). +- **Unknown table or alias in a qualified column reference** + (`t.c` where `t` doesn't resolve in the active + `from_scope`). New catalog key + `diagnostic.unknown_qualifier` `{qualifier}`. +- **Unknown column in a qualified reference** (`t.c` where `t` + resolves but `c` is not a column of that binding). Reuses + `diagnostic.unknown_column` with the column name in context. +- **Ambiguous unqualified column reference** — a column name + used unqualified that exists in two or more `from_scope` + bindings. The engine raises "ambiguous column name"; we + surface it as ERROR with a new catalog key + `diagnostic.ambiguous_column` `{column}, {qualifiers}` so + the learner sees which two tables the name appeared in. +- **Reference to a projection alias in `WHERE` / `GROUP BY` / + `HAVING`.** Standard SQL forbids it (aliases are not bound + at evaluation time). The grammar admits the identifier + structurally; a new diagnostic pass emits ERROR with a new + catalog key `diagnostic.projection_alias_misplaced` + `{alias}, {clause}`. +- **CTE column-list arity mismatch.** When `cte_name (col1, + col2, …) AS (compound_select)` declares N columns and the + body's projection (§10.3) derives M columns with N ≠ M, the + CTE harvest pass (§10.3 stage 2) emits ERROR with a new + catalog key `diagnostic.cte_arity_mismatch` `{cte}, + {declared}, {actual}`. +- **Compound-query column-count mismatch.** When a `UNION` / + `INTERSECT` / `EXCEPT` chain has legs whose projection + arities differ, the engine errors at execution. Phase 2 + catches it pre-flight: each leg's derived arity (the same + derivation the CTE harvest uses) is compared as the + compound is assembled. ERROR with a new catalog key + `diagnostic.compound_arity_mismatch` `{op}, {left_n}, + {right_n}`. +- **Internal-table reference in any new table-source slot.** + Already a parse-time rejection via + `reject_internal_table` (§1, §4) — surfaces as a parse + error, not a post-walk diagnostic. Listed here for + completeness: the catalog key `select.internal_table` + authored in Phase 1 covers every Phase-2 slot too. + +#### 11.3. Phase-2 new WARNING cases + +The existing WARNING set (`= NULL`, type-mismatched +comparison, `LIKE`-on-numeric) is the right set. Phase-2 adds +**no new WARNING categories** — every Phase-2-specific case +falls into ERROR (§11.2) or engine-rejected (§11.4). + +Considered and rejected as WARNINGs: + +- **CTE name shadowing a base table.** Standard SQL behaviour; + often intentional (the canonical "filter to a subset, then + query as if it were the base table" pattern). No diagnostic. +- **Correlated reference without explicit qualification.** + Correlation is implicit in standard SQL; per the user + guideline a knowledgeable user does want this. The walker + validates the reference silently against the outer-frame + scope; no warning, no diagnostic. +- **Unused CTE.** A CTE defined in `WITH` but never referenced. + The engine ignores it; many learners write CTEs as + intermediate scratch space. Not a warning. + +#### 11.4. Engine-rejected (no diagnostic) + +These fail on the engine and surface via ADR-0019's +friendly-error layer at execution time. The walker does not +attempt pre-flight detection because: + +- **Non-aggregated columns in projection with `GROUP BY`** — + detecting requires knowing which function names are + aggregates; ADR-0030 §13 OOS-3 / ADR-0031 §6 keep us + allowlist-free. +- **Aggregate function in `WHERE`** — same reason. +- **Scalar subquery returning multiple rows** — semantic, not + syntactic; requires execution. +- **Recursive CTE without a `UNION`** — requires inspection of + the body's compound shape against the recursive contract; + doable in principle, deferred as engine territory. +- **Duplicate CTE names within the same `WITH`** — checkable + in principle (walking `cte_bindings` for duplicates), but + the engine catches it cleanly. Phase 2 does not pre-flight + it; could be added later if its absence proves confusing. +- **Type-mismatched JOIN ON predicates** — the existing + expression type-mismatch warning (extended per §11.6) + handles the explicit-literal case; arbitrary-expression + cases require type inference and stay engine-side. + +#### 11.5. Catalog additions + +Phase 2 adds the following message-catalog keys (ADR-0019). +Every key is engine-neutral by construction. + +Parse-time-detectable (post-walk diagnostic passes): + +| Key | Slots | +|----------------------------------------|--------------------------------------------------| +| `diagnostic.unknown_qualifier` | `{qualifier}` | +| `diagnostic.ambiguous_column` | `{column}, {qualifiers}` | +| `diagnostic.projection_alias_misplaced`| `{alias}, {clause}` | +| `diagnostic.cte_arity_mismatch` | `{cte}, {declared}, {actual}` | +| `diagnostic.compound_arity_mismatch` | `{op}, {left_n}, {right_n}` | + +Engine-error translations (friendly-error layer; reached on +execution failure): + +| Key | Engine cause | +|----------------------------------------|--------------------------------------------------| +| `engine.no_such_table` | `no such table: ` (post-execution path) | +| `engine.no_such_column` | `no such column: ` (post-execution path) | +| `engine.ambiguous_column` | `ambiguous column name: ` | +| `engine.aggregate_misuse` | `misuse of aggregate function ()` | +| `engine.group_by_required` | `column must appear in the GROUP BY clause or be used in an aggregate function` (or equivalent) | +| `engine.compound_arity_mismatch` | `SELECTs to the left and right of UNION do not have the same number of result columns` (or equivalent) | +| `engine.scalar_subquery_too_many_rows` | scalar subquery cardinality violation | +| `engine.recursive_cte_malformed` | recursive CTE shape errors | + +The parse-time keys and the engine keys are intentionally +separate even when they describe the same situation +(`engine.ambiguous_column` mirrors +`diagnostic.ambiguous_column`) — the parse-time message can +include the learner's typed text and span; the engine-time +message catches what the parser missed and routes through the +friendly-error layer with whatever context the engine yielded. + +Two pre-existing parse-time keys are reused unchanged for +Phase-2 slots: `diagnostic.unknown_table`, +`diagnostic.unknown_column`, and the Phase-1 +`select.internal_table`. + +#### 11.6. The Phase-1 SQL-expression predicate-warning gap + +ADR-0027 Amendment 1's `LIKE`-on-numeric warning, and ADR-0026 +§7's `= NULL` and type-mismatch warnings, are emitted by a pass +that walks the **DSL** `Expr` AST. Phase 1's `sql_expr.rs` +deliberately builds **no AST** (ADR-0031 §2). The consequence +is a Phase-1 carry-over gap: **SQL `WHERE` expressions today +emit none of these warnings** — `select * from t where name +like 5` parses, the engine runs it, and the learner gets the +engine's verdict, not the friendly pre-flight nudge ADR-0027 +Amendment 1 promised. + +Phase 2 closes this. The predicate-warnings pass gains a +**MatchedPath-walking variant** that runs over the SQL +expression nodes and identifies the predicate shapes +structurally (a `LIKE` predicate-tail with a column-ref left +operand; a `=`/`!=` predicate-tail with a `NULL` literal +operand; a comparison predicate-tail with a column-literal +operand pair of mismatched types). It does not need an `Expr` +AST because the matched-path terminals carry both the byte spans +(for the diagnostic) and the node-name labels (for shape +identification). The same catalog keys (`diagnostic.eq_null`, +`diagnostic.type_mismatch`, `diagnostic.like_numeric`) apply +unchanged; only the pass implementation is new. + +The MatchedPath-walking pass runs over **every** Phase-2 +`sql_expr` slot — `WHERE`, `HAVING`, `ON`, `CASE` branches, +projection items, `ORDER BY` items — so warnings surface +uniformly across the SQL surface rather than just `WHERE`. This +is a strict improvement over Phase 1's behaviour, where even +Phase-1 SELECT WHERE expressions got no predicate warnings. + +Type-resolution for the MatchedPath-walking pass: a column ref's +type comes from §10's `from_scope` (or, for `t.c`, the specific +binding); a literal's type comes from its lexical class. When +the column ref doesn't resolve (the schema-existence ERROR pass +will already have flagged it), the warning pass skips the +predicate — no point compounding diagnostics on an already- +broken reference. + +#### 11.7. Mechanism summary + +Three diagnostic passes by end of Phase 2, all running as final +stages of the walk (per §10.6's integration-point convention): + +1. **Schema-existence ERROR pass** — extended from single + `current_table` to walking every `from_scope` binding and + the active `cte_bindings`. Adds the qualified-reference + and ambiguity checks (§11.2). +2. **Arity-check ERROR pass** (new) — runs at CTE-body and + compound-query frame-exits (the same `ScopedSubgrammar` + exit hook §10.3 uses), comparing declared vs derived + column counts. +3. **Predicate-warnings pass** — extended with a + MatchedPath-walking variant for `sql_expr` (§11.6) covering + `= NULL`, type mismatch, and `LIKE`-on-numeric across every + SQL expression slot, in addition to the existing DSL `Expr` + AST variant for DSL expressions. + +Per the integration-point convention (§10.6), each pass +mutates the walker's accumulated highlight runs and diagnostics +in place; the consumer sees a single coherent snapshot. + +The projection-list fixup of §10.6 is conceptually part of pass +(1) — it is the same "re-resolve identifier against final +scope" operation, applied to the small subset of identifiers +whose scope wasn't fully known at first-pass walk time. + +### 12. Result-column type resolution + +Phase 1's `column_types: Vec` is partially lifted: where a +projection item is structurally a single column reference, the +worker resolves it back to the source column's playground type +(ADR-0005) and populates that slot in `DataResult.column_types`. +Everything else stays `None`. + +This addresses Phase-1 autonomous decision §4.5 (bool SELECT +results render as `0`/`1`): a bare `bool` column now renders as +`true` / `false` again, alignment recovers, and the `show data` +rendering path is reached for the common case. + +**Resolution rule.** A projection item is "structurally a single +column reference" when, after stripping an optional `[ AS ] +alias`, its expression is one of: + +- An unqualified identifier (`Name`) that resolves uniquely to a + single column across the FROM tables; +- A qualified reference (`t.c` / `alias.c`) that resolves + unambiguously through the FROM aliases. + +Anything else — function calls, arithmetic, `CASE`, literals, +subquery expressions, the `*` and `t.*` wildcards — keeps +`column_types[i] = None`. When resolution is ambiguous +(unqualified column name appears in two FROM tables) the +grammar admits it (engine resolves or errors); the type-resolver +returns `None` and the renderer falls back to neutral alignment. + +**Implementation seam.** The strongly preferred mechanism is +**engine-side column-origin lookup**: after preparing the +statement, query the prepared statement for each result column's +underlying table and column. The engine knows authoritatively +which result columns are direct references and which are +expressions; for direct references it returns the source +table+column, for expressions it returns nothing. This avoids +re-parsing the source or adding structured projection-item data +to the `MatchedPath` — the grammar tier is not involved at all, +which preserves ADR-0031 §2's "no AST" decision and stays on the +right side of ADR-0030's "one source of truth" rule. + +The Phase-2 implementer verifies that the rusqlite version +pinned in `Cargo.toml` exposes this metadata (the SQLite C API +calls are `sqlite3_column_table_name` / +`sqlite3_column_origin_name` — they have been stable for two +decades; rusqlite either exposes them directly or via the +underlying `*mut sqlite3_stmt` handle). If exposure turns out +to be awkward, the fallback is a small post-parse walk over the +projection-item subtrees in the `MatchedPath` — strictly worse +because it duplicates a slice of parsing, but available. + +The resolution pass adds one method on `Database` (something +like `resolve_select_column_types`) called from `do_run_select` +before the `DataResult` is shipped. It takes the prepared +statement and the active `SchemaCache`, and returns +`Vec>`. The renderer needs no change — `None` +slots already render as typeless. + +This is the only execution-path change Phase 2 makes; everything +else routes through Phase 1's grammar-as-text execution. + +### 13. Out of scope + +- **OOS-1. Derived tables in `FROM`** — `FROM (SELECT …) [AS] + alias`. The same shapes are reachable via CTEs (§4), which + Phase 2 ships. Derived tables in `FROM` are not authored here. +- **OOS-2. `NATURAL JOIN` and `JOIN … USING (col)`.** Both are + convenience forms. NATURAL is widely considered a footgun; + USING is cleaner but adds grammar weight without lifting any + expressive ceiling. Out. +- **OOS-3. Comma-list `FROM t1, t2` (implicit cross join).** Out. + `CROSS JOIN` covers the same shape explicitly. +- **OOS-4. `LIMIT m, n` (the legacy comma form).** Out (§8). +- **OOS-5. Window functions** (`OVER (…)`, `PARTITION BY`, + window-frame syntax). A meaningful learning topic, but a large + surface of its own and out of ADR-0030's commissioned set. +- **OOS-6. `LATERAL` joins.** Not commissioned by ADR-0030. +- **OOS-7. `VALUES (…)` as a row source.** Not commissioned. +- **OOS-8. A function/aggregate allowlist** — ADR-0030 §13 + OOS-3 / ADR-0031 §7 OOS-4 still apply: aggregate names parse + generically through `name_or_call`. +- **OOS-9. Quoted identifiers** (`"column name"`). Tracked as + ADR-0031 §7 OOS-3, still tracked. +- **OOS-10. Engine-checked aggregate correctness at parse + time.** The grammar admits structurally; engine rejects + semantically; ADR-0019 surfaces the engine's verdict in + engine-neutral wording (§7). +- **OOS-11. Result-column type resolution beyond bare column + refs.** Computed columns (`a + b`, `upper(name)`, `CASE …`) + stay typeless (§10). +- **OOS-12. The `help sql` page and parse-error usage entries** + for the Phase-2 surface. The grammar carries the `help_id`s + authored in this phase, but the page content and the rich + per-command usage messages are Phase 6 (ADR-0030 §10) and + ADR-0021. Phase 2 leaves the same `help_id: None` shape Phase + 1 used for `select`. + +## Consequences + +- A new grammar file, `src/dsl/grammar/sql_select.rs`, parallel + to `sql_expr.rs`, exporting `pub static SQL_SELECT_STATEMENT: + Node` and `pub static SQL_SELECT_COMPOUND: Node`. The Phase-1 + `data::SELECT` `CommandNode` is rebuilt against + `SQL_SELECT_STATEMENT` (its body becomes a `Subgrammar` + reference); the `CommandNode` itself stays. +- **Phase-1 SQL `SELECT` grammar nodes migrate.** The Phase-1 + static nodes that live in `src/dsl/grammar/data.rs` for the + single-table SELECT (the projection, FROM, WHERE, ORDER-BY, + LIMIT sub-trees) move into `sql_select.rs` as the + starting-point for the §1 productions; the file leaves only + the `CommandNode` shell behind. The seven Phase-1 SQL `SELECT` + integration tests are part of the safety net for this + migration — they must continue to pass under the rebuilt + grammar, in addition to the new Phase-2 integration tests + authored in step 4 of the implementation notes. +- **Hint-panel prose** for the new clauses (JOIN flavours, ON, + GROUP BY, HAVING, UNION / INTERSECT / EXCEPT, WITH, OFFSET, the + qualified-prefix and CTE-prefix completion states) is + authored at the structural level alongside each grammar node + in step 1 — a one-liner per slot, enough to drive the hint + panel. Richer per-clause teaching prose and the `help sql` + reference page remain ADR-0030 Phase 6 work (§12 OOS-12). +- **Walker cost is expected to stay proportional to source + length.** The new accumulators are `O(bindings + aliases)` + per frame; the scope stack is bounded by `MAX_SUBGRAMMAR_DEPTH + = 64` (§9); the §10.6 post-walk fixup pass touches one entry + per projection-list `Ident` (a small set). Each debounced + keystroke (ADR-0027) walks once, fixes up once, and emits a + single coherent output. No new pathological case is + introduced — if a learner-realistic query produces a + noticeable typing-time stall, measure first and revisit the + recursion budget or the accumulator structure on evidence. +- `sql_expr.rs` gains three additive `Choice` branches and one + additive tail on `name_or_call` (§5, §6). The existing tiers + and the depth-cap discipline are unchanged. The Phase-1 tests + continue to exercise the existing branches as they stand. +- **No new walker capability** (§9). `Subgrammar`, the depth + counter, the cap, and the friendly depth error are all reused + unchanged — the same posture ADR-0031 took. +- `Command::Select { sql: String }` is unchanged. The validated + source SQL is simply larger; the worker still routes it through + `Database::run_select` and `do_run_select` (Phase 1 path). +- The worker gains a post-prepare type-resolution helper that + populates `column_types` for direct-reference projection items + (§12) via the engine's column-origin metadata. **`Cargo.toml` + gains `column_metadata` to `rusqlite`'s feature list** + (alongside the existing `bundled`); this pulls in the SQLite + `SQLITE_ENABLE_COLUMN_METADATA` compile flag and exposes + `RawStatement::column_table_name` / + `column_origin_name` / `column_database_name` on the prepared + statement. Verified against the project's pinned rusqlite + 0.39.0. This is the only Phase-2 execution-path change. +- **Three diagnostic passes** (§11.7) — schema-existence + (extended), CTE/compound arity-check (new), and predicate + warnings (extended with a MatchedPath-walking variant for + `sql_expr` — §11.6). All run as final walk stages and + mutate the walker's accumulated output in place. Closes the + Phase-1 carry-over gap where SQL `WHERE` expressions emitted + no `LIKE`-on-numeric / type-mismatch / `= NULL` warnings. +- **Catalog additions** (§11.5) — five new `diagnostic.*` keys + for parse-time-detectable cases and eight new `engine.*` + keys for friendly-error layer translations of engine + messages. +- The walker's `WalkContext` gains the completion-scope + accumulators of §10 — a `from_scope_stack: Vec` + whose top frame is the active `from_scope` / `cte_bindings` / + `projection_aliases`. A **new node variant `Node::Scoped­ + Subgrammar(&Node)`** (§10.2) is the trigger for push/pop; + existing `Node::Subgrammar` is unchanged so DSL `Expr` and + `sql_expr` recursion are unaffected. A post-walk fixup pass + re-resolves projection-list identifier highlighting and + validity once the final `from_scope` is known (§10.6). CTE + output columns are derived from the body's projection list at + body-frame exit, populating the binding back into the outer + frame (§10.3) — so `SELECT *` and explicit-projection CTE + bodies both yield real column completion past `cte_alias.|`. + This **softens §9's "no new walker capability" claim** for + completion scope; grammar recursion still needs nothing new. +- `__rdbms_*` rejection extends to **every** table-source slot + introduced by Phase 2: the `FROM` table, each `JOIN`'s table, + each CTE name, and the `FROM` table inside any CTE body + (§4, §6). The `reject_internal_table` validator is reused. +- Completion gains: SQL keywords for joins / set ops / `WITH` / + `GROUP` / `HAVING` / `OFFSET` (all walker-derived, no + bespoke code); column completion scoped to a qualified prefix + `t.` resolves through the active `SchemaCache` (§5). +- Phase-1 autonomous decisions §4.1 and §4.3–§4.4 stand (optional + `FROM`, `help_id: None`, walker-mode defaults). §4.2 is lifted + (bare-alias projection admitted, §1). §4.5 is partially lifted + (bare bool column refs recover their type via §12). +- `requirements.md`'s `Q1` / `Q2` advance further; `Q4` was + already ticked by ADR-0030 and ADR-0031. + +## Implementation notes + +A build order, each step guarded by the test suite. The phases +within Phase 2 mirror the ADR-0030 / ADR-0031 staging — grammar +first, execution-path change last. + +**Detailed plan: `docs/plans/20260520-adr-0032-phase-2.md`.** +The notes below are the outline; the plan refines them into +seven sub-phases (2a–2g) with per-gate exit criteria, a +cross-cut verification matrix that explicitly tests every +"X comes for free" claim from ADR-0030/0031/0032 (the kind of +implicit claim that produced the Phase-1 SQL-expression +predicate-warning gap §11.6 closes), and a final phase-exit +verification report template. Implementers work through the +plan; the ADR remains the decisions. + +1. **The `sql_select.rs` grammar fragment.** Author the + stratified tiers of §1 as named `static` `Node`s, recursion + via `Subgrammar`. Export `SQL_SELECT_STATEMENT` and + `SQL_SELECT_COMPOUND`. The existing `data::SELECT` + `CommandNode` is rebuilt against `SQL_SELECT_STATEMENT`. +2. **Unit tests** against the fragment directly (the + `expr.rs` / `sql_expr.rs` test pattern): JOIN flavours, + GROUP BY / HAVING, qualified refs, every set-op, recursive + and non-recursive CTEs, `LIMIT … OFFSET`, `DISTINCT`, + `t.*` projection, the bare-alias projection, plus the + keyword-case-insensitivity check. +3. **`sql_expr.rs` additive extensions** (§5, §6): the + qualified-ref tail on `name_or_call`; the scalar-subquery + `primary` branch; the `IN (subquery)` predicate-tail branch; + the `EXISTS (subquery)` `primary` branch. Unit tests for each. +4. **Integration tests** (the `tests/` Tier-3 path, building on + Phase 1's SQL `SELECT` tests): each JOIN flavour returns the + expected rows; GROUP BY / HAVING aggregates over real data; + `UNION` / `INTERSECT` / `EXCEPT` between two SELECTs; a + non-recursive CTE; a recursive CTE (a small tree traversal + or generated-sequence example); a scalar subquery in + `WHERE`; `IN (SELECT …)`; `EXISTS (…)`; qualified refs + resolving correctly. +5. **The `WalkContext` scope accumulators** (§10). Add the + `ScopeFrame` type (`from_scope` / `cte_bindings` / + `projection_aliases`) and the `from_scope_stack`; add the + `Node::ScopedSubgrammar(&Node)` variant alongside the + existing `Node::Subgrammar`; teach the driver to push/pop a + fresh frame on `ScopedSubgrammar` entry/exit; rewrite every + reference to `&SQL_SELECT_COMPOUND` from outside its own + definition to use the new variant (subqueries in + `sql_expr.rs`, CTE bodies in `sql_select.rs`); teach + `from_clause` / `join_clause` to populate the frame's + `from_scope`; teach `with_clause` to push placeholder CTE + bindings before the body and harvest derived output columns + on body-exit per §10.3; teach `projection_item` to append to + `projection_aliases`. Keep `current_table` / + `current_table_columns` as derived helpers (top frame's + single-binding view) so the DSL paths stay green. +6. **Qualified-prefix completion** (§10.5). When the + matched-path immediately preceding an `IdentSource::Columns` + slot ends with `Ident '.'`, narrow candidates to the named + binding's columns. Unit tests: `select t.` Tab offers + `t`'s columns; an unresolved prefix returns an empty list. +7. **Post-walk fixup pass** (§10.6). Collect projection-list + `Ident` terminals during the walk; after the walk, re-resolve + each against the final `from_scope`, rewriting the highlight + class and validity diagnostic. Tests: typing `select col1 + from t` lights `col1` correctly once `t` is typed; typing + `select bogus from t` produces a column-not-found diagnostic. +8. **Diagnostic passes** (§11). Extend the schema-existence + ERROR pass to walk every `from_scope` binding plus + `cte_bindings`; add the qualified-reference and ambiguity + checks (§11.2). Add the new arity-check ERROR pass at the + CTE-body and compound-query frame-exit hooks (§11.7 case + 2). Extend the predicate-warnings pass with a + MatchedPath-walking variant covering every Phase-2 + `sql_expr` slot (§11.6) — closes the Phase-1 carry-over + gap. Author the five new `diagnostic.*` catalog keys and + the eight new `engine.*` translation keys (§11.5). + Tests: one positive and one negative case per new ERROR + key; predicate warnings firing on `select * from t where + col like 5` (the Phase-1 gap closure); arity-mismatch + ERRORs on a CTE and on a `UNION`. +9. **Result-column type resolution** (§12). Add + `"column_metadata"` to rusqlite's feature list in + `Cargo.toml`. The worker's `do_run_select` calls the new + resolver — `RawStatement::column_table_name` / + `column_origin_name` per result column — before constructing + the `DataResult`. Tests: a single-column SELECT recovers the + playground type (covering each of the ten types, the + pedagogically important one being `bool` → `true` / `false`); + a SELECT with a computed projection keeps it typeless; a + SELECT through a CTE recovers the underlying column's type + if the engine's column-origin metadata follows through the + CTE (verified, not assumed). +10. **Highlighting / completion / hint** spot-checks via the + typing-surface matrix (ADR-0022 / ADR-0030 §8): a SELECT + with a JOIN highlights the JOIN keywords; Tab past + `select t.` offers columns of `t`; column completion + inside a `WHERE` after `from a join b on …` offers both + `a`'s and `b`'s columns; column completion inside a + correlated subquery sees the outer scope; the `[ERR]` + indicator fires on a malformed subquery; an out-of-subset + construct (e.g. `OVER (…)`) produces an engine-neutral + parse error. +11. **`reject_internal_table`** spot-checks against every new + table-source slot: a `FROM __rdbms_columns` parse-rejects; + a `WITH __rdbms_x AS (…)` parse-rejects; a `FROM` inside a + CTE body referencing `__rdbms_*` parse-rejects. + +Later phases continue ADR-0030's plan unchanged — Phase 3 (DML), +Phase 4 (DDL), Phase 5 (DSL → SQL echo), Phase 6 (polish). +ADR-0030 §13 OOS items (window functions, `LATERAL`, function +allowlist, quoted identifiers) remain tracked separately and are +authored if and when they are taken up; they are not implicit +follow-ups of Phase 2. + +## See also + +- ADR-0005 — the ten-type vocabulary §10 resolves back to. +- ADR-0016 — the data-table renderer SELECT results reuse. +- ADR-0019 — the friendly-error layer engine-side rejections + route through (§7). +- ADR-0021 — per-command parse-error usage; the Phase-2 surface + inherits the framework, Phase 6 polishes per-clause messages + (§11 OOS-12). +- ADR-0022 — ambient typing assistance. §5/§6/§8 inherit its + keyword-completion / highlighting / hint mechanisms for free, + but §10 extends its `IdentSource::Columns` / `SchemaCache` / + `WalkContext` infrastructure with the scope accumulators, + qualified-prefix narrowing, and the post-walk fixup pass that + Phase 2 needs. +- ADR-0023 / ADR-0024 — the unified grammar tree Phase 2 extends. +- ADR-0026 — the `WHERE` grammar's `Subgrammar` node, depth + counter, and `MAX_SUBGRAMMAR_DEPTH = 64` cap, all reused + unchanged (§9). +- ADR-0027 — the validity indicator, free for the Phase-2 + surface; §1 (ERROR/WARNING guideline) is the source quoted + verbatim in §11; Amendment 1 (`LIKE`-on-numeric WARNING) is + the one that the SQL-expression predicate-warnings gap of + §11.6 closes for the SQL surface. +- ADR-0028 — the styled `OutputLine` mechanism the renderer + uses; not directly touched by Phase 2. +- ADR-0030 — the parent ADR; §3 commissions this phase, §4/§6 + fix execution-as-text, §7 fixes engine neutrality, §11 fixes + history / replay, §13 fixes the long-running OOS list. +- ADR-0031 — the SQL expression grammar this ADR extends + additively (§5, §6); §7 named the two extensions implemented + here. +- `docs/simple-mode-limitations.md` — the DSL limits advanced + mode lifts; Phase 2 lifts the JOIN, subquery, set-op, CTE, + and grouping limits. diff --git a/docs/adr/README.md b/docs/adr/README.md index be5a36c..c18a213 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -37,3 +37,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`) - [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`) - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2 +- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot diff --git a/docs/plans/20260520-adr-0032-phase-2.md b/docs/plans/20260520-adr-0032-phase-2.md new file mode 100644 index 0000000..760008e --- /dev/null +++ b/docs/plans/20260520-adr-0032-phase-2.md @@ -0,0 +1,881 @@ +# Implementation plan — ADR-0032 (ADR-0030 Phase 2) + +**Plan author date:** 2026-05-20 +**Driving ADR:** [ADR-0032 — The full SQL `SELECT` grammar](../adr/0032-sql-select-grammar.md) +**Parent ADR:** [ADR-0030 — Advanced mode: the standard-SQL surface](../adr/0030-advanced-mode-sql-surface.md) + +## Purpose + +ADR-0032 is the *decision*: what Phase 2 covers, how it composes, +and what consequences fall out. This plan is the *mechanics*: +the build order, the per-step exit gates, the cross-cut +verifications that prevent the kind of silent gap Phase 1 +shipped (the SQL-expression predicate-warning gap closed in +ADR-0032 §11.6). + +The plan does **not** re-litigate ADR-0032's decisions. Any +finding during implementation that suggests a decision should +change is escalated to the user and resolved by ADR amendment +(or a new ADR) before code proceeds — never by silent drift. + +## Why this plan exists + +Phase 1 of ADR-0030 shipped with a structurally green test suite +(1260 passing, 0 failing) and a believed-but-unverified "ambient +assistance comes for free" claim. The verification *missed* the +fact that ADR-0027's predicate warnings — emitted by a pass +walking the DSL `Expr` AST — could not fire on SQL expressions +because `sql_expr.rs` builds no AST (ADR-0031 §2). ADR-0032 +§11.6 documents the gap and closes it; this plan exists to make +sure Phase 2 does not ship with an analogous unnoticed gap. + +The two failure modes the plan is built to prevent: + +- **The "free for free" claim shipping without a test that + proves the claim.** Every "X works for SQL automatically" in + ADR-0030/0031/0032 needs an explicit cross-cut test. +- **A sub-phase reporting green because its own scope is + covered, while a behaviour it enables for downstream code is + silently broken.** Each sub-phase's exit gate includes the + cross-cut behaviours it unlocks, not just its own internal + correctness. + +## Working-mode reminder + +Per `CLAUDE.md`'s "Working mode — solo with sub-agents by +default": this work proceeds in solo mode unless the user +explicitly invokes `/team`. The phased structure below maps to +solo-mode's Phase 1 → 5 discipline within each sub-phase: + +- Requirements extraction (the sub-phase's scope below) is + Phase 1. +- Divergent exploration is rarely needed for individual + sub-phases here (ADR-0032 already selected the approach), but + if implementation reveals an ambiguity, it lands as a Phase-2 + exploration before code proceeds. +- The DA hat is worn at every sub-phase gate. The DA critique is + written, not silent (CLAUDE.md "Devil's Advocate + discipline"). +- Escalation rather than guessing applies throughout. + +## Standing authorizations and continuation policy + +The user has granted standing authorizations for this plan, +overriding the corresponding `CLAUDE.md` defaults *for this +plan's scope only*. They are recorded here verbatim so any +implementer (the original session, a future session, or a fresh +agent) operates on the same terms. + +### 1. Walk the plan without interruption when no interruption is needed + +The plan's sub-phases (2a–2g) have explicit, written exit gates. +Within a sub-phase, the implementer proceeds through the build +steps and verifies the gate without asking for permission at +each step. Between sub-phases, the implementer proceeds to the +next one without asking, *provided* the previous gate is green. + +The implementer does NOT interrupt to: + +- Confirm "should I continue to the next sub-phase?" +- Confirm minor implementation choices that match surrounding + code (`CLAUDE.md`'s code-style discipline applies; pick the + consistent option and move on). +- Confirm test naming, file organisation, commit granularity + within a sub-phase. +- Report progress mid-step. A status update arrives at gate + boundaries — not every five minutes. + +### 2. When the implementer MUST escalate + +The escalation rule from `CLAUDE.md` ("Escalate ambiguity — do +not decide for the user") is **not** relaxed. Standing +authorization to walk uninterrupted is *not* authorization to +guess on design questions. The implementer escalates if: + +- An exit gate cannot be made green by work in the current + sub-phase's scope — a test fails for a reason that suggests + the ADR is wrong, or a behaviour the gate requires turns out + to need machinery not in ADR-0032. +- An ambiguity arises that the ADR and this plan do not settle. + See the "Open questions to escalate before code starts" + section at the end of this plan — those are pre-identified. + Anything similar that surfaces is also escalation territory. +- An autonomous design decision is required (a choice outside + ADR-0032 §§1–13's settled list and the plan's per-sub-phase + scope). Per `CLAUDE.md`'s "Out of scope and non-blocking + require user confirmation": the default assumption is that + the work is in scope and is blocking unless the user has + said otherwise. +- A "real blocker" is encountered — environment, dependency, + or test-infrastructure failure that prevents progress and + cannot be resolved within the sub-phase. (A flaky test is + not a real blocker — investigate per `CLAUDE.md`'s "no + excuses" rule on test infrastructure.) + +### 3. Commits are pre-authorized at standard granularity + +The user has explicitly pre-authorized commits within this +plan's scope, overriding `CLAUDE.md`'s "All commits MUST be +confirmed by the user first" for this plan. The implementer: + +- **Commits at natural break points within each sub-phase** and + always at sub-phase gate boundaries. Granular is better than + monolithic. +- **Writes detailed commit messages** in the project's existing + style — `area: short subject` first line, then a body + explaining what changed and why. Recent precedent: + ``` + grammar: SQL SELECT end-to-end (ADR-0030 Phase 1) + tests: Phase 1 SQL SELECT integration tests + app: mode-threaded completion, overlay, and validity indicator + ``` + Body should reference the ADR section being implemented + (e.g., "ADR-0032 §10.2 — Node::ScopedSubgrammar variant + added") so a reader can trace each commit back to a decision. +- **Includes NO AI attribution** of any form (`Co-Authored-By`, + generated-by lines, AI footers, …). This is a global rule + (`CLAUDE.md`); the standing authorization here is for + *committing*, not for relaxing attribution discipline. + +### 4. Pushes remain a user step + +`CLAUDE.md`'s "Push is a user step — agents cannot and shall +never do it" is **not** overridden. The implementer commits +freely; the user pushes when they choose. Unpushed commits are +a normal working state, and the implementer never describes +work as blocked on a push. + +### 5. End-of-plan handoff + +At the end of 2g, after the verification report is produced and +the DA's final PASS verdict is recorded, the implementer +**stops and surfaces a summary to the user** — exact test +counts, the filled cross-cut matrix, the requirements-to-test +mapping, any autonomous decisions made and their justification. +The user confirms acceptance before the plan is considered +done. (This is the one explicit hand-back point.) + +If the user wants follow-up work — adjustments, polish, +additional tests — the implementer addresses it and re-runs the +gate before re-surfacing. The plan does not declare itself +complete; the user does. + +## Sub-phase overview + +| Sub-phase | Title | Test-suite gate | +|-----------|--------------------------------------|-------------------------------------------------| +| **2a** | Grammar fragment | All §1 productions accept; OOS shapes reject | +| **2b** | `ScopedSubgrammar` + scope accumulators | Subquery scope isolation; CTE self-reference visible | +| **2c** | Phase-1 grammar migration | All 7 Phase-1 SQL SELECT tests still green | +| **2d** | Diagnostics passes | Every catalog key has positive + negative tests; Phase-1 gap closure verified | +| **2e** | Completion + post-walk fixup | Per-slot completion matrix green; projection-before-FROM scenario verified | +| **2f** | Type resolution | Each of 10 playground types preserved through bare ref | +| **2g** | Integration sweep + cross-cut verification | All cross-cut tests green; final verification report | + +Each sub-phase ships independently: green tests, clippy clean, a +written DA gate review, and (per project rules) a user-confirmed +commit message before commit. + +--- + +## Sub-phase 2a — Grammar fragment + +### Scope (in) + +- Author `src/dsl/grammar/sql_select.rs` exporting `pub static + SQL_SELECT_STATEMENT: Node` and `pub static + SQL_SELECT_COMPOUND: Node`. +- Encode the full §1 BNF: `with_clause`, `compound_select`, + `select_core`, the projection list with `DISTINCT`/`ALL`/`*`/ + `t.*`/bare-alias, the `from_clause`/`join_clause` chain with + the five JOIN flavours, `where_clause`, `group_by_clause`, + `having_clause`, `order_by_clause`, `limit_clause` with + `OFFSET`. +- Wire `sql_expr::SQL_OR_EXPR` into every expression slot via + `Subgrammar`. +- Author the structural one-line hint key per slot (per ADR-0032 + Consequences "Hint-panel prose" bullet); the keys are placed + in the catalog with placeholder text. Rich teaching prose is + Phase 6. +- Encode the `reject_internal_table` validator on every new + table-source slot. + +### Scope (out — explicit) + +- The `ScopedSubgrammar` variant — that is 2b. +- The Phase-1 `data::SELECT` migration — that is 2c. +- Any new `WalkContext` field — that is 2b. + +### Build steps + +1. Stub `sql_select.rs` with the static node skeleton and + re-export through `grammar/mod.rs`. +2. Author each production tier as a named `static Node` in + loosest-to-tightest order, matching the §1 BNF. +3. For each production, add a unit test in the file (the + `expr.rs` / `sql_expr.rs` test pattern) that walks a + representative input. +4. Author the OOS reject tests (NATURAL JOIN, comma-FROM, + `LIMIT m, n`, `VALUES`, `LATERAL`, window-function `OVER`). +5. `cargo test` and `cargo clippy --all-targets -- -D warnings`. + +### Exit gate + +**Required green tests:** + +- Every JOIN flavour (`INNER`, bare `JOIN`, `LEFT [OUTER]`, + `RIGHT [OUTER]`, `FULL [OUTER]`, `CROSS`) accepts. +- Every set op (`UNION`, `UNION ALL`, `INTERSECT`, `EXCEPT`) + accepts in a compound chain. +- Both CTE forms (`WITH`, `WITH RECURSIVE`) accept; both column- + list and no-column-list shapes accept. +- `DISTINCT`/`ALL` accepts and rejects each other co-occurring. +- `*`, `t.*`, bare-alias projection, `AS alias` projection all + accept. +- Qualified column refs (`t.c`) accept in WHERE/projection/ + ORDER BY/HAVING/GROUP BY. +- `LIMIT n` and `LIMIT n OFFSET m` both accept; `LIMIT m, n` + rejects. +- Every §13 OOS-1 to OOS-7 shape rejects with a parse error + (not a silent accept). + +**Required negative tests:** absence of bare `RECURSIVE` (must +be `WITH RECURSIVE`); absence of `JOIN` after `INNER` (must be +`INNER JOIN`); etc. — the obvious parse-failure modes for each +production. + +**Test-suite count:** ≥ 50 new unit tests in `sql_select.rs` +(rough order; the actual count is dictated by production +coverage). + +**Other gates:** + +- `cargo test` total: 1260 (baseline) + ≥ 50 (new) = ≥ 1310. + Zero failures, one ignored (the unchanged doctest), zero + skipped. +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +- Does every §1 BNF production have at least one positive + test? +- Does every §13 OOS shape have at least one reject test? +- Did the implementation introduce any structure not covered by + ADR-0032 §1 or §10–§11 (an autonomous decision)? If yes, + escalate to the user before the gate is signed. +- Are the OOS rejects emitting *engine-neutral* parse errors, + or are they leaking SQLite messages? + +--- + +## Sub-phase 2b — `ScopedSubgrammar` + scope accumulators + +### Scope (in) + +- Add the new walker node variant `Node::ScopedSubgrammar(&Node)`. +- Add the `ScopeFrame` struct and `from_scope_stack: + Vec` to `WalkContext`. Add `from_scope`, + `cte_bindings`, `projection_aliases` as fields *inside* + `ScopeFrame`. +- Teach the walker driver to push/pop a fresh frame on + `ScopedSubgrammar` entry/exit. +- Rewire every Subgrammar reference to `&SQL_SELECT_COMPOUND` + in `sql_expr.rs` and `sql_select.rs` to use the new variant. +- Teach `from_clause` / `join_clause` walks to populate the + current frame's `from_scope`. +- Teach `with_clause` walks to populate the outer frame's + `cte_bindings` (placeholder push pre-body per §10.3 stage 1). +- Teach `projection_item` walks to populate the current frame's + `projection_aliases`. +- Implement §10.3 stage 2: at the CTE body's frame exit, derive + output columns from the body's projection and rewrite the + placeholder binding in the outer frame. +- Keep `current_table` / `current_table_columns` as derived + helpers (top-frame single-binding view) — verify the DSL paths + still see them correctly. + +### Scope (out — explicit) + +- Qualified-prefix completion narrowing — that is 2e. +- The post-walk fixup pass for projection identifiers — that is + 2e. +- The arity-check ERROR pass (uses the frame-exit hook from + §10.3 stage 2 but adds a new pass) — that is 2d. + +### Build steps + +1. Add the variant to `Node`; add walker driver match arm + (push, recurse, pop). +2. Add `ScopeFrame` and `from_scope_stack` to `WalkContext`; + rewrite `current_table` / `current_table_columns` accessors + to derive from the top frame. +3. Rewire Subgrammar references to ScopedSubgrammar in + `sql_select.rs` (subquery primary, IN/EXISTS tails, CTE + bodies — except for the ladder-internal recursions in + `sql_expr.rs` that stay as `Subgrammar`). +4. Teach the `from_clause` / `join_clause` walker handlers to + build and push `TableBinding`s. +5. Teach the `with_clause` handler to push the placeholder + `CteBinding` before entering the body. +6. Teach the `projection_item` handler to record aliases. +7. Implement the body-frame-exit hook that derives CTE output + columns per §10.3 stage 2 rules. +8. Unit tests for each. + +### Exit gate + +**Required green tests:** + +- An empty query through the walker still produces the + baseline output (no spurious push). +- A DSL `UPDATE T …` still sets `current_table` correctly via + the derived helper (DSL paths untouched). +- A subquery's FROM does not leak: parse `SELECT * FROM a WHERE + id IN (SELECT id FROM b)`, then probe a cursor outside the + subquery — only `a` is in `from_scope`. +- A correlated reference resolves: parse `SELECT * FROM a WHERE + EXISTS (SELECT 1 FROM b WHERE b.x = a.x)`, then probe inside + the subquery — `a` is visible via outer frame. +- A CTE name is visible inside its own body: parse `WITH r AS + (SELECT 1 UNION ALL SELECT n+1 FROM r) SELECT * FROM r`, then + probe inside the body — `r` is in `cte_bindings`. +- A CTE body's `SELECT *` derives the right output columns: + parse `WITH x AS (SELECT * FROM users) …`, verify `x`'s + derived columns match `users`' columns by name and playground + type. +- A CTE body's `SELECT a, b AS bb FROM t` derives two columns + with the right names and types (`a`: source type; `bb`: + source type since underlying is a bare ref). +- A CTE body's `SELECT a + 1 FROM t` derives one column with + `name = None`. +- A CTE column list `(c1, c2)` renames positionally; arity + mismatch is detectable at frame exit (the diagnostic is + emitted by 2d but the *detection* runs here). +- The depth cap (`MAX_SUBGRAMMAR_DEPTH = 64`) still fires on + pathological nesting — exercised through both variants. + +**Other gates:** + +- All 2a tests still green. +- All 1260 baseline tests still green (DSL paths verified + insulated). +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +- Did any DSL test regress? If yes, the `current_table` helper + is not deriving correctly — fix before gate signs. +- Is the `ScopedSubgrammar` push/pop happening on every entry + and every exit, including error paths? Read the walker + driver's match arm and verify. +- Does the CTE column derivation handle all six §10.3 rules? + Walk the rule table and find a test for each. +- Did 2b introduce any structure ADR-0032 doesn't sanction? + +--- + +## Sub-phase 2c — Phase-1 grammar migration + +### Scope (in) + +- Move the Phase-1 SQL `SELECT` grammar nodes from + `src/dsl/grammar/data.rs` into `src/dsl/grammar/sql_select.rs` + (or remove them if redundant with the new §1 productions). +- Rebuild the `data::SELECT` `CommandNode` so its body is a + reference to `SQL_SELECT_STATEMENT`. +- Confirm the seven Phase-1 SQL `SELECT` integration tests + still pass without modification. + +### Scope (out — explicit) + +- Any behaviour change. This is a refactor: the Phase-1 surface + must be preserved exactly. + +### Build steps + +1. Identify the Phase-1 SQL `SELECT` static nodes in `data.rs` + (`SELECT_PROJECTION_*`, `SELECT_WHERE_*`, `SELECT_ORDER_*`, + `SELECT_LIMIT_*` if separate, plus any sub-tree they + reference). +2. Move them into `sql_select.rs` or remove if the §1 + productions cover them. +3. Rewrite the `data::SELECT` `CommandNode` to reference + `SQL_SELECT_STATEMENT`. +4. Run the seven Phase-1 SQL `SELECT` integration tests. + +### Exit gate + +**Required green tests:** + +- All seven Phase-1 SQL `SELECT` integration tests + (`tests/sql_select_*.rs` or equivalent) pass without + modification. +- All 2a and 2b tests still green. +- All 1260 baseline tests still green. + +**Other gates:** + +- `cargo clippy --all-targets -- -D warnings` clean. +- Dead code: confirm via `cargo check` that the old `data.rs` + Phase-1 SELECT nodes are either removed or re-exported, not + orphaned. + +### DA gate (written) + +- Were any Phase-1 SELECT behaviours quietly altered? Diff the + Phase-1 integration test outputs (text and structural) — they + must be identical, not "equivalent". +- Did the migration produce dead code? Removing dead code is + fine; *leaving* it isn't. + +--- + +## Sub-phase 2d — Diagnostics passes + +### Scope (in) + +- Implement the three diagnostic passes per ADR-0032 §11.7: + - Schema-existence ERROR pass (extended for multi-binding + `from_scope`, qualified-reference resolution, ambiguity + detection). + - Arity-check ERROR pass (new, hooked to CTE-body and + compound-query frame-exits from 2b). + - Predicate-warnings pass (extended with a MatchedPath-walking + variant for `sql_expr` per §11.6). +- Author the five new `diagnostic.*` catalog keys (§11.5): + `unknown_qualifier`, `ambiguous_column`, + `projection_alias_misplaced`, `cte_arity_mismatch`, + `compound_arity_mismatch`. +- Author the eight new `engine.*` catalog keys for friendly- + error translations (§11.5). + +### Scope (out — explicit) + +- The post-walk projection-list fixup pass (§10.6) — that is 2e. +- Detection of engine-rejected cases (§11.4) — those stay + engine-side. + +### Build steps + +1. Extend the schema-existence pass to iterate every + `from_scope` binding and the active `cte_bindings`. +2. Add qualified-reference resolution: when a `name_or_call` + match has a `'.' identifier` tail, resolve the qualifier + against `from_scope` aliases first, then table names. +3. Add ambiguity detection: an unqualified column ref that + resolves into multiple bindings. +4. Add projection-alias-misplaced check: walk `sql_expr` slots + in WHERE/GROUP BY/HAVING; identifier matches against the + current frame's `projection_aliases` → ERROR. +5. Author the arity-check pass: at every `ScopedSubgrammar` + exit, if the frame is a CTE body, compare declared vs derived + column counts; if the parent context is a compound-query + chain, compare against the previous leg's arity. +6. Author the MatchedPath-walking predicate-warnings variant: + identify `LIKE` / `=`/`!=` with `NULL` / mismatched-type + comparison predicate-tails by node-name labels; emit the + corresponding catalog key with the offending operand's span. +7. Author the catalog entries in the i18n catalog file with + engine-neutral wording. +8. Author tests per §11.2 and §11.6. + +### Exit gate + +**Required green tests:** + +- **One positive and one negative test per new ERROR catalog + key** (the negative confirms the key does *not* fire on a + well-formed equivalent): + - `diagnostic.unknown_qualifier`: `SELECT t.c FROM u` (where + `t` is not in scope) fires; `SELECT u.c FROM u` doesn't. + - `diagnostic.ambiguous_column`: `SELECT id FROM a JOIN b ON + a.id = b.id` fires for the bare `id`; `SELECT a.id FROM a + JOIN b ON a.id = b.id` doesn't. + - `diagnostic.projection_alias_misplaced`: `SELECT a + b AS + x FROM t WHERE x > 0` fires; `… ORDER BY x` doesn't (alias + is bound by ORDER BY time). + - `diagnostic.cte_arity_mismatch`: `WITH x(a, b) AS (SELECT + 1) SELECT * FROM x` fires; matched arity doesn't. + - `diagnostic.compound_arity_mismatch`: `SELECT 1, 2 UNION + SELECT 1` fires; matched arity doesn't. +- **The Phase-1 gap closure** explicitly verified: `SELECT * + FROM products WHERE price LIKE 5` fires + `diagnostic.like_numeric` (the test that would have caught + the Phase-1 gap). Sister tests: `WHERE name = NULL` fires + `diagnostic.eq_null`; `WHERE age = 'old'` fires + `diagnostic.type_mismatch`. +- **Predicate warnings extend across every SQL slot**: the + same `LIKE`-on-numeric pattern fires in `HAVING`, `ON`, a + `CASE WHEN` clause, a projection item, and an `ORDER BY` + item — one test per slot. +- Engine.* keys: integration tests that drive the engine to + the error condition verify the friendly-error layer surfaces + the engine-neutral wording (one test per key, run-and-assert + shape). + +**Other gates:** + +- All 2a–2c tests still green. +- All 1260 baseline tests still green. +- Existing DSL predicate warnings (the DSL `Expr` AST variant) + still fire correctly — verify by re-running the DSL test + suite section that exercises them. +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +- Does every new catalog key have *both* a positive and a + negative test, or did any slip in with only one? +- Was the Phase-1 gap closure verified, or did the test slip + past in "well, it should work now"? +- Are the engine.* messages truly engine-neutral, or does any + one leak SQLite product/feature names? (ADR-0030 §7.) +- Did 2d cover every Phase-2 case in §11.2, or were any + silently downgraded to "engine handles"? + +--- + +## Sub-phase 2e — Completion + post-walk fixup + +### Scope (in) + +- Add qualified-prefix completion narrowing (§10.5): when the + matched path immediately preceding an `IdentSource::Columns` + slot ends with `Ident '.'`, the completer narrows candidates + to the named binding's columns. +- Implement the post-walk fixup pass for projection-list + identifiers (§10.6): collect projection `Ident` terminals + during the walk; after walk completion, re-resolve each + against the final `from_scope`; rewrite the highlight class + and validity diagnostic in the walker's accumulated output. +- The pass runs as a final stage of the walk itself (§10.6 + integration-point convention). + +### Scope (out — explicit) + +- Completion behaviour beyond column-name narrowing (keyword + completion / function names / table names) — those come for + free from the walker's expected-set machinery and need only + cross-cut verification, not new code. + +### Build steps + +1. Add a "prefix qualifier" hint to the completion API that + the walker passes when the cursor is immediately after + `Ident '.'`. +2. Extend `IdentSource::Columns` completion to honour the + qualifier hint when present. +3. Implement the projection-`Ident` recording during the walk + (append a small struct per projection identifier to a + walker-local list). +4. Implement the fixup pass that runs after the main walk: + re-resolve each recorded identifier; rewrite the highlight + run; update the diagnostics vector. +5. Tests per the exit gate. + +### Exit gate + +**Required green tests:** + +- **Qualified-prefix completion:** at the cursor in `SELECT + t.|`, candidates are `t`'s columns; at `SELECT u.|` where `u` + is not in scope, candidates are empty. +- **Alias-prefix completion:** at the cursor in `SELECT FROM a + AS x JOIN b AS y ON x.|`, candidates are `x`'s (i.e. `a`'s) + columns. +- **Qualified-prefix narrowing for CTE:** at `WITH x AS (SELECT + a, b FROM t) SELECT x.|`, candidates are `a` and `b` (the + derived columns). +- **Qualified-prefix narrowing for CTE with `SELECT *`:** at + `WITH x AS (SELECT * FROM users) SELECT x.|`, candidates are + `users`' columns (per §10.3 derivation). +- **Projection-before-FROM:** typing `SELECT col1, col2` + produces a generic-identifier highlight on `col1`/`col2` + (no `from_scope` to resolve against); typing ` FROM t` after + produces a column highlight on `col1`/`col2` if they exist + in `t`, or an unknown-identifier diagnostic if they don't — + verified within one debounce/walk cycle. +- **`ORDER BY` alias resolution:** parses `SELECT a + b AS + total FROM t ORDER BY total`; the alias resolves without a + fixup pass (single-pass walk handles it per §10.6). +- **Completion across every Phase-2 slot:** Tab at `WHERE |`, + `GROUP BY |`, `HAVING |`, `ON |`, projection `|`, and + `ORDER BY |` all offer expected candidates from the union of + `from_scope` bindings. + +**Other gates:** + +- All 2a–2d tests still green. +- All 1260 baseline tests still green. +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +- Does the projection-before-FROM verification actually exercise + the full re-walk cycle, or does it just call the fixup pass + directly? (The user-observable behaviour is the re-walk on + debounce; that is what the test must exercise.) +- Does the qualified-prefix narrowing produce empty candidates + when the qualifier doesn't resolve, or does it accidentally + fall back to the global column set? +- Did 2e introduce any priority/ordering for outer-frame + candidates? ADR-0032 §10.5 explicitly says ordering is + completion-tier polish, not specified here — if 2e added + priority, that is an autonomous decision and escalates. + +--- + +## Sub-phase 2f — Type resolution + +### Scope (in) + +- Add `"column_metadata"` to the rusqlite feature list in + `Cargo.toml` (alongside `"bundled"`). +- Implement the worker-side type-resolution helper that runs + after `prepare`, before iterating rows: for each result + column, query + `RawStatement::column_table_name(i)` / + `column_origin_name(i)`; if both return Some, look up the + playground type from `SchemaCache.columns_for_table`; otherwise + leave the slot as `None`. +- Call the helper from `do_run_select` before constructing the + `DataResult`. + +### Scope (out — explicit) + +- Any new walker / grammar work. This is a worker-side change + only. + +### Build steps + +1. Update `Cargo.toml`. `cargo build` to confirm the feature + resolves through `libsqlite3-sys` and the bundled SQLite is + rebuilt with `SQLITE_ENABLE_COLUMN_METADATA`. +2. Add the resolver method on `Database` (probably + `resolve_select_column_types(&Statement, &SchemaCache) -> + Vec>`). +3. Wire into `do_run_select`. +4. Tests per the exit gate. + +### Exit gate + +**Required green tests:** + +- **One test per playground type** (all ten: `text`, `int`, + `real`, `decimal`, `bool`, `date`, `datetime`, `blob`, + `serial`, `shortid`) — a bare-column SELECT of a column of + that type produces `column_types[0] = Some()` and + the renderer formats accordingly. +- **`bool` renders as `true`/`false`** (the pedagogically- + important case lifting Phase-1 §4.5). +- **Computed expression stays typeless:** `SELECT a + 1 FROM + t` produces `column_types[0] = None`; the renderer uses + neutral alignment. +- **Multi-column mixed test:** `SELECT name, age, is_active, + age + 1 FROM users` produces `[Some(text), Some(int), + Some(bool), None]`. +- **CTE pass-through:** `WITH x AS (SELECT name FROM users) + SELECT name FROM x` recovers `Some(text)` — this verifies + the engine's column-origin metadata follows through the CTE + in the bundled SQLite build (verified, not assumed). +- **Subquery result:** `SELECT (SELECT name FROM users WHERE + id = 1)` — the engine's column-origin metadata may or may not + follow through a scalar subquery; whichever it does, the test + asserts the actual behaviour (not the wished-for behaviour) + and an explanatory comment captures the engine's posture. +- **Unknown-table SELECT:** `SELECT * FROM nonexistent` — the + resolver does not panic; the parse-time ERROR already fired + in 2d. + +**Other gates:** + +- All 2a–2e tests still green. +- All 1260 baseline tests still green. +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +- Does the resolver handle every result-column case (named + column, expression, NULL literal, CTE pass-through) without + panicking? +- Is the rusqlite feature flag in place? `grep + column_metadata Cargo.toml`. +- Did 2f surface any cases where engine-side metadata + *doesn't* follow through (CTEs, views, subqueries)? Each + such finding must be documented as an honest limitation, + with the test capturing the actual behaviour. + +--- + +## Sub-phase 2g — Integration sweep + cross-cut verification + +### Scope (in) + +The catch-all sub-phase. Verifies the full end-to-end Phase 2 +surface and every cross-cut "X comes for free" claim. Produces +the **final phase-exit verification report** (next section). + +### Build steps + +1. Author the end-to-end integration tests (Tier 3) for + real-world-shape queries (the exit-gate list below). +2. Author or extend the cross-cut verification matrix (next + section) for every "X comes for free" claim from + ADR-0030/0031/0032. +3. Run the full test suite. Confirm 1260 baseline + all + sub-phase 2a–2f additions + the 2g integration tests. +4. Run `cargo clippy --all-targets -- -D warnings`. +5. Produce the final verification report. + +### Exit gate + +**Required end-to-end integration tests:** + +- **JOIN with WHERE:** `SELECT a.name, b.total FROM customers + a JOIN orders b ON a.id = b.customer_id WHERE a.country = + 'DE'` — runs, returns expected rows, the data-table renderer + emits a clean output. +- **Recursive CTE:** a small tree traversal (e.g., generate + integers 1..10 via `WITH RECURSIVE`) — runs, returns 10 + rows, renderer clean. +- **UNION of two SELECTs:** runs, returns the union, renderer + clean. +- **Correlated EXISTS subquery:** `SELECT name FROM customers + c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id + = c.id)` — runs, returns expected rows, renderer clean. +- **`history.log` replay:** every Phase-2 statement form + written above replays from the log faithfully (ADR-0030 §11 + unchanged). +- **`reject_internal_table`:** `FROM __rdbms_columns` parse- + rejects; `WITH __rdbms_x AS (…)` parse-rejects; a `FROM + __rdbms_*` inside a CTE body parse-rejects. + +**Cross-cut verification matrix** (next section): every claim +must have a passing test. + +**Other gates:** + +- Total test count: ≥ 1310 + (additions from 2b–2g). The + exact count depends on per-step authorship; the gate is + "every required-test row above and in the cross-cut matrix + has at least one green test", not a magic number. +- Zero failures, zero skipped (excluding the unchanged + doctest). +- `cargo clippy --all-targets -- -D warnings` clean. + +### DA gate (written) + +The final DA review, per CLAUDE.md Phase-5 verification: + +- Does every requirement on the ADR-0032 checklist have a + test? Walk the next-section matrix row by row. +- Are there any "free for free" claims from + ADR-0030/0031/0032 that have no explicit test? (The Phase-1 + gap pattern.) Each must have one. +- Did any sub-phase silently downgrade a requirement to + "later"? Phase-1 §4.1–§4.5 were autonomous decisions + flagged at handoff; any Phase-2 autonomous decision must + also be flagged with the same discipline. +- Are ALL tests green AND zero-skipped? If not, the verdict + is FAIL and 2g loops back. + +--- + +## Cross-cut verification matrix + +Each row is a claim from ADR-0030, ADR-0031, or ADR-0032 that +needs an explicit test, with the test's location. The matrix is +filled in during 2g; the gate is "every row green". + +| Claim source | Claim | Test location | Status | +|---------------------------|---------------------------------------------------------------|---------------|--------| +| ADR-0030 §8 | Syntax highlighting works for SQL keywords | (TBD) | (TBD) | +| ADR-0030 §8 | Tab completion works for SQL keywords | (TBD) | (TBD) | +| ADR-0030 §8 | Hint-panel prose appears at every SQL grammar slot | (TBD) | (TBD) | +| ADR-0030 §8 | `[ERR]`/`[WRN]` validity indicator fires for SQL | (TBD) | (TBD) | +| ADR-0030 §8 | Per-command parse-error usage fires for SQL | (TBD) | (TBD) | +| ADR-0030 §9 | Out-of-subset construct (`OVER (…)`) produces engine-neutral parse error | (TBD) | (TBD) | +| ADR-0030 §11 | Every Phase-2 SQL statement form logs to `history.log` and replays | (TBD) | (TBD) | +| ADR-0031 §5 | Highlighting works for SQL expression operators / `CASE` / function calls | (TBD) | (TBD) | +| ADR-0031 §5 | Column completion works for `IdentSource::Columns` slots in `sql_expr` | (TBD) | (TBD) | +| ADR-0031 §5 | Hint prose appears at every `sql_expr` slot | (TBD) | (TBD) | +| ADR-0032 §10.2 | DSL `Subgrammar` recursion (ADR-0026) does NOT push scope | (TBD) | (TBD) | +| ADR-0032 §10.2 | `sql_expr` ladder `Subgrammar` recursion does NOT push scope | (TBD) | (TBD) | +| ADR-0032 §10.2 | `ScopedSubgrammar` (SELECT) DOES push scope | (TBD) | (TBD) | +| ADR-0032 §10.3 | `SELECT *` CTE body derives output columns from from_scope | (TBD) | (TBD) | +| ADR-0032 §10.3 | Computed CTE projection without alias yields unnamed slot | (TBD) | (TBD) | +| ADR-0032 §10.3 | CTE `(col-list)` renames positionally | (TBD) | (TBD) | +| ADR-0032 §10.3 | Compound CTE body takes columns from first leg | (TBD) | (TBD) | +| ADR-0032 §10.3 | Recursive CTE body takes columns from non-recursive leg | (TBD) | (TBD) | +| ADR-0032 §10.6 | Projection-before-FROM re-resolves on full re-walk | (TBD) | (TBD) | +| ADR-0032 §11.6 | **Phase-1 gap: `LIKE`-on-numeric fires on SQL `WHERE`** | (TBD) | (TBD) | +| ADR-0032 §11.6 | `= NULL` fires on SQL `WHERE` | (TBD) | (TBD) | +| ADR-0032 §11.6 | Type-mismatch fires on SQL `WHERE` | (TBD) | (TBD) | +| ADR-0032 §11.6 | Predicate warnings fire on `HAVING`, `ON`, `CASE`, projection, `ORDER BY` | (TBD) | (TBD) | +| ADR-0032 §11.4 / §13 | Aggregate-in-WHERE rejected by engine, surfaced engine-neutral | (TBD) | (TBD) | +| ADR-0032 §11.4 / §13 | Non-aggregated-column-with-GROUP-BY engine-neutral | (TBD) | (TBD) | +| ADR-0032 §9 | Depth cap fires on pathological SELECT nesting (≥ 64 frames) | (TBD) | (TBD) | +| ADR-0032 §12 | Engine column-origin metadata follows through CTE | (TBD) | (TBD) | +| ADR-0032 §12 | All 10 playground types recovered via bare ref | (TBD) | (TBD) | +| ADR-0032 §13 | Every OOS shape rejects (NATURAL, USING, comma-FROM, comma-LIMIT, window, LATERAL, VALUES) | (TBD) | (TBD) | + +The implementer fills in `Test location` and `Status` (green +checkmark or red cross) as 2g proceeds. A row marked red blocks +the phase exit. + +--- + +## Final phase-exit verification report + +Produced at the end of 2g. A markdown document at +`docs/handoff/-phase-2-verification.md` (or similar). The +report contains: + +1. **Test suite totals** — passed / failed / skipped / ignored, + exact numbers from `cargo test` output. Compared against the + Phase-1 baseline (1260 / 0 / 0 / 1). +2. **The full cross-cut verification matrix** with every row + green. +3. **A requirements-to-test mapping** — every numbered decision + in ADR-0032 §§1–13 with the test(s) that prove it. +4. **A list of any autonomous decisions** made during + implementation, with rationale and explicit user-confirmation + pointer for each. (Per CLAUDE.md: an autonomous decision + without explicit user confirmation is a verdict FAIL.) +5. **The DA's written final review** — PASS or FAIL, no + "conditional" verdicts. + +Only after the report is produced and signed off does the +user-facing "Phase 2 complete" message issue. + +## Open questions to escalate before code starts + +These are matters the plan deliberately leaves to the implementer +to escalate when reached, *not* to decide silently: + +1. **Subquery type-resolution through scalar subqueries.** The + 2f exit gate notes the engine's column-origin metadata may or + may not follow through a scalar subquery. The test asserts + the actual behaviour; if the behaviour is "doesn't follow", + that becomes an honest limitation in §12 — the user is + informed, the limitation is documented. +2. **Duplicate CTE-name detection.** §11.4 leaves this as + engine-rejected (no pre-flight diagnostic). If 2g's + verification finds the engine message is confusing, escalate: + the user may want a pre-flight `diagnostic.duplicate_cte` + ERROR, which would mean a small extension to 2d. +3. **Function name completion / allowlist.** §11.4 / ADR-0030 + §13 OOS-3 leaves no allowlist. If real-world testing reveals + learners are confused by silently-admitted-then-engine-rejected + function names, escalate for an explicit ADR. + +## What this plan does NOT contain + +- Time estimates. Phase 2 is large and ADR-0032 is the first + pass through the problem; calibration is poor. Milestones, + not hours. +- Sub-phase commit granularity beyond "ship green tests at each + gate". The implementer commits at natural break points within + a sub-phase; the gate is what gets verified. +- Re-statements of ADR-0032's decisions. The plan refers to + ADR-0032 by section; the implementer reads both.