7e4bc122be
A bare table alias typed where a column is expected — `… GROUP BY o`,
with `o` aliasing `FROM Orders o` — was a blind spot: completion offered
nothing for `o`, and the hint panel called the in-scope alias an unknown
column (`no such column o on table Orders, ...`).
Completion now offers each FROM source's qualifier (alias-if-present-else
table-name) at a bare sql_expr_ident slot, folded into the column
candidate list; on an exact-qualifier partial the alias source steps
aside so the diagnostic can surface. The bare-reference diagnostic arm
emits a targeted `alias_used_as_column` / `table_used_as_column` hint
("`o` is a table alias — write `o.<column>` ...") after the
projection-alias check, so ORDER-BY alias refs still win and a genuine
unknown column still reports `unknown_column`.
Two guards keep the qualified-form advice correct: SQL only (role
`sql_expr_ident`, so the DSL `expr_column` path keeps `unknown_column`
since the DSL has no `table.column` syntax) and effective-qualifier
match (alias-if-present-else-table, so an aliased source referenced by
its shadowed real name falls through rather than being advised as
`name.<column>`). The diagnostic is a drop-in replacement for
`unknown_column` at the same span/Error severity, so verdict/overlay/hint
paths are unchanged.
ADR-0032 Amendment 3; +10 tests.
1588 lines
75 KiB
Markdown
# ADR-0032: The full SQL `SELECT` grammar

## Status

Accepted

## Context

ADR-0030 commissions advanced mode as a body of **SQL grammar
inside the unified grammar tree** (ADR-0023/0024), phased. Phase 1
("Foundations + first `SELECT`") shipped: a single-table `SELECT`
with projection, `WHERE`, `ORDER BY`, and `LIMIT`, executed as
validated SQL text through the existing data-table renderer.
ADR-0031 authored the SQL **expression grammar** the Phase-1
`SELECT` consumed.

Phase 2 — "`SELECT` — full" — is the next slice. ADR-0030 §3 lists
it: `JOIN`s, `GROUP BY` / `HAVING`, aggregates, subquery
expressions, `UNION`/`INTERSECT`/`EXCEPT`, common table
expressions, `LIMIT … OFFSET`, qualified column references.
ADR-0030 §3 also says that slices such as the full `SELECT`
grammar are "each large enough to warrant their own focused ADR
when implemented — the precedent is ADR-0026 for the `WHERE`
grammar." This is that ADR.

The architecture is fixed (ADR-0030 §1, §4, §6, §8): one walker,
grammar-as-text execution, ambient assistance for free. This ADR
fixes the **shape** of the grammar — the productions, the
recursion, the additive extensions to ADR-0031's expression
fragment, and the few execution-path implications (worker-side
column-origin lookup so result columns recover their playground
type). It deliberately does **not** revisit ADR-0030's structural
decisions; references in this ADR's text to ADR-0030 §X mean
"that decision is the controlling one."

### What ADR-0030 and ADR-0031 already fix

- **No batch parser; SQL is grammar in the unified tree.**
  Subquery recursion is a `Node::Subgrammar(&NAMED)` reference,
  exactly as the expression ladder uses it (ADR-0031 §3).
- **No AST builder for the parts that execute as text.**
  `Command::Select { sql: String }` carries the validated source;
  the worker prepares and runs it (ADR-0030 §4/§6, ADR-0031 §2).
- **The `__rdbms_*` rejection** at every table-name slot
  (ADR-0030 §6) — re-applied to every Phase-2 table-source slot
  (`FROM`, `JOIN`, CTE-name).
- **No allowlist for function names** (ADR-0030 §13 OOS-3,
  ADR-0031 §6). Aggregates (`count`, `sum`, `avg`, `min`, `max`)
  parse through the generic `name_or_call` path — the grammar is
  structurally aggregate-blind, by design.
- **No quoted identifiers** (ADR-0031 §7 OOS-3) — unchanged.
- **`MAX_SUBGRAMMAR_DEPTH = 64`** (ADR-0026) is the shared
  recursion budget across DSL `Expr`, SQL expression, and (added
  here) SQL `SELECT` recursion. No new walker capability is
  introduced (§9).

### The boundary with ADR-0031

ADR-0031 §7 named two additive extensions deferred to this ADR:

- **OOS-1: subquery expressions** — `( SELECT … )` as a `primary`,
  `IN ( SELECT … )`, `EXISTS ( SELECT … )`. Their grammar is fixed
  in §6; they are additive `Choice` branches in `sql_expr.rs`,
  recursing into the named `SELECT` fragment authored here.
- **OOS-2: qualified column references** — `t.c` / `alias.c`.
  Their grammar is fixed in §5; they are an additive tail on
  `name_or_call` in `sql_expr.rs`.

`sql_expr.rs` was shaped to receive both branches without
restructuring (ADR-0031 §7 promise). This ADR redeems that
promise; the changes there are strictly additive.

## Decision

### 1. The top-level `SELECT` grammar

The full statement decomposes into a top-level *compound query*
(set-operator chains around per-leg *core selects*), wrapped by
an optional `WITH` prefix and trailing `ORDER BY` / `LIMIT`:

```
select_statement := [ with_clause ] compound_select
compound_select  := select_core ( set_op select_core )*
                    [ order_by_clause ]
                    [ limit_clause ]
set_op           := UNION [ ALL ] | INTERSECT | EXCEPT
select_core      := SELECT [ DISTINCT | ALL ]
                    projection_list
                    [ from_clause ]
                    [ where_clause ]
                    [ group_by_clause ]
                    [ having_clause ]
with_clause      := WITH [ RECURSIVE ] cte_def
                    ( ',' cte_def )*
cte_def          := identifier [ '(' column_name_list ')' ]
                    AS '(' compound_select ')'
projection_list  := projection_item ( ',' projection_item )*
projection_item  := '*'
                  | identifier '.' '*'
                  | sql_expr [ [ AS ] identifier ]
from_clause      := FROM table_source ( join_clause )*
table_source     := identifier [ [ AS ] identifier ]
join_clause      := [ INNER ] JOIN table_source ON sql_expr
                  | LEFT [ OUTER ] JOIN table_source ON sql_expr
                  | RIGHT [ OUTER ] JOIN table_source ON sql_expr
                  | FULL [ OUTER ] JOIN table_source ON sql_expr
                  | CROSS JOIN table_source
where_clause     := WHERE sql_expr
group_by_clause  := GROUP BY sql_expr ( ',' sql_expr )*
having_clause    := HAVING sql_expr
order_by_clause  := ORDER BY order_item ( ',' order_item )*
order_item       := sql_expr [ ASC | DESC ]
limit_clause     := LIMIT sql_expr [ OFFSET sql_expr ]
```

`sql_expr` is ADR-0031's `SQL_OR_EXPR`, extended additively per
§5 and §6. `column_name_list` is `identifier ( ',' identifier )*`.

The named `static Node`s exported by the new
`src/dsl/grammar/sql_select.rs` are `SQL_SELECT_STATEMENT`
(matching the full statement) and `SQL_SELECT_COMPOUND` (the
embedded form, omitting the outer `WITH`; this is what subqueries
recurse into — see §6, §9).

Notes on specific productions:

- **`FROM` stays optional.** Phase 1's autonomous decision §4.1
  is upheld: `SELECT 1` and `SELECT upper('x')` continue to
  parse. With JOINs landing, the absence of a `FROM` simply
  means no `from_clause`/`join_clause` was matched; no extra
  shape is needed.
- **Bare-alias projection (`select a x`) is admitted.** Phase 1's
  autonomous decision §4.2 deliberately rejected it as
  structurally ambiguous. With Phase 2's grammar — `FROM` is
  the only word that can legitimately follow a projection list,
  and it is a keyword in the walker's expected-set — the
  ambiguity dissolves: an identifier following the last
  projection expression that is not `FROM`, `,`, `WHERE`,
  `GROUP`, `ORDER`, `LIMIT`, or a set-op keyword is a bare
  alias, and is so admitted. This lifts a small but visible
  Phase-1 limitation.
- **`SELECT [ DISTINCT | ALL ]`.** `ALL` is the default and is
  admitted for symmetry; `DISTINCT` is the meaningful case. They
  are mutually exclusive at this position (a `Choice`, not two
  `Optional`s).
- **`identifier '.' '*'`** lives only in `projection_item`, never
  in `sql_expr`. This is intentional: `t.*` is *projection
  syntax*, not an expression, and admitting it as an expression
  primary would let it appear in `WHERE` / `ORDER BY` / etc.
  where the engine would reject it and the engine-neutral error
  would be hard to phrase. The grammar simply refuses it
  structurally outside projection.
- **`UNION ALL` is a single set-op,** not `UNION` followed by an
  `ALL` modifier on the next leg. `set_op` is a `Choice` of the
  four atoms (with `UNION` and `UNION ALL` as separate branches);
  factoring `UNION [ ALL ]` is also valid but the explicit four
  branches keep the matched-path classes cleaner for
  highlighting.

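The four-branch `set_op` choice described in the last note can be sketched as follows. This is a minimal illustration only: it does not use the project's `Node` tree, `SetOp` and `match_set_op` are hypothetical names, and the input is assumed to be pre-split, case-insensitive words.

```rust
/// Hypothetical stand-in for the four `set_op` branches; not the project's types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum SetOp {
    Union,
    UnionAll,
    Intersect,
    Except,
}

/// Try to match one set operator at the front of `tokens`.
/// Returns the operator and the number of tokens it consumed.
fn match_set_op(tokens: &[&str]) -> Option<(SetOp, usize)> {
    let first = tokens.first()?.to_ascii_uppercase();
    let second = tokens.get(1).map(|t| t.to_ascii_uppercase());
    match (first.as_str(), second.as_deref()) {
        // `UNION ALL` is tried before bare `UNION`, so it matches as one atom.
        ("UNION", Some("ALL")) => Some((SetOp::UnionAll, 2)),
        ("UNION", _) => Some((SetOp::Union, 1)),
        ("INTERSECT", _) => Some((SetOp::Intersect, 1)),
        ("EXCEPT", _) => Some((SetOp::Except, 1)),
        _ => None,
    }
}
```

Because each operator is its own branch, a highlighter built on such a match could give `UNION ALL` a single class instead of stitching together two adjacent keyword matches.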
### 2. JOIN flavours admitted

The grammar admits exactly the flavours the user picked:

- `INNER JOIN` / bare `JOIN`
- `LEFT [ OUTER ] JOIN`
- `RIGHT [ OUTER ] JOIN`
- `FULL [ OUTER ] JOIN`
- `CROSS JOIN`

The first four take a mandatory `ON sql_expr`; `CROSS JOIN`
takes none. `OUTER` is the optional explicit modifier on
`LEFT` / `RIGHT` / `FULL`.

**Explicitly out (§11):** `NATURAL JOIN`, `JOIN … USING (col)`,
and comma-list `FROM t1, t2` (the legacy implicit cross join).
The first two add grammar weight for limited teaching value;
comma-FROM teaches habits we do not want to encourage —
`CROSS JOIN` covers the same shape explicitly.

JOIN chains are admitted as a flat `( join_clause )*`. Standard
SQL is left-associative; since the grammar builds no AST and the
engine receives the source text verbatim (ADR-0030 §4), the
engine resolves the associativity. The grammar's job ends at "the
chain parses".

### 3. Set operators and compound queries

`UNION`, `UNION ALL`, `INTERSECT`, `EXCEPT` all admitted —
ADR-0030 §3's full set.

The compound shape (§1) is `select_core (set_op select_core)*`,
flat. Standard SQL gives `INTERSECT` higher precedence than
`UNION` / `EXCEPT`; the engine resolves this — the grammar admits
the chain as written. This mirrors §2's JOIN-chain decision.
A user who wants explicit grouping writes
`(SELECT … INTERSECT SELECT …) UNION SELECT …`, which falls out
of the subquery-`primary` branch (§6) — though for a top-level
statement this requires an extra `SELECT` wrapping. In practice
the engine's precedence is what learners encounter; calling it
out in the `help sql` page (ADR-0030 Phase 6) is sufficient.

`ORDER BY` / `LIMIT` on a compound apply to the whole compound,
not to a leg — fixed by the position of `order_by_clause` and
`limit_clause` in §1's `compound_select`.

### 4. CTEs (`WITH` and `WITH RECURSIVE`)

The full `with_clause` per §1. Both forms admitted: non-recursive
`WITH` for naming intermediate results, and `WITH RECURSIVE` for
recursive queries (tree traversals, transitive closure,
generated sequences).

The `cte_def` body is a parenthesised `compound_select`, so the
recursion is into `SQL_SELECT_COMPOUND` via `Subgrammar` — the
same recursion mechanism subqueries use (§9).

**CTE-name collisions.** A CTE name shares the table-name
namespace at the engine. Standard SQL: the CTE shadows a
same-named base table within the statement. The grammar is
agnostic — both are identifiers in a table-source slot — so the
shadowing falls out of engine resolution. The
`reject_internal_table` validator still rejects any `__rdbms_*`
identifier in any table-source slot, **including** CTE-name
slots and the `FROM`s inside CTE bodies. That is the right
posture: the reserved namespace is reserved everywhere.

Recursive CTEs use the standard `cte_name AS ( base_case UNION
[ALL] recursive_case )` shape — already admitted by §1's
`compound_select` body. No grammar branch specific to recursion
is needed; the `RECURSIVE` keyword is a hint to the engine, not
a grammar gate.

### 5. Qualified column references

Additive extension to `sql_expr.rs` (ADR-0031 §7 OOS-2).
`name_or_call`'s identifier prefix gains a `Choice` tail:

```
name_or_call := identifier
                ( '.' identifier
                | '(' call_args? ')'
                )?
```

The leading identifier is matched once (preserving ADR-0031 §1's
factoring — no `Choice` branch begins with an identifier). The
optional tail is *either* a qualified-reference suffix
(`. identifier`) *or* a function-call argument list (`( … )`),
not both. A bare identifier with no tail remains a plain column
reference.

A function call with a qualified name — `schema.f(…)` — is not in
scope (we have no schemas) and is structurally inadmissible by
construction: there is no production that admits both a `.`-tail
and a `(`-tail.

Completion for the qualified form: when the cursor is past
`identifier '.'`, the completion source is "columns of the table
or alias named by the leading identifier", resolved from the
active `SchemaCache` (the same source the DSL completion uses,
ADR-0030 §8). This is a small extension to the existing
`IdentSource::Columns` machinery — when in scope, column
completion is scoped to the named source.

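The either-or tail on `name_or_call` can be pictured with a small classifier. This is a sketch under stated assumptions, not the grammar's real representation: `NameOrCall` and `classify` are hypothetical names, tokens are assumed to be pre-split, and argument lists are not parsed.

```rust
/// Hypothetical classification of a `name_or_call` match; illustrative only.
#[derive(Debug, PartialEq)]
enum NameOrCall<'a> {
    Column(&'a str),
    Qualified { qualifier: &'a str, column: &'a str },
    Call { name: &'a str },
}

/// Classify the tokens at the start of `tokens`, assuming the first token is
/// already known to be an identifier. The tail is either `.` + identifier or
/// an argument list opened by `(`, never both.
fn classify<'a>(tokens: &[&'a str]) -> Option<NameOrCall<'a>> {
    let name = *tokens.first()?;
    match tokens.get(1).copied() {
        Some(".") => {
            // A dangling `name .` is incomplete input, so nothing is classified.
            let column = *tokens.get(2)?;
            Some(NameOrCall::Qualified { qualifier: name, column })
        }
        Some("(") => Some(NameOrCall::Call { name }),
        _ => Some(NameOrCall::Column(name)),
    }
}
```

Since the identifier is consumed before the tail is inspected, there is no way for this shape to yield a qualified call such as `schema.f(…)`, which matches the structural claim above.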
### 6. Subquery expressions

Additive extensions to `sql_expr.rs` (ADR-0031 §7 OOS-1):

- **Scalar subquery as `primary`.** A `Choice` branch
  `'(' compound_select ')'`. The existing `'(' or_expr ')'`
  branch handles parenthesised expressions. Both start with
  `'('`, so per ADR-0031 §1's factoring principle, the `'('` is
  matched once and the inside is a `Choice` between
  `compound_select` and `or_expr`. The first inside token
  disambiguates: `SELECT` or `WITH` → subquery; anything else →
  expression. The two `Choice` branches have non-overlapping
  first-token sets, so the walker's expected-set at the
  ambiguity point merges naturally without `Optional`-first
  hazards.

- **`IN ( subquery )`.** The existing `predicate_tail`'s
  `IN '(' additive (',' additive)* ')'` branch gains a sibling
  `IN '(' compound_select ')'`. Same `'('` factoring as the
  scalar case: after `'('`, branch on `SELECT`/`WITH` (subquery)
  vs additive-first-token (literal list). `NOT IN` follows from
  the existing `[ NOT ]` factoring on the predicate tail.

- **`[ NOT ] EXISTS ( subquery )`.** Added as a `primary`
  `Choice` branch:

  ```
  primary := … | EXISTS '(' compound_select ')'
  ```

  The bare `EXISTS` form lives in `primary`; `NOT EXISTS` falls
  out of the existing `not_expr := NOT not_expr` tier above
  `primary` in the precedence ladder. This is structurally
  cleaner than putting `[ NOT ] EXISTS` inside `primary`: there
  is only one place `NOT` is admitted, and it composes uniformly.

All three branches recurse through `Subgrammar(&SQL_SELECT_COMPOUND)`.
Correlated subqueries fall out for free — a subquery's
`sql_expr` reaches identifiers, which the engine resolves
against outer scopes. The grammar imposes no correlation
constraint; correlation is engine-side semantics.

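The first-token disambiguation after a shared `'('` is small enough to sketch directly. The names `ParenBody` and `paren_body` are hypothetical, and the sketch assumes the opening parenthesis has already been matched and the next token is available as a string.

```rust
/// What follows an already-matched `(` in expression position (hypothetical names).
#[derive(Debug, PartialEq)]
enum ParenBody {
    Subquery,
    Expression,
}

/// Decide which branch to take by looking only at the first token after `(`.
/// `SELECT` or `WITH` selects the subquery branch; anything else is an expression.
fn paren_body(first_inside: &str) -> ParenBody {
    if first_inside.eq_ignore_ascii_case("select") || first_inside.eq_ignore_ascii_case("with") {
        ParenBody::Subquery
    } else {
        ParenBody::Expression
    }
}
```

The two outcomes never share a first token, which is why one token of lookahead is enough at this point.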
### 7. `GROUP BY` and `HAVING`

`GROUP BY` takes a comma-separated list of `sql_expr`s.
Standard SQL admits any expression as a grouping key (not just
bare columns) — e.g. `GROUP BY date(created_at)`. The grammar
admits this without special-casing.

`HAVING` is a single `sql_expr`. Its semantics is "boolean over
grouped rows"; the grammar does not enforce that — the
expression's typing is the engine's concern.

**Aggregate correctness is not grammar-checked.** Whether a
projection's non-aggregated columns are valid given the
`GROUP BY` keys is a semantic question. ADR-0030 §9 settled this:
the grammar admits structurally, the engine rejects semantically,
and the friendly-error layer renders engine-neutral wording
(ADR-0019). A learner who writes `SELECT Name, COUNT(*) FROM t`
sees an engine-neutral "Name must appear in a GROUP BY clause or
be wrapped in an aggregate function"-style message, not a raw
engine string and not a parse error. This is the project's
honest limitation (ADR-0030 §7) and remains so.

### 8. `LIMIT` / `OFFSET` and `ORDER BY` extras

`LIMIT n [ OFFSET m ]` — the standard form. Both `n` and `m` are
`sql_expr`s (in practice integer literals, but the grammar
admits the general form so e.g. `LIMIT max(10, x) OFFSET 0` is
structurally accepted; the engine constrains values).

The MySQL/SQLite legacy comma form `LIMIT m, n` is **out** (§11).
Its argument order (offset first, then count) inverts the
keyword form — a needless source of confusion.

`ORDER BY` already admits `sql_expr` items with optional
`ASC` / `DESC` (Phase 1). With Phase 2:

- **Column-position references** (`ORDER BY 1, 3 DESC`) fall out
  for free — an integer literal is a valid `sql_expr`, and the
  engine interprets a bare positive integer in `ORDER BY` as a
  column position. The grammar does not distinguish the case;
  rendering interprets the position. Document in `help sql`.
- **Qualified refs** in `ORDER BY` (e.g. `ORDER BY t.c`) fall
  out of §5 — the grammar uses the same `sql_expr` body.

### 9. Recursion, the depth budget, and the walker

`SELECT` recurses into itself at four points:

- A subquery `primary` in `sql_expr` (§6).
- An `IN ( subquery )` predicate tail (§6).
- An `EXISTS ( subquery )` primary (§6).
- A CTE body (§4).

Every recursion is wired through
`Node::Subgrammar(&SQL_SELECT_COMPOUND)` — the named `static` Node
exported by `sql_select.rs`. The recursion is token-guarded in
every case: a subquery `primary` is preceded by `'('`; an
`IN ( subquery )` by `IN (`; an `EXISTS ( subquery )` by
`EXISTS (`; a CTE body by `AS (`. There is no left recursion;
the walker always makes progress.

`MAX_SUBGRAMMAR_DEPTH = 64` (ADR-0026, reused by ADR-0031) is
**shared**: DSL `Expr` recursion, SQL expression recursion, and
SQL `SELECT` recursion all increment the same
`WalkContext::subgrammar_depth`. A worst-case learner query
might be `SELECT … WHERE id IN (SELECT … WHERE id IN (SELECT …))`
with each inner select carrying a few-deep expression — well
below the cap. The cap remains purely a stack-overflow guard;
**this ADR does not raise it**. If pathological-but-realistic
learner queries reach 64 in practice, a focused ADR lifts it
with measurements. Speculative raising would weaken the guard
without evidence.

**No new walker capability is introduced.** `Subgrammar`, the
depth counter, the cap, and the friendly depth-exceeded error
all carry over from ADR-0026 unchanged — the same posture
ADR-0031 took. This is a non-trivial property: Phase 2 is the
biggest single grammar slice in the project, and it lands
without changing the walker's contract.

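A shared depth counter with a hard cap can be sketched as below. Only `MAX_SUBGRAMMAR_DEPTH` and the `subgrammar_depth` field name come from the text; the trimmed `WalkContext`, `enter_subgrammar`, `nest`, the error wording, and the choice to allow exactly 64 nested entries are assumptions made for illustration.

```rust
const MAX_SUBGRAMMAR_DEPTH: usize = 64;

/// Hypothetical, trimmed-down walk context: only the shared depth counter.
struct WalkContext {
    subgrammar_depth: usize,
}

impl WalkContext {
    /// Run `body` one level deeper, refusing to exceed the cap.
    /// The counter is restored whether or not `body` succeeds.
    fn enter_subgrammar<T>(
        &mut self,
        body: impl FnOnce(&mut WalkContext) -> Result<T, String>,
    ) -> Result<T, String> {
        if self.subgrammar_depth >= MAX_SUBGRAMMAR_DEPTH {
            return Err(format!("nesting deeper than {} levels", MAX_SUBGRAMMAR_DEPTH));
        }
        self.subgrammar_depth += 1;
        let result = body(&mut *self);
        self.subgrammar_depth -= 1;
        result
    }
}

/// Recurse `levels` times through the guard; stands in for nested subqueries.
/// Returns the depth observed at the innermost level.
fn nest(ctx: &mut WalkContext, levels: usize) -> Result<usize, String> {
    if levels == 0 {
        return Ok(ctx.subgrammar_depth);
    }
    ctx.enter_subgrammar(|inner| nest(inner, levels - 1))
}
```

Every kind of recursion goes through the same entry function in this sketch, which is what makes the budget shared: an expression nested inside a subquery spends from the same counter.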
### 10. Completion scope and the `WalkContext` extension

ADR-0030 §8 promises that "ambient assistance comes for free"
because SQL is grammar in the unified tree. For Phase 1's
single-table `SELECT` this was substantially true: the existing
`WalkContext::current_table` mechanism (populated via the
`writes_table: true` flag on the `FROM` table-name slot) gave
`WHERE` and `ORDER BY` column-name completion against the right
table at no incremental cost.

Phase 2 breaks the "free" claim. Multiple `FROM` tables via
`JOIN`s, aliases, CTE-defined table sources, subqueries with their
own `FROM` scope, qualified `t.c` references, projection aliases
referenced in `ORDER BY` — every Phase-2 surface needs **scope
information that `WalkContext` does not currently carry**. §9's
"no new walker capability" claim holds for grammar recursion
(`Subgrammar` and the depth cap suffice); for completion scope it
is too strong, and is softened here to an honest split.

The current `WalkContext` carries one table at a time
(`current_table: Option<String>` + `current_table_columns`), set
by `writes_table: true` on a `Tables` identifier. DSL paths
(`update T`, `delete from T`, `insert into T`) rely on this
single-table contract and continue to work unchanged. Phase 2
adds layered accumulators alongside, not in place of.

#### 10.1. The from-scope accumulator

A new `WalkContext` field:

```
from_scope: Vec<TableBinding>
TableBinding { table: String, alias: Option<String>,
               columns: Vec<TableColumn> }
```

Populated incrementally as the walker descends through
`from_clause` and each `join_clause` (§1). The first table-source
slot pushes a binding; every subsequent `JOIN` pushes another.
`Ident` slots whose `IdentSource` is `Columns` now resolve against
the union of every binding's columns, with deduplication.

`current_table` / `current_table_columns` remain as derived
helpers: when `from_scope.len() == 1`, they expose that single
binding's data, preserving the contract every existing DSL path
relies on. DSL `UPDATE` / `DELETE` / `INSERT` continue to push
exactly one binding via the existing `writes_table: true`
mechanism, unchanged.

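The accumulator and its derived single-table helper might look roughly like this. The sketch simplifies `TableColumn` to a plain column name, and `push_binding` and `column_candidates` are hypothetical method names. Returning `None` from `current_table` when more than one binding is in scope is also an assumption, since the text only specifies the single-binding case.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the ADR's binding: columns are plain names here.
#[derive(Debug, Clone)]
struct TableBinding {
    table: String,
    alias: Option<String>,
    columns: Vec<String>,
}

#[derive(Default)]
struct WalkContext {
    from_scope: Vec<TableBinding>,
}

impl WalkContext {
    /// Called once per `FROM` / `JOIN` table-source slot.
    fn push_binding(&mut self, table: &str, alias: Option<&str>, columns: &[&str]) {
        self.from_scope.push(TableBinding {
            table: table.to_string(),
            alias: alias.map(str::to_string),
            columns: columns.iter().map(|c| c.to_string()).collect(),
        });
    }

    /// Union of every binding's columns; first occurrence wins, order preserved.
    fn column_candidates(&self) -> Vec<String> {
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for binding in &self.from_scope {
            for col in &binding.columns {
                if seen.insert(col.as_str()) {
                    out.push(col.clone());
                }
            }
        }
        out
    }

    /// Derived helper: only answers when exactly one binding is in scope.
    fn current_table(&self) -> Option<&str> {
        match self.from_scope.as_slice() {
            [only] => Some(only.table.as_str()),
            _ => None,
        }
    }
}
```

With one binding pushed, `current_table` behaves like the Phase-1 field; after a second `JOIN` binding it stops answering and callers fall back to the unioned candidates.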
#### 10.2. Scope-stack discipline at `Subgrammar` boundaries

Subqueries (§6) and CTE bodies (§4) introduce new lexical scopes.
A column reference inside `SELECT … WHERE id IN (SELECT id FROM
u)` resolves first against the inner `SELECT`'s `FROM` (`u`), and
— for correlation — also against the outer scope.
`subgrammar_depth` is a counter; it suffices for §9's depth cap
but not for scope.

Phase 2 layers a stack on top. A new field:

```
from_scope_stack: Vec<ScopeFrame>
ScopeFrame {
    from_scope: Vec<TableBinding>,
    cte_bindings: Vec<CteBinding>,
    projection_aliases: Vec<String>,
}
```

The new walker node variant — `Node::ScopedSubgrammar(&Node)` —
is what triggers a scope push. It is a sibling of the existing
`Node::Subgrammar(&Node)`, with the same recursion semantics
(reference-following, depth-counted) and one additional driver
behaviour: on entry, push the current `ScopeFrame` onto
`from_scope_stack` and start a fresh empty frame; on exit, pop
back. The existing `Node::Subgrammar` variant is unchanged — DSL
`Expr` recursion (ADR-0026) and the `sql_expr.rs`
precedence-ladder recursion (ADR-0031) keep using it and never
push a scope.

The grammar source spells the choice explicitly at each call
site: subqueries in `sql_expr.rs` and CTE bodies in
`sql_select.rs` reference the compound-SELECT through
`Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND)`; predicate-ladder
recursion in `sql_expr.rs` continues to use
`Node::Subgrammar(&SQL_OR_EXPR)`. Self-documenting, no flag
bookkeeping, and the walker change is localised to one extra arm
in the driver's `match` over `Node` variants.

Column-completion candidates inside a scope frame are the union
of the current frame's `from_scope` and (for correlated refs)
all outer frames; outer-frame columns are admitted as additional
candidates so correlated references work. Ordering or visual
differentiation between current-frame and outer-frame candidates
is completion-tier polish and is not specified by this ADR — the
current completion API (`candidates_at_cursor*`) returns a flat
`Vec`, and adding a priority dimension is a separate concern.
CTE bindings resolve the same way (outward-walking) — a CTE
defined in an outer query is visible inside an inner subquery as
a table source, unless the inner subquery defines a CTE of the
same name and shadows it.

This is the one explicit walker-capability extension Phase 2
makes. It is scoped: one new node variant, no new walker entry
point, no change to how `Subgrammar` bodies are entered
structurally. The depth cap (§9) applies to both variants
uniformly through the shared `subgrammar_depth` counter.

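The push-on-entry, pop-on-exit behaviour can be sketched with a reduced frame that holds only column names. `scoped`, `current`, and `from_columns` are hypothetical names, and listing current-frame columns before outer-frame ones is an assumption, since candidate ordering is left unspecified above.

```rust
/// Simplified frame: only the column names contributed by this scope's FROM.
#[derive(Default, Debug)]
struct ScopeFrame {
    from_columns: Vec<String>,
}

#[derive(Default)]
struct WalkContext {
    current: ScopeFrame,
    from_scope_stack: Vec<ScopeFrame>, // outer frames, innermost last
}

impl WalkContext {
    /// Entry/exit behaviour of the scoped variant: save the current frame,
    /// run the body in a fresh one, then restore the saved frame.
    fn scoped<T>(&mut self, body: impl FnOnce(&mut WalkContext) -> T) -> T {
        let outer = std::mem::take(&mut self.current);
        self.from_scope_stack.push(outer);
        let result = body(&mut *self);
        self.current = self.from_scope_stack.pop().expect("frame pushed above");
        result
    }

    /// Current-frame columns first, then outer frames from innermost outward,
    /// so correlated references to outer columns remain available.
    fn column_candidates(&self) -> Vec<String> {
        let mut out: Vec<String> = Vec::new();
        let frames = std::iter::once(&self.current).chain(self.from_scope_stack.iter().rev());
        for frame in frames {
            for col in &frame.from_columns {
                if !out.contains(col) {
                    out.push(col.clone());
                }
            }
        }
        out
    }
}
```

In this sketch the inner frame is simply dropped on exit; a fuller version would harvest it first, as §10.3 describes for CTE bodies.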
#### 10.3. CTE bindings

A frame-local accumulator carries CTE definitions visible in the
current scope:

```
cte_bindings: Vec<CteBinding>
CteBinding {
    name: String,
    columns: Vec<CteColumn>,
}
CteColumn {
    name: Option<String>,   // None for unnamed
                            // computed projections
    type_: Option<Type>,    // resolved playground type
                            // if derivable
}
```

A CTE definition `cte_name [(col-list)] AS (compound_select)`
produces a binding in two stages:

1. **Pre-body push** (so `WITH RECURSIVE` self-references resolve).
   When the walker reaches `AS` and is about to enter the body's
   `Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND)`, it pushes a
   placeholder binding into the *outer* frame's `cte_bindings`
   with `columns = []` (an empty stand-in). The CTE name is now
   visible as a table source from inside the body.
2. **Body-finalised harvest** (when the body's scope frame
   completes). On `ScopedSubgrammar` exit, before popping the
   frame, the driver derives the body's projection-list output
   columns (rules below) and rewrites the placeholder binding in
   the outer frame.

**Output-column derivation rules.** Walking the body's
projection items:

| Projection item | Derived CTE column(s) |
|---|---|
| `*` | Every column from the body frame's `from_scope`, in order, with their resolved types |
| `t.*` (qualified wildcard) | Every column from binding `t` in the body frame's `from_scope`, with their types |
| `col` (bare ref, resolves uniquely) | One column: name = `col`, type = the resolved column's playground type |
| `t.col` (qualified ref) | One column: name = `col`, type = `t`'s column's type |
| `expr AS alias` or bare `expr alias` | One column: name = `alias`, type = the underlying type if `expr` is a single column ref, else `None` |
| `expr` (computed, no alias) | One column: name = `None`, type = `None` — engine assigns an implementation-defined name |

For compound bodies (`UNION` / `INTERSECT` / `EXCEPT`) the columns
come from the **first leg** per standard SQL. For recursive CTE
bodies (`WITH RECURSIVE`) the same rule — the non-recursive leg
dictates.

If a `(col-list)` was supplied on the CTE name, it **renames** the
derived columns positionally and overrides their names; types are
preserved from the derivation. If the column-count of `(col-list)`
disagrees with the body's projection arity, the grammar admits
this and the engine surfaces the mismatch — `do_run_select`'s
engine-neutral error layer carries the message (ADR-0030 §9,
ADR-0019).

**Completion past `cte_alias.|`.** Where the derivation produced
named columns (every form above except computed-no-alias), they
complete with their names and (where typed) participate in §11's
result-type resolution if the CTE's columns are projected
upstream. Where the derivation produced an unnamed column slot,
that slot is silently skipped from the qualified-prefix candidate
list — the user typing `cte.|` past it sees only the nameable
columns. The cure for "I want my expression to be referenceable
from outside the CTE" is to add an alias, which is the same cure
the engine itself enforces at execution time.

This is substantially better than the earlier "honest limitation"
posture: the common `SELECT *` body is fully resolvable; explicit
projections are resolvable; only un-aliased computed columns
elude us, and the right learner response there is the same as
the engine's right learner response — write an alias.

`cte_bindings` lives on the scope frame, so a CTE defined in an
outer query is visible inside an inner subquery as a table source
unless that subquery defines a CTE of the same name (which
shadows it, per standard SQL).

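The derivation table and the positional `(col-list)` rename can be sketched as follows. Everything here is a simplification under stated assumptions: `Binding`, `ProjectionItem`, `derive_columns`, and `apply_col_list` are hypothetical names, the playground `Type` is reduced to a string label, an aliased expression records its source only when it is a single bare column, and a bare reference takes the first matching column instead of checking uniqueness.

```rust
/// Body-frame binding: qualifier plus (column name, type label) pairs.
struct Binding {
    qualifier: &'static str,
    columns: Vec<(&'static str, &'static str)>,
}

enum ProjectionItem {
    Star,
    QualifiedStar(&'static str),
    Column(&'static str),
    Qualified(&'static str, &'static str),
    /// `expr AS alias`; `source` names the column when `expr` is a single bare ref.
    Aliased { alias: &'static str, source: Option<&'static str> },
    Computed,
}

#[derive(Debug, PartialEq)]
struct CteColumn {
    name: Option<String>,
    type_: Option<String>,
}

/// Look up a column's type label, optionally restricted to one qualifier.
fn column_type(scope: &[Binding], qualifier: Option<&str>, col: &str) -> Option<String> {
    scope
        .iter()
        .filter(|b| qualifier.map_or(true, |q| q == b.qualifier))
        .flat_map(|b| b.columns.iter())
        .find(|(name, _)| *name == col)
        .map(|(_, ty)| ty.to_string())
}

fn named(name: &str, type_: Option<String>) -> CteColumn {
    CteColumn { name: Some(name.to_string()), type_ }
}

/// Apply the derivation rules to each projection item, in order.
fn derive_columns(items: &[ProjectionItem], scope: &[Binding]) -> Vec<CteColumn> {
    let mut out = Vec::new();
    for item in items {
        match item {
            ProjectionItem::Star => {
                for b in scope {
                    out.extend(b.columns.iter().map(|(n, t)| named(n, Some(t.to_string()))));
                }
            }
            ProjectionItem::QualifiedStar(q) => {
                for b in scope.iter().filter(|b| b.qualifier == *q) {
                    out.extend(b.columns.iter().map(|(n, t)| named(n, Some(t.to_string()))));
                }
            }
            ProjectionItem::Column(c) => out.push(named(c, column_type(scope, None, c))),
            ProjectionItem::Qualified(q, c) => {
                out.push(named(c, column_type(scope, Some(*q), c)))
            }
            ProjectionItem::Aliased { alias, source } => {
                out.push(named(alias, source.and_then(|c| column_type(scope, None, c))))
            }
            ProjectionItem::Computed => out.push(CteColumn { name: None, type_: None }),
        }
    }
    out
}

/// A `(col-list)` renames positionally; types are kept from the derivation.
fn apply_col_list(mut cols: Vec<CteColumn>, names: &[&str]) -> Vec<CteColumn> {
    for (col, name) in cols.iter_mut().zip(names) {
        col.name = Some(name.to_string());
    }
    cols
}
```

When the `(col-list)` length and the projection arity disagree, `zip` in this sketch simply renames the overlapping prefix, leaving the mismatch for the engine to report, in line with the paragraph above.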
#### 10.4. Projection-alias bindings

Standard SQL admits `ORDER BY` referencing a SELECT-list alias:
`SELECT a + b AS total FROM t ORDER BY total`. A third frame-local
accumulator:

```
projection_aliases: Vec<String>
```

Each `projection_item`'s optional alias (whether `AS x` or bare
`x` — see §1) appends its name. `Ident` slots inside the trailing
`ORDER BY`'s `sql_expr`s offer projection aliases as additional
candidates alongside column names. This addresses §1's bare-alias
admission's completion behaviour at the same time.

The accumulator is not consulted inside `WHERE`, `GROUP BY`, or
`HAVING` — standard SQL forbids alias references there
(aliases are not yet bound at evaluation time). The grammar
admits them structurally regardless; the engine rejects; ADR-0019
renders the engine-neutral error.

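The clause-dependent use of `projection_aliases` reduces to a single conditional. The `Clause` enum and `ident_candidates` function below are hypothetical, and columns and aliases are passed in as plain string slices for brevity.

```rust
/// Which clause the identifier slot sits in (hypothetical enum).
#[derive(Clone, Copy, PartialEq)]
enum Clause {
    Where,
    GroupBy,
    Having,
    OrderBy,
}

/// Candidates for a bare identifier slot: columns always, aliases only in ORDER BY.
fn ident_candidates(clause: Clause, columns: &[&str], projection_aliases: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = columns.iter().map(|c| c.to_string()).collect();
    if clause == Clause::OrderBy {
        for alias in projection_aliases {
            // Skip an alias that merely repeats a column name already offered.
            if !out.iter().any(|c| c == alias) {
                out.push(alias.to_string());
            }
        }
    }
    out
}
```

An alias typed in `WHERE` still parses in this model; it is just never offered as a completion there.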
#### 10.5. Qualified-prefix completion

§5 fixed the grammar for `t.c` references. The completion
behaviour at qualified positions:

- At an `Ident` cursor with **no prefix**, candidates are the
  union of every `from_scope` binding's columns, plus
  `projection_aliases` when in `ORDER BY`, deduplicated. CTE-name
  candidates apply only in table-source slots, not column slots.
- At an `Ident` cursor immediately after `prefix '.'`, candidates
  are **scoped**: resolve `prefix` against the active `from_scope`
  (preferring alias matches over table matches, since aliases
  shadow), and offer that binding's columns alone. If `prefix`
  doesn't resolve to a binding, the candidate list is empty — the
  walker's expected-set still surfaces the syntactic alternatives
  (the user sees no column candidates but the structural error
  message reports the unresolved prefix).

The qualified-prefix narrowing is a small extension to the
existing `IdentSource::Columns` handling: when the matched-path
immediately preceding the `Ident` ends with `Ident '.'`, the
completer is told the prefix and narrows accordingly. This is the
only completion-source-level change; the rest is data flowing
through the new accumulators.

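Prefix resolution with alias preference can be sketched like this. `qualified_candidates` is a hypothetical name, columns are plain strings, and matching is exact and case-sensitive, which the text does not specify. The sketch follows the wording above (alias matches first, then table-name matches) and does not model any later refinement of which names count as a valid qualifier.

```rust
/// Simplified binding with plain-string columns (illustrative only).
struct TableBinding {
    table: String,
    alias: Option<String>,
    columns: Vec<String>,
}

/// Resolve `prefix` against the bindings, preferring an alias match over a
/// table-name match, and return that binding's columns (empty if unresolved).
fn qualified_candidates<'a>(from_scope: &'a [TableBinding], prefix: &str) -> &'a [String] {
    let by_alias = from_scope
        .iter()
        .find(|b| b.alias.as_deref() == Some(prefix));
    // The table-name lookup runs only when no alias matched.
    let by_table = || from_scope.iter().find(|b| b.table == prefix);
    match by_alias.or_else(by_table) {
        Some(binding) => binding.columns.as_slice(),
        None => &[],
    }
}
```

An unresolved prefix yields an empty slice, leaving it to the walker's expected-set to report the problem.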
#### 10.6. The projection-before-FROM problem

Standard SQL writes projection **before** `FROM`. A user typing
`select col1, col2 from mytable` produces, mid-typing, a state
where the projection list has been parsed but the `FROM` has not.
At that point the column-name completer cannot scope to
`mytable` — it does not know `mytable` is coming. Validation and
highlighting face the same problem: `col1` and `col2` cannot be
checked as belonging to `mytable` until the user types `from
mytable`. The debounced re-walk on every keystroke (ADR-0027) is
**not** sufficient on its own to fix this in a single-pass walker,
because by the time the FROM is parsed, the projection
identifiers have already been resolved (left-to-right) against
the only scope information available at that moment — the empty
`from_scope`.

There is no fully satisfying single-pass answer. Phase 2's
posture is therefore explicit:

1. **During-typing completion** of projection-list column names,
   when `from_scope` is empty (no `FROM` yet), uses the unioned
   `SchemaCache.columns` — every column known to the schema —
   as the candidate set. This is the same global fallback Phase 1
   uses and remains the right behaviour: a noisier-but-useful
   completion is better than no completion.
2. **A post-walk fixup pass** re-evaluates projection-list column
   refs against the *final* `from_scope` after the walk
   completes. The walk records each projection `Ident`'s
   span and matched-path location; once the walk reaches
   end-of-input (or end-of-statement), the fixup walks the recorded
   list, looks up each identifier against the final `from_scope`,
   and:
   - **Rewrites the highlight class** on that terminal —
     downgrading "column" → "unknown identifier" when the
     identifier doesn't belong to any in-scope binding,
     upgrading "unknown identifier" → "column" when it does.
   - **Updates the diagnostic** for the validity indicator
     (ADR-0027) — a column-not-found ERROR either appears or
     disappears based on the post-walk scope.

   **Integration point.** The fixup runs as the **final stage of
   the walk itself**, after all grammar nodes have been processed
   but before `WalkResult` is returned to the caller. It mutates
   the walker's accumulated highlight runs and diagnostics vector
   in place, so the consumer (the renderer, the validity
   indicator) sees a single coherent snapshot. This keeps the
   walker the single source of truth for what reaches the
   renderer — the fixup is conceptually part of "what the walker
   produces", not a separate post-processing layer. The same
   convention applies to the §11.6 SQL-expression predicate
   warnings, which also run as a final walk stage.
3. The fixup runs on every debounced re-walk (ADR-0027 already
   triggers the full walk per keystroke), so the user observes
   the following: while typing `col1, col2 from mytable`,
   `col1` / `col2` initially highlight as generic identifiers
   (with a soft warning if not found anywhere in the schema); the
   moment `mytable` is typed, the highlight snaps to the column
   class if `col1` / `col2` belong to `mytable`, or to the
   unknown-identifier diagnostic if they don't — within one
   debounce cycle.

The fixup pass does not re-parse; it only re-resolves
identifiers against the final `from_scope`.

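The fixup described in steps 2 and 3 amounts to a small in-place rewrite over the walker's output. The sketch below is illustrative only: `WalkOutput`, `RecordedIdent`, `HighlightClass`, and `fixup_projection` are hypothetical names, the final scope is flattened to a plain list of column names, and ASCII case-insensitive matching is an assumption.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HighlightClass {
    Column,
    UnknownIdentifier,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub key: &'static str,
    pub span: Range<usize>,
}

/// A projection-list identifier recorded during the forward walk.
pub struct RecordedIdent {
    pub name: String,
    pub span: Range<usize>,
}

/// The walker's accumulated output (simplified stand-in).
pub struct WalkOutput {
    pub highlights: Vec<(Range<usize>, HighlightClass)>,
    pub diagnostics: Vec<Diagnostic>,
}

/// Re-resolve each recorded identifier against the final scope and
/// rewrite its highlight class and unknown-column diagnostic in place.
/// Nothing is re-parsed; only resolution is repeated.
pub fn fixup_projection(
    out: &mut WalkOutput,
    recorded: &[RecordedIdent],
    final_scope_columns: &[&str],
) {
    for ident in recorded {
        let known = final_scope_columns
            .iter()
            .any(|c| c.eq_ignore_ascii_case(&ident.name));
        let class = if known {
            HighlightClass::Column
        } else {
            HighlightClass::UnknownIdentifier
        };

        // Upgrade or downgrade the existing highlight run for this span.
        match out.highlights.iter_mut().find(|run| run.0 == ident.span) {
            Some(run) => run.1 = class,
            None => out.highlights.push((ident.span.clone(), class)),
        }

        // Drop any stale diagnostic for the span, then re-add it if the
        // identifier is still unknown. This keeps the pass idempotent.
        out.diagnostics
            .retain(|d| !(d.key == "diagnostic.unknown_column" && d.span == ident.span));
        if !known {
            out.diagnostics.push(Diagnostic {
                key: "diagnostic.unknown_column",
                span: ident.span.clone(),
            });
        }
    }
}
```

Because the pass clears and re-emits its own diagnostics, running it on every debounced re-walk does not accumulate duplicates.
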
`ORDER BY` alias resolution needs no fixup. Projection precedes
`ORDER BY` in walk order, so `projection_aliases` is fully
populated by the time the walker reaches an `ORDER BY` `Ident`;
the alias-as-column-candidate is resolved in the single forward
pass.

This is the answer to the user's "I think this may be automatic"
intuition: the debounced re-walk is automatic; the
post-walk fixup pass is the new infrastructure that makes the
re-walk produce *correct* results. Without it, projection-list
column refs would forever validate against the global column set
even after the `FROM` is typed.

#### 10.7. The honest split

§9 still holds for **grammar recursion**: `Subgrammar` and the
depth cap are reused unchanged. For **completion scope**, this
section introduces:

- New `WalkContext` fields: `from_scope`, `from_scope_stack`,
  `cte_bindings`, `projection_aliases`.
- Scope push/pop discipline at `SQL_SELECT_COMPOUND` `Subgrammar`
  boundaries — driven by a marker on the Subgrammar target so DSL
  Subgrammars are unaffected.
- A qualified-prefix narrowing in the `IdentSource::Columns`
  completion path.
- A post-walk fixup pass for projection-list identifier
  highlighting and validity (§10.6).

These are real walker-contract extensions. They are scoped: no
new node kinds, no new walk-driver entry points, no changes to
how Subgrammar bodies are entered structurally. The existing DSL
paths are unaffected — their grammars never push a SELECT scope,
never define a CTE, never carry projection aliases — and the
single-table `current_table` / `current_table_columns` view is
preserved as a derived helper.

§9's claim is therefore restated honestly: **grammar recursion
needs no new walker capability; completion scope needs the
additions above.**

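The push/pop discipline and the derived single-table helper can be pictured as follows. This is a sketch under stated assumptions rather than the real `WalkContext`: bindings are reduced to plain names, `enter_scoped` / `exit_scoped` / `active` are hypothetical method names, the scope stack is assumed to share the `MAX_SUBGRAMMAR_DEPTH = 64` cap, and a nested frame is assumed to inherit the parent's CTE bindings.

```rust
/// One SELECT's completion scope (bindings reduced to names here).
#[derive(Debug, Default, Clone)]
pub struct ScopeFrame {
    pub from_scope: Vec<String>,
    pub cte_bindings: Vec<String>,
    pub projection_aliases: Vec<String>,
}

/// Only the scope-related part of the walk context is modelled.
#[derive(Debug, Default)]
pub struct WalkContext {
    pub from_scope_stack: Vec<ScopeFrame>,
}

pub const MAX_SUBGRAMMAR_DEPTH: usize = 64;

impl WalkContext {
    /// Push a fresh frame on entry to a nested SELECT.
    /// Assumption: CTE names visible outside stay visible inside.
    pub fn enter_scoped(&mut self) -> Result<(), &'static str> {
        if self.from_scope_stack.len() >= MAX_SUBGRAMMAR_DEPTH {
            return Err("subgrammar depth cap exceeded");
        }
        let inherited = self
            .from_scope_stack
            .last()
            .map(|f| f.cte_bindings.clone())
            .unwrap_or_default();
        self.from_scope_stack.push(ScopeFrame {
            cte_bindings: inherited,
            ..ScopeFrame::default()
        });
        Ok(())
    }

    /// Pop the frame on exit, handing it back for harvesting.
    pub fn exit_scoped(&mut self) -> Option<ScopeFrame> {
        self.from_scope_stack.pop()
    }

    /// The active frame is the top of the stack.
    pub fn active(&self) -> Option<&ScopeFrame> {
        self.from_scope_stack.last()
    }

    /// Derived single-table view: defined only when the active
    /// frame holds exactly one binding.
    pub fn current_table(&self) -> Option<&str> {
        match self.active()?.from_scope.as_slice() {
            [only] => Some(only.as_str()),
            _ => None,
        }
    }
}
```

A DSL command that never enters a scoped SELECT more than once sees the same single-binding view it had before, which is the property the derived helper is meant to preserve.
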
### 11. Diagnostics for Phase-2 validation cases

ADR-0027 fixes the warning-vs-error guideline verbatim:

> **ERROR** — the input is *known* to fail. Either it does not
> parse (incomplete, or a mismatched / invalid token), or it
> parses but names something that does not exist (an unknown
> table or column).
>
> **WARNING** — the input is valid and *will* run, but is very
> likely not what a knowledgeable user wants: a type-mismatched
> comparison, or `= NULL` (both from ADR-0026 §7). Amendment 1
> adds a third trigger — `LIKE` against a numeric column.
>
> The split is *certainty of failure* versus *likely misleading*.

This section walks the Phase-2 surface case-by-case, classifies
each against that guideline, and identifies the diagnostic
machinery additions needed. It also flags a Phase-1 carry-over
gap (§11.6) that Phase 2 closes.

#### 11.1. Existing diagnostics, briefly

Two post-walk passes today (`src/dsl/walker/mod.rs`):

- **Schema-existence pass** (ERROR). Walks the `MatchedPath`,
  checks every `IdentSource::Tables` / `IdentSource::Columns`
  ident against `SchemaCache`. Emits `diagnostic.unknown_table`
  and `diagnostic.unknown_column`. Today this assumes a single
  `current_table` for column resolution.
- **Expression predicate-warnings pass** (WARNING). Walks the
  parsed DSL `Expr` AST emitted by `expr.rs`'s builder.
  Emits `diagnostic.eq_null`, `diagnostic.type_mismatch`,
  `diagnostic.like_numeric`. Runs only on WHERE expressions in
  the DSL.

Phase 2 extends both, and §11.6 fills a SQL-side gap.

#### 11.2. Phase-2 new ERROR cases

Every case below is "known to fail on the engine" — the engine
would surface a message the friendly-error layer would translate
(ADR-0019). Surfacing them as pre-flight ERROR diagnostics gives
the learner the answer one debounce cycle faster, with the
walker as the single source of truth.

- **Unknown table in any `FROM`/`JOIN` slot.** The existing
  schema-existence pass extends from "the one
  `current_table`" to walking every `from_scope` binding's
  `table` and emitting `diagnostic.unknown_table` per
  unresolved name. CTE-name slots in the active
  `cte_bindings` are valid table sources and exempt from
  this check.
- **Unknown CTE-as-table.** A table-source slot whose name is
  not in `SchemaCache.tables` *and* not in the active
  `cte_bindings` chain emits `diagnostic.unknown_table` (same
  catalog key — from the learner's perspective the engine
  message is the same; the slot is a "table that doesn't
  exist", whether they meant a CTE or a base table).
- **Unknown table or alias in a qualified column reference**
  (`t.c` where `t` doesn't resolve in the active
  `from_scope`). New catalog key
  `diagnostic.unknown_qualifier` `{qualifier}`.
- **Unknown column in a qualified reference** (`t.c` where `t`
  resolves but `c` is not a column of that binding). Reuses
  `diagnostic.unknown_column` with the column name in context.
- **Ambiguous unqualified column reference** — a column name
  used unqualified that exists in two or more `from_scope`
  bindings. The engine raises "ambiguous column name"; we
  surface it as ERROR with a new catalog key
  `diagnostic.ambiguous_column` `{column}, {qualifiers}` so
  the learner sees which bindings the name appeared in.
- **Reference to a projection alias in `WHERE` / `GROUP BY` /
  `HAVING`.** Standard SQL forbids it (aliases are not bound
  at evaluation time). The grammar admits the identifier
  structurally; a new diagnostic pass emits ERROR with a new
  catalog key `diagnostic.projection_alias_misplaced`
  `{alias}, {clause}`.
- **CTE column-list arity mismatch.** When `cte_name (col1,
  col2, …) AS (compound_select)` declares N columns and the
  body's projection (§10.3) derives M columns with N ≠ M, the
  CTE harvest pass (§10.3 stage 2) emits ERROR with a new
  catalog key `diagnostic.cte_arity_mismatch` `{cte},
  {declared}, {actual}`.
- **Compound-query column-count mismatch.** When a `UNION` /
  `INTERSECT` / `EXCEPT` chain has legs whose projection
  arities differ, the engine errors at execution. Phase 2
  catches it pre-flight: each leg's derived arity (the same
  derivation the CTE harvest uses) is compared as the
  compound is assembled. ERROR with a new catalog key
  `diagnostic.compound_arity_mismatch` `{op}, {left_n},
  {right_n}`.
- **Internal-table reference in any new table-source slot.**
  Already a parse-time rejection via
  `reject_internal_table` (§1, §4) — surfaces as a parse
  error, not a post-walk diagnostic. Listed here for
  completeness: the catalog key `select.internal_table`
  authored in Phase 1 covers every Phase-2 slot too.

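The four column-reference cases in the list above (unknown qualifier, unknown column, ambiguous column, and the success case) can be sketched as one classification function. The names here are hypothetical (`Binding`, `ColumnCheck`, `check_column_ref`), each binding is reduced to its effective qualifier plus a column list, and ASCII case-insensitive matching is an assumption.

```rust
/// A FROM-scope binding reduced to its effective qualifier
/// (alias if present, else table name) and its columns.
#[derive(Debug, Clone)]
pub struct Binding {
    pub qualifier: String,
    pub columns: Vec<String>,
}

/// Outcome of checking one column reference; the non-`Ok` variants
/// correspond to the catalog keys named in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum ColumnCheck {
    Ok,
    /// `diagnostic.unknown_qualifier`
    UnknownQualifier { qualifier: String },
    /// `diagnostic.unknown_column`
    UnknownColumn { column: String },
    /// `diagnostic.ambiguous_column`
    AmbiguousColumn { column: String, qualifiers: Vec<String> },
}

pub fn check_column_ref(scope: &[Binding], qualifier: Option<&str>, column: &str) -> ColumnCheck {
    let has = |b: &Binding| b.columns.iter().any(|c| c.eq_ignore_ascii_case(column));
    match qualifier {
        // Qualified reference: the qualifier must resolve first,
        // then the column must belong to that one binding.
        Some(q) => match scope.iter().find(|b| b.qualifier.eq_ignore_ascii_case(q)) {
            None => ColumnCheck::UnknownQualifier { qualifier: q.to_string() },
            Some(b) if has(b) => ColumnCheck::Ok,
            Some(_) => ColumnCheck::UnknownColumn { column: column.to_string() },
        },
        // Unqualified reference: count the bindings that own the name.
        None => {
            let owners: Vec<String> = scope
                .iter()
                .filter(|b| has(*b))
                .map(|b| b.qualifier.clone())
                .collect();
            match owners.len() {
                0 => ColumnCheck::UnknownColumn { column: column.to_string() },
                1 => ColumnCheck::Ok,
                _ => ColumnCheck::AmbiguousColumn {
                    column: column.to_string(),
                    qualifiers: owners,
                },
            }
        }
    }
}
```

Returning the full list of owning qualifiers is what would let the `{qualifiers}` slot name every binding the column appeared in.
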
#### 11.3. Phase-2 new WARNING cases

The existing WARNING set (`= NULL`, type-mismatched
comparison, `LIKE`-on-numeric) is the right set. Phase-2 adds
**no new WARNING categories** — every Phase-2-specific case
falls into ERROR (§11.2) or engine-rejected (§11.4).

Considered and rejected as WARNINGs:

- **CTE name shadowing a base table.** Standard SQL behaviour;
  often intentional (the canonical "filter to a subset, then
  query as if it were the base table" pattern). No diagnostic.
- **Correlated reference without explicit qualification.**
  Correlation is implicit in standard SQL; per the user
  guideline a knowledgeable user does want this. The walker
  validates the reference silently against the outer-frame
  scope; no warning, no diagnostic.
- **Unused CTE.** A CTE defined in `WITH` but never referenced.
  The engine ignores it; many learners write CTEs as
  intermediate scratch space. Not a warning.

#### 11.4. Engine-rejected (no diagnostic)

These fail on the engine and surface via ADR-0019's
friendly-error layer at execution time. The walker does not
attempt pre-flight detection because:

- **Non-aggregated columns in projection with `GROUP BY`** —
  detecting requires knowing which function names are
  aggregates; ADR-0030 §13 OOS-3 / ADR-0031 §6 keep us
  allowlist-free.
- **Aggregate function in `WHERE`** — same reason.
- **Scalar subquery returning multiple rows** — semantic, not
  syntactic; requires execution.
- **Recursive CTE without a `UNION`** — requires inspection of
  the body's compound shape against the recursive contract;
  doable in principle, deferred as engine territory.
- **Duplicate CTE names within the same `WITH`** — checkable
  in principle (walking `cte_bindings` for duplicates), but
  the engine catches it cleanly. Phase 2 does not pre-flight
  it; could be added later if its absence proves confusing.
- **Type-mismatched JOIN ON predicates** — the existing
  expression type-mismatch warning (extended per §11.6)
  handles the explicit-literal case; arbitrary-expression
  cases require type inference and stay engine-side.

#### 11.5. Catalog additions

Phase 2 adds the following message-catalog keys (ADR-0019).
Every key is engine-neutral by construction.

Parse-time-detectable (post-walk diagnostic passes):

| Key                                     | Slots                         |
|-----------------------------------------|-------------------------------|
| `diagnostic.unknown_qualifier`          | `{qualifier}`                 |
| `diagnostic.ambiguous_column`           | `{column}, {qualifiers}`      |
| `diagnostic.projection_alias_misplaced` | `{alias}, {clause}`           |
| `diagnostic.cte_arity_mismatch`         | `{cte}, {declared}, {actual}` |
| `diagnostic.compound_arity_mismatch`    | `{op}, {left_n}, {right_n}`   |

Engine-error translations (friendly-error layer; reached on
execution failure):

| Key                                    | Engine cause |
|----------------------------------------|--------------|
| `engine.no_such_table`                 | `no such table: <name>` (post-execution path) |
| `engine.no_such_column`                | `no such column: <name>` (post-execution path) |
| `engine.ambiguous_column`              | `ambiguous column name: <name>` |
| `engine.aggregate_misuse`              | `misuse of aggregate function <name>()` |
| `engine.group_by_required`             | `column must appear in the GROUP BY clause or be used in an aggregate function` (or equivalent) |
| `engine.compound_arity_mismatch`       | `SELECTs to the left and right of UNION do not have the same number of result columns` (or equivalent) |
| `engine.scalar_subquery_too_many_rows` | scalar subquery cardinality violation |
| `engine.recursive_cte_malformed`       | recursive CTE shape errors |

The parse-time keys and the engine keys are intentionally
separate even when they describe the same situation
(`engine.ambiguous_column` mirrors
`diagnostic.ambiguous_column`) — the parse-time message can
include the learner's typed text and span; the engine-time
message catches what the parser missed and routes through the
friendly-error layer with whatever context the engine yielded.

Three pre-existing parse-time keys are reused unchanged for
Phase-2 slots: `diagnostic.unknown_table`,
`diagnostic.unknown_column`, and the Phase-1
`select.internal_table`.

#### 11.6. The Phase-1 SQL-expression predicate-warning gap

ADR-0027 Amendment 1's `LIKE`-on-numeric warning, and ADR-0026
§7's `= NULL` and type-mismatch warnings, are emitted by a pass
that walks the **DSL** `Expr` AST. Phase 1's `sql_expr.rs`
deliberately builds **no AST** (ADR-0031 §2). The consequence
is a Phase-1 carry-over gap: **SQL `WHERE` expressions today
emit none of these warnings** — `select * from t where name
like 5` parses, the engine runs it, and the learner gets the
engine's verdict, not the friendly pre-flight nudge ADR-0027
Amendment 1 promised.

Phase 2 closes this. The predicate-warnings pass gains a
**MatchedPath-walking variant** that runs over the SQL
expression nodes and identifies the predicate shapes
structurally (a `LIKE` predicate-tail with a column-ref left
operand; a `=`/`!=` predicate-tail with a `NULL` literal
operand; a comparison predicate-tail with a column-literal
operand pair of mismatched types). It does not need an `Expr`
AST because the matched-path terminals carry both the byte spans
(for the diagnostic) and the node-name labels (for shape
identification). The same catalog keys (`diagnostic.eq_null`,
`diagnostic.type_mismatch`, `diagnostic.like_numeric`) apply
unchanged; only the pass implementation is new.

The MatchedPath-walking pass runs over **every** Phase-2
`sql_expr` slot — `WHERE`, `HAVING`, `ON`, `CASE` branches,
projection items, `ORDER BY` items — so warnings surface
uniformly across the SQL surface rather than just `WHERE`. This
is a strict improvement over Phase 1's behaviour, where even
Phase-1 SELECT WHERE expressions got no predicate warnings.

Type-resolution for the MatchedPath-walking pass: a column ref's
type comes from §10's `from_scope` (or, for `t.c`, the specific
binding); a literal's type comes from its lexical class. When
the column ref doesn't resolve (the schema-existence ERROR pass
will already have flagged it), the warning pass skips the
predicate — no point compounding diagnostics on an
already-broken reference.

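Once the operands of a predicate have been picked out of the matched path, the shape test itself is a small pattern match. The sketch below assumes that extraction has already happened and models only the classification; `ValueType`, `Operand`, `Op`, and `predicate_warning` are hypothetical names, and the type lattice is deliberately reduced to number / text / NULL.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueType {
    Number,
    Text,
    Null,
}

/// A predicate operand as recovered from matched-path terminals.
/// `Column(None)` is a column ref that did not resolve in scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operand {
    Column(Option<ValueType>),
    Literal(ValueType),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Eq,
    NotEq,
    Lt,
    Gt,
    Like,
}

/// Return the catalog key of the warning this predicate triggers, if any.
pub fn predicate_warning(left: Operand, op: Op, right: Operand) -> Option<&'static str> {
    use Operand::{Column, Literal};

    // An unresolved column was already flagged by the ERROR pass; skip it.
    if matches!(left, Column(None)) || matches!(right, Column(None)) {
        return None;
    }

    match (left, op, right) {
        // LIKE with a numeric column on the left.
        (Column(Some(ValueType::Number)), Op::Like, _) => Some("diagnostic.like_numeric"),
        (_, Op::Like, _) => None,
        // `= NULL` / `!= NULL` on either side.
        (_, Op::Eq | Op::NotEq, Literal(ValueType::Null))
        | (Literal(ValueType::Null), Op::Eq | Op::NotEq, _) => Some("diagnostic.eq_null"),
        // Column compared with a non-NULL literal of a different type.
        (Column(Some(c)), _, Literal(l)) | (Literal(l), _, Column(Some(c)))
            if l != ValueType::Null && c != l =>
        {
            Some("diagnostic.type_mismatch")
        }
        _ => None,
    }
}
```

Because the function takes no clause argument, the same check applies unchanged to `WHERE`, `HAVING`, `ON`, and the other expression slots.
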
#### 11.7. Mechanism summary

Three diagnostic passes by end of Phase 2, all running as final
stages of the walk (per §10.6's integration-point convention):

1. **Schema-existence ERROR pass** — extended from single
   `current_table` to walking every `from_scope` binding and
   the active `cte_bindings`. Adds the qualified-reference
   and ambiguity checks (§11.2).
2. **Arity-check ERROR pass** (new) — runs at CTE-body and
   compound-query frame-exits (the same `ScopedSubgrammar`
   exit hook §10.3 uses), comparing declared vs derived
   column counts.
3. **Predicate-warnings pass** — extended with a
   MatchedPath-walking variant for `sql_expr` (§11.6) covering
   `= NULL`, type mismatch, and `LIKE`-on-numeric across every
   SQL expression slot, in addition to the existing DSL `Expr`
   AST variant for DSL expressions.

Per the integration-point convention (§10.6), each pass
mutates the walker's accumulated highlight runs and diagnostics
in place; the consumer sees a single coherent snapshot.

The projection-list fixup of §10.6 is conceptually part of pass
(1) — it is the same "re-resolve identifier against final
scope" operation, applied to the small subset of identifiers
whose scope wasn't fully known at first-pass walk time.

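The arity comparisons of pass 2 reduce to two short checks once each leg's column count has been derived. The sketch below assumes that derivation is done elsewhere (including any `*` expansion) and uses hypothetical function names; each function returns the pair of counts that would fill the `{declared}, {actual}` or `{left_n}, {right_n}` slots.

```rust
/// CTE column-list check: `declared` is `None` when the CTE has no
/// explicit column list, in which case there is nothing to compare.
/// Returns `(declared, actual)` on mismatch.
pub fn check_cte_arity(declared: Option<usize>, derived: usize) -> Option<(usize, usize)> {
    declared.filter(|&n| n != derived).map(|n| (n, derived))
}

/// Compound-query check: compare adjacent legs left to right and
/// report the first pair whose projection arities differ.
/// Returns `(left_n, right_n)` on mismatch.
pub fn check_compound_arity(leg_arities: &[usize]) -> Option<(usize, usize)> {
    leg_arities
        .windows(2)
        .find(|pair| pair[0] != pair[1])
        .map(|pair| (pair[0], pair[1]))
}
```

Reporting only the first mismatching pair keeps the diagnostic count at one per compound, which is a design choice of this sketch rather than something the text prescribes.
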
### 12. Result-column type resolution

Phase 1's `column_types: Vec<None>` is partially lifted: where a
projection item is structurally a single column reference, the
worker resolves it back to the source column's playground type
(ADR-0005) and populates that slot in `DataResult.column_types`.
Everything else stays `None`.

This addresses Phase-1 autonomous decision §4.5 (bool SELECT
results render as `0`/`1`): a bare `bool` column now renders as
`true` / `false` again, alignment recovers, and the `show data`
rendering path is reached for the common case.

**Resolution rule.** A projection item is "structurally a single
column reference" when, after stripping an optional `[ AS ]
alias`, its expression is one of:

- An unqualified identifier (`Name`) that resolves uniquely to a
  single column across the FROM tables;
- A qualified reference (`t.c` / `alias.c`) that resolves
  unambiguously through the FROM aliases.

Anything else — function calls, arithmetic, `CASE`, literals,
subquery expressions, the `*` and `t.*` wildcards — keeps
`column_types[i] = None`. When resolution is ambiguous
(unqualified column name appears in two FROM tables) the
grammar admits it (engine resolves or errors); the type-resolver
returns `None` and the renderer falls back to neutral alignment.

**Implementation seam.** The strongly preferred mechanism is
**engine-side column-origin lookup**: after preparing the
statement, query the prepared statement for each result column's
underlying table and column. The engine knows authoritatively
which result columns are direct references and which are
expressions; for direct references it returns the source
table+column, for expressions it returns nothing. This avoids
re-parsing the source or adding structured projection-item data
to the `MatchedPath` — the grammar tier is not involved at all,
which preserves ADR-0031 §2's "no AST" decision and stays on the
right side of ADR-0030's "one source of truth" rule.

The Phase-2 implementer verifies that the rusqlite version
pinned in `Cargo.toml` exposes this metadata (the SQLite C API
calls are `sqlite3_column_table_name` /
`sqlite3_column_origin_name` — they have been stable for two
decades; rusqlite either exposes them directly or via the
underlying `*mut sqlite3_stmt` handle). If exposure turns out
to be awkward, the fallback is a small post-parse walk over the
projection-item subtrees in the `MatchedPath` — strictly worse
because it duplicates a slice of parsing, but available.

The resolution pass adds one method on `Database` (something
like `resolve_select_column_types`) called from `do_run_select`
before the `DataResult` is shipped. It takes the prepared
statement and the active `SchemaCache`, and returns
`Vec<Option<Type>>`. The renderer needs no change — `None`
slots already render as typeless.

This is the only execution-path change Phase 2 makes; everything
else routes through Phase 1's grammar-as-text execution.

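The mapping step of the resolver can be sketched independently of the engine binding. The version below is a stand-in, not the proposed `Database` method: it takes the per-column origins as plain data instead of a prepared statement, models the schema as a `HashMap` rather than `SchemaCache`, and uses a three-variant `PlaygroundType` where the real type set is larger. All of these names and shapes are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the playground type set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaygroundType {
    Bool,
    Int,
    Text,
}

/// What the engine reports for one result column:
/// `Some((table, column))` for a direct column reference,
/// `None` for an expression.
pub type ColumnOrigin = Option<(String, String)>;

/// Map each result column's origin to its playground type.
/// Expressions, and origins the schema does not know, stay `None`,
/// so the renderer treats them as typeless.
pub fn resolve_select_column_types(
    origins: &[ColumnOrigin],
    schema: &HashMap<(String, String), PlaygroundType>,
) -> Vec<Option<PlaygroundType>> {
    origins
        .iter()
        .map(|origin| origin.as_ref().and_then(|key| schema.get(key).copied()))
        .collect()
}
```

For a query such as `select active, upper(name) from users`, the first slot would resolve to the `bool` type and the second would stay `None`, which matches the resolution rule above.
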
### 13. Out of scope

- **OOS-1. Derived tables in `FROM`** — `FROM (SELECT …) [AS]
  alias`. The same shapes are reachable via CTEs (§4), which
  Phase 2 ships. Derived tables in `FROM` are not authored here.
- **OOS-2. `NATURAL JOIN` and `JOIN … USING (col)`.** Both are
  convenience forms. NATURAL is widely considered a footgun;
  USING is cleaner but adds grammar weight without lifting any
  expressive ceiling. Out.
- **OOS-3. Comma-list `FROM t1, t2` (implicit cross join).** Out.
  `CROSS JOIN` covers the same shape explicitly.
- **OOS-4. `LIMIT m, n` (the legacy comma form).** Out (§8).
- **OOS-5. Window functions** (`OVER (…)`, `PARTITION BY`,
  window-frame syntax). A meaningful learning topic, but a large
  surface of its own and out of ADR-0030's commissioned set.
- **OOS-6. `LATERAL` joins.** Not commissioned by ADR-0030.
- **OOS-7. `VALUES (…)` as a row source.** Not commissioned.
- **OOS-8. A function/aggregate allowlist** — ADR-0030 §13
  OOS-3 / ADR-0031 §7 OOS-4 still apply: aggregate names parse
  generically through `name_or_call`.
- **OOS-9. Quoted identifiers** (`"column name"`). Still tracked
  as ADR-0031 §7 OOS-3.
- **OOS-10. Engine-checked aggregate correctness at parse
  time.** The grammar admits structurally; engine rejects
  semantically; ADR-0019 surfaces the engine's verdict in
  engine-neutral wording (§7).
- **OOS-11. Result-column type resolution beyond bare column
  refs.** Computed columns (`a + b`, `upper(name)`, `CASE …`)
  stay typeless (§12).
- **OOS-12. The `help sql` page and parse-error usage entries**
  for the Phase-2 surface. The grammar carries the `help_id`s
  authored in this phase, but the page content and the rich
  per-command usage messages are Phase 6 (ADR-0030 §10) and
  ADR-0021. Phase 2 leaves the same `help_id: None` shape Phase
  1 used for `select`.

## Consequences

- A new grammar file, `src/dsl/grammar/sql_select.rs`, parallel
  to `sql_expr.rs`, exporting
  `pub static SQL_SELECT_STATEMENT: Node` and
  `pub static SQL_SELECT_COMPOUND: Node`. The Phase-1
  `data::SELECT` `CommandNode` is rebuilt against
  `SQL_SELECT_STATEMENT` (its body becomes a `Subgrammar`
  reference); the `CommandNode` itself stays.
- **Phase-1 SQL `SELECT` grammar nodes migrate.** The Phase-1
  static nodes that live in `src/dsl/grammar/data.rs` for the
  single-table SELECT (the projection, FROM, WHERE, ORDER-BY,
  LIMIT sub-trees) move into `sql_select.rs` as the
  starting-point for the §1 productions; the file leaves only
  the `CommandNode` shell behind. The seven Phase-1 SQL `SELECT`
  integration tests are part of the safety net for this
  migration — they must continue to pass under the rebuilt
  grammar, in addition to the new Phase-2 integration tests
  authored in step 4 of the implementation notes.
- **Hint-panel prose** for the new clauses (JOIN flavours, ON,
  GROUP BY, HAVING, UNION / INTERSECT / EXCEPT, WITH, OFFSET, the
  qualified-prefix and CTE-prefix completion states) is
  authored at the structural level alongside each grammar node
  in step 1 — a one-liner per slot, enough to drive the hint
  panel. Richer per-clause teaching prose and the `help sql`
  reference page remain ADR-0030 Phase 6 work (§13 OOS-12).
- **Walker cost is expected to stay proportional to source
  length.** The new accumulators are `O(bindings + aliases)`
  per frame; the scope stack is bounded by
  `MAX_SUBGRAMMAR_DEPTH = 64` (§9); the §10.6 post-walk fixup
  pass touches one entry per projection-list `Ident` (a small
  set). Each debounced keystroke (ADR-0027) walks once, fixes up
  once, and emits a single coherent output. No new pathological
  case is introduced — if a learner-realistic query produces a
  noticeable typing-time stall, measure first and revisit the
  recursion budget or the accumulator structure on evidence.
- `sql_expr.rs` gains three additive `Choice` branches and one
  additive tail on `name_or_call` (§5, §6). The existing tiers
  and the depth-cap discipline are unchanged. The Phase-1 tests
  continue to exercise the existing branches as they stand.
- **No new walker capability for grammar recursion** (§9).
  `Subgrammar`, the depth counter, the cap, and the friendly
  depth error are all reused unchanged — the same posture
  ADR-0031 took.
- `Command::Select { sql: String }` is unchanged. The validated
  source SQL is simply larger; the worker still routes it through
  `Database::run_select` and `do_run_select` (Phase 1 path).
- The worker gains a post-prepare type-resolution helper that
  populates `column_types` for direct-reference projection items
  (§12) via the engine's column-origin metadata. **`Cargo.toml`
  adds `column_metadata` to `rusqlite`'s feature list**
  (alongside the existing `bundled`); this pulls in the SQLite
  `SQLITE_ENABLE_COLUMN_METADATA` compile flag and exposes
  `RawStatement::column_table_name` /
  `column_origin_name` / `column_database_name` on the prepared
  statement. Verified against the project's pinned rusqlite
  0.39.0. This is the only Phase-2 execution-path change.
- **Three diagnostic passes** (§11.7) — schema-existence
  (extended), CTE/compound arity-check (new), and predicate
  warnings (extended with a MatchedPath-walking variant for
  `sql_expr` — §11.6). All run as final walk stages and
  mutate the walker's accumulated output in place. Closes the
  Phase-1 carry-over gap where SQL `WHERE` expressions emitted
  no `LIKE`-on-numeric / type-mismatch / `= NULL` warnings.
- **Catalog additions** (§11.5) — five new `diagnostic.*` keys
  for parse-time-detectable cases and eight new `engine.*`
  keys for friendly-error layer translations of engine
  messages.
- The walker's `WalkContext` gains the completion-scope
  accumulators of §10 — a `from_scope_stack: Vec<ScopeFrame>`
  whose top frame is the active `from_scope` / `cte_bindings` /
  `projection_aliases`. A **new node variant
  `Node::ScopedSubgrammar(&Node)`** (§10.2) is the trigger for
  push/pop; existing `Node::Subgrammar` is unchanged so DSL
  `Expr` and `sql_expr` recursion are unaffected. A post-walk
  fixup pass re-resolves projection-list identifier highlighting
  and validity once the final `from_scope` is known (§10.6). CTE
  output columns are derived from the body's projection list at
  body-frame exit, populating the binding back into the outer
  frame (§10.3) — so `SELECT *` and explicit-projection CTE
  bodies both yield real column completion past `cte_alias.|`.
  This **softens §9's "no new walker capability" claim** for
  completion scope; grammar recursion still needs nothing new.
- `__rdbms_*` rejection extends to **every** table-source slot
  introduced by Phase 2: the `FROM` table, each `JOIN`'s table,
  each CTE name, and the `FROM` table inside any CTE body
  (§4, §6). The `reject_internal_table` validator is reused.
- Completion gains: SQL keywords for joins / set ops / `WITH` /
  `GROUP` / `HAVING` / `OFFSET` (all walker-derived, no
  bespoke code); column completion scoped to a qualified prefix
  `t.` resolves through the active `SchemaCache` (§5).
- Phase-1 autonomous decisions §4.1 and §4.3–§4.4 stand (optional
  `FROM`, `help_id: None`, walker-mode defaults). §4.2 is lifted
  (bare-alias projection admitted, §1). §4.5 is partially lifted
  (bare bool column refs recover their type via §12).
- `requirements.md`'s `Q1` / `Q2` advance further; `Q4` was
  already ticked by ADR-0030 and ADR-0031.

## Implementation notes

A build order, each step guarded by the test suite. The phases
within Phase 2 mirror the ADR-0030 / ADR-0031 staging — grammar
first, execution-path change last.

**Detailed plan: `docs/plans/20260520-adr-0032-phase-2.md`.**
The notes below are the outline; the plan refines them into
seven sub-phases (2a–2g) with per-gate exit criteria, a
cross-cut verification matrix that explicitly tests every
"X comes for free" claim from ADR-0030/0031/0032 (the kind of
implicit claim that produced the Phase-1 SQL-expression
predicate-warning gap §11.6 closes), and a final phase-exit
verification report template. Implementers work through the
plan; the ADR remains the decisions.

1. **The `sql_select.rs` grammar fragment.** Author the
   stratified tiers of §1 as named `static` `Node`s, recursion
   via `Subgrammar`. Export `SQL_SELECT_STATEMENT` and
   `SQL_SELECT_COMPOUND`. The existing `data::SELECT`
   `CommandNode` is rebuilt against `SQL_SELECT_STATEMENT`.
2. **Unit tests** against the fragment directly (the
   `expr.rs` / `sql_expr.rs` test pattern): JOIN flavours,
   GROUP BY / HAVING, qualified refs, every set-op, recursive
   and non-recursive CTEs, `LIMIT … OFFSET`, `DISTINCT`,
   `t.*` projection, the bare-alias projection, plus the
   keyword-case-insensitivity check.
3. **`sql_expr.rs` additive extensions** (§5, §6): the
   qualified-ref tail on `name_or_call`; the scalar-subquery
   `primary` branch; the `IN (subquery)` predicate-tail branch;
   the `EXISTS (subquery)` `primary` branch. Unit tests for each.
4. **Integration tests** (the `tests/` Tier-3 path, building on
   Phase 1's SQL `SELECT` tests): each JOIN flavour returns the
   expected rows; GROUP BY / HAVING aggregates over real data;
   `UNION` / `INTERSECT` / `EXCEPT` between two SELECTs; a
   non-recursive CTE; a recursive CTE (a small tree traversal
   or generated-sequence example); a scalar subquery in
   `WHERE`; `IN (SELECT …)`; `EXISTS (…)`; qualified refs
   resolving correctly.
5. **The `WalkContext` scope accumulators** (§10). Add the
   `ScopeFrame` type (`from_scope` / `cte_bindings` /
   `projection_aliases`) and the `from_scope_stack`; add the
   `Node::ScopedSubgrammar(&Node)` variant alongside the
   existing `Node::Subgrammar`; teach the driver to push/pop a
   fresh frame on `ScopedSubgrammar` entry/exit; rewrite every
   reference to `&SQL_SELECT_COMPOUND` from outside its own
   definition to use the new variant (subqueries in
   `sql_expr.rs`, CTE bodies in `sql_select.rs`); teach
   `from_clause` / `join_clause` to populate the frame's
   `from_scope`; teach `with_clause` to push placeholder CTE
   bindings before the body and harvest derived output columns
   on body-exit per §10.3; teach `projection_item` to append to
   `projection_aliases`. Keep `current_table` /
   `current_table_columns` as derived helpers (top frame's
   single-binding view) so the DSL paths stay green.
6. **Qualified-prefix completion** (§10.5). When the
   matched-path immediately preceding an `IdentSource::Columns`
   slot ends with `Ident '.'`, narrow candidates to the named
   binding's columns. Unit tests: `select t.` Tab offers
   `t`'s columns; an unresolved prefix returns an empty list.
7. **Post-walk fixup pass** (§10.6). Collect projection-list
   `Ident` terminals during the walk; after the walk, re-resolve
   each against the final `from_scope`, rewriting the highlight
   class and validity diagnostic. Tests: typing `select col1
   from t` lights `col1` correctly once `t` is typed; typing
   `select bogus from t` produces a column-not-found diagnostic.
8. **Diagnostic passes** (§11). Extend the schema-existence
   ERROR pass to walk every `from_scope` binding plus
   `cte_bindings`; add the qualified-reference and ambiguity
   checks (§11.2). Add the new arity-check ERROR pass at the
   CTE-body and compound-query frame-exit hooks (§11.7 case
   2). Extend the predicate-warnings pass with a
   MatchedPath-walking variant covering every Phase-2
   `sql_expr` slot (§11.6) — closes the Phase-1 carry-over
   gap. Author the five new `diagnostic.*` catalog keys and
   the eight new `engine.*` translation keys (§11.5).
   Tests: one positive and one negative case per new ERROR
   key; predicate warnings firing on `select * from t where
   col like 5` (the Phase-1 gap closure); arity-mismatch
   ERRORs on a CTE and on a `UNION`.
9. **Result-column type resolution** (§12). Add
   `"column_metadata"` to rusqlite's feature list in
   `Cargo.toml`. The worker's `do_run_select` calls the new
   resolver — `RawStatement::column_table_name` /
   `column_origin_name` per result column — before constructing
   the `DataResult`. Tests: a single-column SELECT recovers the
   playground type (covering each of the ten types, the
   pedagogically important one being `bool` → `true` / `false`);
   a SELECT with a computed projection keeps it typeless; a
|
||
SELECT through a CTE recovers the underlying column's type
|
||
if the engine's column-origin metadata follows through the
|
||
CTE (verified, not assumed).
|
||
10. **Highlighting / completion / hint** spot-checks via the
|
||
typing-surface matrix (ADR-0022 / ADR-0030 §8): a SELECT
|
||
with a JOIN highlights the JOIN keywords; Tab past
|
||
`select t.` offers columns of `t`; column completion
|
||
inside a `WHERE` after `from a join b on …` offers both
|
||
`a`'s and `b`'s columns; column completion inside a
|
||
correlated subquery sees the outer scope; the `[ERR]`
|
||
indicator fires on a malformed subquery; an out-of-subset
|
||
construct (e.g. `OVER (…)`) produces an engine-neutral
|
||
parse error.
|
||
11. **`reject_internal_table`** spot-checks against every new
|
||
table-source slot: a `FROM __rdbms_columns` parse-rejects;
|
||
a `WITH __rdbms_x AS (…)` parse-rejects; a `FROM` inside a
|
||
CTE body referencing `__rdbms_*` parse-rejects.
|
||
|
||
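Steps 5 and 6 can be pictured with a small, std-only sketch. It is not the walker's code: `Binding`, `enter_scoped_subgrammar`, `exit_scoped_subgrammar`, `bind_from`, and `columns_for_prefix` are hypothetical names, and searching frames from innermost to outermost is an assumption drawn from the correlated-subquery spot-check in step 10. Only `ScopeFrame`, its three fields, `WalkContext`, and `from_scope_stack` are names taken from the plan.

```rust
/// One FROM source: its table, optional alias, and column names.
/// `Binding` is a hypothetical shape, not the walker's real type.
#[derive(Debug, Clone)]
struct Binding {
    table: String,
    alias: Option<String>,
    columns: Vec<String>,
}

impl Binding {
    /// The name a query uses to qualify this source: alias if present, else table.
    fn qualifier(&self) -> &str {
        self.alias.as_deref().unwrap_or(&self.table)
    }
}

/// Per-query scope holding the three accumulators named in step 5.
#[allow(dead_code)] // the last two fields are not exercised in this sketch
#[derive(Debug, Default)]
struct ScopeFrame {
    from_scope: Vec<Binding>,
    cte_bindings: Vec<Binding>,
    projection_aliases: Vec<String>,
}

#[derive(Debug, Default)]
struct WalkContext {
    from_scope_stack: Vec<ScopeFrame>,
}

impl WalkContext {
    /// Called on `ScopedSubgrammar` entry: a fresh frame for the nested query.
    fn enter_scoped_subgrammar(&mut self) {
        self.from_scope_stack.push(ScopeFrame::default());
    }

    /// Called on `ScopedSubgrammar` exit: the nested frame goes away.
    fn exit_scoped_subgrammar(&mut self) -> Option<ScopeFrame> {
        self.from_scope_stack.pop()
    }

    /// What `from_clause` / `join_clause` would do: bind into the top frame.
    fn bind_from(&mut self, binding: Binding) {
        if let Some(top) = self.from_scope_stack.last_mut() {
            top.from_scope.push(binding);
        }
    }

    /// Qualified-prefix narrowing: the columns of the binding named by
    /// `prefix`, searching inner frames first (assumed); empty if unresolved.
    fn columns_for_prefix(&self, prefix: &str) -> Vec<String> {
        self.from_scope_stack
            .iter()
            .rev()
            .flat_map(|frame| frame.from_scope.iter())
            .find(|b| b.qualifier() == prefix)
            .map(|b| b.columns.clone())
            .unwrap_or_default()
    }
}
```

In this model, `select t.` followed by Tab corresponds to `columns_for_prefix("t")`, and an unresolved prefix yields the empty list that step 6's unit test expects.
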
Later phases continue ADR-0030's plan unchanged — Phase 3 (DML),
Phase 4 (DDL), Phase 5 (DSL → SQL echo), Phase 6 (polish).
ADR-0030 §13 OOS items (window functions, `LATERAL`, function
allowlist, quoted identifiers) remain tracked separately and are
authored if and when they are taken up; they are not implicit
follow-ups of Phase 2.

## Amendment 1 — Empirical scope of column-origin metadata (2026-05-20)

§12 was written conservatively: it constrained type recovery to
projection items "structurally a single column reference" and
listed "subquery expressions" alongside arithmetic and `CASE` as
cases that stay `None`. The implementation plan's Open Question 1
(`docs/plans/20260520-adr-0032-phase-2.md`) captured the matching
uncertainty about CTEs and scalar subqueries, leaving the test in
sub-phase 2f to "assert the actual behaviour (not the wished-for
behaviour)".

A throwaway probe against the pinned bundled SQLite (run
2026-05-20, with `rusqlite` 0.39.0 + `column_metadata`) settles
the question. Across twenty representative query shapes, the
engine's `sqlite3_column_table_name` / `sqlite3_column_origin_name`
metadata follows through:

- direct bare column refs (the baseline);
- `AS alias` projections (the alias remaps the output name but
  the origin pair stays the source `(table, column)`);
- table-alias qualified refs (`u.name` → `(users, name)`);
- non-recursive CTEs, including `SELECT *` bodies, bare-ref
  bodies, qualified-ref bodies, and `(col-list)`-renamed
  bodies (the rename remaps the output name; origin stays the
  underlying column);
- CTE chains (a CTE that selects from a prior CTE — origin
  traces back to the base table);
- derived tables in `FROM (SELECT …) AS sub` (out-of-scope for
  Phase 2 per §13 OOS-1, but useful to note: if ever admitted,
  type recovery comes for free);
- scalar subqueries used as a projection primary (`SELECT
  (SELECT name FROM users WHERE id = 1)` — origin is preserved
  whether the subquery has an outer alias or not);
- `UNION` / `UNION ALL` / `INTERSECT` / `EXCEPT` compound
  queries (result columns carry the first leg's origin);
- multi-table `JOIN` projections (per-column origin per leg);
- `IN (SELECT …)` subqueries in `WHERE` (the inner subquery
  does not affect the outer projection's origin).

The metadata returns `None` for exactly two structural classes:

- **Computed projections** — function calls, arithmetic
  expressions, string concatenation, `CASE` expressions,
  literals, the `*` and `t.*` wildcards. Expected; pedagogically
  obvious; no surprise for the learner.
- **Recursive CTE result columns** (`WITH RECURSIVE r(n) AS
  (SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n
  FROM r`). The recursion materialises through an internal
  temporary table that has no base-column origin to point at.
  This is the one structural surprise — a recursive-CTE result
  column is typeless even when it is structurally a bare name
  reference, because the engine cannot trace the column back
  past the recursion.

### What §12's resolution rule becomes

The original §12 rule classifies projection items structurally
(unqualified ident / qualified ref → recover; everything else →
None). The empirical finding makes that classification redundant
and slightly wrong: it misses scalar subqueries and CTE-routed
refs that the engine does carry through, and it would have
needed extending for `(col-list)`-renamed CTEs.

The amended posture: **trust the engine's column-origin metadata
verbatim**. For each result column, call
`column_table_name(i)` / `column_origin_name(i)`. If both return
`Some`, look the pair up in the active `SchemaCache` and use the
playground type. If either is `None`, the slot stays `None` and
the renderer falls back to neutral alignment. No structural
classification of the projection item is needed; the grammar tier
stays uninvolved (preserving ADR-0031 §2's "no AST" decision and
ADR-0030's "one source of truth" rule, both as before).

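The engine-driven rule is small enough to sketch without the engine. The version below is std-only and makes several assumptions: `PlaygroundType` is reduced to two illustrative variants, `SchemaCache` is modelled as a plain `HashMap`, and the `(table, origin)` pairs are passed in as already-fetched `Option`s rather than read from a prepared statement. The helper name `resolve_select_column_types` is the one this ADR uses; its signature here is hypothetical.

```rust
use std::collections::HashMap;

/// Illustrative subset of the playground's ten-type vocabulary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PlaygroundType {
    Text,
    Bool,
}

/// Hypothetical stand-in for `SchemaCache`: (table, column) -> playground type.
type SchemaCache = HashMap<(String, String), PlaygroundType>;

/// `origins[i]` holds what `column_table_name(i)` / `column_origin_name(i)`
/// returned for result column `i`. No structural classification happens:
/// a slot is typed only when both halves are `Some` and the pair is known.
fn resolve_select_column_types(
    origins: &[(Option<&str>, Option<&str>)],
    schema: &SchemaCache,
) -> Vec<Option<PlaygroundType>> {
    origins
        .iter()
        .map(|&(table, column)| {
            // Either half missing (computed projection, recursive CTE)
            // leaves the slot as `None`.
            let (table, column) = (table?, column?);
            schema
                .get(&(table.to_string(), column.to_string()))
                .copied()
        })
        .collect()
}
```

A `None` slot is what the renderer treats as "fall back to neutral alignment"; the sketch deliberately has no branch that inspects the projection's syntax.
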
The "structurally a single column reference" definition in §12's
**Resolution rule** is superseded by the engine-driven rule
above. The §12 **Implementation seam** is unchanged in approach
(engine-side column-origin lookup is still the mechanism), but
the speculative fallback paragraph ("If exposure turns out to be
awkward, the fallback is a small post-parse walk over the
projection-item subtrees in the `MatchedPath`") is moot — the
exposure works, and the engine's metadata is broader than a
grammar-side walk could be without re-implementing SQLite's
query-planner traceback. The fallback path is removed.

### Effect on the Phase-2 plan's sub-phase 2f

The 2f exit gate's "CTE pass-through" row should be asserted
positive (recovers `Some(text)`). The "Subquery result" row,
which the plan left as "assert whichever behaviour the engine
exhibits", should be asserted positive as well. A new explicit
2f test row covers the named limitation: a recursive CTE result
column must produce `column_types[0] = None` and the renderer
must fall back to neutral alignment without panicking.

The catalog and grammar-side work in 2a–2e is unaffected by this
amendment. Only 2f's test list and the worker's
`resolve_select_column_types` helper change shape (the helper
becomes simpler — no structural classification, just a direct
metadata lookup per result column).

This amendment narrows the honest limitation in §12 from
"computed / non-direct projection items" to "computed projections
and recursive CTE result columns" — a tighter, factually
verified carve-out.

## Amendment 2 — §10.6 fixup-pass mechanism (2026-05-20)

§10.6's prescription for the post-walk fixup is written in
terms of "rewriting the highlight class" on projection-list
`Ident` terminals — downgrading "column" → "unknown identifier"
when an ident doesn't belong to the eventual `from_scope`, or
upgrading the reverse direction once a `FROM` is typed. The
implementation chose a different mechanism that achieves the
identical user-visible effect; this Amendment records the
choice so a reader of §10.6 doesn't go looking for a literal
`per_byte_class` rewrite step that does not exist.

### Mechanism actually used

Two pieces, both already in the codebase by the end of
sub-phase 2d:

1. **Two-pass schema-existence diagnostic.** The 2d rewrite of
   `schema_existence_diagnostics` (`src/dsl/walker/mod.rs`)
   runs a pre-pass over the matched path that collects every
   `IdentSource::Tables` / `cte_name` / `table_alias` ident
   into a single binding vec, regardless of where in the path
   it sits. The main pass then resolves each `sql_expr_ident`
   against the **complete** binding set. A projection ident
   that resolves under the eventual FROM scope produces no
   diagnostic; one that doesn't produces an
   `unknown_column` diagnostic on its own span.

2. **Diagnostic-overlay renderer.** `src/input_render.rs`
   reads the walker's diagnostic list at every keystroke and
   overlays each diagnostic's span with the appropriate
   colour (Error red for unknown-column, Warning for
   type-mismatch / `LIKE`-on-numeric / etc.). The overlay
   sits on top of the walker's `per_byte_class` (which keeps
   all idents at `HighlightClass::Identifier`).

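The two-pass shape of item 1 can be shown with a std-only toy. Everything below is a sketch under stated assumptions: `Role`, `PathIdent`, and `Diagnostic` are hypothetical simplifications of the matched path and diagnostic types, CTE names and table aliases are left out, the schema is a plain `HashMap`, and the early return for an input with no FROM yet is inferred from the text rather than taken from the code. The function name mirrors `schema_existence_diagnostics` but the signature is invented.

```rust
use std::collections::HashMap;

/// Hypothetical reduction of the matched path's ident roles.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Table,     // stands in for `IdentSource::Tables`
    ExprIdent, // stands in for `sql_expr_ident`
}

struct PathIdent {
    role: Role,
    text: String,
    span: (usize, usize), // byte range in the input
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    key: &'static str,
    span: (usize, usize),
}

fn schema_existence_diagnostics(
    path: &[PathIdent],
    schema: &HashMap<&str, Vec<&str>>,
) -> Vec<Diagnostic> {
    // Pre-pass: collect every table binding, wherever it sits in the path.
    let bindings: Vec<&str> = path
        .iter()
        .filter(|t| t.role == Role::Table)
        .map(|t| t.text.as_str())
        .collect();
    // Assumption: with no FROM typed yet there is nothing to resolve against.
    if bindings.is_empty() {
        return Vec::new();
    }
    // Main pass: resolve each expression ident against the complete set.
    let resolves = |name: &str| {
        bindings.iter().any(|b| {
            schema
                .get(*b)
                .map_or(false, |cols| cols.iter().any(|c| *c == name))
        })
    };
    path.iter()
        .filter(|t| t.role == Role::ExprIdent && !resolves(t.text.as_str()))
        .map(|t| Diagnostic { key: "unknown_column", span: t.span })
        .collect()
}
```

Because the pre-pass ignores position, a projection ident that precedes its `FROM` is resolved against the same binding set as one that follows it, which is the property the fixup pass exists to provide.
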
Combined, the two yield the §10.6 user-visible behaviour:
typing `select bogus_col`, the diagnostic emits and the
overlay paints the ident red as soon as a FROM appears that
shows the column doesn't exist; typing `select real_col`, no
diagnostic emits and the ident stays Identifier-coloured.
Both updates land within one debounce cycle.

### Why this is equivalent

§10.6's stated goal is correctness of the end-of-walk
classification — "rewriting the highlight class" is one
implementation strategy for that goal. The HighlightClass
enum in the codebase has only one identifier slot
(`Identifier`); the Error tint comes from diagnostic overlay,
not from a separate `Column` vs `UnknownIdentifier` class.
The two-pass diagnostic pass is the "post-walk fixup" that
§10.6 calls for — it just runs inside the diagnostic emitter
rather than as a separate rewrite step. The integration
point (§10.6's "final stage of the walk itself") still
holds: `schema_existence_diagnostics` runs after the walk's
main work, mutating the walker's accumulated diagnostic
vector in place. Consumers see a single coherent snapshot.

### Completion mid-typing

§10.6's second user-visible promise — "during-typing
completion of projection-list column names uses the global
fallback" — is preserved as a posture, but improved at the
edges in sub-phase 2e by a look-ahead probe in
`src/completion.rs`. When the leading walk produces no
`from_scope` (the projection-before-FROM state) **and** the
full input does have a FROM after the cursor, a second walk
on the full input populates the binding set, and column
candidates narrow to that scope. The fallback to global
`SchemaCache.columns` remains the path when the full input
doesn't parse cleanly (e.g., the user deleted `*` and is
mid-edit). This is a strict improvement: the realistic
"edit an existing query" workflow now narrows correctly.

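The look-ahead decision can be modelled in a few lines. This is a sketch, not the contents of `src/completion.rs`: `Schema`, `toy_walk`, and `column_candidates` are hypothetical names, and `toy_walk` is a deliberately crude stand-in for the real walker (it only looks for a `from` keyword and treats an empty projection as a parse failure).

```rust
use std::collections::HashMap;

/// Table name -> column names. A hypothetical stand-in for `SchemaCache`.
type Schema<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Toy walker: `Some(tables bound by FROM)` when the input "parses",
/// `None` when it does not. Here "does not parse" means `from` appears
/// with nothing between it and `select` (the deleted-`*` case).
fn toy_walk(input: &str) -> Option<Vec<String>> {
    let words: Vec<&str> = input.split_whitespace().collect();
    match words.iter().position(|w| w.eq_ignore_ascii_case("from")) {
        None => Some(Vec::new()),
        Some(i) if i < 2 => None,
        Some(i) => Some(
            words
                .get(i + 1)
                .map(|t| vec![t.to_string()])
                .unwrap_or_default(),
        ),
    }
}

/// Column candidates at `cursor`: scope from the leading walk if it has one,
/// else from a second walk over the full input, else the global fallback.
fn column_candidates(
    full_input: &str,
    cursor: usize,
    walk: &dyn Fn(&str) -> Option<Vec<String>>,
    schema: &Schema,
) -> Vec<String> {
    let leading = full_input.get(..cursor).unwrap_or(full_input);
    let mut scope = walk(leading).unwrap_or_default();
    if scope.is_empty() {
        // Look-ahead probe: a FROM after the cursor may still bind tables.
        scope = walk(full_input).unwrap_or_default();
    }
    let mut out: Vec<String> = if scope.is_empty() {
        // Global fallback: every column of every table.
        schema.values().flatten().map(|c| c.to_string()).collect()
    } else {
        scope
            .iter()
            .filter_map(|t| schema.get(t.as_str()))
            .flatten()
            .map(|c| c.to_string())
            .collect()
    };
    out.sort();
    out.dedup();
    out
}
```

With the cursor inside the projection of `select na from users`, the probe narrows candidates to the columns of `users`; with `select  from users` the toy walker reports a parse failure and the global list is returned instead.
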
### What §10.6's prescription becomes

The "rewrite the highlight class" wording is superseded by:
**the post-walk diagnostic pass re-resolves projection
idents against the complete scope and emits / withholds the
unknown-column diagnostic accordingly; the renderer's
diagnostic-overlay path achieves the visual change**. No
new `HighlightClass` variant is required.

§10.6's other prescriptions stand verbatim — the integration
point (final walk stage, in-place mutation of walker
accumulators), the per-keystroke re-walk (ADR-0027's
debounced cadence), and the ORDER BY no-fixup-needed
clarification.

## Amendment 3 — bare table aliases in expression slots (2026-06-12)

Issue #31. A bare in-scope table alias typed where the grammar
expects a column — `… GROUP BY o`, with `o` aliasing
`FROM Orders o` — was a blind spot in two surfaces:

- **Completion (§10).** §10.5 narrows columns *past* a
  `qualifier .`, but the bare-ident slot before the dot offered
  only columns and function names, never the aliases themselves.
  A learner mid-typing `o` toward `o.<column>` got no Tab help.
- **Diagnostics (§11.2).** §11.2 added `projection_alias_misplaced`
  for a *projection* alias used in a forbidden clause, but a bare
  *table* alias fell through to the generic `unknown_column`
  bare-reference check (§11.2's `matched.len() == 0` arm), which
  reported `` no such column `o` on table `Orders, …` `` — calling
  an in-scope alias an unknown column.

### What changes

1. **Completion offers in-scope FROM qualifiers at a bare
   `sql_expr_ident` slot** (one not already past a `qualifier .`).
   Each binding contributes its *qualifier* — the alias if it has
   one, else the table name (an aliased source must be referenced
   by its alias). Folded into the existing `IdentSource::Columns`
   candidate list so it sorts / dedups / colours uniformly. When
   the partial *exactly* matches an in-scope qualifier the alias
   source steps aside: discoverability is already served, and
   suppressing sibling aliases lets the diagnostic below surface
   (rather than being hidden by the `typing_over_diag` path).

2. **A bare ident matching an in-scope qualifier now emits a
   targeted diagnostic** instead of `unknown_column`, checked in
   the `matched.len() == 0` arm *after* the projection-alias check
   (so an ORDER-BY projection-alias reference still wins). It is a
   drop-in replacement at the same span and `Error` severity — only
   the message text changes — so the validity verdict, token
   overlay, and hint-panel paths behave exactly as they did for
   `unknown_column`:
   - `diagnostic.alias_used_as_column` — `` `o` is a table alias —
     write `o.<column>` to reference one of its columns `` (the
     binding has an alias), or
   - `diagnostic.table_used_as_column` — same shape, "is a table"
     (an un-aliased table source).

Two guards keep the qualified-form advice correct (both covered
by regression tests):

- **SQL only.** The branch fires only for `role ==
  "sql_expr_ident"`. The DSL `Expr` (role `expr_column`) reaches
  the same arm but has no `table.column` syntax, so a DSL bare
  table-name ref keeps the generic `unknown_column` — advising
  the qualified form there would be wrong.
- **Effective-qualifier match.** It matches the binding's
  *effective qualifier* — the alias if present, else the table
  name — not the table name independently. An aliased source
  must be referenced by its alias (`FROM a x … GROUP BY a` is
  invalid SQL), so the shadowed real name `a` falls through to
  `unknown_column` rather than being advised as `a.<column>`.
  This mirrors the completion side's qualifier rule exactly.

A genuine unknown column (matching no alias, table, or column)
still reports `unknown_column` verbatim.

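The ordering of checks and the two guards can be condensed into a std-only sketch. It is a model of the rules as stated, not the walker's code: `Binding`, `BareRef`, `classify_bare_ident`, and `qualifier_candidates` are hypothetical names, the projection-alias check is reduced to a simple list lookup with no clause awareness, and the role strings are the only identifiers taken from the text.

```rust
struct Binding {
    table: String,
    alias: Option<String>,
    columns: Vec<String>,
}

impl Binding {
    fn new(table: &str, alias: Option<&str>, columns: &[&str]) -> Self {
        Binding {
            table: table.to_string(),
            alias: alias.map(str::to_string),
            columns: columns.iter().map(|c| c.to_string()).collect(),
        }
    }

    /// Effective qualifier: the alias if present, else the table name.
    fn qualifier(&self) -> &str {
        self.alias.as_deref().unwrap_or(&self.table)
    }
}

#[derive(Debug, PartialEq)]
enum BareRef {
    Resolved,
    ProjectionAlias,
    AliasUsedAsColumn, // diagnostic.alias_used_as_column
    TableUsedAsColumn, // diagnostic.table_used_as_column
    UnknownColumn,     // the generic unknown_column
}

/// Classify a bare identifier in an expression slot.
fn classify_bare_ident(
    ident: &str,
    role: &str,
    scope: &[Binding],
    projection_aliases: &[&str],
) -> BareRef {
    if scope.iter().any(|b| b.columns.iter().any(|c| c == ident)) {
        return BareRef::Resolved;
    }
    // The projection-alias check runs first, so such a reference still wins.
    if projection_aliases.iter().any(|a| *a == ident) {
        return BareRef::ProjectionAlias;
    }
    // Guard 1: SQL only. Guard 2: match the effective qualifier.
    if role == "sql_expr_ident" {
        if let Some(b) = scope.iter().find(|b| b.qualifier() == ident) {
            return if b.alias.is_some() {
                BareRef::AliasUsedAsColumn
            } else {
                BareRef::TableUsedAsColumn
            };
        }
    }
    BareRef::UnknownColumn
}

/// Qualifiers offered at a bare slot. On an exact match the source steps
/// aside so the diagnostic above can surface.
fn qualifier_candidates(partial: &str, scope: &[Binding]) -> Vec<String> {
    if scope.iter().any(|b| b.qualifier() == partial) {
        return Vec::new();
    }
    scope
        .iter()
        .map(|b| b.qualifier())
        .filter(|q| q.starts_with(partial))
        .map(str::to_string)
        .collect()
}
```

Both functions go through `Binding::qualifier`, which is one way to keep the completion side and the diagnostic side agreeing on what counts as a qualifier.
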
The message tail is deliberately clause-neutral ("to reference
one of its columns") rather than GROUP-BY-specific, because the
bare-reference arm fires across the projection, `WHERE`,
`GROUP BY`, and `HAVING`.

This is an additive refinement of §10 and §11.2; no grammar node
changes.

## See also

- ADR-0005 — the ten-type vocabulary §10 resolves back to.
- ADR-0016 — the data-table renderer SELECT results reuse.
- ADR-0019 — the friendly-error layer engine-side rejections
  route through (§7).
- ADR-0021 — per-command parse-error usage; the Phase-2 surface
  inherits the framework, Phase 6 polishes per-clause messages
  (§11 OOS-12).
- ADR-0022 — ambient typing assistance. §5/§6/§8 inherit its
  keyword-completion / highlighting / hint mechanisms for free,
  but §10 extends its `IdentSource::Columns` / `SchemaCache` /
  `WalkContext` infrastructure with the scope accumulators,
  qualified-prefix narrowing, and the post-walk fixup pass that
  Phase 2 needs.
- ADR-0023 / ADR-0024 — the unified grammar tree Phase 2 extends.
- ADR-0026 — the `WHERE` grammar's `Subgrammar` node, depth
  counter, and `MAX_SUBGRAMMAR_DEPTH = 64` cap, all reused
  unchanged (§9).
- ADR-0027 — the validity indicator, free for the Phase-2
  surface; §1 (ERROR/WARNING guideline) is the source quoted
  verbatim in §11; Amendment 1 (`LIKE`-on-numeric WARNING) is
  the one that the SQL-expression predicate-warnings gap of
  §11.6 closes for the SQL surface.
- ADR-0028 — the styled `OutputLine` mechanism the renderer
  uses; not directly touched by Phase 2.
- ADR-0030 — the parent ADR; §3 commissions this phase, §4/§6
  fix execution-as-text, §7 fixes engine neutrality, §11 fixes
  history / replay, §13 fixes the long-running OOS list.
- ADR-0031 — the SQL expression grammar this ADR extends
  additively (§5, §6); §7 named the two extensions implemented
  here.
- `docs/simple-mode-limitations.md` — the DSL limits advanced
  mode lifts; Phase 2 lifts the JOIN, subquery, set-op, CTE,
  and grouping limits.