Amendment 2 records the §10.6 fixup-pass mechanism choice. §10.6 prescribes "rewriting the highlight class" on projection-list idents at end-of-walk; the actual implementation uses a different mechanism that achieves the identical user-visible behavior:
1. 2d's two-pass schema-existence diagnostic collects every FROM binding from the matched path first, then resolves projection idents against the complete scope. This is the post-walk re-resolve §10.6 calls for, just embedded in the diagnostic emitter.
2. input_render.rs's diagnostic-overlay path colors each diagnostic span Error/Warning, achieving the visual change §10.6 describes without needing a new HighlightClass variant.
The completion-mid-typing piece is improved by the §10.5 look-ahead probe (sub-phase 2e earlier). Four new regression tests in `projection_before_from_tests` pin the behavior so a future refactor can't silently regress it: correct ident resolves silently, unknown ident flags via diagnostic on its span, multi-projection only flags unknowns, projection-without-FROM is silent. ADR index entry updated to reference Amendment 2. Test totals: 1424 → 1428 passing (+4). Clippy clean.
ADR-0032: The full SQL SELECT grammar
Status
Accepted
Context
ADR-0030 commissions advanced mode as a body of SQL grammar
inside the unified grammar tree (ADR-0023/0024), phased. Phase 1
("Foundations + first SELECT") shipped: a single-table SELECT
with projection, WHERE, ORDER BY, and LIMIT, executed as
validated SQL text through the existing data-table renderer.
ADR-0031 authored the SQL expression grammar the Phase-1
SELECT consumed.
Phase 2 — "SELECT — full" — is the next slice. ADR-0030 §3 lists
it: JOINs, GROUP BY / HAVING, aggregates, subquery
expressions, UNION/INTERSECT/EXCEPT, common table
expressions, LIMIT … OFFSET, qualified column references.
ADR-0030 §3 also says the full SELECT grammar "is each large
enough to warrant their own focused ADR when implemented — the
precedent is ADR-0026 for the WHERE grammar." This is that ADR.
The architecture is fixed (ADR-0030 §1, §4, §6, §8): one walker, grammar-as-text execution, ambient assistance for free. This ADR fixes the shape of the grammar — the productions, the recursion, the additive extensions to ADR-0031's expression fragment, and the few execution-path implications (worker-side column-origin lookup so result columns recover their playground type). It deliberately does not revisit ADR-0030's structural decisions; references in this ADR's text to ADR-0030 §X mean "that decision is the controlling one."
What ADR-0030 and ADR-0031 already fix
- No batch parser; SQL is grammar in the unified tree. Subquery recursion is a Node::Subgrammar(&NAMED) reference, exactly as the expression ladder uses it (ADR-0031 §3).
- No AST builder for the parts that execute as text. Command::Select { sql: String } carries the validated source; the worker prepares and runs it (ADR-0030 §4/§6, ADR-0031 §2).
- The __rdbms_* rejection at every table-name slot (ADR-0030 §6), re-applied to every Phase-2 table-source slot (FROM, JOIN, CTE-name).
- No allowlist for function names (ADR-0030 §13 OOS-3, ADR-0031 §6). Aggregates (count, sum, avg, min, max) parse through the generic name_or_call path; the grammar is structurally aggregate-blind, by design.
- No quoted identifiers (ADR-0031 §7 OOS-3): unchanged.
- MAX_SUBGRAMMAR_DEPTH = 64 (ADR-0026) is the shared recursion budget across DSL Expr, SQL expression, and (added here) SQL SELECT recursion. No new walker capability is introduced (§9).
The boundary with ADR-0031
ADR-0031 §7 named two additive extensions deferred to this ADR:
- OOS-1: subquery expressions: ( SELECT … ) as a primary, IN ( SELECT … ), EXISTS ( SELECT … ). Their grammar is fixed in §6; they are additive Choice branches in sql_expr.rs, recursing into the named SELECT fragment authored here.
- OOS-2: qualified column references: t.c / alias.c. Their grammar is fixed in §5; they are an additive tail on name_or_call in sql_expr.rs.
sql_expr.rs was shaped to receive both branches without
restructuring (ADR-0031 §7 promise). This ADR redeems that
promise; the changes there are strictly additive.
Decision
1. The top-level SELECT grammar
The full statement decomposes into a top-level compound query
(set-operator chains around per-leg core selects), wrapped by
an optional WITH prefix and trailing ORDER BY / LIMIT:
    select_statement := [ with_clause ] compound_select
    compound_select  := select_core ( set_op select_core )*
                        [ order_by_clause ]
                        [ limit_clause ]
    set_op           := UNION [ ALL ] | INTERSECT | EXCEPT
    select_core      := SELECT [ DISTINCT | ALL ]
                        projection_list
                        [ from_clause ]
                        [ where_clause ]
                        [ group_by_clause ]
                        [ having_clause ]
    with_clause      := WITH [ RECURSIVE ] cte_def
                        ( ',' cte_def )*
    cte_def          := identifier [ '(' column_name_list ')' ]
                        AS '(' compound_select ')'
    projection_list  := projection_item ( ',' projection_item )*
    projection_item  := '*'
                      | identifier '.' '*'
                      | sql_expr [ [ AS ] identifier ]
    from_clause      := FROM table_source ( join_clause )*
    table_source     := identifier [ [ AS ] identifier ]
    join_clause      := [ INNER ] JOIN table_source ON sql_expr
                      | LEFT [ OUTER ] JOIN table_source ON sql_expr
                      | RIGHT [ OUTER ] JOIN table_source ON sql_expr
                      | FULL [ OUTER ] JOIN table_source ON sql_expr
                      | CROSS JOIN table_source
    where_clause     := WHERE sql_expr
    group_by_clause  := GROUP BY sql_expr ( ',' sql_expr )*
    having_clause    := HAVING sql_expr
    order_by_clause  := ORDER BY order_item ( ',' order_item )*
    order_item       := sql_expr [ ASC | DESC ]
    limit_clause     := LIMIT sql_expr [ OFFSET sql_expr ]
sql_expr is ADR-0031's SQL_OR_EXPR, extended additively per
§5 and §6. column_name_list is identifier (, identifier)*.
The named static Node exported by the new
src/dsl/grammar/sql_select.rs is SQL_SELECT_STATEMENT
(matching the full statement) and SQL_SELECT_COMPOUND (the
embedded form, omitting the outer WITH; this is what subqueries
recurse into — see §6, §9).
Notes on specific productions:
- FROM stays optional. Phase 1's autonomous decision §4.1 is upheld: SELECT 1 and SELECT upper('x') continue to parse. With JOINs landing, the absence of a FROM simply means no from_clause / join_clause was matched; no extra shape is needed.
- Bare-alias projection (select a x) is admitted. Phase 1's autonomous decision §4.2 deliberately rejected it as structurally ambiguous. With Phase 2's grammar (FROM is the only word that can legitimately follow a projection list, and it is a keyword in the walker's expected-set) the ambiguity dissolves: an identifier following the last projection expression that is not FROM, ',', WHERE, GROUP, ORDER, LIMIT, or a set-op keyword is a bare alias, and is so admitted. This lifts a small but visible Phase-1 limitation.
- SELECT [ DISTINCT | ALL ]. ALL is the default and is admitted for symmetry; DISTINCT is the meaningful case. They are mutually exclusive at this position (a Choice, not two Optionals).
- identifier '.' '*' lives only in projection_item, never in sql_expr. This is intentional: t.* is projection syntax, not an expression, and admitting it as an expression primary would let it appear in WHERE / ORDER BY / etc. where the engine would reject it and the engine-neutral error would be hard to phrase. The grammar simply refuses it structurally outside projection.
- UNION ALL is a single set-op, not UNION followed by an ALL modifier on the next leg. set_op is a Choice of the four atoms (with UNION and UNION ALL as separate branches); factoring UNION [ ALL ] is also valid but the explicit four branches keep the matched-path classes cleaner for highlighting.
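The bare-alias rule in the notes above amounts to a keyword exclusion test. The following is a minimal sketch, not the walker's implementation: `CLAUSE_KEYWORDS` and `is_bare_alias` are hypothetical names, the keyword list is copied from the note as written, and case-insensitive matching is an assumption.

```rust
/// Words that may legitimately follow a projection expression, per the note
/// above. Hypothetical constant for illustration only.
const CLAUSE_KEYWORDS: [&str; 8] = [
    "FROM", "WHERE", "GROUP", "ORDER", "LIMIT", "UNION", "INTERSECT", "EXCEPT",
];

/// An identifier-shaped word after the last projection expression is a bare
/// alias unless it is one of the clause or set-op keywords.
/// Assumption: keywords are matched case-insensitively.
fn is_bare_alias(word: &str) -> bool {
    !CLAUSE_KEYWORDS
        .iter()
        .any(|kw| kw.eq_ignore_ascii_case(word))
}
```

Under this sketch, `select a x` treats `x` as an alias, while `select a from t` treats `from` as the start of from_clause.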
2. JOIN flavours admitted
The grammar admits exactly the flavours the user picked:
- INNER JOIN / bare JOIN
- LEFT [ OUTER ] JOIN
- RIGHT [ OUTER ] JOIN
- FULL [ OUTER ] JOIN
- CROSS JOIN
The first four take a mandatory ON sql_expr; CROSS JOIN
takes none. OUTER is the optional explicit modifier on
LEFT / RIGHT / FULL.
Explicitly out (§11): NATURAL JOIN, JOIN … USING (col),
and comma-list FROM t1, t2 (the legacy implicit cross join).
The first two add grammar weight for limited teaching value;
comma-FROM teaches habits we do not want to encourage —
CROSS JOIN covers the same shape explicitly.
JOIN chains are admitted as a flat ( join_clause )*. Standard
SQL is left-associative; since the grammar builds no AST and the
engine receives the source text verbatim (ADR-0030 §4), the
engine resolves the associativity. The grammar's job ends at "the
chain parses".
3. Set operators and compound queries
UNION, UNION ALL, INTERSECT, EXCEPT all admitted —
ADR-0030 §3's full set.
The compound shape (§1) is select_core (set_op select_core)*,
flat. Standard SQL gives INTERSECT higher precedence than
UNION / EXCEPT; the engine resolves this — the grammar admits
the chain as written. This mirrors §2's JOIN-chain decision.
A user who wants explicit grouping writes
(SELECT … INTERSECT SELECT …) UNION SELECT …, which falls out
of the subquery-primary branch (§6) — though for a top-level
statement this requires an extra SELECT wrapping. In practice
the engine's precedence is what learners encounter; calling it
out in the help sql page (ADR-0030 Phase 6) is sufficient.
ORDER BY / LIMIT on a compound apply to the whole compound,
not to a leg — fixed by the position of order_by_clause and
limit_clause in §1's compound_select.
4. CTEs (WITH and WITH RECURSIVE)
The full with_clause per §1. Both forms admitted: non-recursive
WITH for naming intermediate results, and WITH RECURSIVE for
recursive queries (tree traversals, transitive closure,
generated sequences).
The cte_def body is a parenthesised compound_select, so the
recursion is into SQL_SELECT_COMPOUND via Subgrammar — the
same recursion mechanism subqueries use (§9).
CTE-name collisions. A CTE name shares the table-name
namespace at the engine. Standard SQL: the CTE shadows a
same-named base table within the statement. The grammar is
agnostic — both are identifiers in a table-source slot — so the
shadowing falls out of engine resolution. The
reject_internal_table validator still rejects any __rdbms_*
identifier in any table-source slot, including CTE-name
slots and the FROMs inside CTE bodies. That is the right
posture: the reserved namespace is reserved everywhere.
Recursive CTEs use the standard cte_name AS ( base_case UNION [ALL] recursive_case ) shape — already admitted by §1's
compound_select body. No grammar branch specific to recursion
is needed; the RECURSIVE keyword is a hint to the engine, not
a grammar gate.
5. Qualified column references
Additive extension to sql_expr.rs (ADR-0031 §7 OOS-2).
name_or_call's identifier prefix gains a Choice tail:
name_or_call := identifier
( '.' identifier
| '(' call_args? ')'
)?
The leading identifier is matched once (preserving ADR-0031 §1's
factoring — no Choice branch begins with an identifier). The
optional tail is either a qualified-reference suffix
(. identifier) or a function-call argument list (( … )),
not both. A bare identifier with no tail remains a plain column
reference.
A function call with a qualified name — schema.f(…) — is not in
scope (we have no schemas) and is structurally inadmissible by
construction: there is no production that admits both a .-tail
and a (-tail.
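The "match the identifier once, then at most one tail" factoring can be sketched as follows. This is an illustration under stated assumptions: `Tok`, `NameOrCall`, `ident`, and `name_or_call` are hypothetical names, the input is pre-tokenized, and call arguments are reduced to bare identifiers instead of full sql_expr items.

```rust
/// Hypothetical token type for illustration; not the project's tokenizer.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    LParen,
    RParen,
    Comma,
}

/// Hypothetical helper to build an identifier token.
fn ident(s: &str) -> Tok {
    Tok::Ident(s.to_string())
}

/// Hypothetical classification of what `name_or_call` matched.
#[derive(Debug, PartialEq)]
enum NameOrCall {
    Column(String),
    Qualified { qualifier: String, column: String },
    Call { name: String, arg_count: usize },
}

/// Match the leading identifier once, then at most one tail.
/// Returns the match and the number of tokens consumed.
fn name_or_call(toks: &[Tok]) -> Option<(NameOrCall, usize)> {
    let name = match toks.first()? {
        Tok::Ident(s) => s.clone(),
        _ => return None,
    };
    match toks.get(1) {
        // '.' identifier: the qualified-reference tail.
        Some(Tok::Dot) => match toks.get(2) {
            Some(Tok::Ident(col)) => Some((
                NameOrCall::Qualified { qualifier: name, column: col.clone() },
                3,
            )),
            _ => None,
        },
        // '(' call_args? ')': the call tail (arguments simplified to identifiers).
        Some(Tok::LParen) => {
            let mut i = 2;
            let mut arg_count = 0;
            if let Some(Tok::Ident(_)) = toks.get(i) {
                arg_count += 1;
                i += 1;
                while let Some(Tok::Comma) = toks.get(i) {
                    match toks.get(i + 1) {
                        Some(Tok::Ident(_)) => {
                            arg_count += 1;
                            i += 2;
                        }
                        _ => return None,
                    }
                }
            }
            match toks.get(i) {
                Some(Tok::RParen) => Some((NameOrCall::Call { name, arg_count }, i + 1)),
                _ => None,
            }
        }
        // No tail: a plain column reference.
        _ => Some((NameOrCall::Column(name), 1)),
    }
}
```

Because the two tails are alternatives, an input shaped like `s.f(…)` matches only `s.f` as a qualified reference and leaves the `(` unconsumed, which is the structural inadmissibility the section describes.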
Completion for the qualified form: when the cursor is past
identifier '.', the completion source is "columns of the table
or alias named by the leading identifier", resolved from the
active SchemaCache (the same source the DSL completion uses,
ADR-0030 §8). This is a small extension to the existing
IdentSource::Columns machinery — when in scope, column
completion is scoped to the named source.
6. Subquery expressions
Additive extensions to sql_expr.rs (ADR-0031 §7 OOS-1):
- Scalar subquery as primary. A Choice branch '(' compound_select ')'. The existing '(' or_expr ')' branch handles parenthesised expressions. Both start with '(', so per ADR-0031 §1's factoring principle, the '(' is matched once and the inside is a Choice between compound_select and or_expr. The first inside token disambiguates: SELECT or WITH → subquery; anything else → expression. The two Choice branches have non-overlapping first-token sets, so the walker's expected-set at the ambiguity point merges naturally without Optional-first hazards.
- IN ( subquery ). The existing predicate_tail's IN '(' additive (',' additive)* ')' branch gains a sibling IN '(' compound_select ')'. Same '(' factoring as the scalar case: after '(', branch on SELECT / WITH (subquery) vs additive-first-token (literal list). NOT IN follows from the existing [ NOT ] factoring on the predicate tail.
- [ NOT ] EXISTS ( subquery ). Added as a primary Choice branch:
    primary := … | EXISTS '(' compound_select ')'
  The bare EXISTS form lives in primary; NOT EXISTS falls out of the existing not_expr := NOT not_expr tier above primary in the precedence ladder. This is structurally cleaner than putting [ NOT ] EXISTS inside primary: there is only one place NOT is admitted, and it composes uniformly.
All three branches recurse through Subgrammar(&SQL_SELECT_COMPOUND).
Correlated subqueries fall out for free — a subquery's
sql_expr reaches identifiers, which the engine resolves
against outer scopes. The grammar imposes no correlation
constraint; correlation is engine-side semantics.
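The first-token disambiguation after a shared '(' can be sketched as a single classification step. This is a hedged illustration: `ParenBody` and `classify_paren_body` are hypothetical names, and case-insensitive keyword matching is an assumption.

```rust
/// What the grammar expects inside a '(' that could open either form.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum ParenBody {
    /// A subquery: the inside starts with SELECT or WITH.
    Subquery,
    /// A parenthesised expression, or an IN literal list.
    Expression,
}

/// Decide the branch from the first token after '('.
/// The two branches have disjoint first-token sets, so one token suffices.
fn classify_paren_body(first_inner_token: &str) -> ParenBody {
    let starts_select = ["SELECT", "WITH"]
        .iter()
        .any(|kw| kw.eq_ignore_ascii_case(first_inner_token));
    if starts_select {
        ParenBody::Subquery
    } else {
        ParenBody::Expression
    }
}
```

The comparison is against whole tokens, so an identifier such as `selected` still takes the expression branch.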
7. GROUP BY and HAVING
GROUP BY takes a comma-separated list of sql_exprs.
Standard SQL admits any expression as a grouping key (not just
bare columns) — e.g. GROUP BY date(created_at). The grammar
admits this without special-casing.
HAVING is a single sql_expr. Its semantics is "boolean over
grouped rows"; the grammar does not enforce that — the
expression's typing is the engine's concern.
Aggregate correctness is not grammar-checked. Whether a
projection's non-aggregated columns are valid given the
GROUP BY keys is a semantic question. ADR-0030 §9 settled this:
the grammar admits structurally, the engine rejects semantically,
and the friendly-error layer renders engine-neutral wording
(ADR-0019). A learner who writes SELECT Name, COUNT(*) FROM t
sees an engine-neutral "Name must appear in a GROUP BY clause or
be wrapped in an aggregate function"-style message, not a raw
engine string and not a parse error. This is the project's
honest limitation (ADR-0030 §7) and remains so.
8. LIMIT / OFFSET and ORDER BY extras
LIMIT n [ OFFSET m ] — the standard form. Both n and m are
sql_exprs (in practice integer literals, but the grammar
admits the general form so e.g. LIMIT max(10, x) OFFSET 0 is
structurally accepted; the engine constrains values).
The MySQL/SQLite legacy comma form LIMIT m, n is out (§11).
Its argument order (offset first, then count) inverts the
keyword form — a needless source of confusion.
ORDER BY already admits sql_expr items with optional
ASC / DESC (Phase 1). With Phase 2:
- Column-position references (ORDER BY 1, 3 DESC) fall out for free: an integer literal is a valid sql_expr, and the engine interprets a bare positive integer in ORDER BY as a column position. The grammar does not distinguish the case; rendering interprets the position. Document in help sql.
- Qualified refs in ORDER BY (e.g. ORDER BY t.c) fall out of §5: the grammar uses the same sql_expr body.
9. Recursion, the depth budget, and the walker
SELECT recurses into itself at four points:
- A subquery primary in sql_expr (§6).
- An IN ( subquery ) predicate tail (§6).
- An EXISTS ( subquery ) primary (§6).
- A CTE body (§4).
Every recursion is wired through
Node::Subgrammar(&SQL_SELECT_COMPOUND) — the named static Node
exported by sql_select.rs. The recursion is token-guarded in
every case: a subquery primary is preceded by '('; an
IN ( subquery ) by IN (; an EXISTS ( subquery ) by
EXISTS (; a CTE body by AS (. There is no left recursion;
the walker always makes progress.
MAX_SUBGRAMMAR_DEPTH = 64 (ADR-0026, reused by ADR-0031) is
shared: DSL Expr recursion, SQL expression recursion, and
SQL SELECT recursion all increment the same
WalkContext::subgrammar_depth. A worst-case learner query
might be SELECT … WHERE id IN (SELECT … WHERE id IN (SELECT …))
with each inner select carrying a few-deep expression — well
below the cap. The cap remains purely a stack-overflow guard;
this ADR does not raise it. If pathological-but-realistic
learner queries reach 64 in practice, a focused ADR lifts it
with measurements. Speculative raising would weaken the guard
without evidence.
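The shared budget can be pictured as one counter that every recursion kind increments and decrements. The sketch below is illustrative only: the `WalkContext` here is a stand-in with a single field, `DepthExceeded`, `enter_subgrammar`, and `nest` are hypothetical names, and the exact boundary (whether the 64th or 65th nested entry fails) is an assumption.

```rust
const MAX_SUBGRAMMAR_DEPTH: usize = 64;

/// Stand-in for the walker context; only the shared counter is modelled.
#[derive(Default)]
struct WalkContext {
    subgrammar_depth: usize,
}

/// Hypothetical error for a recursion that would exceed the cap.
#[derive(Debug, PartialEq)]
struct DepthExceeded;

impl WalkContext {
    /// Run `body` one level deeper, whatever kind of recursion it is
    /// (DSL Expr, SQL expression, or SQL SELECT all share the counter).
    fn enter_subgrammar<T>(
        &mut self,
        body: impl FnOnce(&mut WalkContext) -> Result<T, DepthExceeded>,
    ) -> Result<T, DepthExceeded> {
        if self.subgrammar_depth >= MAX_SUBGRAMMAR_DEPTH {
            return Err(DepthExceeded);
        }
        self.subgrammar_depth += 1;
        let out = body(self);
        self.subgrammar_depth -= 1;
        out
    }
}

/// Simulate `levels` nested recursions and report the depth reached.
fn nest(ctx: &mut WalkContext, levels: usize) -> Result<usize, DepthExceeded> {
    if levels == 0 {
        return Ok(ctx.subgrammar_depth);
    }
    ctx.enter_subgrammar(|c| nest(c, levels - 1))
}
```

In this sketch a three-level nested IN ( SELECT … ) query consumes three units of the budget plus whatever its expressions add, far below the cap.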
No new walker capability is introduced. Subgrammar, the
depth counter, the cap, and the friendly depth-exceeded error
all carry over from ADR-0026 unchanged — the same posture
ADR-0031 took. This is a non-trivial property: Phase 2 is the
biggest single grammar slice in the project, and it lands
without changing the walker's contract.
10. Completion scope and the WalkContext extension
ADR-0030 §8 promises that "ambient assistance comes for free"
because SQL is grammar in the unified tree. For Phase 1's
single-table SELECT this was substantially true: the existing
WalkContext::current_table mechanism (populated via the
writes_table: true flag on the FROM table-name slot) gave
WHERE and ORDER BY column-name completion against the right
table at no incremental cost.
Phase 2 breaks the "free" claim. Multiple FROM tables via
JOINs, aliases, CTE-defined table sources, subqueries with their
own FROM scope, qualified t.c references, projection aliases
referenced in ORDER BY — every Phase-2 surface needs scope
information that WalkContext does not currently carry. §9's
"no new walker capability" claim holds for grammar recursion
(Subgrammar and the depth cap suffice); for completion scope it
is too strong, and is softened here to an honest split.
The current WalkContext carries one table at a time
(current_table: Option<String> + current_table_columns), set
by writes_table: true on a Tables identifier. DSL paths
(update T, delete from T, insert into T) rely on this
single-table contract and continue to work unchanged. Phase 2
adds layered accumulators alongside, not in place of.
10.1. The from-scope accumulator
A new WalkContext field:
    from_scope: Vec<TableBinding>
    TableBinding { table: String, alias: Option<String>,
                   columns: Vec<TableColumn> }
Populated incrementally as the walker descends through
from_clause and each join_clause (§1). The first table-source
slot pushes a binding; every subsequent JOIN pushes another.
Ident slots whose IdentSource is Columns now resolve against
the union of every binding's columns, with deduplication.
current_table / current_table_columns remain as derived
helpers: when from_scope.len() == 1, they expose that single
binding's data, preserving the contract every existing DSL path
relies on. DSL UPDATE / DELETE / INSERT continue to push
exactly one binding via the existing writes_table: true
mechanism, unchanged.
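A minimal sketch of the accumulator and its derived single-table view follows. The field names on `TableBinding` come from the text above; the `name` field on `TableColumn`, the `cols` helper, and the method names are assumptions for illustration, not the real `WalkContext` API.

```rust
use std::collections::HashSet;

/// Assumption: a column carries at least a name; the real type may hold more.
#[derive(Debug, Clone, PartialEq)]
struct TableColumn {
    name: String,
}

#[derive(Debug, Clone)]
struct TableBinding {
    table: String,
    alias: Option<String>,
    columns: Vec<TableColumn>,
}

/// Stand-in context modelling only the from-scope accumulator.
#[derive(Default)]
struct WalkContext {
    from_scope: Vec<TableBinding>,
}

/// Hypothetical helper to build a column list from names.
fn cols(names: &[&str]) -> Vec<TableColumn> {
    names.iter().map(|n| TableColumn { name: n.to_string() }).collect()
}

impl WalkContext {
    /// Called for the FROM table source and for each subsequent JOIN.
    fn push_binding(&mut self, binding: TableBinding) {
        self.from_scope.push(binding);
    }

    /// Union of every binding's column names, first-seen order, deduplicated.
    fn column_candidates(&self) -> Vec<String> {
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for binding in &self.from_scope {
            for col in &binding.columns {
                if seen.insert(col.name.as_str()) {
                    out.push(col.name.clone());
                }
            }
        }
        out
    }

    /// Derived helper: exposes a table only when exactly one binding exists.
    fn current_table(&self) -> Option<&str> {
        match self.from_scope.as_slice() {
            [only] => Some(only.table.as_str()),
            _ => None,
        }
    }
}
```

With one binding pushed, `current_table()` behaves like the Phase-1 single-table contract; after a JOIN pushes a second binding it returns `None` and callers fall back to the unioned candidates.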
10.2. Scope-stack discipline at Subgrammar boundaries
Subqueries (§6) and CTE bodies (§4) introduce new lexical scopes.
A column reference inside SELECT … WHERE id IN (SELECT id FROM u) resolves first against the inner SELECT's FROM (u), and
— for correlation — also against the outer scope.
subgrammar_depth is a counter; it suffices for §9's depth cap
but not for scope.
Phase 2 layers a stack on top. A new field:
    from_scope_stack: Vec<ScopeFrame>
    ScopeFrame {
        from_scope: Vec<TableBinding>,
        cte_bindings: Vec<CteBinding>,
        projection_aliases: Vec<String>,
    }
The new walker node variant — Node::ScopedSubgrammar(&Node) —
is what triggers a scope push. It is a sibling of the existing
Node::Subgrammar(&Node), with the same recursion semantics
(reference-following, depth-counted) and one additional driver
behaviour: on entry, push the current ScopeFrame onto
from_scope_stack and start a fresh empty frame; on exit, pop
back. The existing Node::Subgrammar variant is unchanged — DSL
Expr recursion (ADR-0026) and the sql_expr.rs precedence-
ladder recursion (ADR-0031) keep using it and never push a scope.
The grammar source spells the choice explicitly at each call
site: subqueries in sql_expr.rs and CTE bodies in
sql_select.rs reference the compound-SELECT through
Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND); predicate-ladder
recursion in sql_expr.rs continues to use
Node::Subgrammar(&SQL_OR_EXPR). Self-documenting, no flag
bookkeeping, and the walker change is localised to one extra arm
in the driver's match over Node variants.
Column-completion candidates inside a scope frame are the union
of the current frame's from_scope and (for correlated refs)
all outer frames; outer-frame columns are admitted as additional
candidates so correlated references work. Ordering or visual
differentiation between current-frame and outer-frame candidates
is completion-tier polish and is not specified by this ADR — the
current completion API (candidates_at_cursor*) returns a flat
Vec, and adding a priority dimension is a separate concern.
CTE bindings resolve the same way (outward-walking) — a CTE
defined in an outer query is visible inside an inner subquery as
a table source, unless the inner subquery defines a CTE of the
same name and shadows it.
This is the one explicit walker-capability extension Phase 2
makes. It is scoped: one new node variant, no new walker entry
point, no change to how Subgrammar bodies are entered
structurally. The depth cap (§9) applies to both variants
uniformly through the shared subgrammar_depth counter.
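The push-on-entry, pop-on-exit discipline can be sketched as below. This is a simplified model under stated assumptions: `ScopeFrame` carries only `from_scope` (the CTE and alias accumulators are omitted), columns are plain strings, `enter_scoped` / `exit_scoped` / `column_candidates` are hypothetical method names, and the "current frame first, then outward" ordering is a choice made here, since the ADR leaves ordering unspecified.

```rust
#[derive(Debug, Clone)]
struct TableBinding {
    table: String,
    columns: Vec<String>,
}

/// Simplified frame: only the from-scope accumulator is modelled.
#[derive(Debug, Default)]
struct ScopeFrame {
    from_scope: Vec<TableBinding>,
}

#[derive(Debug, Default)]
struct WalkContext {
    /// The frame being populated by the innermost SELECT.
    current: ScopeFrame,
    /// Enclosing frames, outermost first.
    from_scope_stack: Vec<ScopeFrame>,
}

impl WalkContext {
    /// ScopedSubgrammar entry: park the current frame, start a fresh one.
    fn enter_scoped(&mut self) {
        let outer = std::mem::take(&mut self.current);
        self.from_scope_stack.push(outer);
    }

    /// ScopedSubgrammar exit: drop the inner frame, restore the enclosing one.
    fn exit_scoped(&mut self) {
        if let Some(outer) = self.from_scope_stack.pop() {
            self.current = outer;
        }
    }

    /// Current-frame columns, then outer-frame columns (for correlated
    /// references), deduplicated. Ordering is an assumption of this sketch.
    fn column_candidates(&self) -> Vec<String> {
        let mut out: Vec<String> = Vec::new();
        let frames = std::iter::once(&self.current).chain(self.from_scope_stack.iter().rev());
        for frame in frames {
            for binding in &frame.from_scope {
                for col in &binding.columns {
                    if !out.contains(col) {
                        out.push(col.clone());
                    }
                }
            }
        }
        out
    }
}
```

Plain `Node::Subgrammar` recursion would call neither method, which is how expression-ladder recursion stays scope-neutral in this model.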
10.3. CTE bindings
A frame-local accumulator carries CTE definitions visible in the current scope:
    cte_bindings: Vec<CteBinding>
    CteBinding {
        name: String,
        columns: Vec<CteColumn>,
    }
    CteColumn {
        name: Option<String>,   // None for unnamed computed projections
        type_: Option<Type>,    // resolved playground type, if derivable
    }
A CTE definition cte_name [(col-list)] AS (compound_select)
produces a binding in two stages:
- Pre-body push (so WITH RECURSIVE self-references resolve). When the walker reaches AS and is about to enter the body's Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND), it pushes a placeholder binding into the outer frame's cte_bindings with columns = [] (an empty stand-in). The CTE name is now visible as a table source from inside the body.
- Body-finalised harvest (when the body's scope frame completes). On ScopedSubgrammar exit, before popping the frame, the driver derives the body's projection-list output columns (rules below) and rewrites the placeholder binding in the outer frame.
Output-column derivation rules. Walking the body's projection items:
| Projection item | Derived CTE column(s) |
|---|---|
| * | Every column from the body frame's from_scope, in order, with their resolved types |
| t.* (qualified wildcard) | Every column from binding t in the body frame's from_scope, with their types |
| col (bare ref, resolves uniquely) | One column: name = col, type = the resolved column's playground type |
| t.col (qualified ref) | One column: name = col, type = t's column's type |
| expr AS alias or bare expr alias | One column: name = alias, type = the underlying type if expr is a single column ref, else None |
| expr (computed, no alias) | One column: name = None, type = None; the engine assigns an implementation-defined name |
For compound bodies (UNION / INTERSECT / EXCEPT) the columns
come from the first leg per standard SQL. For recursive CTE
bodies (WITH RECURSIVE) the same rule — the non-recursive leg
dictates.
If a (col-list) was supplied on the CTE name, it renames the
derived columns positionally and overrides their names; types are
preserved from the derivation. If the column-count of (col-list)
disagrees with the body's projection arity, the grammar admits
this and the engine surfaces the mismatch — do_run_select's
engine-neutral error layer carries the message (ADR-0030 §9,
ADR-0019).
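The non-wildcard derivation rules and the positional (col-list) rename can be sketched as follows. Everything here is illustrative: `Type` is a two-variant placeholder for the playground type, `Projection` is a hypothetical pre-resolved form of a projection item, wildcard rows of the table are omitted, and the arity-mismatch handling (rename what lines up, leave the rest) is an assumption, since the ADR defers that case to the engine.

```rust
/// Placeholder for the playground type; the real enum is not shown in this ADR.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Text,
}

#[derive(Debug, Clone, PartialEq)]
struct CteColumn {
    name: Option<String>,
    type_: Option<Type>,
}

/// Hypothetical projection item, already resolved against the body frame.
enum Projection {
    /// `col` or `t.col`
    ColumnRef { name: String, type_: Type },
    /// `expr AS alias` or bare alias; typed only when expr is a single column ref
    Aliased { alias: String, source_type: Option<Type> },
    /// computed expression with no alias
    Computed,
}

/// Apply the table's rules for the non-wildcard projection forms.
fn derive_columns(items: &[Projection]) -> Vec<CteColumn> {
    items
        .iter()
        .map(|item| match item {
            Projection::ColumnRef { name, type_ } => CteColumn {
                name: Some(name.clone()),
                type_: Some(type_.clone()),
            },
            Projection::Aliased { alias, source_type } => CteColumn {
                name: Some(alias.clone()),
                type_: source_type.clone(),
            },
            Projection::Computed => CteColumn { name: None, type_: None },
        })
        .collect()
}

/// Apply an optional `(col-list)`: rename positionally, keep derived types.
/// Assumption: on an arity mismatch, only the positions that line up are renamed.
fn apply_col_list(mut cols: Vec<CteColumn>, col_list: &[&str]) -> Vec<CteColumn> {
    for (col, new_name) in cols.iter_mut().zip(col_list) {
        col.name = Some((*new_name).to_string());
    }
    cols
}
```

Note that a col-list gives a name to an otherwise unnamed computed column while its type stays `None`, so it becomes referenceable for completion without becoming typed.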
Completion past cte_alias.|. Where the derivation produced
named columns (every form above except computed-no-alias), they
complete with their names and (where typed) participate in §11's
result-type resolution if the CTE's columns are projected
upstream. Where the derivation produced an unnamed column slot,
that slot is silently skipped from the qualified-prefix candidate
list — the user typing cte.| past it sees only the nameable
columns. The cure for "I want my expression to be referenceable
from outside the CTE" is to add an alias, which is the same cure
the engine itself enforces at execution time.
This is substantially better than the earlier "honest limitation"
posture: the common SELECT * body is fully resolvable; explicit
projections are resolvable; only un-aliased computed columns
elude us, and the right learner response there is the same as
the engine's right learner response — write an alias.
cte_bindings lives on the scope frame, so a CTE defined in an
outer query is visible inside an inner subquery as a table source
unless that subquery defines a CTE of the same name (which
shadows it, per standard SQL).
10.4. Projection-alias bindings
Standard SQL admits ORDER BY referencing a SELECT-list alias:
SELECT a + b AS total FROM t ORDER BY total. A third frame-local
accumulator:
projection_aliases: Vec<String>
Each projection_item's optional alias (whether AS x or bare
x — see §1) appends its name. Ident slots inside the trailing
ORDER BY's sql_exprs offer projection aliases as additional
candidates alongside column names. This addresses §1's bare-alias
admission's completion behaviour at the same time.
The accumulator is not consulted inside WHERE, GROUP BY, or
HAVING — standard SQL forbids alias references there
(aliases are not yet bound at evaluation time). The grammar
admits them structurally regardless; the engine rejects; ADR-0019
renders the engine-neutral error.
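The clause-dependent consultation of the alias accumulator can be sketched in a few lines. `Clause` and `ident_candidates` are hypothetical names for illustration; the real completer works from the matched path, not from an explicit clause tag.

```rust
/// Hypothetical tag for the clause the cursor's Ident slot sits in.
#[derive(Clone, Copy, PartialEq)]
enum Clause {
    Projection,
    Where,
    GroupBy,
    Having,
    OrderBy,
}

/// Column names are always offered; projection aliases only in ORDER BY.
fn ident_candidates(columns: &[&str], projection_aliases: &[&str], clause: Clause) -> Vec<String> {
    let mut out: Vec<String> = columns.iter().map(|c| c.to_string()).collect();
    if clause == Clause::OrderBy {
        for alias in projection_aliases {
            // Skip an alias that duplicates a column name already offered.
            if !out.iter().any(|c| c == alias) {
                out.push(alias.to_string());
            }
        }
    }
    out
}
```

For `SELECT a + b AS total FROM t ORDER BY |`, this sketch offers `a`, `b`, and `total`; at a cursor in WHERE it offers only `a` and `b`.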
10.5. Qualified-prefix completion
§5 fixed the grammar for t.c references. The completion
behaviour at qualified positions:
- At an Ident cursor with no prefix, candidates are the union of every from_scope binding's columns, plus projection_aliases when in ORDER BY, deduplicated. CTE-name candidates apply only in table-source slots, not column slots.
- At an Ident cursor immediately after prefix '.', candidates are scoped: resolve prefix against the active from_scope (preferring alias matches over table matches, since aliases shadow), and offer that binding's columns alone. If prefix doesn't resolve to a binding, the candidate list is empty; the walker's expected-set still surfaces the syntactic alternatives (the user sees no column candidates but the structural error message reports the unresolved prefix).
The qualified-prefix narrowing is a small extension to the
existing IdentSource::Columns handling: when the matched-path
immediately preceding the Ident ends with Ident '.', the
completer is told the prefix and narrows accordingly. This is the
only completion-source-level change; the rest is data flowing
through the new accumulators.
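The alias-before-table resolution order for a qualified prefix can be sketched as below. `resolve_prefix` and `qualified_candidates` are hypothetical names, columns are plain strings, and exact (case-sensitive) name matching is an assumption.

```rust
struct TableBinding {
    table: String,
    alias: Option<String>,
    columns: Vec<String>,
}

/// Resolve `prefix` against the active from_scope: alias matches win over
/// table-name matches, because aliases shadow.
fn resolve_prefix<'a>(from_scope: &'a [TableBinding], prefix: &str) -> Option<&'a TableBinding> {
    from_scope
        .iter()
        .find(|b| b.alias.as_deref() == Some(prefix))
        .or_else(|| from_scope.iter().find(|b| b.table == prefix))
}

/// Candidates after `prefix '.'`: that binding's columns alone, or nothing
/// when the prefix does not resolve.
fn qualified_candidates(from_scope: &[TableBinding], prefix: &str) -> Vec<String> {
    resolve_prefix(from_scope, prefix)
        .map(|b| b.columns.clone())
        .unwrap_or_default()
}
```

The two-step lookup matters for a query such as `FROM orders AS c JOIN c`, where `c.` should narrow to the columns of `orders`, not to the base table named `c`.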
10.6. The projection-before-FROM problem
Standard SQL writes projection before FROM. A user typing
select col1, col2 from mytable produces, mid-typing, a state
where the projection list has been parsed but the FROM has not.
At that point the column-name completer cannot scope to
mytable — it does not know mytable is coming. Validation and
highlighting face the same problem: col1 and col2 cannot be
checked as belonging to mytable until the user types from mytable. The debounced re-walk on every keystroke (ADR-0027) is
not sufficient on its own to fix this in a single-pass walker,
because by the time the FROM is parsed, the projection
identifiers have already been resolved (left-to-right) against
the only scope information available at that moment — the empty
from_scope.
There is no fully satisfying single-pass answer. Phase 2's posture is therefore explicit:
- During-typing completion of projection-list column names, when from_scope is empty (no FROM yet), uses the unioned SchemaCache.columns (every column known to the schema) as the candidate set. This is the same global fallback Phase 1 uses and remains the right behaviour: a noisier-but-useful completion is better than no completion.
- A post-walk fixup pass re-evaluates projection-list column refs against the final from_scope after the walk completes. The walk records each projection Ident's span and matched-path location; once the walk reaches end-of-input (or end-of-statement), the fixup walks the recorded list, looks up each identifier against the final from_scope, and:
  - Rewrites the highlight class on that terminal: downgrading "column" → "unknown identifier" when the identifier doesn't belong to any in-scope binding, upgrading "unknown identifier" → "column" when it does.
  - Updates the diagnostic for the validity indicator (ADR-0027): a column-not-found ERROR either appears or disappears based on the post-walk scope.

  Integration point. The fixup runs as the final stage of the walk itself, after all grammar nodes have been processed but before WalkResult is returned to the caller. It mutates the walker's accumulated highlight runs and diagnostics vector in place, so the consumer (the renderer, the validity indicator) sees a single coherent snapshot. This keeps the walker the single source of truth for what reaches the renderer: the fixup is conceptually part of "what the walker produces", not a separate post-processing layer. The same convention applies to the §11.6 SQL-expression predicate warnings, which also run as a final walk stage.
- The fixup runs on every debounced re-walk (ADR-0027 already triggers the full walk per keystroke), so the user observes: typing col1, col2 from mytable, the col1 / col2 initially highlight as generic identifiers (with a soft warning if not found anywhere in the schema); the moment mytable is typed, the highlight snaps to the column class if col1 / col2 belong to mytable, or to the unknown-identifier diagnostic if they don't, within one debounce cycle.
The fixup pass does not re-parse; it only re-resolves
identifiers against the final from_scope.
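The re-resolution step as this section describes it can be sketched as follows. This is a hedged model, not the shipped mechanism: `HighlightClass`, `Diagnostic`, `ProjectionIdent`, and `fixup_projection` are hypothetical shapes, diagnostics are returned instead of mutated in place, and leaving everything untouched when from_scope is empty is an assumption about the no-FROM case.

```rust
/// Hypothetical highlight classes; the real HighlightClass has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HighlightClass {
    Column,
    UnknownIdentifier,
}

/// Hypothetical diagnostic shape: a byte span plus a message.
#[derive(Debug, PartialEq)]
struct Diagnostic {
    span: (usize, usize),
    message: String,
}

/// A projection identifier recorded during the forward walk.
struct ProjectionIdent {
    name: String,
    span: (usize, usize),
    class: HighlightClass,
}

struct TableBinding {
    table: String,
    columns: Vec<String>,
}

/// Re-resolve recorded projection identifiers against the final from_scope.
/// No re-parse happens; only the class and diagnostics are recomputed.
fn fixup_projection(idents: &mut [ProjectionIdent], from_scope: &[TableBinding]) -> Vec<Diagnostic> {
    let mut diagnostics = Vec::new();
    // Assumption: with no FROM at all there is nothing to resolve against.
    if from_scope.is_empty() {
        return diagnostics;
    }
    for ident in idents.iter_mut() {
        let known = from_scope
            .iter()
            .any(|b| b.columns.iter().any(|c| c == &ident.name));
        if known {
            ident.class = HighlightClass::Column;
        } else {
            ident.class = HighlightClass::UnknownIdentifier;
            diagnostics.push(Diagnostic {
                span: ident.span,
                message: format!("unknown column: {}", ident.name),
            });
        }
    }
    diagnostics
}
```

Running this on every debounced re-walk gives the described snap: identifiers recorded before `from mytable` was typed are reclassified as soon as the binding exists.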
ORDER BY alias resolution needs no fixup. Projection precedes
ORDER BY in walk order, so projection_aliases is fully
populated by the time the walker reaches an ORDER BY Ident;
the alias-as-column-candidate is resolved in the single forward
pass.
This is the answer to the user's "I think this may be automatic"
intuition: the debounced re-walk is automatic; the
post-walk fixup pass is the new infrastructure that makes the
re-walk produce correct results. Without it, projection-list
column refs would forever validate against the global column set
even after the FROM is typed.
10.7. The honest split
§9 still holds for grammar recursion: Subgrammar and the
depth cap are reused unchanged. For completion scope, this
section introduces:
- New WalkContext fields: from_scope, from_scope_stack, cte_bindings, projection_aliases.
- Scope push/pop discipline at SQL_SELECT_COMPOUND Subgrammar boundaries, driven by the Node::ScopedSubgrammar variant (§10.2) so DSL Subgrammars are unaffected.
- A qualified-prefix narrowing in the IdentSource::Columns completion path.
- A post-walk fixup pass for projection-list identifier highlighting and validity (§10.6).
These are real walker-contract extensions. They are scoped: one
new node variant (Node::ScopedSubgrammar, §10.2), no new walk-driver entry points, no changes to
how Subgrammar bodies are entered structurally. The existing DSL
paths are unaffected — their grammars never push a SELECT scope,
never define a CTE, never carry projection aliases — and the
single-table current_table / current_table_columns view is
preserved as a derived helper.
§9's claim is therefore restated honestly: grammar recursion needs no new walker capability; completion scope needs the additions above.
11. Diagnostics for Phase-2 validation cases
ADR-0027 fixes the warning-vs-error guideline verbatim:
ERROR: the input is known to fail. Either it does not parse (incomplete, or a mismatched / invalid token), or it parses but names something that does not exist (an unknown table or column).
WARNING: the input is valid and will run, but is very likely not what a knowledgeable user wants: a type-mismatched comparison, or = NULL (both from ADR-0026 §7). Amendment 1 adds a third trigger: LIKE against a numeric column.
The split is certainty of failure versus likely misleading.
This section walks the Phase-2 surface case-by-case, classifies each against that guideline, and identifies the diagnostic machinery additions needed. It also flags a Phase-1 carry-over gap (§11.6) that Phase 2 closes.
11.1. Existing diagnostics, briefly
Two post-walk passes today (src/dsl/walker/mod.rs):
- Schema-existence pass (ERROR). Walks the MatchedPath, checks every IdentSource::Tables / IdentSource::Columns ident against SchemaCache. Emits diagnostic.unknown_table and diagnostic.unknown_column. Today this assumes a single current_table for column resolution.
- Expression predicate-warnings pass (WARNING). Walks the parsed DSL Expr AST emitted by expr.rs's builder. Emits diagnostic.eq_null, diagnostic.type_mismatch, diagnostic.like_numeric. Runs only on WHERE expressions in the DSL.
Phase 2 extends both, and §11.6 fills a SQL-side gap.
11.2. Phase-2 new ERROR cases
Every case below is "known to fail on the engine" — the engine would surface a message the friendly-error layer would translate (ADR-0019). Surfacing them as pre-flight ERROR diagnostics gives the learner the answer one debounce cycle faster, with the walker as the single source of truth.
- Unknown table in any FROM / JOIN slot. The existing schema-existence pass extends from "the one current_table" to walking every from_scope binding's table and emitting diagnostic.unknown_table per unresolved name. CTE-name slots in the active cte_bindings are valid table sources and exempt from this check.
- Unknown CTE-as-table. A table-source slot whose name is not in SchemaCache.tables and not in the active cte_bindings chain emits diagnostic.unknown_table (same catalog key: from the learner's perspective the engine message is the same; the slot is a "table that doesn't exist", whether they meant a CTE or a base table).
- Unknown table or alias in a qualified column reference (t.c where t doesn't resolve in the active from_scope). New catalog key diagnostic.unknown_qualifier, slot {qualifier}.
- Unknown column in a qualified reference (t.c where t resolves but c is not a column of that binding). Reuses diagnostic.unknown_column with the column name in context.
- Ambiguous unqualified column reference: a column name used unqualified that exists in two or more from_scope bindings. The engine raises "ambiguous column name"; we surface it as ERROR with a new catalog key diagnostic.ambiguous_column, slots {column}, {qualifiers}, so the learner sees which tables the name appeared in.
- Reference to a projection alias in WHERE / GROUP BY / HAVING. Standard SQL forbids it (aliases are not bound at evaluation time). The grammar admits the identifier structurally; a new diagnostic pass emits ERROR with a new catalog key diagnostic.projection_alias_misplaced, slots {alias}, {clause}.
- CTE column-list arity mismatch. When cte_name (col1, col2, …) AS (compound_select) declares N columns and the body's projection (§10.3) derives M columns with N ≠ M, the CTE harvest pass (§10.3 stage 2) emits ERROR with a new catalog key diagnostic.cte_arity_mismatch, slots {cte}, {declared}, {actual}.
- Compound-query column-count mismatch. When a UNION / INTERSECT / EXCEPT chain has legs whose projection arities differ, the engine errors at execution. Phase 2 catches it pre-flight: each leg's derived arity (the same derivation the CTE harvest uses) is compared as the compound is assembled. ERROR with a new catalog key diagnostic.compound_arity_mismatch, slots {op}, {left_n}, {right_n}.
- Internal-table reference in any new table-source slot. Already a parse-time rejection via reject_internal_table (§1, §4); it surfaces as a parse error, not a post-walk diagnostic. Listed here for completeness: the catalog key select.internal_table authored in Phase 1 covers every Phase-2 slot too.
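The three column-reference cases above (unknown qualifier, unknown column, ambiguous unqualified name) reduce to one lookup against the active scope. The sketch below is illustrative only: Binding, ColumnCheck and check_column_ref are hypothetical names, a binding is simplified to a name plus its column list, and comparison is case-sensitive for brevity even though SQL identifiers generally are not.

```rust
/// Outcome of resolving one column reference (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ColumnCheck {
    Ok,
    /// maps to diagnostic.unknown_qualifier
    UnknownQualifier { qualifier: String },
    /// maps to diagnostic.unknown_column
    UnknownColumn { column: String },
    /// maps to diagnostic.ambiguous_column
    AmbiguousColumn { column: String, qualifiers: Vec<String> },
}

/// Simplified from_scope entry: the name a column can be qualified
/// with (alias or table name) and the columns it exposes.
pub struct Binding {
    pub name: String,
    pub columns: Vec<String>,
}

pub fn check_column_ref(
    scope: &[Binding],
    qualifier: Option<&str>,
    column: &str,
) -> ColumnCheck {
    // Qualified reference `t.c`: resolve `t` first, then `c` within it.
    if let Some(q) = qualifier {
        return match scope.iter().find(|b| b.name == q) {
            None => ColumnCheck::UnknownQualifier { qualifier: q.to_string() },
            Some(b) if b.columns.iter().any(|c| c == column) => ColumnCheck::Ok,
            Some(_) => ColumnCheck::UnknownColumn { column: column.to_string() },
        };
    }
    // Unqualified reference: count the bindings that expose the name.
    let owners: Vec<String> = scope
        .iter()
        .filter(|b| b.columns.iter().any(|c| c == column))
        .map(|b| b.name.clone())
        .collect();
    match owners.len() {
        0 => ColumnCheck::UnknownColumn { column: column.to_string() },
        1 => ColumnCheck::Ok,
        _ => ColumnCheck::AmbiguousColumn {
            column: column.to_string(),
            qualifiers: owners,
        },
    }
}
```

Returning the list of owning bindings in the ambiguous case is what lets the catalog message fill its {qualifiers} slot.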
11.3. Phase-2 new WARNING cases
The existing WARNING set (= NULL, type-mismatched
comparison, LIKE-on-numeric) is the right set. Phase-2 adds
no new WARNING categories — every Phase-2-specific case
falls into ERROR (§11.2) or engine-rejected (§11.4).
Considered and rejected as WARNINGs:
- CTE name shadowing a base table. Standard SQL behaviour; often intentional (the canonical "filter to a subset, then query as if it were the base table" pattern). No diagnostic.
- Correlated reference without explicit qualification. Correlation is implicit in standard SQL; per the user guideline a knowledgeable user does want this. The walker validates the reference silently against the outer-frame scope; no warning, no diagnostic.
- Unused CTE. A CTE defined in WITH but never referenced. The engine ignores it; many learners write CTEs as intermediate scratch space. Not a warning.
11.4. Engine-rejected (no diagnostic)
These fail on the engine and surface via ADR-0019's friendly-error layer at execution time. The walker does not attempt pre-flight detection because:
- Non-aggregated columns in projection with GROUP BY: detecting this requires knowing which function names are aggregates; ADR-0030 §13 OOS-3 / ADR-0031 §6 keep us allowlist-free.
- Aggregate function in WHERE: same reason.
- Scalar subquery returning multiple rows: semantic, not syntactic; requires execution.
- Recursive CTE without a UNION: requires inspection of the body's compound shape against the recursive contract; doable in principle, deferred as engine territory.
- Duplicate CTE names within the same WITH: checkable in principle (walking cte_bindings for duplicates), but the engine catches it cleanly. Phase 2 does not pre-flight it; could be added later if its absence proves confusing.
- Type-mismatched JOIN ON predicates: the existing expression type-mismatch warning (extended per §11.6) handles the explicit-literal case; arbitrary-expression cases require type inference and stay engine-side.
11.5. Catalog additions
Phase 2 adds the following message-catalog keys (ADR-0019). Every key is engine-neutral by construction.
Parse-time-detectable (post-walk diagnostic passes):
| Key | Slots |
|---|---|
| diagnostic.unknown_qualifier | {qualifier} |
| diagnostic.ambiguous_column | {column}, {qualifiers} |
| diagnostic.projection_alias_misplaced | {alias}, {clause} |
| diagnostic.cte_arity_mismatch | {cte}, {declared}, {actual} |
| diagnostic.compound_arity_mismatch | {op}, {left_n}, {right_n} |
Engine-error translations (friendly-error layer; reached on execution failure):
| Key | Engine cause |
|---|---|
| engine.no_such_table | no such table: <name> (post-execution path) |
| engine.no_such_column | no such column: <name> (post-execution path) |
| engine.ambiguous_column | ambiguous column name: <name> |
| engine.aggregate_misuse | misuse of aggregate function <name>() |
| engine.group_by_required | column must appear in the GROUP BY clause or be used in an aggregate function (or equivalent) |
| engine.compound_arity_mismatch | SELECTs to the left and right of UNION do not have the same number of result columns (or equivalent) |
| engine.scalar_subquery_too_many_rows | scalar subquery cardinality violation |
| engine.recursive_cte_malformed | recursive CTE shape errors |
The parse-time keys and the engine keys are intentionally
separate even when they describe the same situation
(engine.ambiguous_column mirrors
diagnostic.ambiguous_column) — the parse-time message can
include the learner's typed text and span; the engine-time
message catches what the parser missed and routes through the
friendly-error layer with whatever context the engine yielded.
Three pre-existing parse-time keys are reused unchanged for
Phase-2 slots: diagnostic.unknown_table,
diagnostic.unknown_column, and the Phase-1
select.internal_table.
11.6. The Phase-1 SQL-expression predicate-warning gap
ADR-0027 Amendment 1's LIKE-on-numeric warning, and ADR-0026
§7's = NULL and type-mismatch warnings, are emitted by a pass
that walks the DSL Expr AST. Phase 1's sql_expr.rs
deliberately builds no AST (ADR-0031 §2). The consequence
is a Phase-1 carry-over gap: SQL WHERE expressions today
emit none of these warnings — select * from t where name like 5 parses, the engine runs it, and the learner gets the
engine's verdict, not the friendly pre-flight nudge ADR-0027
Amendment 1 promised.
Phase 2 closes this. The predicate-warnings pass gains a
MatchedPath-walking variant that runs over the SQL
expression nodes and identifies the predicate shapes
structurally (a LIKE predicate-tail with a column-ref left
operand; a =/!= predicate-tail with a NULL literal
operand; a comparison predicate-tail with a column-literal
operand pair of mismatched types). It does not need an Expr
AST because the matched-path terminals carry both the byte spans
(for the diagnostic) and the node-name labels (for shape
identification). The same catalog keys (diagnostic.eq_null,
diagnostic.type_mismatch, diagnostic.like_numeric) apply
unchanged; only the pass implementation is new.
The MatchedPath-walking pass runs over every Phase-2
sql_expr slot — WHERE, HAVING, ON, CASE branches,
projection items, ORDER BY items — so warnings surface
uniformly across the SQL surface rather than just WHERE. This
is a strict improvement over Phase 1's behaviour, where even
Phase-1 SELECT WHERE expressions got no predicate warnings.
Type-resolution for the MatchedPath-walking pass: a column ref's
type comes from §10's from_scope (or, for t.c, the specific
binding); a literal's type comes from its lexical class. When
the column ref doesn't resolve (the schema-existence ERROR pass
will already have flagged it), the warning pass skips the
predicate — no point compounding diagnostics on an already-
broken reference.
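The shape identification described above can be sketched over a flat run of labelled terminals. This is a deliberately simplified illustration, not the walker's data model: Term, Terminal, Warning and predicate_warnings are hypothetical names, the labels stand in for the matched-path node-name labels, only two column types are modelled, and each predicate is assumed to be exactly three adjacent terminals.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColType { Text, Int }

/// A matched-path terminal reduced to what the pass needs (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    /// Column reference with its resolved type, or None if unresolved.
    ColumnRef(Option<ColType>),
    /// Operator or keyword, lower-cased.
    Op(&'static str),
    Null,
    NumberLit,
    StringLit,
}

pub struct Terminal {
    pub term: Term,
    /// Byte span (start, end) in the source text.
    pub span: (usize, usize),
}

#[derive(Debug, PartialEq)]
pub struct Warning {
    pub key: &'static str,
    pub span: (usize, usize),
}

pub fn predicate_warnings(path: &[Terminal]) -> Vec<Warning> {
    let mut out = Vec::new();
    for w in path.windows(3) {
        // Only windows shaped `operand operator operand` are predicates.
        let Term::Op(op) = &w[1].term else { continue };
        let span = (w[0].span.0, w[2].span.1);
        let key = match (&w[0].term, *op, &w[2].term) {
            // Unresolved column: the ERROR pass already flagged it.
            (Term::ColumnRef(None), _, _) => continue,
            (_, "=" | "!=", Term::Null) => "diagnostic.eq_null",
            (Term::ColumnRef(Some(ColType::Int)), "like", _) => "diagnostic.like_numeric",
            (Term::ColumnRef(Some(ColType::Int)), "=" | "!=" | "<" | ">", Term::StringLit)
            | (Term::ColumnRef(Some(ColType::Text)), "=" | "!=" | "<" | ">", Term::NumberLit) => {
                "diagnostic.type_mismatch"
            }
            _ => continue,
        };
        out.push(Warning { key, span });
    }
    out
}
```

No expression tree is built at any point; the byte spans and labels on the terminals are enough, which is the property the section relies on.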
11.7. Mechanism summary
Three diagnostic passes by end of Phase 2, all running as final stages of the walk (per §10.6's integration-point convention):
1. Schema-existence ERROR pass: extended from single current_table to walking every from_scope binding and the active cte_bindings. Adds the qualified-reference and ambiguity checks (§11.2).
2. Arity-check ERROR pass (new): runs at CTE-body and compound-query frame-exits (the same ScopedSubgrammar exit hook §10.3 uses), comparing declared vs derived column counts.
3. Predicate-warnings pass: extended with a MatchedPath-walking variant for sql_expr (§11.6) covering = NULL, type mismatch, and LIKE-on-numeric across every SQL expression slot, in addition to the existing DSL Expr AST variant for DSL expressions.
Per the integration-point convention (§10.6), each pass mutates the walker's accumulated highlight runs and diagnostics in place; the consumer sees a single coherent snapshot.
The projection-list fixup of §10.6 is conceptually part of pass (1) — it is the same "re-resolve identifier against final scope" operation, applied to the small subset of identifiers whose scope wasn't fully known at first-pass walk time.
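The arity-check pass is the simplest of the three and can be sketched in a few lines. The function and type names below are hypothetical, and the sketch assumes each leg's arity has already been derived by the time the check runs; it only shows the comparison that fills the catalog slots from §11.5.

```rust
/// Payload for diagnostic.compound_arity_mismatch (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ArityMismatch {
    pub op: String,
    pub left_n: usize,
    pub right_n: usize,
}

/// Compare each leg of a compound query with the leg before it.
/// `first` is the first leg's arity; `rest` pairs each set operator
/// with the arity of the leg that follows it.
pub fn check_compound_arity(first: usize, rest: &[(&str, usize)]) -> Option<ArityMismatch> {
    let mut left = first;
    for (op, right) in rest {
        if *right != left {
            return Some(ArityMismatch {
                op: (*op).to_string(),
                left_n: left,
                right_n: *right,
            });
        }
        left = *right;
    }
    None
}

/// Compare a CTE's declared column list (if any) with the arity its
/// body derives. Returns (declared, actual) on mismatch, matching the
/// slots of diagnostic.cte_arity_mismatch.
pub fn check_cte_arity(declared: Option<usize>, actual: usize) -> Option<(usize, usize)> {
    declared.filter(|n| *n != actual).map(|n| (n, actual))
}
```

A CTE without a declared column list never mismatches, which is why check_cte_arity takes an Option.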
12. Result-column type resolution
Phase 1's column_types: Vec<None> is partially lifted: where a
projection item is structurally a single column reference, the
worker resolves it back to the source column's playground type
(ADR-0005) and populates that slot in DataResult.column_types.
Everything else stays None.
This addresses Phase-1 autonomous decision §4.5 (bool SELECT
results render as 0/1): a bare bool column now renders as
true / false again, alignment recovers, and the show data
rendering path is reached for the common case.
Resolution rule. A projection item is "structurally a single
column reference" when, after stripping an optional [ AS ] alias, its expression is one of:
- An unqualified identifier (Name) that resolves uniquely to a single column across the FROM tables;
- A qualified reference (t.c / alias.c) that resolves unambiguously through the FROM aliases.
Anything else — function calls, arithmetic, CASE, literals,
subquery expressions, the * and t.* wildcards — keeps
column_types[i] = None. When resolution is ambiguous
(unqualified column name appears in two FROM tables) the
grammar admits it (engine resolves or errors); the type-resolver
returns None and the renderer falls back to neutral alignment.
Implementation seam. The strongly preferred mechanism is
engine-side column-origin lookup: after preparing the
statement, query the prepared statement for each result column's
underlying table and column. The engine knows authoritatively
which result columns are direct references and which are
expressions; for direct references it returns the source
table+column, for expressions it returns nothing. This avoids
re-parsing the source or adding structured projection-item data
to the MatchedPath — the grammar tier is not involved at all,
which preserves ADR-0031 §2's "no AST" decision and stays on the
right side of ADR-0030's "one source of truth" rule.
The Phase-2 implementer verifies that the rusqlite version
pinned in Cargo.toml exposes this metadata (the SQLite C API
calls are sqlite3_column_table_name /
sqlite3_column_origin_name — they have been stable for two
decades; rusqlite either exposes them directly or via the
underlying *mut sqlite3_stmt handle). If exposure turns out
to be awkward, the fallback is a small post-parse walk over the
projection-item subtrees in the MatchedPath — strictly worse
because it duplicates a slice of parsing, but available.
The resolution pass adds one method on Database (something
like resolve_select_column_types) called from do_run_select
before the DataResult is shipped. It takes the prepared
statement and the active SchemaCache, and returns
Vec<Option<Type>>. The renderer needs no change — None
slots already render as typeless.
This is the only execution-path change Phase 2 makes; everything else routes through Phase 1's grammar-as-text execution.
13. Out of scope
- OOS-1. Derived tables in FROM: FROM (SELECT …) [AS] alias. The same shapes are reachable via CTEs (§4), which Phase 2 ships. Derived tables in FROM are not authored here.
- OOS-2. NATURAL JOIN and JOIN … USING (col). Both are convenience forms. NATURAL is widely considered a footgun; USING is cleaner but adds grammar weight without lifting any expressive ceiling. Out.
- OOS-3. Comma-list FROM t1, t2 (implicit cross join). Out. CROSS JOIN covers the same shape explicitly.
- OOS-4. LIMIT m, n (the legacy comma form). Out (§8).
- OOS-5. Window functions (OVER (…), PARTITION BY, window-frame syntax). A meaningful learning topic, but a large surface of its own and out of ADR-0030's commissioned set.
- OOS-6. LATERAL joins. Not commissioned by ADR-0030.
- OOS-7. VALUES (…) as a row source. Not commissioned.
- OOS-8. A function/aggregate allowlist. ADR-0030 §13 OOS-3 / ADR-0031 §7 OOS-4 still apply: aggregate names parse generically through name_or_call.
- OOS-9. Quoted identifiers ("column name"). Tracked as ADR-0031 §7 OOS-3, still tracked.
- OOS-10. Engine-checked aggregate correctness at parse time. The grammar admits structurally; engine rejects semantically; ADR-0019 surfaces the engine's verdict in engine-neutral wording (§7).
- OOS-11. Result-column type resolution beyond bare column refs. Computed columns (a + b, upper(name), CASE …) stay typeless (§12).
- OOS-12. The help sql page and parse-error usage entries for the Phase-2 surface. The grammar carries the help_ids authored in this phase, but the page content and the rich per-command usage messages are Phase 6 (ADR-0030 §10) and ADR-0021. Phase 2 leaves the same help_id: None shape Phase 1 used for select.
Consequences
- A new grammar file, src/dsl/grammar/sql_select.rs, parallel to sql_expr.rs, exporting pub static SQL_SELECT_STATEMENT: Node and pub static SQL_SELECT_COMPOUND: Node. The Phase-1 data::SELECT CommandNode is rebuilt against SQL_SELECT_STATEMENT (its body becomes a Subgrammar reference); the CommandNode itself stays.
- Phase-1 SQL SELECT grammar nodes migrate. The Phase-1 static nodes that live in src/dsl/grammar/data.rs for the single-table SELECT (the projection, FROM, WHERE, ORDER-BY, LIMIT sub-trees) move into sql_select.rs as the starting point for the §1 productions; the file leaves only the CommandNode shell behind. The seven Phase-1 SQL SELECT integration tests are part of the safety net for this migration: they must continue to pass under the rebuilt grammar, in addition to the new Phase-2 integration tests authored in step 4 of the implementation notes.
- Hint-panel prose for the new clauses (JOIN flavours, ON, GROUP BY, HAVING, UNION / INTERSECT / EXCEPT, WITH, OFFSET, the qualified-prefix and CTE-prefix completion states) is authored at the structural level alongside each grammar node in step 1: a one-liner per slot, enough to drive the hint panel. Richer per-clause teaching prose and the help sql reference page remain ADR-0030 Phase 6 work (§13 OOS-12).
- Walker cost is expected to stay proportional to source length. The new accumulators are O(bindings + aliases) per frame; the scope stack is bounded by MAX_SUBGRAMMAR_DEPTH = 64 (§9); the §10.6 post-walk fixup pass touches one entry per projection-list Ident (a small set). Each debounced keystroke (ADR-0027) walks once, fixes up once, and emits a single coherent output. No new pathological case is introduced; if a learner-realistic query produces a noticeable typing-time stall, measure first and revisit the recursion budget or the accumulator structure on evidence.
- sql_expr.rs gains three additive Choice branches and one additive tail on name_or_call (§5, §6). The existing tiers and the depth-cap discipline are unchanged. The Phase-1 tests continue to exercise the existing branches as they stand.
- No new walker capability for grammar recursion (§9). Subgrammar, the depth counter, the cap, and the friendly depth error are all reused unchanged, the same posture ADR-0031 took.
- Command::Select { sql: String } is unchanged. The validated source SQL is simply larger; the worker still routes it through Database::run_select and do_run_select (Phase 1 path).
- The worker gains a post-prepare type-resolution helper that populates column_types for direct-reference projection items (§12) via the engine's column-origin metadata. Cargo.toml gains column_metadata to rusqlite's feature list (alongside the existing bundled); this pulls in the SQLite SQLITE_ENABLE_COLUMN_METADATA compile flag and exposes RawStatement::column_table_name / column_origin_name / column_database_name on the prepared statement. Verified against the project's pinned rusqlite 0.39.0. This is the only Phase-2 execution-path change.
- Three diagnostic passes (§11.7): schema-existence (extended), CTE/compound arity-check (new), and predicate warnings (extended with a MatchedPath-walking variant for sql_expr, §11.6). All run as final walk stages and mutate the walker's accumulated output in place. Closes the Phase-1 carry-over gap where SQL WHERE expressions emitted no LIKE-on-numeric / type-mismatch / = NULL warnings.
- Catalog additions (§11.5): five new diagnostic.* keys for parse-time-detectable cases and eight new engine.* keys for friendly-error layer translations of engine messages.
- The walker's WalkContext gains the completion-scope accumulators of §10: a from_scope_stack: Vec<ScopeFrame> whose top frame is the active from_scope / cte_bindings / projection_aliases. A new node variant Node::ScopedSubgrammar(&Node) (§10.2) is the trigger for push/pop; existing Node::Subgrammar is unchanged so DSL Expr and sql_expr recursion are unaffected. A post-walk fixup pass re-resolves projection-list identifier highlighting and validity once the final from_scope is known (§10.6). CTE output columns are derived from the body's projection list at body-frame exit, populating the binding back into the outer frame (§10.3), so SELECT * and explicit-projection CTE bodies both yield real column completion past cte_alias.|. This softens §9's "no new walker capability" claim for completion scope; grammar recursion still needs nothing new.
- __rdbms_* rejection extends to every table-source slot introduced by Phase 2: the FROM table, each JOIN's table, each CTE name, and the FROM table inside any CTE body (§4, §6). The reject_internal_table validator is reused.
- Completion gains: SQL keywords for joins / set ops / WITH / GROUP / HAVING / OFFSET (all walker-derived, no bespoke code); column completion scoped to a qualified prefix t. resolves through the active SchemaCache (§5).
- Phase-1 autonomous decisions §4.1 and §4.3–§4.4 stand (optional FROM, help_id: None, walker-mode defaults). §4.2 is lifted (bare-alias projection admitted, §1). §4.5 is partially lifted (bare bool column refs recover their type via §12).
- requirements.md's Q1 / Q2 advance further; Q4 was already ticked by ADR-0030 and ADR-0031.
Implementation notes
A build order, each step guarded by the test suite. The phases within Phase 2 mirror the ADR-0030 / ADR-0031 staging — grammar first, execution-path change last.
Detailed plan: docs/plans/20260520-adr-0032-phase-2.md.
The notes below are the outline; the plan refines them into
seven sub-phases (2a–2g) with per-gate exit criteria, a
cross-cut verification matrix that explicitly tests every
"X comes for free" claim from ADR-0030/0031/0032 (the kind of
implicit claim that produced the Phase-1 SQL-expression
predicate-warning gap §11.6 closes), and a final phase-exit
verification report template. Implementers work through the
plan; the ADR remains the decisions.
1. The sql_select.rs grammar fragment. Author the stratified tiers of §1 as named static Nodes, recursion via Subgrammar. Export SQL_SELECT_STATEMENT and SQL_SELECT_COMPOUND. The existing data::SELECT CommandNode is rebuilt against SQL_SELECT_STATEMENT.
2. Unit tests against the fragment directly (the expr.rs / sql_expr.rs test pattern): JOIN flavours, GROUP BY / HAVING, qualified refs, every set-op, recursive and non-recursive CTEs, LIMIT … OFFSET, DISTINCT, t.* projection, the bare-alias projection, plus the keyword-case-insensitivity check.
3. sql_expr.rs additive extensions (§5, §6): the qualified-ref tail on name_or_call; the scalar-subquery primary branch; the IN (subquery) predicate-tail branch; the EXISTS (subquery) primary branch. Unit tests for each.
4. Integration tests (the tests/ Tier-3 path, building on Phase 1's SQL SELECT tests): each JOIN flavour returns the expected rows; GROUP BY / HAVING aggregates over real data; UNION / INTERSECT / EXCEPT between two SELECTs; a non-recursive CTE; a recursive CTE (a small tree traversal or generated-sequence example); a scalar subquery in WHERE; IN (SELECT …); EXISTS (…); qualified refs resolving correctly.
5. The WalkContext scope accumulators (§10). Add the ScopeFrame type (from_scope / cte_bindings / projection_aliases) and the from_scope_stack; add the Node::ScopedSubgrammar(&Node) variant alongside the existing Node::Subgrammar; teach the driver to push/pop a fresh frame on ScopedSubgrammar entry/exit; rewrite every reference to &SQL_SELECT_COMPOUND from outside its own definition to use the new variant (subqueries in sql_expr.rs, CTE bodies in sql_select.rs); teach from_clause / join_clause to populate the frame's from_scope; teach with_clause to push placeholder CTE bindings before the body and harvest derived output columns on body-exit per §10.3; teach projection_item to append to projection_aliases. Keep current_table / current_table_columns as derived helpers (top frame's single-binding view) so the DSL paths stay green.
6. Qualified-prefix completion (§10.5). When the matched-path immediately preceding an IdentSource::Columns slot ends with Ident '.', narrow candidates to the named binding's columns. Unit tests: select t. Tab offers t's columns; an unresolved prefix returns an empty list.
7. Post-walk fixup pass (§10.6). Collect projection-list Ident terminals during the walk; after the walk, re-resolve each against the final from_scope, rewriting the highlight class and validity diagnostic. Tests: typing select col1 from t lights col1 correctly once t is typed; typing select bogus from t produces a column-not-found diagnostic.
8. Diagnostic passes (§11). Extend the schema-existence ERROR pass to walk every from_scope binding plus cte_bindings; add the qualified-reference and ambiguity checks (§11.2). Add the new arity-check ERROR pass at the CTE-body and compound-query frame-exit hooks (§11.7 case 2). Extend the predicate-warnings pass with a MatchedPath-walking variant covering every Phase-2 sql_expr slot (§11.6), which closes the Phase-1 carry-over gap. Author the five new diagnostic.* catalog keys and the eight new engine.* translation keys (§11.5). Tests: one positive and one negative case per new ERROR key; predicate warnings firing on select * from t where col like 5 (the Phase-1 gap closure); arity-mismatch ERRORs on a CTE and on a UNION.
9. Result-column type resolution (§12). Add "column_metadata" to rusqlite's feature list in Cargo.toml. The worker's do_run_select calls the new resolver (RawStatement::column_table_name / column_origin_name per result column) before constructing the DataResult. Tests: a single-column SELECT recovers the playground type (covering each of the ten types, the pedagogically important one being bool → true/false); a SELECT with a computed projection keeps it typeless; a SELECT through a CTE recovers the underlying column's type if the engine's column-origin metadata follows through the CTE (verified, not assumed).
10. Highlighting / completion / hint spot-checks via the typing-surface matrix (ADR-0022 / ADR-0030 §8): a SELECT with a JOIN highlights the JOIN keywords; Tab past select t. offers columns of t; column completion inside a WHERE after from a join b on … offers both a's and b's columns; column completion inside a correlated subquery sees the outer scope; the [ERR] indicator fires on a malformed subquery; an out-of-subset construct (e.g. OVER (…)) produces an engine-neutral parse error.
11. reject_internal_table spot-checks against every new table-source slot: a FROM __rdbms_columns parse-rejects; a WITH __rdbms_x AS (…) parse-rejects; a FROM inside a CTE body referencing __rdbms_* parse-rejects.
Later phases continue ADR-0030's plan unchanged — Phase 3 (DML),
Phase 4 (DDL), Phase 5 (DSL → SQL echo), Phase 6 (polish).
ADR-0030 §13 OOS items (window functions, LATERAL, function
allowlist, quoted identifiers) remain tracked separately and are
authored if and when they are taken up; they are not implicit
follow-ups of Phase 2.
Amendment 1 — Empirical scope of column-origin metadata (2026-05-20)
§12 was written conservatively: it constrained type recovery to
projection items "structurally a single column reference" and
listed "subquery expressions" alongside arithmetic and CASE as
cases that stay None. The implementation plan's Open Question 1
(docs/plans/20260520-adr-0032-phase-2.md) captured the matching
uncertainty about CTEs and scalar subqueries, leaving the test in
sub-phase 2f to "assert the actual behaviour (not the wished-for
behaviour)".
A throwaway probe against the pinned bundled SQLite (run
2026-05-20, with rusqlite 0.39.0 + column_metadata) settles
the question. Across twenty representative query shapes, the
engine's sqlite3_column_table_name / sqlite3_column_origin_name
metadata follows through:
- direct bare column refs (the baseline);
- AS alias projections (the alias remaps the output name but the origin pair stays the source (table, column));
- table-alias qualified refs (u.name → (users, name));
- non-recursive CTEs, including SELECT * bodies, bare-ref bodies, qualified-ref bodies, and (col-list)-renamed bodies (the rename remaps the output name; origin stays the underlying column);
- CTE chains (a CTE that selects from a prior CTE; origin traces back to the base table);
- derived tables in FROM (SELECT …) AS sub (out-of-scope for Phase 2 per §13 OOS-1, but useful to note: if ever admitted, type recovery comes for free);
- scalar subqueries used as a projection primary (SELECT (SELECT name FROM users WHERE id = 1); origin is preserved whether the subquery has an outer alias or not);
- UNION / UNION ALL / INTERSECT / EXCEPT compound queries (result columns carry the first leg's origin);
- multi-table JOIN projections (per-column origin per leg);
- IN (SELECT …) subqueries in WHERE (the inner subquery does not affect the outer projection's origin).
The metadata returns None for exactly two structural classes:
- Computed projections: function calls, arithmetic expressions, string concatenation, CASE expressions, literals, the * and t.* wildcards. Expected; pedagogically obvious; no surprise for the learner.
- Recursive CTE result columns (WITH RECURSIVE r(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 5) SELECT n FROM r). The recursion materialises through an internal temporary table that has no base-column origin to point at. This is the one structural surprise: a recursive-CTE result column is typeless even when it is structurally a bare name reference, because the engine cannot trace the column back past the recursion.
What §12's resolution rule becomes
The original §12 rule classifies projection items structurally
(unqualified ident / qualified ref → recover; everything else →
None). The empirical finding makes that classification redundant
and slightly wrong: it misses scalar subqueries and CTE-routed
refs that the engine does carry through, and it would have
needed extending for (col-list)-renamed CTEs.
The amended posture: trust the engine's column-origin metadata
verbatim. For each result column, call
column_table_name(i) / column_origin_name(i). If both return
Some, look the pair up in the active SchemaCache and use the
playground type. If either is None, the slot stays None and
the renderer falls back to neutral alignment. No structural
classification of the projection item is needed; the grammar tier
stays uninvolved (preserving ADR-0031 §2's "no AST" decision and
ADR-0030's "one source of truth" rule, both as before).
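The engine-driven rule amounts to a per-column lookup, sketched below. To keep the example free of external crates it takes the origin pairs as plain input rather than reading them from a prepared statement, so it shows only the rule and not the rusqlite calls; PlaygroundType is cut down to three variants and SchemaTypes is a hypothetical stand-in for the SchemaCache lookup.

```rust
use std::collections::HashMap;

/// Stand-in for the playground type enum (only three variants here).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlaygroundType { Bool, Int, Text }

/// Hypothetical stand-in for SchemaCache: (table, column) -> type.
pub type SchemaTypes = HashMap<(String, String), PlaygroundType>;

/// For each result column, `origins` holds what the engine reported as
/// (origin table name, origin column name). A slot is typed only when
/// both halves are present and the pair is known to the schema.
pub fn resolve_select_column_types(
    origins: &[(Option<&str>, Option<&str>)],
    schema: &SchemaTypes,
) -> Vec<Option<PlaygroundType>> {
    origins
        .iter()
        .map(|(table, column)| match (table, column) {
            (Some(t), Some(c)) => schema.get(&(t.to_string(), c.to_string())).copied(),
            // Computed projection or recursive-CTE column: stays typeless.
            _ => None,
        })
        .collect()
}
```

There is no branch on the shape of the projection item anywhere in the helper; a missing origin is the only signal it needs.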
The "structurally a single column reference" definition in §12's
Resolution rule is superseded by the engine-driven rule
above. The §12 Implementation seam is unchanged in approach
(engine-side column-origin lookup is still the mechanism), but
the speculative fallback paragraph ("If exposure turns out to be
awkward, the fallback is a small post-parse walk over the
projection-item subtrees in the MatchedPath") is moot — the
exposure works, and the engine's metadata is broader than a
grammar-side walk could be without re-implementing SQLite's
query-planner traceback. The fallback path is removed.
Effect on the Phase-2 plan's sub-phase 2f
The 2f exit gate's "CTE pass-through" row should be asserted
positive (recovers Some(text)). The "Subquery result" row,
which the plan left as "assert whichever behaviour the engine
exhibits", should be asserted positive as well. A new explicit
2f test row covers the named limitation: a recursive CTE result
column must produce column_types[0] = None and the renderer
must fall back to neutral alignment without panicking.
The catalog and grammar-side work in 2a–2e is unaffected by this
amendment. Only 2f's test list and the worker's
resolve_select_column_types helper change shape (the helper
becomes simpler — no structural classification, just a direct
metadata lookup per result column).
This amendment narrows the honest limitation in §12 from "computed / non-direct projection items" to "computed projections and recursive CTE result columns" — a tighter, factually verified carve-out.
Amendment 2 — §10.6 fixup-pass mechanism (2026-05-20)
§10.6's prescription for the post-walk fixup is written in
terms of "rewriting the highlight class" on projection-list
Ident terminals — downgrading "column" → "unknown identifier"
when an ident doesn't belong to the eventual from_scope, or
upgrading the reverse direction once a FROM is typed. The
implementation chose a different mechanism that achieves the
identical user-visible effect; this Amendment records the
choice so a reader of §10.6 doesn't go looking for a literal
per_byte_class rewrite step that does not exist.
Mechanism actually used
Two pieces, both already in the codebase by the end of sub-phase 2d:
- Two-pass schema-existence diagnostic. The 2d rewrite of schema_existence_diagnostics (src/dsl/walker/mod.rs) runs a pre-pass over the matched path that collects every IdentSource::Tables / cte_name / table_alias ident into a single binding vec, regardless of where in the path it sits. The main pass then resolves each sql_expr_ident against the complete binding set. A projection ident that resolves under the eventual FROM scope produces no diagnostic; one that doesn't produces an unknown_column diagnostic on its own span.
- Diagnostic-overlay renderer. src/input_render.rs reads the walker's diagnostic list at every keystroke and overlays each diagnostic's span with the appropriate colour (Error red for unknown-column, Warning for type-mismatch / LIKE-on-numeric / etc.). The overlay sits on top of the walker's per_byte_class (which keeps all idents at HighlightClass::Identifier).
Combined, the two yield the §10.6 user-visible behaviour.
When the user types select bogus_col, the diagnostic emits
and the overlay paints the ident red as soon as a FROM
appears that shows the column doesn't exist. When the user
types select real_col, no diagnostic emits and the ident
stays Identifier-coloured. Both outcomes settle within one
debounce cycle.
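The overlay step can be pictured with a small sketch. It assumes a simplified model in which the renderer receives a per-byte base class plus a list of span diagnostics; BaseClass, Severity, Paint, SpanDiagnostic, and overlay_diagnostics are hypothetical names, not the types in src/input_render.rs. The rule that an Error overlay is not replaced by a later Warning is also an assumption made for the example.

```rust
/// Hypothetical stand-in for the walker's highlight class. Note the
/// single identifier slot: there is no separate "unknown" variant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BaseClass {
    Keyword,
    Identifier,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Error,
    Warning,
}

/// What the renderer finally paints for one byte.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Paint {
    Base(BaseClass),
    Overlay(Severity),
}

/// Hypothetical diagnostic with a half-open byte span [start, end).
pub struct SpanDiagnostic {
    pub severity: Severity,
    pub start: usize,
    pub end: usize,
}

/// Layer diagnostic colours over the base classes without mutating
/// the base classes themselves.
pub fn overlay_diagnostics(
    per_byte_class: &[BaseClass],
    diagnostics: &[SpanDiagnostic],
) -> Vec<Paint> {
    let mut painted: Vec<Paint> =
        per_byte_class.iter().copied().map(Paint::Base).collect();

    for diag in diagnostics {
        // Clip the span so a stale diagnostic cannot index out of bounds.
        let end = diag.end.min(painted.len());
        let start = diag.start.min(end);
        for slot in &mut painted[start..end] {
            // Assumption for this sketch: an Error overlay is never
            // replaced by a later Warning on the same byte.
            if *slot != Paint::Overlay(Severity::Error) {
                *slot = Paint::Overlay(diag.severity);
            }
        }
    }
    painted
}
```

The base slice is only read, which mirrors the point that idents keep their Identifier class and the red tint comes purely from the overlay.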
Why this is equivalent
§10.6's stated goal is correctness of the end-of-walk
classification — "rewriting the highlight class" is one
implementation strategy for that goal. The HighlightClass
enum in the codebase has only one identifier slot
(Identifier); the Error tint comes from diagnostic overlay,
not from a separate Column vs UnknownIdentifier class.
The two-pass diagnostic is the "post-walk fixup" that
§10.6 calls for; it simply runs inside the diagnostic emitter
rather than as a separate rewrite step. The integration
point (§10.6's "final stage of the walk itself") still
holds: schema_existence_diagnostics runs after the walk's
main work, mutating the walker's accumulated diagnostic
vector in place. Consumers see a single coherent snapshot.
Completion mid-typing
§10.6's second user-visible promise — "during-typing
completion of projection-list column names uses the global
fallback" — is preserved as a posture, but improved at the
edges in sub-phase 2e by a look-ahead probe in
src/completion.rs. When the leading walk produces no
from_scope (the projection-before-FROM state) and the
full input does have a FROM after the cursor, a second walk
on the full input populates the binding set, and column
candidates narrow to that scope. The fallback to global
SchemaCache.columns remains the path when the full input
doesn't parse cleanly (e.g., the user deleted * and is
mid-edit). This is a strict improvement: the realistic
"edit an existing query" workflow now narrows correctly.
What §10.6's prescription becomes
The "rewrite the highlight class" wording is superseded by:
the post-walk diagnostic pass re-resolves projection
idents against the complete scope and emits / withholds the
unknown-column diagnostic accordingly; the renderer's
diagnostic-overlay path achieves the visual change. No
new HighlightClass variant is required.
§10.6's other prescriptions stand verbatim — the integration point (final walk stage, in-place mutation of walker accumulators), the per-keystroke re-walk (ADR-0027's debounced cadence), and the ORDER BY no-fixup-needed clarification.
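For illustration, the look-ahead probe described under "Completion mid-typing" can be sketched as a small decision function. The names here are hypothetical: ColumnSource and column_candidates_source do not appear in src/completion.rs, and the walk closure stands in for the real walker, returning Some(tables in from_scope) when the text parses cleanly and None when it does not.

```rust
/// Where column candidates should come from (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ColumnSource {
    /// Narrow candidates to the columns of these bound tables.
    Scoped(Vec<String>),
    /// Fall back to every column in the schema cache.
    GlobalFallback,
}

/// Decide the candidate source for a cursor at byte offset `cursor`.
///
/// `walk` is an assumed stand-in for the grammar walk: it returns
/// `Some(scope)` on a clean parse (the scope may be empty) and `None`
/// when the text does not parse.
pub fn column_candidates_source<F>(input: &str, cursor: usize, walk: F) -> ColumnSource
where
    F: Fn(&str) -> Option<Vec<String>>,
{
    // Leading walk: only the text before the cursor. If the offset is
    // not a valid boundary, use the whole input rather than panic.
    let leading = input.get(..cursor).unwrap_or(input);
    if let Some(scope) = walk(leading) {
        if !scope.is_empty() {
            return ColumnSource::Scoped(scope);
        }
    }

    // Look-ahead probe: the leading walk found no from_scope, so walk
    // the full input in case a FROM sits after the cursor.
    match walk(input) {
        Some(scope) if !scope.is_empty() => ColumnSource::Scoped(scope),
        // Full input does not parse, or still has no FROM.
        _ => ColumnSource::GlobalFallback,
    }
}
```

The second walk only runs when the first one yields an empty scope, so the common case of typing after a FROM pays for a single walk.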
See also
- ADR-0005 — the ten-type vocabulary §10 resolves back to.
- ADR-0016 — the data-table renderer SELECT results reuse.
- ADR-0019 — the friendly-error layer engine-side rejections route through (§7).
- ADR-0021 — per-command parse-error usage; the Phase-2 surface inherits the framework, Phase 6 polishes per-clause messages (§11 OOS-12).
- ADR-0022 — ambient typing assistance. §5/§6/§8 inherit its keyword-completion / highlighting / hint mechanisms for free, but §10 extends its IdentSource::Columns / SchemaCache / WalkContext infrastructure with the scope accumulators, qualified-prefix narrowing, and the post-walk fixup pass that Phase 2 needs.
- ADR-0023 / ADR-0024 — the unified grammar tree Phase 2 extends.
- ADR-0026 — the WHERE grammar's Subgrammar node, depth counter, and MAX_SUBGRAMMAR_DEPTH = 64 cap, all reused unchanged (§9).
- ADR-0027 — the validity indicator, free for the Phase-2 surface; §1 (ERROR/WARNING guideline) is the source quoted verbatim in §11; Amendment 1 (LIKE-on-numeric WARNING) is the one that the SQL-expression predicate-warnings gap of §11.6 closes for the SQL surface.
- ADR-0028 — the styled OutputLine mechanism the renderer uses; not directly touched by Phase 2.
- ADR-0030 — the parent ADR; §3 commissions this phase, §4/§6 fix execution-as-text, §7 fixes engine neutrality, §11 fixes history / replay, §13 fixes the long-running OOS list.
- ADR-0031 — the SQL expression grammar this ADR extends additively (§5, §6); §7 named the two extensions implemented here.
- docs/simple-mode-limitations.md — the DSL limits advanced mode lifts; Phase 2 lifts the JOIN, subquery, set-op, CTE, and grouping limits.