ADR-0031: The SQL expression grammar
Status
Accepted
Context
ADR-0030 made advanced mode a body of SQL grammar inside the
unified grammar tree (ADR-0023/0024) rather than a separate
batch parser. It deferred two large grammar slices to their own
focused ADRs (ADR-0030 §3): the full SELECT grammar and the
SQL expression grammar. This ADR fixes the second.
The SQL expression grammar is the fragment that fills every
expression slot in advanced-mode SQL — ADR-0030 §3 names them:
WHERE, HAVING, CHECK, SELECT projections, and DEFAULT.
ADR-0030 §3 describes it as "the superset of ADR-0026's WHERE
grammar" — adding arithmetic, function calls, CASE, and
(eventually) subquery expressions on top of the comparison /
LIKE / IN / BETWEEN / IS NULL predicate set that ADR-0026
already authored for the DSL.
It is the first concrete piece of ADR-0030's phased plan: ADR-0030
Phase 1 ("Foundations + first SELECT") opens with "Author the
core SQL expression grammar — the ADR-0026 superset — as its
own ADR." This is that ADR.
What ADR-0026 already established
ADR-0026 authored a recursive WHERE expression for the DSL. The
machinery this ADR builds on is all in place:
- Node::Subgrammar(&'static Node): a reference-following node that lets a named static grammar fragment appear inside its own subtree, so a recursive grammar can be expressed even though Seq / Choice embed children by value and cannot close a cycle.
- A stratified grammar (one named static Node per precedence tier), which removes left recursion (every recursion is guarded by a token) and encodes precedence in the layering.
- WalkContext::subgrammar_depth and MAX_SUBGRAMMAR_DEPTH = 64: a stack-overflow guard that turns pathologically nested input into a friendly error.
- The factored predicate_tail: the shared operand prefix matched once; the infix NOT factored as an explicit NOT negatable branch; no Choice branch starting with an Optional (an Optional-first Seq "commits" and discards sibling branches' expected sets).
This ADR reuses every one of those. The new grammar is larger, but it is the same kind of grammar, walked by the same walker.
Why this is not just "extend expr.rs"
The DSL's WHERE grammar (src/dsl/grammar/expr.rs) is bound by
ADR-0026's deliberate teaching limits, recorded in
docs/simple-mode-limitations.md: operands are a column or a
literal — no arithmetic, no string concatenation, no scalar
functions, no subqueries. Those limits are a feature of simple
mode, not an accident; the DSL WHERE grammar must keep them.
Advanced mode is the surface that lifts them (ADR-0030 §4). So
the SQL expression grammar cannot be the DSL grammar with a few
nodes added — it has a different operand set (a full scalar
expression, not column-or-literal) and a different relationship
to its consumers (see Decision §2). It is a parallel fragment.
Keeping it parallel also keeps simple mode's 1240-test surface
untouched: nothing in expr.rs changes.
Decision
1. One unified expression ladder
ADR-0026's DSL grammar stratifies into a boolean layer
(or/and/not/bool_primary) sitting above a predicate
layer, because the DSL deliberately forbids a boolean
sub-expression as a comparison operand — (a > b) = (c > d)
cannot be written.
Standard SQL draws no such line: a boolean is a value, AND /
OR / NOT and the comparison operators are simply operators at
their own precedence tiers, and a parenthesised group is a whole
expression regardless of whether it reads as "boolean" or
"scalar". The SQL expression grammar therefore is a single
precedence ladder, loosest tier to tightest:
expr := or_expr
or_expr := and_expr ( OR and_expr )*
and_expr := not_expr ( AND not_expr )*
not_expr := NOT not_expr | predicate
predicate := additive predicate_tail?
predicate_tail := cmp_op additive
| [ NOT ] LIKE additive
| [ NOT ] BETWEEN additive AND additive
| [ NOT ] IN ( additive ( , additive )* )
| IS [ NOT ] NULL
cmp_op := = | <> | != | < | <= | > | >=
additive := multiplicative ( ( + | - | || ) multiplicative )*
multiplicative := unary ( ( * | / | % ) unary )*
unary := ( - | + ) unary | primary
primary := literal
| ( or_expr )
| case_expr
| name_or_call
name_or_call := identifier [ '(' call_args? ')' ]
call_args := '*' | [ DISTINCT ] or_expr ( , or_expr )*
case_expr := CASE [ or_expr ]
( WHEN or_expr THEN or_expr )+
[ ELSE or_expr ]
END
literal := number | string | TRUE | FALSE | NULL
Precedence, loosest first: OR, AND, NOT, the comparison /
predicate tier, additive (+ - ||), multiplicative (* / %),
unary sign, primary. This is standard SQL operator precedence
restricted to the teaching-relevant operators.
Notes on specific productions:
- name_or_call is factored, not a Choice. A function call (upper(Name)) and a column reference (Name) share an identifier prefix. Splitting them into two Choice branches would let the function-call branch commit on the identifier and then fail at the missing (, discarding the column-ref branch (the ADR-0026 "no Optional-first branch" hazard, in reverse). Instead the identifier is matched once and the ( call_args ) group is an Optional tail: present → a call, absent → a column reference. The grammar need not decide which (see §2); it only validates that one of the two shapes holds.
- call_args handles * and DISTINCT. count(*) is the one place * is an argument; count(distinct col) the one place DISTINCT leads an argument list. (The projection-level select * is not an expression: it belongs to the SELECT grammar, ADR-0030 / Phase 1, not here.) The grammar admits function calls structurally; it does not know which names are aggregates. That distinction is the engine's, and matters only once GROUP BY lands (ADR-0030 Phase 2).
- case_expr covers both forms: searched CASE WHEN … END and simple CASE <operand> WHEN … END. Every sub-part is an or_expr for uniformity (SQL allows any expression in each slot); END closes it.
- || is string concatenation, standard SQL, at the additive tier. It lifts simple-mode-limitations.md's "no string concatenation".
- % is modulo. It is not in ISO SQL (which spells it MOD(a, b)), but it is near-universal across mainstream engines and is what a learner expects. ADR-0030's "pedagogy wins ties" admits it; MOD also remains reachable through the generic name_or_call path.
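The name_or_call factoring can be sketched as a plain function over pre-split tokens: the identifier is consumed once, and the call group is an optional tail. This is a minimal sketch, not the static-Node form the fragment actually uses. The Shape enum, the helper names, and the simplification that each argument is a bare identifier rather than a full or_expr are assumptions made for illustration; the real grammar does not report which shape it saw.

```rust
#[derive(Debug, PartialEq)]
enum Shape {
    ColumnRef,
    Call,
}

fn is_identifier(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// name_or_call := identifier [ '(' call_args? ')' ]
/// Returns the position after the match and which shape was seen.
fn name_or_call(tokens: &[&str], pos: usize) -> Option<(usize, Shape)> {
    // The shared identifier prefix is matched exactly once.
    let name = tokens.get(pos)?;
    if !is_identifier(name) {
        return None;
    }
    let mut p = pos + 1;
    // Optional tail: no '(' means the identifier stands alone as a column reference.
    if tokens.get(p) != Some(&"(") {
        return Some((p, Shape::ColumnRef));
    }
    p += 1;
    if tokens.get(p) != Some(&")") {
        p = call_args(tokens, p)?;
    }
    (tokens.get(p) == Some(&")")).then_some((p + 1, Shape::Call))
}

/// call_args := '*' | [ DISTINCT ] arg ( , arg )*
/// Simplification: each arg is a bare identifier, not a full or_expr.
fn call_args(tokens: &[&str], mut p: usize) -> Option<usize> {
    if tokens.get(p) == Some(&"*") {
        return Some(p + 1);
    }
    if tokens.get(p).is_some_and(|t| t.eq_ignore_ascii_case("distinct")) {
        p += 1;
    }
    loop {
        if !is_identifier(tokens.get(p)?) {
            return None;
        }
        p += 1;
        if tokens.get(p) != Some(&",") {
            return Some(p);
        }
        p += 1;
    }
}
```

With this shape there is no point at which a "call" alternative has committed and must be abandoned in favour of a "column" alternative: a bare `Name` simply ends before the optional tail, while `upper ( Name` is rejected for the missing `)`.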
2. The fragment validates; it builds no AST
ADR-0026's WHERE grammar carries an AST-fragment builder
(build_expr) that folds the matched terminals into a recursive
Expr, because its consumers — update / delete / show data
— are typed Commands whose executor compiles that Expr to
parameterised SQL.
The SQL expression grammar deliberately builds no AST. This follows directly from ADR-0030 §4 and §6:
- WHERE / HAVING / SELECT projections live inside a SELECT or a DML statement, and ADR-0030 §4 executes those "as the validated SQL itself … they change no schema, so modelling them as a typed Command buys nothing." There is no Expr to compile: the engine parses the SQL.
- CHECK and DEFAULT live inside advanced-mode DDL. ADR-0030 §6 stores their expressions in project.yaml "as SQL the user could re-enter": text, not a structured tree. ADR-0030 §4 is explicit that these expressions are "not lowered into the DSL's deliberately-limited Expr."
So no consumer of this grammar wants an Expr. The fragment's
entire job is the other three walker outputs:
- Accept or reject: the input either is or is not a well-formed in-subset SQL expression.
- The flat MatchedPath of matched terminals, which is what drives syntax highlighting, completion, the expected-set, and the hint panel (§5).
- A source span. A consumer that needs the expression as text (the SELECT builder assembling Command::Select's SQL; a future CHECK builder) recovers it by slicing the original source between the first and last matched terminal's byte offsets. The terminals already carry span for highlighting; nothing new is needed on the matched path.
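The span-slicing step in the last item can be sketched as follows. MatchedTerminal here is a hypothetical stand-in carrying only a byte span, and expression_text is an illustrative name; the shape of the real MatchedPath is not shown in this ADR.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one matched terminal; only the byte span matters here.
struct MatchedTerminal {
    span: Range<usize>,
}

/// Recover the expression text by slicing the source from the first
/// terminal's start offset to the last terminal's end offset.
fn expression_text<'s>(source: &'s str, terminals: &[MatchedTerminal]) -> Option<&'s str> {
    let first = terminals.first()?;
    let last = terminals.last()?;
    // `get` returns None instead of panicking on out-of-range or non-boundary offsets.
    source.get(first.span.start..last.span.end)
}
```

The slice preserves the text exactly as typed, including interior whitespace and letter case, which is what a consumer storing or executing "the validated SQL itself" would need.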
This is a real simplification over ADR-0026 — no build_expr
analogue, no second structural pass, no expression AST type — and
it is the correct shape for a grammar whose consumers run SQL
rather than compile it. The grammar tier still owns validation,
highlighting, completion, and the no-left-recursion guarantee;
it simply has no tree to hand back.
Consequence for the SELECT builder (ADR-0030 / Phase 1).
A command ast_builder today receives only &MatchedPath. The
SELECT builder needs the original source to populate
Command::Select's validated SQL text. The builder signature
gains a source: &str parameter — a mechanical sweep across the
~21 existing CommandNode builders (most ignore it), of the same
category as ADR-0030's noted match Command sweep. It is called
out here because it is a direct consequence of the no-AST
decision; the change itself belongs to the Phase 1 SELECT work,
governed by ADR-0030.
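The signature sweep might look roughly like the sketch below. Everything in it is a placeholder: MatchedPath and Command are empty stand-ins for the real types, and the AstBuilder alias, the sql field name, and the ShowTables variant are hypothetical, chosen only to show one builder that ignores the new parameter and one that uses it.

```rust
/// Hypothetical stand-in for the real matched path.
struct MatchedPath;

/// Hypothetical stand-in for the real command enum.
#[derive(Debug, PartialEq)]
enum Command {
    Select { sql: String },
    ShowTables, // illustrative pre-existing command
}

/// After the sweep, every builder receives the original source next to the path.
type AstBuilder = fn(&MatchedPath, &str) -> Command;

/// Most existing builders would simply ignore the new parameter.
fn build_show_tables(_path: &MatchedPath, _source: &str) -> Command {
    Command::ShowTables
}

/// The SELECT builder keeps the validated SQL text rather than building a tree.
fn build_select(_path: &MatchedPath, source: &str) -> Command {
    Command::Select { sql: source.trim().to_string() }
}
```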
3. Recursion, and the depth cap
The grammar's recursion points are all token-guarded — each consumes at least one token before recursing, so the greedy top-down walker always makes progress:
- not_expr := NOT not_expr: recurses after NOT.
- primary := ( or_expr ): after (.
- unary := ( - | + ) unary: after a sign.
- call_args operands: after the call's (.
- case_expr sub-parts: after CASE / WHEN / THEN / ELSE.
- IN ( … ) operands: after IN (.
Every recursion is wired through Node::Subgrammar(&NAMED)
referencing a named static tier, exactly as in expr.rs. The
walker counts active Subgrammar frames in
WalkContext::subgrammar_depth; this grammar reuses ADR-0026's
MAX_SUBGRAMMAR_DEPTH = 64 cap and its friendly
"expression nested too deeply" error — no new walker capability
is required. The ladder descends a few Subgrammar frames per
nesting level, so the effective nesting limit is lower than 64
levels, but it remains comfortably past anything a learner types
by hand; the cap is purely a stack-overflow guard.
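The mechanism can be sketched on a deliberately tiny grammar, `group := '(' group ')' | 'x'`. This is a minimal sketch under stated assumptions: the Node enum below is a cut-down stand-in (the real one has more variants), the depth is passed as an argument rather than held in a WalkContext, WalkError and the GROUP_* statics are hypothetical names, and this toy grammar uses exactly one Subgrammar frame per nesting level, unlike the full ladder.

```rust
/// Cut-down stand-in for the grammar node type.
enum Node {
    Word(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    /// Follows a reference to a named static, which is what closes the cycle.
    Subgrammar(&'static Node),
}

const MAX_SUBGRAMMAR_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum WalkError {
    NoMatch,
    TooDeep,
}

// group := '(' group ')' | 'x'
// The recursion is token-guarded: '(' is consumed before GROUP is re-entered.
static GROUP_PAREN: [Node; 3] = [Node::Word("("), Node::Subgrammar(&GROUP), Node::Word(")")];
static GROUP_BRANCHES: [Node; 2] = [Node::Seq(&GROUP_PAREN), Node::Word("x")];
static GROUP: Node = Node::Choice(&GROUP_BRANCHES);

/// Returns the token position after a successful match.
fn walk(node: &Node, tokens: &[&str], pos: usize, depth: usize) -> Result<usize, WalkError> {
    match node {
        Node::Word(w) => match tokens.get(pos) {
            Some(t) if t.eq_ignore_ascii_case(w) => Ok(pos + 1),
            _ => Err(WalkError::NoMatch),
        },
        Node::Seq(children) => {
            let mut p = pos;
            for child in children.iter() {
                p = walk(child, tokens, p, depth)?;
            }
            Ok(p)
        }
        Node::Choice(branches) => {
            for branch in branches.iter() {
                match walk(branch, tokens, pos, depth) {
                    Err(WalkError::NoMatch) => continue,
                    // Success, or the depth error, ends the search immediately.
                    other => return other,
                }
            }
            Err(WalkError::NoMatch)
        }
        Node::Subgrammar(target) => {
            // Count active Subgrammar frames; refuse to go past the cap.
            if depth >= MAX_SUBGRAMMAR_DEPTH {
                return Err(WalkError::TooDeep);
            }
            walk(target, tokens, pos, depth + 1)
        }
    }
}

/// Accept only if the whole input is consumed.
fn check(tokens: &[&str]) -> Result<(), WalkError> {
    let end = walk(&GROUP, tokens, 0, 0)?;
    if end == tokens.len() { Ok(()) } else { Err(WalkError::NoMatch) }
}
```

In this sketch the depth error is a distinct outcome from an ordinary mismatch, so a Choice does not swallow it while trying sibling branches; that is what would let a caller report "expression nested too deeply" instead of a generic parse error.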
4. A separate fragment, parallel to the DSL grammar
The SQL expression grammar is authored in a new file,
src/dsl/grammar/sql_expr.rs, parallel to expr.rs (which keeps
the DSL WHERE grammar). They are deliberately not merged:
- Different operand sets. The DSL operand is a column or a literal; the SQL operand is a full scalar expression.
- Different output. expr.rs builds an Expr; sql_expr.rs builds nothing (§2).
- Mode isolation. Simple mode must never gain arithmetic or functions: the limits in simple-mode-limitations.md are a teaching feature. A shared fragment risks leaking the SQL surface into the DSL grammar.
- Regression containment. expr.rs is exercised by a large share of the 1240-test suite. A parallel file changes none of it.
The predicate-tail shapes (cmp_op / LIKE / BETWEEN / IN /
IS NULL) look structurally identical between the two grammars,
but each branch's operand sub-node differs (column-or-literal vs
additive), so the static nodes cannot literally be shared.
The design is shared — sql_expr.rs follows expr.rs's
factoring (operand prefix matched once, infix NOT as an
explicit branch, no Optional-first branch) — and that is the
reuse that matters.
5. Ambient assistance comes for free
Because the fragment is grammar in the unified tree, the walker gives it — with no expression-specific assistance code — the same ambient assistance every DSL command gets (ADR-0030 §8, ADR-0022):
- Syntax highlighting of SQL keywords, identifiers, literals, and operators, from the per-byte highlight classes the walk records.
- Tab completion of SQL keywords (and, or, like, between, case, when, …) and of column names: the name_or_call identifier slot uses IdentSource::Columns, so it completes against the statement's table(s) from the same SchemaCache the DSL uses. Function names are not completed (there is no allowlist, ADR-0030 §13 OOS-3); a typed function name simply is not a candidate.
- Hint-panel prose at each grammar slot.
- The [ERR] / [WRN] validity indicator (ADR-0027).
- Per-command parse-error usage (ADR-0021).
The name_or_call identifier slot resolves to Columns because,
at the moment the identifier is typed, the common case is a
column reference and column completion is the helpful default; a
function call is recognised a token later when ( follows. The
grammar does not need to decide between the two (§2), so the slot
can optimise for the common completion.
6. Errors and the unsupported surface
A construct outside this grammar — a window function's OVER
clause, a CAST with :: syntax, an array literal — is an
ordinary walker parse error, carrying the expected-set and
routed through the friendly-error layer with engine-neutral
wording (ADR-0030 §9, ADR-0019). There is no separate
"valid SQL but unsupported" classifier — ADR-0030 §1 dropped the
batch parser that would be needed for one.
Expression-level engine neutrality is best-effort, exactly as
ADR-0030 §7 states: the grammar enforces the structural subset
(operators, CASE, call syntax), but because there is no
function allowlist, an engine-specific function the grammar
admits and the engine then rejects surfaces an engine-neutral
execution error rather than being caught at parse time. This is
the accepted honest limitation; a function allowlist remains
ADR-0030 §13 OOS-3.
7. Out of scope
- OOS-1. Subquery expressions. A ( SELECT … ) as a primary, <op> ( SELECT … ), IN ( SELECT … ), and EXISTS ( SELECT … ) are part of the eventual surface (ADR-0030 §3) but cannot be realised until the SELECT grammar itself exists and is recursive; that is ADR-0030 Phase 2 ("SELECT — full"). This ADR's grammar is authored so that adding a subquery branch to primary (and an IN ( subquery ) / EXISTS form) is an additive change: a new Choice branch guarded by ( / EXISTS, recursing through Subgrammar into the SELECT fragment. No restructuring is foreseen.
- OOS-2. Qualified column references (table.column, alias.column). A single-table SELECT (ADR-0030 Phase 1) never needs them; they become meaningful with JOINs (Phase 2). name_or_call takes an unqualified identifier for now; a [ '.' identifier ] tail is an additive extension.
- OOS-3. Quoted identifiers ("column name"). The DSL has no quoted-identifier syntax; introducing one is a cross-cutting lexer change, tracked separately.
- OOS-4. A function allowlist (ADR-0030 §13 OOS-3, restated): function calls are admitted generically.
- OOS-5. An expression AST. Explicitly not built (§2). If a future consumer genuinely needs structured expression data (none is foreseen: DDL CHECK / DEFAULT store text), that is a new decision, not a deferral.
Consequences
- A new grammar file, src/dsl/grammar/sql_expr.rs, exporting a single pub static SQL_EXPRESSION: Node (a Subgrammar(&SQL_OR_EXPR)) that any SQL CommandNode drops into its Seq as one node, the same drop-in shape as expr::EXPRESSION.
- No new walker capability. Subgrammar, the depth counter, the cap, and the friendly depth error are all reused from ADR-0026 unchanged.
- No expression AST, no fragment builder: a deliberate simplification over ADR-0026 (§2).
- expr.rs and the simple-mode WHERE surface are untouched; the 1240-test baseline is insulated by construction (§4).
- The command ast_builder signature gains a source: &str parameter (§2), a ~21-site mechanical sweep executed as part of the Phase 1 SELECT work (ADR-0030), not here.
- Subquery expressions and qualified column references are authored later as additive primary branches (§7); the grammar is shaped to receive them.
- The fragment is the shared dependency of every advanced-mode expression slot (WHERE, HAVING, SELECT projections, CHECK, DEFAULT), defined once.
Implementation notes
A build order, each step guarded by the test suite. Steps 1–5 are
ADR-0030 Phase 1; the fragment is consumed first by the
single-table SELECT's WHERE and projection slots.
1. The grammar fragment: sql_expr.rs with the stratified tiers of §1 as named static Nodes, recursion via Subgrammar. No builder. pub static SQL_EXPRESSION.
2. Unit tests walking representative inputs against the fragment directly (the expr.rs test pattern): every operator and precedence pair, CASE both forms, function calls including count(*) and count(distinct …), the full predicate set, parenthesised regrouping, the depth cap, and the keyword-case-insensitivity check.
3. Wire it into the Phase 1 SELECT grammar: the WHERE slot and the projection items reference SQL_EXPRESSION (ADR-0030 Phase 1).
4. Highlighting / completion / hint spot-checks: confirm the §5 assistance works through a SQL expression with no expression-specific code, via the typing-surface matrix.
5. Engine-neutral error spot-checks for out-of-subset constructs (§6).
Later phases extend the same fragment:
- ADR-0030 Phase 2 adds the subquery primary branches and qualified column references (OOS-1, OOS-2) once the recursive SELECT grammar exists, and exercises the fragment from HAVING.
- ADR-0030 Phase 4 consumes the fragment from advanced-mode DDL CHECK and DEFAULT.
See also
- ADR-0019: the friendly-error layer SQL parse and execution errors route through (§6).
- ADR-0021: per-command parse-error usage, free for SQL (§5).
- ADR-0022: ambient typing assistance; §5 is its reach into the SQL expression.
- ADR-0023 / ADR-0024: the unified grammar tree this fragment is authored into.
- ADR-0026: the DSL WHERE expression grammar this is the superset of. The Subgrammar node, the stratified-grammar technique, the depth cap, and the predicate_tail factoring are all inherited from it.
- ADR-0027: the validity indicator, free for SQL (§5).
- ADR-0030: advanced mode's SQL surface; §3 commissions this ADR, §4/§6 are the source of the no-AST decision (§2), §7/§13 set the engine-neutrality posture and the no-allowlist rule.
- docs/simple-mode-limitations.md: the DSL limits this grammar lifts for advanced mode (§1, §4).
Status note — known-function list layered on the slot (2026-05-30)
The sql_expr_ident slot is IdentSource::Columns and, per §1 / §5,
does not itself know which identifiers are function names — it
optimises for the common case (a column reference) and admits the
function-call shape structurally; §5 explicitly noted "function names
are not completed … a typed function name simply is not a candidate".
ADR-0022 Amendment 6 layers a curated known-function list
(src/dsl/sql_functions.rs) on top of this slot. The list is consumed
two ways. First, as Tab-completion candidates, so a learner can
discover sum / upper / … (issue #15); this softens §5's "not
completed" line to "completed from a curated pedagogical list, not an
allowlist for validation". Second, as the lookup that lets the
typing-time column-typo hint stay strict at this slot: a partial is
flagged as "no such column" only when it matches neither a schema
column nor a known function name (issue #16).
The grammar here is unchanged, and §6/§7's no-validation-allowlist
posture stands: the list drives completion + the typo hint, not
parse-time acceptance (an unknown function still parses and surfaces an
engine-neutral execution error). The list sits in the completion /
hint layer above the grammar.
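The two uses of the list can be sketched as below. This is an illustration under stated assumptions, not the code in src/dsl/sql_functions.rs: the three-entry KNOWN_FUNCTIONS constant, the function names, case-insensitive prefix matching for a typed partial, and the rule that a column shadows a same-named function in the candidate list are all choices made for the sketch.

```rust
/// Illustrative subset only; the real curated list lives in src/dsl/sql_functions.rs.
const KNOWN_FUNCTIONS: &[&str] = &["count", "sum", "upper"];

/// Case-insensitive prefix test (assumed matching rule for a typed partial).
fn has_prefix(candidate: &str, partial: &str) -> bool {
    candidate
        .to_ascii_lowercase()
        .starts_with(&partial.to_ascii_lowercase())
}

/// Tab candidates: schema columns first, then known functions
/// that are not shadowed by a column of the same name.
fn candidates(partial: &str, columns: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = columns
        .iter()
        .filter(|c| has_prefix(c, partial))
        .map(|c| c.to_string())
        .collect();
    for f in KNOWN_FUNCTIONS {
        let shadowed = columns.iter().any(|c| c.eq_ignore_ascii_case(f));
        if has_prefix(f, partial) && !shadowed {
            out.push(f.to_string());
        }
    }
    out
}

/// Typing-time hint: flag the partial only when it matches
/// neither a schema column nor a known function name.
fn is_column_typo(partial: &str, columns: &[&str]) -> bool {
    let matches_column = columns.iter().any(|c| has_prefix(c, partial));
    let matches_function = KNOWN_FUNCTIONS.iter().any(|f| has_prefix(f, partial));
    !matches_column && !matches_function
}
```

Neither function is consulted when deciding whether an expression parses, which mirrors the posture above: a function missing from the list is neither offered nor exempted from the typo hint, yet it still parses.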