# ADR-0031: The SQL expression grammar

## Status

Accepted

## Context

ADR-0030 made advanced mode a body of **SQL grammar inside the unified grammar tree** (ADR-0023/0024) rather than a separate batch parser. It deferred two large grammar slices to their own focused ADRs (ADR-0030 §3): the **full `SELECT` grammar** and the **SQL expression grammar**. This ADR fixes the second.

The SQL expression grammar is the fragment that fills every expression slot in advanced-mode SQL — ADR-0030 §3 names them: `WHERE`, `HAVING`, `CHECK`, `SELECT` projections, and `DEFAULT`. ADR-0030 §3 describes it as "the superset of ADR-0026's `WHERE` grammar" — adding arithmetic, function calls, `CASE`, and (eventually) subquery expressions on top of the comparison / `LIKE` / `IN` / `BETWEEN` / `IS NULL` predicate set that ADR-0026 already authored for the DSL.

It is the first concrete piece of ADR-0030's phased plan: ADR-0030 Phase 1 ("Foundations + first `SELECT`") opens with "Author the core SQL **expression grammar** — the ADR-0026 superset — as its own ADR." This is that ADR.

### What ADR-0026 already established

ADR-0026 authored a recursive `WHERE` expression for the DSL. The machinery this ADR builds on is all in place:

- **`Node::Subgrammar(&'static Node)`** — a reference-following node that lets a named `static` grammar fragment appear inside its own subtree, so a recursive grammar can be expressed even though `Seq`/`Choice` embed children by value and cannot close a cycle.
- **A stratified grammar** — one named `static` `Node` per precedence tier — which removes left recursion (every recursion is guarded by a token) and encodes precedence in the layering.
- **`WalkContext::subgrammar_depth`** and `MAX_SUBGRAMMAR_DEPTH = 64` — a stack-overflow guard that turns pathologically nested input into a friendly error.
- **The factored `predicate_tail`** — the shared operand prefix matched once; the infix `NOT` factored as an explicit `NOT negatable` branch; no `Choice` branch starting with an `Optional` (an `Optional`-first `Seq` "commits" and discards sibling branches' expected sets).

This ADR reuses every one of those. The new grammar is larger, but it is the same *kind* of grammar, walked by the same walker.

### Why this is not just "extend `expr.rs`"

The DSL's `WHERE` grammar (`src/dsl/grammar/expr.rs`) is bound by ADR-0026's deliberate teaching limits, recorded in `docs/simple-mode-limitations.md`: operands are a column or a literal — *no* arithmetic, *no* string concatenation, *no* scalar functions, *no* subqueries. Those limits are a feature of simple mode, not an accident; the DSL `WHERE` grammar must keep them. Advanced mode is the surface that lifts them (ADR-0030 §4).

So the SQL expression grammar cannot be the DSL grammar with a few nodes added — it has a different operand set (a full scalar expression, not column-or-literal) and a different relationship to its consumers (see Decision §2). It is a parallel fragment. Keeping it parallel also keeps simple mode's 1240-test surface untouched: nothing in `expr.rs` changes.

## Decision

### 1. One unified expression ladder

ADR-0026's DSL grammar stratifies into a *boolean* layer (`or`/`and`/`not`/`bool_primary`) sitting above a *predicate* layer, because the DSL deliberately forbids a boolean sub-expression as a comparison operand — `(a > b) = (c > d)` cannot be written. Standard SQL draws no such line: a boolean *is* a value, `AND` / `OR` / `NOT` and the comparison operators are simply operators at their own precedence tiers, and a parenthesised group is a whole expression regardless of whether it reads as "boolean" or "scalar".

The SQL expression grammar is therefore a **single precedence ladder**, loosest tier to tightest:

```
expr            := or_expr
or_expr         := and_expr ( OR and_expr )*
and_expr        := not_expr ( AND not_expr )*
not_expr        := NOT not_expr | predicate
predicate       := additive predicate_tail?
predicate_tail  := cmp_op additive
                 | [ NOT ] LIKE additive
                 | [ NOT ] BETWEEN additive AND additive
                 | [ NOT ] IN ( additive ( , additive )* )
                 | IS [ NOT ] NULL
cmp_op          := = | <> | != | < | <= | > | >=
additive        := multiplicative ( ( + | - | || ) multiplicative )*
multiplicative  := unary ( ( * | / | % ) unary )*
unary           := ( - | + ) unary | primary
primary         := literal | ( or_expr ) | case_expr | name_or_call
name_or_call    := identifier [ '(' call_args? ')' ]
call_args       := '*' | [ DISTINCT ] or_expr ( , or_expr )*
case_expr       := CASE [ or_expr ] ( WHEN or_expr THEN or_expr )+ [ ELSE or_expr ] END
literal         := number | string | TRUE | FALSE | NULL
```

Precedence, loosest first: `OR`, `AND`, `NOT`, the comparison / predicate tier, additive (`+ - ||`), multiplicative (`* / %`), unary sign, primary. This is standard SQL operator precedence restricted to the teaching-relevant operators.

Notes on specific productions:

- **`name_or_call` is factored, not a `Choice`.** A function call (`upper(Name)`) and a column reference (`Name`) share an identifier prefix. Splitting them into two `Choice` branches would let the function-call branch *commit* on the identifier and then fail at the missing `(`, discarding the column-ref branch (the ADR-0026 "no `Optional`-first branch" hazard, in reverse). Instead the identifier is matched once and the `( call_args )` group is an `Optional` tail: present → a call, absent → a column reference. The grammar need not decide which — see §2 — it only validates that one of the two shapes holds.
- **`call_args` handles `*` and `DISTINCT`.** `count(*)` is the one place `*` is an argument; `count(distinct col)` the one place `DISTINCT` leads an argument list. (The projection-level `select *` is *not* an expression — it belongs to the `SELECT` grammar, ADR-0030 / Phase 1, not here.) The grammar admits function calls structurally; it does not know which names are aggregates — that distinction is the engine's, and matters only once `GROUP BY` lands (ADR-0030 Phase 2).
- **`case_expr` covers both forms** — searched `CASE WHEN … END` and simple `CASE expr WHEN … END`; the optional `or_expr` after `CASE` in the production is what distinguishes them. Every sub-part is an `or_expr` for uniformity (SQL allows any expression in each slot); `END` closes it.
- **`||` is string concatenation**, standard SQL, at the additive tier. It lifts `simple-mode-limitations.md`'s "no string concatenation".
- **`%` is modulo.** It is not in ISO SQL (which spells it `MOD(a, b)`), but it is near-universal across mainstream engines and is what a learner expects. ADR-0030's "pedagogy wins ties" admits it; `MOD` also remains reachable through the generic `name_or_call` path.

### 2. The fragment validates; it builds no AST

ADR-0026's `WHERE` grammar carries an AST-fragment builder (`build_expr`) that folds the matched terminals into a recursive `Expr`, because its consumers — `update` / `delete` / `show data` — are typed `Command`s whose executor compiles that `Expr` to parameterised SQL.

**The SQL expression grammar deliberately builds no AST.** This follows directly from ADR-0030 §4 and §6:

- `WHERE` / `HAVING` / `SELECT` projections live inside a `SELECT` or a DML statement, and ADR-0030 §4 executes those "as the validated SQL itself … they change no schema, so modelling them as a typed `Command` buys nothing." There is no `Expr` to compile — the engine parses the SQL.
- `CHECK` and `DEFAULT` live inside advanced-mode DDL. ADR-0030 §11 stores their expressions in `project.yaml` "as SQL the user could re-enter" — text, not a structured tree. ADR-0030 §4 is explicit that these expressions are "**not** lowered into the DSL's deliberately-limited `Expr`."

So no consumer of this grammar wants an `Expr`.
The fragment's entire job is the other three walker outputs:

1. **Accept or reject** — the input either is or is not a well-formed in-subset SQL expression.
2. **The flat `MatchedPath`** of matched terminals — which is what drives syntax highlighting, completion, the expected-set, and the hint panel (§5).
3. **A source span.** A consumer that needs the expression *as text* (the `SELECT` builder assembling `Command::Select`'s SQL; a future `CHECK` builder) recovers it by slicing the original source between the first and last matched terminal's byte offsets. The terminals already carry `span` for highlighting; nothing new is needed on the matched path.

This is a real simplification over ADR-0026 — no `build_expr` analogue, no second structural pass, no expression AST type — and it is the correct shape for a grammar whose consumers run SQL rather than compile it. The grammar tier still owns validation, highlighting, completion, and the no-left-recursion guarantee; it simply has no tree to hand back.

**Consequence for the `SELECT` builder (ADR-0030 / Phase 1).** A command `ast_builder` today receives only `&MatchedPath`. The `SELECT` builder needs the original source to populate `Command::Select`'s validated SQL text. The builder signature gains a `source: &str` parameter — a mechanical sweep across the ~21 existing `CommandNode` builders (most ignore it), of the same category as ADR-0030's noted `match Command` sweep. It is called out here because it is a direct consequence of the no-AST decision; the change itself belongs to the Phase 1 SELECT work, governed by ADR-0030.

### 3. Recursion, and the depth cap

The grammar's recursion points are all **token-guarded** — each consumes at least one token before recursing, so the greedy top-down walker always makes progress:

- `not_expr := NOT not_expr` — after `NOT`.
- `primary := ( or_expr )` — after `(`.
- `unary := ( - | + ) unary` — after a sign.
- `call_args` operands — after the call's `(`.
- `case_expr` sub-parts — after `CASE` / `WHEN` / `THEN` / `ELSE`.
- `IN ( … )` operands — after `IN (`.

Every recursion is wired through `Node::Subgrammar(&NAMED)` referencing a named `static` tier, exactly as in `expr.rs`. The walker counts active `Subgrammar` frames in `WalkContext::subgrammar_depth`; this grammar reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap and its friendly "expression nested too deeply" error — no new walker capability is required. The ladder descends a few `Subgrammar` frames per nesting level, so the effective hand-written nesting limit is comfortably past anything a learner types; the cap is purely a stack-overflow guard.

### 4. A separate fragment, parallel to the DSL grammar

The SQL expression grammar is authored in a new file, `src/dsl/grammar/sql_expr.rs`, parallel to `expr.rs` (which keeps the DSL `WHERE` grammar). They are deliberately *not* merged:

- **Different operand sets.** The DSL operand is a column or a literal; the SQL operand is a full scalar expression.
- **Different output.** `expr.rs` builds an `Expr`; `sql_expr.rs` builds nothing (§2).
- **Mode isolation.** Simple mode must never gain arithmetic or functions — the limits in `simple-mode-limitations.md` are a teaching feature. A shared fragment risks leaking the SQL surface into the DSL grammar.
- **Regression containment.** `expr.rs` is exercised by a large share of the 1240-test suite. A parallel file changes none of it.

The predicate-tail shapes (`cmp_op` / `LIKE` / `BETWEEN` / `IN` / `IS NULL`) look structurally identical between the two grammars, but each branch's operand sub-node differs (column-or-literal vs `additive`), so the `static` nodes cannot literally be shared. The *design* is shared — `sql_expr.rs` follows `expr.rs`'s factoring (operand prefix matched once, infix `NOT` as an explicit branch, no `Optional`-first branch) — and that is the reuse that matters.

### 5. Ambient assistance comes for free

Because the fragment is grammar in the unified tree, the walker gives it — with no expression-specific assistance code — the same ambient assistance every DSL command gets (ADR-0030 §8, ADR-0022):

- **Syntax highlighting** of SQL keywords, identifiers, literals, and operators, from the per-byte highlight classes the walk records.
- **Tab completion** of SQL keywords (`and`, `or`, `like`, `between`, `case`, `when`, …) and of column names — the `name_or_call` identifier slot uses `IdentSource::Columns`, so it completes against the statement's table(s) from the same `SchemaCache` the DSL uses. Function names are not completed (there is no allowlist — ADR-0030 §13 OOS-3); a typed function name simply is not a candidate.
- **Hint-panel prose** at each grammar slot.
- **The `[ERR]` / `[WRN]` validity indicator** (ADR-0027).
- **Per-command parse-error usage** (ADR-0021).

The `name_or_call` identifier slot resolves to `Columns` because, at the moment the identifier is typed, the common case is a column reference and column completion is the helpful default; a function call is recognised a token later when `(` follows. The grammar does not need to decide between the two (§2), so the slot can optimise for the common completion.

### 6. Errors and the unsupported surface

A construct outside this grammar — a window function's `OVER` clause, a `CAST` with `::` syntax, an array literal — is an ordinary walker parse error, carrying the expected-set and routed through the friendly-error layer with engine-neutral wording (ADR-0030 §9, ADR-0019). There is no separate "valid SQL but unsupported" classifier — ADR-0030 §1 dropped the batch parser that would be needed for one.
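
To make the "ordinary parse error carrying the expected-set" behaviour concrete, and to show how a validate-only fragment can still hand its consumer the expression as text (§2), here is a minimal, self-contained sketch. It is a hand-written recursive-descent check over only the arithmetic tiers of §1, not the project's walker: `ParseError`, `Token`, `lex`, `Walker`, and `validate` are hypothetical names invented for illustration, and the §3 depth cap is left out for brevity.

```rust
use std::ops::Range;

/// Hypothetical error shape: byte offset of the failure plus the expected-set.
#[derive(Debug, PartialEq)]
struct ParseError {
    at: usize,
    expected: Vec<&'static str>,
}

/// Kind tag plus byte span. Kinds: 'n' number, 'i' identifier,
/// otherwise the punctuation character itself.
type Token = (char, Range<usize>);

fn lex(src: &str) -> Vec<Token> {
    let b = src.as_bytes();
    let (mut out, mut i) = (Vec::new(), 0);
    while i < b.len() {
        if b[i].is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let start = i;
        let kind = if b[i].is_ascii_digit() {
            while i < b.len() && b[i].is_ascii_digit() { i += 1; }
            'n'
        } else if b[i].is_ascii_alphabetic() || b[i] == b'_' {
            while i < b.len() && (b[i].is_ascii_alphanumeric() || b[i] == b'_') { i += 1; }
            'i'
        } else {
            i += 1;
            b[start] as char
        };
        out.push((kind, start..i));
    }
    out
}

struct Walker<'a> {
    toks: &'a [Token],
    pos: usize,
    src_len: usize,
}

impl Walker<'_> {
    fn peek(&self) -> Option<char> {
        self.toks.get(self.pos).map(|t| t.0)
    }
    /// Error at the current token (or at end of input if none is left).
    fn err(&self, expected: &[&'static str]) -> ParseError {
        let at = self.toks.get(self.pos).map_or(self.src_len, |t| t.1.start);
        ParseError { at, expected: expected.to_vec() }
    }
    // additive := multiplicative ( ( + | - ) multiplicative )*
    fn additive(&mut self) -> Result<(), ParseError> {
        self.multiplicative()?;
        while matches!(self.peek(), Some('+' | '-')) {
            self.pos += 1;
            self.multiplicative()?;
        }
        Ok(())
    }
    // multiplicative := unary ( ( * | / | % ) unary )*
    fn multiplicative(&mut self) -> Result<(), ParseError> {
        self.unary()?;
        while matches!(self.peek(), Some('*' | '/' | '%')) {
            self.pos += 1;
            self.unary()?;
        }
        Ok(())
    }
    // unary := ( - | + ) unary | primary   (recursion guarded by the sign)
    fn unary(&mut self) -> Result<(), ParseError> {
        if matches!(self.peek(), Some('-' | '+')) {
            self.pos += 1;
            return self.unary();
        }
        self.primary()
    }
    // primary := number | identifier | ( additive )
    fn primary(&mut self) -> Result<(), ParseError> {
        match self.peek() {
            Some('n' | 'i') => self.pos += 1,
            Some('(') => {
                self.pos += 1;
                self.additive()?;
                if self.peek() != Some(')') {
                    return Err(self.err(&["+", "-", "*", "/", "%", ")"]));
                }
                self.pos += 1;
            }
            _ => return Err(self.err(&["number", "identifier", "(", "+", "-"])),
        }
        Ok(())
    }
}

/// Validate only: on success return the byte span from the first to the
/// last matched token, so the caller can slice the expression out as text.
fn validate(src: &str) -> Result<Range<usize>, ParseError> {
    let toks = lex(src);
    let mut w = Walker { toks: &toks, pos: 0, src_len: src.len() };
    w.additive()?;
    if w.pos < toks.len() {
        return Err(w.err(&["+", "-", "*", "/", "%", "end of expression"]));
    }
    Ok(toks[0].1.start..toks[toks.len() - 1].1.end)
}
```

Under these assumptions, `validate("  price * (qty + 1) ")` returns the span `2..19`, and slicing the source with it gives `price * (qty + 1)`. `validate("total::int")` fails at byte 5, the first `:`, with the operators and end of expression as its expected-set; no separate "unsupported SQL" classification is involved.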

Expression-level engine neutrality is **best-effort**, exactly as ADR-0030 §7 states: the grammar enforces the *structural* subset (operators, `CASE`, call syntax), but because there is no function allowlist, an engine-specific function the grammar admits and the engine then rejects surfaces an engine-neutral *execution* error rather than being caught at parse time. This is the accepted honest limitation; a function allowlist remains ADR-0030 §13 OOS-3.

### 7. Out of scope

- **OOS-1. Subquery expressions.** A `( SELECT … )` as a `primary`, ` ( SELECT … )`, `IN ( SELECT … )`, and `EXISTS ( SELECT … )` are part of the eventual surface (ADR-0030 §3) but cannot be realised until the `SELECT` grammar itself exists and is recursive — that is ADR-0030 Phase 2 ("`SELECT` — full"). This ADR's grammar is authored so that adding a subquery branch to `primary` (and an `IN ( subquery )` / `EXISTS` form) is an additive change: a new `Choice` branch guarded by `(`/`EXISTS`, recursing through `Subgrammar` into the `SELECT` fragment. No restructuring is foreseen.
- **OOS-2. Qualified column references** (`table.column`, `alias.column`). A single-table `SELECT` (ADR-0030 Phase 1) never needs them; they become meaningful with `JOIN`s (Phase 2). `name_or_call` takes an unqualified identifier for now; a `[ '.' identifier ]` tail is an additive extension.
- **OOS-3. Quoted identifiers** (`"column name"`). The DSL has no quoted-identifier syntax; introducing one is a cross-cutting lexer change, tracked separately.
- **OOS-4. A function allowlist** — ADR-0030 §13 OOS-3, restated: function calls are admitted generically.
- **OOS-5. An expression AST.** Explicitly not built (§2). If a future consumer genuinely needs structured expression data (none is foreseen — DDL `CHECK`/`DEFAULT` store text), that is a new decision, not a deferral.
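
The token-guarded recursion of §3, and the claim in OOS-1 that new `primary` branches are additive, may be easier to see in miniature. The sketch below is a toy under stated assumptions, not the project's types: this `Node` enum has only four variants, `walk` uses plain ordered-choice backtracking rather than the real walker's commit rules, and `WalkError`, `UNARY`, and `PRIMARY` are illustrative names. Only the `Subgrammar(&'static Node)` shape and the value 64 for the cap come from this ADR.

```rust
/// Toy grammar node. `Seq` and `Choice` hold their children by value in a
/// static slice, so only `Subgrammar` (a reference) can close a cycle.
enum Node {
    Tok(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    Subgrammar(&'static Node),
}

const MAX_SUBGRAMMAR_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum WalkError {
    NoMatch,
    TooDeep, // would surface as "expression nested too deeply"
}

// primary := x | ( unary )      recursion guarded by `(`
// A future subquery form would be one more entry in this slice.
static PRIMARY: Node = Node::Choice(&[
    Node::Tok("x"),
    Node::Seq(&[Node::Tok("("), Node::Subgrammar(&UNARY), Node::Tok(")")]),
]);

// unary := - unary | primary    recursion guarded by `-`
static UNARY: Node = Node::Choice(&[
    Node::Seq(&[Node::Tok("-"), Node::Subgrammar(&UNARY)]),
    Node::Subgrammar(&PRIMARY),
]);

/// Walk `node` over `toks` starting at `pos`; return the position after the
/// match. `depth` counts the `Subgrammar` frames currently active.
fn walk(node: &Node, toks: &[&str], pos: usize, depth: usize) -> Result<usize, WalkError> {
    match node {
        Node::Tok(t) => {
            if toks.get(pos).copied() == Some(*t) {
                Ok(pos + 1)
            } else {
                Err(WalkError::NoMatch)
            }
        }
        Node::Seq(children) => {
            let mut p = pos;
            for child in children.iter() {
                p = walk(child, toks, p, depth)?;
            }
            Ok(p)
        }
        Node::Choice(branches) => {
            for branch in branches.iter() {
                match walk(branch, toks, pos, depth) {
                    // Simplification: try the next branch on any mismatch.
                    Err(WalkError::NoMatch) => continue,
                    // Success and the depth error both end the choice.
                    other => return other,
                }
            }
            Err(WalkError::NoMatch)
        }
        Node::Subgrammar(target) => {
            if depth >= MAX_SUBGRAMMAR_DEPTH {
                return Err(WalkError::TooDeep);
            }
            walk(target, toks, pos, depth + 1)
        }
    }
}
```

With these definitions, `walk(&UNARY, &["-", "(", "x", ")"], 0, 0)` consumes all four tokens, while a run of a hundred `-` tokens hits the cap and returns `TooDeep` instead of growing the stack. Extending `PRIMARY` with another guarded branch leaves `UNARY` and `walk` untouched, which is the sense in which such additions are additive.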

## Consequences

- A new grammar file, `src/dsl/grammar/sql_expr.rs`, exporting a single `pub static SQL_EXPRESSION: Node` (a `Subgrammar(&SQL_OR_EXPR)`) that any SQL `CommandNode` drops into its `Seq` as one node — the same drop-in shape as `expr::EXPRESSION`.
- **No new walker capability.** `Subgrammar`, the depth counter, the cap, and the friendly depth error are all reused from ADR-0026 unchanged.
- **No expression AST, no fragment builder** — a deliberate simplification over ADR-0026 (§2).
- `expr.rs` and the simple-mode `WHERE` surface are **untouched**; the 1240-test baseline is insulated by construction (§4).
- The command `ast_builder` signature gains a `source: &str` parameter (§2) — a ~21-site mechanical sweep, executed as part of the Phase 1 `SELECT` work (ADR-0030), not here.
- Subquery expressions and qualified column references are authored later as additive `primary` branches (§7) — the grammar is shaped to receive them.
- The fragment is the shared dependency of every advanced-mode expression slot — `WHERE`, `HAVING`, `SELECT` projections, `CHECK`, `DEFAULT` — defined once.

## Implementation notes

A build order, each step guarded by the test suite. Steps 1–5 are ADR-0030 Phase 1; the fragment is consumed first by the single-table `SELECT`'s `WHERE` and projection slots.

1. **The grammar fragment** — `sql_expr.rs` with the stratified tiers of §1 as named `static` `Node`s, recursion via `Subgrammar`. No builder. `pub static SQL_EXPRESSION`.
2. **Unit tests** walking representative inputs against the fragment directly (the `expr.rs` test pattern): every operator and precedence pair, `CASE` both forms, function calls including `count(*)` and `count(distinct …)`, the full predicate set, parenthesised regrouping, the depth cap, and the keyword-case-insensitivity check.
3. **Wire it into the Phase 1 `SELECT` grammar** — the `WHERE` slot and the projection items reference `SQL_EXPRESSION` (ADR-0030 Phase 1).
4. **Highlighting / completion / hint** spot-checks — confirm the §5 assistance works through a SQL expression with no expression-specific code, via the typing-surface matrix.
5. **Engine-neutral error** spot-checks for out-of-subset constructs (§6).

Later phases extend the same fragment:

- **ADR-0030 Phase 2** adds the subquery `primary` branches and qualified column references (OOS-1, OOS-2) once the recursive `SELECT` grammar exists, and exercises the fragment from `HAVING`.
- **ADR-0030 Phase 4** consumes the fragment from advanced-mode DDL `CHECK` and `DEFAULT`.

## See also

- ADR-0019 — the friendly-error layer SQL parse and execution errors route through (§6).
- ADR-0021 — per-command parse-error usage, free for SQL (§5).
- ADR-0022 — ambient typing assistance; §5 is its reach into the SQL expression.
- ADR-0023 / ADR-0024 — the unified grammar tree this fragment is authored into.
- ADR-0026 — the DSL `WHERE` expression grammar this is the superset of: the `Subgrammar` node, the stratified-grammar technique, the depth cap, and the `predicate_tail` factoring are all inherited from it.
- ADR-0027 — the validity indicator, free for SQL (§5).
- ADR-0030 — advanced mode's SQL surface; §3 commissions this ADR, §4/§6 are the source of the no-AST decision (§2), §7/§13 set the engine-neutrality posture and the no-allowlist rule.
- `docs/simple-mode-limitations.md` — the DSL limits this grammar lifts for advanced mode (§1, §4).