Adds tests/typing_surface/where_expression.rs: 9 matrix cells for the complex WHERE / show-data limit typing surface (operator candidates after an operand, AND / OR after a predicate, NOT, BETWEEN / IN bounds, and `show data` where / limit).

Writing the cells surfaced a grammar bug. `predicate_tail`'s `[NOT] negatable` branch started with `Optional(not)`, and an Optional-first `Seq` always "commits", so on an incomplete input the walker's `Choice` returned that branch's `Incomplete` early and discarded every sibling branch's expected set, dropping `is` and the comparison operators from completion after a column. Fixed by splitting it into explicit `NOT negatable` and bare `negatable` branches; no `predicate_tail` branch starts with an `Optional` now. The matched terminal sequence is unchanged, so `build_expr` is untouched.

Docs: ADR-0026 gains an "As-built notes" section recording the option-1 builder realization, its two deviations from the §3 sketch, and the deferral of §7 diagnostic flagging to ADR-0027. requirements.md C5a -> [x] (steps 1-4) with the test baseline refreshed to 1079; CLAUDE.md's deferred list reconciled (C5a implemented; the QA1/QA2 note now points at ADR-0028).
ADR-0026: Complex WHERE expressions
Status
Accepted
Context
The requirements checklist commits, in C5a, to complex
WHERE expressions — AND / OR, comparison operators,
LIKE — for the update, delete, and show data row
filters. It is described there as the bridge from DSL
fluency toward real SQL. Today the DSL is well short of
that:
- The only filter the DSL parses is a single `where <col> = <val>` equality. There is no `AND` / `OR`, no operator other than `=`, no `LIKE`, no `IS NULL`, no parentheses.
- `update` and `delete` carry that filter as `RowFilter::Where { column, value }`.
- `show data <T>` carries no filter at all: it always selects every row.
- The `Value` AST is purely syntactic (`Number`, `Text`, `Bool`, `Null`); per-column type handling happens at bind time in `db.rs`.
Three things make this the right moment, and shape the decision:
- `QA1` (`EXPLAIN QUERY PLAN`) needs a filtered query. An unfiltered `SELECT * FROM t` always plans as a full scan; an index can never appear. QA1's pedagogical payoff depends on a `WHERE` whose plan flips between a scan and an index search.
- `CHECK` constraints (C3) need an expression grammar. A `CHECK (Age >= 0 AND Age < 150)` is the same expression problem. A throwaway mini-grammar for `CHECK` plus a second one for `WHERE` would be waste; this ADR builds the grammar once.
- The grammar architecture must grow to host it. The unified walker grammar (ADR-0023 / ADR-0024) is a non-recursive trie of `&'static Node`s, and its parse output (`MatchedPath`) is a flat list of matched terminals. A `WHERE` expression is recursive and its shape is data-dependent; neither fits as the grammar stands.
The architectural problem, precisely
A boolean expression — a = 1 AND (b > 2 OR c LIKE 'x%')
— is recursive (a parenthesised group is itself an
expression) and carries operator precedence. Two facts
about the current walker:
- The `Node` tree is acyclic. Every combinator references its children through `&'static [Node]` / `&'static Node`, and the registry lives in `const`s. A `const` cannot refer to itself, so no node can close a cycle. `Optional` and `Repeated` already hold a `&'static Node` reference (recursion through a reference is expressible if the fragment is a named `static`), but `Seq` and `Choice` embed their children by value in a slice, and a cyclic value has no finite representation.
- `MatchedPath` is flat. It is a `Vec<MatchedItem>` of matched terminals in source order; the combinators shape the order but record no grouping. For every command today that is enough: each command's shape is a fixed template, so the AST builder reads terminals by position or by role. A recursive expression has no fixed template (`where a = 1` and `where (a=1 or b=2) and c=3` have different shapes), so a flat terminal list cannot be rebuilt into the expression tree without parsing it a second time.
Left recursion is a third fact, and it is a property of
parsing technique rather than of this codebase: a
top-down walker cannot consume a rule whose leftmost
symbol is the rule itself (expr := expr OP expr recurses
without consuming input). The standard remedy — a
stratified, left-factored grammar — is adopted below.
Decision
1. The expression grammar
The WHERE expression is a stratified grammar — one layer
per precedence tier. Stratification removes left recursion
(every recursion is guarded by a token) and encodes
operator precedence in the layering, so there is no
separate precedence-resolution step.
or_expr := and_expr ( OR and_expr )*
and_expr := not_expr ( AND not_expr )*
not_expr := NOT not_expr | bool_primary
bool_primary := ( or_expr ) | predicate
predicate := operand cmp_op operand
| operand [ NOT ] LIKE operand
| operand [ NOT ] BETWEEN operand AND operand
| operand [ NOT ] IN ( operand [ , operand ]* )
| operand IS [ NOT ] NULL
operand := literal | column_ref
cmp_op := = | != | <> | < | <= | > | >=
- Operator set: the six comparisons, with both `!=` and `<>` accepted (`<>` is standard SQL, `!=` the common variant; the engine accepts both); `AND` / `OR` / `NOT`; parentheses; `LIKE` with `%` / `_` wildcards; `IS NULL` / `IS NOT NULL`; `IN`; `BETWEEN`. `LIKE`, `IN`, and `BETWEEN` take an optional infix `NOT`, mirroring `IS NOT NULL`.
- Operands are a column reference or a literal, not a nested expression. Parentheses group boolean sub-expressions (`bool_primary`), not comparison operands. A bare column reference is not a boolean expression: a predicate always has an operator (write `Active = true`, not `Active`).
- The only recursion is `( or_expr )` and `NOT not_expr`; each consumes a token (`(` or `NOT`) before recursing, so the greedy top-down walker always makes progress.
- Nesting depth is capped at 64. Hand-written `WHERE` clauses do not approach this; the cap exists only so pathological input (`((((…))))`) yields a friendly "expression nested too deeply (limit 64)" error rather than a stack overflow.
The grammar is deliberately a subset of standard SQL's
WHERE syntax, so a learner's knowledge transfers
directly when advanced-mode SQL (Q1) lands.
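As a rough illustration of how stratification encodes precedence, the sketch below is a hand-written recursive-descent parser with one function per tier. It is not the project's walker: the `Parser` type, the `parse` and `collapse` helpers, and the toy `Expr::Pred` variant are all hypothetical, only the `operand cmp_op operand` predicate branch is covered, and the depth cap is left out for brevity.

```rust
// Toy AST for the sketch; a predicate is kept as (left, op, right) text.
#[derive(Debug, PartialEq)]
enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Pred(String, String, String),
}

struct Parser<'a> {
    toks: &'a [&'a str],
    pos: usize,
}

impl<'a> Parser<'a> {
    // Take the next token, if any.
    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.toks.get(self.pos).copied();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    // Consume `kw` (case-insensitively) only if it is the next token.
    fn eat(&mut self, kw: &str) -> bool {
        let hit = self.toks.get(self.pos).map_or(false, |t| t.eq_ignore_ascii_case(kw));
        if hit {
            self.pos += 1;
        }
        hit
    }

    // or_expr := and_expr ( OR and_expr )*
    fn or_expr(&mut self) -> Result<Expr, String> {
        let mut parts = vec![self.and_expr()?];
        while self.eat("or") {
            parts.push(self.and_expr()?);
        }
        Ok(collapse(parts, Expr::Or))
    }

    // and_expr := not_expr ( AND not_expr )*
    fn and_expr(&mut self) -> Result<Expr, String> {
        let mut parts = vec![self.not_expr()?];
        while self.eat("and") {
            parts.push(self.not_expr()?);
        }
        Ok(collapse(parts, Expr::And))
    }

    // not_expr := NOT not_expr | bool_primary
    fn not_expr(&mut self) -> Result<Expr, String> {
        if self.eat("not") {
            return Ok(Expr::Not(Box::new(self.not_expr()?)));
        }
        self.bool_primary()
    }

    // bool_primary := ( or_expr ) | predicate
    fn bool_primary(&mut self) -> Result<Expr, String> {
        if self.eat("(") {
            let inner = self.or_expr()?;
            if !self.eat(")") {
                return Err("expected `)`".to_string());
            }
            return Ok(inner);
        }
        self.predicate()
    }

    // predicate := operand cmp_op operand (the only branch sketched here)
    fn predicate(&mut self) -> Result<Expr, String> {
        let left = self.bump().ok_or("expected operand")?;
        let op = self.bump().ok_or("expected operator")?;
        if !["=", "!=", "<>", "<", "<=", ">", ">="].contains(&op) {
            return Err(format!("unknown operator `{op}`"));
        }
        let right = self.bump().ok_or("expected operand")?;
        Ok(Expr::Pred(left.to_string(), op.to_string(), right.to_string()))
    }
}

// A tier with a single child collapses to that child instead of wrapping it.
fn collapse(mut parts: Vec<Expr>, wrap: fn(Vec<Expr>) -> Expr) -> Expr {
    if parts.len() == 1 { parts.pop().unwrap() } else { wrap(parts) }
}

fn parse(toks: &[&str]) -> Result<Expr, String> {
    let mut p = Parser { toks, pos: 0 };
    let expr = p.or_expr()?;
    if p.pos != toks.len() {
        return Err(format!("unexpected token `{}`", toks[p.pos]));
    }
    Ok(expr)
}
```

With the tokens `a = 1 or b = 2 and c = 3`, `parse` yields an `Or` of the first predicate and an `And` of the other two: `AND` binds tighter purely because `and_expr` sits one tier below `or_expr`.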
2. Grammar architecture: a reference-following node
Seq / Choice embed children by value and cannot hold a
cyclic node. One new Node variant closes the gap:
/// Walks the referenced node once, mandatory. Because the
/// reference is a `&'static Node`, a named `static`
/// fragment may appear inside its own subtree — the
/// mechanism that lets the expression grammar recurse.
/// (ADR-0023 sketched this as `SubgrammarRef`.)
Subgrammar(&'static Node),
Subgrammar is the static counterpart of the existing
DynamicSubgrammar (a walk-time factory). The expression
grammar's tiers are declared as named static items;
bool_primary's ( or_expr ) branch reaches or_expr
through Subgrammar(&OR_EXPR), and not_expr reaches
itself the same way. The walker gains one match arm — walk
the referenced node once — plus the depth counter for the
§1 cap.
The expression grammar is one fragment, referenced by
update, delete, and show data alike — defined once.
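A minimal sketch of the mechanism, under assumptions: the `Node` enum here is a cut-down stand-in (the `Word` variant, the `walk` signature, and the toy grammar `group := "(" group ")" | "x"` are hypothetical, not the project's definitions). It shows a named `static` fragment appearing inside its own subtree through `Subgrammar`, and a depth counter that turns runaway nesting into an error.

```rust
// Cut-down stand-in for the walker's node type (hypothetical).
enum Node {
    Word(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    // Follows a reference, so a named `static` can appear in its own subtree.
    Subgrammar(&'static Node),
}

const MAX_SUBGRAMMAR_DEPTH: usize = 64;

// Toy grammar: group := "(" group ")" | "x"
static PAREN_BRANCH: [Node; 3] = [Node::Word("("), Node::Subgrammar(&GROUP), Node::Word(")")];
static GROUP_BRANCHES: [Node; 2] = [Node::Seq(&PAREN_BRANCH), Node::Word("x")];
static GROUP: Node = Node::Choice(&GROUP_BRANCHES);

// Returns Ok(Some(next_pos)) on a match, Ok(None) on no match,
// and Err when the Subgrammar recursion depth cap is exceeded.
fn walk(node: &Node, toks: &[&str], pos: usize, depth: usize) -> Result<Option<usize>, String> {
    match node {
        Node::Word(w) => Ok(if toks.get(pos) == Some(w) { Some(pos + 1) } else { None }),
        Node::Seq(children) => {
            let mut p = pos;
            for child in children.iter() {
                match walk(child, toks, p, depth)? {
                    Some(next) => p = next,
                    None => return Ok(None),
                }
            }
            Ok(Some(p))
        }
        Node::Choice(branches) => {
            for branch in branches.iter() {
                if let Some(next) = walk(branch, toks, pos, depth)? {
                    return Ok(Some(next));
                }
            }
            Ok(None)
        }
        Node::Subgrammar(inner) => {
            // One frame per reference followed; the cap guards the stack.
            if depth >= MAX_SUBGRAMMAR_DEPTH {
                return Err("expression nested too deeply (limit 64)".to_string());
            }
            walk(inner, toks, pos, depth + 1)
        }
    }
}
```

Calling `walk(&GROUP, &["(", "(", "x", ")", ")"], 0, 0)` consumes all five tokens, while an input with 65 opening parentheses returns the depth error instead of recursing further. The slices are declared as named `static` arrays here so the cycle goes only through addresses of `static` items.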
3. The expression result — built selectively
MatchedPath — the walker's flat list of matched
terminals — is left unchanged. The recursive structure
lives only where it is needed: inside the expression.
The expression grammar fragment carries its own
AST-fragment builder. As the walker recurses through the
stratified tiers, that builder runs — the walker's
recursion is the precedence-correct tree, so the builder
assembles a nested Expr directly, with no second parse
and no separate precedence pass. The finished Expr is
carried as a single item in the otherwise-unchanged flat
MatchedPath (MatchedKind gains one variant to hold a
built expression).
Every existing command builder is therefore genuinely
untouched — the flat path it reads is exactly as before.
update / delete / show data take that one expression
item and read its Expr.
This is the "selectively if necessary" option: the parser
gains structured output exactly where the grammar is
recursive, and nowhere else. A system-wide hierarchical
MatchedPath was considered and rejected — it would
record group structure for every command while only the
expression consumed it, leaving the non-expression
grouping computed but unread, and so untested. The general
"a grammar fragment may carry a builder" mechanism
introduced here is exercised by its one user; nothing is
recorded that nothing reads.
4. The Expr AST
A new recursive expression AST joins the command AST:
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    Like { target: Operand, pattern: Operand, negated: bool },
    Between { target: Operand, low: Operand, high: Operand, negated: bool },
    In { target: Operand, items: Vec<Operand>, negated: bool },
    IsNull { target: Operand, negated: bool },
}

pub enum Operand { Column(String), Literal(Value) }
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }
Or / And are n-ary — a flat a AND b AND c is one
And of three. Single-child tiers collapse: a predicate
reached through the or → and → not layers with no
connective is just that Predicate, not three wrappers.
RowFilter changes from
RowFilter::Where { column: String, value: Value }
to
RowFilter::Where(Expr)
for update / delete. show data carries
filter: Option<Expr> and limit: Option<u64>.
5. The commands
update <T> set <assignments> ( where <expr> | --all-rows )
delete from <T> ( where <expr> | --all-rows )
show data <T> [ where <expr> ] [ limit <n> ]
- `update` / `delete` keep ADR-0014's mandatory where-or-`--all-rows` choice; a complex expression satisfies the `where` side.
- `show data` gains an optional `where`. Reading every row stays the safe default for a read, so no `--all-rows` opt-in is needed there; the clause is simply optional.
- `show data` gains an optional `limit <n>` (`<n>` a non-negative integer). When `limit` is present the query is implicitly ordered by the table's primary key, so `limit 20` is a stable "first 20 by primary key" rather than an arbitrary subset; every table created through the DSL has a primary key. Explicit `order by` is out of scope (§10).
6. SQL generation
The Expr is compiled to a parameterised SQL WHERE
string:
- Every literal becomes a `?` placeholder bound as a parameter, never spliced into the SQL text. Identifiers are `quote_ident`-quoted.
- A literal compared against a column is converted to that column's storage representation through the existing `bind_for_column` path, exactly as the current `where col = val` does.
- Connectives, `NOT`, and parentheses are emitted from the tree structure.
- `limit` emits `LIMIT ?` with the bound count, plus the implicit `ORDER BY` over the primary-key column(s) (§5).
The application never evaluates the expression itself — the database does, and re-derives precedence from the operators. The expression is "passed through" only in that sense; the raw user text is never forwarded.
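The compilation described in this section might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the payload types of `Value`, the body of `quote_ident`, and the `expr_sql` / `show_data_sql` helpers are hypothetical; only two `Predicate` variants are covered; and the per-column conversion through `bind_for_column` is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value { Number(i64), Text(String), Bool(bool), Null } // payload types assumed

enum Operand { Column(String), Literal(Value) }
enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }
enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    IsNull { target: Operand, negated: bool },
}
enum Expr { Or(Vec<Expr>), And(Vec<Expr>), Not(Box<Expr>), Predicate(Predicate) }

// Hypothetical stand-in for the project's `quote_ident`.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

// Columns are quoted; literals become `?` and are pushed onto `params`.
fn operand_sql(o: &Operand, params: &mut Vec<Value>) -> String {
    match o {
        Operand::Column(c) => quote_ident(c),
        Operand::Literal(v) => {
            params.push(v.clone());
            "?".to_string()
        }
    }
}

fn op_sql(op: &CompareOp) -> &'static str {
    match op {
        CompareOp::Eq => "=",
        CompareOp::NotEq => "<>",
        CompareOp::Lt => "<",
        CompareOp::LtEq => "<=",
        CompareOp::Gt => ">",
        CompareOp::GtEq => ">=",
    }
}

// Connectives and parentheses come from the tree, never from user text.
fn expr_sql(e: &Expr, params: &mut Vec<Value>) -> String {
    match e {
        Expr::Or(parts) => join(parts, " OR ", params),
        Expr::And(parts) => join(parts, " AND ", params),
        Expr::Not(inner) => format!("NOT ({})", expr_sql(inner, params)),
        Expr::Predicate(Predicate::Compare { left, op, right }) => {
            let l = operand_sql(left, params);
            let r = operand_sql(right, params);
            format!("{} {} {}", l, op_sql(op), r)
        }
        Expr::Predicate(Predicate::IsNull { target, negated }) => {
            let t = operand_sql(target, params);
            format!("{} IS {}NULL", t, if *negated { "NOT " } else { "" })
        }
    }
}

fn join(parts: &[Expr], sep: &str, params: &mut Vec<Value>) -> String {
    let rendered: Vec<String> = parts.iter().map(|p| expr_sql(p, params)).collect();
    format!("({})", rendered.join(sep))
}

// `limit` adds the implicit primary-key ORDER BY plus a bound LIMIT.
fn show_data_sql(
    table: &str,
    pk_cols: &[&str],
    filter: Option<&Expr>,
    limit: Option<u64>,
    params: &mut Vec<Value>,
) -> String {
    let mut sql = format!("SELECT * FROM {}", quote_ident(table));
    if let Some(e) = filter {
        sql.push_str(" WHERE ");
        sql.push_str(&expr_sql(e, params));
    }
    if let Some(n) = limit {
        let order: Vec<String> = pk_cols.iter().map(|c| quote_ident(c)).collect();
        sql.push_str(&format!(" ORDER BY {} LIMIT ?", order.join(", ")));
        params.push(Value::Number(n as i64));
    }
    sql
}
```

For a filter equivalent to `Age >= 18 and (Name = 'Ann' or Email is null)` with `limit 20` on a table `People` keyed by `Id`, this produces `SELECT * FROM "People" WHERE ("Age" >= ? AND ("Name" = ? OR "Email" IS NULL)) ORDER BY "Id" LIMIT ?` with three bound parameters in source order. The redundant outer parentheses are harmless and keep the emitter simple.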
7. Type handling — permissive and advisory
A type mismatch in a comparison is flagged, not blocked. This matches the app's ambient-assistance posture (ADR-0022): the tool indicates problems, it does not refuse input.
- A literal in `column OP literal` is type-checked against the column. When the types are compatible the literal is converted and bound per the column's type (§6).
- When they are not compatible (`Name > 5` on a text column, `Age LIKE '5%'` on an int column) the mismatch is surfaced through the existing highlight and hint channels as an error-class annotation, but the command still parses, still submits, and still runs. The literal binds by its own syntactic type and the database's comparison rules take over, which is precisely the behaviour a learner is experimenting to observe.
- This relaxes current behaviour. Today `bind_for_column` rejects a type-mismatched `WHERE` literal; under this ADR it does not. The relaxation is scoped to `WHERE` comparisons. Writes (`insert`, `update … set`) stay strict: STRICT storage genuinely cannot hold a mistyped value, so a mistyped write is a real error, not an experiment.
- `= NULL` / `!= NULL` is a specific flagged case. It is valid syntax that almost never does what the user intends (in SQL it is never true). The walker special-cases a comparison whose operator is `=` / `!=` and whose operand is the `NULL` literal: it is highlighted as an error, and the hint points at `IS NULL` / `IS NOT NULL`. As with type mismatches it still runs if submitted; a learner who wants to see what `x = NULL` does may.
Always-on submit-time signalling of flagged-but-runnable
input (an (INVALID) / WARNING marker at the input
field's edge) is a separate, general concern — see §10.
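The `= NULL` special case above amounts to a small structural check. A minimal sketch, assuming a standalone `Compare` struct and a `null_comparison_hint` helper that are hypothetical (as are the hint strings and the `Value` payload types); it only detects the case and never blocks the command:

```rust
enum Value { Number(i64), Text(String), Bool(bool), Null } // payload types assumed
enum Operand { Column(String), Literal(Value) }
enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

// Hypothetical flattened form of a comparison predicate.
struct Compare { left: Operand, op: CompareOp, right: Operand }

// Returns an advisory hint for `= NULL` / `!= NULL`; None means "nothing to flag".
// The caller still lets the command run either way.
fn null_comparison_hint(c: &Compare) -> Option<&'static str> {
    let is_null = |o: &Operand| matches!(o, Operand::Literal(Value::Null));
    if !(is_null(&c.left) || is_null(&c.right)) {
        return None;
    }
    match c.op {
        CompareOp::Eq => Some("`= NULL` is never true; use IS NULL"),
        CompareOp::NotEq => Some("`!= NULL` is never true; use IS NOT NULL"),
        _ => None,
    }
}
```

Only the equality operators are flagged here, matching the wording above; whether ordering comparisons against `NULL` deserve the same treatment is left open.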
8. Completion, hints, highlighting
Because the expression is parsed in-grammar — not handed to an opaque sub-parser — the ambient-assistance machinery (ADR-0022) works inside an expression with no separate implementation:
- column-name completion resolves against the command's table;
- value positions carry per-type hints, as they do for the current `where col = val`;
- operator keywords (`and`, `or`, `like`, `between`, …) surface as completion candidates where the grammar allows them;
- syntax highlighting walks the same tree.
Type-mismatch and = NULL flagging (§7) surface through
these same highlight and hint channels.
9. Errors
Parse errors continue to route through the existing
ParseError shape, so ADR-0021's per-command usage help
and the hint panel keep working for the new clauses. The
depth-cap breach (§1) is a friendly error of the same
kind.
10. Out of scope
- `ORDER BY`. `limit` uses implicit primary-key ordering for determinism; explicit `order by` is a clean future addition, tracked separately.
- `LIMIT … OFFSET`, and `limit` anywhere other than `show data`.
- Operands beyond a column or a literal: arithmetic (`a + b`), string concatenation, scalar functions, subqueries, `EXISTS`. The playground's `WHERE` compares columns and literals.
- A bare column as a boolean (`where Active`).
- The input-field validity indicator. An always-visible `(INVALID)` / `WARNING` marker at the edge of the input field (signalling, before submit, that the current input would error or is flagged) is a general feature spanning every command, not just `WHERE`. It gets its own small ADR; this ADR defines only what inside a `WHERE` is flagged. The indicator is to carry two severities: a hard `ERROR` / `INVALID` (the input cannot run) and a softer `WARNING` (it runs but is probably not intended: type mismatches, `= NULL`).
- `CHECK` constraints. The constraints ADR (C3) will reuse this expression grammar; it is not built here.
Consequences
- `C5a` is satisfied; `show data` gains filtering and `limit` (advancing `C5` / `V5`); `QA1` is unblocked; the future `CHECK` constraint has an expression grammar to reuse.
- The grammar gains one node variant (`Subgrammar`) and a recursion-depth counter.
- `MatchedPath` is unchanged; the expression fragment carries an AST-builder that produces an `Expr`, carried as a single matched item. Existing command builders are untouched.
- A new recursive `Expr` AST joins the command AST; `RowFilter` changes from `Where { column, value }` to `Where(Expr)`.
- Type-mismatched `WHERE` comparisons change from refused to flagged but runnable: a deliberate, scoped behaviour change (§7).
- Old `history.log` lines using the previous `where col = val` form remain valid (that form is a strict subset of the new grammar), so `replay` is unaffected. (Not a design driver; noted.)
- Forward-look toward `Q1`: advanced-mode SQL will be parsed by `sqlparser-rs`, a separate parser, so this `Expr` AST is not literally shared with it. The value of the DSL expression being a SQL subset is pedagogical (learner knowledge transfers), not code reuse.
Implementation notes
A sensible build order, each step guarded by the test suite and the typing-surface matrix:
1. The `Subgrammar` node and the recursion-depth counter: the walker capability for a recursive fragment. `MatchedPath` is unchanged. No user-visible change.
2. The expression grammar fragment, the `Expr` AST, and the fragment-builder the walker invokes to produce the `Expr`.
3. Wire the fragment into `update` / `delete` (replacing the old `where`) and into `show data` (new `where`, new `limit`).
4. `Expr` → parameterised SQL generation; the implicit primary-key `ORDER BY` for `limit`.
5. Schema-aware type-mismatch and `= NULL` flagging in the walker.
6. Typing-surface matrix cells for the new surface.
As-built notes (2026-05-18)
Steps 1–4 are implemented and committed; step 5 (the §7 diagnostic flagging) is deferred — see below. Realization choices, and where they deviate from the design sketch above:
- §3 builder: option 1 ("reconstruct in builder"), chosen with the project owner before implementation. The stratified grammar is walked normally; its terminals flow into the flat `MatchedPath` unchanged (driving highlight / completion / the expected-set). `grammar::expr::build_expr` then folds that flat terminal slice into the `Expr`: a deterministic recursive descent mirroring the grammar tiers, run only at submit-time dispatch, never per keystroke. Two honest deviations from the §3 wording:
  - No `MatchedKind::Expr` variant. `MatchedPath` stays purely terminals, arguably more faithful to "MatchedPath stays flat" than carrying a built `Expr` in it. The `Expr` is assembled in the command `ast_builder`s (`build_update` / `build_delete` / `build_show`), which already reconstruct structured `Command`s from the flat path; `build_expr` is the same pattern, one tier deeper.
  - There is a second structural pass over the expression tokens, scoped to submit-time dispatch. "No second parse" is read as "no separate parser framework": the walk validates and drives assistance, and `build_expr` is the single `ast_builder` for the fragment, the same category as `build_insert`.
- Grammar shape. `predicate` is factored as `operand predicate_tail` (shared operand prefix), and the infix `NOT` is factored in front of the `LIKE` / `BETWEEN` / `IN` choice, so the walker's first-commit-wins `Choice` semantics discriminate branches on a cleanly-failing first token.
- `Subgrammar` depth. `MAX_SUBGRAMMAR_DEPTH = 64` counts active `Subgrammar` recursion frames. The stratified grammar descends ~4–5 frames per parenthesis level, so the effective parenthesis-nesting limit is roughly a dozen, far past any hand-written filter; the cap is purely a stack-overflow guard.
- §8 hints. The expression's right-hand operands resolve through a schema-aware `DynamicSubgrammar` (`where_rhs_operand`) so the hint panel narrows to the compared column's type, exactly as the pre-ADR `where col = val` slot did. The operand grammar carries no validators: permissive per §7.
- Step 5 deferred to ADR-0027. The §7 behaviour relaxation is done: `bind_where_literal` binds a type-mismatched WHERE literal by its syntactic shape, and the pre-ADR bind-time rejection is gone. The §7 diagnostic flagging (surfacing a type-mismatched comparison or `= NULL` as an error-class highlight + hint) is the seam with ADR-0027, which designs the walker diagnostics-severity model that flagging belongs in (and whose WARNING severity is defined to have "no triggers until ADR-0026 is implemented"). Building the flagging as a standalone mechanism first would be reworked by ADR-0027; the recommendation is to implement it as the first triggers of ADR-0027's model.
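The "Grammar shape" note leans on first-commit-wins `Choice` semantics, and the commit description records what goes wrong when a `predicate_tail` branch starts with an `Optional`. The toy model below tries to make that concrete. Everything in it is an assumption made for illustration: the owned `Node` enum, the three-way `Outcome`, the rule that a `Seq` counts as committed once its first child has matched (even an empty `Optional`), and the simplification that a skipped `Optional` contributes no expectation. The real walker may differ in detail.

```rust
enum Node {
    Word(&'static str),
    Optional(Box<Node>),
    Seq(Vec<Node>),
    Choice(Vec<Node>),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Matched(usize),                // position after the match
    NoMatch(Vec<&'static str>),    // failed before committing; what it wanted
    Incomplete(Vec<&'static str>), // committed, then could not continue
}

fn walk(node: &Node, toks: &[&str], pos: usize) -> Outcome {
    match node {
        Node::Word(w) => {
            if toks.get(pos) == Some(w) { Outcome::Matched(pos + 1) } else { Outcome::NoMatch(vec![*w]) }
        }
        // An absent optional still "matches" (empty), which is what commits a Seq.
        Node::Optional(inner) => match walk(inner, toks, pos) {
            Outcome::Matched(next) => Outcome::Matched(next),
            _ => Outcome::Matched(pos),
        },
        Node::Seq(children) => {
            let mut p = pos;
            for (i, child) in children.iter().enumerate() {
                match walk(child, toks, p) {
                    Outcome::Matched(next) => p = next,
                    // Failing on the first child is a clean, uncommitted failure.
                    Outcome::NoMatch(e) if i == 0 => return Outcome::NoMatch(e),
                    Outcome::NoMatch(e) | Outcome::Incomplete(e) => return Outcome::Incomplete(e),
                }
            }
            Outcome::Matched(p)
        }
        Node::Choice(branches) => {
            let mut expected = Vec::new();
            for branch in branches {
                match walk(branch, toks, pos) {
                    Outcome::NoMatch(e) => expected.extend(e),
                    // First commit wins: siblings are never consulted.
                    committed => return committed,
                }
            }
            Outcome::NoMatch(expected)
        }
    }
}

fn negatable() -> Node {
    Node::Choice(vec![Node::Word("like"), Node::Word("between"), Node::Word("in")])
}

// Toy tail with an Optional-first branch.
fn tail_before_fix() -> Node {
    Node::Choice(vec![
        Node::Seq(vec![Node::Optional(Box::new(Node::Word("not"))), negatable()]),
        Node::Word("is"),
        Node::Word("="),
    ])
}

// Toy tail with explicit `NOT negatable` and bare `negatable` branches.
fn tail_after_fix() -> Node {
    Node::Choice(vec![
        Node::Seq(vec![Node::Word("not"), negatable()]),
        negatable(),
        Node::Word("is"),
        Node::Word("="),
    ])
}

// Completion candidates when the input ends right after the operand.
fn expected_at_end(node: &Node) -> Vec<&'static str> {
    match walk(node, &[], 0) {
        Outcome::NoMatch(e) | Outcome::Incomplete(e) => e,
        Outcome::Matched(_) => Vec::new(),
    }
}
```

In this model `expected_at_end(&tail_before_fix())` offers only `like`, `between`, and `in`, because the Optional-first branch commits and hides `is` and `=`. The split version offers all six candidates, while both versions consume `not like` identically, which is consistent with the builder not needing to change.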
See also
- ADR-0009: DSL command syntax conventions (`--flags`, keyword clauses).
- ADR-0014: data operations, the `Value` model, `bind_for_column`, the mandatory where-or-`--all-rows` rule, auto-show.
- ADR-0021: the parser as source of truth for H1a parse-error help.
- ADR-0022: ambient typing assistance, i.e. the highlight, hint, and completion machinery the expression plugs into.
- ADR-0023 / ADR-0024: the unified grammar tree this extends; ADR-0023 sketched the `SubgrammarRef` node realised here as `Subgrammar`.
- ADR-0025: indexes; the reason `QA1`, and thus a filtered query, is now worthwhile.