# ADR-0026: Complex WHERE expressions

## Status

Accepted

## Context

The requirements checklist commits, in `C5a`, to *complex WHERE expressions* — `AND` / `OR`, comparison operators, `LIKE` — for the `update`, `delete`, and `show data` row filters. It is described there as the bridge from DSL fluency toward real SQL.

Today the DSL is well short of that:

- The only filter the DSL parses is a single `where = ` equality. There is no `AND` / `OR`, no operator other than `=`, no `LIKE`, no `IS NULL`, no parentheses.
- `update` and `delete` carry that filter as `RowFilter::Where { column, value }`. `show data ` carries **no filter at all** — it always selects every row.
- The `Value` AST is purely syntactic (`Number`, `Text`, `Bool`, `Null`); per-column type handling happens at bind time in `db.rs`.

Three things make this the right moment, and shape the decision:

1. **`QA1` (`EXPLAIN QUERY PLAN`) needs a filtered query.** An unfiltered `SELECT * FROM t` always plans as a full scan; an index can never appear. QA1's pedagogical payoff depends on a `WHERE` whose plan flips between a scan and an index search.
2. **`CHECK` constraints (`C3`) need an expression grammar.** A `CHECK (Age >= 0 AND Age < 150)` is the same expression problem. A throwaway mini-grammar for `CHECK` plus a second one for `WHERE` would be waste; this ADR builds the grammar once.
3. **The grammar architecture must grow to host it.** The unified walker grammar (ADR-0023 / ADR-0024) is a non-recursive trie of `&'static` `Node`s, and its parse output (`MatchedPath`) is a flat list of matched terminals. A `WHERE` expression is recursive and its shape is data-dependent — neither fits as the grammar stands.

### The architectural problem, precisely

A boolean expression — `a = 1 AND (b > 2 OR c LIKE 'x%')` — is recursive (a parenthesised group is itself an expression) and carries operator precedence.
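Precedence is part of the problem, not a detail: `a OR b AND c` has two possible groupings, and they give different answers on some rows. SQL reads it as `a OR (b AND c)`. The sketch below is illustrative only (`groupings_disagree` is a hypothetical helper, with plain Rust booleans standing in for three predicates); it enumerates all eight truth assignments and returns those on which the SQL reading and a naive left-to-right reading disagree, namely the two with `a` true and `c` false.

```rust
/// Enumerates every truth assignment of three predicates and returns the
/// ones on which the two groupings of `a OR b AND c` give different results.
fn groupings_disagree() -> Vec<(bool, bool, bool)> {
    let mut disagreements = Vec::new();
    for bits in 0u8..8 {
        let (a, b, c) = ((bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0);
        // SQL reading: AND binds tighter than OR.
        let sql_reading = a || (b && c);
        // Naive left-to-right reading that ignores precedence.
        let left_to_right = (a || b) && c;
        if sql_reading != left_to_right {
            disagreements.push((a, b, c));
        }
    }
    disagreements
}
```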
Two facts about the current walker:

- **The `Node` tree is acyclic.** Every combinator references its children through `&'static [Node]` / `&'static Node`, and the registry lives in `const`s. A `const` cannot refer to itself, so no node can close a cycle. `Optional` and `Repeated` already hold a `&'static Node` *reference* — recursion through a reference is expressible if the fragment is a named `static` — but `Seq` and `Choice` embed their children *by value* in a slice, and a cyclic value has no finite representation.
- **`MatchedPath` is flat.** It is a `Vec` of matched *terminals* in source order; the combinators shape the order but record no grouping. For every command today that is enough: each command's shape is a fixed template, so the AST builder reads terminals by position or by role. A recursive expression has no fixed template — `where a = 1` and `where (a=1 or b=2) and c=3` have different shapes — so a flat terminal list cannot be rebuilt into the expression tree without parsing it a second time.

Left recursion is a third fact, and it is a property of parsing technique rather than of this codebase: a top-down walker cannot consume a rule whose leftmost symbol is the rule itself (`expr := expr OP expr` recurses without consuming input). The standard remedy — a stratified, left-factored grammar — is adopted below.

## Decision

### 1. The expression grammar

The `WHERE` expression is a stratified grammar — one layer per precedence tier. Stratification removes left recursion (every recursion is guarded by a token) and encodes operator precedence in the layering, so there is no separate precedence-resolution step.
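A minimal sketch of how the layering yields precedence, assuming whitespace-separated tokens and predicates reduced to opaque single-token atoms (`p1`, `p2`, …). `BoolExpr`, `Parser`, `collapse`, and `parse` are hypothetical names for illustration, not the project's walker or its `Expr` type. Each tier is one function that calls the tier below, so `AND` groups before `OR` with no precedence table; a tier that meets no connective returns its single child unwrapped, and a chain such as `p1 and p2 and p3` becomes one n-ary node. The only recursive calls, `( or_expr )` and `NOT not_expr`, each consume a token first, so the parser always makes progress.

```rust
/// Simplified boolean tree (hypothetical): each predicate is an opaque atom.
#[derive(Debug, PartialEq)]
enum BoolExpr {
    Or(Vec<BoolExpr>),
    And(Vec<BoolExpr>),
    Not(Box<BoolExpr>),
    Atom(String),
}

const KEYWORDS: [&str; 5] = ["and", "or", "not", "(", ")"];

struct Parser<'a> {
    toks: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.toks.get(self.pos).copied()
    }

    /// Consumes the next token if it equals `kw`, ignoring ASCII case.
    fn eat(&mut self, kw: &str) -> bool {
        let hit = self.peek().map_or(false, |t| t.eq_ignore_ascii_case(kw));
        if hit {
            self.pos += 1;
        }
        hit
    }

    /// or_expr := and_expr ( OR and_expr )*
    fn or_expr(&mut self) -> Result<BoolExpr, String> {
        let mut items = vec![self.and_expr()?];
        while self.eat("or") {
            items.push(self.and_expr()?);
        }
        Ok(collapse(items, BoolExpr::Or))
    }

    /// and_expr := not_expr ( AND not_expr )*
    fn and_expr(&mut self) -> Result<BoolExpr, String> {
        let mut items = vec![self.not_expr()?];
        while self.eat("and") {
            items.push(self.not_expr()?);
        }
        Ok(collapse(items, BoolExpr::And))
    }

    /// not_expr := NOT not_expr | bool_primary
    fn not_expr(&mut self) -> Result<BoolExpr, String> {
        if self.eat("not") {
            return Ok(BoolExpr::Not(Box::new(self.not_expr()?)));
        }
        self.bool_primary()
    }

    /// bool_primary := ( or_expr ) | atom
    fn bool_primary(&mut self) -> Result<BoolExpr, String> {
        if self.eat("(") {
            let inner = self.or_expr()?;
            return if self.eat(")") {
                Ok(inner)
            } else {
                Err(format!("expected ')' at token {}", self.pos))
            };
        }
        match self.peek() {
            Some(t) if !KEYWORDS.iter().any(|k| t.eq_ignore_ascii_case(k)) => {
                self.pos += 1;
                Ok(BoolExpr::Atom(t.to_string()))
            }
            _ => Err(format!("expected a predicate at token {}", self.pos)),
        }
    }
}

/// One child: return it unwrapped. Several: wrap them in one n-ary node.
fn collapse(mut items: Vec<BoolExpr>, wrap: fn(Vec<BoolExpr>) -> BoolExpr) -> BoolExpr {
    if items.len() == 1 { items.pop().unwrap() } else { wrap(items) }
}

fn parse(src: &str) -> Result<BoolExpr, String> {
    let mut p = Parser { toks: src.split_whitespace().collect(), pos: 0 };
    let expr = p.or_expr()?;
    match p.peek() {
        None => Ok(expr),
        Some(t) => Err(format!("unexpected token {t:?}")),
    }
}
```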
```
or_expr      := and_expr ( OR and_expr )*
and_expr     := not_expr ( AND not_expr )*
not_expr     := NOT not_expr | bool_primary
bool_primary := ( or_expr ) | predicate
predicate    := operand cmp_op operand
              | operand [ NOT ] LIKE operand
              | operand [ NOT ] BETWEEN operand AND operand
              | operand [ NOT ] IN ( operand [ , operand ]* )
              | operand IS [ NOT ] NULL
operand      := literal | column_ref
cmp_op       := = | != | <> | < | <= | > | >=
```

- **Operator set:** the six comparisons, with both `!=` and `<>` accepted (`<>` is standard SQL, `!=` the common variant — the engine accepts both); `AND` / `OR` / `NOT`; parentheses; `LIKE` with `%` / `_` wildcards; `IS NULL` / `IS NOT NULL`; `IN`; `BETWEEN`. `LIKE`, `IN`, and `BETWEEN` take an optional infix `NOT`, mirroring `IS NOT NULL`.
- **Operands are a column reference or a literal** — not a nested expression. Parentheses group *boolean* sub-expressions (`bool_primary`), not comparison operands. A bare column reference is not a boolean expression: a predicate always has an operator (write `Active = true`, not `Active`).
- The only recursion is `( or_expr )` and `NOT not_expr`; each consumes a token (`(` or `NOT`) before recursing, so the greedy top-down walker always makes progress.
- **Nesting depth is capped at 64.** Hand-written `WHERE` clauses do not approach this; the cap exists only so pathological input (`((((…))))`) yields a friendly *"expression nested too deeply (limit 64)"* error rather than a stack overflow.

The grammar is deliberately a subset of standard SQL's `WHERE` syntax, so a learner's knowledge transfers directly when advanced-mode SQL (`Q1`) lands.

### 2. Grammar architecture: a reference-following node

`Seq` / `Choice` embed children by value and cannot hold a cyclic node. One new `Node` variant closes the gap:

```rust
pub enum Node {
    // … existing variants (`Seq`, `Choice`, `Optional`, `Repeated`,
    // `DynamicSubgrammar`, …) unchanged …

    /// Walks the referenced node once, mandatory. Because the
    /// reference is a `&'static Node`, a named `static`
    /// fragment may appear inside its own subtree — the
    /// mechanism that lets the expression grammar recurse.
    /// (ADR-0023 sketched this as `SubgrammarRef`.)
    Subgrammar(&'static Node),
}
```

`Subgrammar` is the static counterpart of the existing `DynamicSubgrammar` (a walk-time factory). The expression grammar's tiers are declared as named `static` items; `bool_primary`'s `( or_expr )` branch reaches `or_expr` through `Subgrammar(&OR_EXPR)`, and `not_expr` reaches itself the same way. The walker gains one match arm — walk the referenced node once — plus the depth counter for the §1 cap.

The expression grammar is one fragment, referenced by `update`, `delete`, and `show data` alike — defined once.

### 3. The expression result — built selectively

`MatchedPath` — the walker's flat list of matched terminals — is left unchanged. The recursive structure lives only where it is needed: inside the expression.

The expression grammar fragment carries its own *AST-fragment builder*. As the walker recurses through the stratified tiers, that builder runs — the walker's recursion *is* the precedence-correct tree, so the builder assembles a nested `Expr` directly, with no second parse and no separate precedence pass. The finished `Expr` is carried as a single item in the otherwise-unchanged flat `MatchedPath` (`MatchedKind` gains one variant to hold a built expression). Every existing command builder is therefore genuinely untouched — the flat path it reads is exactly as before. `update` / `delete` / `show data` take that one expression item and read its `Expr`.

This is the "selectively if necessary" option: the parser gains structured output exactly where the grammar is recursive, and nowhere else.
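A minimal sketch of the `Subgrammar` mechanism and the depth cap, assuming a cut-down, hypothetical `Node` with only `Token`, `Atom` (a stand-in for a predicate), `Seq`, `Choice`, and `Subgrammar`; `walk`, `check`, `WalkError`, `MAX_DEPTH`, and the grammar statics are illustrative names, not the project's code. The collapsed grammar `expr := NOT expr | ( expr ) | atom` is assembled from named `static` items that point at one another: Rust accepts the cycle because every edge is a `&'static` reference to a named static, never a value embedded in another value. Only the `Subgrammar` arm increments the depth counter, so over-deep input fails with a distinct `TooDeep` error instead of overflowing the stack. In this collapsed grammar each parenthesis level costs one frame; a stratified grammar spends several frames per level.

```rust
/// Cut-down, hypothetical walker node.
enum Node {
    /// One keyword or punctuation token, matched ignoring ASCII case.
    Token(&'static str),
    /// Any single non-reserved token (stand-in for a predicate).
    Atom,
    /// All children, in order.
    Seq(&'static [Node]),
    /// Ordered choice: the first child that matches wins.
    Choice(&'static [Node]),
    /// Walk the referenced node once; the target may contain this node.
    Subgrammar(&'static Node),
}

#[derive(Debug, PartialEq)]
enum WalkError {
    NoMatch,
    TooDeep,
}

const MAX_DEPTH: usize = 64;
const RESERVED: [&str; 3] = ["not", "(", ")"];

// expr := NOT expr | ( expr ) | atom, built from mutually referencing statics.
static EXPR: Node = Node::Choice(&EXPR_ALTS);
static EXPR_ALTS: [Node; 3] = [Node::Seq(&NOT_SEQ), Node::Seq(&PAREN_SEQ), Node::Atom];
static NOT_SEQ: [Node; 2] = [Node::Token("not"), Node::Subgrammar(&EXPR)];
static PAREN_SEQ: [Node; 3] = [Node::Token("("), Node::Subgrammar(&EXPR), Node::Token(")")];

/// Matches `node` at `toks[pos..]` and returns the position just past it.
/// `depth` counts the `Subgrammar` frames currently active.
fn walk(node: &Node, toks: &[&str], pos: usize, depth: usize) -> Result<usize, WalkError> {
    match node {
        Node::Token(kw) => match toks.get(pos) {
            Some(t) if t.eq_ignore_ascii_case(kw) => Ok(pos + 1),
            _ => Err(WalkError::NoMatch),
        },
        Node::Atom => match toks.get(pos) {
            Some(t) if !RESERVED.iter().any(|r| t.eq_ignore_ascii_case(r)) => Ok(pos + 1),
            _ => Err(WalkError::NoMatch),
        },
        // Each child starts where the previous one ended; any error stops the fold.
        Node::Seq(items) => items.iter().try_fold(pos, |p, item| walk(item, toks, p, depth)),
        Node::Choice(alts) => {
            for alt in alts.iter() {
                match walk(alt, toks, pos, depth) {
                    Err(WalkError::NoMatch) => continue,
                    other => return other, // a match, or TooDeep
                }
            }
            Err(WalkError::NoMatch)
        }
        Node::Subgrammar(target) => {
            if depth >= MAX_DEPTH {
                return Err(WalkError::TooDeep);
            }
            walk(target, toks, pos, depth + 1)
        }
    }
}

/// Walks the whole input; trailing tokens count as a mismatch.
fn check(src: &str) -> Result<(), WalkError> {
    let toks: Vec<&str> = src.split_whitespace().collect();
    let end = walk(&EXPR, &toks, 0, 0)?;
    if end == toks.len() { Ok(()) } else { Err(WalkError::NoMatch) }
}
```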
A *system-wide* hierarchical `MatchedPath` was considered and rejected — it would record group structure for every command while only the expression consumed it, leaving the non-expression grouping computed but unread, and so untested. The general "a grammar fragment may carry a builder" mechanism introduced here is exercised by its one user; nothing is recorded that nothing reads.

### 4. The `Expr` AST

A new recursive expression AST joins the command AST:

```rust
// `Value` is the existing syntactic literal type (ADR-0014).
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    Like { target: Operand, pattern: Operand, negated: bool },
    Between { target: Operand, low: Operand, high: Operand, negated: bool },
    In { target: Operand, items: Vec<Operand>, negated: bool },
    IsNull { target: Operand, negated: bool },
}

pub enum Operand {
    Column(String),
    Literal(Value),
}

pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }
```

`Or` / `And` are n-ary — a flat `a AND b AND c` is one `And` of three. Single-child tiers collapse: a `predicate` reached through the `or → and → not` layers with no connective is just that `Predicate`, not three wrappers.

`RowFilter` changes from

```rust
RowFilter::Where { column: String, value: Value }
```

to

```rust
RowFilter::Where(Expr)
```

for `update` / `delete`. `show data` carries `filter: Option` and `limit: Option`.

### 5. The commands

```
update set ( where | --all-rows )
delete from ( where | --all-rows )
show data [ where ] [ limit ]
```

- `update` / `delete` keep ADR-0014's mandatory where-or-`--all-rows` choice; a complex expression satisfies the `where` side.
- **`show data` gains an optional `where`.** Reading every row stays the safe default for a read, so no `--all-rows` opt-in is needed there — the clause is simply optional.
- **`show data` gains an optional `limit `** (`` a non-negative integer).
  When `limit` is present the query is implicitly ordered by the table's primary key, so `limit 20` is a stable "first 20 by primary key" rather than an arbitrary subset — every table created through the DSL has a primary key. Explicit `order by` is out of scope (§10).

### 6. SQL generation

The `Expr` is compiled to a parameterised SQL `WHERE` string:

- Every literal becomes a `?` placeholder bound as a parameter — never spliced into the SQL text. Identifiers are `quote_ident`-quoted.
- A literal compared against a column is converted to that column's storage representation through the existing `bind_for_column` path, exactly as the current `where col = val` does, whenever the literal's type is compatible with the column (§7 covers the mismatched case).
- Connectives, `NOT`, and parentheses are emitted from the tree structure.
- `limit` emits `LIMIT ?` with the bound count, plus the implicit `ORDER BY` over the primary-key column(s) (§5).

The application never evaluates the expression itself — the database does, and re-derives precedence from the operators. The expression is "passed through" only in that sense; the raw user text is never forwarded.

### 7. Type handling — permissive and advisory

A type mismatch in a comparison is **flagged, not blocked**. This matches the app's ambient-assistance posture (ADR-0022): the tool indicates problems, it does not refuse input.

- A literal in `column OP literal` is type-checked against the column. When the types are compatible the literal is converted and bound per the column's type (§6).
- When they are **not** compatible — `Name > 5` on a text column, `Age LIKE '5%'` on an int column — the mismatch is surfaced through the existing highlight and hint channels as an error-class annotation, but the command still parses, still submits, and still runs. The literal binds by its own syntactic type and the database's comparison rules take over — which is precisely the behaviour a learner is experimenting to observe.
- **This relaxes current behaviour.** Today `bind_for_column` *rejects* a type-mismatched `WHERE` literal; under this ADR it does not. The relaxation is scoped to `WHERE` comparisons. Writes (`insert`, `update … set`) stay strict: STRICT storage genuinely cannot hold a mistyped value, so a mistyped write is a real error, not an experiment.
- **`= NULL` / `!= NULL`** is a specific flagged case. It is valid syntax that almost never does what the user intends (in SQL, a comparison with `NULL` evaluates to `NULL`, never true, so the filter matches no rows). The walker special-cases a comparison whose operator is `=` / `!=` and whose operand is the `NULL` literal: it is highlighted as an error, and the hint points at `IS NULL` / `IS NOT NULL`. As with type mismatches it still runs if submitted — a learner who wants to see what `x = NULL` does may.

Always-on, pre-submit signalling of flagged-but-runnable input (an `(INVALID)` / `WARNING` marker at the input field's edge) is a separate, general concern — see §10.

### 8. Completion, hints, highlighting

Because the expression is parsed *in-grammar* — not handed to an opaque sub-parser — the ambient-assistance machinery (ADR-0022) works inside an expression with no separate implementation:

- column-name completion resolves against the command's table;
- value positions carry per-type hints, as they do for the current `where col = val`;
- operator keywords (`and`, `or`, `like`, `between`, …) surface as completion candidates where the grammar allows them;
- syntax highlighting walks the same tree.

Type-mismatch and `= NULL` flagging (§7) surface through these same highlight and hint channels.

### 9. Errors

Parse errors continue to route through the existing `ParseError` shape, so ADR-0021's per-command usage help and the hint panel keep working for the new clauses. The depth-cap breach (§1) is a friendly error of the same kind.

### 10. Out of scope

- **`ORDER BY`.** `limit` uses implicit primary-key ordering for determinism; explicit `order by` is a clean future addition, tracked separately.
- **`LIMIT … OFFSET`**, and `limit` anywhere other than `show data`.
- **Operands beyond a column or a literal** — arithmetic (`a + b`), string concatenation, scalar functions, subqueries, `EXISTS`. The playground's `WHERE` compares columns and literals.
- **A bare column as a boolean** (`where Active`).
- **The input-field validity indicator.** An always-visible `(INVALID)` / `WARNING` marker at the edge of the input field — signalling, before submit, that the current input would error or is flagged — is a general feature spanning every command, not just `WHERE`. It gets its own small ADR; this ADR defines only *what* inside a `WHERE` is flagged. The indicator is to carry two severities: a hard `ERROR` / `INVALID` (the input cannot run) and a softer `WARNING` (it runs but is probably not intended — type mismatches, `= NULL`).
- **`CHECK` constraints.** The constraints ADR (`C3`) will reuse this expression grammar; it is not built here.

## Consequences

- `C5a` is satisfied; `show data` gains filtering and `limit` (advancing `C5` / `V5`); `QA1` is unblocked; the future `CHECK` constraint has an expression grammar to reuse.
- The grammar gains one node variant (`Subgrammar`) and a recursion-depth counter.
- `MatchedPath` is unchanged; the expression fragment carries an AST-builder that produces an `Expr`, carried as a single matched item. Existing command builders are untouched.
- A new recursive `Expr` AST joins the command AST; `RowFilter` changes from `Where { column, value }` to `Where(Expr)`.
- Type-mismatched `WHERE` comparisons change from *refused* to *flagged but runnable* — a deliberate, scoped behaviour change (§7).
- Old `history.log` lines using the previous `where col = val` form remain valid — that form is a strict subset of the new grammar — so `replay` is unaffected. (Not a design driver; noted.)
- Forward-look toward `Q1`: advanced-mode SQL will be parsed by `sqlparser-rs`, a separate parser, so this `Expr` AST is not literally shared with it.
  The value of the DSL expression being a SQL subset is pedagogical — learner knowledge transfers — not code reuse.

## Implementation notes

A sensible build order, each step guarded by the test suite and the typing-surface matrix:

1. The `Subgrammar` node and the recursion-depth counter — the walker capability for a recursive fragment. `MatchedPath` is unchanged. No user-visible change.
2. The expression grammar fragment, the `Expr` AST, and the fragment-builder the walker invokes to produce the `Expr`.
3. Wire the fragment into `update` / `delete` (replacing the old `where`) and into `show data` (new `where`, new `limit`).
4. `Expr` → parameterised SQL generation; the implicit primary-key `ORDER BY` for `limit`.
5. Schema-aware type-mismatch and `= NULL` flagging in the walker.
6. Typing-surface matrix cells for the new surface.

### As-built notes (2026-05-18)

Steps 1–4 are implemented and committed; step 5 (the §7 diagnostic flagging) was deferred here and later completed as part of ADR-0027 (see below). Realization choices, and where they deviate from the design sketch above:

- **§3 builder — option 1 ("reconstruct in builder"),** chosen with the project owner before implementation. The stratified grammar is walked normally; its terminals flow into the flat `MatchedPath` unchanged (driving highlight / completion / the expected-set). `grammar::expr::build_expr` then folds that flat terminal slice into the `Expr` — a deterministic recursive descent mirroring the grammar tiers, run only at submit-time dispatch, never per keystroke. Two honest deviations from the §3 wording:
  - **No `MatchedKind::Expr` variant.** `MatchedPath` stays purely terminals — arguably more faithful to "MatchedPath stays flat" than carrying a built `Expr` in it. The `Expr` is assembled in the command `ast_builder`s (`build_update` / `build_delete` / `build_show`), which already reconstruct structured `Command`s from the flat path; `build_expr` is the same pattern, one tier deeper.
  - **There is a second structural pass** over the expression tokens, scoped to submit-time dispatch. "No second parse" is read as "no separate parser framework": the walk validates and drives assistance, `build_expr` is the single `ast_builder` for the fragment — the same category as `build_insert`.
- **Grammar shape.** `predicate` is factored as `operand predicate_tail` (shared operand prefix), and the infix `NOT` is factored in front of the `LIKE` / `BETWEEN` / `IN` choice — so the walker's first-commit-wins `Choice` semantics discriminate branches on a cleanly-failing first token.
- **`Subgrammar` depth.** `MAX_SUBGRAMMAR_DEPTH = 64` counts active `Subgrammar` recursion frames. The stratified grammar descends ~4–5 frames per parenthesis level, so the effective parenthesis-nesting limit is roughly a dozen — far past any hand-written filter; the cap is purely a stack-overflow guard.
- **§8 hints.** The expression's right-hand operands resolve through a schema-aware `DynamicSubgrammar` (`where_rhs_operand`) so the hint panel narrows to the compared column's type, exactly as the pre-ADR `where col = val` slot did. The operand grammar carries no validators — permissive per §7.
- **Step 5 — done, as part of ADR-0027.** The §7 *behaviour* relaxation landed with steps 1–4: `bind_where_literal` binds a type-mismatched WHERE literal by its syntactic shape, and the pre-ADR bind-time rejection is gone. The §7 *diagnostic flagging* — a type-mismatched comparison or `= NULL` as a flagged finding — was folded into ADR-0027's walker diagnostics-severity model rather than built as a standalone mechanism (ADR-0027's WARNING severity is defined to have "no triggers until ADR-0026 is implemented" — those triggers are exactly these). It surfaces as a `Severity::Warning` `Diagnostic`, computed post-walk from the built `Expr` against the table's column types — see ADR-0027.

## See also

- ADR-0009 — DSL command syntax conventions (`--` flags, keyword clauses).
- ADR-0014 — data operations, the `Value` model, `bind_for_column`, the mandatory where-or-`--all-rows` rule, auto-show.
- ADR-0021 — the parser as source of truth for H1a parse-error help.
- ADR-0022 — ambient typing assistance: the highlight, hint, and completion machinery the expression plugs into.
- ADR-0023 / ADR-0024 — the unified grammar tree this extends; ADR-0023 sketched the `SubgrammarRef` node realised here as `Subgrammar`.
- ADR-0025 — indexes; the reason `QA1`, and thus a filtered query, is now worthwhile.