Files
rdbms-playground/docs/adr/0026-complex-where-expressions.md
T
claude@clouddev1 a3268495e2 ADR-0027: existing-cases sweep + docs (step F)
Sweep: input_verdict tests confirm the schema-existence check
fires across the identifier-taking commands — unknown table
on drop / show / add column, unknown column on drop column /
update — and that known references stay clean. The Step B
check is grammar-generic, so this is verification + coverage
rather than new code.

Docs: requirements.md S6 -> [x], baseline 1096; CLAUDE.md
deferred list reconciled (C5a and S6 are done — removed);
ADR-0026's as-built note updated (step 5 shipped via
ADR-0027); ADR-0027 gains an As-built notes section
recording the post-walk diagnostics realization, the
pre-rendered message, the timeout-based debounce, coarse
WARNING spans, and the deferred highlight/hint wiring.
2026-05-19 07:35:06 +00:00

20 KiB
Raw Permalink Blame History

ADR-0026: Complex WHERE expressions

Status

Accepted

Context

The requirements checklist commits, in C5a, to complex WHERE expressionsAND / OR, comparison operators, LIKE — for the update, delete, and show data row filters. It is described there as the bridge from DSL fluency toward real SQL. Today the DSL is well short of that:

  • The only filter the DSL parses is a single where <col> = <val> equality. There is no AND / OR, no operator other than =, no LIKE, no IS NULL, no parentheses.
  • update and delete carry that filter as RowFilter::Where { column, value }. show data <T> carries no filter at all — it always selects every row.
  • The Value AST is purely syntactic (Number, Text, Bool, Null); per-column type handling happens at bind time in db.rs.

Three things make this the right moment, and shape the decision:

  1. QA1 (EXPLAIN QUERY PLAN) needs a filtered query. An unfiltered SELECT * FROM t always plans as a full scan; an index can never appear. QA1's pedagogical payoff depends on a WHERE whose plan flips between a scan and an index search.
  2. CHECK constraints (C3) need an expression grammar. A CHECK (Age >= 0 AND Age < 150) is the same expression problem. A throwaway mini-grammar for CHECK plus a second one for WHERE would be waste; this ADR builds the grammar once.
  3. The grammar architecture must grow to host it. The unified walker grammar (ADR-0023 / ADR-0024) is a non-recursive trie of &'static Nodes, and its parse output (MatchedPath) is a flat list of matched terminals. A WHERE expression is recursive and its shape is data-dependent — neither fits as the grammar stands.

The architectural problem, precisely

A boolean expression — a = 1 AND (b > 2 OR c LIKE 'x%') — is recursive (a parenthesised group is itself an expression) and carries operator precedence. Two facts about the current walker:

  • The Node tree is acyclic. Every combinator references its children through &'static [Node] / &'static Node, and the registry lives in consts. A const cannot refer to itself, so no node can close a cycle. Optional and Repeated already hold a &'static Node reference — recursion through a reference is expressible if the fragment is a named static — but Seq and Choice embed their children by value in a slice, and a cyclic value has no finite representation.
  • MatchedPath is flat. It is a Vec<MatchedItem> of matched terminals in source order; the combinators shape the order but record no grouping. For every command today that is enough: each command's shape is a fixed template, so the AST builder reads terminals by position or by role. A recursive expression has no fixed template — where a = 1 and where (a=1 or b=2) and c=3 have different shapes — so a flat terminal list cannot be rebuilt into the expression tree without parsing it a second time.

Left recursion is a third fact, and it is a property of parsing technique rather than of this codebase: a top-down walker cannot consume a rule whose leftmost symbol is the rule itself (expr := expr OP expr recurses without consuming input). The standard remedy — a stratified, left-factored grammar — is adopted below.

Decision

1. The expression grammar

The WHERE expression is a stratified grammar — one layer per precedence tier. Stratification removes left recursion (every recursion is guarded by a token) and encodes operator precedence in the layering, so there is no separate precedence-resolution step.

or_expr      := and_expr ( OR and_expr )*
and_expr     := not_expr ( AND not_expr )*
not_expr     := NOT not_expr | bool_primary
bool_primary := ( or_expr ) | predicate
predicate    := operand cmp_op operand
              | operand [ NOT ] LIKE operand
              | operand [ NOT ] BETWEEN operand AND operand
              | operand [ NOT ] IN ( operand [ , operand ]* )
              | operand IS [ NOT ] NULL
operand      := literal | column_ref
cmp_op       := =  |  !=  |  <>  |  <  |  <=  |  >  |  >=
  • Operator set: the six comparisons, with both != and <> accepted (<> is standard SQL, != the common variant — the engine accepts both); AND / OR / NOT; parentheses; LIKE with % / _ wildcards; IS NULL / IS NOT NULL; IN; BETWEEN. LIKE, IN, and BETWEEN take an optional infix NOT, mirroring IS NOT NULL.
  • Operands are a column reference or a literal — not a nested expression. Parentheses group boolean sub-expressions (bool_primary), not comparison operands. A bare column reference is not a boolean expression: a predicate always has an operator (write Active = true, not Active).
  • The only recursion is ( or_expr ) and NOT not_expr; each consumes a token (( or NOT) before recursing, so the greedy top-down walker always makes progress.
  • Nesting depth is capped at 64. Hand-written WHERE clauses do not approach this; the cap exists only so pathological input (((((…))))) yields a friendly "expression nested too deeply (limit 64)" error rather than a stack overflow.

The grammar is deliberately a subset of standard SQL's WHERE syntax, so a learner's knowledge transfers directly when advanced-mode SQL (Q1) lands.

2. Grammar architecture: a reference-following node

Seq / Choice embed children by value and cannot hold a cyclic node. One new Node variant closes the gap:

/// Walks the referenced node once, mandatory. Because the
/// reference is a `&'static Node`, a named `static`
/// fragment may appear inside its own subtree — the
/// mechanism that lets the expression grammar recurse.
/// (ADR-0023 sketched this as `SubgrammarRef`.)
Subgrammar(&'static Node),

Subgrammar is the static counterpart of the existing DynamicSubgrammar (a walk-time factory). The expression grammar's tiers are declared as named static items; bool_primary's ( or_expr ) branch reaches or_expr through Subgrammar(&OR_EXPR), and not_expr reaches itself the same way. The walker gains one match arm — walk the referenced node once — plus the depth counter for the §1 cap.

The expression grammar is one fragment, referenced by update, delete, and show data alike — defined once.

3. The expression result — built selectively

MatchedPath — the walker's flat list of matched terminals — is left unchanged. The recursive structure lives only where it is needed: inside the expression.

The expression grammar fragment carries its own AST-fragment builder. As the walker recurses through the stratified tiers, that builder runs — the walker's recursion is the precedence-correct tree, so the builder assembles a nested Expr directly, with no second parse and no separate precedence pass. The finished Expr is carried as a single item in the otherwise-unchanged flat MatchedPath (MatchedKind gains one variant to hold a built expression).

Every existing command builder is therefore genuinely untouched — the flat path it reads is exactly as before. update / delete / show data take that one expression item and read its Expr.

This is the "selectively if necessary" option: the parser gains structured output exactly where the grammar is recursive, and nowhere else. A system-wide hierarchical MatchedPath was considered and rejected — it would record group structure for every command while only the expression consumed it, leaving the non-expression grouping computed but unread, and so untested. The general "a grammar fragment may carry a builder" mechanism introduced here is exercised by its one user; nothing is recorded that nothing reads.

4. The Expr AST

A new recursive expression AST joins the command AST:

pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    Like    { target: Operand, pattern: Operand, negated: bool },
    Between { target: Operand, low: Operand, high: Operand, negated: bool },
    In      { target: Operand, items: Vec<Operand>, negated: bool },
    IsNull  { target: Operand, negated: bool },
}

pub enum Operand   { Column(String), Literal(Value) }
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

Or / And are n-ary — a flat a AND b AND c is one And of three. Single-child tiers collapse: a predicate reached through the or → and → not layers with no connective is just that Predicate, not three wrappers.

RowFilter changes from

RowFilter::Where { column: String, value: Value }

to

RowFilter::Where(Expr)

for update / delete. show data carries filter: Option<Expr> and limit: Option<u64>.

5. The commands

update <T> set <assignments> ( where <expr> | --all-rows )
delete from <T>               ( where <expr> | --all-rows )
show data <T> [ where <expr> ] [ limit <n> ]
  • update / delete keep ADR-0014's mandatory where-or---all-rows choice; a complex expression satisfies the where side.
  • show data gains an optional where. Reading every row stays the safe default for a read, so no --all-rows opt-in is needed there — the clause is simply optional.
  • show data gains an optional limit <n> (<n> a non-negative integer). When limit is present the query is implicitly ordered by the table's primary key, so limit 20 is a stable "first 20 by primary key" rather than an arbitrary subset — every table created through the DSL has a primary key. Explicit order by is out of scope (§10).

6. SQL generation

The Expr is compiled to a parameterised SQL WHERE string:

  • Every literal becomes a ? placeholder bound as a parameter — never spliced into the SQL text. Identifiers are quote_ident-quoted.
  • A literal compared against a column is converted to that column's storage representation through the existing bind_for_column path, exactly as the current where col = val does.
  • Connectives, NOT, and parentheses are emitted from the tree structure.
  • limit emits LIMIT ? with the bound count, plus the implicit ORDER BY over the primary-key column(s) (§5).

The application never evaluates the expression itself — the database does, and re-derives precedence from the operators. The expression is "passed through" only in that sense; the raw user text is never forwarded.

7. Type handling — permissive and advisory

A type mismatch in a comparison is flagged, not blocked. This matches the app's ambient-assistance posture (ADR-0022): the tool indicates problems, it does not refuse input.

  • A literal in column OP literal is type-checked against the column. When the types are compatible the literal is converted and bound per the column's type (§6).
  • When they are not compatible — Name > 5 on a text column, Age LIKE '5%' on an int column — the mismatch is surfaced through the existing highlight and hint channels as an error-class annotation, but the command still parses, still submits, and still runs. The literal binds by its own syntactic type and the database's comparison rules take over — which is precisely the behaviour a learner is experimenting to observe.
  • This relaxes current behaviour. Today bind_for_column rejects a type-mismatched WHERE literal; under this ADR it does not. The relaxation is scoped to WHERE comparisons. Writes (insert, update … set) stay strict: STRICT storage genuinely cannot hold a mistyped value, so a mistyped write is a real error, not an experiment.
  • = NULL / != NULL is a specific flagged case. It is valid syntax that almost never does what the user intends (in SQL it is never true). The walker special-cases a comparison whose operator is = / != and whose operand is the NULL literal: it is highlighted as an error, and the hint points at IS NULL / IS NOT NULL. As with type mismatches it still runs if submitted — a learner who wants to see what x = NULL does may.

Always-on submit-time signalling of flagged-but-runnable input (an (INVALID) / WARNING marker at the input field's edge) is a separate, general concern — see §10.

8. Completion, hints, highlighting

Because the expression is parsed in-grammar — not handed to an opaque sub-parser — the ambient-assistance machinery (ADR-0022) works inside an expression with no separate implementation:

  • column-name completion resolves against the command's table;
  • value positions carry per-type hints, as they do for the current where col = val;
  • operator keywords (and, or, like, between, …) surface as completion candidates where the grammar allows them;
  • syntax highlighting walks the same tree.

Type-mismatch and = NULL flagging (§7) surface through these same highlight and hint channels.

9. Errors

Parse errors continue to route through the existing ParseError shape, so ADR-0021's per-command usage help and the hint panel keep working for the new clauses. The depth-cap breach (§1) is a friendly error of the same kind.

10. Out of scope

  • ORDER BY. limit uses implicit primary-key ordering for determinism; explicit order by is a clean future addition, tracked separately.
  • LIMIT … OFFSET, and limit anywhere other than show data.
  • Operands beyond a column or a literal — arithmetic (a + b), string concatenation, scalar functions, subqueries, EXISTS. The playground's WHERE compares columns and literals.
  • A bare column as a boolean (where Active).
  • The input-field validity indicator. An always- visible (INVALID) / WARNING marker at the edge of the input field — signalling, before submit, that the current input would error or is flagged — is a general feature spanning every command, not just WHERE. It gets its own small ADR; this ADR defines only what inside a WHERE is flagged. The indicator is to carry two severities: a hard ERROR / INVALID (the input cannot run) and a softer WARNING (it runs but is probably not intended — type mismatches, = NULL).
  • CHECK constraints. The constraints ADR (C3) will reuse this expression grammar; it is not built here.

Consequences

  • C5a is satisfied; show data gains filtering and limit (advancing C5 / V5); QA1 is unblocked; the future CHECK constraint has an expression grammar to reuse.
  • The grammar gains one node variant (Subgrammar) and a recursion-depth counter.
  • MatchedPath is unchanged; the expression fragment carries an AST-builder that produces an Expr, carried as a single matched item. Existing command builders are untouched.
  • A new recursive Expr AST joins the command AST; RowFilter changes from Where { column, value } to Where(Expr).
  • Type-mismatched WHERE comparisons change from refused to flagged but runnable — a deliberate, scoped behaviour change (§7).
  • Old history.log lines using the previous where col = val form remain valid — that form is a strict subset of the new grammar — so replay is unaffected. (Not a design driver; noted.)
  • Forward-look toward Q1: advanced-mode SQL will be parsed by sqlparser-rs, a separate parser, so this Expr AST is not literally shared with it. The value of the DSL expression being a SQL subset is pedagogical — learner knowledge transfers — not code reuse.

Implementation notes

A sensible build order, each step guarded by the test suite and the typing-surface matrix:

  1. The Subgrammar node and the recursion-depth counter — the walker capability for a recursive fragment. MatchedPath is unchanged. No user-visible change.
  2. The expression grammar fragment, the Expr AST, and the fragment-builder the walker invokes to produce the Expr.
  3. Wire the fragment into update / delete (replacing the old where) and into show data (new where, new limit).
  4. Expr → parameterised SQL generation; the implicit primary-key ORDER BY for limit.
  5. Schema-aware type-mismatch and = NULL flagging in the walker.
  6. Typing-surface matrix cells for the new surface.

As-built notes (2026-05-18)

Steps 14 are implemented and committed; step 5 (the §7 diagnostic flagging) is deferred — see below. Realization choices, and where they deviate from the design sketch above:

  • §3 builder — option 1 ("reconstruct in builder"), chosen with the project owner before implementation. The stratified grammar is walked normally; its terminals flow into the flat MatchedPath unchanged (driving highlight / completion / the expected-set). grammar::expr::build_expr then folds that flat terminal slice into the Expr — a deterministic recursive descent mirroring the grammar tiers, run only at submit-time dispatch, never per keystroke. Two honest deviations from the §3 wording:
    • No MatchedKind::Expr variant. MatchedPath stays purely terminals — arguably more faithful to "MatchedPath stays flat" than carrying a built Expr in it. The Expr is assembled in the command ast_builders (build_update / build_delete / build_show), which already reconstruct structured Commands from the flat path; build_expr is the same pattern, one tier deeper.
    • There is a second structural pass over the expression tokens, scoped to submit-time dispatch. "No second parse" is read as "no separate parser framework": the walk validates and drives assistance, build_expr is the single ast_builder for the fragment — the same category as build_insert.
  • Grammar shape. predicate is factored as operand predicate_tail (shared operand prefix), and the infix NOT is factored in front of the LIKE / BETWEEN / IN choice — so the walker's first-commit-wins Choice semantics discriminate branches on a cleanly-failing first token.
  • Subgrammar depth. MAX_SUBGRAMMAR_DEPTH = 64 counts active Subgrammar recursion frames. The stratified grammar descends ~45 frames per parenthesis level, so the effective parenthesis-nesting limit is roughly a dozen — far past any hand-written filter; the cap is purely a stack-overflow guard.
  • §8 hints. The expression's right-hand operands resolve through a schema-aware DynamicSubgrammar (where_rhs_ operand) so the hint panel narrows to the compared column's type, exactly as the pre-ADR where col = val slot did. The operand grammar carries no validators — permissive per §7.
  • Step 5 — done, as part of ADR-0027. The §7 behaviour relaxation landed with steps 14: bind_where_literal binds a type-mismatched WHERE literal by its syntactic shape, and the pre-ADR bind-time rejection is gone. The §7 diagnostic flagging — a type-mismatched comparison or = NULL as a flagged finding — was folded into ADR-0027's walker diagnostics-severity model rather than built as a standalone mechanism (ADR-0027's WARNING severity is defined to have "no triggers until ADR-0026 is implemented" — those triggers are exactly these). It surfaces as a Severity::Warning Diagnostic, computed post-walk from the built Expr against the table's column types — see ADR-0027.

See also

  • ADR-0009 — DSL command syntax conventions (-- flags, keyword clauses).
  • ADR-0014 — data operations, the Value model, bind_for_column, the mandatory where-or---all-rows rule, auto-show.
  • ADR-0021 — the parser as source of truth for H1a parse-error help.
  • ADR-0022 — ambient typing assistance: the highlight, hint, and completion machinery the expression plugs into.
  • ADR-0023 / ADR-0024 — the unified grammar tree this extends; ADR-0023 sketched the SubgrammarRef node realised here as Subgrammar.
  • ADR-0025 — indexes; the reason QA1, and thus a filtered query, is now worthwhile.