rdbms-playground/docs/adr/0026-complex-where-expressions.md
claude@clouddev1 a50c6cdf70 WHERE expressions: matrix cells + predicate_tail grammar fix (ADR-0026 step 6)
Adds tests/typing_surface/where_expression.rs — 9 matrix
cells for the complex WHERE / show-data limit typing surface:
operator candidates after an operand, AND / OR after a
predicate, NOT, BETWEEN / IN bounds, and `show data`
where / limit.

Writing the cells surfaced a grammar bug. `predicate_tail`'s
`[NOT] negatable` branch started with `Optional(not)`, and an
Optional-first `Seq` always "commits" — so on an incomplete
input the walker's `Choice` returned that branch's
`Incomplete` early and discarded every sibling branch's
expected set, dropping `is` and the comparison operators from
completion after a column. Fixed by splitting it into
explicit `NOT negatable` and bare `negatable` branches — no
`predicate_tail` branch starts with an `Optional` now. The
matched terminal sequence is unchanged, so `build_expr` is
untouched.

Docs: ADR-0026 gains an "As-built notes" section recording
the option-1 builder realization, its two deviations from the
§3 sketch, and the deferral of §7 diagnostic flagging to
ADR-0027. requirements.md C5a -> [x] (steps 1-4) with the
test baseline refreshed to 1079; CLAUDE.md's deferred list
reconciled (C5a implemented; the QA1/QA2 note now points at
ADR-0028).
2026-05-18 23:19:53 +00:00

# ADR-0026: Complex WHERE expressions
## Status
Accepted
## Context
The requirements checklist commits, in `C5a`, to *complex
WHERE expressions* — `AND` / `OR`, comparison operators,
`LIKE` — for the `update`, `delete`, and `show data` row
filters. It is described there as the bridge from DSL
fluency toward real SQL. Today the DSL is well short of
that:
- The only filter the DSL parses is a single
`where <col> = <val>` equality. There is no `AND` / `OR`,
no operator other than `=`, no `LIKE`, no `IS NULL`, no
parentheses.
- `update` and `delete` carry that filter as
`RowFilter::Where { column, value }`. `show data <T>`
carries **no filter at all** — it always selects every
row.
- The `Value` AST is purely syntactic (`Number`, `Text`,
`Bool`, `Null`); per-column type handling happens at
bind time in `db.rs`.
Three things make this the right moment, and shape the
decision:
1. **`QA1` (`EXPLAIN QUERY PLAN`) needs a filtered query.**
An unfiltered `SELECT * FROM t` always plans as a full
scan; an index can never appear. QA1's pedagogical
payoff depends on a `WHERE` whose plan flips between a
scan and an index search.
2. **`CHECK` constraints (`C3`) need an expression
grammar.** A `CHECK (Age >= 0 AND Age < 150)` is the
same expression problem. A throwaway mini-grammar for
`CHECK` plus a second one for `WHERE` would be waste;
this ADR builds the grammar once.
3. **The grammar architecture must grow to host it.** The
unified walker grammar (ADR-0023 / ADR-0024) is a
non-recursive trie of `&'static` `Node`s, and its parse
output (`MatchedPath`) is a flat list of matched
terminals. A `WHERE` expression is recursive and its
shape is data-dependent — neither fits as the grammar
stands.
### The architectural problem, precisely
A boolean expression — `a = 1 AND (b > 2 OR c LIKE 'x%')`
— is recursive (a parenthesised group is itself an
expression) and carries operator precedence. Two facts
about the current walker:
- **The `Node` tree is acyclic.** Every combinator
references its children through `&'static [Node]` /
`&'static Node`, and the registry lives in `const`s. A
`const` cannot refer to itself, so no node can close a
cycle. `Optional` and `Repeated` already hold a
`&'static Node` *reference* — recursion through a
reference is expressible if the fragment is a named
`static` — but `Seq` and `Choice` embed their children
*by value* in a slice, and a cyclic value has no finite
representation.
- **`MatchedPath` is flat.** It is a `Vec<MatchedItem>` of
matched *terminals* in source order; the combinators
shape the order but record no grouping. For every
command today that is enough: each command's shape is a
fixed template, so the AST builder reads terminals by
position or by role. A recursive expression has no fixed
template — `where a = 1` and
`where (a=1 or b=2) and c=3` have different shapes — so a
flat terminal list cannot be rebuilt into the expression
tree without parsing it a second time.
Left recursion is a third fact, and it is a property of
parsing technique rather than of this codebase: a
top-down walker cannot consume a rule whose leftmost
symbol is the rule itself (`expr := expr OP expr` recurses
without consuming input). The standard remedy — a
stratified, left-factored grammar — is adopted below.
## Decision
### 1. The expression grammar
The `WHERE` expression is a stratified grammar — one layer
per precedence tier. Stratification removes left recursion
(every recursion is guarded by a token) and encodes
operator precedence in the layering, so there is no
separate precedence-resolution step.
```
or_expr      := and_expr ( OR and_expr )*
and_expr     := not_expr ( AND not_expr )*
not_expr     := NOT not_expr | bool_primary
bool_primary := ( or_expr ) | predicate
predicate    := operand cmp_op operand
              | operand [ NOT ] LIKE operand
              | operand [ NOT ] BETWEEN operand AND operand
              | operand [ NOT ] IN ( operand [ , operand ]* )
              | operand IS [ NOT ] NULL
operand      := literal | column_ref
cmp_op       := = | != | <> | < | <= | > | >=
```
- **Operator set:** the six comparisons, with both `!=`
and `<>` accepted (`<>` is standard SQL, `!=` the common
variant — the engine accepts both); `AND` / `OR` / `NOT`;
parentheses; `LIKE` with `%` / `_` wildcards;
`IS NULL` / `IS NOT NULL`; `IN`; `BETWEEN`. `LIKE`,
`IN`, and `BETWEEN` take an optional infix `NOT`,
mirroring `IS NOT NULL`.
- **Operands are a column reference or a literal** — not a
nested expression. Parentheses group *boolean*
sub-expressions (`bool_primary`), not comparison
operands. A bare column reference is not a boolean
expression: a predicate always has an operator (write
`Active = true`, not `Active`).
- The only recursion is `( or_expr )` and `NOT not_expr`;
each consumes a token (`(` or `NOT`) before recursing,
so the greedy top-down walker always makes progress.
- **Nesting depth is capped at 64.** Hand-written `WHERE`
clauses do not approach this; the cap exists only so
pathological input (`((((…))))`) yields a friendly
*"expression nested too deeply (limit 64)"* error rather
than a stack overflow.
The grammar is deliberately a subset of standard SQL's
`WHERE` syntax, so a learner's knowledge transfers
directly when advanced-mode SQL (`Q1`) lands.
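As a rough illustration of how the stratified tiers map onto a top-down walk, here is a minimal recognizer sketch. It is not the project's walker: the `Cursor` type, the pre-split token slice, and the choice to count one nesting level per `(` or `NOT` are all assumptions made for the example, and only the `operand cmp_op operand` predicate form is covered.

```rust
/// Nesting cap from the text above; this sketch counts one level per `(` or `NOT`.
const MAX_DEPTH: usize = 64;

/// Hypothetical cursor over already-split tokens (tokenising is out of scope).
struct Cursor<'a> {
    toks: &'a [&'a str],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.toks.get(self.pos).copied()
    }

    /// Consumes the next token if it equals `kw`, ignoring ASCII case.
    fn eat(&mut self, kw: &str) -> bool {
        let hit = self.peek().is_some_and(|t| t.eq_ignore_ascii_case(kw));
        if hit {
            self.pos += 1;
        }
        hit
    }
}

fn is_reserved(t: &str) -> bool {
    ["(", ")", "and", "or", "not"].iter().any(|k| t.eq_ignore_ascii_case(k))
}

// or_expr := and_expr ( OR and_expr )*
fn or_expr(c: &mut Cursor<'_>, depth: usize) -> Result<(), String> {
    and_expr(c, depth)?;
    while c.eat("or") {
        and_expr(c, depth)?;
    }
    Ok(())
}

// and_expr := not_expr ( AND not_expr )*
fn and_expr(c: &mut Cursor<'_>, depth: usize) -> Result<(), String> {
    not_expr(c, depth)?;
    while c.eat("and") {
        not_expr(c, depth)?;
    }
    Ok(())
}

// not_expr := NOT not_expr | bool_primary
// bool_primary := ( or_expr ) | predicate
fn not_expr(c: &mut Cursor<'_>, depth: usize) -> Result<(), String> {
    if depth > MAX_DEPTH {
        return Err(format!("expression nested too deeply (limit {MAX_DEPTH})"));
    }
    // Both recursive branches consume a token first, so the walk always progresses.
    if c.eat("not") {
        return not_expr(c, depth + 1);
    }
    if c.eat("(") {
        or_expr(c, depth + 1)?;
        return if c.eat(")") { Ok(()) } else { Err("expected `)`".to_string()) };
    }
    predicate(c)
}

// predicate := operand cmp_op operand   (comparison form only in this sketch)
fn predicate(c: &mut Cursor<'_>) -> Result<(), String> {
    operand(c)?;
    let op = c.peek().ok_or("expected a comparison operator")?;
    if !["=", "!=", "<>", "<", "<=", ">", ">="].contains(&op) {
        return Err(format!("unexpected `{op}`"));
    }
    c.pos += 1;
    operand(c)
}

fn operand(c: &mut Cursor<'_>) -> Result<(), String> {
    match c.peek() {
        Some(t) if !is_reserved(t) => {
            c.pos += 1;
            Ok(())
        }
        _ => Err("expected a column or literal".to_string()),
    }
}

/// Accepts the token slice only if the whole input is one `or_expr`.
pub fn recognise(toks: &[&str]) -> Result<(), String> {
    let mut c = Cursor { toks, pos: 0 };
    or_expr(&mut c, 0)?;
    if c.pos == toks.len() {
        Ok(())
    } else {
        Err(format!("unexpected `{}`", toks[c.pos]))
    }
}
```

Under these assumptions a bare column such as `Active` is rejected because no operator follows it, and 65 nested parentheses produce the depth error instead of unbounded recursion.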
### 2. Grammar architecture: a reference-following node
`Seq` / `Choice` embed children by value and cannot hold a
cyclic node. One new `Node` variant closes the gap:
```rust
/// Walks the referenced node once, mandatory. Because the
/// reference is a `&'static Node`, a named `static`
/// fragment may appear inside its own subtree — the
/// mechanism that lets the expression grammar recurse.
/// (ADR-0023 sketched this as `SubgrammarRef`.)
Subgrammar(&'static Node),
```
`Subgrammar` is the static counterpart of the existing
`DynamicSubgrammar` (a walk-time factory). The expression
grammar's tiers are declared as named `static` items;
`bool_primary`'s `( or_expr )` branch reaches `or_expr`
through `Subgrammar(&OR_EXPR)`, and `not_expr` reaches
itself the same way. The walker gains one match arm — walk
the referenced node once — plus the depth counter for the
§1 cap.
The expression grammar is one fragment, referenced by
`update`, `delete`, and `show data` alike — defined once.
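The following sketch shows, under simplifying assumptions, how a named `static` can appear inside its own subtree through a `&'static Node` reference, and how a depth counter guards the walk. The `Node` enum here is a cut-down stand-in (the real type has more variants), `Atom` stands in for a whole predicate, and the `walk` signature and the static names are hypothetical.

```rust
/// Simplified stand-in for the walker's node type.
pub enum Node {
    /// Matches one exact token.
    Lit(&'static str),
    /// Matches any single token other than a parenthesis (stands in for a predicate).
    Atom,
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    /// Walks the referenced node once; the reference is what permits a cycle.
    Subgrammar(&'static Node),
}

const MAX_SUBGRAMMAR_DEPTH: usize = 64;

// expr := ( expr ) | atom
// The slices are named statics so that each can refer to `EXPR` by address.
static PAREN_PARTS: [Node; 3] = [Node::Lit("("), Node::Subgrammar(&EXPR), Node::Lit(")")];
static EXPR_BRANCHES: [Node; 2] = [Node::Seq(&PAREN_PARTS), Node::Atom];
static EXPR: Node = Node::Choice(&EXPR_BRANCHES);

/// Returns `Ok(Some(next_pos))` on a match, `Ok(None)` on a clean failure,
/// and `Err` when the active `Subgrammar` frame count would exceed the cap.
fn walk(node: &Node, toks: &[&str], pos: usize, depth: usize) -> Result<Option<usize>, String> {
    match node {
        Node::Lit(s) => Ok(match toks.get(pos) {
            Some(t) if t == s => Some(pos + 1),
            _ => None,
        }),
        Node::Atom => Ok(match toks.get(pos) {
            Some(t) if *t != "(" && *t != ")" => Some(pos + 1),
            _ => None,
        }),
        Node::Seq(children) => {
            let mut p = pos;
            for child in children.iter() {
                match walk(child, toks, p, depth)? {
                    Some(next) => p = next,
                    None => return Ok(None),
                }
            }
            Ok(Some(p))
        }
        Node::Choice(children) => {
            for child in children.iter() {
                if let Some(next) = walk(child, toks, pos, depth)? {
                    return Ok(Some(next));
                }
            }
            Ok(None)
        }
        Node::Subgrammar(target) => {
            if depth >= MAX_SUBGRAMMAR_DEPTH {
                return Err(format!(
                    "expression nested too deeply (limit {MAX_SUBGRAMMAR_DEPTH})"
                ));
            }
            walk(target, toks, pos, depth + 1)
        }
    }
}

/// True when the whole token slice is one `expr`.
pub fn matches_expr(toks: &[&str]) -> Result<bool, String> {
    Ok(walk(&EXPR, toks, 0, 0)? == Some(toks.len()))
}
```

In this toy grammar every parenthesis level costs exactly one `Subgrammar` frame, so the cap and the parenthesis limit coincide; a grammar with more tiers between parentheses would spend several frames per level.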
### 3. The expression result — built selectively
`MatchedPath` — the walker's flat list of matched
terminals — is left unchanged. The recursive structure
lives only where it is needed: inside the expression.
The expression grammar fragment carries its own
*AST-fragment builder*. As the walker recurses through the
stratified tiers, that builder runs — the walker's
recursion *is* the precedence-correct tree, so the builder
assembles a nested `Expr` directly, with no second parse
and no separate precedence pass. The finished `Expr` is
carried as a single item in the otherwise-unchanged flat
`MatchedPath` (`MatchedKind` gains one variant to hold a
built expression).
Every existing command builder is therefore genuinely
untouched — the flat path it reads is exactly as before.
`update` / `delete` / `show data` take that one expression
item and read its `Expr`.
This is the "selectively if necessary" option: the parser
gains structured output exactly where the grammar is
recursive, and nowhere else. A *system-wide* hierarchical
`MatchedPath` was considered and rejected — it would
record group structure for every command while only the
expression consumed it, leaving the non-expression
grouping computed but unread, and so untested. The general
"a grammar fragment may carry a builder" mechanism
introduced here is exercised by its one user; nothing is
recorded that nothing reads.
### 4. The `Expr` AST
A new recursive expression AST joins the command AST:
```rust
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    Like { target: Operand, pattern: Operand, negated: bool },
    Between { target: Operand, low: Operand, high: Operand, negated: bool },
    In { target: Operand, items: Vec<Operand>, negated: bool },
    IsNull { target: Operand, negated: bool },
}

pub enum Operand { Column(String), Literal(Value) }
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }
```
`Or` / `And` are n-ary — a flat `a AND b AND c` is one
`And` of three. Single-child tiers collapse: a `predicate`
reached through the `or → and → not` layers with no
connective is just that `Predicate`, not three wrappers.
`RowFilter` changes from
```rust
RowFilter::Where { column: String, value: Value }
```
to
```rust
RowFilter::Where(Expr)
```
for `update` / `delete`. `show data` carries
`filter: Option<Expr>` and `limit: Option<u64>`.
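To make the n-ary and collapsing rules concrete, the sketch below expresses the old single-equality filter as an `Expr` and collapses single-child tiers. The payload types of `Value`, the trimmed `Predicate`, and the helper names `legacy_equality` and `collapse` are assumptions for illustration, not the project's definitions.

```rust
// Payload types are assumed; the text above only names the variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Value { Number(f64), Text(String), Bool(bool), Null }

#[derive(Debug, Clone, PartialEq)]
pub enum Operand { Column(String), Literal(Value) }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

// Trimmed to the comparison form for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

/// Hypothetical helper: the pre-ADR `where <col> = <val>` filter as an `Expr`.
pub fn legacy_equality(column: &str, value: Value) -> Expr {
    Expr::Predicate(Predicate::Compare {
        left: Operand::Column(column.to_string()),
        op: CompareOp::Eq,
        right: Operand::Literal(value),
    })
}

/// Hypothetical helper: a tier with exactly one child is that child; otherwise
/// the children are wrapped in the given n-ary connective.
/// Callers are expected to pass at least one child.
pub fn collapse(mut children: Vec<Expr>, wrap: fn(Vec<Expr>) -> Expr) -> Expr {
    if children.len() == 1 {
        children.pop().expect("length checked above")
    } else {
        wrap(children)
    }
}
```

With these helpers, `collapse(vec![p], Expr::And)` returns `p` itself, while three children yield a single `Expr::And` holding all three rather than a nested pair.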
### 5. The commands
```
update <T> set <assignments> ( where <expr> | --all-rows )
delete from <T> ( where <expr> | --all-rows )
show data <T> [ where <expr> ] [ limit <n> ]
```
- `update` / `delete` keep ADR-0014's mandatory
where-or-`--all-rows` choice; a complex expression
satisfies the `where` side.
- **`show data` gains an optional `where`.** Reading every
row stays the safe default for a read, so no `--all-rows`
opt-in is needed there — the clause is simply optional.
- **`show data` gains an optional `limit <n>`** (`<n>` a
non-negative integer). When `limit` is present the query
is implicitly ordered by the table's primary key, so
`limit 20` is a stable "first 20 by primary key" rather
than an arbitrary subset — every table created through
the DSL has a primary key. Explicit `order by` is out of
scope (§10).
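One way the `show data` clauses might combine into a query is sketched below. The function name `show_data_sql`, its parameters, and the body of `quote_ident` (double quotes with embedded quotes doubled) are assumptions; the sketch takes an already-compiled `WHERE` string and only decides which clauses to emit.

```rust
/// Assumed quoting rule: wrap in double quotes and double any embedded quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Hypothetical assembly of the `show data` query text.
/// `where_sql` is a parameterised fragment; `has_limit` adds `LIMIT ?`
/// together with the implicit primary-key ordering.
pub fn show_data_sql(
    table: &str,
    pk_cols: &[&str],
    where_sql: Option<&str>,
    has_limit: bool,
) -> String {
    let mut sql = format!("SELECT * FROM {}", quote_ident(table));
    if let Some(w) = where_sql {
        sql.push_str(" WHERE ");
        sql.push_str(w);
    }
    if has_limit {
        // Ordering is only added alongside a limit, so an unlimited read is untouched.
        if !pk_cols.is_empty() {
            let order: Vec<String> = pk_cols.iter().map(|c| quote_ident(c)).collect();
            sql.push_str(" ORDER BY ");
            sql.push_str(&order.join(", "));
        }
        sql.push_str(" LIMIT ?");
    }
    sql
}
```

For a table `People` keyed on `Id`, a filter plus a limit would produce `SELECT * FROM "People" WHERE "Age" >= ? ORDER BY "Id" LIMIT ?`, with the limit count bound as a parameter.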
### 6. SQL generation
The `Expr` is compiled to a parameterised SQL `WHERE`
string:
- Every literal becomes a `?` placeholder bound as a
parameter — never spliced into the SQL text.
Identifiers are `quote_ident`-quoted.
- A literal compared against a column is converted to that
column's storage representation through the existing
`bind_for_column` path, exactly as the current
`where col = val` does.
- Connectives, `NOT`, and parentheses are emitted from the
tree structure.
- `limit` emits `LIMIT ?` with the bound count, plus the
implicit `ORDER BY` over the primary-key column(s) (§5).
The application never evaluates the expression itself —
the database does, and re-derives precedence from the
operators. The expression is "passed through" only in
that sense; the raw user text is never forwarded.
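The compilation rules above can be sketched as a small recursive function. This is an illustrative sketch only: the `Expr` here is flattened and trimmed (comparison and `IS NULL` forms), the `Value` payload types and the `quote_ident` body are assumed, and the per-column conversion done by `bind_for_column` is not modelled; literals are simply collected in order.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value { Number(f64), Text(String), Bool(bool), Null }

pub enum Operand { Column(String), Literal(Value) }

#[derive(Clone, Copy)]
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

/// Trimmed, flattened stand-in for the ADR's `Expr` / `Predicate` pair.
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Compare { left: Operand, op: CompareOp, right: Operand },
    IsNull { target: Operand, negated: bool },
}

/// Assumed quoting rule: double quotes, with embedded quotes doubled.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Columns are quoted; literals become `?` and are pushed onto `params`.
fn operand_sql(o: &Operand, params: &mut Vec<Value>) -> String {
    match o {
        Operand::Column(name) => quote_ident(name),
        Operand::Literal(v) => {
            params.push(v.clone());
            "?".to_string()
        }
    }
}

/// Emits an n-ary connective as one parenthesised group.
fn join(children: &[Expr], sep: &str, params: &mut Vec<Value>) -> String {
    let parts: Vec<String> = children.iter().map(|c| to_sql(c, params)).collect();
    format!("({})", parts.join(sep))
}

/// Compiles the tree to SQL text; parameters are appended in placeholder order.
pub fn to_sql(e: &Expr, params: &mut Vec<Value>) -> String {
    match e {
        Expr::Or(cs) => join(cs, " OR ", params),
        Expr::And(cs) => join(cs, " AND ", params),
        Expr::Not(inner) => format!("NOT ({})", to_sql(inner, params)),
        Expr::Compare { left, op, right } => {
            let l = operand_sql(left, params);
            let r = operand_sql(right, params);
            let op = match op {
                CompareOp::Eq => "=",
                CompareOp::NotEq => "<>",
                CompareOp::Lt => "<",
                CompareOp::LtEq => "<=",
                CompareOp::Gt => ">",
                CompareOp::GtEq => ">=",
            };
            format!("{l} {op} {r}")
        }
        Expr::IsNull { target, negated } => {
            let t = operand_sql(target, params);
            if *negated { format!("{t} IS NOT NULL") } else { format!("{t} IS NULL") }
        }
    }
}
```

Parenthesising every connective group is a deliberately conservative choice in this sketch: the emitted text then reflects the tree structure regardless of how the database ranks `AND` against `OR`.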
### 7. Type handling — permissive and advisory
A type mismatch in a comparison is **flagged, not
blocked**. This matches the app's ambient-assistance
posture (ADR-0022): the tool indicates problems, it does
not refuse input.
- A literal in `column OP literal` is type-checked against
the column. When the types are compatible the literal is
converted and bound per the column's type (§6).
- When they are **not** compatible — `Name > 5` on a text
column, `Age LIKE '5%'` on an int column — the mismatch
is surfaced through the existing highlight and hint
channels as an error-class annotation, but the command
still parses, still submits, and still runs. The literal
binds by its own syntactic type and the database's
comparison rules take over — which is precisely the
behaviour a learner is experimenting to observe.
- **This relaxes current behaviour.** Today
`bind_for_column` *rejects* a type-mismatched `WHERE`
literal; under this ADR it does not. The relaxation is
scoped to `WHERE` comparisons. Writes (`insert`,
`update … set`) stay strict: STRICT storage genuinely
cannot hold a mistyped value, so a mistyped write is a
real error, not an experiment.
- **`= NULL` / `!= NULL`** is a specific flagged case. It
is valid syntax that almost never does what the user
intends: in SQL a comparison with `NULL` evaluates to
`NULL`, so the predicate is never true. The walker
special-cases a comparison whose operator is `=` / `!=`
and whose operand is the `NULL` literal: it is
highlighted as an error, and the hint points at
`IS NULL` / `IS NOT NULL`. As with type mismatches it
still runs if submitted — a learner who wants to see
what `x = NULL` does may.
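The advisory rules in this list could be expressed as a pure function from a comparison to an optional flag, as in the sketch below. `ColumnType`, `Flag`, `flag_comparison`, and the hint wording are hypothetical names for illustration, the compatibility table is an assumption, and a `NULL` literal under an ordering operator is left unflagged because the text above only special-cases `=` and `!=`.

```rust
/// Hypothetical column types for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType { Int, Text, Bool }

// Payload types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum Value { Number(f64), Text(String), Bool(bool), Null }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

/// What gets flagged. Neither variant blocks the command from running.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Flag { NullComparison, TypeMismatch }

/// Advisory check for `column OP literal`. Returns a flag to display, if any;
/// the caller is expected to submit the command either way.
pub fn flag_comparison(col: ColumnType, op: CompareOp, lit: &Value) -> Option<Flag> {
    if matches!(lit, Value::Null) {
        // Only equality-style operators against NULL are special-cased.
        return matches!(op, CompareOp::Eq | CompareOp::NotEq).then_some(Flag::NullComparison);
    }
    // Assumed compatibility table between column types and literal shapes.
    let compatible = matches!(
        (col, lit),
        (ColumnType::Int, Value::Number(_))
            | (ColumnType::Text, Value::Text(_))
            | (ColumnType::Bool, Value::Bool(_))
    );
    (!compatible).then_some(Flag::TypeMismatch)
}

/// Hypothetical hint text for the hint panel.
pub fn hint(flag: Flag) -> &'static str {
    match flag {
        Flag::NullComparison => "comparing with NULL is never true; use IS NULL / IS NOT NULL",
        Flag::TypeMismatch => "literal type does not match the column type",
    }
}
```

Keeping the check a pure function that returns data, with no ability to reject input, is one way to make the "flagged, not blocked" posture hard to violate by accident.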
Always-on submit-time signalling of flagged-but-runnable
input (an `(INVALID)` / `WARNING` marker at the input
field's edge) is a separate, general concern — see §10.
### 8. Completion, hints, highlighting
Because the expression is parsed *in-grammar* — not handed
to an opaque sub-parser — the ambient-assistance machinery
(ADR-0022) works inside an expression with no separate
implementation:
- column-name completion resolves against the command's
table;
- value positions carry per-type hints, as they do for the
current `where col = val`;
- operator keywords (`and`, `or`, `like`, `between`, …)
surface as completion candidates where the grammar
allows them;
- syntax highlighting walks the same tree.
Type-mismatch and `= NULL` flagging (§7) surface through
these same highlight and hint channels.
### 9. Errors
Parse errors continue to route through the existing
`ParseError` shape, so ADR-0021's per-command usage help
and the hint panel keep working for the new clauses. The
depth-cap breach (§1) is a friendly error of the same
kind.
### 10. Out of scope
- **`ORDER BY`.** `limit` uses implicit primary-key
ordering for determinism; explicit `order by` is a clean
future addition, tracked separately.
- **`LIMIT … OFFSET`**, and `limit` anywhere other than
`show data`.
- **Operands beyond a column or a literal** — arithmetic
(`a + b`), string concatenation, scalar functions,
subqueries, `EXISTS`. The playground's `WHERE` compares
columns and literals.
- **A bare column as a boolean** (`where Active`).
- **The input-field validity indicator.** An always-
visible `(INVALID)` / `WARNING` marker at the edge of
the input field — signalling, before submit, that the
current input would error or is flagged — is a general
feature spanning every command, not just `WHERE`. It
gets its own small ADR; this ADR defines only *what*
inside a `WHERE` is flagged. The indicator is to carry
two severities: a hard `ERROR` / `INVALID` (the input
cannot run) and a softer `WARNING` (it runs but is
probably not intended — type mismatches, `= NULL`).
- **`CHECK` constraints.** The constraints ADR (`C3`) will
reuse this expression grammar; it is not built here.
## Consequences
- `C5a` is satisfied; `show data` gains filtering and
`limit` (advancing `C5` / `V5`); `QA1` is unblocked; the
future `CHECK` constraint has an expression grammar to
reuse.
- The grammar gains one node variant (`Subgrammar`) and a
recursion-depth counter.
- `MatchedPath` is unchanged; the expression fragment
carries an AST-builder that produces an `Expr`, carried
as a single matched item. Existing command builders are
untouched.
- A new recursive `Expr` AST joins the command AST;
`RowFilter` changes from `Where { column, value }` to
`Where(Expr)`.
- Type-mismatched `WHERE` comparisons change from
*refused* to *flagged but runnable* — a deliberate,
scoped behaviour change (§7).
- Old `history.log` lines using the previous
`where col = val` form remain valid — that form is a
strict subset of the new grammar — so `replay` is
unaffected. (Not a design driver; noted.)
- Forward-look toward `Q1`: advanced-mode SQL will be
parsed by `sqlparser-rs`, a separate parser, so this
`Expr` AST is not literally shared with it. The value of
the DSL expression being a SQL subset is pedagogical —
learner knowledge transfers — not code reuse.
## Implementation notes
A sensible build order, each step guarded by the test
suite and the typing-surface matrix:
1. The `Subgrammar` node and the recursion-depth counter
— the walker capability for a recursive fragment.
`MatchedPath` is unchanged. No user-visible change.
2. The expression grammar fragment, the `Expr` AST, and
the fragment-builder the walker invokes to produce the
`Expr`.
3. Wire the fragment into `update` / `delete` (replacing
the old `where`) and into `show data` (new `where`,
new `limit`).
4. `Expr` → parameterised SQL generation; the implicit
primary-key `ORDER BY` for `limit`.
5. Schema-aware type-mismatch and `= NULL` flagging in the
walker.
6. Typing-surface matrix cells for the new surface.
### As-built notes (2026-05-18)
Steps 1-4 are implemented and committed; step 5 (the §7
diagnostic flagging) is deferred; see below. Realization
choices, and where they deviate from the design sketch above:
- **§3 builder — option 1 ("reconstruct in builder"),**
chosen with the project owner before implementation. The
stratified grammar is walked normally; its terminals flow
into the flat `MatchedPath` unchanged (driving highlight /
completion / the expected-set). `grammar::expr::build_expr`
then folds that flat terminal slice into the `Expr` — a
deterministic recursive descent mirroring the grammar
tiers, run only at submit-time dispatch, never per
keystroke. Two honest deviations from the §3 wording:
  - **No `MatchedKind::Expr` variant.** `MatchedPath` stays
    purely terminals, which is arguably more faithful to
    "MatchedPath stays flat" than carrying a built `Expr` in
    it. The `Expr` is assembled in the command `ast_builder`s
    (`build_update` / `build_delete` / `build_show`), which
    already reconstruct structured `Command`s from the flat
    path; `build_expr` is the same pattern, one tier deeper.
  - **There is a second structural pass** over the
    expression tokens, scoped to submit-time dispatch. "No
    second parse" is read as "no separate parser framework":
    the walk validates and drives assistance, and `build_expr`
    is the single `ast_builder` for the fragment, in the same
    category as `build_insert`.
- **Grammar shape.** `predicate` is factored as `operand
predicate_tail` (shared operand prefix), and the infix
`NOT` is factored in front of the `LIKE` / `BETWEEN` /
`IN` choice — so the walker's first-commit-wins `Choice`
semantics discriminate branches on a cleanly-failing first
token.
- **`Subgrammar` depth.** `MAX_SUBGRAMMAR_DEPTH = 64` counts
active `Subgrammar` recursion frames. The stratified
grammar descends ~4-5 frames per parenthesis level, so the
effective parenthesis-nesting limit is roughly a dozen
(64 / 5 ≈ 12), far past any hand-written filter; the cap
is purely a stack-overflow guard.
- **§8 hints.** The expression's right-hand operands resolve
through a schema-aware `DynamicSubgrammar`
(`where_rhs_operand`) so the hint panel narrows to the
compared column's type, exactly as the pre-ADR
`where col = val` slot did. The operand grammar carries no
validators, so it stays permissive per §7.
- **Step 5 deferred to ADR-0027.** The §7 *behaviour*
relaxation is done: `bind_where_literal` binds a
type-mismatched WHERE literal by its syntactic shape, and
the pre-ADR bind-time rejection is gone. The §7
*diagnostic flagging* — surfacing a type-mismatched
comparison or `= NULL` as an error-class highlight + hint
— is the seam with ADR-0027, which designs the walker
diagnostics-severity model that flagging belongs in (and
whose WARNING severity is defined to have "no triggers
until ADR-0026 is implemented"). Building the flagging as
a standalone mechanism first would be reworked by
ADR-0027; the recommendation is to implement it as the
first triggers of ADR-0027's model.
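The "reconstruct in builder" realization described in these notes can be pictured with the sketch below: a recursive descent over an already-validated flat token slice that mirrors the grammar tiers and yields a nested tree. It is an illustration, not `build_expr` itself. The `Fold` type, the `fold_terminals` name, string-based tokens, and the trimmed `Expr` (comparison leaves held as three strings) are all assumptions.

```rust
/// Trimmed stand-in for the ADR's `Expr`; a leaf keeps its three tokens as text.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Compare(String, String, String),
}

/// Hypothetical fold state over the flat terminal slice.
struct Fold<'a> {
    toks: &'a [&'a str],
    pos: usize,
}

/// A single-child tier collapses to that child.
fn collapse(mut items: Vec<Expr>, wrap: fn(Vec<Expr>) -> Expr) -> Expr {
    if items.len() == 1 { items.pop().expect("length checked") } else { wrap(items) }
}

impl<'a> Fold<'a> {
    fn eat(&mut self, kw: &str) -> bool {
        let hit = self.toks.get(self.pos).is_some_and(|t| t.eq_ignore_ascii_case(kw));
        if hit {
            self.pos += 1;
        }
        hit
    }

    fn next(&mut self) -> Option<&'a str> {
        let t = self.toks.get(self.pos).copied();
        if t.is_some() {
            self.pos += 1;
        }
        t
    }

    // or_expr := and_expr ( OR and_expr )*
    fn or_expr(&mut self) -> Option<Expr> {
        let mut items = vec![self.and_expr()?];
        while self.eat("or") {
            items.push(self.and_expr()?);
        }
        Some(collapse(items, Expr::Or))
    }

    // and_expr := not_expr ( AND not_expr )*
    fn and_expr(&mut self) -> Option<Expr> {
        let mut items = vec![self.not_expr()?];
        while self.eat("and") {
            items.push(self.not_expr()?);
        }
        Some(collapse(items, Expr::And))
    }

    // not_expr := NOT not_expr | ( or_expr ) | operand cmp_op operand
    fn not_expr(&mut self) -> Option<Expr> {
        if self.eat("not") {
            return Some(Expr::Not(Box::new(self.not_expr()?)));
        }
        if self.eat("(") {
            let inner = self.or_expr()?;
            return self.eat(")").then_some(inner);
        }
        let (l, op, r) = (self.next()?, self.next()?, self.next()?);
        Some(Expr::Compare(l.into(), op.into(), r.into()))
    }
}

/// Folds the flat terminals into a tree; `None` means the slice was not a
/// complete expression (which the earlier walk is assumed to have ruled out).
pub fn fold_terminals(toks: &[&str]) -> Option<Expr> {
    let mut f = Fold { toks, pos: 0 };
    let e = f.or_expr()?;
    (f.pos == toks.len()).then_some(e)
}
```

Because the fold only runs on input the walk has already accepted, it carries no diagnostics or depth guard of its own in this sketch; a real builder would have to decide how much of that validation to repeat.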
## See also
- ADR-0009 — DSL command syntax conventions (`--` flags,
keyword clauses).
- ADR-0014 — data operations, the `Value` model,
`bind_for_column`, the mandatory where-or-`--all-rows`
rule, auto-show.
- ADR-0021 — the parser as source of truth for H1a
parse-error help.
- ADR-0022 — ambient typing assistance: the highlight,
hint, and completion machinery the expression plugs
into.
- ADR-0023 / ADR-0024 — the unified grammar tree this
extends; ADR-0023 sketched the `SubgrammarRef` node
realised here as `Subgrammar`.
- ADR-0025 — indexes; the reason `QA1`, and thus a
filtered query, is now worthwhile.