Files
rdbms-playground/docs/adr/0026-complex-where-expressions.md
T
claude@clouddev1 a3268495e2 ADR-0027: existing-cases sweep + docs (step F)
Sweep: input_verdict tests confirm the schema-existence check
fires across the identifier-taking commands — unknown table
on drop / show / add column, unknown column on drop column /
update — and that known references stay clean. The Step B
check is grammar-generic, so this is verification + coverage
rather than new code.

Docs: requirements.md S6 -> [x], baseline 1096; CLAUDE.md
deferred list reconciled (C5a and S6 are done — removed);
ADR-0026's as-built note updated (step 5 shipped via
ADR-0027); ADR-0027 gains an As-built notes section
recording the post-walk diagnostics realization, the
pre-rendered message, the timeout-based debounce, coarse
WARNING spans, and the deferred highlight/hint wiring.
2026-05-19 07:35:06 +00:00

497 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-0026: Complex WHERE expressions
## Status
Accepted
## Context
The requirements checklist commits, in `C5a`, to *complex
WHERE expressions* — `AND` / `OR`, comparison operators,
`LIKE` — for the `update`, `delete`, and `show data` row
filters. It is described there as the bridge from DSL
fluency toward real SQL. Today the DSL is well short of
that:
- The only filter the DSL parses is a single
`where <col> = <val>` equality. There is no `AND` / `OR`,
no operator other than `=`, no `LIKE`, no `IS NULL`, no
parentheses.
- `update` and `delete` carry that filter as
`RowFilter::Where { column, value }`. `show data <T>`
carries **no filter at all** — it always selects every
row.
- The `Value` AST is purely syntactic (`Number`, `Text`,
`Bool`, `Null`); per-column type handling happens at
bind time in `db.rs`.
Three things make this the right moment, and shape the
decision:
1. **`QA1` (`EXPLAIN QUERY PLAN`) needs a filtered query.**
An unfiltered `SELECT * FROM t` always plans as a full
scan; an index can never appear. QA1's pedagogical
payoff depends on a `WHERE` whose plan flips between a
scan and an index search.
2. **`CHECK` constraints (`C3`) need an expression
grammar.** A `CHECK (Age >= 0 AND Age < 150)` is the
same expression problem. A throwaway mini-grammar for
`CHECK` plus a second one for `WHERE` would be waste;
this ADR builds the grammar once.
3. **The grammar architecture must grow to host it.** The
unified walker grammar (ADR-0023 / ADR-0024) is a
non-recursive trie of `&'static` `Node`s, and its parse
output (`MatchedPath`) is a flat list of matched
terminals. A `WHERE` expression is recursive and its
shape is data-dependent — neither fits as the grammar
stands.
### The architectural problem, precisely
A boolean expression — `a = 1 AND (b > 2 OR c LIKE 'x%')`
— is recursive (a parenthesised group is itself an
expression) and carries operator precedence. Two facts
about the current walker:
- **The `Node` tree is acyclic.** Every combinator
references its children through `&'static [Node]` /
`&'static Node`, and the registry lives in `const`s. A
`const` cannot refer to itself, so no node can close a
cycle. `Optional` and `Repeated` already hold a
`&'static Node` *reference* — recursion through a
reference is expressible if the fragment is a named
`static` — but `Seq` and `Choice` embed their children
*by value* in a slice, and a cyclic value has no finite
representation.
- **`MatchedPath` is flat.** It is a `Vec<MatchedItem>` of
matched *terminals* in source order; the combinators
shape the order but record no grouping. For every
command today that is enough: each command's shape is a
fixed template, so the AST builder reads terminals by
position or by role. A recursive expression has no fixed
template — `where a = 1` and
`where (a=1 or b=2) and c=3` have different shapes — so a
flat terminal list cannot be rebuilt into the expression
tree without parsing it a second time.
Left recursion is a third fact, and it is a property of
parsing technique rather than of this codebase: a
top-down walker cannot consume a rule whose leftmost
symbol is the rule itself (`expr := expr OP expr` recurses
without consuming input). The standard remedy — a
stratified, left-factored grammar — is adopted below.
## Decision
### 1. The expression grammar
The `WHERE` expression is a stratified grammar — one layer
per precedence tier. Stratification removes left recursion
(every recursion is guarded by a token) and encodes
operator precedence in the layering, so there is no
separate precedence-resolution step.
```
or_expr := and_expr ( OR and_expr )*
and_expr := not_expr ( AND not_expr )*
not_expr := NOT not_expr | bool_primary
bool_primary := ( or_expr ) | predicate
predicate := operand cmp_op operand
| operand [ NOT ] LIKE operand
| operand [ NOT ] BETWEEN operand AND operand
| operand [ NOT ] IN ( operand [ , operand ]* )
| operand IS [ NOT ] NULL
operand := literal | column_ref
cmp_op := = | != | <> | < | <= | > | >=
```
- **Operator set:** the six comparisons, with both `!=`
and `<>` accepted (`<>` is standard SQL, `!=` the common
variant — the engine accepts both); `AND` / `OR` / `NOT`;
parentheses; `LIKE` with `%` / `_` wildcards;
`IS NULL` / `IS NOT NULL`; `IN`; `BETWEEN`. `LIKE`,
`IN`, and `BETWEEN` take an optional infix `NOT`,
mirroring `IS NOT NULL`.
- **Operands are a column reference or a literal** — not a
nested expression. Parentheses group *boolean*
sub-expressions (`bool_primary`), not comparison
operands. A bare column reference is not a boolean
expression: a predicate always has an operator (write
`Active = true`, not `Active`).
- The only recursion is `( or_expr )` and `NOT not_expr`;
each consumes a token (`(` or `NOT`) before recursing,
so the greedy top-down walker always makes progress.
- **Nesting depth is capped at 64.** Hand-written `WHERE`
clauses do not approach this; the cap exists only so
pathological input (`((((…))))`) yields a friendly
*"expression nested too deeply (limit 64)"* error rather
than a stack overflow.
The grammar is deliberately a subset of standard SQL's
`WHERE` syntax, so a learner's knowledge transfers
directly when advanced-mode SQL (`Q1`) lands.
### 2. Grammar architecture: a reference-following node
`Seq` / `Choice` embed children by value and cannot hold a
cyclic node. One new `Node` variant closes the gap:
```rust
/// Walks the referenced node once, mandatory. Because the
/// reference is a `&'static Node`, a named `static`
/// fragment may appear inside its own subtree — the
/// mechanism that lets the expression grammar recurse.
/// (ADR-0023 sketched this as `SubgrammarRef`.)
Subgrammar(&'static Node),
```
`Subgrammar` is the static counterpart of the existing
`DynamicSubgrammar` (a walk-time factory). The expression
grammar's tiers are declared as named `static` items;
`bool_primary`'s `( or_expr )` branch reaches `or_expr`
through `Subgrammar(&OR_EXPR)`, and `not_expr` reaches
itself the same way. The walker gains one match arm — walk
the referenced node once — plus the depth counter for the
§1 cap.
The expression grammar is one fragment, referenced by
`update`, `delete`, and `show data` alike — defined once.
### 3. The expression result — built selectively
`MatchedPath` — the walker's flat list of matched
terminals — is left unchanged. The recursive structure
lives only where it is needed: inside the expression.
The expression grammar fragment carries its own
*AST-fragment builder*. As the walker recurses through the
stratified tiers, that builder runs — the walker's
recursion *is* the precedence-correct tree, so the builder
assembles a nested `Expr` directly, with no second parse
and no separate precedence pass. The finished `Expr` is
carried as a single item in the otherwise-unchanged flat
`MatchedPath` (`MatchedKind` gains one variant to hold a
built expression).
Every existing command builder is therefore genuinely
untouched — the flat path it reads is exactly as before.
`update` / `delete` / `show data` take that one expression
item and read its `Expr`.
This is the "selectively if necessary" option: the parser
gains structured output exactly where the grammar is
recursive, and nowhere else. A *system-wide* hierarchical
`MatchedPath` was considered and rejected — it would
record group structure for every command while only the
expression consumed it, leaving the non-expression
grouping computed but unread, and so untested. The general
"a grammar fragment may carry a builder" mechanism
introduced here is exercised by its one user; nothing is
recorded that nothing reads.
### 4. The `Expr` AST
A new recursive expression AST joins the command AST:
```rust
pub enum Expr {
Or(Vec<Expr>),
And(Vec<Expr>),
Not(Box<Expr>),
Predicate(Predicate),
}
pub enum Predicate {
Compare { left: Operand, op: CompareOp, right: Operand },
Like { target: Operand, pattern: Operand, negated: bool },
Between { target: Operand, low: Operand, high: Operand, negated: bool },
In { target: Operand, items: Vec<Operand>, negated: bool },
IsNull { target: Operand, negated: bool },
}
pub enum Operand { Column(String), Literal(Value) }
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }
```
`Or` / `And` are n-ary — a flat `a AND b AND c` is one
`And` of three. Single-child tiers collapse: a `predicate`
reached through the `or → and → not` layers with no
connective is just that `Predicate`, not three wrappers.
`RowFilter` changes from
```rust
RowFilter::Where { column: String, value: Value }
```
to
```rust
RowFilter::Where(Expr)
```
for `update` / `delete`. `show data` carries
`filter: Option<Expr>` and `limit: Option<u64>`.
### 5. The commands
```
update <T> set <assignments> ( where <expr> | --all-rows )
delete from <T> ( where <expr> | --all-rows )
show data <T> [ where <expr> ] [ limit <n> ]
```
- `update` / `delete` keep ADR-0014's mandatory
where-or-`--all-rows` choice; a complex expression
satisfies the `where` side.
- **`show data` gains an optional `where`.** Reading every
row stays the safe default for a read, so no `--all-rows`
opt-in is needed there — the clause is simply optional.
- **`show data` gains an optional `limit <n>`** (`<n>` a
non-negative integer). When `limit` is present the query
is implicitly ordered by the table's primary key, so
`limit 20` is a stable "first 20 by primary key" rather
than an arbitrary subset — every table created through
the DSL has a primary key. Explicit `order by` is out of
scope (§10).
### 6. SQL generation
The `Expr` is compiled to a parameterised SQL `WHERE`
string:
- Every literal becomes a `?` placeholder bound as a
parameter — never spliced into the SQL text.
Identifiers are `quote_ident`-quoted.
- A literal compared against a column is converted to that
column's storage representation through the existing
`bind_for_column` path, exactly as the current
`where col = val` does.
- Connectives, `NOT`, and parentheses are emitted from the
tree structure.
- `limit` emits `LIMIT ?` with the bound count, plus the
implicit `ORDER BY` over the primary-key column(s) (§5).
The application never evaluates the expression itself —
the database does, and re-derives precedence from the
operators. The expression is "passed through" only in
that sense; the raw user text is never forwarded.
### 7. Type handling — permissive and advisory
A type mismatch in a comparison is **flagged, not
blocked**. This matches the app's ambient-assistance
posture (ADR-0022): the tool indicates problems, it does
not refuse input.
- A literal in `column OP literal` is type-checked against
the column. When the types are compatible the literal is
converted and bound per the column's type (§6).
- When they are **not** compatible — `Name > 5` on a text
column, `Age LIKE '5%'` on an int column — the mismatch
is surfaced through the existing highlight and hint
channels as an error-class annotation, but the command
still parses, still submits, and still runs. The literal
binds by its own syntactic type and the database's
comparison rules take over — which is precisely the
behaviour a learner is experimenting to observe.
- **This relaxes current behaviour.** Today
`bind_for_column` *rejects* a type-mismatched `WHERE`
literal; under this ADR it does not. The relaxation is
scoped to `WHERE` comparisons. Writes (`insert`,
`update … set`) stay strict: STRICT storage genuinely
cannot hold a mistyped value, so a mistyped write is a
real error, not an experiment.
- **`= NULL` / `!= NULL`** is a specific flagged case. It
is valid syntax that almost never does what the user
intends (in SQL it is never true). The walker
special-cases a comparison whose operator is `=` / `!=`
and whose operand is the `NULL` literal: it is
highlighted as an error, and the hint points at
`IS NULL` / `IS NOT NULL`. As with type mismatches it
still runs if submitted — a learner who wants to see
what `x = NULL` does may.
Always-on submit-time signalling of flagged-but-runnable
input (an `(INVALID)` / `WARNING` marker at the input
field's edge) is a separate, general concern — see §10.
### 8. Completion, hints, highlighting
Because the expression is parsed *in-grammar* — not handed
to an opaque sub-parser — the ambient-assistance machinery
(ADR-0022) works inside an expression with no separate
implementation:
- column-name completion resolves against the command's
table;
- value positions carry per-type hints, as they do for the
current `where col = val`;
- operator keywords (`and`, `or`, `like`, `between`, …)
surface as completion candidates where the grammar
allows them;
- syntax highlighting walks the same tree.
Type-mismatch and `= NULL` flagging (§7) surface through
these same highlight and hint channels.
### 9. Errors
Parse errors continue to route through the existing
`ParseError` shape, so ADR-0021's per-command usage help
and the hint panel keep working for the new clauses. The
depth-cap breach (§1) is a friendly error of the same
kind.
### 10. Out of scope
- **`ORDER BY`.** `limit` uses implicit primary-key
ordering for determinism; explicit `order by` is a clean
future addition, tracked separately.
- **`LIMIT … OFFSET`**, and `limit` anywhere other than
`show data`.
- **Operands beyond a column or a literal** — arithmetic
(`a + b`), string concatenation, scalar functions,
subqueries, `EXISTS`. The playground's `WHERE` compares
columns and literals.
- **A bare column as a boolean** (`where Active`).
- **The input-field validity indicator.** An always-
visible `(INVALID)` / `WARNING` marker at the edge of
the input field — signalling, before submit, that the
current input would error or is flagged — is a general
feature spanning every command, not just `WHERE`. It
gets its own small ADR; this ADR defines only *what*
inside a `WHERE` is flagged. The indicator is to carry
two severities: a hard `ERROR` / `INVALID` (the input
cannot run) and a softer `WARNING` (it runs but is
probably not intended — type mismatches, `= NULL`).
- **`CHECK` constraints.** The constraints ADR (`C3`) will
reuse this expression grammar; it is not built here.
## Consequences
- `C5a` is satisfied; `show data` gains filtering and
`limit` (advancing `C5` / `V5`); `QA1` is unblocked; the
future `CHECK` constraint has an expression grammar to
reuse.
- The grammar gains one node variant (`Subgrammar`) and a
recursion-depth counter.
- `MatchedPath` is unchanged; the expression fragment
carries an AST-builder that produces an `Expr`, carried
as a single matched item. Existing command builders are
untouched.
- A new recursive `Expr` AST joins the command AST;
`RowFilter` changes from `Where { column, value }` to
`Where(Expr)`.
- Type-mismatched `WHERE` comparisons change from
*refused* to *flagged but runnable* — a deliberate,
scoped behaviour change (§7).
- Old `history.log` lines using the previous
`where col = val` form remain valid — that form is a
strict subset of the new grammar — so `replay` is
unaffected. (Not a design driver; noted.)
- Forward-look toward `Q1`: advanced-mode SQL will be
parsed by `sqlparser-rs`, a separate parser, so this
`Expr` AST is not literally shared with it. The value of
the DSL expression being a SQL subset is pedagogical —
learner knowledge transfers — not code reuse.
## Implementation notes
A sensible build order, each step guarded by the test
suite and the typing-surface matrix:
1. The `Subgrammar` node and the recursion-depth counter
— the walker capability for a recursive fragment.
`MatchedPath` is unchanged. No user-visible change.
2. The expression grammar fragment, the `Expr` AST, and
the fragment-builder the walker invokes to produce the
`Expr`.
3. Wire the fragment into `update` / `delete` (replacing
the old `where`) and into `show data` (new `where`,
new `limit`).
4. `Expr` → parameterised SQL generation; the implicit
primary-key `ORDER BY` for `limit`.
5. Schema-aware type-mismatch and `= NULL` flagging in the
walker.
6. Typing-surface matrix cells for the new surface.
### As-built notes (2026-05-18)
Steps 14 are implemented and committed; step 5 (the §7
diagnostic flagging) is deferred — see below. Realization
choices, and where they deviate from the design sketch above:
- **§3 builder — option 1 ("reconstruct in builder"),**
chosen with the project owner before implementation. The
stratified grammar is walked normally; its terminals flow
into the flat `MatchedPath` unchanged (driving highlight /
completion / the expected-set). `grammar::expr::build_expr`
then folds that flat terminal slice into the `Expr` — a
deterministic recursive descent mirroring the grammar
tiers, run only at submit-time dispatch, never per
keystroke. Two honest deviations from the §3 wording:
- **No `MatchedKind::Expr` variant.** `MatchedPath` stays
purely terminals — arguably more faithful to "MatchedPath
stays flat" than carrying a built `Expr` in it. The
`Expr` is assembled in the command `ast_builder`s
(`build_update` / `build_delete` / `build_show`), which
already reconstruct structured `Command`s from the flat
path; `build_expr` is the same pattern, one tier deeper.
- **There is a second structural pass** over the
expression tokens, scoped to submit-time dispatch. "No
second parse" is read as "no separate parser framework":
the walk validates and drives assistance, `build_expr`
is the single `ast_builder` for the fragment — the same
category as `build_insert`.
- **Grammar shape.** `predicate` is factored as `operand
predicate_tail` (shared operand prefix), and the infix
`NOT` is factored in front of the `LIKE` / `BETWEEN` /
`IN` choice — so the walker's first-commit-wins `Choice`
semantics discriminate branches on a cleanly-failing first
token.
- **`Subgrammar` depth.** `MAX_SUBGRAMMAR_DEPTH = 64` counts
active `Subgrammar` recursion frames. The stratified
grammar descends ~45 frames per parenthesis level, so the
effective parenthesis-nesting limit is roughly a dozen —
far past any hand-written filter; the cap is purely a
stack-overflow guard.
- **§8 hints.** The expression's right-hand operands resolve
through a schema-aware `DynamicSubgrammar` (`where_rhs_
operand`) so the hint panel narrows to the compared
column's type, exactly as the pre-ADR `where col = val`
slot did. The operand grammar carries no validators —
permissive per §7.
- **Step 5 — done, as part of ADR-0027.** The §7
*behaviour* relaxation landed with steps 14:
`bind_where_literal` binds a type-mismatched WHERE literal
by its syntactic shape, and the pre-ADR bind-time
rejection is gone. The §7 *diagnostic flagging* — a
type-mismatched comparison or `= NULL` as a flagged
finding — was folded into ADR-0027's walker
diagnostics-severity model rather than built as a
standalone mechanism (ADR-0027's WARNING severity is
defined to have "no triggers until ADR-0026 is
implemented" — those triggers are exactly these). It
surfaces as a `Severity::Warning` `Diagnostic`, computed
post-walk from the built `Expr` against the table's
column types — see ADR-0027.
## See also
- ADR-0009 — DSL command syntax conventions (`--` flags,
keyword clauses).
- ADR-0014 — data operations, the `Value` model,
`bind_for_column`, the mandatory where-or-`--all-rows`
rule, auto-show.
- ADR-0021 — the parser as source of truth for H1a
parse-error help.
- ADR-0022 — ambient typing assistance: the highlight,
hint, and completion machinery the expression plugs
into.
- ADR-0023 / ADR-0024 — the unified grammar tree this
extends; ADR-0023 sketched the `SubgrammarRef` node
realised here as `Subgrammar`.
- ADR-0025 — indexes; the reason `QA1`, and thus a
filtered query, is now worthwhile.