From 6e42a118a32962fb395b4830f22d66f3055b58fc Mon Sep 17 00:00:00 2001 From: "claude@clouddev1" Date: Mon, 18 May 2026 10:34:12 +0000 Subject: [PATCH] =?UTF-8?q?docs:=20add=20ADR-0026=20=E2=80=94=20complex=20?= =?UTF-8?q?WHERE=20expressions?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The C5a design: a stratified, recursive WHERE-expression grammar (AND/OR/NOT, comparisons, LIKE, IS NULL, IN, BETWEEN) for update / delete / show-data filters; show data gains optional `where` and `limit`. Adds the `Subgrammar` reference-following grammar node and a recursive `Expr` AST, built selectively for the expression fragment. - docs/adr/0026-complex-where-expressions.md — the ADR. - docs/adr/README.md — index entry. - docs/simple-mode-limitations.md — new running list of simple-mode query boundaries vs. advanced SQL, seeded from ADR-0026. - docs/requirements.md — C5a [~] -> [ ] (designed, not yet implemented); new Documentation section with DOC1. --- docs/adr/0026-complex-where-expressions.md | 435 +++++++++++++++++++++ docs/adr/README.md | 1 + docs/requirements.md | 21 +- docs/simple-mode-limitations.md | 40 ++ 4 files changed, 493 insertions(+), 4 deletions(-) create mode 100644 docs/adr/0026-complex-where-expressions.md create mode 100644 docs/simple-mode-limitations.md diff --git a/docs/adr/0026-complex-where-expressions.md b/docs/adr/0026-complex-where-expressions.md new file mode 100644 index 0000000..f1e5cb6 --- /dev/null +++ b/docs/adr/0026-complex-where-expressions.md @@ -0,0 +1,435 @@ +# ADR-0026: Complex WHERE expressions + +## Status + +Accepted + +## Context + +The requirements checklist commits, in `C5a`, to *complex +WHERE expressions* — `AND` / `OR`, comparison operators, +`LIKE` — for the `update`, `delete`, and `show data` row +filters. It is described there as the bridge from DSL +fluency toward real SQL. Today the DSL is well short of +that: + +- The only filter the DSL parses is a single + `where = ` equality. 
There is no `AND` / `OR`, + no operator other than `=`, no `LIKE`, no `IS NULL`, no + parentheses. +- `update` and `delete` carry that filter as + `RowFilter::Where { column, value }`. `show data ` + carries **no filter at all** — it always selects every + row. +- The `Value` AST is purely syntactic (`Number`, `Text`, + `Bool`, `Null`); per-column type handling happens at + bind time in `db.rs`. + +Three things make this the right moment, and shape the +decision: + +1. **`QA1` (`EXPLAIN QUERY PLAN`) needs a filtered query.** + An unfiltered `SELECT * FROM t` always plans as a full + scan; an index can never appear. QA1's pedagogical + payoff depends on a `WHERE` whose plan flips between a + scan and an index search. +2. **`CHECK` constraints (`C3`) need an expression + grammar.** A `CHECK (Age >= 0 AND Age < 150)` is the + same expression problem. A throwaway mini-grammar for + `CHECK` plus a second one for `WHERE` would be waste; + this ADR builds the grammar once. +3. **The grammar architecture must grow to host it.** The + unified walker grammar (ADR-0023 / ADR-0024) is a + non-recursive trie of `&'static` `Node`s, and its parse + output (`MatchedPath`) is a flat list of matched + terminals. A `WHERE` expression is recursive and its + shape is data-dependent — neither fits as the grammar + stands. + +### The architectural problem, precisely + +A boolean expression — `a = 1 AND (b > 2 OR c LIKE 'x%')` +— is recursive (a parenthesised group is itself an +expression) and carries operator precedence. Two facts +about the current walker: + +- **The `Node` tree is acyclic.** Every combinator + references its children through `&'static [Node]` / + `&'static Node`, and the registry lives in `const`s. A + `const` cannot refer to itself, so no node can close a + cycle. 
`Optional` and `Repeated` already hold a + `&'static Node` *reference* — recursion through a + reference is expressible if the fragment is a named + `static` — but `Seq` and `Choice` embed their children + *by value* in a slice, and a cyclic value has no finite + representation. +- **`MatchedPath` is flat.** It is a `Vec` of + matched *terminals* in source order; the combinators + shape the order but record no grouping. For every + command today that is enough: each command's shape is a + fixed template, so the AST builder reads terminals by + position or by role. A recursive expression has no fixed + template — `where a = 1` and + `where (a=1 or b=2) and c=3` have different shapes — so a + flat terminal list cannot be rebuilt into the expression + tree without parsing it a second time. + +Left recursion is a third fact, and it is a property of +parsing technique rather than of this codebase: a +top-down walker cannot consume a rule whose leftmost +symbol is the rule itself (`expr := expr OP expr` recurses +without consuming input). The standard remedy — a +stratified, left-factored grammar — is adopted below. + +## Decision + +### 1. The expression grammar + +The `WHERE` expression is a stratified grammar — one layer +per precedence tier. Stratification removes left recursion +(every recursion is guarded by a token) and encodes +operator precedence in the layering, so there is no +separate precedence-resolution step. 
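As a rough illustration of why layering alone yields precedence, the tiers can be sketched as a hand-written recursive-descent parser. This is a minimal sketch under stated assumptions, not the project's walker: `Tok`, `Tree`, `Parser`, and `parse` are hypothetical names, and a predicate is reduced to a single opaque `Atom` token. Each tier only calls the tier below it, and the two recursive calls are each guarded by a consumed token (`NOT` or `(`), so no rule can recurse without making progress.

```rust
// Hypothetical, simplified stand-ins; not the project's `Node` walker or `Expr` AST.
#[derive(Debug, Clone, PartialEq)]
enum Tok { Or, And, Not, LParen, RParen, Atom(String) }

#[derive(Debug, PartialEq)]
enum Tree {
    Or(Vec<Tree>),
    And(Vec<Tree>),
    Not(Box<Tree>),
    Atom(String),
}

struct Parser<'a> { toks: &'a [Tok], pos: usize }

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    /// Consumes the next token if it equals `t`.
    fn eat(&mut self, t: &Tok) -> bool {
        if self.peek() == Some(t) { self.pos += 1; true } else { false }
    }

    // or_expr := and_expr ( OR and_expr )*
    fn or_expr(&mut self) -> Result<Tree, String> {
        let mut items = vec![self.and_expr()?];
        while self.eat(&Tok::Or) { items.push(self.and_expr()?); }
        // A tier with a single child collapses to that child.
        Ok(if items.len() == 1 { items.pop().unwrap() } else { Tree::Or(items) })
    }

    // and_expr := not_expr ( AND not_expr )*
    fn and_expr(&mut self) -> Result<Tree, String> {
        let mut items = vec![self.not_expr()?];
        while self.eat(&Tok::And) { items.push(self.not_expr()?); }
        Ok(if items.len() == 1 { items.pop().unwrap() } else { Tree::And(items) })
    }

    // not_expr := NOT not_expr | primary   (recursion guarded by NOT)
    fn not_expr(&mut self) -> Result<Tree, String> {
        if self.eat(&Tok::Not) {
            return Ok(Tree::Not(Box::new(self.not_expr()?)));
        }
        self.primary()
    }

    // primary := ( or_expr ) | atom        (recursion guarded by `(`)
    fn primary(&mut self) -> Result<Tree, String> {
        if self.eat(&Tok::LParen) {
            let inner = self.or_expr()?;
            if !self.eat(&Tok::RParen) { return Err("expected ')'".to_string()); }
            return Ok(inner);
        }
        match self.peek().cloned() {
            Some(Tok::Atom(name)) => { self.pos += 1; Ok(Tree::Atom(name)) }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}

/// Parses a whole token slice, rejecting trailing input.
fn parse(toks: &[Tok]) -> Result<Tree, String> {
    let mut p = Parser { toks, pos: 0 };
    let tree = p.or_expr()?;
    if p.pos != toks.len() { return Err("trailing input".to_string()); }
    Ok(tree)
}
```

Feeding the tokens for `a OR b AND c` to `parse` yields an `Or` whose second child is an `And` of `b` and `c`: `AND` binds tighter only because `and_expr` sits one tier below `or_expr`, with no precedence table involved.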
+ +``` +or_expr := and_expr ( OR and_expr )* +and_expr := not_expr ( AND not_expr )* +not_expr := NOT not_expr | bool_primary +bool_primary := ( or_expr ) | predicate +predicate := operand cmp_op operand + | operand [ NOT ] LIKE operand + | operand [ NOT ] BETWEEN operand AND operand + | operand [ NOT ] IN ( operand [ , operand ]* ) + | operand IS [ NOT ] NULL +operand := literal | column_ref +cmp_op := = | != | <> | < | <= | > | >= +``` + +- **Operator set:** the six comparisons, with both `!=` + and `<>` accepted (`<>` is standard SQL, `!=` the common + variant — the engine accepts both); `AND` / `OR` / `NOT`; + parentheses; `LIKE` with `%` / `_` wildcards; + `IS NULL` / `IS NOT NULL`; `IN`; `BETWEEN`. `LIKE`, + `IN`, and `BETWEEN` take an optional infix `NOT`, + mirroring `IS NOT NULL`. +- **Operands are a column reference or a literal** — not a + nested expression. Parentheses group *boolean* + sub-expressions (`bool_primary`), not comparison + operands. A bare column reference is not a boolean + expression: a predicate always has an operator (write + `Active = true`, not `Active`). +- The only recursion is `( or_expr )` and `NOT not_expr`; + each consumes a token (`(` or `NOT`) before recursing, + so the greedy top-down walker always makes progress. +- **Nesting depth is capped at 64.** Hand-written `WHERE` + clauses do not approach this; the cap exists only so + pathological input (`((((…))))`) yields a friendly + *"expression nested too deeply (limit 64)"* error rather + than a stack overflow. + +The grammar is deliberately a subset of standard SQL's +`WHERE` syntax, so a learner's knowledge transfers +directly when advanced-mode SQL (`Q1`) lands. + +### 2. Grammar architecture: a reference-following node + +`Seq` / `Choice` embed children by value and cannot hold a +cyclic node. One new `Node` variant closes the gap: + +```rust +/// Walks the referenced node once, mandatory. 
Because the +/// reference is a `&'static Node`, a named `static` +/// fragment may appear inside its own subtree — the +/// mechanism that lets the expression grammar recurse. +/// (ADR-0023 sketched this as `SubgrammarRef`.) +Subgrammar(&'static Node), +``` + +`Subgrammar` is the static counterpart of the existing +`DynamicSubgrammar` (a walk-time factory). The expression +grammar's tiers are declared as named `static` items; +`bool_primary`'s `( or_expr )` branch reaches `or_expr` +through `Subgrammar(&OR_EXPR)`, and `not_expr` reaches +itself the same way. The walker gains one match arm — walk +the referenced node once — plus the depth counter for the +§1 cap. + +The expression grammar is one fragment, referenced by +`update`, `delete`, and `show data` alike — defined once. + +### 3. The expression result — built selectively + +`MatchedPath` — the walker's flat list of matched +terminals — is left unchanged. The recursive structure +lives only where it is needed: inside the expression. + +The expression grammar fragment carries its own +*AST-fragment builder*. As the walker recurses through the +stratified tiers, that builder runs — the walker's +recursion *is* the precedence-correct tree, so the builder +assembles a nested `Expr` directly, with no second parse +and no separate precedence pass. The finished `Expr` is +carried as a single item in the otherwise-unchanged flat +`MatchedPath` (`MatchedKind` gains one variant to hold a +built expression). + +Every existing command builder is therefore genuinely +untouched — the flat path it reads is exactly as before. +`update` / `delete` / `show data` take that one expression +item and read its `Expr`. + +This is the "selectively if necessary" option: the parser +gains structured output exactly where the grammar is +recursive, and nowhere else. 
A *system-wide* hierarchical +`MatchedPath` was considered and rejected — it would +record group structure for every command while only the +expression consumed it, leaving the non-expression +grouping computed but unread, and so untested. The general +"a grammar fragment may carry a builder" mechanism +introduced here is exercised by its one user; nothing is +recorded that nothing reads. + +### 4. The `Expr` AST + +A new recursive expression AST joins the command AST: + +```rust +pub enum Expr { + Or(Vec<Expr>), + And(Vec<Expr>), + Not(Box<Expr>), + Predicate(Predicate), +} + +pub enum Predicate { + Compare { left: Operand, op: CompareOp, right: Operand }, + Like { target: Operand, pattern: Operand, negated: bool }, + Between { target: Operand, low: Operand, high: Operand, negated: bool }, + In { target: Operand, items: Vec<Operand>, negated: bool }, + IsNull { target: Operand, negated: bool }, +} + +pub enum Operand { Column(String), Literal(Value) } +pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq } +``` + +`Or` / `And` are n-ary — a flat `a AND b AND c` is one +`And` of three. Single-child tiers collapse: a `predicate` +reached through the `or → and → not` layers with no +connective is just that `Predicate`, not three wrappers. + +`RowFilter` changes from + +```rust +RowFilter::Where { column: String, value: Value } +``` + +to + +```rust +RowFilter::Where(Expr) +``` + +for `update` / `delete`. `show data` carries +`filter: Option<Expr>` and `limit: Option`. + +### 5. The commands + +``` +update set ( where | --all-rows ) +delete from ( where | --all-rows ) +show data [ where ] [ limit ] +``` + +- `update` / `delete` keep ADR-0014's mandatory + where-or-`--all-rows` choice; a complex expression + satisfies the `where` side. +- **`show data` gains an optional `where`.** Reading every + row stays the safe default for a read, so no `--all-rows` + opt-in is needed there — the clause is simply optional. +- **`show data` gains an optional `limit `** (`` a + non-negative integer). 
When `limit` is present the query + is implicitly ordered by the table's primary key, so + `limit 20` is a stable "first 20 by primary key" rather + than an arbitrary subset — every table created through + the DSL has a primary key. Explicit `order by` is out of + scope (§10). + +### 6. SQL generation + +The `Expr` is compiled to a parameterised SQL `WHERE` +string: + +- Every literal becomes a `?` placeholder bound as a + parameter — never spliced into the SQL text. + Identifiers are `quote_ident`-quoted. +- A literal compared against a column is converted to that + column's storage representation through the existing + `bind_for_column` path, exactly as the current + `where col = val` does. +- Connectives, `NOT`, and parentheses are emitted from the + tree structure. +- `limit` emits `LIMIT ?` with the bound count, plus the + implicit `ORDER BY` over the primary-key column(s) (§5). + +The application never evaluates the expression itself — +the database does, and re-derives precedence from the +operators. The expression is "passed through" only in +that sense; the raw user text is never forwarded. + +### 7. Type handling — permissive and advisory + +A type mismatch in a comparison is **flagged, not +blocked**. This matches the app's ambient-assistance +posture (ADR-0022): the tool indicates problems, it does +not refuse input. + +- A literal in `column OP literal` is type-checked against + the column. When the types are compatible the literal is + converted and bound per the column's type (§6). +- When they are **not** compatible — `Name > 5` on a text + column, `Age LIKE '5%'` on an int column — the mismatch + is surfaced through the existing highlight and hint + channels as an error-class annotation, but the command + still parses, still submits, and still runs. The literal + binds by its own syntactic type and the database's + comparison rules take over — which is precisely the + behaviour a learner is experimenting to observe. 
+- **This relaxes current behaviour.** Today + `bind_for_column` *rejects* a type-mismatched `WHERE` + literal; under this ADR it does not. The relaxation is + scoped to `WHERE` comparisons. Writes (`insert`, + `update … set`) stay strict: STRICT storage genuinely + cannot hold a mistyped value, so a mistyped write is a + real error, not an experiment. +- **`= NULL` / `!= NULL`** is a specific flagged case. It + is valid syntax that almost never does what the user + intends (in SQL it is never true). The walker + special-cases a comparison whose operator is `=` / `!=` + and whose operand is the `NULL` literal: it is + highlighted as an error, and the hint points at + `IS NULL` / `IS NOT NULL`. As with type mismatches it + still runs if submitted — a learner who wants to see + what `x = NULL` does may. + +Always-on submit-time signalling of flagged-but-runnable +input (an `(INVALID)` / `WARNING` marker at the input +field's edge) is a separate, general concern — see §10. + +### 8. Completion, hints, highlighting + +Because the expression is parsed *in-grammar* — not handed +to an opaque sub-parser — the ambient-assistance machinery +(ADR-0022) works inside an expression with no separate +implementation: + +- column-name completion resolves against the command's + table; +- value positions carry per-type hints, as they do for the + current `where col = val`; +- operator keywords (`and`, `or`, `like`, `between`, …) + surface as completion candidates where the grammar + allows them; +- syntax highlighting walks the same tree. + +Type-mismatch and `= NULL` flagging (§7) surface through +these same highlight and hint channels. + +### 9. Errors + +Parse errors continue to route through the existing +`ParseError` shape, so ADR-0021's per-command usage help +and the hint panel keep working for the new clauses. The +depth-cap breach (§1) is a friendly error of the same +kind. + +### 10. 
Out of scope + +- **`ORDER BY`.** `limit` uses implicit primary-key + ordering for determinism; explicit `order by` is a clean + future addition, tracked separately. +- **`LIMIT … OFFSET`**, and `limit` anywhere other than + `show data`. +- **Operands beyond a column or a literal** — arithmetic + (`a + b`), string concatenation, scalar functions, + subqueries, `EXISTS`. The playground's `WHERE` compares + columns and literals. +- **A bare column as a boolean** (`where Active`). +- **The input-field validity indicator.** An always- + visible `(INVALID)` / `WARNING` marker at the edge of + the input field — signalling, before submit, that the + current input would error or is flagged — is a general + feature spanning every command, not just `WHERE`. It + gets its own small ADR; this ADR defines only *what* + inside a `WHERE` is flagged. The indicator is to carry + two severities: a hard `ERROR` / `INVALID` (the input + cannot run) and a softer `WARNING` (it runs but is + probably not intended — type mismatches, `= NULL`). +- **`CHECK` constraints.** The constraints ADR (`C3`) will + reuse this expression grammar; it is not built here. + +## Consequences + +- `C5a` is satisfied; `show data` gains filtering and + `limit` (advancing `C5` / `V5`); `QA1` is unblocked; the + future `CHECK` constraint has an expression grammar to + reuse. +- The grammar gains one node variant (`Subgrammar`) and a + recursion-depth counter. +- `MatchedPath` is unchanged; the expression fragment + carries an AST-builder that produces an `Expr`, carried + as a single matched item. Existing command builders are + untouched. +- A new recursive `Expr` AST joins the command AST; + `RowFilter` changes from `Where { column, value }` to + `Where(Expr)`. +- Type-mismatched `WHERE` comparisons change from + *refused* to *flagged but runnable* — a deliberate, + scoped behaviour change (§7). 
+- Old `history.log` lines using the previous + `where col = val` form remain valid — that form is a + strict subset of the new grammar — so `replay` is + unaffected. (Not a design driver; noted.) +- Forward-look toward `Q1`: advanced-mode SQL will be + parsed by `sqlparser-rs`, a separate parser, so this + `Expr` AST is not literally shared with it. The value of + the DSL expression being a SQL subset is pedagogical — + learner knowledge transfers — not code reuse. + +## Implementation notes + +A sensible build order, each step guarded by the test +suite and the typing-surface matrix: + +1. The `Subgrammar` node and the recursion-depth counter + — the walker capability for a recursive fragment. + `MatchedPath` is unchanged. No user-visible change. +2. The expression grammar fragment, the `Expr` AST, and + the fragment-builder the walker invokes to produce the + `Expr`. +3. Wire the fragment into `update` / `delete` (replacing + the old `where`) and into `show data` (new `where`, + new `limit`). +4. `Expr` → parameterised SQL generation; the implicit + primary-key `ORDER BY` for `limit`. +5. Schema-aware type-mismatch and `= NULL` flagging in the + walker. +6. Typing-surface matrix cells for the new surface. + +## See also + +- ADR-0009 — DSL command syntax conventions (`--` flags, + keyword clauses). +- ADR-0014 — data operations, the `Value` model, + `bind_for_column`, the mandatory where-or-`--all-rows` + rule, auto-show. +- ADR-0021 — the parser as source of truth for H1a + parse-error help. +- ADR-0022 — ambient typing assistance: the highlight, + hint, and completion machinery the expression plugs + into. +- ADR-0023 / ADR-0024 — the unified grammar tree this + extends; ADR-0023 sketched the `SubgrammarRef` node + realised here as `Subgrammar`. +- ADR-0025 — indexes; the reason `QA1`, and thus a + filtered query, is now worthwhile. 
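The SQL generation described in §6 can be sketched as a small recursive function over the `Expr` tree. This is an illustrative sketch under stated assumptions, not the project's implementation: the types are trimmed copies of the §4 AST (only `Compare` and `IsNull` predicates), the payload types of `Value` are guesses, `quote_ident` here is a stand-in with the same name as the project's helper, and `compile`, `operand`, `op_sql`, and `join` are hypothetical names. Per-column conversion through `bind_for_column` and the `LIMIT` / `ORDER BY` emission are left out.

```rust
// Trimmed, hypothetical copies of the ADR's types; `Value` payload types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Value { Number(f64), Text(String), Bool(bool), Null }

pub enum Operand { Column(String), Literal(Value) }
pub enum CompareOp { Eq, NotEq, Lt, LtEq, Gt, GtEq }

pub enum Predicate {
    Compare { left: Operand, op: CompareOp, right: Operand },
    IsNull { target: Operand, negated: bool },
}

pub enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Not(Box<Expr>),
    Predicate(Predicate),
}

/// Stand-in for the project's `quote_ident`: wrap in double quotes,
/// doubling any embedded double quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Columns are quoted; literals become `?` and are pushed onto `params`.
fn operand(o: &Operand, params: &mut Vec<Value>) -> String {
    match o {
        Operand::Column(c) => quote_ident(c),
        Operand::Literal(v) => {
            params.push(v.clone());
            "?".to_string()
        }
    }
}

fn op_sql(op: &CompareOp) -> &'static str {
    match op {
        CompareOp::Eq => "=",
        CompareOp::NotEq => "<>",
        CompareOp::Lt => "<",
        CompareOp::LtEq => "<=",
        CompareOp::Gt => ">",
        CompareOp::GtEq => ">=",
    }
}

/// Joins n-ary children with a connective, parenthesising the group.
fn join(items: &[Expr], sep: &str, params: &mut Vec<Value>) -> String {
    let parts: Vec<String> = items.iter().map(|e| compile(e, params)).collect();
    format!("({})", parts.join(sep))
}

/// Emits the WHERE text; bound values accumulate in `params` in placeholder order.
pub fn compile(e: &Expr, params: &mut Vec<Value>) -> String {
    match e {
        Expr::Or(items) => join(items, " OR ", params),
        Expr::And(items) => join(items, " AND ", params),
        Expr::Not(inner) => format!("NOT ({})", compile(inner, params)),
        Expr::Predicate(Predicate::Compare { left, op, right }) => {
            let l = operand(left, params);
            let r = operand(right, params);
            format!("{} {} {}", l, op_sql(op), r)
        }
        Expr::Predicate(Predicate::IsNull { target, negated }) => {
            let t = operand(target, params);
            format!("{} IS {}NULL", t, if *negated { "NOT " } else { "" })
        }
    }
}
```

In this sketch every `Or` / `And` group is parenthesised unconditionally, which is more parentheses than strictly needed but keeps the emitted text unambiguous whatever the nesting. No literal ever reaches the SQL string; only `?` markers do, with the values returned separately for binding.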
diff --git a/docs/adr/README.md b/docs/adr/README.md index fa48402..adde8ad 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -31,3 +31,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0023 — Unified declarative grammar tree](0023-unified-grammar-tree.md) — direction (superseded for execution detail by ADR-0024) - [ADR-0024 — Unified grammar tree: execution plan](0024-unified-grammar-tree-execution-plan.md) — **Accepted**, the executable spec — implemented (Phases A–F; Phase F shipped "minimal", `parser.rs` retained as the router — see the ADR's Phase F implementation note) - [ADR-0025 — Indexes](0025-indexes.md) — **Accepted**, `add index` / `drop index`, persistence, rebuild-table preservation, and items-list display (`C3` index portion + `S2`) +- [ADR-0026 — Complex WHERE expressions](0026-complex-where-expressions.md) — **Accepted**, stratified recursive expression grammar (`AND`/`OR`/`NOT`, comparisons, `LIKE`, `IS NULL`, `IN`, `BETWEEN`) for `update` / `delete` / `show data` filters; `show data` gains `where` + `limit`; adds the `Subgrammar` node and a recursive `Expr` AST (`C5a`) diff --git a/docs/requirements.md b/docs/requirements.md index b10587b..7f035ec 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -151,10 +151,12 @@ group enabled. (Earlier reference points: 1006 after ADR-0024 writes. Bulk insert, complex WHERE expressions, and SELECT in advanced mode are explicitly tracked separately — see C5a below.)* -- [~] **C5a** Complex WHERE expressions (AND/OR/comparison - operators/LIKE) for UPDATE/DELETE/show-data filtering. Tracks - the natural progression from DSL into real SQL fluency that - motivates the playground; design and ADR pending. +- [ ] **C5a** Complex WHERE expressions (AND/OR, comparison + operators, LIKE, IS NULL, IN, BETWEEN) for UPDATE/DELETE/ + show-data filtering; also adds `where` and `limit` to + `show data`. 
Tracks the natural progression from DSL into + real SQL fluency that motivates the playground. Designed in + ADR-0026; implementation pending. ## SQL handling @@ -401,6 +403,17 @@ group enabled. (Earlier reference points: 1006 after ADR-0024 - [~] **TU1** Tutorial / lesson system — design and ADR pending before any implementation. Out of v1 unless an ADR is written. +## Documentation + +- [ ] **DOC1** User- and student-facing reference + documentation under `docs/`: the DSL command surface, + the type system, and the boundaries of simple mode. + `docs/simple-mode-limitations.md` is the first piece — + it doubles as student explanation and as detailed + reference. Distinct from in-app `help` (`H3`), the + interactive tutorial system (`TU1`), and the sharing + recipes under `E2`. + ## Testing (per ADR-0008) - [ ] **TT1** Tier 1: `cargo test` + `proptest` covering diff --git a/docs/simple-mode-limitations.md b/docs/simple-mode-limitations.md new file mode 100644 index 0000000..205eb3c --- /dev/null +++ b/docs/simple-mode-limitations.md @@ -0,0 +1,40 @@ +# Simple-mode query limitations + +Simple mode's DSL query surface is deliberately a *subset* +of SQL. The DSL is a teaching on-ramp; advanced mode (raw +SQL) is the full surface. This document is the running +list of what a simple-mode query cannot express that +advanced-mode SQL can. + +It serves two audiences: + +- **Students** — each entry is the seed of a short + explanation of why the boundary exists and what to use + instead (often: switch to advanced mode). +- **Designers** — the consolidated list feeds the future + `Q4` SQL-subset specification: the inverse view of what + the supported subset deliberately leaves out. + +The list grows as new simple-mode surface lands; each +entry names the ADR that drew the boundary. + +## WHERE expressions (ADR-0026) + +- **Comparison operands are a column or a literal**, not a + nested expression. 
`(a > b) = (c > d)` — comparing two + boolean sub-expressions — cannot be written. Parentheses + group boolean sub-expressions, not comparison operands. +- **A bare column is not a boolean.** A predicate always + has an operator: write `Active = true`, not `Active`. +- **No arithmetic** in expressions (`Price * 1.1`). +- **No string concatenation.** +- **No scalar functions** (`upper(Name)`, `length(x)`, …). +- **No subqueries**, and no `EXISTS`. + +## Query shape (ADR-0026) + +- **No `ORDER BY`.** `show data … limit ` orders + implicitly by the primary key; explicit ordering is not + yet available. +- **No `LIMIT … OFFSET`** — `limit` takes a row count + only.