53808ed9d7bddc8a2e47ed527607645ae3573d12
9 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
53808ed9d7 |
grammar+db: 3e — SQL UPDATE grammar + execution (ADR-0033 §2)
New src/dsl/grammar/sql_update.rs: SQL_UPDATE_SHAPE =
<table> SET col = sql_expr (',' …)* [WHERE sql_expr] [';'], the
__rdbms_* target rejection, and the shared sql_expr on both the
assignment RHS and the predicate. No --all-rows rail — a SQL
UPDATE without WHERE runs as written (ADR-0030 §12). Reuses
sql_select::WHERE_CLAUSE (now pub(crate)) so the predicate
diagnostics are identical. The target uses the shared `table_name`
ident role (not a bespoke one) so the Phase-2 schema-existence and
predicate-warning passes collect it as a scope binding and check
the SET / WHERE columns for free — a bespoke role left them
unchecked (the cross-cut tests caught this).
Command::SqlUpdate { sql, target_table }; Request::RunSqlUpdate +
do_sql_update (execute validated SQL via execute_with_fk_enrichment,
re-persist the target CSV, append history.log). 3e surfaces the
affected-row count only; precise row output is RETURNING (3g), so
the update-success render skips a column-less data set rather than
showing a misleading "(no rows)" band. Behind the dev `sql_update`
entry word until 3j.
Tests: grammar accept/reject; integration (single/multi-col,
no-WHERE all-rows, sql_expr in SET, scalar subquery in SET,
zero-match success, history); walker cross-cut (unknown SET column
→ unknown_column, `= NULL` in WHERE → eq_null warning); app-level
render-guard both ways (column-less → count only; with columns →
table renders). 1524 green, clippy clean.
|
||
|
|
c87363168f |
grammar+db: 3b — SQL INSERT grammar + minimal execution (ADR-0033 §1)
SQL_INSERT_SHAPE (INTO <table> [(cols)] VALUES tuple(s)) with __rdbms_*
target rejection; Command::SqlInsert{sql,target_table}; Request::RunSqlInsert
+ do_sql_insert worker (tx-guarded: execute, then finalize_persistence for
CSV + history before commit, so failures roll back and don't re-persist).
Auto-show is best-effort via last_insert_rowid range.
Isolated behind a dev `sqlinsert` entry word (Advanced) so the SQL path is
testable without making `insert` a shared word yet (that's 3j, after 3d
auto-fill parity). Command::SqlInsert carries only sql+target_table; the
plan's listed_columns/returning land in 3d/3g where they're read.
6 grammar accept/reject tests + 8 integration tests (single/multi-row,
column-list, full-arity, history, rollback-on-failure, multi-row atomicity,
parse-path reconstruction, internal-table rejection). 1452 baseline green.
|
||
|
|
fd259048da |
grammar: admit WITH inside subqueries / CTE bodies (ADR-0032 §10.3)
ADR-0032 §10.3 says cte_bindings lives on the scope frame, with inner subqueries free to declare their own CTEs that shadow outer ones. The grammar didn't actually admit nested WITH inside SQL_SELECT_COMPOUND — a real ADR-vs-implementation gap. Closes the gap by making SQL_SELECT_COMPOUND a Choice between a WITH-prefixed form and a plain form. The naive Optional-prefix approach silently broke the paren-vs-subquery dispatch in sql_expr.rs's PAREN_GROUP: Optional matches 0 bytes, committing the Seq, so SELECT_CORE's NoMatch on `(a + b)` became Failed and the Choice couldn't fall through to or_expr. The Choice-fronted form keeps the fast NoMatch on non-WITH non-SELECT first tokens. Side effect: scalar subquery / IN / EXISTS / derived-table bodies now admit a leading WITH too, which matches standard SQL. Updated two tests that were guarding the old `(WITH …)` rejection behavior. Added one new harvest test exercising nested-WITH inside a CTE body — the harvest's `expand_binding` mechanism already handled the data correctly; the grammar gap was the sole blocker. Test totals: 1414 → 1415 passing (+1 nested-with-in-cte test). Clippy clean. |
||
|
|
c5cf03b152 |
walker: SQL diagnostics — multi-binding scope, qualified refs, Phase-1 gap closure (sub-phase 2d)
Implements the bulk of ADR-0032 §11 diagnostics. The
schema-existence pass becomes multi-binding-aware; the SQL
predicate-warning pass closes the Phase-1 carry-over gap
named in §11.6; pre-flight duplicate-CTE detection lands
(user-approved Plan §Open-2); a `data::WITH` CommandNode
makes WITH-prefixed statements dispatch through the registry.
Catalog (`src/friendly/strings/en-US.yaml`, `src/friendly/keys.rs`):
- Six new `diagnostic.*` keys: ambiguous_column,
compound_arity_mismatch, cte_arity_mismatch, duplicate_cte,
projection_alias_misplaced, unknown_qualifier.
- Eight new `engine.*` translation keys (ADR-0032 §11.5) for
the friendly-error layer to render engine messages in
engine-neutral wording. The catalog entries are authored;
wiring them into the engine-error path is deferred (the
friendly layer reads these by key when reached).
Schema-existence diagnostic (`schema_existence_diagnostics`)
extended per ADR-0032 §11.2:
- A pre-pass collects all `table_name` / `cte_name` / table-
alias idents into a `PassBinding` vec + a CTE name list,
sidestepping the projection-before-FROM ordering problem
(§10.6). The main pass then resolves identifiers against the
complete scope.
- Bare column references resolve against any binding's
columns. Zero matches → `diagnostic.unknown_column` (the
table arg lists all in-scope tables in the multi-binding
case). Two-or-more matches → `diagnostic.ambiguous_column`.
- Qualified `t.c` refs detect their qualifier via a look-ahead
on the matched path (Punct '.' + Ident{role:
sql_expr_qualified_ref} after the leading Ident). Unknown
qualifier → `diagnostic.unknown_qualifier`; the column check
then runs against the resolved binding's table.
- The `t.*` qualified-wildcard's `qualified_star_qualifier`
ident also resolves through the same pass.
- CTE-name references in table-source slots accept silently
(the CTE binding's columns are unknown until the deferred
§10.3 stage-2 harvest lands, so bare column refs into a
CTE binding short-circuit to "accept silently").
- Duplicate CTE names in the same `WITH` block emit
`diagnostic.duplicate_cte` on the second occurrence
(Plan §Open-2).
Phase-1 gap closure (`sql_predicate_warnings`, ADR-0032 §11.6):
A new MatchedPath-walking pass that identifies predicate-tail
shapes by node-name labels and emits the same `diagnostic.*`
keys the DSL `Expr` AST pass already emitted (`eq_null`,
`like_numeric`, `type_mismatch`). Scoped to bare column refs
in `<column> <op> <literal>` form — qualified-ref and
expression-operand cases stay un-flagged in this minimal pass,
which is a safe false-negative posture (the warning is
advisory; the engine still runs). Runs alongside the schema-
existence pass on every successful SQL parse — WHERE,
HAVING, JOIN ON, projection, ORDER BY all get warnings
uniformly. Tests cover all three keys plus the negative
"compatible types don't warn" case.
WITH dispatch (`data::WITH`):
`with x as (…) select * from x` now dispatches via the registry
with entry word `with`. Shape: `SQL_WITH_TAIL`, the post-`WITH`
portion of a statement (optional `RECURSIVE`, the cte_def
list, the trailing compound_select, optional `;`). Both
`data::SELECT` and `data::WITH` route to `build_select` and
produce `Command::Select { sql: source }` — execution is
grammar-as-text, so the entry-word split doesn't fork the
exec path. `is_advanced_only` extended to include `with`.
Deferred per the 2d-scoped DA review (documented as a
`(TBD)` in the cross-cut matrix for 2g):
- `diagnostic.projection_alias_misplaced` — requires clause
detection (the matched-path is flat).
- `diagnostic.compound_arity_mismatch` — needs per-leg
projection counting.
- `diagnostic.cte_arity_mismatch` — depends on §10.3 stage-2
harvest, which 2b deferred.
- `engine.*` key wiring into the friendly-error layer — the
catalog entries are authored; the engine-error path reads
them by key when reached, but no proactive enhancement of
the layer here.
Test totals: 1366 → 1382 passing (+16: 10 schema-existence
multi-binding + diagnostic tests, 7 Phase-1 gap closure
tests, minus duplicates from prior runs), 0 failed, 1 ignored.
Clippy clean.
|
||
|
|
a491df32a0 |
grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)
The Phase-1 SQL `SELECT` grammar nodes that used to live in `src/dsl/grammar/data.rs` retire — 22 statics / consts and the `reject_internal_table` validator copy are removed, ~150 lines of grammar machinery gone. `data::SELECT.shape` now references the post-`SELECT` portion of the ADR-0032 fragment via a thin `Node::Subgrammar(&sql_select::SQL_SELECT_TAIL)`. `SQL_SELECT_TAIL` is a new export from `sql_select.rs`, parallel to `SQL_SELECT_STATEMENT`. It represents what a top-level `SELECT` statement looks like AFTER the registry's entry-word dispatch has already consumed the leading `SELECT` keyword: the DISTINCT/ALL prefix, projection list, optional FROM / WHERE / GROUP BY / HAVING, the compound set-op chain (each subsequent leg's `SELECT` is part of `SET_OP_TAIL`), outer ORDER BY / LIMIT, and a tolerated trailing `;`. WITH-prefixed statements (`WITH x AS (…) SELECT * FROM x`) are NOT in 2c's scope — they need a separate `data::WITH` `CommandNode` so the entry-word dispatch routes correctly. For now, top-level WITH continues to fall through to the chumsky parser route (the same as in Phase 1). The `SQL_SELECT_STATEMENT` static (which includes the optional WITH prefix) stays available for use by that future CommandNode or by any other consumer that needs the full statement shape. All seven Phase-1 SQL `SELECT` integration tests (`tests/sql_select.rs`) pass without modification, satisfying the 2c exit gate's "behaviour preserved" requirement. The 70 fragment unit tests and the 26 driver-level scope tests also pass — the migration is a refactor, no new tests required. Behaviour change explicitly sanctioned by ADR-0032 §8: Phase-1's `LIMIT_VALIDATOR` (positive-int-only, parse-time) is superseded by the full `sql_expr` admission. `LIMIT max(10, x)` and similar now parse; the engine constrains the value at execution time per the ADR's "grammar admits, engine rejects" posture. Plan §2b status note: the 2026-05-20 deferral of §10.3 stage 2 (CTE output-column harvest derivation) is recorded in `docs/plans/20260520-adr-0032-phase-2.md` per the user-approved deferral. Test totals: 1366 passing (unchanged), 0 failed, 1 ignored. Clippy clean. data.rs loses ~150 lines of dead grammar; the single source of truth for the SQL `SELECT` shape is now `sql_select.rs`. |
||
|
|
4ff054ca75 |
walker: populate cte_bindings placeholders + projection_aliases (ADR-0032 §10.3 stage 1 / §10.4)
Sub-phase 2b checkpoints 4 and 5 combined — adds the placeholder CTE binding push (§10.3 stage 1) and the projection alias accumulator (§10.4). Node::Ident gains two more flags, mechanically applied to every existing site: - `writes_cte_name: bool` — push a placeholder `CteBinding` (name only, empty columns) onto the top `ScopeFrame`'s `cte_bindings`. Set on `CTE_NAME_IDENT` in sql_select.rs. Fires BEFORE the body's `ScopedSubgrammar` enters (the CTE-def Seq's ident slot precedes the body's `(`), so the body can self-reference the CTE name as a valid table source (WITH RECURSIVE). - `writes_projection_alias: bool` — append the matched name to the top frame's `projection_aliases`. Set on `PROJECTION_BARE_ALIAS_IDENT` so both the AS-form (`a AS alpha`) and bare-form (`a alpha`) paths capture cleanly. The ident is shared by both paths through `PROJECTION_AS_ALIAS` and the lookahead factory, so capturing on the ident itself covers both forms with no duplication. The §10.3 stage-2 harvest (deriving CTE output columns from the body's projection per the six derivation rules in the ADR's table) is structurally deferred — the placeholder's `columns` stays empty until the harvest is wired. This is intentional scope honesty: the placeholder-name presence is sufficient for the schema-existence diagnostic (2d) to recognize CTE names as valid table sources, and the qualified-prefix completion (2e) will populate the columns when the harvest hook is added there. Tests below assert the placeholder-name behavior; the column-derivation tests from plan §2b's exit gate will be satisfied incrementally as later sub-phases need them. Tests (8 new, all green): - Single CTE → one placeholder binding with the matched name. - Multiple CTEs → placeholders in declaration order. - Recursive CTE → name visible inside body (the body's `from r` reference parses; verified by the walk completing). - Projection aliases via AS form → captured into the top frame's `projection_aliases`. - Projection aliases via bare form → captured. - Mixed alias forms → captured in projection order, with unaliased projection items absent from the alias list. - No aliases → empty `projection_aliases`. - CTE body aliases do not leak to outer scope (the body's frame pops on `ScopedSubgrammar` exit, taking its projection_aliases with it). All 1358 previous tests still pass. Test totals: 1366 passing, 0 failed, 1 ignored. Clippy clean. This closes out the scope-accumulator side of sub-phase 2b. The remaining 2b-style work — full CTE column-derivation harvest per §10.3's six rules — folds into 2d (where the arity-check pass needs declared-vs-derived column counts) and 2e (where qualified-prefix completion needs CTE columns). |
||
|
|
b522d09f5a |
walker: populate from_scope table bindings (ADR-0032 §10.1)
Sub-phase 2b checkpoint 3 — the `writes_table` / `writes_table_alias` flags now drive the multi-binding `from_scope` accumulator on the top `ScopeFrame`. Node::Ident gains `writes_table_alias: bool`. When set on an ident-name slot, the matched name lands on the most-recently- pushed `TableBinding`'s `alias`. All 46 existing Ident sites across the codebase are updated to `writes_table_alias: false` (mechanical — no behavioral change for DSL paths). walk_ident's `writes_table` semantics extend: - `IdentSource::Tables` matches with `writes_table: true` still populate `current_table` / `current_table_columns` as before (preserved for DSL paths that read those fields directly via the dynamic-subgrammar / column-writes machinery), AND now also push a fresh `TableBinding` onto the top ScopeFrame's `from_scope`. The two mechanisms coexist additively — current_table reflects the most-recent `writes_table` write (single-binding view, as before); from_scope is the authoritative multi-binding accumulator that SQL JOINs, subqueries, and CTE bodies use. sql_select.rs splits the alias slot into two ident variants: - `PROJECTION_BARE_ALIAS_IDENT` (role `projection_alias`) — no scope writes; capture into `projection_aliases` is 2b-5. - `TABLE_SOURCE_BARE_ALIAS_IDENT` (role `table_alias`, `writes_table_alias: true`) — sets the top binding's alias. The `AS alias` form likewise splits into PROJECTION_AS_ALIAS and TABLE_SOURCE_AS_ALIAS so each path threads through the correct ident. The bare-alias lookahead factories return the projection or table-source ident accordingly. `TABLE_NAME_IDENT` in sql_select.rs gets `writes_table: true` so each FROM / JOIN table source pushes a binding. The schema-resolved columns are stored on the TableBinding for later use by qualified-prefix completion (2e) and the schema-existence diagnostic (2d). Tests (9 new, all green): - single from-table → one binding - AS alias / bare alias on from-table → alias captured - two-way JOIN → two bindings, correct order - two-way JOIN with both aliased → two bindings with aliases - three-way JOIN (left + bare) → three bindings in order - subquery from_scope does not leak to outer scope (the ScopedSubgrammar push/pop discipline at work) - CTE body from_scope does not leak to outer scope (the outer scope sees only the CTE-name reference, not the body's internals) - SELECT without FROM → empty from_scope All 1351 previous tests still pass — DSL paths untouched. Test totals: 1358 passing, 0 failed, 1 ignored. Clippy clean. Frame is_cte_body marker, body-projection harvest, and projection_aliases population are the remaining 2b work (2b-4 and 2b-5). |
||
|
|
98a74b23d3 |
grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar
Sub-phase 2b checkpoint 2 — closes the recursion loop between sql_expr.rs and sql_select.rs so subquery expressions and qualified column refs become structurally valid in every SQL context where they belong. sql_expr.rs: - §5 qualified-ref tail. `name_or_call` gains a `.identifier` suffix as a Choice sibling of the function-call `(args)` tail. The leading identifier is still matched once (per ADR-0031 §1's factoring); the optional tail dispatches between the two suffixes by their first character (`.` vs `(`). - §6.1 scalar subquery as primary. The `(or_expr)` and `(SELECT …)` branches share the leading `(`; the first inside token (`SELECT` → subquery, anything else → expression) discriminates. The subquery recurses through `Node::ScopedSubgrammar(&sql_select::SQL_SELECT_COMPOUND)`. - §6.2 IN (subquery) predicate. Sibling of the existing IN-value-list; same `(` factoring, same dispatch. - §6.3 [NOT] EXISTS primary. Bare `EXISTS (compound_select)` lives in `primary`; `NOT EXISTS` falls out via the existing `not_expr := NOT not_expr` tier above `primary`. sql_select.rs: - CTE body recursion rewires `Node::Subgrammar` → `Node::ScopedSubgrammar`, matching §10.2. The top-level statement's COMPOUND embedding stays plain Subgrammar — the implicit bottom frame is the right scope for a statement- level SELECT. Structural side-effect — const-eval cycle workaround: Closing the sql_expr ⇄ sql_select reference loop made Rust's const-evaluator follow the cycle through every `const Node` that transitively reaches it. Mirroring sql_expr.rs's existing pattern, composition Nodes in sql_select.rs (Seq / Choice / Optional / Repeated / Lookahead) are now `static Node` and appear in slice positions through `Node::Subgrammar(&NAME)` wraps; only leaf items (Punct, Word, Ident) remain `const`. Same workaround applies to data.rs's SELECT_PROJ_LIST / SELECT_PROJECTION chain and the inlined `SQL_EXPR` reference. Statics resolve lazily at link time, so the cycle is valid; const-eval is not, and the named `const SQL_EXPR` alias is gone in both files (replaced with the inline `Node::Subgrammar (&sql_expr::SQL_OR_EXPR)` expression at every use site). Test coverage: - sql_expr.rs gains 11 new tests for qualified refs, scalar subquery, IN-subquery, EXISTS / NOT EXISTS, nested subqueries, and the existing IN-value-list form (regression). - sql_select.rs gains 7 new tests for qualified refs in WHERE, scalar subqueries in WHERE / projection, IN / EXISTS / NOT EXISTS in WHERE, nested subqueries, and qualified refs inside CTE bodies. - All 70 prior sql_select tests still pass; the 2a baseline is preserved. `(WITH x AS (…) SELECT * FROM x)` is explicitly NOT admitted as a scalar subquery — ADR-0032 §1 / §9 wire subqueries to SQL_SELECT_COMPOUND, which omits the outer with_clause. WITH remains a statement-level-only construct. Documented in the relevant test. Test totals: 1333 → 1351 passing, 0 failed, 1 ignored (unchanged). Clippy clean. |
||
|
|
8d293358a0 |
grammar: SQL SELECT full statement fragment (ADR-0032 Phase 2a)
Author the standalone walkable shape for the full standard-SQL
SELECT per ADR-0032 §1: compound queries with the four set ops
(UNION / UNION ALL / INTERSECT / EXCEPT), the five JOIN flavours
(INNER / LEFT [OUTER] / RIGHT [OUTER] / FULL [OUTER] / CROSS),
GROUP BY / HAVING, WITH and WITH RECURSIVE common table
expressions, LIMIT … OFFSET, DISTINCT / ALL, qualified-wildcard
`t.*` projection, and bare-alias projection (lifting ADR-0030
Phase-1 §4.2).
Recursion into SQL_SELECT_COMPOUND uses Node::Subgrammar for
2a; sub-phase 2b will rewire those references to the new
Node::ScopedSubgrammar variant for completion-scope discipline
(ADR-0032 §10.2). The Phase-1 data::SELECT CommandNode is not
touched here — the new fragment is reachable only from its own
tests until sub-phase 2c performs the migration.
Two implementation mechanisms realize ADR semantics without
changing them:
- Node::Lookahead disambiguates the projection_item Choice
(bare `*` vs `ident . *` qualified wildcard vs `sql_expr [
alias ]`) and gates bare-alias slots against continuation
keywords. The walker's walk_ident accepts any
identifier-shape token, including keyword-shape ones, and
Choice / Optional are first-match-wins; without lookahead a
bare-alias slot would greedily swallow FROM / WHERE / JOIN /
etc. Per-position follow-sets list which keywords legitimately
follow each alias slot. Same pattern as data.rs's
insert_first_paren precedent.
- INNER JOIN and bare JOIN are split into two distinct Choice
branches (each with a concrete leading keyword) rather than
sharing one Optional(Word("inner"))-leading branch. Avoids a
walker hazard where an Optional-leading-child Seq commits to
idx > 0 and then converts the next child's EOF NoMatch into
Incomplete, blocking the outer Choice from falling through to
later branches. Same semantic surface, distinct mechanism.
The §13 OOS shapes all have explicit reject tests (NATURAL,
USING, comma-FROM, LIMIT m,n, window OVER, VALUES, derived
tables). LATERAL has a noted partial limitation: the comma form
rejects via OOS-3, but the single-keyword form `FROM a LATERAL
JOIN b ON …` is admitted structurally because `lateral` parses
as a bare table-source alias for `a`. This matches ADR-0030's
"grammar admits identifier-shape tokens; engine resolves"
posture.
`__rdbms_*` rejection extends to every Phase-2 table-source
slot — the FROM table, each JOIN's table, each CTE name, and
the FROM inside any CTE body — via the reuseable
reject_internal_table validator.
70 new unit tests in sql_select.rs walk every §1 production and
every OOS reject case. Test totals: 1260 baseline + 70 = 1330
passing, 0 failing, 1 ignored (unchanged from baseline). Clippy
clean.
Per the Phase-2 plan sub-phase 2a exit gate. DA gate written
review: PASS.
|