diff --git a/docs/adr/0036-typed-dml-values-vs-verbatim.md b/docs/adr/0036-typed-dml-values-vs-verbatim.md index cbd59f7..fd859ee 100644 --- a/docs/adr/0036-typed-dml-values-vs-verbatim.md +++ b/docs/adr/0036-typed-dml-values-vs-verbatim.md @@ -11,8 +11,13 @@ consolidating the two modes and a concrete auto-fill difference confirmed that even the single-row literal case is **not** identical across modes). **Phase 1 implemented 2026-05-26** (`INSERT … VALUES` literal validation + offending-value retention; capture-at-parse, no grammar change, execution -unchanged). Phases 2 (`UPDATE … SET` literals) and 3 (completion -hinting/highlighting) pending. +unchanged). **Phase 2 implemented 2026-05-26** (`UPDATE … SET` literal +validation + offending-value retention; the same capture-at-parse technique +on the SET assignment list — `capture_set_literals` in `data.rs` — +classifying each top-level RHS literal-vs-expression, validating literals in +`do_sql_update`, and reading them in `user_value_for_column`; `WHERE` is not +validated, execution stays verbatim). Phase 3 (completion +hinting/highlighting — the only part needing a grammar change) pending. **Augments** **ADR-0030 §4** and **ADR-0033 §10** — it does **not** supersede them and does **not** change the execution model. Advanced-mode @@ -246,8 +251,19 @@ execution), only its `Result` is used. the verbatim insert; the enricher reads them. Covers single- and multi-row, with or without `RETURNING`/`ON CONFLICT`, because execution is untouched. -- **Phase 2 — `UPDATE … SET` literal validation** (same technique on the - SET assignment list). +- **Phase 2 (implemented 2026-05-26) — `UPDATE … SET` literal + validation.** The same capture-at-parse technique on the SET assignment + list: `build_sql_update` calls `capture_set_literals`, which walks the + matched tokens (no reparse) and classifies each *top-level* `SET col = + ` into `(col, Some(Value))` for a bare literal (incl. a signed + number) or `(col, None)` for an expression — using paren depth so a comma + inside a function call or a `where` inside a scalar subquery is never + mistaken for an assignment/clause boundary, and so the trailing top-level + `WHERE` predicate is excluded. `Command::SqlUpdate` gains a + `set_literals` payload; `do_sql_update` validates the literals against + their column types (via the shared `impl_value_for`) before the still + verbatim update; `user_value_for_column` reads them so a constraint error + names the offending value. `WHERE` is deliberately not validated (§2). - **Phase 3 — completion hinting / highlighting.** This is the *only* part that needs a grammar change: a `Choice(typed-literal-slot, sql_expr)` at each value position (reusing the DSL's live diff --git a/docs/adr/README.md b/docs/adr/README.md index a1fb487..fac6bae 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -41,4 +41,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Accepted** (implemented + verified through sub-phase 3k, 2026-05-23; phase-exit report `docs/handoff/20260523-phase-3-verification.md`), the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery); **Amendment 3 (sub-phase 3j) records the command-identity model and defers the execution-mode side-channel**: a command is the typed outcome of a *mode-rooted* grammar path and its identity is intrinsic (Advanced mode tries SQL first, falls back to the *Simple* DSL command when no SQL branch matches a token, e.g. `delete … --all-rows`; note `update … --all-rows` does *not* fall back — the SQL `SET` expression eats `--all-rows`, harmless since the engine treats it as a comment); **Simple mode commits the DSL candidate for shared words** so the *real* DSL error surfaces, and when that line would also run in advanced mode the rendering layer **combines** them — DSL error **plus** an `advanced_mode.also_valid_sql` pointer ("… (valid as SQL in advanced mode)") — keeping the actionable DSL fix while pointing at advanced mode; bare "this is SQL" is reserved for entry words with no DSL form (`select`/`with`); a fully-overlapping input (`insert … values …`) legitimately yields *two distinct commands* (`Command::Insert` typed-AST vs `Command::SqlInsert` validated-text) that do the same thing but execute differently (ADR-0030 §4), so each is tested in the mode that produces it; **corrects the plan's 3j exit-gate premise** that the DSL DML tests run in Simple mode (they call `parse_command`, which defaults to Advanced) — the real invariant is "Simple-mode behaviour unchanged, Advanced mode SQL-first, DSL grammar tested in Simple mode, both variants tested in their producing mode", with §6/§7 parity keeping the paths observably equivalent; and **defers to its own future ADR** the execution-time mode side-channel (three-way `Mode`: simple/advanced/advanced-one-shot threaded through `Action`→worker, for mode-dependent *output* like echoing generated SQL) — today only the *rendering* side-channel `OutputLine.mode_at_submission` exists, and the three-way distinction is not required for Phase 3 dispatch correctness - [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `||` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration; **implemented 2026-05-24** (plan `docs/plans/20260524-adr-0034-history-journal.md`); **Amendment 1 (2026-05-24): replay filters out app-lifecycle commands** — a working `replay history.log` (the §3 fix) exposed that the journal also records `save as`/`load`/`new`/`export`/`import`/`rebuild`/`mode` (which would panic the worker dispatch or abort the replay), so replay now re-applies **only** schema/data write commands and **skips** every `Command::App` + nested `Command::Replay`; **all skips continue** (never abort — reversing the prior nested-`replay` refusal, so a journal containing a once-run `replay` needs no hand-editing, and the infinite-loop footgun is closed by construction), with a `[skip]` **warning** on `import` and nested-`replay` skips (their omission can leave replayed state incomplete) and silent skips for the rest; `replay.error_nested` removed, `replay.skipped_import`/`replay.skipped_replay` added, `ReplayCompleted` carries `warnings` - [ADR-0035 — Advanced-mode SQL DDL](0035-advanced-mode-sql-ddl.md) — **Accepted** (design agreed 2026-05-24; validated end-to-end by sub-phases 4a/4a.2/4a.3/4b `CREATE TABLE` (incl. foreign keys) + 4c `DROP TABLE [IF EXISTS]` + 4d `CREATE [UNIQUE] INDEX` / `DROP INDEX [IF EXISTS]` + 4e `ALTER TABLE` add/drop/rename column + 4f `ALTER TABLE … ALTER COLUMN TYPE` + 4g `ALTER TABLE` add/drop constraint + add FK + 4h `ALTER TABLE … RENAME TO` + 4i verification sweep (completion merge + simple/advanced completion colour + describe of table-level constraints + self-ref FK indicator + CREATE-TABLE help/usage), implemented 2026-05-25/26 — **Phase 4 complete**; **Amendment 1, 2026-05-26**: drop a composite UNIQUE via a derived, engine-neutral `unique_` name that reuses the existing `DROP CONSTRAINT ` grammar — no new syntax, no metadata, §4g anonymity intact; `describe` shows the name; dropping a UNIQUE-covered *column* now refuses with that name + the drop command), **Phase 4** of the ADR-0030 roadmap (peer of 0031/0032/0033) and **clarifies ADR-0030 §4**. Advanced-mode `CREATE`/`DROP`/`ALTER TABLE` + `CREATE`/`DROP INDEX` get their **own per-statement commands** (`SqlCreateTable`/`SqlAlterTable`/`SqlDropTable`/`SqlCreateIndex`/`SqlDropIndex`), like DML's `Sql*` set — but unlike DML they **execute *structurally*, not verbatim** (raw execution would lose the playground's types, named relationships, and `STRICT`; "verbatim" was a DML convenience, not a rule). Handlers **reuse the low-level schema/metadata helpers** where the operation matches simple mode and **stand alone where the SQL surface is richer** (clarity over forced refactoring); simple mode is untouched (additive). Dispatch: `create`/`drop` reuse ADR-0033 Amendment 1's category-grouped mode-aware dispatch (SQL-first, simple fallback); `alter` is a new advanced-only entry word. Full surface (no pre-emptive cuts, `Q4`): `CREATE TABLE` with column + table constraints, single/compound `PRIMARY KEY`, inline + table-level `FOREIGN KEY` → **named relationships** (one statement = one command = **one undo step**, ADR-0006); `ALTER TABLE` add/drop/rename column, `ALTER COLUMN TYPE`, add/drop constraint, add FK, **`RENAME TO`** (advanced-only table rename — new low-level op renaming the table + its CSV + the relationship and table-CHECK metadata, closing the rename half of `C1`); `CREATE [UNIQUE] INDEX` / `DROP INDEX`. Type slot accepts the ten playground keywords **and** standard-SQL aliases (`integer`→`int`, `varchar`→`text`, `timestamp`→`datetime`, …; length args accepted-and-ignored; no engine type names in/out — ADR-0030 §5). `CHECK`/`DEFAULT` reuse ADR-0031 `sql_expr`. **Pre-implementation `/runda` refinements (2026-05-24, user-confirmed):** `CREATE TABLE`/`DROP TABLE` **admit `IF [NOT] EXISTS`** (no-op-that-succeeds-with-a-note — a near-universal cross-vendor idiom, reclassified *into* scope, not engine-specific); `INTEGER PRIMARY KEY` maps to a **plain `int`** PK, *not* auto-increment (`serial` stays the sole auto-increment type). **Column-type-conversion is unified** (ADR-0017 engine, mode-appropriate policy): clean auto-converts and incompatible/own-type-static cases refuse in both modes, but a **lossy** change refuses-by-default in simple mode (`--force-conversion` opts in) while advanced mode **performs it with a loss note** and relies on **`undo` as the safety net** — no force flag, no dropping to simple mode (a payoff of shipping ADR-0006 first). OOS: views/triggers/txn-control/PRAGMA/etc. (ADR-0030 §3), the Postgres `USING` clause, and the DSL→SQL teaching echo (ADR-0030 Phase 5). Sub-phases 4a–4i, plus **4a.2** (per-column `CHECK`/`DEFAULT` via raw `sql_expr` text — `sql_expr` is validate-only, no `Expr` AST — + composite `UNIQUE(a,b)`; no new internal table) and **4a.3** (table-level/multi-column `CHECK`, landed via the new `__rdbms_playground_table_checks` metadata table because SQLite has no PRAGMA for CHECK; the builder tells a table-level CHECK from a column-level one by element position) and **4b** (foreign keys — inline `REFERENCES` + table-level `FOREIGN KEY` → ADR-0013 named relationships in the create transaction, one undo step; self-references + bare `REFERENCES ` supported, user-confirmed) and **4c** (`DROP TABLE [IF EXISTS]` → `SqlDropTable`, reusing `do_drop_table`; `IF EXISTS` is a no-op-with-note via `DropOutcome::Skipped`) and **4d** (`CREATE [UNIQUE] INDEX [IF NOT EXISTS] [] ON (cols)` → `SqlCreateIndex` and `DROP INDEX [IF EXISTS] ` → `SqlDropIndex`, reusing `do_add_index`/`do_drop_index`; **`CREATE UNIQUE INDEX` admitted** — ADR-0025 **Amendment 1** — via an additive `IndexSchema.unique` flag that round-trips through `project.yaml` and rebuild, with `[unique]` markers in the structure view + items panel, while simple-mode `add unique index` stays deferred; `IF [NOT] EXISTS` reuses the 4c skip path; `create`/`drop` each gain a *second* advanced node, exercising the all-candidates dispatch) and **4e** (`ALTER TABLE` add/drop/rename column → `SqlAlterTable`; `alter` is a new advanced-**only** entry word, runtime-decomposed to the existing `do_add_column`/`do_drop_column`/`do_rename_column` — no new worker layer; `do_add_column` extended to consume raw `default_sql`/`check_sql` so ADD COLUMN reaches CREATE-TABLE constraint parity; drop/rename refuse a column any CHECK references (table-level AND column-level, incl. a column's own self-check on rename) — the 4a.3 deferral, via a raw-CHECK-text tokenizer in the shared executors, so it guards both surfaces and fixes a latent rename-drift bug; SQL DROP COLUMN refuses an index-covered column with no `--cascade` spelling; the column executors + `do_add_index` gained an internal-`__rdbms_*`-table guard — all user-confirmed) and **4f** (`ALTER TABLE … ALTER COLUMN TYPE` → a fourth `AlterTableAction`, runtime-decomposed to the existing `change_column_type` with `ChangeColumnMode::ForceConversion` — which **is** the §7 advanced policy: lossy converts *with a note* (no force flag), incompatible + ADR-0017 static refusals (`↔ blob`, same-type, `date ↔ datetime`, non-`int → serial`) still refuse, while **`int → serial` is allowed** (auto-fills nulls + UNIQUE, ADR-0018 §8 — the §7 "→serial refused" summary is looser than the code); the builder discriminates the fourth branch by the **`type` keyword** (unique — ADD COLUMN's type is an ident), the type slot reuses `SQL_TYPE`; the internal-`__rdbms_*` guard was folded into `do_change_column_type`, closing the simple `change column` exposure too — user-confirmed) and **4g** (`ALTER TABLE … ADD [CONSTRAINT ] (CHECK | UNIQUE | FOREIGN KEY)` + `DROP CONSTRAINT `; ADD = CHECK + composite UNIQUE + FK, with `ADD PRIMARY KEY` and a *named* UNIQUE refused — composite UNIQUE is anonymous in our model; each ADD reuses a low-level path (table-CHECK/UNIQUE rebuild with a dry-run guard; FK → `add_relationship`, bare `REFERENCES

` → parent single-PK), DROP CONSTRAINT resolves the name to a table-CHECK then a child-side FK; **named table-CHECKs round-trip** via a nullable `name` column on `__rdbms_playground_table_checks` (**rebuild-only** arrival — pre-4g projects gain it on `rebuild`, a named add on an un-upgraded project is refused with a friendly "rebuild first" message) *and* a `project.yaml` `check_constraints` extension to an `{expr, name}` mapping (the bare-string form still reads); the internal-`__rdbms_*` guard was folded into `do_add_constraint`/`do_add_relationship`, completing that guard class — all user-confirmed) and **4h** (`ALTER TABLE … RENAME TO` — the one genuinely new low-level op, `do_rename_table`: a native engine rename plus one-transaction reconciliation of every metadata row naming the table (`__rdbms_playground_columns`, **both ends** of `__rdbms_playground_relationships`, `__rdbms_playground_table_checks`), the CSV file (the existing rewrite+delete path — no new persistence method), and **CHECK text that qualifies a column with the old table name** (`T.age`→`U.age`, a planning-`/runda` finding — the engine rewrites the live CHECK but the stored text would drift and break a fresh rebuild; `rewrite_check_table_qualifier` keeps them in step); grammar splits the `rename` verb into one branch with an inner Choice on a distinct second keyword (`column` vs `to`), the new-name slot mirroring the `CREATE TABLE` name slot; refuses same-name / existing-target / `__rdbms_*` / non-existent, with **case-insensitive** collision checks behind an engine-neutral pre-check (a finished-slice `/runda` finding — the engine matches names case-insensitively); auto-named indexes *and* relationships keep their stale names (only table-name columns update — §6 scope); one undo step; advanced-only, closing the rename half of `C1` — all user-confirmed) and **4i** (the verification sweep that completes Phase 4: the shared-entry-word completion merge + the simple-vs-advanced completion colour-when-mixed with Both→Advanced→Simple block ordering; `describe` of table-level composite UNIQUE + table CHECK; the self-ref FK pre-submit indicator fix; and the CREATE-TABLE help/usage skeleton refresh). **All of Phase 4 (4a–4i) is shipped.** Each sub-phase has exit + DA gates -- [ADR-0036 — Value validation for advanced-mode DML](0036-typed-dml-values-vs-verbatim.md) — **Accepted** (design agreed + `/runda`'d 2026-05-26; mechanism then **deliberately narrowed** from "bind literals via the DSL path" to surgical **"validate-and-retain, execute verbatim"** after the user resisted consolidating the modes and a concrete auto-fill difference confirmed even the single-row literal case isn't identical across modes; Phase 1 implementing). **Augments — does NOT supersede — ADR-0030 §4 / ADR-0033 §10**: execution stays verbatim, ADR-0033 Amendment 3's two-command identity (`Insert` vs `SqlInsert`) **stands**. The problem (investigated 2026-05-26; characterization test `sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces` proves it): advanced-mode SQL DML gets **none** of the DSL's value feedback — a malformed `date` like `2025/01/15` is silently written, and the offending value is missing from constraint errors — because literal values are spliced into text and discarded (only `STRICT` storage types check them). **Fix (surgical): validate each literal value against its column type before the verbatim insert, and retain it for error reporting — sharing only the per-type validators (`Value::bind_for_column`/`validate_date`/`shortid::validate`), nothing else.** No binding, no statement reconstruction, no auto-fill change, no command-identity collapse — because the two gaps are closed by validation + retention alone, and executing the user's own text is already safe. The literal set = `NULL`/boolean/string/**signed**-numeric; arithmetic/functions/subqueries/column-refs are expressions (skipped — the engine evaluates them). `WHERE` not validated (it's an expression in general; motivation met by `VALUES`/`SET`); `SELECT`/`INSERT … SELECT`/`RETURNING`/`ON CONFLICT` need no special handling since execution is untouched. Phased: **Phase 1** capture-at-parse + validate + retain for `INSERT … VALUES` (no grammar change, no reparse — closes both proven gaps); **Phase 2** `UPDATE … SET` literals; **Phase 3** completion hinting/highlighting (the only part needing a grammar change — `Choice(typed-literal-slot, sql_expr)` reusing the DSL `TypedValueSlot`s at `data.rs:141`/`189`/`269`; supersedes only Phase 1's literal *detection*, not the validation/enrichment on top). Non-goals: binding/reconstruction, collapsing command identity (Am3 stands), changing `serial`/`shortid` auto-fill (`requirements.md` **X4**, a separate possible-bug), a structural `SELECT`, a full SQL-expression AST. Embodies `requirements.md` **X5** (share a *mechanic*, not a *command*); the neutral "that value" safety net (ADR-0035 Amendment 1) stays correct for genuinely-computed values +- [ADR-0036 — Value validation for advanced-mode DML](0036-typed-dml-values-vs-verbatim.md) — **Accepted** (design agreed + `/runda`'d 2026-05-26; mechanism then **deliberately narrowed** from "bind literals via the DSL path" to surgical **"validate-and-retain, execute verbatim"** after the user resisted consolidating the modes and a concrete auto-fill difference confirmed even the single-row literal case isn't identical across modes; **Phases 1–2 implemented** 2026-05-26 — `INSERT … VALUES` and `UPDATE … SET` literal validation + offending-value retention, capture-at-parse with no grammar change; Phase 3 (completion hinting/highlighting) pending). **Augments — does NOT supersede — ADR-0030 §4 / ADR-0033 §10**: execution stays verbatim, ADR-0033 Amendment 3's two-command identity (`Insert` vs `SqlInsert`) **stands**. The problem (investigated 2026-05-26; characterization test `sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces` proves it): advanced-mode SQL DML gets **none** of the DSL's value feedback — a malformed `date` like `2025/01/15` is silently written, and the offending value is missing from constraint errors — because literal values are spliced into text and discarded (only `STRICT` storage types check them). **Fix (surgical): validate each literal value against its column type before the verbatim insert, and retain it for error reporting — sharing only the per-type validators (`Value::bind_for_column`/`validate_date`/`shortid::validate`), nothing else.** No binding, no statement reconstruction, no auto-fill change, no command-identity collapse — because the two gaps are closed by validation + retention alone, and executing the user's own text is already safe. The literal set = `NULL`/boolean/string/**signed**-numeric; arithmetic/functions/subqueries/column-refs are expressions (skipped — the engine evaluates them). `WHERE` not validated (it's an expression in general; motivation met by `VALUES`/`SET`); `SELECT`/`INSERT … SELECT`/`RETURNING`/`ON CONFLICT` need no special handling since execution is untouched. Phased: **Phase 1** capture-at-parse + validate + retain for `INSERT … VALUES` (no grammar change, no reparse — closes both proven gaps); **Phase 2** `UPDATE … SET` literals; **Phase 3** completion hinting/highlighting (the only part needing a grammar change — `Choice(typed-literal-slot, sql_expr)` reusing the DSL `TypedValueSlot`s at `data.rs:141`/`189`/`269`; supersedes only Phase 1's literal *detection*, not the validation/enrichment on top). Non-goals: binding/reconstruction, collapsing command identity (Am3 stands), changing `serial`/`shortid` auto-fill (`requirements.md` **X4**, a separate possible-bug), a structural `SELECT`, a full SQL-expression AST. Embodies `requirements.md` **X5** (share a *mechanic*, not a *command*); the neutral "that value" safety net (ADR-0035 Amendment 1) stays correct for genuinely-computed values diff --git a/src/app.rs b/src/app.rs index ceac7e8..80da85f 100644 --- a/src/app.rs +++ b/src/app.rs @@ -3865,6 +3865,7 @@ mod tests { sql: "update t set v = 1".to_string(), target_table: "t".to_string(), returning: false, + set_literals: Vec::new(), }, result: crate::db::UpdateResult { rows_affected: 2, @@ -3897,6 +3898,7 @@ mod tests { sql: "update t set v = 1".to_string(), target_table: "t".to_string(), returning: false, + set_literals: Vec::new(), }, result: crate::db::UpdateResult { rows_affected: 1, diff --git a/src/db.rs b/src/db.rs index 9246a6a..52aff3e 100644 --- a/src/db.rs +++ b/src/db.rs @@ -711,6 +711,10 @@ enum Request { source: Option, target_table: String, returning: bool, + /// Captured literal `SET col = ` values (`(col, None)` = + /// expression RHS) for app-level type validation before the + /// verbatim update (ADR-0036 Phase 2). + set_literals: Vec<(String, Option)>, reply: oneshot::Sender>, }, /// Run a grammar-validated SQL `DELETE` (ADR-0033 §1/§7). The @@ -1506,17 +1510,38 @@ impl Database { recv.await.map_err(|_| DbError::WorkerGone)? } - /// Run a validated SQL `UPDATE` and return the affected-row - /// count (ADR-0033 §2, sub-phase 3e). `sql` is the + /// Run a validated SQL `UPDATE` with **no** captured literals (no + /// app-level value validation — the verbatim ADR-0033 path). Used by + /// worker-level callers that build the statement directly. The + /// runtime, which has a parsed command, uses + /// [`Self::run_sql_update_with_literals`] instead so the `SET` + /// literals are validated (ADR-0036 Phase 2). `sql` is the /// grammar-validated statement text; `source` is the literal - /// submitted line for `history.log`; `target_table` is the - /// parsed target whose CSV is re-persisted. + /// submitted line for `history.log`; `target_table` is the parsed + /// target whose CSV is re-persisted. pub async fn run_sql_update( &self, sql: String, source: Option, target_table: String, returning: bool, + ) -> Result { + self.run_sql_update_with_literals(sql, source, target_table, returning, Vec::new()) + .await + } + + /// As [`Self::run_sql_update`], plus the literal `SET` values captured + /// at parse (`(col, None)` = expression RHS) so the worker can + /// validate each literal against its column type before the (still + /// verbatim) update and the error layer can name the offending value + /// (ADR-0036 Phase 2). + pub async fn run_sql_update_with_literals( + &self, + sql: String, + source: Option, + target_table: String, + returning: bool, + set_literals: Vec<(String, Option)>, ) -> Result { let (reply, recv) = oneshot::channel(); self.send(Request::RunSqlUpdate { @@ -1524,6 +1549,7 @@ impl Database { source, target_table, returning, + set_literals, reply, }) .await?; @@ -2493,6 +2519,7 @@ fn handle_request( source, target_table, returning, + set_literals, reply, } => { snapshot_then(snap, batch, conn, source.as_deref(), reply, || do_sql_update( @@ -2502,6 +2529,7 @@ fn handle_request( &sql, &target_table, returning, + &set_literals, )); } Request::RunSqlDelete { @@ -8545,10 +8573,27 @@ fn do_sql_update( sql: &str, target_table: &str, returning: bool, + set_literals: &[(String, Option)], ) -> Result { debug!(sql = %sql, table = %target_table, returning, "sql_update"); let canonical_table = require_canonical_table(conn, target_table)?; let target_table = canonical_table.as_str(); + + // ADR-0036 Phase 2: validate each captured `SET col = ` + // against its column type BEFORE the (still verbatim) update runs — + // sharing the DSL's per-type validators (`impl_value_for`, the same + // helper `build_update_sql` uses) for identical wording. Only literal + // assignments are checked; expression positions (`None`) and the + // `WHERE` predicate are left to the engine (ADR-0036 §2). Execution + // below is unchanged (no binding). + if set_literals.iter().any(|(_, v)| v.is_some()) { + let schema = read_schema(conn, target_table)?; + for (col, slot) in set_literals { + if let Some(value) = slot { + impl_value_for(&schema, col, value)?; + } + } + } let tx = conn .unchecked_transaction() .map_err(DbError::from_rusqlite)?; diff --git a/src/dsl/command.rs b/src/dsl/command.rs index 5c76372..7f350d6 100644 --- a/src/dsl/command.rs +++ b/src/dsl/command.rs @@ -442,6 +442,16 @@ pub enum Command { /// Whether a `RETURNING` clause matched (ADR-0033 §5, /// sub-phase 3g). returning: bool, + /// Captured literal RHS of each top-level `SET col = ` + /// assignment (ADR-0036 Phase 2). `(col, Some(v))` for a bare + /// literal (incl. a signed number); `(col, None)` for an + /// expression RHS (arithmetic, function call, scalar subquery, + /// column ref — nothing static to validate). The worker validates + /// the `Some` values against their column types before the (still + /// verbatim) update; the error enricher reads them to name the + /// offending value. Execution itself is unchanged — these are + /// *not* bound. `WHERE` is deliberately excluded (ADR-0036 §2). + set_literals: Vec<(String, Option)>, }, /// A SQL `DELETE` validated by the walker (ADR-0033 §1/§7, /// advanced mode). Grammar-as-text: the worker executes `sql`, diff --git a/src/dsl/grammar/data.rs b/src/dsl/grammar/data.rs index 73b5594..7939f39 100644 --- a/src/dsl/grammar/data.rs +++ b/src/dsl/grammar/data.rs @@ -4,17 +4,25 @@ //! show table), `insert`, `update`, `delete`. The walker route //! owns these end-to-end. //! -//! Phase D scope deviation note: ADR-0024's Phase D describes -//! "full schema awareness" via `DynamicSubgrammar -//! (column_value_list)` that unfolds typed slots per column. This -//! milestone lands the data commands at functional parity with -//! the existing chumsky parser — value slots accept any -//! literal regardless of column type, with type validation -//! happening at bind time (matching today's behaviour). The -//! `DynamicSubgrammar` machinery and schema-cache plumbing are -//! deferred to a follow-up refinement; the trie shape is -//! ready to consume them when the schema reference flows -//! through `parse_command`. +//! Schema awareness (ADR-0024 §Phase D): the DSL value slots are +//! wired to `DynamicSubgrammar(column_value_list)` / +//! `current_column_value` (see `INSERT_VALUES_LIST`, +//! `insert_first_paren`, `PER_COLUMN_VALUE`), so the schema reference +//! that flows through `parse_command` unfolds a typed slot per column: +//! numeric-shape mismatch is caught at parse (`int`/`decimal`/`bool` +//! slots in `shared.rs`) and the full semantic type (`date` / `shortid` +//! format) is validated at bind time. So the simple-mode DSL gives data +//! values per-column feedback end-to-end. +//! +//! The advanced-mode SQL DML surface (`build_sql_insert` / +//! `build_sql_update` below) is a separate path: it executes the +//! validated statement verbatim (ADR-0030 §4) and is NOT yet wired to +//! the typed slots. ADR-0036 closes the resulting value-feedback gap +//! without a grammar change by *capturing* each literal value position +//! at parse (`capture_literal_rows` / `capture_set_literals`) and +//! validating it against the column type in the worker — Phase 3 will +//! later swap that capture for the same typed slots used here, adding +//! live hints/highlighting. use crate::dsl::command::{Command, Expr, RowFilter}; use crate::dsl::grammar::{ @@ -1080,13 +1088,117 @@ fn build_sql_update(path: &MatchedPath, source: &str) -> Result` + // assignment for app-level type validation + error enrichment + // (ADR-0036 Phase 2). Purely from the matched tokens — no reparse. + let set_literals = capture_set_literals(path); Ok(Command::SqlUpdate { sql, target_table, returning: path_has_returning(path), + set_literals, }) } +/// Capture the literal RHS of each top-level `SET col = ` +/// assignment from the matched path (ADR-0036 Phase 2). Returns +/// `(col, Some(Value))` for a bare-literal RHS (incl. a signed number) +/// and `(col, None)` for an expression RHS (arithmetic, function call, +/// scalar subquery, column ref — nothing static to validate). Works +/// purely from the tokens the walker already matched (no reparse). +/// +/// Boundaries: the assignment LHS is the `update_set_column` ident (a +/// role only ever emitted at the top level of an assignment — expression +/// column refs carry `sql_expr_ident` / `sql_expr_qualified_ref`, so they +/// are never confused with it). A *depth-0* comma separates assignments; +/// a *depth-0* `where` / `returning` keyword (or `;` / end of path) ends +/// the SET list. Parens raise the depth so a comma, `where`, or `=` +/// inside a function call or scalar subquery on the RHS is never mistaken +/// for an assignment / clause boundary or the assignment operator. +fn capture_set_literals(path: &MatchedPath) -> Vec<(String, Option)> { + let mut out: Vec<(String, Option)> = Vec::new(); + let mut after_set = false; + let mut depth: i32 = 0; + // The assignment currently being accumulated: its column name, its + // RHS tokens so far, and whether the assignment `=` has been consumed. + let mut cur_col: Option = None; + let mut cur_rhs: Vec<&MatchedItem> = Vec::new(); + let mut seen_eq = false; + + // Finalise the pending assignment (if any) into `out`. + fn flush( + col: &mut Option, + rhs: &mut Vec<&MatchedItem>, + out: &mut Vec<(String, Option)>, + ) { + if let Some(c) = col.take() { + out.push((c, classify_value_position(rhs))); + } + rhs.clear(); + } + + for item in &path.items { + if !after_set { + // Scan only the SET list — skip everything up to (and + // including) the `set` keyword. The first `update_set_column` + // appears after it. + if matches!(item.kind, MatchedKind::Word("set")) { + after_set = true; + } + continue; + } + // A depth-0 `where` / `returning` / `;` ends the SET list. + if depth == 0 + && matches!( + item.kind, + MatchedKind::Word("where" | "returning") | MatchedKind::Punct(';') + ) + { + break; + } + match &item.kind { + MatchedKind::Punct('(') => { + depth += 1; + if cur_col.is_some() && seen_eq { + cur_rhs.push(item); + } + } + MatchedKind::Punct(')') => { + depth -= 1; + if cur_col.is_some() && seen_eq { + cur_rhs.push(item); + } + } + MatchedKind::Ident { + role: "update_set_column", + .. + } if depth == 0 => { + // A new assignment begins — finalise the previous one. + flush(&mut cur_col, &mut cur_rhs, &mut out); + cur_col = Some(item.text.clone()); + seen_eq = false; + } + MatchedKind::Punct(',') if depth == 0 => { + // Assignment separator — finalise the current assignment; + // the next `update_set_column` starts the following one. + flush(&mut cur_col, &mut cur_rhs, &mut out); + } + MatchedKind::Punct('=') if depth == 0 && !seen_eq && cur_col.is_some() => { + // The assignment operator — consumed, not part of the RHS. + seen_eq = true; + } + _ => { + if cur_col.is_some() && seen_eq { + cur_rhs.push(item); + } + } + } + } + // Finalise the last assignment (ended by `where`/`returning`/`;`/EOF). + flush(&mut cur_col, &mut cur_rhs, &mut out); + out +} + /// Build `Command::SqlDelete` from a validated SQL `DELETE` /// (ADR-0033 §1/§7). Extracts the target table from the matched /// path so the worker re-persists the right CSV and snapshots the diff --git a/src/runtime.rs b/src/runtime.rs index a2fc4c1..98d8fdd 100644 --- a/src/runtime.rs +++ b/src/runtime.rs @@ -1583,6 +1583,14 @@ fn user_value_for_column(command: &Command, column: &str) -> Option set_literals + .iter() + .find(|(c, _)| c == column) + .and_then(|(_, v)| v.clone()), _ => None, } } @@ -2270,8 +2278,9 @@ async fn execute_command_typed( sql, target_table, returning, + set_literals, } => database - .run_sql_update(sql, src, target_table, returning) + .run_sql_update_with_literals(sql, src, target_table, returning, set_literals) .await .map(CommandOutcome::Update), // A SQL `DELETE` (advanced mode; ADR-0033 §1/§7). Grammar- diff --git a/tests/friendly_enrichment.rs b/tests/friendly_enrichment.rs index c4e6b5a..09ad4e3 100644 --- a/tests/friendly_enrichment.rs +++ b/tests/friendly_enrichment.rs @@ -200,6 +200,76 @@ fn enrich_unique_update_resolves_value_from_assignments() { }); } +#[test] +fn enrich_unique_sql_update_resolves_value_from_set_literals() { + // ADR-0036 Phase 2: an advanced-mode SQL `UPDATE` now retains its + // `SET` literals, so a UNIQUE violation names the offending value — + // closing the error-value gap for advanced mode, mirroring the DSL + // `Update` case above. The value flows from the parse-captured + // `set_literals` through `user_value_for_column`. + let db = db(); + rt().block_on(async { + db.create_table( + "Customers".to_string(), + vec![ + ColumnSpec::new("id".to_string(), Type::Int), + ColumnSpec::new("name".to_string(), Type::Text), + ], + vec!["id".to_string()], + None, + ) + .await + .unwrap(); + db.insert( + "Customers".to_string(), + None, + vec![Value::Number("1".to_string()), Value::Text("Alice".to_string())], + None, + ) + .await + .unwrap(); + db.insert( + "Customers".to_string(), + None, + vec![Value::Number("2".to_string()), Value::Text("Bob".to_string())], + None, + ) + .await + .unwrap(); + + // Advanced-mode SQL: set Bob's id to 1 — collides with Alice. + let input = "update Customers set id = 1 where name = 'Bob'"; + let cmd = parse_command(input).expect("parses as advanced-mode SQL update"); + let Command::SqlUpdate { + sql, + target_table, + returning, + set_literals, + } = cmd.clone() + else { + panic!("expected Command::SqlUpdate, got {cmd:?}"); + }; + // The literal `1` is a valid int, so Phase-2 validation passes and + // the engine-level UNIQUE violation is what surfaces. + let err = db + .run_sql_update_with_literals(sql, None, target_table, returning, set_literals) + .await + .unwrap_err(); + assert!(matches!( + err, + DbError::Sqlite { kind: SqliteErrorKind::UniqueViolation, .. } + )); + + let facts = enrich_dsl_failure(&db, &cmd, &err).await; + assert_eq!(facts.column.as_deref(), Some("id")); + assert_eq!( + facts.value.as_deref(), + Some("1"), + "the offending SET value is named (from set_literals)" + ); + }); +} + // ---- NOT NULL --------------------------------------------------- #[test] diff --git a/tests/sql_dml_e2e.rs b/tests/sql_dml_e2e.rs index a604a10..6c3778d 100644 --- a/tests/sql_dml_e2e.rs +++ b/tests/sql_dml_e2e.rs @@ -114,8 +114,14 @@ fn run_update( input: &str, ) -> Result { match parse_command(input).unwrap_or_else(|e| panic!("parse {input:?}: {e:?}")) { - Command::SqlUpdate { sql, target_table, returning } => rt.block_on( - db.run_sql_update(sql, Some(input.to_string()), target_table, returning), + Command::SqlUpdate { sql, target_table, returning, set_literals } => rt.block_on( + db.run_sql_update_with_literals( + sql, + Some(input.to_string()), + target_table, + returning, + set_literals, + ), ), other => panic!("expected Command::SqlUpdate from {input:?}, got {other:?}"), } diff --git a/tests/sql_update.rs b/tests/sql_update.rs index 3d56cc0..a6a6717 100644 --- a/tests/sql_update.rs +++ b/tests/sql_update.rs @@ -8,9 +8,11 @@ //! across all rows with no rail (ADR-0030 §12). use rdbms_playground::db::{Database, DbError, UpdateResult}; -use rdbms_playground::dsl::{ColumnSpec, Command, Type, parse_command}; +use rdbms_playground::dsl::{ColumnSpec, Command, Type, Value, parse_command}; +use rdbms_playground::event::AppEvent; use rdbms_playground::persistence::Persistence; use rdbms_playground::project; +use rdbms_playground::runtime::run_replay; fn rt() -> tokio::runtime::Runtime { tokio::runtime::Builder::new_current_thread() @@ -70,9 +72,15 @@ fn run_update( input: &str, ) -> Result { match parse_command(input).expect("parse update") { - Command::SqlUpdate { sql, target_table, returning } => { - rt.block_on(db.run_sql_update(sql, Some(input.to_string()), target_table, returning)) - } + Command::SqlUpdate { sql, target_table, returning, set_literals } => rt.block_on( + db.run_sql_update_with_literals( + sql, + Some(input.to_string()), + target_table, + returning, + set_literals, + ), + ), other => panic!("expected Command::SqlUpdate, got {other:?}"), } } @@ -205,6 +213,114 @@ fn update_appends_literal_line_to_history() { assert!(body.contains(input), "history records the literal line: {body:?}"); } +// ================================================================= +// ADR-0036 Phase 2 — `SET` literal value validation +// ================================================================= + +#[test] +fn sql_update_validates_set_literals_like_the_dsl() { + // ADR-0036 Phase 2: advanced-mode SQL `UPDATE` now validates each + // literal `SET col = ` value against its column type before + // the (still verbatim) update runs, sharing the DSL's per-type + // validators. `2025/01/15` is a malformed date (slashes, not dashes): + // the DSL update rejects it at bind time, and advanced-mode SQL now + // refuses it too (it used to splice the literal into text and let a + // STRICT TEXT column accept anything). + let (project, db, _dir) = open_project_db(); + let rt = rt(); + create_cols(&db, &rt, "t", &[("id", Type::Int), ("d", Type::Date)], &["id"]); + seed(&db, &rt, "insert into t (id, d) values (1, '2025-01-15')", "t"); + + // SQL path (advanced mode, full replay pipeline) — REJECTS the bad date. + std::fs::write( + project.path().join("bad.commands"), + "update t set d = '2025/01/15' where id = 1\n", + ) + .expect("write script"); + let events = rt.block_on(run_replay(&db, project.path(), "bad.commands")); + assert!( + matches!(events.last(), Some(AppEvent::ReplayFailed { .. })), + "advanced-mode SQL validates the `date` SET literal and refuses \ + 2025/01/15 (ADR-0036 Phase 2); events: {events:?}" + ); + + // A well-formed date still updates (the verbatim path is unaffected). + std::fs::write( + project.path().join("ok.commands"), + "update t set d = '2025-02-20' where id = 1\n", + ) + .expect("write script"); + let ok = rt.block_on(run_replay(&db, project.path(), "ok.commands")); + assert!( + matches!(ok.last(), Some(AppEvent::ReplayCompleted { count, .. }) if *count == 1), + "a well-formed date still updates; events: {ok:?}" + ); +} + +#[test] +fn sql_update_captures_set_literal_classification() { + // ADR-0036 Phase 2 seam (the "one new seam to keep honest"): each + // top-level `SET` RHS is classified — a bare literal (string / signed + // number / bool / null) is captured as `Some`, while an expression + // (arithmetic / scalar subquery / function call / column ref) is + // `None` and left to the engine. Critically, a comma *inside* a + // function call and a `where` *inside* a subquery must NOT be mistaken + // for an assignment separator / SET-list terminator (paren-depth + // guard), and the trailing top-level `WHERE` predicate is not captured. + let cmd = parse_command( + "update t set a = '2025-01-15', b = price * qty, c = -5, \ + d = (select max(n) from o where n < 100), e = true, \ + f = coalesce(g, 0), h = null where id = 7", + ) + .expect("advanced-mode SQL update parses"); + match cmd { + Command::SqlUpdate { set_literals, .. } => { + assert_eq!( + set_literals, + vec![ + ("a".to_string(), Some(Value::Text("2025-01-15".to_string()))), + ("b".to_string(), None), + ("c".to_string(), Some(Value::Number("-5".to_string()))), + ("d".to_string(), None), + ("e".to_string(), Some(Value::Bool(true))), + ("f".to_string(), None), + ("h".to_string(), Some(Value::Null)), + ], + "literals captured; arithmetic / subquery (with inner WHERE) / \ + function call (with inner comma) skipped; trailing WHERE excluded", + ); + } + other => panic!("expected Command::SqlUpdate, got {other:?}"), + } +} + +#[test] +fn sql_update_validates_every_assignment_not_just_the_first() { + // A malformed literal in the *second* assignment is caught — the + // validation loop covers every `SET` literal, not only the first + // (ADR-0036 Phase 2). The first assignment (`v = 'ok'`) is well-formed. + let (project, db, _dir) = open_project_db(); + let rt = rt(); + create_cols( + &db, + &rt, + "t", + &[("id", Type::Int), ("v", Type::Text), ("d", Type::Date)], + &["id"], + ); + seed(&db, &rt, "insert into t (id, v, d) values (1, 'a', '2025-01-01')", "t"); + std::fs::write( + project.path().join("multi.commands"), + "update t set v = 'ok', d = '2025/01/15' where id = 1\n", + ) + .expect("write script"); + let events = rt.block_on(run_replay(&db, project.path(), "multi.commands")); + assert!( + matches!(events.last(), Some(AppEvent::ReplayFailed { .. })), + "the malformed date in the second assignment is caught; events: {events:?}" + ); +} + // ================================================================= // Sub-phase 3g — RETURNING (ADR-0033 §5) // =================================================================