diff --git a/docs/adr/0036-typed-dml-values-vs-verbatim.md b/docs/adr/0036-typed-dml-values-vs-verbatim.md index 58896d8..a571772 100644 --- a/docs/adr/0036-typed-dml-values-vs-verbatim.md +++ b/docs/adr/0036-typed-dml-values-vs-verbatim.md @@ -20,8 +20,13 @@ validated, execution stays verbatim). **Phase 3a implemented 2026-05-26** — live typed-slot hints + numeric-shape highlighting for advanced-mode `UPDATE`/UPSERT `SET col = ` value positions, via a **boundary-aware lookahead** (not the naive `Choice` this ADR originally -sketched in §5 — see **Amendment 1**). Phase 3b (`INSERT … VALUES` typed -slots — needs a per-position grammar restructure + multi-row) pending. +sketched in §5 — see **Amendment 1**). **Phase 3b implemented 2026-05-27** +— live per-position typed-slot hints + highlighting for advanced-mode +`INSERT … VALUES (…)` (single- and multi-row, Form A and Form B), via a +new zero-width `Node::SetColumn` primitive that gives each positional +value its column identity, plus an arity-gating tuple lookahead that +preserves the per-tuple arity diagnostic (ADR-0033 §8.1); see +**Amendment 1**. ADR-0036 is now fully implemented. **Augments** **ADR-0030 §4** and **ADR-0033 §10** — it does **not** supersede them and does **not** change the execution model. Advanced-mode @@ -373,13 +378,44 @@ deliberate throwaway. UPDATE SET`. Mismatch examples now caught **live** (e.g. `set k = 3.14` at an `int` column), matching what simple mode already does — earlier, better feedback than Phase 2's execution-time catch. -- **Phase 3b (pending) — `INSERT … VALUES (…)`.** Harder: the values list - is `Repeated(VALUE_EXPR)` with **no per-position column identity**, and - multi-row `values (..),(..)` must be handled. It needs the DSL-style - per-position restructure (a `DynamicSubgrammar` emitting one - boundary-aware position per column), tracked as its own step. +- **Phase 3b (implemented 2026-05-27) — `INSERT … VALUES (…)`.** Harder, + because the values list is positional (no per-position column ident) + and multi-row. The resolution, agreed with the user (full-parity + option): + - **A new zero-width `Node::SetColumn(&'static TableColumn)` primitive** + establishes the active column for the value position that follows + (sets `current_column` + `pending_value_column`, exactly as an + `Ident { writes_column: true }` would, but without consuming an + identifier). A `DynamicSubgrammar` (`sql_insert::sql_value_list`) + emits `SetColumn(colᵢ)` then the shared `SET_VALUE` per position, so + each positional value reuses 3a's routing. One new enum variant; one + new walker arm. + - **Column mapping** mirrors `do_sql_insert`'s positional rule: + Form A → the listed columns in the user's order; Form B → **all** + columns in declaration order. Note on auto-fill (correcting a loose + statement made while scoping): advanced mode **does** auto-fill an + *omitted* `shortid` in **Form A** (`plan_shortid_autofill`), so that + column has no `VALUES` position and is correctly absent from the + mapping; it does **not** auto-fill `serial` (the X4 gap); and **Form + B auto-fills nothing** (the function returns early on an empty column + list), so the user supplies a value for every column — hence the + Form-B-maps-all-columns rule. + - **Coexistence with the §8.1 arity diagnostic.** A fixed-length typed + `Seq` would *reject* a wrong-arity tuple, suppressing the friendly + per-tuple `insert_arity_mismatch` (ADR-0033 §8.1), which is a + post-walk pass over the matched path and so needs the tuple to be + accepted. So the tuple value list is an **arity-gating lookahead** + (`tuple_value_list`): a closed tuple uses the typed `Seq` only on an + exact value/column match; an open (mid-typing) tuple uses it while + `count <= columns` (so the hint shows from `(` onward); every other + case falls back to the type-blind `Repeated(sql_expr)`, leaving the + §8.1 diagnostic to fire unchanged. Correct-arity tuples (the common + case) thus get full live typed feedback — including a wrong-*kind* + lone literal such as `('text')` into an `int` column, the case the + cheaper structural option would have missed — while wrong-arity + tuples keep the friendly arity message. -**Known limitation (both phases, matches the DSL).** `date` / `shortid` / +**Known limitation (all phases, matches the DSL).** `date` / `shortid` / `datetime` **format** is still not validated at parse — those slots accept any quoted string; the format is checked at bind/execution time (Phase 2). So the live highlight catches *numeric-shape* mismatches (`int`/`decimal`/ diff --git a/docs/adr/README.md b/docs/adr/README.md index 106664b..b8ed8f1 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -41,4 +41,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Accepted** (implemented + verified through sub-phase 3k, 2026-05-23; phase-exit report `docs/handoff/20260523-phase-3-verification.md`), the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery); **Amendment 3 (sub-phase 3j) records the command-identity model and defers the execution-mode side-channel**: a command is the typed outcome of a *mode-rooted* grammar path and its identity is intrinsic (Advanced mode tries SQL first, falls back to the *Simple* DSL command when no SQL branch matches a token, e.g. `delete … --all-rows`; note `update … --all-rows` does *not* fall back — the SQL `SET` expression eats `--all-rows`, harmless since the engine treats it as a comment); **Simple mode commits the DSL candidate for shared words** so the *real* DSL error surfaces, and when that line would also run in advanced mode the rendering layer **combines** them — DSL error **plus** an `advanced_mode.also_valid_sql` pointer ("… (valid as SQL in advanced mode)") — keeping the actionable DSL fix while pointing at advanced mode; bare "this is SQL" is reserved for entry words with no DSL form (`select`/`with`); a fully-overlapping input (`insert … values …`) legitimately yields *two distinct commands* (`Command::Insert` typed-AST vs `Command::SqlInsert` validated-text) that do the same thing but execute differently (ADR-0030 §4), so each is tested in the mode that produces it; **corrects the plan's 3j exit-gate premise** that the DSL DML tests run in Simple mode (they call `parse_command`, which defaults to Advanced) — the real invariant is "Simple-mode behaviour unchanged, Advanced mode SQL-first, DSL grammar tested in Simple mode, both variants tested in their producing mode", with §6/§7 parity keeping the paths observably equivalent; and **defers to its own future ADR** the execution-time mode side-channel (three-way `Mode`: simple/advanced/advanced-one-shot threaded through `Action`→worker, for mode-dependent *output* like echoing generated SQL) — today only the *rendering* side-channel `OutputLine.mode_at_submission` exists, and the three-way distinction is not required for Phase 3 dispatch correctness - [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `||` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration; **implemented 2026-05-24** (plan `docs/plans/20260524-adr-0034-history-journal.md`); **Amendment 1 (2026-05-24): replay filters out app-lifecycle commands** — a working `replay history.log` (the §3 fix) exposed that the journal also records `save as`/`load`/`new`/`export`/`import`/`rebuild`/`mode` (which would panic the worker dispatch or abort the replay), so replay now re-applies **only** schema/data write commands and **skips** every `Command::App` + nested `Command::Replay`; **all skips continue** (never abort — reversing the prior nested-`replay` refusal, so a journal containing a once-run `replay` needs no hand-editing, and the infinite-loop footgun is closed by construction), with a `[skip]` **warning** on `import` and nested-`replay` skips (their omission can leave replayed state incomplete) and silent skips for the rest; `replay.error_nested` removed, `replay.skipped_import`/`replay.skipped_replay` added, `ReplayCompleted` carries `warnings` - [ADR-0035 — Advanced-mode SQL DDL](0035-advanced-mode-sql-ddl.md) — **Accepted** (design agreed 2026-05-24; validated end-to-end by sub-phases 4a/4a.2/4a.3/4b `CREATE TABLE` (incl. foreign keys) + 4c `DROP TABLE [IF EXISTS]` + 4d `CREATE [UNIQUE] INDEX` / `DROP INDEX [IF EXISTS]` + 4e `ALTER TABLE` add/drop/rename column + 4f `ALTER TABLE … ALTER COLUMN TYPE` + 4g `ALTER TABLE` add/drop constraint + add FK + 4h `ALTER TABLE … RENAME TO` + 4i verification sweep (completion merge + simple/advanced completion colour + describe of table-level constraints + self-ref FK indicator + CREATE-TABLE help/usage), implemented 2026-05-25/26 — **Phase 4 complete**; **Amendment 1, 2026-05-26**: drop a composite UNIQUE via a derived, engine-neutral `unique_` name that reuses the existing `DROP CONSTRAINT ` grammar — no new syntax, no metadata, §4g anonymity intact; `describe` shows the name; dropping a UNIQUE-covered *column* now refuses with that name + the drop command), **Phase 4** of the ADR-0030 roadmap (peer of 0031/0032/0033) and **clarifies ADR-0030 §4**. Advanced-mode `CREATE`/`DROP`/`ALTER TABLE` + `CREATE`/`DROP INDEX` get their **own per-statement commands** (`SqlCreateTable`/`SqlAlterTable`/`SqlDropTable`/`SqlCreateIndex`/`SqlDropIndex`), like DML's `Sql*` set — but unlike DML they **execute *structurally*, not verbatim** (raw execution would lose the playground's types, named relationships, and `STRICT`; "verbatim" was a DML convenience, not a rule). Handlers **reuse the low-level schema/metadata helpers** where the operation matches simple mode and **stand alone where the SQL surface is richer** (clarity over forced refactoring); simple mode is untouched (additive). Dispatch: `create`/`drop` reuse ADR-0033 Amendment 1's category-grouped mode-aware dispatch (SQL-first, simple fallback); `alter` is a new advanced-only entry word. Full surface (no pre-emptive cuts, `Q4`): `CREATE TABLE` with column + table constraints, single/compound `PRIMARY KEY`, inline + table-level `FOREIGN KEY` → **named relationships** (one statement = one command = **one undo step**, ADR-0006); `ALTER TABLE` add/drop/rename column, `ALTER COLUMN TYPE`, add/drop constraint, add FK, **`RENAME TO`** (advanced-only table rename — new low-level op renaming the table + its CSV + the relationship and table-CHECK metadata, closing the rename half of `C1`); `CREATE [UNIQUE] INDEX` / `DROP INDEX`. Type slot accepts the ten playground keywords **and** standard-SQL aliases (`integer`→`int`, `varchar`→`text`, `timestamp`→`datetime`, …; length args accepted-and-ignored; no engine type names in/out — ADR-0030 §5). `CHECK`/`DEFAULT` reuse ADR-0031 `sql_expr`. **Pre-implementation `/runda` refinements (2026-05-24, user-confirmed):** `CREATE TABLE`/`DROP TABLE` **admit `IF [NOT] EXISTS`** (no-op-that-succeeds-with-a-note — a near-universal cross-vendor idiom, reclassified *into* scope, not engine-specific); `INTEGER PRIMARY KEY` maps to a **plain `int`** PK, *not* auto-increment (`serial` stays the sole auto-increment type). **Column-type-conversion is unified** (ADR-0017 engine, mode-appropriate policy): clean auto-converts and incompatible/own-type-static cases refuse in both modes, but a **lossy** change refuses-by-default in simple mode (`--force-conversion` opts in) while advanced mode **performs it with a loss note** and relies on **`undo` as the safety net** — no force flag, no dropping to simple mode (a payoff of shipping ADR-0006 first). OOS: views/triggers/txn-control/PRAGMA/etc. (ADR-0030 §3), the Postgres `USING` clause, and the DSL→SQL teaching echo (ADR-0030 Phase 5). Sub-phases 4a–4i, plus **4a.2** (per-column `CHECK`/`DEFAULT` via raw `sql_expr` text — `sql_expr` is validate-only, no `Expr` AST — + composite `UNIQUE(a,b)`; no new internal table) and **4a.3** (table-level/multi-column `CHECK`, landed via the new `__rdbms_playground_table_checks` metadata table because SQLite has no PRAGMA for CHECK; the builder tells a table-level CHECK from a column-level one by element position) and **4b** (foreign keys — inline `REFERENCES` + table-level `FOREIGN KEY` → ADR-0013 named relationships in the create transaction, one undo step; self-references + bare `REFERENCES ` supported, user-confirmed) and **4c** (`DROP TABLE [IF EXISTS]` → `SqlDropTable`, reusing `do_drop_table`; `IF EXISTS` is a no-op-with-note via `DropOutcome::Skipped`) and **4d** (`CREATE [UNIQUE] INDEX [IF NOT EXISTS] [] ON (cols)` → `SqlCreateIndex` and `DROP INDEX [IF EXISTS] ` → `SqlDropIndex`, reusing `do_add_index`/`do_drop_index`; **`CREATE UNIQUE INDEX` admitted** — ADR-0025 **Amendment 1** — via an additive `IndexSchema.unique` flag that round-trips through `project.yaml` and rebuild, with `[unique]` markers in the structure view + items panel, while simple-mode `add unique index` stays deferred; `IF [NOT] EXISTS` reuses the 4c skip path; `create`/`drop` each gain a *second* advanced node, exercising the all-candidates dispatch) and **4e** (`ALTER TABLE` add/drop/rename column → `SqlAlterTable`; `alter` is a new advanced-**only** entry word, runtime-decomposed to the existing `do_add_column`/`do_drop_column`/`do_rename_column` — no new worker layer; `do_add_column` extended to consume raw `default_sql`/`check_sql` so ADD COLUMN reaches CREATE-TABLE constraint parity; drop/rename refuse a column any CHECK references (table-level AND column-level, incl. a column's own self-check on rename) — the 4a.3 deferral, via a raw-CHECK-text tokenizer in the shared executors, so it guards both surfaces and fixes a latent rename-drift bug; SQL DROP COLUMN refuses an index-covered column with no `--cascade` spelling; the column executors + `do_add_index` gained an internal-`__rdbms_*`-table guard — all user-confirmed) and **4f** (`ALTER TABLE … ALTER COLUMN TYPE` → a fourth `AlterTableAction`, runtime-decomposed to the existing `change_column_type` with `ChangeColumnMode::ForceConversion` — which **is** the §7 advanced policy: lossy converts *with a note* (no force flag), incompatible + ADR-0017 static refusals (`↔ blob`, same-type, `date ↔ datetime`, non-`int → serial`) still refuse, while **`int → serial` is allowed** (auto-fills nulls + UNIQUE, ADR-0018 §8 — the §7 "→serial refused" summary is looser than the code); the builder discriminates the fourth branch by the **`type` keyword** (unique — ADD COLUMN's type is an ident), the type slot reuses `SQL_TYPE`; the internal-`__rdbms_*` guard was folded into `do_change_column_type`, closing the simple `change column` exposure too — user-confirmed) and **4g** (`ALTER TABLE … ADD [CONSTRAINT ] (CHECK | UNIQUE | FOREIGN KEY)` + `DROP CONSTRAINT `; ADD = CHECK + composite UNIQUE + FK, with `ADD PRIMARY KEY` and a *named* UNIQUE refused — composite UNIQUE is anonymous in our model; each ADD reuses a low-level path (table-CHECK/UNIQUE rebuild with a dry-run guard; FK → `add_relationship`, bare `REFERENCES

` → parent single-PK), DROP CONSTRAINT resolves the name to a table-CHECK then a child-side FK; **named table-CHECKs round-trip** via a nullable `name` column on `__rdbms_playground_table_checks` (**rebuild-only** arrival — pre-4g projects gain it on `rebuild`, a named add on an un-upgraded project is refused with a friendly "rebuild first" message) *and* a `project.yaml` `check_constraints` extension to an `{expr, name}` mapping (the bare-string form still reads); the internal-`__rdbms_*` guard was folded into `do_add_constraint`/`do_add_relationship`, completing that guard class — all user-confirmed) and **4h** (`ALTER TABLE … RENAME TO` — the one genuinely new low-level op, `do_rename_table`: a native engine rename plus one-transaction reconciliation of every metadata row naming the table (`__rdbms_playground_columns`, **both ends** of `__rdbms_playground_relationships`, `__rdbms_playground_table_checks`), the CSV file (the existing rewrite+delete path — no new persistence method), and **CHECK text that qualifies a column with the old table name** (`T.age`→`U.age`, a planning-`/runda` finding — the engine rewrites the live CHECK but the stored text would drift and break a fresh rebuild; `rewrite_check_table_qualifier` keeps them in step); grammar splits the `rename` verb into one branch with an inner Choice on a distinct second keyword (`column` vs `to`), the new-name slot mirroring the `CREATE TABLE` name slot; refuses same-name / existing-target / `__rdbms_*` / non-existent, with **case-insensitive** collision checks behind an engine-neutral pre-check (a finished-slice `/runda` finding — the engine matches names case-insensitively); auto-named indexes *and* relationships keep their stale names (only table-name columns update — §6 scope); one undo step; advanced-only, closing the rename half of `C1` — all user-confirmed) and **4i** (the verification sweep that completes Phase 4: the shared-entry-word completion merge + the simple-vs-advanced completion colour-when-mixed with Both→Advanced→Simple block ordering; `describe` of table-level composite UNIQUE + table CHECK; the self-ref FK pre-submit indicator fix; and the CREATE-TABLE help/usage skeleton refresh). **All of Phase 4 (4a–4i) is shipped.** Each sub-phase has exit + DA gates -- [ADR-0036 — Value validation for advanced-mode DML](0036-typed-dml-values-vs-verbatim.md) — **Accepted** (design agreed + `/runda`'d 2026-05-26; mechanism then **deliberately narrowed** from "bind literals via the DSL path" to surgical **"validate-and-retain, execute verbatim"** after the user resisted consolidating the modes and a concrete auto-fill difference confirmed even the single-row literal case isn't identical across modes; **Phases 1–2 implemented** 2026-05-26 — `INSERT … VALUES` and `UPDATE … SET` literal validation + offending-value retention, capture-at-parse with no grammar change; **Phase 3a implemented** 2026-05-26 — live typed-slot hints + numeric-shape highlighting for `UPDATE`/UPSERT `SET col = ` via a boundary-aware lookahead (Amendment 1 corrects this ADR's naive-`Choice` sketch); Phase 3b (`INSERT … VALUES` typed slots) pending). **Augments — does NOT supersede — ADR-0030 §4 / ADR-0033 §10**: execution stays verbatim, ADR-0033 Amendment 3's two-command identity (`Insert` vs `SqlInsert`) **stands**. The problem (investigated 2026-05-26; characterization test `sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces` proves it): advanced-mode SQL DML gets **none** of the DSL's value feedback — a malformed `date` like `2025/01/15` is silently written, and the offending value is missing from constraint errors — because literal values are spliced into text and discarded (only `STRICT` storage types check them). **Fix (surgical): validate each literal value against its column type before the verbatim insert, and retain it for error reporting — sharing only the per-type validators (`Value::bind_for_column`/`validate_date`/`shortid::validate`), nothing else.** No binding, no statement reconstruction, no auto-fill change, no command-identity collapse — because the two gaps are closed by validation + retention alone, and executing the user's own text is already safe. The literal set = `NULL`/boolean/string/**signed**-numeric; arithmetic/functions/subqueries/column-refs are expressions (skipped — the engine evaluates them). `WHERE` not validated (it's an expression in general; motivation met by `VALUES`/`SET`); `SELECT`/`INSERT … SELECT`/`RETURNING`/`ON CONFLICT` need no special handling since execution is untouched. Phased: **Phase 1** capture-at-parse + validate + retain for `INSERT … VALUES` (no grammar change, no reparse — closes both proven gaps); **Phase 2** `UPDATE … SET` literals; **Phase 3** completion hinting/highlighting (the only part needing a grammar change — a typed-literal slot vs `sql_expr` reusing the DSL `TypedValueSlot`s at `data.rs:141`/`189`/`269`, discriminated by a **boundary-aware lookahead** not a naive `Choice` per **Amendment 1**; split into **3a** `SET` (done) and **3b** `VALUES` (pending); supersedes only Phase 1/2's literal *detection*, not the validation/enrichment on top). Non-goals: binding/reconstruction, collapsing command identity (Am3 stands), changing `serial`/`shortid` auto-fill (`requirements.md` **X4**, a separate possible-bug), a structural `SELECT`, a full SQL-expression AST. Embodies `requirements.md` **X5** (share a *mechanic*, not a *command*); the neutral "that value" safety net (ADR-0035 Amendment 1) stays correct for genuinely-computed values +- [ADR-0036 — Value validation for advanced-mode DML](0036-typed-dml-values-vs-verbatim.md) — **Accepted** (design agreed + `/runda`'d 2026-05-26; mechanism then **deliberately narrowed** from "bind literals via the DSL path" to surgical **"validate-and-retain, execute verbatim"** after the user resisted consolidating the modes and a concrete auto-fill difference confirmed even the single-row literal case isn't identical across modes; **Phases 1–2 implemented** 2026-05-26 — `INSERT … VALUES` and `UPDATE … SET` literal validation + offending-value retention, capture-at-parse with no grammar change; **Phase 3a implemented** 2026-05-26 — live typed-slot hints + numeric-shape highlighting for `UPDATE`/UPSERT `SET col = ` via a boundary-aware lookahead (Amendment 1 corrects this ADR's naive-`Choice` sketch); **Phase 3b implemented** 2026-05-27 — per-position typed slots for `INSERT … VALUES` (single/multi-row, Form A/B) via a new zero-width `Node::SetColumn` primitive + an arity-gating tuple lookahead that preserves the §8.1 arity diagnostic; **fully implemented**). **Augments — does NOT supersede — ADR-0030 §4 / ADR-0033 §10**: execution stays verbatim, ADR-0033 Amendment 3's two-command identity (`Insert` vs `SqlInsert`) **stands**. The problem (investigated 2026-05-26; characterization test `sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces` proves it): advanced-mode SQL DML gets **none** of the DSL's value feedback — a malformed `date` like `2025/01/15` is silently written, and the offending value is missing from constraint errors — because literal values are spliced into text and discarded (only `STRICT` storage types check them). **Fix (surgical): validate each literal value against its column type before the verbatim insert, and retain it for error reporting — sharing only the per-type validators (`Value::bind_for_column`/`validate_date`/`shortid::validate`), nothing else.** No binding, no statement reconstruction, no auto-fill change, no command-identity collapse — because the two gaps are closed by validation + retention alone, and executing the user's own text is already safe. The literal set = `NULL`/boolean/string/**signed**-numeric; arithmetic/functions/subqueries/column-refs are expressions (skipped — the engine evaluates them). `WHERE` not validated (it's an expression in general; motivation met by `VALUES`/`SET`); `SELECT`/`INSERT … SELECT`/`RETURNING`/`ON CONFLICT` need no special handling since execution is untouched. Phased: **Phase 1** capture-at-parse + validate + retain for `INSERT … VALUES` (no grammar change, no reparse — closes both proven gaps); **Phase 2** `UPDATE … SET` literals; **Phase 3** completion hinting/highlighting (the only part needing a grammar change — a typed-literal slot vs `sql_expr` reusing the DSL `TypedValueSlot`s at `data.rs:141`/`189`/`269`, discriminated by a **boundary-aware lookahead** not a naive `Choice` per **Amendment 1**; split into **3a** `SET` (done) and **3b** `VALUES` (pending); supersedes only Phase 1/2's literal *detection*, not the validation/enrichment on top). Non-goals: binding/reconstruction, collapsing command identity (Am3 stands), changing `serial`/`shortid` auto-fill (`requirements.md` **X4**, a separate possible-bug), a structural `SELECT`, a full SQL-expression AST. Embodies `requirements.md` **X5** (share a *mechanic*, not a *command*); the neutral "that value" safety net (ADR-0035 Amendment 1) stays correct for genuinely-computed values diff --git a/src/dsl/grammar/mod.rs b/src/dsl/grammar/mod.rs index 79212de..b07d96f 100644 --- a/src/dsl/grammar/mod.rs +++ b/src/dsl/grammar/mod.rs @@ -375,6 +375,25 @@ pub enum Node { /// type-awareness). Not memoized: the output depends on the /// source, not just `ctx`. Lookahead(fn(&WalkContext, &str, usize) -> Self), + /// Zero-width node that *establishes the active column* for the + /// value slot that follows it (ADR-0036 Phase 3b). Matches the + /// empty string and, as a side effect, sets + /// `WalkContext::current_column` to the referenced column and + /// `pending_value_column` to its name — exactly as an + /// `Ident { writes_column: true }` does, but without consuming a + /// column identifier from the input. + /// + /// This is the primitive that gives `INSERT … VALUES (…)` + /// positions a per-position column identity: the positions are + /// positional (no per-position column ident to write + /// `current_column`), so a `DynamicSubgrammar` factory + /// (`sql_insert::sql_value_list`) emits `SetColumn(colᵢ)` before + /// each value position, then the shared boundary-aware `SET_VALUE` + /// slot routes a lone literal to that column's typed slot and any + /// expression to `sql_expr`. The referenced `TableColumn` is + /// leaked by the factory (bounded by the column count, like the + /// `DynamicSubgrammar` `Box::leak`). + SetColumn(&'static crate::completion::TableColumn), /// Typed value-literal slot (ADR-0024 §Phase D §typed-value-slots). /// /// Walks `inner` to consume the literal but records the diff --git a/src/dsl/grammar/sql_insert.rs b/src/dsl/grammar/sql_insert.rs index 3c04886..efb75d2 100644 --- a/src/dsl/grammar/sql_insert.rs +++ b/src/dsl/grammar/sql_insert.rs @@ -13,12 +13,14 @@ //! (3g), and `ON CONFLICT … ` UPSERT (3h) land in later //! sub-phases. +use crate::completion::TableColumn; use crate::dsl::grammar::shared::SET_VALUE; use crate::dsl::grammar::sql_expr; use crate::dsl::grammar::sql_select::{ RETURNING_CLAUSE, SQL_SELECT_COMPOUND, WHERE_CLAUSE, reject_internal_table, }; use crate::dsl::grammar::{IdentSource, Node, Word}; +use crate::dsl::walker::context::WalkContext; static COMMA: Node = Node::Punct(','); @@ -41,10 +43,13 @@ const TARGET_TABLE: Node = Node::Ident { /// One column name inside the optional `(col1, col2, …)` list. /// -/// `writes_user_listed_column` stays `false` in 3b — the worker -/// requires explicit values for every column, so the listed-column -/// set isn't needed yet. Sub-phase 3d (`shortid` auto-fill) turns -/// it on and threads `listed_columns` into `Command::SqlInsert`. +/// `writes_user_listed_column: true` records the listed columns into +/// `WalkContext::user_listed_columns` so the `VALUES` factory +/// (`sql_value_list`, ADR-0036 Phase 3b) maps each value position to +/// the listed column in the user's order (Form A). `build_sql_insert` +/// still collects `listed_columns` independently from the matched +/// `insert_column` idents, so this flag only adds the live typed-slot +/// mapping — nothing else reads `user_listed_columns` on the SQL path. static COLUMN_NAME: Node = Node::Ident { source: IdentSource::Columns, role: "insert_column", @@ -52,7 +57,7 @@ static COLUMN_NAME: Node = Node::Ident { highlight_override: None, writes_table: false, writes_column: false, - writes_user_listed_column: false, + writes_user_listed_column: true, writes_table_alias: false, writes_cte_name: false, writes_projection_alias: false, @@ -72,19 +77,167 @@ const OPTIONAL_COLUMN_LIST: Node = Node::Optional(&Node::Seq(COLUMN_LIST_NODES)) /// One value expression inside a `VALUES` tuple. Consumes the /// shared `sql_expr` grammar (ADR-0031), so literals, operators, /// `CASE`, function calls, etc. are all admitted; the engine -/// evaluates them at execution time. +/// evaluates them at execution time. Used as the schemaless / fallback +/// value (see `sql_value_list`). static VALUE_EXPR: Node = Node::Subgrammar(&sql_expr::SQL_OR_EXPR); -static VALUE_TUPLE_NODES: &[Node] = &[ - Node::Punct('('), +/// The fallback value list — the pre-Phase-3b type-blind +/// `Repeated(sql_expr)`. Used for schemaless walks and (crucially) for +/// any tuple whose value-count does NOT match the target column count, +/// so the post-walk per-tuple arity diagnostic (ADR-0033 §8.1) still +/// sees all the values in the matched path and fires its friendly +/// message — a fixed-length typed `Seq` would instead reject the tuple +/// and suppress that diagnostic. +fn fallback_value_list() -> Node { Node::Repeated { inner: &VALUE_EXPR, separator: Some(&COMMA), min: 1, - }, + } +} + +/// The target columns a `VALUES` tuple's positions map onto (ADR-0036 +/// Phase 3b). Mirrors `db::do_sql_insert`'s positional rule — NOT the +/// DSL's `column_value_list`: +/// - **Form A** (`user_listed_columns` set, from the `(col, …)` +/// list): the listed columns, in the user's order. An *omitted* +/// `shortid` is auto-filled at execution (the X4 note) and has no +/// `VALUES` position, so it is correctly absent here. +/// - **Form B** (no column list): ALL columns in declaration order, +/// including `serial` / `shortid` — advanced-mode Form B auto-fills +/// *nothing* (`plan_shortid_autofill` returns early on an empty +/// column list), so the user supplies a value for every column. +/// +/// Empty when schemaless, the table is unknown, or a Form A list +/// resolves to nothing (callers fall back to the type-blind list). +fn target_value_columns(ctx: &WalkContext) -> Vec { + let Some(table_cols) = ctx.current_table_columns.as_ref() else { + return Vec::new(); + }; + ctx.user_listed_columns.as_ref().map_or_else( + || table_cols.clone(), + |listed| { + listed + .iter() + .filter_map(|name| { + table_cols.iter().find(|c| c.name.eq_ignore_ascii_case(name)).cloned() + }) + .collect() + }, + ) +} + +/// Count the value positions in the `VALUES` tuple whose contents begin +/// at `pos` (just past the opening `(`), and whether the tuple is +/// *closed* (a depth-0 `)` was reached) vs still being typed (scan hit +/// end-of-input first). Depth-aware: commas nested in a function call / +/// subquery (paren depth ≥ 1) or inside a string literal are not +/// separators. Returns `(0, _)` for an empty tuple `()`. +fn count_tuple_values(source: &str, pos: usize) -> (usize, bool) { + let bytes = source.as_bytes(); + let mut i = pos; + let mut depth: i32 = 0; + let mut commas = 0usize; + let mut seen_value = false; + let mut closed = false; + while i < bytes.len() { + match bytes[i] { + b'\'' => { + // Skip a single-quoted string literal (`''` escape). + i += 1; + seen_value = true; + while i < bytes.len() { + if bytes[i] == b'\'' { + if bytes.get(i + 1) == Some(&b'\'') { + i += 2; + continue; + } + i += 1; + break; + } + i += 1; + } + continue; + } + b'(' => { + depth += 1; + seen_value = true; + } + b')' => { + if depth == 0 { + closed = true; + break; // tuple close + } + depth -= 1; + } + b',' if depth == 0 => commas += 1, + b if !b.is_ascii_whitespace() => seen_value = true, + _ => {} + } + i += 1; + } + (if seen_value { commas + 1 } else { 0 }, closed) +} + +/// Tuple value-list lookahead (ADR-0036 Phase 3b). Gates the typed +/// per-column path on arity so the typed `Seq` is used only where it +/// can succeed, leaving wrong-arity tuples to the type-blind path (and +/// thus to the per-tuple arity diagnostic, ADR-0033 §8.1, which a +/// fixed-length `Seq` would otherwise suppress by rejecting the tuple): +/// - a **closed** tuple routes to typed slots only on an *exact* +/// match (`count == columns`); +/// - an **open** (still-typing) tuple routes to typed slots while +/// there is still room (`count <= columns`), so the per-column hint +/// shows from the moment `(` is opened through each position. +/// +/// Returns a small node — the heavy typed `Seq` is built + memoized by +/// the `DynamicSubgrammar` — matching `insert_first_paren`'s leak +/// discipline. Schemaless / unknown table → type-blind fallback. +fn tuple_value_list(ctx: &WalkContext, source: &str, pos: usize) -> Node { + let cols = target_value_columns(ctx); + let (count, closed) = count_tuple_values(source, pos); + let arity_ok = if closed { count == cols.len() } else { count <= cols.len() }; + if !cols.is_empty() && arity_ok { + Node::DynamicSubgrammar(sql_value_list) + } else { + fallback_value_list() + } +} + +/// Schema-aware typed value list for one correct-arity `VALUES` tuple +/// (ADR-0036 Phase 3b). Emits, per target column, a zero-width +/// `SetColumn(col)` marker (establishes the active column) followed by +/// the shared boundary-aware [`SET_VALUE`] slot — so a lone literal +/// routes to the column's typed slot (live hint + numeric-shape +/// highlight) and any expression falls through to `sql_expr`. Reached +/// only via [`tuple_value_list`] when arity matches and the schema is +/// known; the empty-cols guard is defensive. +fn sql_value_list(ctx: &WalkContext) -> Node { + let cols = target_value_columns(ctx); + if cols.is_empty() { + return fallback_value_list(); + } + let mut children: Vec = Vec::with_capacity(cols.len() * 3); + for (i, col) in cols.into_iter().enumerate() { + if i > 0 { + children.push(Node::Punct(',')); + } + let leaked: &'static TableColumn = Box::leak(Box::new(col)); + children.push(Node::SetColumn(leaked)); + children.push(SET_VALUE); + } + Node::Seq(Box::leak(children.into_boxed_slice())) +} + +static VALUE_TUPLE_NODES: &[Node] = &[ + Node::Punct('('), + Node::Lookahead(tuple_value_list), Node::Punct(')'), ]; -/// `'(' sql_expr (',' sql_expr)* ')'` — one row of values. +/// `'(' ')'` — one row of values. The value list is the +/// arity-gated `tuple_value_list` (ADR-0036 Phase 3b): a correct-arity +/// tuple gets per-column typed slots; a wrong-arity tuple keeps the +/// type-blind `sql_expr` repeat so the §8.1 arity diagnostic fires. static VALUE_TUPLE: Node = Node::Seq(VALUE_TUPLE_NODES); static VALUES_CLAUSE_NODES: &[Node] = &[ diff --git a/src/dsl/walker/driver.rs b/src/dsl/walker/driver.rs index 3b30639..c63951a 100644 --- a/src/dsl/walker/driver.rs +++ b/src/dsl/walker/driver.rs @@ -250,6 +250,19 @@ fn walk_node_inner( Box::leak(Box::new(factory(ctx, source, pos))); walk_node(source, pos, resolved, ctx, path, per_byte) } + Node::SetColumn(col) => { + // ADR-0036 Phase 3b: zero-width — establish the active + // column for the value position that follows, exactly as an + // `Ident { writes_column: true }` would (current_column for + // the typed slot's dispatch; pending_value_column for the + // hint's "for `col`:" framing), but without consuming a + // column identifier (VALUES positions are positional). The + // following `SET_VALUE` slot reads `current_column`. + let col: &crate::completion::TableColumn = col; + ctx.current_column = Some(col.clone()); + ctx.pending_value_column = Some(col.name.clone()); + NodeWalkResult::Matched { end: pos, skipped: Vec::new() } + } Node::TypedValueSlot { ty, column_name, diff --git a/tests/sql_insert.rs b/tests/sql_insert.rs index ade2643..1e65fb5 100644 --- a/tests/sql_insert.rs +++ b/tests/sql_insert.rs @@ -24,7 +24,9 @@ use rdbms_playground::completion::{SchemaCache, TableColumn}; use rdbms_playground::db::{Database, DbError, InsertResult}; use rdbms_playground::dsl::{ColumnSpec, Command, Type, Value, parse_command}; use rdbms_playground::event::AppEvent; -use rdbms_playground::input_render::{AmbientHint, ambient_hint_in_mode}; +use rdbms_playground::input_render::{ + AmbientHint, InputState, ambient_hint_in_mode, classify_input_with_schema_in_mode, +}; use rdbms_playground::mode::Mode; use rdbms_playground::persistence::Persistence; use rdbms_playground::project; @@ -1217,3 +1219,156 @@ fn advanced_upsert_do_update_set_offers_typed_slot_hint() { "text-column hint says `quoted string`: {prose:?}" ); } + +// ================================================================= +// ADR-0036 Phase 3b — live typed-slot hints + highlighting for the +// INSERT `VALUES (…)` positions (per-position column mapping via the +// `Node::SetColumn` primitive; boundary-aware lookahead per position). +// ================================================================= + +/// Build a `SchemaCache` for the advanced-mode typing-surface tests. +fn vschema(tables: &[(&str, &[(&str, Type)])]) -> SchemaCache { + let mut cache = SchemaCache::default(); + for (table, cols) in tables { + let table_cols: Vec = cols + .iter() + .map(|(n, t)| TableColumn { + name: (*n).to_string(), + user_type: *t, + not_null: false, + has_default: false, + }) + .collect(); + cache.tables.push((*table).to_string()); + for c in &table_cols { + if !cache.columns.contains(&c.name) { + cache.columns.push(c.name.clone()); + } + } + cache.table_columns.insert((*table).to_string(), table_cols); + } + cache +} + +fn prose_at(input: &str, schema: &SchemaCache) -> String { + let hint = ambient_hint_in_mode(input, input.len(), None, schema, Mode::Advanced); + match hint { + Some(AmbientHint::Prose(p)) => p, + other => panic!("expected a Prose hint for {input:?}, got {other:?}"), + } +} + +#[test] +fn advanced_insert_form_a_value_offers_typed_slot_hint() { + // Form A (explicit column list): the value position maps to the + // user-listed column, so the hint is that column's typed prose. + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let prose = prose_at("insert into Things (note) values (", &schema); + assert!(prose.contains("note"), "names listed column `note`: {prose:?}"); + assert!(prose.contains("quoted string"), "text-column prose: {prose:?}"); +} + +#[test] +fn advanced_insert_form_b_value_maps_first_column() { + // Form B (no column list): positions map to ALL columns in + // declaration order, so the first position is the first column. + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let prose = prose_at("insert into Things values (", &schema); + assert!(prose.contains("k"), "names first column `k`: {prose:?}"); + assert!(prose.contains("integer"), "int-column prose: {prose:?}"); +} + +#[test] +fn advanced_insert_second_position_hints_second_column() { + // Per-position mapping advances: after the first value + comma, the + // hint is the SECOND column's typed prose. + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let prose = prose_at("insert into Things (k, note) values (5, ", &schema); + assert!(prose.contains("note"), "second position names `note`: {prose:?}"); + assert!(prose.contains("quoted string"), "text-column prose: {prose:?}"); +} + +#[test] +fn advanced_insert_value_int_mismatch_is_caught_live() { + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let bad = classify_input_with_schema_in_mode( + "insert into Things (k) values (3.14)", + &schema, + Mode::Advanced, + ); + assert!(!matches!(bad, InputState::Valid), "decimal into int rejected live: {bad:?}"); + let ok = classify_input_with_schema_in_mode( + "insert into Things (k) values (5)", + &schema, + Mode::Advanced, + ); + assert!(matches!(ok, InputState::Valid), "valid int literal parses: {ok:?}"); +} + +#[test] +fn advanced_insert_string_into_int_is_caught_live() { + // The Option-A win over the structural fallback: a wrong-KIND lone + // literal (a string into an int column) is rejected WHILE TYPING, + // not only at execution. + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let bad = classify_input_with_schema_in_mode( + "insert into Things (k) values ('text')", + &schema, + Mode::Advanced, + ); + assert!(!matches!(bad, InputState::Valid), "string into int rejected live: {bad:?}"); +} + +#[test] +fn advanced_insert_multi_row_typed_and_mismatch_caught() { + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + let ok = classify_input_with_schema_in_mode( + "insert into Things (k, note) values (1, 'a'), (2, 'b')", + &schema, + Mode::Advanced, + ); + assert!(matches!(ok, InputState::Valid), "well-formed multi-row parses: {ok:?}"); + let bad = classify_input_with_schema_in_mode( + "insert into Things (k, note) values (1, 'a'), (3.14, 'b')", + &schema, + Mode::Advanced, + ); + assert!( + !matches!(bad, InputState::Valid), + "a mismatch in the second row is caught: {bad:?}" + ); +} + +#[test] +fn advanced_insert_form_b_maps_all_columns_including_serial() { + // SQL Form B supplies a value for EVERY column (no auto-fill), so + // the position count = all columns, and a serial column's position + // takes an int literal (unlike the DSL, which omits auto-gen cols). + let schema = vschema(&[( + "Customers", + &[("id", Type::Serial), ("Name", Type::Text), ("Email", Type::Text)], + )]); + let state = classify_input_with_schema_in_mode( + "insert into Customers values (1, 'Bob', 'b@c')", + &schema, + Mode::Advanced, + ); + assert!(matches!(state, InputState::Valid), "Form B maps all 3 columns: {state:?}"); +} + +#[test] +fn advanced_insert_value_expressions_still_parse_via_sql_expr() { + // Regression guard: a non-lone-literal value position (arithmetic, + // literal-prefixed, function call, signed number) falls through to + // sql_expr unchanged — the typed slot must not steal it. + let schema = vschema(&[("Things", &[("k", Type::Int), ("note", Type::Text)])]); + for input in [ + "insert into Things (k) values (1 + 2)", + "insert into Things (k, note) values (5, upper(note))", + "insert into Things (k) values (-5)", + "insert into Things (k) values ((select 1))", + ] { + let state = classify_input_with_schema_in_mode(input, &schema, Mode::Advanced); + assert!(matches!(state, InputState::Valid), "{input:?} must parse: {state:?}"); + } +}