diff --git a/docs/adr/0030-advanced-mode-sql-surface.md b/docs/adr/0030-advanced-mode-sql-surface.md index dd9c872..71e6267 100644 --- a/docs/adr/0030-advanced-mode-sql-surface.md +++ b/docs/adr/0030-advanced-mode-sql-surface.md @@ -157,6 +157,18 @@ that `docs/simple-mode-limitations.md` records as the inverse of the simple-mode subset. The DSL `Expr` is the *DSL's* representation; the SQL surface does not round-trip through it. +> **Forward note (2026-05-26).** The **Proposed ADR-0036** narrows the +> "DML → validated SQL text" half of this section: **literal data +> values** in `INSERT`/`UPDATE` should be parsed to typed `Value`s, +> validated, and bound as parameters (as the DSL already does), not +> handed to the engine as text — so advanced mode gets the same +> value-validation, hinting, highlighting, and error context as simple +> mode. The `SELECT` half and the full-expression-surface rationale above +> **stand**: expressions and queries remain verbatim text (ADR-0026's +> limited `Expr` is *not* imposed on the SQL surface). The dividing line +> is "bindable literal vs engine-evaluated expression-or-query." Pending +> acceptance. + ### 5. Type vocabulary — the playground's, not the engine's Advanced-mode DDL uses the playground's own ten-type diff --git a/docs/adr/0033-sql-dml-grammar.md b/docs/adr/0033-sql-dml-grammar.md index a9150fd..cb2c5cc 100644 --- a/docs/adr/0033-sql-dml-grammar.md +++ b/docs/adr/0033-sql-dml-grammar.md @@ -10,6 +10,16 @@ for the phase-exit report and the filled cross-cut matrix in `docs/plans/20260520-adr-0033-phase-3.md`. Amendments 1–3 below are part of this acceptance. +> **Forward note (2026-05-26).** The **Proposed ADR-0036** narrows the +> §10 verbatim-execution model for **literal data values**: rather than +> handing the user's literal text to the engine, `INSERT`/`UPDATE` should +> parse literal values to typed `Value`s, validate them, retain them on +> the command (for binding *and* error enrichment), and bind them as +> parameters — as the simple-mode DSL already does. Expressions, `WHERE` +> predicates, `INSERT … SELECT`, `RETURNING`, UPSERT, and `SELECT` stay +> verbatim text. The `Sql*` command variants and worker handlers remain; +> only the literal-value execution path changes. Pending acceptance. + ## Context ADR-0030 commissions advanced mode as a body of SQL grammar diff --git a/docs/adr/0036-typed-dml-values-vs-verbatim.md b/docs/adr/0036-typed-dml-values-vs-verbatim.md new file mode 100644 index 0000000..36f482d --- /dev/null +++ b/docs/adr/0036-typed-dml-values-vs-verbatim.md @@ -0,0 +1,326 @@ +# ADR-0036: DML execution — bind literal values, reserve verbatim text for expressions and queries + +## Status + +**Proposed** (design agreed with the user in conversation, 2026-05-26; +`/runda` verification pass completed 2026-05-26 — three factual/precision +corrections applied to the evidence + matrix, and three design forks +confirmed with the user: per-position classification with +`RETURNING`/`ON CONFLICT` as clauses over bindable `VALUES`; the +bindable-literal set including signed numerics, `NULL`, and boolean/string +literals; and `WHERE` left as text deliberately. Pending the user's +go-ahead to implement). + +**Supersedes**, in part, **ADR-0030 §4** ("DML and `SELECT` → executed +as the validated SQL itself … modelling them as a typed `Command` buys +nothing") and the **execution model of ADR-0033 §10** (the three +`Sql{Insert,Update,Delete}` variants executing the user's text +verbatim). It **narrows** those decisions rather than reversing them: +the `SELECT` half of ADR-0030 §4 stands unchanged, and verbatim text +execution remains correct — and necessary — for everything that is an +**expression or a query**. What changes is that **literal data values** +stop being handed to the engine as text and instead flow through the +same typed, validated, parameter-bound path the simple-mode DSL has used +all along. + +Builds on the ADR-0035 precedent (DDL executes *structurally*, not +verbatim) and completes the thought: structure was the first place +"grammar as text" broke; literal data values are the second. + +## Context + +### How we got here + +ADR-0030 §4 set the advanced-mode execute path: **DDL** lowers to a typed +`Command` and runs the structural executor (to preserve the playground +type vocabulary, named relationships, metadata tables, and `STRICT`); +**DML and `SELECT`** execute "as the validated SQL itself," on the +stated rationale that "they change no schema, so modelling them as a +typed `Command` buys nothing." ADR-0033 implemented that for DML: +`SqlInsert`/`SqlUpdate`/`SqlDelete` carry the validated statement text +(`row_source`, the raw `sql`) and the worker hands it to the engine. + +ADR-0035 already found the rationale too broad for DDL and went +structural. This ADR finds it too broad for one more case: **the literal +data values inside DML.** + +### What "verbatim text for literal values" actually costs + +The simple-mode DSL never did it this way. `do_insert` parses each value +into a typed `Value`, validates/normalises it (`Value::bind_for_column` +→ `validate_date`, `shortid::validate`, …), and executes +`INSERT INTO T (…) VALUES (?1, ?2, …)` with the values **bound as +parameters**. The value never becomes SQL text. The advanced-mode SQL +path, by contrast, splices the user's literal into SQL text and lets a +`STRICT` engine be the only check. + +A `date` column is `STRICT TEXT`; a `shortid` is `TEXT`; a `bool` is an +int — the engine's storage types do **not** enforce the playground's +*semantic* types. So the two paths diverge, and advanced mode is +materially weaker. Investigated 2026-05-26; the matrix: + +| Feedback for a DML value | DSL (simple) | SQL (advanced) | +| --- | --- | --- | +| Column-type hint in completion | ✅ typed slots (incl. `date` format examples) | ❌ raw `sql_expr` | +| Value-vs-column highlighting | ✅ numeric-shape mismatch at parse | ❌ none | +| Validation at parse | ⚠️ numeric shape only (`int`/`decimal`/`bool`); `date`/`shortid` format deferred to bind | ❌ none | +| Validation at execution (bind) | ✅ full semantic type | ❌ none (verbatim) | + +Precise reading (verified 2026-05-26): the DSL typed slots +(`shared.rs`) validate *numeric shape* at parse — `INT_SLOT` rejects +decimals, `DECIMAL_SLOT` checks format, `BOOL_SLOT` restricts to boolean +literals — and surface a per-type hint for *every* type (the `DATE_SLOT` +carries the `YYYY-MM-DD` example prose). Full semantic validation — +`date`/`shortid`/`datetime` *format* — happens at **bind** time +(`Value::bind_for_column` → `validate_date` / `shortid::validate`). So +the DSL catches a bad value *somewhere* (parse for numeric shape, bind +for the rest); advanced-mode SQL catches it **nowhere** but the engine's +storage-type floor. That asymmetry — "DSL always catches it, SQL never +does" — is the gap, and it holds across all semantic types. + +The execution-layer gap is **proven** by a characterization test +(`tests/sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces`): +the DSL rejects the malformed date `2025/01/15`; advanced-mode SQL +accepts it and writes the bad row. The only advanced-mode DML +diagnostics are *structural* (`insert_arity_mismatch`, +`auto_column_overridden`, `not_null_missing`) — never value-vs-type. + +The machinery to fix this **already exists and is live for the DSL**: +`column_value_list` unfolds a per-column `TypedValueSlot` when the walker +has schema (`data.rs:141`/`189`/`269`; slots in `shared.rs`). The SQL DML +grammar simply was never wired to it — every value position is +`Node::Subgrammar(&sql_expr::SQL_OR_EXPR)` (`sql_insert.rs:75`), +type-blind by construction. So the asymmetry is **not** a deliberate +"advanced mode doesn't need this" decision — **no ADR says so** — it is +an un-wired surface. (A stale header comment at `data.rs:8-17` still +describes the DSL slots themselves as "deferred"; it predates the wiring +that data.rs:141/189/269 now show, and should be corrected as part of +this work.) For a teaching tool, where the whole point is to catch a +learner's mistake and explain it, silently accepting a malformed value +is a pedagogy failure, not a feature. + +### The same root cause behind the error-value gap + +A separate symptom shares this root cause. When a SQL `INSERT`/`UPDATE` +violates a UNIQUE/CHECK constraint, the friendly-error layer cannot show +the offending **value** — because the value was discarded (only +`row_source` text survives), so `enrich_unique_violation` / +`enrich_check_violation` come up empty and degrade to a neutral "that +value" (ADR-0035 Amendment 1, F2 follow-up). Validation, hinting, +highlighting, and the offending-value-in-errors display are **four faces +of one defect**: literal values are thrown away instead of owned. + +### The sharp edge — why we do *not* go fully structural + +ADR-0030 §4's text choice was not gratuitous. It deliberately keeps +DML/`SELECT`/`CHECK` **expressions** out of the DSL's intentionally +*limited* `Expr` (ADR-0026), so advanced mode delivers the **full** SQL +expression surface — arithmetic, functions, subqueries, nested boolean +operands — that `docs/simple-mode-limitations.md` records as the inverse +of the simple subset. Lowering SQL expressions into the DSL `Expr` would +**regress that surface**; building a full typed SQL-expression AST + +serializer is a large undertaking that ADR-0031 explicitly declined +(`sql_expr` is validate-only, no `Expr` AST). + +And `SELECT` is the proof that text-to-engine is the *right* tool for +queries: ADR-0032 already delivers rich feedback for `SELECT` — +completion, qualified-name resolution, predicate warnings, post-prepare +type recovery — entirely from **walking the validated parse**, with the +engine executing the text. Queries have no data values to validate +against columns; owning them buys nothing and costs enormously. + +So the dividing line is **not** "DDL vs DML." It is **bindable literal +vs engine-evaluated expression-or-query.** + +## Decision + +### 1. The principle + +> **Own structure and literal values; reserve verbatim text for +> expressions and queries.** +> +> Classification is **per value position**, not per statement. Wherever a +> value position holds a **bare literal**, parse it to a typed `Value`, +> validate it against the playground type system, surface the column type +> in completion, highlight a mismatch, and execute it as a **bound +> parameter** — exactly as the simple-mode DSL does. Wherever the +> position holds a **computed expression** (`qty*2`, a function call, a +> scalar subquery), or the value *source* is itself a query +> (`INSERT … SELECT`), hand the validated text to the engine — because +> that is where text-to-engine is necessary, not merely convenient. + +The test for which side a position falls on is operational: **can this +value be bound as a parameter?** A literal can; a computed expression or +a subquery cannot — the engine must evaluate it. + +**What counts as a literal** (the bindable set — matching the DSL lexer's +`consume_number_literal` / `consume_string_literal` and the +`null`/`true`/`false` words): `NULL`, a boolean literal, a string +literal, and a **signed** numeric literal (`-5`, `3.14`). A signed +numeric must count as a literal *even if* the grammar would otherwise +read `-5` as unary-minus-of-`5`, or negative numbers would silently skip +validation. Anything else in a value position — arithmetic, function +calls, `CASE`, subqueries, column references — is an expression and stays +text. + +**Clauses are not value sources.** `RETURNING` and `ON CONFLICT …` are +clauses *on* a statement, not value positions. An +`INSERT INTO T VALUES (1, 'x') RETURNING id` or `… ON CONFLICT DO UPDATE +…` still has **bindable literal `VALUES`**; the clause rides on the +generated statement (and its own expressions — e.g. `excluded.x` in a +`DO UPDATE SET` — stay text). Only when the value *source* is a query +(`INSERT … SELECT`) is there nothing to bind. + +### 2. What this means per statement + +- **`INSERT … VALUES`** — each value position is a literal *or* an + expression. Literal positions are typed/validated/bound; expression + positions stay text. A row that is all literals executes exactly like + the DSL `do_insert` (`VALUES (?1, ?2, …)`); a row mixing the two + generates `VALUES (?1, a+1, ?2)`. Multi-row `VALUES` repeats this per + row. A trailing **`RETURNING`** or **`ON CONFLICT …`** clause is + carried on the generated statement — the bound `VALUES` are unaffected; + the clause's own expressions stay text. +- **`UPDATE … SET`** — `SET col = ` is typed/validated/bound; + `SET col = ` stays text. +- **`WHERE` (UPDATE/DELETE)** — **stays text, deliberately**, even for a + simple `WHERE id = 5`. `WHERE` is an expression in general; keeping it + text preserves the full SQL surface (ADR-0026's `Expr` is the *DSL's* + representation and is not imposed on SQL), and the value-feedback + motivation is already met by `VALUES`/`SET` — a constraint error names + a *written* value, not a predicate operand. This is a scope choice, not + an oversight; per-position binding is **not** applied to `WHERE`. +- **`DELETE`** — no data values to bind; `WHERE` per the rule above. +- **`INSERT … SELECT`** — entirely text: the value *source* is a query, + so there are no literals to bind. +- **`SELECT`** — unchanged. Text-to-engine, feedback-from-walk + (ADR-0032). Explicitly **not** lowered to a typed AST. + +### 3. What it fixes — one change, four faces + +Retaining the typed literal (a) lets validation run on it (the proven +gap), (b) lets completion hint the column type and highlighting flag a +mismatch, (c) lets the worker **bind** it (safe, type-correct), and (d) +leaves it **on the command** so `enrich_*` can show the real value in a +constraint error. The neutral "that value" safety net (ADR-0035 +Amendment 1) remains correct for the genuinely-computed cases — there is +no input literal to show, and that is the honest answer. + +### 4. Explicit requirement — carry the typed literals + +`SqlInsert`/`SqlUpdate` (or their successors) **must retain the parsed +typed literal values** (per column position), not only the raw text, so +that both the executor (binding) and the error enricher (display) can +use them. This is the concrete hinge on which faces (a), (c), and (d) +turn; it must not be left implicit. + +The command **also** keeps the user's **original source text**: that is +what `history.log` records and what `replay` re-runs (ADR-0034), so the +journal stays faithful to what the user typed even though execution now +binds the literals from a generated statement. Two representations, one +command: original text for the journal, parsed literals for execution + +feedback + errors. + +### 5. Mechanism (recommended, with the alternative recorded) + +- **Recommended — typed-slot hybrid grammar.** At each DML value + position, the SQL grammar offers `Choice(typed-literal-slot, + sql_expr)`: the typed slot (reusing the DSL's live `column_value_list` + / per-type `TypedValueSlot`s — `data.rs:141`/`189`/`269`, `shared.rs`) + captures + validates + retains the literal and drives + completion/highlighting; `sql_expr` is the fallback for a genuine + expression. This is the only + option that delivers **live completion hints** for SQL DML values, + because the hint comes from the grammar walk. +- **Alternative — post-parse semantic pass.** Keep the grammar as + `sql_expr` and add a pass that, for positions that are bare literals, + validates + retains them. Simpler; delivers validation, retention, and + the error-value fix, but **not** live completion hinting. Recorded as + the fallback if the hybrid grammar proves too costly in the walker. + +### 6. Phased implementation + +1. **Literal `INSERT … VALUES`** — including the `RETURNING` and + `ON CONFLICT …` variants (they exist today and the shortid-autofill + path already carries those trailing clauses, so excluding them would + be the artificial split). The common case; converges execution on the + DSL bound path; closes the proven validation gap **and** the + error-value display in one move. Highest value, smallest blast radius. +2. **`UPDATE … SET` literals** (bind literal assignments; expression + assignments stay text). +3. **Completion/highlighting** for the typed slots on the SQL surface + (if not already delivered by the mechanism chosen in §5). + +`SELECT` and `INSERT … SELECT` (and every *expression* value position) +remain on the text side throughout — **by design and documented**, not by +omission. `RETURNING` / `ON CONFLICT` are not on this list: they are +clauses over a statement whose literal `VALUES` still bind (§1/§2), and +Phase 1 carries them. + +### 7. Non-goals + +- **A structural `SELECT`.** Not now, not as a consequence of this ADR. + Queries stay text. (If a future need ever forces `SELECT` toward an + owned AST, that is its own ADR with its own justification — flagged as + a watch-item, not a plan.) +- **A full typed SQL-expression AST.** Expressions stay validated text; + ADR-0031's "no `Expr` AST" stands. We do not impose the DSL's limited + `Expr` on the SQL surface (that would regress ADR-0030 §4's full-surface + guarantee). +- **Removing the verbatim path.** It remains the executor for every + expression and query position; this ADR narrows *where* it applies, it + does not delete it. + +## Consequences + +- **Advanced mode stops being a feedback-free zone for data values.** A + learner typing a malformed `date`/`shortid`/`int` literal in a SQL + `INSERT` gets the same catch-and-explain they get in simple mode. +- **One model, fewer paths.** Literal DML converges on the DSL's proven + parameter-binding execution; the two surfaces stop diverging on the + case they share. +- **The full SQL expression + query surface is preserved**, because + expressions and `SELECT` are explicitly left on the text side. No + regression to the advanced surface ADR-0030 §4 protects. +- **Rework, not rewrite.** ADR-0033's `Sql*` command variants and the + worker handlers stay; they gain typed-literal payloads and the literal + execution path changes. Phased so each step is independently shippable + and testable, with the existing ADR-0033 tests as the regression net. +- **A new seam to keep honest:** the literal-vs-expression boundary in + the grammar. It must be explicit and tested (a position that is a bare + literal binds; a position that is any expression stays text; a signed + numeric is a literal; `NULL`/`true`/`false` are literals), or it will + drift. +- **Execution reconstruction is the load-bearing implementation + challenge.** A row mixing literals and expressions must be re-emitted + as `VALUES (?1, expr-text, ?2)` — bound placeholders spliced with the + original expression spans, with any `RETURNING`/`ON CONFLICT` tail + re-appended. This is the strongest argument for the **typed-slot + grammar** mechanism (§5): it yields per-position structure (slot → + bind; `sql_expr` → text span) directly, whereas the post-parse pass + must recover those spans itself. The existing shortid-autofill rewrite + (`plan_shortid_autofill`) already does statement reconstruction with + bound params + a preserved trailing clause, so the pattern is not + new — but it must be got right or a literal silently becomes text. +- **Cost if we are wrong about `SELECT`.** If a future requirement shows + `SELECT` genuinely needs an owned AST, this ADR does not help with it — + but nor does it obstruct it. The boundary chosen here (queries = text) + is the current best line, taken with eyes open. + +## See also + +- ADR-0030 §4 — the execute-path split this ADR narrows for DML (the + `SELECT` half stands). +- ADR-0033 — the SQL DML grammar + the verbatim execution model this ADR + amends for literal values. +- ADR-0035 — the DDL precedent: structural, not verbatim ("verbatim was + a DML convenience, not a rule"). +- ADR-0026 — the DSL's deliberately-limited `Expr`; *not* imposed on the + SQL surface here. +- ADR-0031 — `sql_expr` is validate-only (no `Expr` AST); unchanged. +- ADR-0032 — `SELECT` feedback-from-walk; the proof that text-to-engine + is right for queries. +- ADR-0029 — the column type/constraint model the literal validators + enforce. +- ADR-0035 Amendment 1 (F2 follow-up) — the neutral "that value" safety + net that remains correct for computed values. diff --git a/docs/adr/README.md b/docs/adr/README.md index df7c0a0..8490b64 100644 --- a/docs/adr/README.md +++ b/docs/adr/README.md @@ -41,3 +41,4 @@ This directory contains the project's ADRs, recorded per - [ADR-0033 — The full SQL DML grammar (`INSERT` / `UPDATE` / `DELETE`)](0033-sql-dml-grammar.md) — **Accepted** (implemented + verified through sub-phase 3k, 2026-05-23; phase-exit report `docs/handoff/20260523-phase-3-verification.md`), the Phase-3 grammar commissioned by ADR-0030 §3: single- and multi-row `INSERT` (incl. `INSERT … SELECT` recursing through ADR-0032's `SQL_SELECT_COMPOUND`), `UPDATE` with `SET` assignment list, `DELETE`, all three optionally followed by `RETURNING projection_list`, plus full `ON CONFLICT … DO NOTHING / DO UPDATE` UPSERT on INSERT; **fixes the DSL-vs-SQL dispatch architecture for shared entry words (`insert`/`update`/`delete`)**: SQL-first / DSL-fallback in Advanced mode via a `Choice(SQL_shape, DSL_shape)` per shape, gated by a new walker capability `Node::Guard(fn)` — a zero-byte-consumption gating node that fails the enclosing Seq with a `ValidationError`; carries `Command::SqlInsert` / `SqlUpdate` / `SqlDelete` variants and `do_sql_*` worker handlers each of which knows the target table (for re-persistence) and the `returning: bool` flag (for DataResult routing); `shortid` auto-fill mirrors the DSL `do_insert` mechanism via worker post-fill; SQL DELETE produces the same per-relationship cascade summary the DSL DELETE does (ADR-0014 parity); three new walker diagnostics (`insert_arity_mismatch` ERROR, `auto_column_overridden` WARNING, `not_null_missing` WARNING) with positive + negative tests each; OOS list explicitly carves out `DEFAULT VALUES` (the project's planned seed feature), SQLite-specific `OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` prefixes, `UPDATE FROM` multi-table updates, and WITH-prefixed DML; the `excluded` keyword inside `ON CONFLICT DO UPDATE` is a deliberate carve-out from ADR-0030 §7's engine-neutral posture (no standard-SQL UPSERT spelling exists that SQLite and PostgreSQL share); eleven phased sub-phases each with explicit exit gates + written DA gate, opening with the dispatch mechanism before any DML grammar lands; initial DA review recorded seven critiques that were resolved before status moved to Proposed; **Amendment 1 supersedes §2's dispatch mechanism**: the originally-chosen `Node::Guard(fn)` + `Choice(SQL_shape, DSL_shape)` was found during 3a to be unworkable as framed (any guard-in-`Choice` mechanism forces a `walk_choice` change — `walk_choice` only falls through on `NoMatch`, so Simple-mode valid-DSL would wrongly surface "this is SQL", and `walk_seq` treats a `NoMatch` past `idx 0` as a hard `Failed`, breaking Advanced-mode DSL fall-through); replaced by **category-grouped, mode-aware dispatch** in `walker::walk` (each `REGISTRY` entry tagged `CommandCategory::{Simple, Advanced}`, generalising the existing whole-command `is_advanced_only` gate), shared entry words carrying a node in both groups, no `Node::Guard` and no `walk_choice`/`walk_seq` change, advanced-mode completion SQL-first with DSL as a full-line fallback; **Amendment 2 (sub-phase 3f) supersedes §7's cascade mechanism**: the WHERE-injected per-child pre-count rested on a premise that was factually wrong about the DSL handler (which detects cascades by before/after row-count diffing inside a transaction, not by `Expr`-derived pre-count subqueries) and would have broken the §2 parity promise by reporting `SET NULL` the DSL path doesn't; replaced by mirroring `do_delete`'s count-diff exactly (verbatim DELETE executes, child-count diff observes the cascade — `ON DELETE CASCADE` row removals only, SET NULL deferred for both paths to preserve parity), which shares the render-layer formatter for free via `CommandOutcome::Delete` and **withdraws risk R2** (no WHERE-byte extraction, no N+1 subquery); **Amendment 3 (sub-phase 3j) records the command-identity model and defers the execution-mode side-channel**: a command is the typed outcome of a *mode-rooted* grammar path and its identity is intrinsic (Advanced mode tries SQL first, falls back to the *Simple* DSL command when no SQL branch matches a token, e.g. `delete … --all-rows`; note `update … --all-rows` does *not* fall back — the SQL `SET` expression eats `--all-rows`, harmless since the engine treats it as a comment); **Simple mode commits the DSL candidate for shared words** so the *real* DSL error surfaces, and when that line would also run in advanced mode the rendering layer **combines** them — DSL error **plus** an `advanced_mode.also_valid_sql` pointer ("… (valid as SQL in advanced mode)") — keeping the actionable DSL fix while pointing at advanced mode; bare "this is SQL" is reserved for entry words with no DSL form (`select`/`with`); a fully-overlapping input (`insert … values …`) legitimately yields *two distinct commands* (`Command::Insert` typed-AST vs `Command::SqlInsert` validated-text) that do the same thing but execute differently (ADR-0030 §4), so each is tested in the mode that produces it; **corrects the plan's 3j exit-gate premise** that the DSL DML tests run in Simple mode (they call `parse_command`, which defaults to Advanced) — the real invariant is "Simple-mode behaviour unchanged, Advanced mode SQL-first, DSL grammar tested in Simple mode, both variants tested in their producing mode", with §6/§7 parity keeping the paths observably equivalent; and **defers to its own future ADR** the execution-time mode side-channel (three-way `Mode`: simple/advanced/advanced-one-shot threaded through `Action`→worker, for mode-dependent *output* like echoing generated SQL) — today only the *rendering* side-channel `OutputLine.mode_at_submission` exists, and the three-way distinction is not required for Phase 3 dispatch correctness - [ADR-0034 — `history.log` as a complete command journal; replay reads success-only](0034-history-journal-and-replay-filter.md) — **Accepted**, resolves a three-way tension in `history.log`'s roles found while implementing ADR-0033 3f: (1) the persistent log is success-only while the in-memory Up/Down recall ring records *every* submission (success or failure, "so users can recall and edit typo'd commands"), and the ring is re-seeded from the log on project open — so **failed commands are recallable within a session but silently lost across sessions**; (2) replay wants the state-building (successful) commands while recall wants everything typed, which one success-only file cannot serve; (3) `replay history.log` never actually worked — `run_replay` parses each whole line through the DSL parser with no understanding of the `||` record shape, so a real log fails on line 1, and **no test ever fed the pipe format to replay** (the `replay_history_log_records_subcommands_only` test only checks what replay *writes*, never replays the log as input). Decision: `history.log` becomes a **complete journal** — every submission recorded, tagged `ok`/`err` via the status field the format already reserved (ADR-0015 §5) — and **each consumer filters**: hydration reads all records (cross-session recall matches in-session), replay reads `ok` only (and learns the journal format, while still accepting bare-command `.commands` scripts; detection by the leading timestamp+status prefix so a `|` inside a bare command isn't misread). Successful commands stay journalled transactionally by the worker; failed commands are journalled `err` best-effort from the runtime/app error path (a parse failure never reaches the worker). Amends ADR-0006's "successfully executed" wording and ADR-0015 §5 ("status always `ok`") / §12 (hydration). Code deferred to two tracked test-first sub-tasks (journal-failures+filtering; replay-parses-journal-format); existing all-`ok` logs need no migration; **implemented 2026-05-24** (plan `docs/plans/20260524-adr-0034-history-journal.md`); **Amendment 1 (2026-05-24): replay filters out app-lifecycle commands** — a working `replay history.log` (the §3 fix) exposed that the journal also records `save as`/`load`/`new`/`export`/`import`/`rebuild`/`mode` (which would panic the worker dispatch or abort the replay), so replay now re-applies **only** schema/data write commands and **skips** every `Command::App` + nested `Command::Replay`; **all skips continue** (never abort — reversing the prior nested-`replay` refusal, so a journal containing a once-run `replay` needs no hand-editing, and the infinite-loop footgun is closed by construction), with a `[skip]` **warning** on `import` and nested-`replay` skips (their omission can leave replayed state incomplete) and silent skips for the rest; `replay.error_nested` removed, `replay.skipped_import`/`replay.skipped_replay` added, `ReplayCompleted` carries `warnings` - [ADR-0035 — Advanced-mode SQL DDL](0035-advanced-mode-sql-ddl.md) — **Accepted** (design agreed 2026-05-24; validated end-to-end by sub-phases 4a/4a.2/4a.3/4b `CREATE TABLE` (incl. foreign keys) + 4c `DROP TABLE [IF EXISTS]` + 4d `CREATE [UNIQUE] INDEX` / `DROP INDEX [IF EXISTS]` + 4e `ALTER TABLE` add/drop/rename column + 4f `ALTER TABLE … ALTER COLUMN TYPE` + 4g `ALTER TABLE` add/drop constraint + add FK + 4h `ALTER TABLE … RENAME TO` + 4i verification sweep (completion merge + simple/advanced completion colour + describe of table-level constraints + self-ref FK indicator + CREATE-TABLE help/usage), implemented 2026-05-25/26 — **Phase 4 complete**; **Amendment 1, 2026-05-26**: drop a composite UNIQUE via a derived, engine-neutral `unique_` name that reuses the existing `DROP CONSTRAINT ` grammar — no new syntax, no metadata, §4g anonymity intact; `describe` shows the name; dropping a UNIQUE-covered *column* now refuses with that name + the drop command), **Phase 4** of the ADR-0030 roadmap (peer of 0031/0032/0033) and **clarifies ADR-0030 §4**. Advanced-mode `CREATE`/`DROP`/`ALTER TABLE` + `CREATE`/`DROP INDEX` get their **own per-statement commands** (`SqlCreateTable`/`SqlAlterTable`/`SqlDropTable`/`SqlCreateIndex`/`SqlDropIndex`), like DML's `Sql*` set — but unlike DML they **execute *structurally*, not verbatim** (raw execution would lose the playground's types, named relationships, and `STRICT`; "verbatim" was a DML convenience, not a rule). Handlers **reuse the low-level schema/metadata helpers** where the operation matches simple mode and **stand alone where the SQL surface is richer** (clarity over forced refactoring); simple mode is untouched (additive). Dispatch: `create`/`drop` reuse ADR-0033 Amendment 1's category-grouped mode-aware dispatch (SQL-first, simple fallback); `alter` is a new advanced-only entry word. Full surface (no pre-emptive cuts, `Q4`): `CREATE TABLE` with column + table constraints, single/compound `PRIMARY KEY`, inline + table-level `FOREIGN KEY` → **named relationships** (one statement = one command = **one undo step**, ADR-0006); `ALTER TABLE` add/drop/rename column, `ALTER COLUMN TYPE`, add/drop constraint, add FK, **`RENAME TO`** (advanced-only table rename — new low-level op renaming the table + its CSV + the relationship and table-CHECK metadata, closing the rename half of `C1`); `CREATE [UNIQUE] INDEX` / `DROP INDEX`. Type slot accepts the ten playground keywords **and** standard-SQL aliases (`integer`→`int`, `varchar`→`text`, `timestamp`→`datetime`, …; length args accepted-and-ignored; no engine type names in/out — ADR-0030 §5). `CHECK`/`DEFAULT` reuse ADR-0031 `sql_expr`. **Pre-implementation `/runda` refinements (2026-05-24, user-confirmed):** `CREATE TABLE`/`DROP TABLE` **admit `IF [NOT] EXISTS`** (no-op-that-succeeds-with-a-note — a near-universal cross-vendor idiom, reclassified *into* scope, not engine-specific); `INTEGER PRIMARY KEY` maps to a **plain `int`** PK, *not* auto-increment (`serial` stays the sole auto-increment type). **Column-type-conversion is unified** (ADR-0017 engine, mode-appropriate policy): clean auto-converts and incompatible/own-type-static cases refuse in both modes, but a **lossy** change refuses-by-default in simple mode (`--force-conversion` opts in) while advanced mode **performs it with a loss note** and relies on **`undo` as the safety net** — no force flag, no dropping to simple mode (a payoff of shipping ADR-0006 first). OOS: views/triggers/txn-control/PRAGMA/etc. (ADR-0030 §3), the Postgres `USING` clause, and the DSL→SQL teaching echo (ADR-0030 Phase 5). Sub-phases 4a–4i, plus **4a.2** (per-column `CHECK`/`DEFAULT` via raw `sql_expr` text — `sql_expr` is validate-only, no `Expr` AST — + composite `UNIQUE(a,b)`; no new internal table) and **4a.3** (table-level/multi-column `CHECK`, landed via the new `__rdbms_playground_table_checks` metadata table because SQLite has no PRAGMA for CHECK; the builder tells a table-level CHECK from a column-level one by element position) and **4b** (foreign keys — inline `REFERENCES` + table-level `FOREIGN KEY` → ADR-0013 named relationships in the create transaction, one undo step; self-references + bare `REFERENCES ` supported, user-confirmed) and **4c** (`DROP TABLE [IF EXISTS]` → `SqlDropTable`, reusing `do_drop_table`; `IF EXISTS` is a no-op-with-note via `DropOutcome::Skipped`) and **4d** (`CREATE [UNIQUE] INDEX [IF NOT EXISTS] [] ON (cols)` → `SqlCreateIndex` and `DROP INDEX [IF EXISTS] ` → `SqlDropIndex`, reusing `do_add_index`/`do_drop_index`; **`CREATE UNIQUE INDEX` admitted** — ADR-0025 **Amendment 1** — via an additive `IndexSchema.unique` flag that round-trips through `project.yaml` and rebuild, with `[unique]` markers in the structure view + items panel, while simple-mode `add unique index` stays deferred; `IF [NOT] EXISTS` reuses the 4c skip path; `create`/`drop` each gain a *second* advanced node, exercising the all-candidates dispatch) and **4e** (`ALTER TABLE` add/drop/rename column → `SqlAlterTable`; `alter` is a new advanced-**only** entry word, runtime-decomposed to the existing `do_add_column`/`do_drop_column`/`do_rename_column` — no new worker layer; `do_add_column` extended to consume raw `default_sql`/`check_sql` so ADD COLUMN reaches CREATE-TABLE constraint parity; drop/rename refuse a column any CHECK references (table-level AND column-level, incl. a column's own self-check on rename) — the 4a.3 deferral, via a raw-CHECK-text tokenizer in the shared executors, so it guards both surfaces and fixes a latent rename-drift bug; SQL DROP COLUMN refuses an index-covered column with no `--cascade` spelling; the column executors + `do_add_index` gained an internal-`__rdbms_*`-table guard — all user-confirmed) and **4f** (`ALTER TABLE … ALTER COLUMN TYPE` → a fourth `AlterTableAction`, runtime-decomposed to the existing `change_column_type` with `ChangeColumnMode::ForceConversion` — which **is** the §7 advanced policy: lossy converts *with a note* (no force flag), incompatible + ADR-0017 static refusals (`↔ blob`, same-type, `date ↔ datetime`, non-`int → serial`) still refuse, while **`int → serial` is allowed** (auto-fills nulls + UNIQUE, ADR-0018 §8 — the §7 "→serial refused" summary is looser than the code); the builder discriminates the fourth branch by the **`type` keyword** (unique — ADD COLUMN's type is an ident), the type slot reuses `SQL_TYPE`; the internal-`__rdbms_*` guard was folded into `do_change_column_type`, closing the simple `change column` exposure too — user-confirmed) and **4g** (`ALTER TABLE … ADD [CONSTRAINT ] (CHECK | UNIQUE | FOREIGN KEY)` + `DROP CONSTRAINT `; ADD = CHECK + composite UNIQUE + FK, with `ADD PRIMARY KEY` and a *named* UNIQUE refused — composite UNIQUE is anonymous in our model; each ADD reuses a low-level path (table-CHECK/UNIQUE rebuild with a dry-run guard; FK → `add_relationship`, bare `REFERENCES

` → parent single-PK), DROP CONSTRAINT resolves the name to a table-CHECK then a child-side FK; **named table-CHECKs round-trip** via a nullable `name` column on `__rdbms_playground_table_checks` (**rebuild-only** arrival — pre-4g projects gain it on `rebuild`, a named add on an un-upgraded project is refused with a friendly "rebuild first" message) *and* a `project.yaml` `check_constraints` extension to an `{expr, name}` mapping (the bare-string form still reads); the internal-`__rdbms_*` guard was folded into `do_add_constraint`/`do_add_relationship`, completing that guard class — all user-confirmed) and **4h** (`ALTER TABLE … RENAME TO` — the one genuinely new low-level op, `do_rename_table`: a native engine rename plus one-transaction reconciliation of every metadata row naming the table (`__rdbms_playground_columns`, **both ends** of `__rdbms_playground_relationships`, `__rdbms_playground_table_checks`), the CSV file (the existing rewrite+delete path — no new persistence method), and **CHECK text that qualifies a column with the old table name** (`T.age`→`U.age`, a planning-`/runda` finding — the engine rewrites the live CHECK but the stored text would drift and break a fresh rebuild; `rewrite_check_table_qualifier` keeps them in step); grammar splits the `rename` verb into one branch with an inner Choice on a distinct second keyword (`column` vs `to`), the new-name slot mirroring the `CREATE TABLE` name slot; refuses same-name / existing-target / `__rdbms_*` / non-existent, with **case-insensitive** collision checks behind an engine-neutral pre-check (a finished-slice `/runda` finding — the engine matches names case-insensitively); auto-named indexes *and* relationships keep their stale names (only table-name columns update — §6 scope); one undo step; advanced-only, closing the rename half of `C1` — all user-confirmed) and **4i** (the verification sweep that completes Phase 4: the shared-entry-word completion merge + the simple-vs-advanced completion colour-when-mixed with Both→Advanced→Simple block ordering; `describe` of table-level composite UNIQUE + table CHECK; the self-ref FK pre-submit indicator fix; and the CREATE-TABLE help/usage skeleton refresh). **All of Phase 4 (4a–4i) is shipped.** Each sub-phase has exit + DA gates +- [ADR-0036 — DML execution: bind literal values, verbatim text for expressions and queries](0036-typed-dml-values-vs-verbatim.md) — **Proposed** (design agreed 2026-05-26, pending a `/runda` pass + go-ahead to implement). **Narrows ADR-0030 §4 / ADR-0033 §10**: keeps DDL structural (ADR-0035) and `SELECT`/expressions as verbatim text (full SQL surface preserved, ADR-0026's limited `Expr` *not* imposed), but stops handing **literal data values** to the engine as text. The simple-mode DSL never did — `do_insert` parses each value to a typed `Value`, validates it (`validate_date`/`shortid::validate`/…), and binds it as a parameter; advanced-mode SQL spliced literals into text and let `STRICT` (storage types only) be the sole check. Result (investigated 2026-05-26; characterization test `sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces` proves it): advanced-mode SQL DML gets **none** of the DSL's value feedback — no column-type hint, no value-vs-column highlight, no parse-time or bind-time type validation; a malformed `date` like `2025/01/15` is silently written. Same root cause as the F2 error-value gap (the literal is discarded, so `enrich_*` can't show it). **Principle: own structure + literal values, reserve verbatim text for expressions and queries** — the operational test is *can this value be bound as a parameter?* (a literal can; a computed expression/subquery/`SELECT` cannot). Classification is **per value position**, not per statement (a literal = `NULL`/boolean/string/**signed**-numeric binds; arithmetic/functions/subqueries/column-refs stay text). `INSERT … VALUES` literal positions typed/validated/bound; `UPDATE … SET` literal assignments bound. `RETURNING`/`ON CONFLICT …` are **clauses** carried on the generated statement — an `INSERT … VALUES (…) RETURNING/ON CONFLICT` still binds its literal `VALUES`; only `INSERT … SELECT` (value source *is* a query) has nothing to bind. `WHERE` (UPDATE/DELETE) stays text **deliberately** (it's an expression in general; the value-feedback motivation is met by `VALUES`/`SET`). `SELECT` stays text-with-feedback-from-walk by design. One change, four faces: validation + completion hint + highlight + offending-value-in-errors all follow from **retaining the typed literal on the command** (explicit requirement). Mechanism: recommended typed-slot hybrid grammar (`Choice(typed-literal-slot, sql_expr)`, reusing the DSL's live `column_value_list` / `TypedValueSlot`s at `data.rs:141`/`189`/`269` — the only option giving live completion hints) or a post-parse semantic pass (simpler, no live hints). Note: the DSL slots validate numeric *shape* at parse (int/decimal/bool) but defer `date`/`shortid` *format* to bind time, so SQL would converge on the same parse-then-bind split. Phased: literal `INSERT` first (closes the proven validation gap + error-value display together, smallest blast radius), then `UPDATE SET` literals, then hinting/highlighting. Non-goals: a structural `SELECT`, a full typed SQL-expression AST, removing the verbatim path. Supersedes-in-part once accepted; the neutral "that value" safety net (ADR-0035 Amendment 1) stays correct for genuinely-computed values diff --git a/tests/sql_insert.rs b/tests/sql_insert.rs index 817d2d9..df62d08 100644 --- a/tests/sql_insert.rs +++ b/tests/sql_insert.rs @@ -21,9 +21,11 @@ //! real reconstructed SQL. use rdbms_playground::db::{Database, DbError, InsertResult}; -use rdbms_playground::dsl::{ColumnSpec, Command, Type, parse_command}; +use rdbms_playground::dsl::{ColumnSpec, Command, Type, Value, parse_command}; +use rdbms_playground::event::AppEvent; use rdbms_playground::persistence::Persistence; use rdbms_playground::project; +use rdbms_playground::runtime::run_replay; fn rt() -> tokio::runtime::Runtime { tokio::runtime::Builder::new_current_thread() @@ -1033,3 +1035,54 @@ fn autofill_preserves_on_conflict_clause() { let rows = csv_rows(&project, "t"); assert_eq!(rows.len(), 1, "one row landed"); } + +#[test] +fn sql_dml_skips_app_level_value_validation_that_the_dsl_enforces() { + // CHARACTERIZATION of a real divergence (ADR pending — see the + // verbatim-vs-structural DML discussion). The DSL insert path validates + // a value against the playground type system at *bind* time + // (`Value::bind_for_column` → `validate_date`); the verbatim SQL insert + // path hands the literal straight to the engine, whose STRICT `date` + // column is TEXT and accepts any string. `2025/01/15` is a malformed + // date (slashes, not dashes): the DSL rejects it, advanced-mode SQL + // currently accepts it. When the forthcoming ADR routes SQL literal + // values through the same validation, FLIP the SQL assertion to expect + // a rejection. + let (project, db, _d) = open_project_db(); + let r = rt(); + r.block_on(db.create_table( + "T".to_string(), + vec![ + ColumnSpec::new("id", Type::Int), + ColumnSpec::new("d", Type::Date), + ], + vec!["id".to_string()], + Some("create table T with pk id(int)".to_string()), + )) + .expect("create T(id int pk, d date)"); + + // DSL path — validates the `date` and rejects the malformed value. + let dsl = r.block_on(db.insert( + "T".to_string(), + Some(vec!["id".to_string(), "d".to_string()]), + vec![Value::Number("1".to_string()), Value::Text("2025/01/15".to_string())], + Some("insert".to_string()), + )); + assert!( + dsl.is_err(), + "the DSL insert path validates `date` and rejects 2025/01/15; got {dsl:?}" + ); + + // SQL path (advanced mode, full pipeline) — currently ACCEPTS it. + std::fs::write( + project.path().join("ins.commands"), + "insert into T (id, d) values (2, '2025/01/15')\n", + ) + .expect("write script"); + let events = r.block_on(run_replay(&db, project.path(), "ins.commands")); + assert!( + matches!(events.last(), Some(AppEvent::ReplayCompleted { count, .. }) if *count == 1), + "CHARACTERIZATION: verbatim SQL insert skips `date` validation and \ + accepts 2025/01/15 — the gap the ADR will close; events: {events:?}" + ); +}