docs: ADR-0036 (Proposed) — bind literal DML values, verbatim text only for expressions/queries

Records the decision that advanced-mode SQL DML should stop handing literal data values to the engine as text and instead parse/validate/bind them through the DSL's proven path — closing the value-validation gap, the hint/highlight gap, and the offending-value-in-errors gap together. Verbatim text stays for expressions, WHERE, INSERT…SELECT, and SELECT (full SQL surface preserved; ADR-0026's limited Expr not imposed). Narrows ADR-0030 §4 / ADR-0033 §10 once accepted; SELECT half of §4 stands. Includes a characterization test (tests/sql_insert.rs) proving the bind-layer gap: the DSL rejects the malformed date 2025/01/15, advanced-mode SQL accepts it. Forward-notes added to ADR-0030/0033; README index updated. Status: Proposed (design + /runda done; pending go-ahead to implement).
2026-05-26 20:49:40 +00:00
parent f8a91f41c9
commit 3e3a2fb171
5 changed files with 403 additions and 1 deletions
@@ -157,6 +157,18 @@ that `docs/simple-mode-limitations.md` records as the inverse
 of the simple-mode subset. The DSL `Expr` is the *DSL's*
 representation; the SQL surface does not round-trip through it.
 > **Forward note (2026-05-26).** The **Proposed ADR-0036** narrows the
 > "DML → validated SQL text" half of this section: **literal data
 > values** in `INSERT`/`UPDATE` should be parsed to typed `Value`s,
 > validated, and bound as parameters (as the DSL already does), not
 > handed to the engine as text — so advanced mode gets the same
 > value-validation, hinting, highlighting, and error context as simple
 > mode. The `SELECT` half and the full-expression-surface rationale above
 > **stand**: expressions and queries remain verbatim text (ADR-0026's
 > limited `Expr` is *not* imposed on the SQL surface). The dividing line
 > is "bindable literal vs engine-evaluated expression-or-query." Pending
 > acceptance.
 ### 5. Type vocabulary — the playground's, not the engine's
 Advanced-mode DDL uses the playground's own ten-type
@@ -10,6 +10,16 @@ for the phase-exit report and the filled cross-cut matrix in
 `docs/plans/20260520-adr-0033-phase-3.md`. Amendments 1–3 below
 are part of this acceptance.
 > **Forward note (2026-05-26).** The **Proposed ADR-0036** narrows the
 > §10 verbatim-execution model for **literal data values**: rather than
 > handing the user's literal text to the engine, `INSERT`/`UPDATE` should
 > parse literal values to typed `Value`s, validate them, retain them on
 > the command (for binding *and* error enrichment), and bind them as
 > parameters — as the simple-mode DSL already does. Expressions, `WHERE`
 > predicates, `INSERT … SELECT`, `RETURNING`, UPSERT, and `SELECT` stay
 > verbatim text. The `Sql*` command variants and worker handlers remain;
 > only the literal-value execution path changes. Pending acceptance.
 ## Context
 ADR-0030 commissions advanced mode as a body of SQL grammar
@@ -0,0 +1,326 @@
 # ADR-0036: DML execution — bind literal values, reserve verbatim text for expressions and queries
 ## Status
 **Proposed** (design agreed with the user in conversation, 2026-05-26;
 `/runda` verification pass completed 2026-05-26 — three factual/precision
 corrections applied to the evidence + matrix, and three design forks
 confirmed with the user: per-position classification with
 `RETURNING`/`ON CONFLICT` as clauses over bindable `VALUES`; the
 bindable-literal set including signed numerics, `NULL`, and boolean/string
 literals; and `WHERE` left as text deliberately. Pending the user's
 go-ahead to implement).
 **Supersedes**, in part, **ADR-0030 §4** ("DML and `SELECT` → executed
 as the validated SQL itself … modelling them as a typed `Command` buys
 nothing") and the **execution model of ADR-0033 §10** (the three
 `Sql{Insert,Update,Delete}` variants executing the user's text
 verbatim). It **narrows** those decisions rather than reversing them:
 the `SELECT` half of ADR-0030 §4 stands unchanged, and verbatim text
 execution remains correct — and necessary — for everything that is an
 **expression or a query**. What changes is that **literal data values**
 stop being handed to the engine as text and instead flow through the
 same typed, validated, parameter-bound path the simple-mode DSL has used
 all along.
 Builds on the ADR-0035 precedent (DDL executes *structurally*, not
 verbatim) and completes the thought: structure was the first place
 "grammar as text" broke; literal data values are the second.
 ## Context
 ### How we got here
 ADR-0030 §4 set the advanced-mode execute path: **DDL** lowers to a typed
 `Command` and runs the structural executor (to preserve the playground
 type vocabulary, named relationships, metadata tables, and `STRICT`);
 **DML and `SELECT`** execute "as the validated SQL itself," on the
 stated rationale that "they change no schema, so modelling them as a
 typed `Command` buys nothing." ADR-0033 implemented that for DML:
 `SqlInsert`/`SqlUpdate`/`SqlDelete` carry the validated statement text
 (`row_source`, the raw `sql`) and the worker hands it to the engine.
 ADR-0035 already found the rationale too broad for DDL and went
 structural. This ADR finds it too broad for one more case: **the literal
 data values inside DML.**
 ### What "verbatim text for literal values" actually costs
 The simple-mode DSL never did it this way. `do_insert` parses each value
 into a typed `Value`, validates/normalises it (`Value::bind_for_column`
 → `validate_date`, `shortid::validate`, …), and executes
 `INSERT INTO T (…) VALUES (?1, ?2, …)` with the values **bound as
 parameters**. The value never becomes SQL text. The advanced-mode SQL
 path, by contrast, splices the user's literal into SQL text and lets a
 `STRICT` engine be the only check.
 A `date` column is `STRICT TEXT`; a `shortid` is `TEXT`; a `bool` is an
 int — the engine's storage types do **not** enforce the playground's
 *semantic* types. So the two paths diverge, and advanced mode is
 materially weaker. Investigated 2026-05-26; the matrix:
 | Feedback for a DML value | DSL (simple) | SQL (advanced) |
 | --- | --- | --- |
 | Column-type hint in completion | ✅ typed slots (incl. `date` format examples) | ❌ raw `sql_expr` |
 | Value-vs-column highlighting | ✅ numeric-shape mismatch at parse | ❌ none |
 | Validation at parse | ⚠️ numeric shape only (`int`/`decimal`/`bool`); `date`/`shortid` format deferred to bind | ❌ none |
 | Validation at execution (bind) | ✅ full semantic type | ❌ none (verbatim) |
 Precise reading (verified 2026-05-26): the DSL typed slots
 (`shared.rs`) validate *numeric shape* at parse — `INT_SLOT` rejects
 decimals, `DECIMAL_SLOT` checks format, `BOOL_SLOT` restricts to boolean
 literals — and surface a per-type hint for *every* type (the `DATE_SLOT`
 carries the `YYYY-MM-DD` example prose). Full semantic validation —
 `date`/`shortid`/`datetime` *format* — happens at **bind** time
 (`Value::bind_for_column` → `validate_date` / `shortid::validate`). So
 the DSL catches a bad value *somewhere* (parse for numeric shape, bind
 for the rest); advanced-mode SQL catches it **nowhere** but the engine's
 storage-type floor. That asymmetry — "DSL always catches it, SQL never
 does" — is the gap, and it holds across all semantic types.
 The execution-layer gap is **proven** by a characterization test
 (`tests/sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces`):
 the DSL rejects the malformed date `2025/01/15`; advanced-mode SQL
 accepts it and writes the bad row. The only advanced-mode DML
 diagnostics are *structural* (`insert_arity_mismatch`,
 `auto_column_overridden`, `not_null_missing`) — never value-vs-type.
 The machinery to fix this **already exists and is live for the DSL**:
 `column_value_list` unfolds a per-column `TypedValueSlot` when the walker
 has schema (`data.rs:141`/`189`/`269`; slots in `shared.rs`). The SQL DML
 grammar simply was never wired to it — every value position is
 `Node::Subgrammar(&sql_expr::SQL_OR_EXPR)` (`sql_insert.rs:75`),
 type-blind by construction. So the asymmetry is **not** a deliberate
 "advanced mode doesn't need this" decision — **no ADR says so** — it is
 an un-wired surface. (A stale header comment at `data.rs:8-17` still
 describes the DSL slots themselves as "deferred"; it predates the wiring
 that data.rs:141/189/269 now show, and should be corrected as part of
 this work.) For a teaching tool, where the whole point is to catch a
 learner's mistake and explain it, silently accepting a malformed value
 is a pedagogy failure, not a feature.
 ### The same root cause behind the error-value gap
 A separate symptom shares this root cause. When a SQL `INSERT`/`UPDATE`
 violates a UNIQUE/CHECK constraint, the friendly-error layer cannot show
 the offending **value** — because the value was discarded (only
 `row_source` text survives), so `enrich_unique_violation` /
 `enrich_check_violation` come up empty and degrade to a neutral "that
 value" (ADR-0035 Amendment 1, F2 follow-up). Validation, hinting,
 highlighting, and the offending-value-in-errors display are **four faces
 of one defect**: literal values are thrown away instead of owned.
 ### The sharp edge — why we do *not* go fully structural
 ADR-0030 §4's text choice was not gratuitous. It deliberately keeps
 DML/`SELECT`/`CHECK` **expressions** out of the DSL's intentionally
 *limited* `Expr` (ADR-0026), so advanced mode delivers the **full** SQL
 expression surface — arithmetic, functions, subqueries, nested boolean
 operands — that `docs/simple-mode-limitations.md` records as the inverse
 of the simple subset. Lowering SQL expressions into the DSL `Expr` would
 **regress that surface**; building a full typed SQL-expression AST +
 serializer is a large undertaking that ADR-0031 explicitly declined
 (`sql_expr` is validate-only, no `Expr` AST).
 And `SELECT` is the proof that text-to-engine is the *right* tool for
 queries: ADR-0032 already delivers rich feedback for `SELECT` —
 completion, qualified-name resolution, predicate warnings, post-prepare
 type recovery — entirely from **walking the validated parse**, with the
 engine executing the text. Queries have no data values to validate
 against columns; owning them buys nothing and costs enormously.
 So the dividing line is **not** "DDL vs DML." It is **bindable literal
 vs engine-evaluated expression-or-query.**
 ## Decision
 ### 1. The principle
 > **Own structure and literal values; reserve verbatim text for
 > expressions and queries.**
 >
 > Classification is **per value position**, not per statement. Wherever a
 > value position holds a **bare literal**, parse it to a typed `Value`,
 > validate it against the playground type system, surface the column type
 > in completion, highlight a mismatch, and execute it as a **bound
 > parameter** — exactly as the simple-mode DSL does. Wherever the
 > position holds a **computed expression** (`qty*2`, a function call, a
 > scalar subquery), or the value *source* is itself a query
 > (`INSERT … SELECT`), hand the validated text to the engine — because
 > that is where text-to-engine is necessary, not merely convenient.
 The test for which side a position falls on is operational: **can this
 value be bound as a parameter?** A literal can; a computed expression or
 a subquery cannot — the engine must evaluate it.
 **What counts as a literal** (the bindable set — matching the DSL lexer's
 `consume_number_literal` / `consume_string_literal` and the
 `null`/`true`/`false` words): `NULL`, a boolean literal, a string
 literal, and a **signed** numeric literal (`-5`, `3.14`). A signed
 numeric must count as a literal *even if* the grammar would otherwise
 read `-5` as unary-minus-of-`5`, or negative numbers would silently skip
 validation. Anything else in a value position — arithmetic, function
 calls, `CASE`, subqueries, column references — is an expression and stays
 text.
 **Clauses are not value sources.** `RETURNING` and `ON CONFLICT …` are
 clauses *on* a statement, not value positions. An
 `INSERT INTO T VALUES (1, 'x') RETURNING id` or `… ON CONFLICT DO UPDATE
 …` still has **bindable literal `VALUES`**; the clause rides on the
 generated statement (and its own expressions — e.g. `excluded.x` in a
 `DO UPDATE SET` — stay text). Only when the value *source* is a query
 (`INSERT … SELECT`) is there nothing to bind.
 ### 2. What this means per statement
 - **`INSERT … VALUES`** — each value position is a literal *or* an
  expression. Literal positions are typed/validated/bound; expression
  positions stay text. A row that is all literals executes exactly like
  the DSL `do_insert` (`VALUES (?1, ?2, …)`); a row mixing the two
  generates `VALUES (?1, a+1, ?2)`. Multi-row `VALUES` repeats this per
  row. A trailing **`RETURNING`** or **`ON CONFLICT …`** clause is
  carried on the generated statement — the bound `VALUES` are unaffected;
  the clause's own expressions stay text.
 - **`UPDATE … SET`** — `SET col = <literal>` is typed/validated/bound;
  `SET col = <expr>` stays text.
 - **`WHERE` (UPDATE/DELETE)** — **stays text, deliberately**, even for a
  simple `WHERE id = 5`. `WHERE` is an expression in general; keeping it
  text preserves the full SQL surface (ADR-0026's `Expr` is the *DSL's*
  representation and is not imposed on SQL), and the value-feedback
  motivation is already met by `VALUES`/`SET` — a constraint error names
  a *written* value, not a predicate operand. This is a scope choice, not
  an oversight; per-position binding is **not** applied to `WHERE`.
 - **`DELETE`** — no data values to bind; `WHERE` per the rule above.
 - **`INSERT … SELECT`** — entirely text: the value *source* is a query,
  so there are no literals to bind.
 - **`SELECT`** — unchanged. Text-to-engine, feedback-from-walk
  (ADR-0032). Explicitly **not** lowered to a typed AST.
 ### 3. What it fixes — one change, four faces
 Retaining the typed literal (a) lets validation run on it (the proven
 gap), (b) lets completion hint the column type and highlighting flag a
 mismatch, (c) lets the worker **bind** it (safe, type-correct), and (d)
 leaves it **on the command** so `enrich_*` can show the real value in a
 constraint error. The neutral "that value" safety net (ADR-0035
 Amendment 1) remains correct for the genuinely-computed cases — there is
 no input literal to show, and that is the honest answer.
 ### 4. Explicit requirement — carry the typed literals
 `SqlInsert`/`SqlUpdate` (or their successors) **must retain the parsed
 typed literal values** (per column position), not only the raw text, so
 that both the executor (binding) and the error enricher (display) can
 use them. This is the concrete hinge on which faces (a), (c), and (d)
 turn; it must not be left implicit.
 The command **also** keeps the user's **original source text**: that is
 what `history.log` records and what `replay` re-runs (ADR-0034), so the
 journal stays faithful to what the user typed even though execution now
 binds the literals from a generated statement. Two representations, one
 command: original text for the journal, parsed literals for execution +
 feedback + errors.
 ### 5. Mechanism (recommended, with the alternative recorded)
 - **Recommended — typed-slot hybrid grammar.** At each DML value
  position, the SQL grammar offers `Choice(typed-literal-slot,
  sql_expr)`: the typed slot (reusing the DSL's live `column_value_list`
  / per-type `TypedValueSlot`s — `data.rs:141`/`189`/`269`, `shared.rs`)
  captures + validates + retains the literal and drives
  completion/highlighting; `sql_expr` is the fallback for a genuine
  expression. This is the only
  option that delivers **live completion hints** for SQL DML values,
  because the hint comes from the grammar walk.
 - **Alternative — post-parse semantic pass.** Keep the grammar as
  `sql_expr` and add a pass that, for positions that are bare literals,
  validates + retains them. Simpler; delivers validation, retention, and
  the error-value fix, but **not** live completion hinting. Recorded as
  the fallback if the hybrid grammar proves too costly in the walker.
 ### 6. Phased implementation
 1. **Literal `INSERT … VALUES`** — including the `RETURNING` and
   `ON CONFLICT …` variants (they exist today and the shortid-autofill
   path already carries those trailing clauses, so excluding them would
   be the artificial split). The common case; converges execution on the
   DSL bound path; closes the proven validation gap **and** the
   error-value display in one move. Highest value, smallest blast radius.
 2. **`UPDATE … SET` literals** (bind literal assignments; expression
   assignments stay text).
 3. **Completion/highlighting** for the typed slots on the SQL surface
   (if not already delivered by the mechanism chosen in §5).
 `SELECT` and `INSERT … SELECT` (and every *expression* value position)
 remain on the text side throughout — **by design and documented**, not by
 omission. `RETURNING` / `ON CONFLICT` are not on this list: they are
 clauses over a statement whose literal `VALUES` still bind (§1/§2), and
 Phase 1 carries them.
 ### 7. Non-goals
 - **A structural `SELECT`.** Not now, not as a consequence of this ADR.
  Queries stay text. (If a future need ever forces `SELECT` toward an
  owned AST, that is its own ADR with its own justification — flagged as
  a watch-item, not a plan.)
 - **A full typed SQL-expression AST.** Expressions stay validated text;
  ADR-0031's "no `Expr` AST" stands. We do not impose the DSL's limited
  `Expr` on the SQL surface (that would regress ADR-0030 §4's full-surface
  guarantee).
 - **Removing the verbatim path.** It remains the executor for every
  expression and query position; this ADR narrows *where* it applies, it
  does not delete it.
 ## Consequences
 - **Advanced mode stops being a feedback-free zone for data values.** A
  learner typing a malformed `date`/`shortid`/`int` literal in a SQL
  `INSERT` gets the same catch-and-explain they get in simple mode.
 - **One model, fewer paths.** Literal DML converges on the DSL's proven
  parameter-binding execution; the two surfaces stop diverging on the
  case they share.
 - **The full SQL expression + query surface is preserved**, because
  expressions and `SELECT` are explicitly left on the text side. No
  regression to the advanced surface ADR-0030 §4 protects.
 - **Rework, not rewrite.** ADR-0033's `Sql*` command variants and the
  worker handlers stay; they gain typed-literal payloads and the literal
  execution path changes. Phased so each step is independently shippable
  and testable, with the existing ADR-0033 tests as the regression net.
 - **A new seam to keep honest:** the literal-vs-expression boundary in
  the grammar. It must be explicit and tested (a position that is a bare
  literal binds; a position that is any expression stays text; a signed
  numeric is a literal; `NULL`/`true`/`false` are literals), or it will
  drift.
 - **Execution reconstruction is the load-bearing implementation
  challenge.** A row mixing literals and expressions must be re-emitted
  as `VALUES (?1, expr-text, ?2)` — bound placeholders spliced with the
  original expression spans, with any `RETURNING`/`ON CONFLICT` tail
  re-appended. This is the strongest argument for the **typed-slot
  grammar** mechanism (§5): it yields per-position structure (slot →
  bind; `sql_expr` → text span) directly, whereas the post-parse pass
  must recover those spans itself. The existing shortid-autofill rewrite
  (`plan_shortid_autofill`) already does statement reconstruction with
  bound params + a preserved trailing clause, so the pattern is not
  new — but it must be got right or a literal silently becomes text.
 - **Cost if we are wrong about `SELECT`.** If a future requirement shows
  `SELECT` genuinely needs an owned AST, this ADR does not help with it —
  but nor does it obstruct it. The boundary chosen here (queries = text)
  is the current best line, taken with eyes open.
 ## See also
 - ADR-0030 §4 — the execute-path split this ADR narrows for DML (the
  `SELECT` half stands).
 - ADR-0033 — the SQL DML grammar + the verbatim execution model this ADR
  amends for literal values.
 - ADR-0035 — the DDL precedent: structural, not verbatim ("verbatim was
  a DML convenience, not a rule").
 - ADR-0026 — the DSL's deliberately-limited `Expr`; *not* imposed on the
  SQL surface here.
 - ADR-0031 — `sql_expr` is validate-only (no `Expr` AST); unchanged.
 - ADR-0032 — `SELECT` feedback-from-walk; the proof that text-to-engine
  is right for queries.
 - ADR-0029 — the column type/constraint model the literal validators
  enforce.
 - ADR-0035 Amendment 1 (F2 follow-up) — the neutral "that value" safety
  net that remains correct for computed values.
@@ -21,9 +21,11 @@
 //! real reconstructed SQL.
 use rdbms_playground::db::{Database, DbError, InsertResult};
-use rdbms_playground::dsl::{ColumnSpec, Command, Type, parse_command};
+use rdbms_playground::dsl::{ColumnSpec, Command, Type, Value, parse_command};
 use rdbms_playground::event::AppEvent;
 use rdbms_playground::persistence::Persistence;
 use rdbms_playground::project;
 use rdbms_playground::runtime::run_replay;
 fn rt() -> tokio::runtime::Runtime {
    tokio::runtime::Builder::new_current_thread()
@@ -1033,3 +1035,54 @@ fn autofill_preserves_on_conflict_clause() {
    let rows = csv_rows(&project, "t");
    assert_eq!(rows.len(), 1, "one row landed");
 }
 #[test]
 fn sql_dml_skips_app_level_value_validation_that_the_dsl_enforces() {
    // CHARACTERIZATION of a real divergence (ADR pending — see the
    // verbatim-vs-structural DML discussion). The DSL insert path validates
    // a value against the playground type system at *bind* time
    // (`Value::bind_for_column` → `validate_date`); the verbatim SQL insert
    // path hands the literal straight to the engine, whose STRICT `date`
    // column is TEXT and accepts any string. `2025/01/15` is a malformed
    // date (slashes, not dashes): the DSL rejects it, advanced-mode SQL
    // currently accepts it. When the forthcoming ADR routes SQL literal
    // values through the same validation, FLIP the SQL assertion to expect
    // a rejection.
    let (project, db, _d) = open_project_db();
    let r = rt();
    r.block_on(db.create_table(
        "T".to_string(),
        vec![
            ColumnSpec::new("id", Type::Int),
            ColumnSpec::new("d", Type::Date),
        ],
        vec!["id".to_string()],
        Some("create table T with pk id(int)".to_string()),
    ))
    .expect("create T(id int pk, d date)");
    // DSL path — validates the `date` and rejects the malformed value.
    let dsl = r.block_on(db.insert(
        "T".to_string(),
        Some(vec!["id".to_string(), "d".to_string()]),
        vec![Value::Number("1".to_string()), Value::Text("2025/01/15".to_string())],
        Some("insert".to_string()),
    ));
    assert!(
        dsl.is_err(),
        "the DSL insert path validates `date` and rejects 2025/01/15; got {dsl:?}"
    );
    // SQL path (advanced mode, full pipeline) — currently ACCEPTS it.
    std::fs::write(
        project.path().join("ins.commands"),
        "insert into T (id, d) values (2, '2025/01/15')\n",
    )
    .expect("write script");
    let events = r.block_on(run_replay(&db, project.path(), "ins.commands"));
    assert!(
        matches!(events.last(), Some(AppEvent::ReplayCompleted { count, .. }) if *count == 1),
        "CHARACTERIZATION: verbatim SQL insert skips `date` validation and \
         accepts 2025/01/15 — the gap the ADR will close; events: {events:?}"
    );
 }