Files
rdbms-playground/docs/adr/0036-typed-dml-values-vs-verbatim.md
T
claude@clouddev1 3e3a2fb171 docs: ADR-0036 (Proposed) — bind literal DML values, verbatim text only for expressions/queries
Records the decision that advanced-mode SQL DML should stop handing literal
data values to the engine as text and instead parse/validate/bind them
through the DSL's proven path — closing the value-validation gap, the
hint/highlight gap, and the offending-value-in-errors gap together. Verbatim
text stays for expressions, WHERE, INSERT…SELECT, and SELECT (full SQL
surface preserved; ADR-0026's limited Expr not imposed). Narrows ADR-0030 §4
/ ADR-0033 §10 once accepted; SELECT half of §4 stands.

Includes a characterization test (tests/sql_insert.rs) proving the bind-layer
gap: the DSL rejects the malformed date 2025/01/15, advanced-mode SQL accepts
it. Forward-notes added to ADR-0030/0033; README index updated.

Status: Proposed (design + /runda done; pending go-ahead to implement).
2026-05-26 20:49:40 +00:00

17 KiB

ADR-0036: DML execution — bind literal values, reserve verbatim text for expressions and queries

Status

Proposed (design agreed with the user in conversation, 2026-05-26; /runda verification pass completed 2026-05-26 — three factual/precision corrections applied to the evidence + matrix, and three design forks confirmed with the user: per-position classification with RETURNING/ON CONFLICT as clauses over bindable VALUES; the bindable-literal set including signed numerics, NULL, and boolean/string literals; and WHERE left as text deliberately. Pending the user's go-ahead to implement).

Supersedes, in part, ADR-0030 §4 ("DML and SELECT → executed as the validated SQL itself … modelling them as a typed Command buys nothing") and the execution model of ADR-0033 §10 (the three Sql{Insert,Update,Delete} variants executing the user's text verbatim). It narrows those decisions rather than reversing them: the SELECT half of ADR-0030 §4 stands unchanged, and verbatim text execution remains correct — and necessary — for everything that is an expression or a query. What changes is that literal data values stop being handed to the engine as text and instead flow through the same typed, validated, parameter-bound path the simple-mode DSL has used all along.

Builds on the ADR-0035 precedent (DDL executes structurally, not verbatim) and completes the thought: structure was the first place "grammar as text" broke; literal data values are the second.

Context

How we got here

ADR-0030 §4 set the advanced-mode execute path: DDL lowers to a typed Command and runs the structural executor (to preserve the playground type vocabulary, named relationships, metadata tables, and STRICT); DML and SELECT execute "as the validated SQL itself," on the stated rationale that "they change no schema, so modelling them as a typed Command buys nothing." ADR-0033 implemented that for DML: SqlInsert/SqlUpdate/SqlDelete carry the validated statement text (row_source, the raw sql) and the worker hands it to the engine.

ADR-0035 already found the rationale too broad for DDL and went structural. This ADR finds it too broad for one more case: the literal data values inside DML.

What "verbatim text for literal values" actually costs

The simple-mode DSL never did it this way. do_insert parses each value into a typed Value, validates/normalises it (Value::bind_for_columnvalidate_date, shortid::validate, …), and executes INSERT INTO T (…) VALUES (?1, ?2, …) with the values bound as parameters. The value never becomes SQL text. The advanced-mode SQL path, by contrast, splices the user's literal into SQL text and lets a STRICT engine be the only check.

A date column is STRICT TEXT; a shortid is TEXT; a bool is an int — the engine's storage types do not enforce the playground's semantic types. So the two paths diverge, and advanced mode is materially weaker. Investigated 2026-05-26; the matrix:

Feedback for a DML value DSL (simple) SQL (advanced)
Column-type hint in completion typed slots (incl. date format examples) raw sql_expr
Value-vs-column highlighting numeric-shape mismatch at parse none
Validation at parse ⚠️ numeric shape only (int/decimal/bool); date/shortid format deferred to bind none
Validation at execution (bind) full semantic type none (verbatim)

Precise reading (verified 2026-05-26): the DSL typed slots (shared.rs) validate numeric shape at parse — INT_SLOT rejects decimals, DECIMAL_SLOT checks format, BOOL_SLOT restricts to boolean literals — and surface a per-type hint for every type (the DATE_SLOT carries the YYYY-MM-DD example prose). Full semantic validation — date/shortid/datetime format — happens at bind time (Value::bind_for_columnvalidate_date / shortid::validate). So the DSL catches a bad value somewhere (parse for numeric shape, bind for the rest); advanced-mode SQL catches it nowhere but the engine's storage-type floor. That asymmetry — "DSL always catches it, SQL never does" — is the gap, and it holds across all semantic types.

The execution-layer gap is proven by a characterization test (tests/sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces): the DSL rejects the malformed date 2025/01/15; advanced-mode SQL accepts it and writes the bad row. The only advanced-mode DML diagnostics are structural (insert_arity_mismatch, auto_column_overridden, not_null_missing) — never value-vs-type.

The machinery to fix this already exists and is live for the DSL: column_value_list unfolds a per-column TypedValueSlot when the walker has schema (data.rs:141/189/269; slots in shared.rs). The SQL DML grammar simply was never wired to it — every value position is Node::Subgrammar(&sql_expr::SQL_OR_EXPR) (sql_insert.rs:75), type-blind by construction. So the asymmetry is not a deliberate "advanced mode doesn't need this" decision — no ADR says so — it is an un-wired surface. (A stale header comment at data.rs:8-17 still describes the DSL slots themselves as "deferred"; it predates the wiring that data.rs:141/189/269 now show, and should be corrected as part of this work.) For a teaching tool, where the whole point is to catch a learner's mistake and explain it, silently accepting a malformed value is a pedagogy failure, not a feature.

The same root cause behind the error-value gap

A separate symptom shares this root cause. When a SQL INSERT/UPDATE violates a UNIQUE/CHECK constraint, the friendly-error layer cannot show the offending value — because the value was discarded (only row_source text survives), so enrich_unique_violation / enrich_check_violation come up empty and degrade to a neutral "that value" (ADR-0035 Amendment 1, F2 follow-up). Validation, hinting, highlighting, and the offending-value-in-errors display are four faces of one defect: literal values are thrown away instead of owned.

The sharp edge — why we do not go fully structural

ADR-0030 §4's text choice was not gratuitous. It deliberately keeps DML/SELECT/CHECK expressions out of the DSL's intentionally limited Expr (ADR-0026), so advanced mode delivers the full SQL expression surface — arithmetic, functions, subqueries, nested boolean operands — that docs/simple-mode-limitations.md records as the inverse of the simple subset. Lowering SQL expressions into the DSL Expr would regress that surface; building a full typed SQL-expression AST + serializer is a large undertaking that ADR-0031 explicitly declined (sql_expr is validate-only, no Expr AST).

And SELECT is the proof that text-to-engine is the right tool for queries: ADR-0032 already delivers rich feedback for SELECT — completion, qualified-name resolution, predicate warnings, post-prepare type recovery — entirely from walking the validated parse, with the engine executing the text. Queries have no data values to validate against columns; owning them buys nothing and costs enormously.

So the dividing line is not "DDL vs DML." It is bindable literal vs engine-evaluated expression-or-query.

Decision

1. The principle

Own structure and literal values; reserve verbatim text for expressions and queries.

Classification is per value position, not per statement. Wherever a value position holds a bare literal, parse it to a typed Value, validate it against the playground type system, surface the column type in completion, highlight a mismatch, and execute it as a bound parameter — exactly as the simple-mode DSL does. Wherever the position holds a computed expression (qty*2, a function call, a scalar subquery), or the value source is itself a query (INSERT … SELECT), hand the validated text to the engine — because that is where text-to-engine is necessary, not merely convenient.

The test for which side a position falls on is operational: can this value be bound as a parameter? A literal can; a computed expression or a subquery cannot — the engine must evaluate it.

What counts as a literal (the bindable set — matching the DSL lexer's consume_number_literal / consume_string_literal and the null/true/false words): NULL, a boolean literal, a string literal, and a signed numeric literal (-5, 3.14). A signed numeric must count as a literal even if the grammar would otherwise read -5 as unary-minus-of-5, or negative numbers would silently skip validation. Anything else in a value position — arithmetic, function calls, CASE, subqueries, column references — is an expression and stays text.

Clauses are not value sources. RETURNING and ON CONFLICT … are clauses on a statement, not value positions. An INSERT INTO T VALUES (1, 'x') RETURNING id or … ON CONFLICT DO UPDATE … still has bindable literal VALUES; the clause rides on the generated statement (and its own expressions — e.g. excluded.x in a DO UPDATE SET — stay text). Only when the value source is a query (INSERT … SELECT) is there nothing to bind.

2. What this means per statement

  • INSERT … VALUES — each value position is a literal or an expression. Literal positions are typed/validated/bound; expression positions stay text. A row that is all literals executes exactly like the DSL do_insert (VALUES (?1, ?2, …)); a row mixing the two generates VALUES (?1, a+1, ?2). Multi-row VALUES repeats this per row. A trailing RETURNING or ON CONFLICT … clause is carried on the generated statement — the bound VALUES are unaffected; the clause's own expressions stay text.
  • UPDATE … SETSET col = <literal> is typed/validated/bound; SET col = <expr> stays text.
  • WHERE (UPDATE/DELETE)stays text, deliberately, even for a simple WHERE id = 5. WHERE is an expression in general; keeping it text preserves the full SQL surface (ADR-0026's Expr is the DSL's representation and is not imposed on SQL), and the value-feedback motivation is already met by VALUES/SET — a constraint error names a written value, not a predicate operand. This is a scope choice, not an oversight; per-position binding is not applied to WHERE.
  • DELETE — no data values to bind; WHERE per the rule above.
  • INSERT … SELECT — entirely text: the value source is a query, so there are no literals to bind.
  • SELECT — unchanged. Text-to-engine, feedback-from-walk (ADR-0032). Explicitly not lowered to a typed AST.

3. What it fixes — one change, four faces

Retaining the typed literal (a) lets validation run on it (the proven gap), (b) lets completion hint the column type and highlighting flag a mismatch, (c) lets the worker bind it (safe, type-correct), and (d) leaves it on the command so enrich_* can show the real value in a constraint error. The neutral "that value" safety net (ADR-0035 Amendment 1) remains correct for the genuinely-computed cases — there is no input literal to show, and that is the honest answer.

4. Explicit requirement — carry the typed literals

SqlInsert/SqlUpdate (or their successors) must retain the parsed typed literal values (per column position), not only the raw text, so that both the executor (binding) and the error enricher (display) can use them. This is the concrete hinge on which faces (a), (c), and (d) turn; it must not be left implicit.

The command also keeps the user's original source text: that is what history.log records and what replay re-runs (ADR-0034), so the journal stays faithful to what the user typed even though execution now binds the literals from a generated statement. Two representations, one command: original text for the journal, parsed literals for execution + feedback + errors.

  • Recommended — typed-slot hybrid grammar. At each DML value position, the SQL grammar offers Choice(typed-literal-slot, sql_expr): the typed slot (reusing the DSL's live column_value_list / per-type TypedValueSlots — data.rs:141/189/269, shared.rs) captures + validates + retains the literal and drives completion/highlighting; sql_expr is the fallback for a genuine expression. This is the only option that delivers live completion hints for SQL DML values, because the hint comes from the grammar walk.
  • Alternative — post-parse semantic pass. Keep the grammar as sql_expr and add a pass that, for positions that are bare literals, validates + retains them. Simpler; delivers validation, retention, and the error-value fix, but not live completion hinting. Recorded as the fallback if the hybrid grammar proves too costly in the walker.

6. Phased implementation

  1. Literal INSERT … VALUES — including the RETURNING and ON CONFLICT … variants (they exist today and the shortid-autofill path already carries those trailing clauses, so excluding them would be the artificial split). The common case; converges execution on the DSL bound path; closes the proven validation gap and the error-value display in one move. Highest value, smallest blast radius.
  2. UPDATE … SET literals (bind literal assignments; expression assignments stay text).
  3. Completion/highlighting for the typed slots on the SQL surface (if not already delivered by the mechanism chosen in §5).

SELECT and INSERT … SELECT (and every expression value position) remain on the text side throughout — by design and documented, not by omission. RETURNING / ON CONFLICT are not on this list: they are clauses over a statement whose literal VALUES still bind (§1/§2), and Phase 1 carries them.

7. Non-goals

  • A structural SELECT. Not now, not as a consequence of this ADR. Queries stay text. (If a future need ever forces SELECT toward an owned AST, that is its own ADR with its own justification — flagged as a watch-item, not a plan.)
  • A full typed SQL-expression AST. Expressions stay validated text; ADR-0031's "no Expr AST" stands. We do not impose the DSL's limited Expr on the SQL surface (that would regress ADR-0030 §4's full-surface guarantee).
  • Removing the verbatim path. It remains the executor for every expression and query position; this ADR narrows where it applies, it does not delete it.

Consequences

  • Advanced mode stops being a feedback-free zone for data values. A learner typing a malformed date/shortid/int literal in a SQL INSERT gets the same catch-and-explain they get in simple mode.
  • One model, fewer paths. Literal DML converges on the DSL's proven parameter-binding execution; the two surfaces stop diverging on the case they share.
  • The full SQL expression + query surface is preserved, because expressions and SELECT are explicitly left on the text side. No regression to the advanced surface ADR-0030 §4 protects.
  • Rework, not rewrite. ADR-0033's Sql* command variants and the worker handlers stay; they gain typed-literal payloads and the literal execution path changes. Phased so each step is independently shippable and testable, with the existing ADR-0033 tests as the regression net.
  • A new seam to keep honest: the literal-vs-expression boundary in the grammar. It must be explicit and tested (a position that is a bare literal binds; a position that is any expression stays text; a signed numeric is a literal; NULL/true/false are literals), or it will drift.
  • Execution reconstruction is the load-bearing implementation challenge. A row mixing literals and expressions must be re-emitted as VALUES (?1, expr-text, ?2) — bound placeholders spliced with the original expression spans, with any RETURNING/ON CONFLICT tail re-appended. This is the strongest argument for the typed-slot grammar mechanism (§5): it yields per-position structure (slot → bind; sql_expr → text span) directly, whereas the post-parse pass must recover those spans itself. The existing shortid-autofill rewrite (plan_shortid_autofill) already does statement reconstruction with bound params + a preserved trailing clause, so the pattern is not new — but it must be got right or a literal silently becomes text.
  • Cost if we are wrong about SELECT. If a future requirement shows SELECT genuinely needs an owned AST, this ADR does not help with it — but nor does it obstruct it. The boundary chosen here (queries = text) is the current best line, taken with eyes open.

See also

  • ADR-0030 §4 — the execute-path split this ADR narrows for DML (the SELECT half stands).
  • ADR-0033 — the SQL DML grammar + the verbatim execution model this ADR amends for literal values.
  • ADR-0035 — the DDL precedent: structural, not verbatim ("verbatim was a DML convenience, not a rule").
  • ADR-0026 — the DSL's deliberately-limited Expr; not imposed on the SQL surface here.
  • ADR-0031 — sql_expr is validate-only (no Expr AST); unchanged.
  • ADR-0032 — SELECT feedback-from-walk; the proof that text-to-engine is right for queries.
  • ADR-0029 — the column type/constraint model the literal validators enforce.
  • ADR-0035 Amendment 1 (F2 follow-up) — the neutral "that value" safety net that remains correct for computed values.