# ADR-0036: Value validation for advanced-mode DML — validate literals, keep execution and identity mode-specific

## Status

**Accepted** (design agreed with the user in conversation, 2026-05-26;
`/runda` verification pass completed 2026-05-26; the mechanism was then
**deliberately narrowed** during the same conversation — see below — from
"bind literal values through the DSL's path" to the surgical
**"validate-and-retain, execute verbatim"** after the user pushed back on
consolidating the two modes and a concrete auto-fill difference confirmed
that even the single-row literal case is **not** identical across modes).
**Phase 1 implemented 2026-05-26** (`INSERT … VALUES` literal validation +
offending-value retention; capture-at-parse, no grammar change, execution
unchanged). **Phase 2 implemented 2026-05-26** (`UPDATE … SET` literal
validation + offending-value retention; the same capture-at-parse technique
on the SET assignment list — `capture_set_literals` in `data.rs` —
classifying each top-level RHS literal-vs-expression, validating literals in
`do_sql_update`, and reading them in `user_value_for_column`; `WHERE` is not
validated, execution stays verbatim). Phase 3 (completion
hinting/highlighting — the only part needing a grammar change) pending.

**Augments** **ADR-0030 §4** and **ADR-0033 §10** — it does **not**
supersede them and does **not** change the execution model. Advanced-mode
DML still executes the validated SQL **verbatim**; ADR-0033 Amendment 3's
two-command identity (`Command::Insert` vs `Command::SqlInsert`) **stands
unchanged**. What this ADR adds is a **value-validation step**: the word
"validated" in "executed as the validated SQL itself" (ADR-0030 §4) is
extended to mean *value*-validated, not merely *syntactically* validated —
the literal data values in an advanced-mode `INSERT`/`UPDATE` are checked
against the playground type system (and retained for error reporting)
before the statement runs.

Builds on the ADR-0035 precedent (DDL executes *structurally*, not
verbatim): there, structure was the first place "grammar as text" was too
broad. This ADR makes a **narrower** correction for DML — not to *how* it
executes, but to *what gets checked* before it does.

**Conversation note (the principle this records).** The first instinct was
to *consolidate* — bind literals via the DSL path, even emit
`Command::Insert` from the advanced surface. That was rejected, for a
reason worth preserving: simple- and advanced-mode commands are kept
distinct **because they can legitimately differ**, and they do — e.g.
auto-fill: simple-mode `do_insert` fills an omitted non-PK `serial` with
`MAX(col)+1`, advanced-mode does not (`requirements.md` **X4**, flagged as
a possible bug to investigate separately). Collapsing the commands would
silently drag in such differences. The durable principle (also
`requirements.md` **X5**): **keep a distinct command per distinct case;
share execution *mechanics* as library helpers, never by fusing command
identity.** This ADR shares exactly one mechanic — the per-type value
validators — and nothing else.

## Context

### How we got here

ADR-0030 §4 set the advanced-mode execute path: **DDL** lowers to a typed
`Command` and runs the structural executor (to preserve the playground
type vocabulary, named relationships, metadata tables, and `STRICT`);
**DML and `SELECT`** execute "as the validated SQL itself," on the
stated rationale that "they change no schema, so modelling them as a
typed `Command` buys nothing." ADR-0033 implemented that for DML:
`SqlInsert`/`SqlUpdate`/`SqlDelete` carry the validated statement text
(`row_source`, the raw `sql`) and the worker hands it to the engine.

ADR-0035 already found the rationale too broad for DDL and went
structural. This ADR finds it too broad for one more case: **the literal
data values inside DML.**

### What "verbatim text for literal values" actually costs

The simple-mode DSL never did it this way. `do_insert` parses each value
into a typed `Value`, validates/normalises it (`Value::bind_for_column`
→ `validate_date`, `shortid::validate`, …), and executes
`INSERT INTO T (…) VALUES (?1, ?2, …)` with the values **bound as
parameters**. The value never becomes SQL text. The advanced-mode SQL
path, by contrast, splices the user's literal into SQL text and lets a
`STRICT` engine be the only check.

A `date` column is `STRICT TEXT`; a `shortid` is `TEXT`; a `bool` is an
int — the engine's storage types do **not** enforce the playground's
*semantic* types. So the two paths diverge, and advanced mode is
materially weaker. Investigated 2026-05-26; the matrix:

| Feedback for a DML value | DSL (simple) | SQL (advanced) |
| --- | --- | --- |
| Column-type hint in completion | ✅ typed slots (incl. `date` format examples) | ❌ raw `sql_expr` |
| Value-vs-column highlighting | ✅ numeric-shape mismatch at parse | ❌ none |
| Validation at parse | ⚠️ numeric shape only (`int`/`decimal`/`bool`); `date`/`shortid` format deferred to bind | ❌ none |
| Validation at execution (bind) | ✅ full semantic type | ❌ none (verbatim) |

Precise reading (verified 2026-05-26): the DSL typed slots
(`shared.rs`) validate *numeric shape* at parse — `INT_SLOT` rejects
decimals, `DECIMAL_SLOT` checks format, `BOOL_SLOT` restricts to boolean
literals — and surface a per-type hint for *every* type (the `DATE_SLOT`
carries the `YYYY-MM-DD` example prose). Full semantic validation —
`date`/`shortid`/`datetime` *format* — happens at **bind** time
(`Value::bind_for_column` → `validate_date` / `shortid::validate`). So
the DSL catches a bad value *somewhere* (parse for numeric shape, bind
for the rest); advanced-mode SQL catches it **nowhere** but the engine's
storage-type floor. That asymmetry — "DSL always catches it, SQL never
does" — is the gap, and it holds across all semantic types.
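
To make the storage-type floor concrete, here is a minimal sketch of the kind of semantic check a `STRICT TEXT` column cannot perform on its own. The function name `validate_date_sketch`, its signature, and the calendar rules are assumptions for illustration; the project's real `validate_date` is not shown in this ADR and may differ.

```rust
/// Illustrative stand-in for a semantic date check. The project's real
/// `validate_date` is not shown in this ADR, so this name, signature, and
/// the calendar rules below are assumptions.
fn validate_date_sketch(text: &str) -> Result<(), String> {
    let bytes = text.as_bytes();
    // Shape: exactly ten ASCII bytes, digits everywhere except the two hyphens.
    let shape_ok = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, b)| match i {
            4 | 7 => *b == b'-',
            _ => b.is_ascii_digit(),
        });
    if !shape_ok {
        return Err(format!("`{text}` is not a date in YYYY-MM-DD form"));
    }
    // Safe to slice by byte offset: the shape check guarantees ASCII.
    let year: u32 = text[0..4].parse().map_err(|_| "unreadable year".to_string())?;
    let month: u32 = text[5..7].parse().map_err(|_| "unreadable month".to_string())?;
    let day: u32 = text[8..10].parse().map_err(|_| "unreadable day".to_string())?;
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if leap => 29,
        2 => 28,
        _ => return Err(format!("`{text}` has no month {month}")),
    };
    if (1..=days_in_month).contains(&day) {
        Ok(())
    } else {
        Err(format!("`{text}` has no day {day} in month {month}"))
    }
}

fn main() {
    for candidate in ["2025-01-15", "2025/01/15", "2025-02-30"] {
        println!("{candidate}: {:?}", validate_date_sketch(candidate));
    }
}
```

All three candidates are valid `TEXT` as far as the engine is concerned; only a check of this kind tells them apart.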

The execution-layer gap is **proven** by a characterization test
(`tests/sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces`):
the DSL rejects the malformed date `2025/01/15`; advanced-mode SQL
accepts it and writes the bad row. The only advanced-mode DML
diagnostics are *structural* (`insert_arity_mismatch`,
`auto_column_overridden`, `not_null_missing`) — never value-vs-type.

The machinery to fix this **already exists and is live for the DSL**:
`column_value_list` unfolds a per-column `TypedValueSlot` when the walker
has schema (`data.rs:141`/`189`/`269`; slots in `shared.rs`). The SQL DML
grammar simply was never wired to it — every value position is
`Node::Subgrammar(&sql_expr::SQL_OR_EXPR)` (`sql_insert.rs:75`),
type-blind by construction. So the asymmetry is **not** a deliberate
"advanced mode doesn't need this" decision — **no ADR says so** — it is
an un-wired surface. (A stale header comment at `data.rs:8-17` still
describes the DSL slots themselves as "deferred"; it predates the wiring
that `data.rs:141`/`189`/`269` now show, and should be corrected as part of
this work.) For a teaching tool, where the whole point is to catch a
learner's mistake and explain it, silently accepting a malformed value
is a pedagogy failure, not a feature.

### The same root cause behind the error-value gap

A separate symptom shares this root cause. When a SQL `INSERT`/`UPDATE`
violates a UNIQUE/CHECK constraint, the friendly-error layer cannot show
the offending **value**: the value was discarded (only
`row_source` text survives), so `enrich_unique_violation` /
`enrich_check_violation` come up empty and degrade to a neutral "that
value" (ADR-0035 Amendment 1, F2 follow-up). Validation, hinting,
highlighting, and the offending-value-in-errors display are **four faces
of one defect**: literal values are thrown away instead of owned.

### The sharp edge — why we do *not* go fully structural

ADR-0030 §4's text choice was not gratuitous. It deliberately keeps
DML/`SELECT`/`CHECK` **expressions** out of the DSL's intentionally
*limited* `Expr` (ADR-0026), so advanced mode delivers the **full** SQL
expression surface — arithmetic, functions, subqueries, nested boolean
operands — that `docs/simple-mode-limitations.md` records as the inverse
of the simple subset. Lowering SQL expressions into the DSL `Expr` would
**regress that surface**; building a full typed SQL-expression AST +
serializer is a large undertaking that ADR-0031 explicitly declined
(`sql_expr` is validate-only, no `Expr` AST).

And `SELECT` is the proof that text-to-engine is the *right* tool for
queries: ADR-0032 already delivers rich feedback for `SELECT` —
completion, qualified-name resolution, predicate warnings, post-prepare
type recovery — entirely from **walking the validated parse**, with the
engine executing the text. Queries have no data values to validate
against columns; owning them buys nothing and costs enormously.

So the dividing line is **not** "DDL vs DML." It is **a static literal
value (which we can validate) vs an engine-evaluated
expression-or-query (which we cannot).**

## Decision

### 1. The principle

> **In advanced-mode `INSERT`/`UPDATE`, validate each literal data value
> against its target column's type before executing, and retain the
> literal so a constraint error can name it. Execute the statement
> verbatim, exactly as today. Do not bind, do not reconstruct, do not
> touch auto-fill, do not collapse command identity.**

Only the **value validation** is shared between simple and advanced mode —
via the existing per-type validators (`Value::bind_for_column` /
`validate_date` / `shortid::validate`). Everything else stays
mode-specific: execution is still verbatim text-to-engine,
`plan_shortid_autofill` is untouched, and `Command::SqlInsert` /
`Command::SqlUpdate` remain distinct from their DSL counterparts.

**What counts as a literal** (the set we validate — matching the
`null`/`true`/`false` words plus number/string literals as the walker
tokenises them): `NULL`, a boolean literal, a string literal, and an
**optionally signed** numeric literal (`-5`, `3.14`). A signed numeric counts as a
literal even though `sql_expr` tokenises the sign separately (`Punct('-')`
then `NumberLit`) — a leading sign at the start of a value position is
part of the literal, not an operator. Anything else in a value position —
arithmetic, function calls, `CASE`, subqueries, column references — is an
**expression**: there is no static value to validate, so it is left to the
engine (unchanged).
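
The literal-vs-expression rule above can be sketched as a match over the tokens of one value position. The `Tok` and `Literal` types and the `classify` function are hypothetical; the ADR names only `Punct('-')` and `NumberLit`, so the other variants and the exact token payloads are assumptions.

```rust
/// Hypothetical token shape. Only `Punct` and `NumberLit` are named in the
/// ADR; the other variants are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),      // keywords and identifiers
    NumberLit(String), // unsigned numeric text, e.g. "3.14"
    StringLit(String), // contents of a quoted string
    Punct(char),
}

/// Hypothetical captured literal. `None` from `classify` means "expression".
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Null,
    Bool(bool),
    Number(String),
    Text(String),
}

/// Classifies the tokens of a single value position.
fn classify(position: &[Tok]) -> Option<Literal> {
    match position {
        [Tok::Word(w)] if w.eq_ignore_ascii_case("null") => Some(Literal::Null),
        [Tok::Word(w)] if w.eq_ignore_ascii_case("true") => Some(Literal::Bool(true)),
        [Tok::Word(w)] if w.eq_ignore_ascii_case("false") => Some(Literal::Bool(false)),
        [Tok::NumberLit(n)] => Some(Literal::Number(n.clone())),
        [Tok::StringLit(s)] => Some(Literal::Text(s.clone())),
        // A leading minus directly before a number is part of the literal.
        [Tok::Punct('-'), Tok::NumberLit(n)] => Some(Literal::Number(format!("-{n}"))),
        // Arithmetic, function calls, column references, subqueries, ...
        _ => None,
    }
}

fn main() {
    let minus_five = [Tok::Punct('-'), Tok::NumberLit("5".into())];
    let arithmetic = [
        Tok::NumberLit("1".into()),
        Tok::Punct('+'),
        Tok::NumberLit("2".into()),
    ];
    let quoted = [Tok::StringLit("2025/01/15".into())];
    println!("{:?}", classify(&minus_five));
    println!("{:?}", classify(&arithmetic));
    println!("{:?}", classify(&quoted));
}
```

Matching on the whole slice keeps the rule conservative: any position with extra tokens falls through to `None` and is left to the engine.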

**Why not bind / converge.** Binding was the *first* instinct and is
**rejected**. The two proven gaps (a malformed literal slipping through;
the offending value missing from errors) are closed by **validation +
retention alone** — binding adds nothing to either. Meanwhile, executing
the user's *own* text verbatim is already safe (their quoting stands; no
re-quoting risk because we do not reconstruct), and binding/convergence
would risk dragging in genuinely mode-specific behaviour (auto-fill — X4;
natural-order column mapping) that must stay separate. So we share the
validators and nothing else. This keeps the modes cleanly apart
(`requirements.md` X5) while fixing the bug that they should *not* differ
on: whether a learner's malformed value is caught.

### 2. What this means per statement

Execution is **unchanged** for every statement below; the only addition is
a pre-execution validation of literal value positions.

- **`INSERT … VALUES`** — every literal position (single- or multi-row) is
  validated against its column type before the verbatim insert runs; a
  malformed literal is refused with the same friendly wording the DSL uses
  (shared `bind_for_column`). Expression positions are skipped (nothing to
  validate). `RETURNING` / `ON CONFLICT` / `INSERT … SELECT` need no
  special handling — validation simply applies to whatever literal
  `VALUES` are present, and the statement still executes verbatim.
- **`UPDATE … SET`** — `SET col = <literal>` is validated; `SET col =
  <expr>` is skipped. (Phase 2 — see §5.)
- **`WHERE` (UPDATE/DELETE)** — **not** validated. `WHERE` is an
  expression in general; the value-feedback motivation is met by
  `VALUES`/`SET` (a constraint error names a *written* value). Deliberate
  scope choice, not an oversight.
- **`SELECT`** — entirely unchanged. No data values to validate.
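
The per-statement rule (validate literal positions, skip expression positions, then execute unchanged) might look roughly like the following. `ColType`, `Literal`, `check`, and `validate_row` are hypothetical names, and the type rules are deliberately simplified; the real validators are the shared ones the DSL already uses.

```rust
/// Hypothetical, simplified column types for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int,
    Bool,
    Text,
}

/// Hypothetical captured literal.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Null,
    Bool(bool),
    Number(String),
    Text(String),
}

/// Checks one captured literal against one column type.
fn check(col: ColType, lit: &Literal) -> Result<(), String> {
    match (col, lit) {
        // Assumption: nullability is handled by a separate structural check.
        (_, Literal::Null) => Ok(()),
        (ColType::Int, Literal::Number(n)) if n.parse::<i64>().is_ok() => Ok(()),
        (ColType::Bool, Literal::Bool(_)) => Ok(()),
        (ColType::Text, Literal::Text(_)) => Ok(()),
        (col, lit) => Err(format!("{lit:?} is not a valid {col:?} value")),
    }
}

/// Validates one `VALUES` row. `None` marks an expression position, which
/// has no static value and is therefore skipped.
fn validate_row(cols: &[ColType], row: &[Option<Literal>]) -> Result<(), String> {
    for (col, slot) in cols.iter().zip(row) {
        if let Some(lit) = slot {
            check(*col, lit)?;
        }
    }
    Ok(())
}

fn main() {
    let cols = [ColType::Int, ColType::Text, ColType::Bool];
    let good = [
        Some(Literal::Number("-5".into())),
        None, // e.g. upper(name): left to the engine
        Some(Literal::Bool(true)),
    ];
    let bad = [
        Some(Literal::Number("3.14".into())),
        Some(Literal::Text("x".into())),
        None,
    ];
    println!("{:?}", validate_row(&cols, &good));
    println!("{:?}", validate_row(&cols, &bad));
}
```

The function only returns a verdict. Nothing it computes is fed into execution, which matches the "validate, then run the text verbatim" split.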

### 3. What it fixes

Validating the literal closes the **validation gap**: the malformed `date`
`2025/01/15`, which the characterization test showed advanced mode
accepting, is now refused. Retaining the literal on the command closes the
**error-value gap** (`enrich_*` reads it, so a constraint error shows the
real value instead of the neutral "that value"). Completion **hinting /
highlighting** is **not** delivered here; it needs a grammar-level change
(§5, Phase 3). The neutral "that value" safety net (ADR-0035 Amendment 1)
remains correct for genuinely-computed expression values, where there is
no input literal to show.

### 4. Explicit requirement — retain the literals, change nothing else

`Command::SqlInsert` (and later `SqlUpdate`) **gains a captured-literals
payload** (per row, per position; `None` for an expression position) in
addition to the existing raw text. The executor validates from it and the
error enricher reads it. The original source text is **unchanged** and is
still what `history.log` records and `replay` re-runs (ADR-0034). The
command variant, its execution, and `plan_shortid_autofill` are **not**
modified. Validation reuses the existing value-binding helper
(`impl_value_for` / `Value::bind_for_column`) for wording parity with the
DSL — the resulting bound value is **discarded** (we do not bind for
execution), only its `Result` is used.
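
As a rough picture of the retained payload and how an enricher could read it, consider the sketch below. `SqlInsertSketch`, its `literals` field, and `offending_value` are hypothetical; only `row_source` and the "per row, per position, `None` for an expression" layout come from this ADR, and the literal is stored as plain text here purely to keep the example small.

```rust
/// Hypothetical shape of the command after this ADR. The real
/// `Command::SqlInsert` is not reproduced here.
struct SqlInsertSketch {
    /// Original statement text: still what is executed and logged.
    row_source: String,
    /// Captured literals, per row and per position. `None` = expression.
    literals: Vec<Vec<Option<String>>>,
}

/// Returns the phrase an error message would use for the value at
/// (row, position), falling back to the neutral wording when no literal
/// was captured.
fn offending_value(cmd: &SqlInsertSketch, row: usize, position: usize) -> String {
    cmd.literals
        .get(row)
        .and_then(|r| r.get(position))
        .and_then(|slot| slot.as_deref())
        .map(|v| format!("the value {v}"))
        .unwrap_or_else(|| "that value".to_string())
}

fn main() {
    let cmd = SqlInsertSketch {
        row_source: "INSERT INTO users (email, name) VALUES ('a@b.c', upper('x'))".into(),
        literals: vec![vec![Some("'a@b.c'".into()), None]],
    };
    println!("executes verbatim: {}", cmd.row_source);
    println!("position 0: {}", offending_value(&cmd, 0, 0));
    println!("position 1: {}", offending_value(&cmd, 0, 1));
}
```

The payload sits beside the raw text rather than replacing it, so history and replay keep working from the text alone.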

### 5. Mechanism + phasing

- **Phase 1 (implemented 2026-05-26) — capture + validate + retain.**
  At parse, `build_sql_insert` classifies each `VALUES` position from the
  matched path (a single literal token, or a signed number → a typed
  `Value`; anything else → an expression marker) and stores the per-row
  result on the command — **no grammar change, no reparse**. The executor
  validates the captured literals against the resolved column types before
  the verbatim insert; the enricher reads them. Covers single- and
  multi-row, with or without `RETURNING`/`ON CONFLICT`, because execution
  is untouched.
- **Phase 2 (implemented 2026-05-26) — `UPDATE … SET` literal
  validation.** The same capture-at-parse technique on the SET assignment
  list: `build_sql_update` calls `capture_set_literals`, which walks the
  matched tokens (no reparse) and classifies each *top-level* `SET col =
  <rhs>` into `(col, Some(Value))` for a bare literal (incl. a signed
  number) or `(col, None)` for an expression — using paren depth so a comma
  inside a function call or a `where` inside a scalar subquery is never
  mistaken for an assignment/clause boundary, and so the trailing top-level
  `WHERE` predicate is excluded. `Command::SqlUpdate` gains a
  `set_literals` payload; `do_sql_update` validates the literals against
  their column types (via the shared `impl_value_for`) before the still
  verbatim update; `user_value_for_column` reads them so a constraint error
  names the offending value. `WHERE` is deliberately not validated (§2).
- **Phase 3 — completion hinting / highlighting.** This is the *only*
  part that needs a grammar change: a `Choice(typed-literal-slot,
  sql_expr)` at each value position (reusing the DSL's live
  `column_value_list` / `TypedValueSlot`s — `data.rs:141`/`189`/`269`),
  so the column type drives a live hint and a mismatch highlights while
  typing. When Phase 3 lands, the typed slot supersedes Phase 1's
  classification of literals. That classification is the only throwaway,
  by design; the validation/enrichment built on top is unaffected.
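
The paren-depth technique described for Phase 2 can be illustrated with a small token splitter. This is a sketch under assumptions, not the real `capture_set_literals`: it works on plain string tokens that start just after `SET`, and the names `split_set_assignments` and `flush` are hypothetical.

```rust
/// Splits the tokens following `SET` into `(column, rhs tokens)` pairs,
/// stopping at the top-level `WHERE`. Commas and `where` keywords nested
/// inside parentheses are treated as part of the current right-hand side.
fn split_set_assignments<'a>(tokens: &[&'a str]) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut out = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    let mut depth = 0usize;
    for &tok in tokens {
        match tok {
            "(" => {
                depth += 1;
                current.push(tok);
            }
            ")" => {
                depth = depth.saturating_sub(1);
                current.push(tok);
            }
            // Only a comma outside all parentheses ends an assignment.
            "," if depth == 0 => flush(&mut current, &mut out),
            // Only a top-level WHERE ends the SET list.
            _ if depth == 0 && tok.eq_ignore_ascii_case("where") => break,
            _ => current.push(tok),
        }
    }
    flush(&mut current, &mut out);
    out
}

/// Records `col = rhs...` if the buffered tokens have that shape, then
/// clears the buffer.
fn flush<'a>(current: &mut Vec<&'a str>, out: &mut Vec<(&'a str, Vec<&'a str>)>) {
    if current.len() >= 3 && current[1] == "=" {
        out.push((current[0], current[2..].to_vec()));
    }
    current.clear();
}

fn main() {
    let tokens = [
        "a", "=", "1", ",", "b", "=", "coalesce", "(", "x", ",", "0", ")",
        "where", "id", "=", "3",
    ];
    for (col, rhs) in split_set_assignments(&tokens) {
        println!("{col} <- {}", rhs.join(" "));
    }
}
```

Each right-hand side found this way would then go through the same literal-vs-expression classification as a `VALUES` position: a one-token (or signed-number) side is a literal, anything longer is an expression.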

### 6. Non-goals

- **Binding / statement reconstruction.** Explicitly out. Execution stays
  verbatim. (This was the rejected first instinct.)
- **Collapsing command identity.** `Command::Insert` and
  `Command::SqlInsert` stay distinct; **ADR-0033 Amendment 3 stands**.
- **Changing auto-fill.** The simple-vs-advanced `serial`/`shortid`
  auto-fill difference (`requirements.md` X4) is **untouched** here and
  tracked separately as a possible bug.
- **A structural `SELECT`** and **a full typed SQL-expression AST** —
  both out (queries and expressions stay text; ADR-0031's "no `Expr` AST"
  and ADR-0030 §4's full-surface guarantee stand).

## Consequences

- **Advanced mode stops being a feedback-free zone for data values.** A
  learner typing a malformed `date`/`shortid`/`int` literal in a SQL
  `INSERT` gets the same catch-and-explain they get in simple mode —
  via the *shared validator*, not a shared command.
- **The modes stay cleanly separate.** Execution, auto-fill, and command
  identity are all unchanged; the only thing now shared is the value
  validators. This is the `requirements.md` X5 principle in practice
  (share a mechanic, not a command) and avoids the consolidation traps
  (X4 auto-fill) that the bind/converge approach would have hit.
- **Small, low-risk, no execution reconstruction.** Because we do not
  rebuild the statement, there is no "mixed `VALUES (?1, expr, ?2)`"
  splicing problem, no multi-row execution change, and no `RETURNING`/
  `ON CONFLICT`/`INSERT … SELECT` special-casing — they keep working as
  the existing ADR-0033 tests assert.
- **One new seam to keep honest:** the literal-vs-expression
  classification at parse. It must be tested (single literal / signed
  literal / `NULL`/`true`/`false` → validated; arithmetic / function /
  subquery → skipped), or it will drift.
- **A normalization difference is *avoided*, not introduced.** We
  validate the literal but do not rewrite it; the engine stores the
  user's text as written. (Had we bound/normalized, advanced inserts
  might store a canonicalised value — a behaviour change we sidestep.)
- **Phase 3 will revisit literal *detection*** (swapping the parse-time
  classification for typed slots that also drive hints). The
  validation/enrichment built on it is permanent; only the detection is
  provisional — a deliberate, documented small throwaway.

## See also

- ADR-0030 §4 / ADR-0033 §10 — the execute-path this ADR **augments**
  (adds value validation); the verbatim execution model and the
  `SELECT`/expression text path both stand.
- ADR-0033 Amendment 3 — the two-command identity, **preserved** (this ADR
  does *not* collapse `Insert`/`SqlInsert`).
- ADR-0035 — the DDL precedent (structural, not verbatim); this ADR is the
  narrower DML analogue (validate, don't restructure).
- ADR-0026 — the DSL's deliberately-limited `Expr`; *not* imposed on the
  SQL surface. ADR-0031 — `sql_expr` is validate-only; unchanged.
- ADR-0032 — `SELECT` feedback-from-walk; the proof that text-to-engine is
  right for queries.
- ADR-0029 — the column type/constraint model the shared validators
  enforce.
- ADR-0035 Amendment 1 (F2 follow-up) — the neutral "that value" safety
  net, correct for computed values.
- `requirements.md` **X4** (auto-fill difference — possible bug, untouched
  here) and **X5** (framework cohesion / share-mechanics-not-commands —
  the principle this ADR follows).