Mirror Phase 1's capture-at-parse technique on the UPDATE SET assignment list. build_sql_update calls the new capture_set_literals (data.rs), which walks the matched tokens (no reparse, no grammar change) and classifies each top-level `SET col = <rhs>` as a literal (Some, incl. signed numbers) or an expression (None), using paren depth so a comma inside a function call or a `where` inside a scalar subquery is not mistaken for a boundary, and the trailing top-level WHERE is excluded. Command::SqlUpdate gains set_literals; do_sql_update validates the literals against their column types via the shared impl_value_for before the still verbatim update; user_value_for_column reads them so a constraint error names the offending value. WHERE stays unvalidated; execution and command identity are unchanged. Also corrects the stale data.rs header comment (DSL typed slots are wired, not "deferred") and flips ADR-0036 + README to Phases 1–2 implemented. Tests: 1934 passing (+4), 0 failed, 0 skipped, 1 ignored; clippy clean.
ADR-0036: Value validation for advanced-mode DML — validate literals, keep execution and identity mode-specific
Status
Accepted (design agreed with the user in conversation, 2026-05-26;
/runda verification pass completed 2026-05-26; the mechanism was then
deliberately narrowed during the same conversation — see below — from
"bind literal values through the DSL's path" to the surgical
"validate-and-retain, execute verbatim" after the user pushed back on
consolidating the two modes and a concrete auto-fill difference confirmed
that even the single-row literal case is not identical across modes).
Phase 1 implemented 2026-05-26 (INSERT … VALUES literal validation +
offending-value retention; capture-at-parse, no grammar change, execution
unchanged). Phase 2 implemented 2026-05-26 (UPDATE … SET literal
validation + offending-value retention; the same capture-at-parse technique
on the SET assignment list — capture_set_literals in data.rs —
classifying each top-level RHS literal-vs-expression, validating literals in
do_sql_update, and reading them in user_value_for_column; WHERE is not
validated, execution stays verbatim). Phase 3 (completion
hinting/highlighting — the only part needing a grammar change) pending.
Augments ADR-0030 §4 and ADR-0033 §10 — it does not
supersede them and does not change the execution model. Advanced-mode
DML still executes the validated SQL verbatim; ADR-0033 Amendment 3's
two-command identity (Command::Insert vs Command::SqlInsert) stands
unchanged. What this ADR adds is a value-validation step: the word
"validated" in "executed as the validated SQL itself" (ADR-0030 §4) is
extended to mean value-validated, not merely syntactically validated —
the literal data values in an advanced-mode INSERT/UPDATE are checked
against the playground type system (and retained for error reporting)
before the statement runs.
Builds on the ADR-0035 precedent (DDL executes structurally, not verbatim): there, structure was the first place "grammar as text" was too broad. This ADR makes a narrower correction for DML — not to how it executes, but to what gets checked before it does.
Conversation note (the principle this records). The first instinct was
to consolidate — bind literals via the DSL path, even emit
Command::Insert from the advanced surface. That was rejected, for a
reason worth preserving: simple- and advanced-mode commands are kept
distinct because they can legitimately differ, and they do — e.g.
auto-fill: simple-mode do_insert fills an omitted non-PK serial with
MAX(col)+1, advanced-mode does not (requirements.md X4, flagged as
a possible bug to investigate separately). Collapsing the commands would
silently drag in such differences. The durable principle (also
requirements.md X5): keep a distinct command per distinct case;
share execution mechanics as library helpers, never by fusing command
identity. This ADR shares exactly one mechanic — the per-type value
validators — and nothing else.
Context
How we got here
ADR-0030 §4 set the advanced-mode execute path: DDL lowers to a typed
Command and runs the structural executor (to preserve the playground
type vocabulary, named relationships, metadata tables, and STRICT);
DML and SELECT execute "as the validated SQL itself," on the
stated rationale that "they change no schema, so modelling them as a
typed Command buys nothing." ADR-0033 implemented that for DML:
SqlInsert/SqlUpdate/SqlDelete carry the validated statement text
(row_source, the raw sql) and the worker hands it to the engine.
ADR-0035 already found the rationale too broad for DDL and went structural. This ADR finds it too broad for one more case: the literal data values inside DML.
What "verbatim text for literal values" actually costs
The simple-mode DSL never did it this way. do_insert parses each value
into a typed Value, validates/normalises it (Value::bind_for_column
→ validate_date, shortid::validate, …), and executes
INSERT INTO T (…) VALUES (?1, ?2, …) with the values bound as
parameters. The value never becomes SQL text. The advanced-mode SQL
path, by contrast, splices the user's literal into SQL text and lets a
STRICT engine be the only check.
A date column is STRICT TEXT; a shortid is TEXT; a bool is an
int — the engine's storage types do not enforce the playground's
semantic types. So the two paths diverge, and advanced mode is
materially weaker. Investigated 2026-05-26; the matrix:
| Feedback for a DML value | DSL (simple) | SQL (advanced) |
|---|---|---|
| Column-type hint in completion | ✅ typed slots (incl. date format examples) | ❌ raw sql_expr |
| Value-vs-column highlighting | ✅ numeric-shape mismatch at parse | ❌ none |
| Validation at parse | ⚠️ numeric shape only (int/decimal/bool); date/shortid format deferred to bind | ❌ none |
| Validation at execution (bind) | ✅ full semantic type | ❌ none (verbatim) |
Precise reading (verified 2026-05-26): the DSL typed slots
(shared.rs) validate numeric shape at parse — INT_SLOT rejects
decimals, DECIMAL_SLOT checks format, BOOL_SLOT restricts to boolean
literals — and surface a per-type hint for every type (the DATE_SLOT
carries the YYYY-MM-DD example prose). Full semantic validation —
date/shortid/datetime format — happens at bind time
(Value::bind_for_column → validate_date / shortid::validate). So
the DSL catches a bad value somewhere (parse for numeric shape, bind
for the rest); advanced-mode SQL catches it nowhere but the engine's
storage-type floor. That asymmetry — "DSL always catches it, SQL never
does" — is the gap, and it holds across all semantic types.
The execution-layer gap is proven by a characterization test
(tests/sql_insert.rs::sql_dml_skips_app_level_value_validation_that_the_dsl_enforces):
the DSL rejects the malformed date 2025/01/15; advanced-mode SQL
accepts it and writes the bad row. The only advanced-mode DML
diagnostics are structural (insert_arity_mismatch,
auto_column_overridden, not_null_missing) — never value-vs-type.
The machinery to fix this already exists and is live for the DSL:
column_value_list unfolds a per-column TypedValueSlot when the walker
has schema (data.rs:141/189/269; slots in shared.rs). The SQL DML
grammar simply was never wired to it — every value position is
Node::Subgrammar(&sql_expr::SQL_OR_EXPR) (sql_insert.rs:75),
type-blind by construction. So the asymmetry is not a deliberate
"advanced mode doesn't need this" decision — no ADR says so — it is
an un-wired surface. (A stale header comment at data.rs:8-17 still
describes the DSL slots themselves as "deferred"; it predates the wiring
that data.rs:141/189/269 now show, and should be corrected as part of
this work.) For a teaching tool, where the whole point is to catch a
learner's mistake and explain it, silently accepting a malformed value
is a pedagogy failure, not a feature.
The same root cause behind the error-value gap
A separate symptom shares this root cause. When a SQL INSERT/UPDATE
violates a UNIQUE/CHECK constraint, the friendly-error layer cannot show
the offending value — because the value was discarded (only
row_source text survives), so enrich_unique_violation /
enrich_check_violation come up empty and degrade to a neutral "that
value" (ADR-0035 Amendment 1, F2 follow-up). Validation, hinting,
highlighting, and the offending-value-in-errors display are four faces
of one defect: literal values are thrown away instead of owned.
The sharp edge — why we do not go fully structural
ADR-0030 §4's text choice was not gratuitous. It deliberately keeps
DML/SELECT/CHECK expressions out of the DSL's intentionally
limited Expr (ADR-0026), so advanced mode delivers the full SQL
expression surface — arithmetic, functions, subqueries, nested boolean
operands — that docs/simple-mode-limitations.md records as the inverse
of the simple subset. Lowering SQL expressions into the DSL Expr would
regress that surface; building a full typed SQL-expression AST +
serializer is a large undertaking that ADR-0031 explicitly declined
(sql_expr is validate-only, no Expr AST).
And SELECT is the proof that text-to-engine is the right tool for
queries: ADR-0032 already delivers rich feedback for SELECT —
completion, qualified-name resolution, predicate warnings, post-prepare
type recovery — entirely from walking the validated parse, with the
engine executing the text. Queries have no data values to validate
against columns; owning them buys nothing and costs enormously.
So the dividing line is not "DDL vs DML." It is a static literal value (which we can validate) vs an engine-evaluated expression-or-query (which we cannot).
Decision
1. The principle
In advanced-mode INSERT/UPDATE, validate each literal data value against its target column's type before executing, and retain the literal so a constraint error can name it. Execute the statement verbatim, exactly as today. Do not bind, do not reconstruct, do not touch auto-fill, do not collapse command identity.
Only the value validation is shared between simple and advanced mode —
via the existing per-type validators (Value::bind_for_column /
validate_date / shortid::validate). Everything else stays
mode-specific: execution is still verbatim text-to-engine,
plan_shortid_autofill is untouched, and Command::SqlInsert /
Command::SqlUpdate remain distinct from their DSL counterparts.
What counts as a literal (the set we validate — matching the
null/true/false words plus number/string literals as the walker
tokenises them): NULL, a boolean literal, a string literal, and a
signed numeric literal (-5, 3.14). A signed numeric counts as a
literal even though sql_expr tokenises the sign separately (Punct('-')
then NumberLit) — a leading sign at the start of a value position is
part of the literal, not an operator. Anything else in a value position —
arithmetic, function calls, CASE, subqueries, column references — is an
expression: there is no static value to validate, so it is left to the
engine (unchanged).
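The literal set above can be sketched as a small classifier. This is a minimal illustration rather than the project's code: the `Tok` and `Literal` types and the function `classify_value_position` are hypothetical names introduced here. Only the `Punct('-')` followed by `NumberLit` token shape is taken from the text.

```rust
/// Hypothetical token shape, loosely modelled on the description above.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),      // bare word: null, true, false, identifiers, keywords
    NumberLit(String), // unsigned numeric text, e.g. "5" or "3.14"
    StringLit(String), // contents of a quoted string
    Punct(char),       // any single punctuation character
}

/// Hypothetical captured literal; the real payload type is not shown in the ADR.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Null,
    Bool(bool),
    Number(String),
    Text(String),
}

/// Classify the tokens of one value position.
/// Returns Some(literal) for a static literal, None for an expression.
fn classify_value_position(toks: &[Tok]) -> Option<Literal> {
    match toks {
        [Tok::Word(w)] => match w.to_ascii_lowercase().as_str() {
            "null" => Some(Literal::Null),
            "true" => Some(Literal::Bool(true)),
            "false" => Some(Literal::Bool(false)),
            _ => None, // a column reference or any other bare word
        },
        [Tok::NumberLit(n)] => Some(Literal::Number(n.clone())),
        [Tok::StringLit(s)] => Some(Literal::Text(s.clone())),
        // A leading minus directly before a number is part of the literal.
        [Tok::Punct('-'), Tok::NumberLit(n)] => Some(Literal::Number(format!("-{n}"))),
        // Arithmetic, function calls, CASE, subqueries: left to the engine.
        _ => None,
    }
}
```

Under these assumptions, `- 5` tokenised as two tokens classifies as the literal `-5`, while `1 + 2` and a bare column name both classify as expressions and are skipped.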
Why not bind / converge. Binding was the first instinct and is
rejected. The two proven gaps (a malformed literal slipping through;
the offending value missing from errors) are closed by validation +
retention alone — binding adds nothing to either. Meanwhile, executing
the user's own text verbatim is already safe (their quoting stands; no
re-quoting risk because we do not reconstruct), and binding/convergence
would risk dragging in genuinely mode-specific behaviour (auto-fill — X4;
natural-order column mapping) that must stay separate. So we share the
validators and nothing else. This keeps the modes cleanly apart
(requirements.md X5) while fixing the bug that they should not differ
on: whether a learner's malformed value is caught.
2. What this means per statement
Execution is unchanged for every statement below; the only addition is a pre-execution validation of literal value positions.
- INSERT … VALUES: every literal position (single- or multi-row) is validated against its column type before the verbatim insert runs; a malformed literal is refused with the same friendly wording the DSL uses (shared bind_for_column). Expression positions are skipped (nothing to validate). RETURNING / ON CONFLICT / INSERT … SELECT need no special handling: validation simply applies to whatever literal VALUES are present, and the statement still executes verbatim.
- UPDATE … SET: SET col = <literal> is validated; SET col = <expr> is skipped. (Phase 2; see §5.)
- WHERE (UPDATE/DELETE): not validated. WHERE is an expression in general; the value-feedback motivation is met by VALUES / SET (a constraint error names a written value). Deliberate scope choice, not an oversight.
- SELECT: entirely unchanged. No data values to validate.
3. What it fixes
Validating the literal closes the validation gap (the malformed date
2025/01/15 is now refused in advanced mode, as proven by the
characterization test). Retaining the literal on the command closes the
error-value gap (enrich_* reads it, so a constraint error shows the
real value instead of the neutral "that value"). Completion hinting /
highlighting is not delivered here; it needs a grammar-level change
(§5, Phase 3). The neutral "that value" safety net (ADR-0035 Amendment 1)
remains correct for genuinely-computed expression values — there is no
input literal to show.
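The retained-literal-or-neutral-fallback behaviour can be sketched as follows. The function names and the message wording are hypothetical; the ADR names enrich_unique_violation / enrich_check_violation but does not show their signatures or output.

```rust
/// How an enricher might phrase the offending value.
/// `retained` is the literal captured at parse time, or None when the
/// position held an engine-evaluated expression.
fn offending_value_phrase(retained: Option<&str>) -> String {
    match retained {
        Some(literal) => format!("the value {literal}"),
        // Neutral fallback for computed values: there is no input literal to show.
        None => "that value".to_string(),
    }
}

/// Hypothetical message builder; the wording is illustrative only.
fn unique_violation_message(column: &str, retained: Option<&str>) -> String {
    format!(
        "{} already exists in column {column}",
        offending_value_phrase(retained)
    )
}
```

The point of the sketch is the `Option`: the same enrichment path serves both cases, and the fallback is reached only when no literal was captured.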
4. Explicit requirement — retain the literals, change nothing else
Command::SqlInsert (and later SqlUpdate) gains a captured-literals
payload (per row, per position; None for an expression position) in
addition to the existing raw text. The executor validates from it and the
error enricher reads it. The original source text is unchanged and is
still what history.log records and replay re-runs (ADR-0034). The
command variant, its execution, and plan_shortid_autofill are not
modified. Validation reuses the existing value-binding helper
(impl_value_for / Value::bind_for_column) for wording parity with the
DSL — the resulting bound value is discarded (we do not bind for
execution), only its Result is used.
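A minimal sketch of "validate from the payload, discard the bound value", assuming a per-row, per-position payload of optional literal text. `ColumnType`, `bind_for_column_sketch`, and `validate_retained` are hypothetical stand-ins; the real impl_value_for / Value::bind_for_column signatures and error wording are not shown in this ADR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Int,
    Date,
}

/// Stand-in for the shared per-type validator. It returns a "bound" value on
/// success, mirroring a bind helper, and a friendly message on failure.
fn bind_for_column_sketch(ty: ColumnType, literal: &str) -> Result<String, String> {
    match ty {
        ColumnType::Int => literal
            .parse::<i64>()
            .map(|n| n.to_string())
            .map_err(|_| format!("'{literal}' is not a whole number")),
        ColumnType::Date => {
            // Shape check only (YYYY-MM-DD); calendar validity is not checked here.
            let bytes = literal.as_bytes();
            let shape_ok = bytes.len() == 10
                && bytes.iter().enumerate().all(|(i, c)| match i {
                    4 | 7 => *c == b'-',
                    _ => c.is_ascii_digit(),
                });
            if shape_ok {
                Ok(literal.to_string())
            } else {
                Err(format!("'{literal}' is not a date in YYYY-MM-DD form"))
            }
        }
    }
}

/// Validate every retained literal of one statement.
/// Expression positions (None) are skipped. The bound value is dropped:
/// only success or failure matters, because execution stays verbatim.
fn validate_retained(
    columns: &[ColumnType],
    rows: &[Vec<Option<String>>],
) -> Result<(), String> {
    for row in rows {
        for (ty, slot) in columns.iter().zip(row) {
            if let Some(literal) = slot {
                bind_for_column_sketch(*ty, literal).map(|_bound| ())?;
            }
        }
    }
    Ok(())
}
```

With an `Int` column and a `Date` column, a row of `7` and `2025-01-15` passes, the same row with `2025/01/15` is refused before execution, and a row of two expression positions is not checked at all.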
5. Mechanism + phasing
- Phase 1 (this ADR's immediate work): capture + validate + retain. At parse, build_sql_insert classifies each VALUES position from the matched path (a single literal token, or a signed number → a typed Value; anything else → an expression marker) and stores the per-row result on the command, with no grammar change and no reparse. The executor validates the captured literals against the resolved column types before the verbatim insert; the enricher reads them. Covers single- and multi-row, with or without RETURNING / ON CONFLICT, because execution is untouched.
- Phase 2 (implemented 2026-05-26): UPDATE … SET literal validation. The same capture-at-parse technique on the SET assignment list: build_sql_update calls capture_set_literals, which walks the matched tokens (no reparse) and classifies each top-level SET col = <rhs> into (col, Some(Value)) for a bare literal (incl. a signed number) or (col, None) for an expression, using paren depth so a comma inside a function call or a where inside a scalar subquery is never mistaken for an assignment/clause boundary, and so the trailing top-level WHERE predicate is excluded. Command::SqlUpdate gains a set_literals payload; do_sql_update validates the literals against their column types (via the shared impl_value_for) before the still verbatim update; user_value_for_column reads them so a constraint error names the offending value. WHERE is deliberately not validated (§2).
- Phase 3: completion hinting / highlighting. This is the only part that needs a grammar change: a Choice(typed-literal-slot, sql_expr) at each value position (reusing the DSL's live column_value_list / TypedValueSlots, data.rs:141/189/269), so the column type drives a live hint and a mismatch highlights while typing. When Phase 3 lands, the typed slot supersedes Phase 1's classification of literals (the validation/enrichment built on top is unaffected; that is the only throwaway, by design).
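The paren-depth walk described for Phase 2 can be sketched on pre-split string tokens. This is an illustration under simplifying assumptions, not the capture_set_literals in data.rs: the names `capture_set_literals_sketch`, `literal_text`, and `push_assignment` are hypothetical, tokens are plain `&str`, and the captured literal is kept as text instead of a typed Value.

```rust
/// One captured assignment: column name plus literal text, or None for an expression.
type Captured = (String, Option<String>);

fn is_num(t: &str) -> bool {
    t.chars().any(|c| c.is_ascii_digit()) && t.chars().all(|c| c.is_ascii_digit() || c == '.')
}

/// Literal test on one right-hand side: NULL/true/false, a quoted string,
/// or an optionally minus-signed number. Anything else is an expression.
fn literal_text(rhs: &[&str]) -> Option<String> {
    match rhs {
        [t] if matches!(t.to_ascii_lowercase().as_str(), "null" | "true" | "false") => {
            Some(t.to_string())
        }
        [t] if t.len() >= 2 && t.starts_with('\'') && t.ends_with('\'') => Some(t.to_string()),
        [t] if is_num(t) => Some(t.to_string()),
        ["-", t] if is_num(t) => Some(format!("-{t}")),
        _ => None,
    }
}

/// Record one `col = <rhs>` segment; malformed segments are ignored.
fn push_assignment(seg: &[&str], out: &mut Vec<Captured>) {
    if let [col, "=", rhs @ ..] = seg {
        out.push((col.to_string(), literal_text(rhs)));
    }
}

/// Walk the tokens after SET, splitting on top-level commas only and
/// stopping at the first top-level WHERE.
fn capture_set_literals_sketch(toks: &[&str]) -> Vec<Captured> {
    let start = match toks.iter().position(|t| t.eq_ignore_ascii_case("set")) {
        Some(i) => i + 1,
        None => return Vec::new(),
    };
    let mut out = Vec::new();
    let mut depth = 0usize; // current parenthesis nesting
    let mut seg_start = start;
    let mut end = toks.len();
    for (i, t) in toks.iter().enumerate().skip(start) {
        match *t {
            "(" => depth += 1,
            ")" => depth = depth.saturating_sub(1),
            "," if depth == 0 => {
                push_assignment(&toks[seg_start..i], &mut out);
                seg_start = i + 1;
            }
            _ if depth == 0 && t.eq_ignore_ascii_case("where") => {
                end = i; // trailing predicate is excluded
                break;
            }
            _ => {}
        }
    }
    push_assignment(&toks[seg_start..end], &mut out);
    out
}
```

For `UPDATE t SET a = -5, b = coalesce(x, 0), c = (select max(v) from u where v > 1), d = 'x' WHERE id = 3`, this sketch yields a literal for `a` and `d` and `None` for `b` and `c`: the comma inside `coalesce(...)` and the `where` inside the subquery sit at depth 1, so neither ends an assignment.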
6. Non-goals
- Binding / statement reconstruction. Explicitly out. Execution stays verbatim. (This was the rejected first instinct.)
- Collapsing command identity. Command::Insert and Command::SqlInsert stay distinct; ADR-0033 Amendment 3 stands.
- Changing auto-fill. The simple-vs-advanced serial/shortid auto-fill difference (requirements.md X4) is untouched here and tracked separately as a possible bug.
- A structural SELECT and a full typed SQL-expression AST: both out (queries and expressions stay text; ADR-0031's "no Expr AST" and ADR-0030 §4's full-surface guarantee stand).
Consequences
- Advanced mode stops being a feedback-free zone for data values. A learner typing a malformed date/shortid/int literal in a SQL INSERT gets the same catch-and-explain they get in simple mode, via the shared validator, not a shared command.
- The modes stay cleanly separate. Execution, auto-fill, and command identity are all unchanged; the only thing now shared is the value validators. This is the requirements.md X5 principle in practice (share a mechanic, not a command) and avoids the consolidation traps (X4 auto-fill) that the bind/converge approach would have hit.
- Small, low-risk, no execution reconstruction. Because we do not rebuild the statement, there is no "mixed VALUES (?1, expr, ?2)" splicing problem, no multi-row execution change, and no RETURNING / ON CONFLICT / INSERT … SELECT special-casing; they keep working as the existing ADR-0033 tests assert.
- One new seam to keep honest: the literal-vs-expression classification at parse. It must be tested (single literal / signed literal / NULL / true / false → validated; arithmetic / function / subquery → skipped), or it will drift.
- A normalization difference is avoided, not introduced. We validate the literal but do not rewrite it; the engine stores the user's text as written. (Had we bound/normalized, advanced inserts might store a canonicalised value, a behaviour change we sidestep.)
- Phase 3 will revisit literal detection (swapping the parse-time classification for typed slots that also drive hints). The validation/enrichment built on it is permanent; only the detection is provisional: a deliberate, documented small throwaway.
See also
- ADR-0030 §4 / ADR-0033 §10: the execute-path this ADR augments (adds value validation); the verbatim execution model and the SELECT/expression text path both stand.
- ADR-0033 Amendment 3: the two-command identity, preserved (this ADR does not collapse Insert/SqlInsert).
- ADR-0035: the DDL precedent (structural, not verbatim); this ADR is the narrower DML analogue (validate, don't restructure).
- ADR-0026: the DSL's deliberately-limited Expr; not imposed on the SQL surface. ADR-0031: sql_expr is validate-only; unchanged.
- ADR-0032: SELECT feedback-from-walk; the proof that text-to-engine is right for queries.
- ADR-0029: the column type/constraint model the shared validators enforce.
- ADR-0035 Amendment 1 (F2 follow-up): the neutral "that value" safety net, correct for computed values.
- requirements.md X4 (auto-fill difference: possible bug, untouched here) and X5 (framework cohesion / share-mechanics-not-commands: the principle this ADR follows).