Replaces ADR-0033 §2's original Node::Guard + Choice(SQL,DSL) mechanism,
which was found during 3a to be unworkable: any guard-in-Choice approach
forces a walk_choice change (walk_choice falls through only on NoMatch, so
simple-mode valid-DSL would wrongly surface "this is SQL"), and walk_seq
treats a NoMatch past idx 0 as a hard Failed, breaking advanced-mode DSL
fall-through.
Mechanism (Amendment 1): each REGISTRY entry is tagged
CommandCategory::{Simple, Advanced}, generalising the whole-command
is_advanced_only gate. walk() becomes a thin dispatcher over decide()
(mode-aware candidate selection: simple commits the DSL node or emits the
"this is SQL" hint; advanced tries SQL first, DSL as a full-line fallback)
and an extracted walk_one_command(); speculative match-testing runs on a
scratch WalkContext so the caller's context is only touched by the
committed walk. No Node::Guard, no walk_choice/walk_seq change.
6 dispatch smoke tests on a shared-entry-word smoke registry; 1446 baseline
green; clippy clean.
49 KiB
ADR-0033: The full SQL DML grammar — INSERT / UPDATE / DELETE
Status
Proposed
Context
ADR-0030 commissions advanced mode as a body of SQL grammar
inside the unified grammar tree. Phase 1 ("foundations + first
SELECT") and Phase 2 ("SELECT — full") have shipped through
ADR-0031 and ADR-0032 respectively. The next slice on ADR-0030's
roadmap is Phase 3 — DML: INSERT / UPDATE / DELETE as
validated SQL.
ADR-0030 §3 fixed the headline scope:
INSERT(single- and multi-row),UPDATE,DELETE.
ADR-0030 §4 fixed the execute path:
DML and
SELECT→ executed as the validated SQL itself (re-rendered canonically from the matched parse, or the validated original text). … For DML the worker — knowing the statement kind and target table from the parse — runs the statement and re-persists that table's CSV.
ADR-0030 §11 fixed the persistence model. ADR-0030 §12 fixed the
safety posture: SQL UPDATE/DELETE without WHERE executes as
written (no --all-rows rail).
ADR-0030 §3 also flagged this ADR's existence in advance:
Settle multi-row
INSERTandshortidauto-fill on a SQLINSERT.
This ADR fixes the grammar shape, the dispatch architecture for shared entry words, the diagnostic surface new to DML, and the execute-path Command variants. It does not revisit ADR-0030's controlling decisions; references in this text to ADR-0030 §X mean "that decision is controlling."
What ADR-0030, ADR-0031, ADR-0032 already fix
- One walker, grammar-as-text execution, ambient assistance for free (ADR-0030 §1/§4/§6/§8).
sql_expris the shared expression grammar — UPDATESETexpressions, WHERE predicates, INSERT VALUES tuples, RETURNING projection items all consume it (ADR-0031, ADR-0032 §5/§6).Node::ScopedSubgrammar+from_scope_stackcarry CTEs and per-statement scope into nested constructs (ADR-0032 §10).- The diagnostic pipeline —
schema_existence_diagnostics,sql_predicate_warnings,compound_arity_diagnostics,pending_diagnosticsaccumulator — applies uniformly to DML'ssql_exprslots (ADR-0032 §11). - The friendly-error layer translates engine messages through the
engine.*catalog (ADR-0019, ADR-0030 §7, ADR-0032 §11.5). - The
__rdbms_*rejection at every table-source slot (ADR-0030 §6, ADR-0032 §1/§4). MAX_SUBGRAMMAR_DEPTH = 64(ADR-0026) is the recursion budget.
Decision
1. The top-level statement shapes
insert_statement := INSERT INTO table_name
[ '(' column_name_list ')' ]
( values_clause | sql_select_compound )
[ on_conflict_clause ]
[ returning_clause ]
[ ';' ]
update_statement := UPDATE table_name
SET assignment_list
[ where_clause ]
[ returning_clause ]
[ ';' ]
delete_statement := DELETE FROM table_name
[ where_clause ]
[ returning_clause ]
[ ';' ]
Where:
values_clause := VALUES value_tuple ( ',' value_tuple )*
value_tuple := '(' sql_expr ( ',' sql_expr )* ')'
assignment_list := assignment ( ',' assignment )*
assignment := column_name '=' sql_expr
returning_clause := RETURNING projection_list
on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ]
( DO NOTHING
| DO UPDATE SET assignment_list [ where_clause ] )
table_nameis the existingTablesident slot, withreject_internal_tablevalidator (ADR-0030 §6).column_name_listis comma-separatedColumnsidents — the same shapedata::INSERT(DSL) uses for Form A. Reuses the existing pattern; no new node infrastructure.where_clauseisWHERE Subgrammar(&sql_expr::SQL_OR_EXPR).projection_listis the Phase-2 projection list, re-used unchanged.
The body of an INSERT … SELECT recurses through
Node::Subgrammar(&sql_select::SQL_SELECT_COMPOUND) — the same
recursion point Phase 2's CTE bodies, scalar subqueries, and
IN/EXISTS bodies use. The Phase-2 SELECT grammar already
admits an optional leading WITH (Amendment 2 / commit
fd25904), so INSERT INTO t WITH x AS (…) SELECT * FROM x
works for free.
2. Dispatch — DSL vs SQL on shared entry words
The entry words insert, update, delete are shared
between DSL and SQL. This ADR fixes the dispatch rule and
the mechanism that implements it (a known-risk item — see
"Open implementation risks" below).
Simple mode → DSL only. SQL constructs (multi-row VALUES,
INSERT…SELECT, RETURNING, UPSERT) attempted in simple mode
produce a ValidationFailed "this is SQL" hint, matching the
gating that select / with already produce (ADR-0030 §2).
Advanced mode → SQL first, DSL fallback. Each shared entry
word's CommandNode.shape becomes a Choice between the SQL
shape and the DSL shape, declared in that order:
shape: Node::Choice(&[SQL_INSERT_SHAPE, DSL_INSERT_SHAPE])
The walker's Choice is first-match-wins. The SQL shape's
constructs are a strict superset of the DSL shape's for these
statements (multi-row VALUES, INSERT…SELECT, RETURNING, UPSERT
are SQL-only; DSL's --all-rows flag is DSL-only). Any
structurally-ambiguous input — delete from t where id = 1 —
matches the SQL shape first and routes through the SQL
machinery.
This is the design point an earlier draft of ADR-0030 §2 left
open ("choice order or a disambiguator decides"). The chosen
disambiguator is Choice order: SQL > DSL in Advanced. A
DSL-specific construct (e.g., --all-rows) won't match the SQL
shape and falls through to DSL — accommodating, not surprising.
Cascade-summary parity (§7) and shortid auto-fill parity (§6)
mean a DSL-style INSERT/UPDATE/DELETE in Advanced mode produces
identical observable behaviour whether it routes through the SQL
or DSL path. The user's mental model can be "Advanced mode runs
SQL; DSL syntax still works as a familiar fallback."
Mode-gating mechanism
The existing is_advanced_only infrastructure
(src/dsl/grammar/mod.rs::ADVANCED_ONLY_ENTRIES) gates by
entry word, fires in the walker's dispatch_command_node
path (walker/mod.rs:1707), and emits the "this is SQL"
ValidationFailed verdict. It can't be reused to gate a Choice
branch by mode.
Mechanism options for branch-level gating:
- (a) Mode-aware validator on the SQL branch. Each
SQL-shape Seq starts with a no-op
Node::Validatedwhose validator inspectsctx.mode; in Simple mode it returnsValidationError(message_key: "advanced_mode.sql_in_simple")before any structural tokens commit. Trade-off: clean, reuses the existing validator path; the failure shape is the sameWalkOutcome::ValidationFailedas the entry-word gate. - (b) Mode flag on
Node::Choice. A new walker capability:Node::Choicebranches can carry a min-mode marker. Heavier (touches the walker'swalk_choicedriver) but lets the grammar declare gating without per-branch boilerplate. - (c) Two CommandNodes per entry word; registry picks by
mode. The dispatcher consults
ctx.modeand skips the SQL CommandNode in Simple mode. Heavier (registry semantics change) and surfaces a "multiple CommandNodes per entry word" pattern the rest of the codebase doesn't use.
Default choice: (a) the validator approach. Smallest
mechanism change. Walker-capability caveat: validators
today are an Ident { validator: Option<fn> } field — there
is no standalone "guard" node that runs a validator without
consuming input. Sub-phase 3a's first job is to add
Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)
to the walker — a zero-consumption node whose only effect is
to fail the Seq with a ValidationError in Simple mode. The
existing internal-table rejection (reject_internal_table)
shows the wiring pattern; Node::Guard is the
"no-byte-consuming version" of it.
3. Multi-row INSERT … VALUES
INSERT INTO orders (customer_id, total) VALUES (1, 99.50), (2, 12.00), (3, 6.25)
The grammar's values_clause repeats value_tuple with a comma
separator (Node::Repeated { inner: value_tuple, separator: Some(COMMA), min: 1 }). Each tuple matches the same arity as
the (column_name_list) (see §8 for the arity diagnostic).
4. INSERT … SELECT
INSERT INTO archive SELECT * FROM orders WHERE created < '2025-01-01'
INSERT INTO summary (id, total) WITH t AS (…) SELECT id, sum(total) FROM t GROUP BY id
A Choice between values_clause and
Subgrammar(&sql_select::SQL_SELECT_COMPOUND) after the
optional column list. The Phase-2 SELECT machinery (CTEs, JOINs,
subqueries, set ops) applies as the row source for free.
Arity verification: at parse time, the walker counts the
declared column list (or, when absent, the target table's column
count from the schema cache) against the SELECT's projection
list arity. Mismatch emits diagnostic.insert_arity_mismatch
(see §8).
5. RETURNING
RETURNING projection_list admits the Phase-2 projection_list
unchanged — column refs, *, t.*, expressions with AS
aliases, computed expressions.
Execution: the Command::Sql{Insert,Update,Delete} variants
carry a returning: bool flag set by the walker's
ast_builder when a RETURNING clause matched. The worker
reads this flag (no SQL text re-parsing) and routes:
returning == true→ prepare + step the statement, collect rows into aDataResult(per-result-column type recovery via the same column-origin path Phase 2 introduced, ADR-0032 §12 + Amendment 1).returning == false→ execute the statement, collect the affected-row count.
The flag is sufficient because the shape of the projection is already in the SQL the engine prepares; the worker doesn't need to re-parse to know which columns to read.
6. shortid auto-fill — worker post-fills
Parity with the DSL do_insert handler (ADR-0005 / ADR-0018).
For an INSERT whose (column_name_list) omits one or more
shortid columns, the worker:
- Inspects the table metadata to find the auto-generated columns
(
serial,shortid) NOT in the user's column list. - For each row in the VALUES list (single or multi-row), and for
INSERT … SELECTeach yielded row, synthesises a freshshortidvalue (the existingshortid::generatepath). - Constructs the parameterised INSERT (or executes a rewrite of the SQL with the values bound positionally) and runs it.
The engine sees a fully-specified row; the user's submitted SQL
is unmodified in history.log (ADR-0030 §11). The walker
recognises an explicit value provided for an auto-gen column —
that's an explicit override, not a missing slot — and emits a
WARNING (§8b) but lets it through.
serial columns rely on SQLite's INTEGER PRIMARY KEY rowid
auto-increment — no worker post-fill needed; the engine
populates the value on its own.
7. Cascade summary on DELETE
Parity with the DSL do_delete handler (ADR-0014).
When Command::SqlDelete { target_table, sql, … } runs:
- Pre-execution cascade pre-count. For each child table
that has an FK referencing
target_tablewithON DELETE CASCADE(orSET NULL), the worker runs a pre-count query of the formSELECT count(*) FROM <child> WHERE <fk_col> IN (SELECT <pk_col> FROM <target_table> WHERE <user_where_predicate>). The<user_where_predicate>is extracted from the SQL — see "Predicate extraction" below. - The worker executes the DELETE.
- Post-execution summary. Per-relationship row counts + cascade impact, formatted by the same cascade-summary formatter the DSL path uses.
Predicate extraction. The DSL handler reuses the typed
Expr AST to construct the pre-count subqueries. For SQL
DELETE the walker captures the WHERE clause's source byte
range from the matched path and re-injects that text into
the pre-count SQL. This is grammar-as-text reuse, consistent
with ADR-0030 §4's posture; no AST construction required.
When the SQL DELETE has no WHERE, the pre-count is unbounded
("all rows of target_table"). Cost: one extra count(*) per
child table; cheaper than the DSL parallel path on a typical
schema.
The cascade-summary formatter itself (format_cascade_summary
in the DSL handler) is shared with do_sql_delete — same
output, same i18n keys.
8. Diagnostics new to Phase 3
Three new keys, classified per ADR-0027's ERROR / WARNING model.
8.1. diagnostic.insert_arity_mismatch (ERROR)
Walker-level pre-flight when the declared (column_name_list)'s
arity disagrees with a value_tuple's arity (or, for INSERT … SELECT, with the SELECT's projection list arity, counted at
parse time via the same projection-counting infrastructure
Phase-2's compound_arity_diagnostics uses).
For multi-row VALUES, each offending row emits its own
diagnostic on that row's value_tuple span. A query with five
rows and three wrong arities produces three diagnostics; the
two right rows are silent. Pre-empts the engine's less-helpful
"X values for Y columns" error.
For INSERT … SELECT, the diagnostic anchors on the SELECT's
first projection_item span. The Phase-2 projection-count
infrastructure (the comma-counting at depth = leg-depth used by
compound_arity_diagnostics) is the underlying mechanism.
8.2. diagnostic.auto_column_overridden (WARNING)
Walker-level when an INSERT explicitly assigns a value to a
column whose type is serial or shortid. Pedagogical: the
explicit value bypasses the auto-counter / generator, which can
collide with future auto-generated values.
Catalog wording: "column \{column}` is auto-generated
(`{type}`); providing an explicit value bypasses the
auto-counter and may collide with later auto-generated values"`.
The statement still runs — the warning is advisory, matching the Phase-2 predicate-warning posture (ADR-0027 §1 / ADR-0032 §11.6).
8.3. diagnostic.not_null_missing (WARNING)
Walker-level when an INSERT's (column_name_list) omits a
column declared NOT NULL with no DEFAULT. Pedagogical
nudge — the engine will reject the statement, but the
walker can pre-flight the missing column at parse time.
Catalog wording: "column \{column}` is required (`NOT NULL`,
no default); the statement will fail when run"`.
The walker doesn't BLOCK the statement (it's structurally valid). The engine does. The friendly layer translates the engine error via the existing NOT-NULL path.
8.4. Inherited diagnostics
The Phase-2 diagnostic passes apply to DML's sql_expr slots
without further work:
schema_existence_diagnosticsflags unknown columns in WHERE, SET, RETURNING, VALUES tuples.sql_predicate_warningsflags= NULL,LIKE-on-numeric, type-mismatched comparisons in WHERE and CASE expressions.- The
pending_diagnosticsaccumulator path stays available for in-walk emits.
9. UPSERT — ON CONFLICT … DO NOTHING / DO UPDATE
INSERT INTO t (id, name) VALUES (1, 'x')
ON CONFLICT (id) DO UPDATE SET name = excluded.name
INSERT INTO t (id, name) VALUES (1, 'x')
ON CONFLICT DO NOTHING
Grammar shape:
on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ]
( DO NOTHING
| DO UPDATE SET assignment_list [ where_clause ] )
The optional (column_name_list) is the conflict target —
which columns' unique constraint to react to. Standard SQL allows
this list to be empty (ON CONFLICT DO NOTHING is shorthand for
"any conflict").
The DO UPDATE branch reuses the same assignment_list and
optional where_clause shapes as the top-level UPDATE statement.
Inside DO UPDATE expressions, the keyword excluded is a
pseudo-table reference to the would-have-been-inserted row.
Note: excluded is the SQLite + PostgreSQL spelling;
standard SQL (MERGE … WHEN MATCHED) uses a different model.
The playground commits to the excluded form because it is what
SQLite supports and what the project's pedagogical target dialect
uses (ADR-0001 / ADR-0002 — the engine is SQLite, even though
its name is hidden from user-facing strings). This is a
deliberate carve-out from ADR-0030 §7's engine-neutral posture,
justified by the absence of a standard-SQL UPSERT spelling that
both SQLite and PostgreSQL share.
The walker admits excluded as a table alias for
ident-completion purposes inside the DO UPDATE branch — its
columns are the target table's columns.
10. Execute path — three typed Command variants
Command gains three variants:
Command::SqlInsert {
sql: String,
source: Option<String>, // for history.log
target_table: String, // for re-persistence
listed_columns: Option<Vec<String>>, // None if no col-list provided
}
Command::SqlUpdate {
sql: String,
source: Option<String>,
target_table: String,
}
Command::SqlDelete {
sql: String,
source: Option<String>,
target_table: String,
}
The walker's ast_builder for each shape extracts
target_table and listed_columns from the matched path. The
sql field is the validated source text the engine executes.
Database gains three request types:
Request::RunSqlInsert { command, reply }
Request::RunSqlUpdate { command, reply }
Request::RunSqlDelete { command, reply }
Each worker handler:
- Applies statement-specific pre-execution work (shortid
auto-fill for
RunSqlInsert; cascade pre-count forRunSqlDelete). - Prepares + executes the statement. If RETURNING is present,
collects the result set as a
DataResult. - Re-persists the target table's CSV (ADR-0030 §11) using the existing persistence write-through machinery.
- Returns the result (DataResult for RETURNING; affected-row count + cascade summary for plain DELETE; affected-row count for plain INSERT/UPDATE).
History logging: the worker writes the literal submitted line
to history.log (ADR-0030 §11) before execution, exactly as
run_select does.
11. Engine-error translation
Three new catalog keys (the friendly-error layer's engine.* translation set, ADR-0019 / ADR-0030 §7):
| Key | Engine condition |
|---|---|
engine.unique_constraint_failed |
The engine's UNIQUE constraint failed: t.c (already exists; will be re-used as-is for UPSERT-without-conflict-target cases) |
engine.not_null_constraint_failed |
Already exists; covers SQL-side NOT-NULL failures uniformly |
engine.foreign_key_constraint_failed |
Already exists; SQL DELETE that violates FK without cascade |
No NEW engine.* keys are required for Phase 3 — the existing FK / UNIQUE / NOT-NULL translation paths cover the engine-side DML errors. Phase 3 reuses them.
12. Result-column type recovery on RETURNING
A RETURNING projection that's a bare column reference recovers
its playground type via the same column-origin metadata path
Phase 2 introduced (resolve_select_column_types in db.rs,
ADR-0032 §12 + Amendment 1).
Computed expressions in RETURNING stay typeless. The renderer uses neutral alignment for typeless columns, consistent with SELECT.
13. Out of scope
| ID | OOS |
|---|---|
| OOS-1 | INSERT INTO t DEFAULT VALUES (the project's planned seed feature covers this) |
| OOS-2 | INSERT OR REPLACE / OR IGNORE / OR ABORT / OR FAIL / OR ROLLBACK (SQLite-specific) |
| OOS-3 | UPDATE … FROM other_table (SQLite/PostgreSQL extension) |
| OOS-4 | WITH x AS (…) UPDATE … / WITH x AS (…) DELETE … (deferrable; SELECT-side WITH is in scope) |
| OOS-5 | Indexed-by / not-indexed table-source modifiers (SQLite-specific) |
| OOS-6 | Multi-statement batches (already OOS via ADR-0030 §13 OOS-4) |
OOS-4 is a deliberate choice: WITH-prefixed DML is standard SQL since SQL:2003, but its execution semantics (the WITH-defined CTE is visible to the DML's WHERE / SET / RETURNING expressions) require additional scope-frame plumbing that doesn't yet exist on the DML side. The Phase-2 grammar admits WITH at the SELECT level; Phase 3 keeps WITH a SELECT-only prefix. A later phase can lift this restriction with a clean ADR Amendment.
Consequences
- The walker gains a new node variant
Node::Guard(§2 mechanism caveat) — a zero-byte-consumption gating node that can fail its enclosing Seq with aValidationError. The walker driver's exhaustivewalk_node_innermatch grows one arm. Reusable beyond DML (Phase 4 DDL's similar shared-entry problem is a likely future consumer). Commandgains three variants (SqlInsert,SqlUpdate,SqlDelete); every exhaustivematch Commandcallsite picks up three new arms (the recurring ADR-0028/0029 gotcha).- The DSL
data::INSERT/data::UPDATE/data::DELETECommandNodes' shapes becomeChoice(SQL_shape, DSL_shape)with the SQL branch mode-gated via a no-op leading validator (R1). - Three new walker diagnostics with three new catalog keys
(
diagnostic.insert_arity_mismatch,diagnostic.auto_column_overridden,diagnostic.not_null_missing). - Three new
Requestvariants (RunSqlInsert/RunSqlUpdate/RunSqlDelete) and three new worker handlers (do_sql_insert/do_sql_update/do_sql_delete). - The persistence write-through path gains per-DML-kind entry
points; the cascade-affected re-persistence in
do_sql_deleteis new (cascade-affected child tables are also re-persisted). RETURNINGlands a write-statement-returns-rows code path the project hasn't had before. Worker handlers either return affected-row count or aDataResultkeyed by thereturning: boolflag.- The cascade-summary formatter is reused between the DSL and SQL DELETE paths; the predicate-extraction byte-range injection (§7) is new code with the R2 mitigation.
shortidauto-fill code indo_sql_insertmirrors the DSLdo_insertmechanism but takes alisted_columnsslice from the parse rather than a typedCommand::Insert.- The auto-snapshot machinery (ADR-0006) is unchanged but fires for SQL DML the same way it does for DSL DML — the snapshot trigger lives in the worker handlers.
- All Phase-2 tests stay green (the SQL grammar pieces this ADR builds on are unchanged).
- The DSL-vs-SQL dispatch mechanism (3a) is a foundation other projects in this codebase may need (e.g., Phase 4 DDL). Land it generically so it's reusable.
Initial DA review
A first-pass DA review of this ADR's draft surfaced ten critiques. The seven that surfaced real gaps were resolved in the body before this status moved to Proposed:
- OOS-4 (WITH-prefixed DML) under-justified → R4 risk describes the mechanism gap concretely.
- Dispatch mechanism hand-waved → §2's "Mode-gating mechanism" subsection enumerates three options and commits to the validator approach with R1 as the mitigation if it fails.
- RETURNING execution hand-waved → §5 commits to a
returning: boolflag on the Command variants. - Cascade-summary code-sharing claim too optimistic → §7's "Predicate extraction" subsection details the byte-range injection mechanism; R2 budgets for the performance-pathology case.
excludedkeyword's engine-neutrality status → §9 carves it out explicitly as a justified non-neutral choice.- Multi-row INSERT arity diagnostic emit pattern unspecified → §8.1 clarifies "each offending row emits its own diagnostic" plus the INSERT…SELECT span.
- Sub-phase ordering put dispatch (the riskiest piece) last → reordered so 3a lands the mechanism FIRST on a no-op shape.
The remaining three critiques were either process points (DA gates at sub-phase boundaries — added per-sub-phase in Implementation notes) or already-covered (verification matrix — will live in the plan doc, matching Phase 2's pattern).
Per CLAUDE.md / the Phase 2 process lesson, the DA review of THIS ADR will get a second pass before status moves to Accepted. Pin: do not rubber-stamp.
Open implementation risks
Recorded explicitly so the implementation plan can budget for them, not to allow silent drift later.
R1. Mode-gated Choice branch mechanism unproven
§2's option (a) — a no-op validator at the head of the SQL
shape's Seq, rejecting in Simple mode — is plausible but
untested. If the walker's walk_seq commits the Seq before the
validator runs, the SQL branch's NoMatch becomes Failed and
the Choice can't fall through to the DSL branch.
Mitigation: sub-phase 3a lands the mechanism end-to-end on
the smallest viable shape (a single SqlNop CommandNode whose
shape is Choice(SQL_branch[mode-gated-validator + token], DSL_branch[different-token])) before any DML grammar lands. If
the validator approach doesn't work, the alternatives (a
mode-aware Node::Choice variant, or split-CommandNode
dispatch) get evaluated before sub-phases 3b/3c/3d depend on
the mechanism.
R2. Cascade-summary predicate extraction
§7 says the worker re-injects the WHERE-clause source bytes into
pre-count subqueries. This is straightforward when the WHERE
clause is a top-level predicate, but a future user might write
WHERE clauses that themselves contain subqueries (e.g.,
DELETE FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE …)).
The byte-range injection still works structurally, but a
performance pathology is possible if the subquery is expensive
and the cascade child count is large (the same subquery runs N+1
times).
Mitigation: sub-phase 3c benchmarks the pre-count overhead on a representative schema before declaring the path acceptable. If the overhead is prohibitive, fall back to "report cascade count post-execution only" (less informative, but cheaper).
R3. RETURNING result-column type recovery on multi-row INSERT
§5 says result columns get the same column-origin type recovery SELECT uses. For a single-row INSERT … RETURNING this works identically to a SELECT. For multi-row INSERT, the engine emits one result row per inserted row; the column-origin metadata is the same for all (per ADR-0032 Amendment 1, column-origin is a property of the prepared statement, not the rows).
No risk here — calling it out so the implementer doesn't re-derive the question. The empty-table type-recovery test already pins this invariant.
R4. INSERT … SELECT with a CTE row source
The Phase-2 SELECT grammar already admits WITH … SELECT … at
the compound level (ADR-0032 §10.3 / Amendment 2). The
SQL_SELECT_COMPOUND recursion point §1 uses pulls this in for
free. INSERT INTO archive WITH t AS (…) SELECT * FROM t parses
and runs.
A user might reasonably expect WITH-prefixed DML to also work (see §13 OOS-4). That's explicitly out of scope for Phase 3; the discrepancy is documented here so the implementer doesn't need to figure it out from the grammar shape.
Implementation notes
Phased sub-plan with explicit exit gates per sub-phase. The
implementation plan doc (docs/plans/<date>-adr-0033-phase-3.md,
modelled on docs/plans/20260520-adr-0032-phase-2.md) refines
these and adds the cross-cut verification matrix in 3i.
Ordering rationale. The dispatch mechanism (3a) is the foundation; if it fails to work, every later sub-phase has to be reworked. Land it first on a no-op shape, then build the DML grammar on top.
3a — Node::Guard + mode-gated Choice mechanism
Scope (in):
- Add a new walker node variant
Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)(per §2 mechanism caveat) — zero-byte consumption, fails the enclosing Seq withValidationFailedwhen the validator returns Err. - Implement
walk_guardinsrc/dsl/walker/driver.rs. - Add the
reject_sql_in_simple_modeguard function. - Build a smoke-test grammar: an experimental
CommandNodewithshape: Choice(Seq[Guard(reject_sql_in_simple_mode), SQL_branch_token], DSL_branch_token). Verify the Choice falls through cleanly in Simple mode (DSL wins) and routes SQL-first in Advanced mode.
Scope (out):
- Any real DML grammar (3b onwards).
- Wiring the existing
data::INSERT/UPDATE/DELETEshapes (3j).
Exit gate (required green tests):
Node::Guardadmits viawalk_guard; the walker driver's match arm exists and is tested.- Smoke-test grammar in Simple mode → DSL branch matches.
- Smoke-test grammar in Advanced mode + SQL-shape input → SQL branch matches.
- Smoke-test grammar in Simple mode + SQL-shape input →
ValidationFailedwithmessage_key: "advanced_mode.sql_in_simple". - Smoke-test grammar in Advanced mode + DSL-shape input → DSL branch matches (Choice fall-through works even when SQL branch is admissible).
- All Phase-2 tests (1446 baseline) still green.
- Clippy clean.
DA gate (written):
- Does the Guard actually fire BEFORE the Seq commits? Failure mode: the Seq processes the Guard, the Guard fails, but the Seq has already "committed" to its branch and the Choice can't fall through. Test specifically: Advanced mode + an input that the DSL branch matches but the SQL branch's remaining tokens reject → must fall through to DSL, not Failed.
- Is the Guard mechanism reusable beyond DML (e.g., for
Phase 4 DDL's similar dispatch)? If yes, document the
pattern in
src/dsl/grammar/mod.rsdoc comments so the next phase doesn't reinvent it. - If the Guard mechanism doesn't work (R1's worst case), the sub-phase REOPENS with mechanism option (b) or (c) evaluated before any DML grammar lands on top of broken scaffolding.
3b — INSERT grammar + minimal execution
Scope (in):
SQL_INSERT_SHAPE: single-row + multi-row VALUES; the optional(column_name_list); the__rdbms_*rejection on the table-name slot.Command::SqlInsert { sql, source, target_table, listed_columns, returning: false }—returningalways false in 3b.Request::RunSqlInsert+do_sql_insertworker handler (no shortid auto-fill yet — explicit values required).- Persistence write-through on the target table after execution.
- History-log of the literal submitted line.
Scope (out):
INSERT … SELECT(3c).RETURNING(3e).UPSERT(3f).shortidauto-fill (3d).- Diagnostics (3g).
- DSL-vs-SQL dispatch wiring on
data::INSERT(3h — development uses a separatedata::SqlInsertCommandNode).
Exit gate (required green tests):
- Single-row INSERT runs end-to-end; affected-row count surfaces; CSV is re-persisted.
- Multi-row INSERT (
INSERT INTO t VALUES (1, 'a'), (2, 'b')) runs; both rows land; CSV reflects them. (column_name_list)variant works (INSERT INTO t (a, b) VALUES (1, 'x')).__rdbms_*rejection at the INSERT target-table slot (per ADR-0030 §6 / ADR-0032 §1).- History line is the literal submitted SQL.
- All Phase-2 tests (1446 baseline) still green.
DA gate (written):
- Does an INSERT failing mid-multi-row leave persistence in a consistent state? (Engine transactional behaviour applies; worker must not re-persist on failure.)
- Does the
target_tableextraction handle case-folding the same way the rest of the codebase does? - Are the existing DSL INSERT tests still green? Any DSL-side regression means 3b broke something it shouldn't have.
3c — INSERT … SELECT
Scope (in):
- Add the row-source
ChoicetoSQL_INSERT_SHAPE:values_clause | Subgrammar(&sql_select::SQL_SELECT_COMPOUND). - Update
do_sql_insertto handle the SELECT-driven path (no special worker work needed beyond preparing the composite statement; the engine handles the insert-from-query flow).
Exit gate:
INSERT INTO archive SELECT * FROM orders WHERE …runs.INSERT INTO target (a, b) SELECT x, y FROM sourceruns.INSERT INTO t WITH x AS (…) SELECT * FROM xruns (R4 invariant — Phase-2 CTE prefix on SELECT applies).- Schema-existence diagnostics fire on the SELECT's projection (Phase-2 machinery, free).
DA gate (written):
- Does the WITH-prefixed SELECT row source actually parse
through SQL_SELECT_COMPOUND, or does the INSERT prefix
break the recursion? Test on a real
WITH … INSERT … SELECT. - Does the cascade-summary path (3e DELETE) accidentally apply to INSERTs whose SELECT touches the target table? (It must not — INSERTs have no cascade.)
3d — shortid auto-fill (worker)
Scope (in):
- Detect omitted
shortidcolumns at execution time indo_sql_insert. - Synthesise fresh shortid values; bind them positionally.
- Cover both VALUES and SELECT row sources (for SELECT, the shortid columns are added to the projection's bind list before execution).
Exit gate:
INSERT INTO t (name) VALUES ('x')wherethas apk shortidcolumn auto-fills that PK.- An explicit value for a shortid column is respected (override behaviour — warning fires per 3g but the value is preserved).
- Multi-row INSERT produces distinct shortids per row.
- INSERT … SELECT auto-fills shortids for each yielded row.
DA gate (written):
- Are the synthesised shortids actually unique across a multi-row INSERT? (The generator should be called per-row, not once.)
- Does the auto-fill respect the column's case-preserved name?
- What happens if the table has both
serialandshortidPK columns? (Spec: serial via SQLite rowid; shortid via worker post-fill. Verify no double-fill collision.)
3e — UPDATE grammar + execution
Scope (in):
SQL_UPDATE_SHAPE: target table, SET assignment list, optional WHERE.Command::SqlUpdate,Request::RunSqlUpdate,do_sql_update.- Persistence write-through on the target table.
Exit gate:
- Single-column UPDATE with WHERE runs; affected-row count surfaces; CSV reflects changes.
- Multi-column UPDATE (
SET a = 1, b = 2) runs. - UPDATE without WHERE runs (per ADR-0030 §12 — no
--all-rowsguard). - Schema-existence diagnostic fires on unknown columns in SET and WHERE.
DA gate (written):
- Does the SET assignment expression accept the full
sql_expr(function calls, subqueries, CASE)? Verify on a representative expression. - Does an UPDATE that touches an auto-gen column produce the 3g auto-gen warning?
3f — DELETE grammar + execution + cascade summary
Scope (in):
SQL_DELETE_SHAPE: target table, optional WHERE.Command::SqlDelete,Request::RunSqlDelete,do_sql_delete.- Cascade-summary pre-count and post-execution formatting per §7.
- Persistence write-through on target + affected child tables (cascade-affected tables also need re-persistence).
Exit gate:
- DELETE with WHERE runs; affected-row count + cascade summary surface; CSVs of target AND cascade-affected children are re-persisted.
- DELETE without WHERE runs (no
--all-rowsrail). - Cascade summary matches the DSL
do_deleteoutput on the same schema/data. - The cascade-summary code path is shared (no duplicate formatter).
DA gate (written):
- R2 mitigation: does the predicate-extraction byte-range
injection work on a WHERE containing a subquery? Test
DELETE FROM a WHERE id IN (SELECT id FROM b WHERE …). - Does the cascade pre-count run in a way that doesn't see the post-DELETE state? (Order matters: pre-count BEFORE execute.)
3g — RETURNING
Scope (in):
- Add
RETURNING projection_listto all three top-level shapes. - Walker captures
returning: boolon the Command variants. - Worker handlers route to
DataResultcollection whenreturning == true. - Result-column type recovery via the Phase-2 column-origin path.
Exit gate:
INSERT … RETURNING *returns the inserted row(s) as a DataResult; affected-row count is implicit (rows.len()).UPDATE … RETURNING id, new_valuereturns the modified columns.DELETE … RETURNING *returns what was just deleted (BEFORE the row is gone).- Result-column types recovered for bare-column refs (per ADR-0032 Amendment 1).
- The cascade summary on a
DELETE … RETURNINGis preserved alongside the result set (both surface to the user).
DA gate (written):
- Does the RETURNING result-set rendering use the same DataResult path SELECT uses? Or did a DML-specific copy sneak in?
- Does
RETURNING expr AS aliasadmit aliases the same way the projection list does?
3h — UPSERT (ON CONFLICT … DO NOTHING / DO UPDATE)
Scope (in):
- Add
on_conflict_clausetoSQL_INSERT_SHAPE. - Admit
excludedas a table alias inside DO UPDATE expressions (§9's clarification). - Walker captures the UPSERT shape; worker executes as-is (no special pre-processing).
Exit gate:
INSERT … ON CONFLICT (id) DO NOTHINGruns on a conflicting row; affected-row count is 0.INSERT … ON CONFLICT (id) DO UPDATE SET v = excluded.vruns; the conflicting row is updated to the new value.ON CONFLICT DO NOTHING(no target spec) handles any conflict.excluded.colcompletes / highlights as a column ref to the target table's columns inside the DO UPDATE expressions.
DA gate (written):
- Does ON CONFLICT … DO UPDATE trigger persistence write-through? (Yes; the target row is modified.)
- Is the
excludedalias scoped to ONLY the DO UPDATE branch, not leaking into the outer INSERT?
3i — Diagnostics
Scope (in):
diagnostic.insert_arity_mismatch(§8.1).diagnostic.auto_column_overridden(§8.2).diagnostic.not_null_missing(§8.3).- Per-key positive + negative tests + catalog entries.
Exit gate:
- One positive + one negative test per new key (the
ADR-0032 §2d exit gate pattern):
insert_arity_mismatch:INSERT INTO t (a, b) VALUES (1, 2, 3)fires; matched arity doesn't.auto_column_overridden:INSERT INTO t (id, name) VALUES (5, 'x')(id isserial) fires; omittingiddoesn't.not_null_missing:INSERT INTO t (a) VALUES (1)(b is NOT NULL no default) fires; includingbdoesn't.
- Multi-row arity mismatch emits per-row, not just on the first offending row.
- The inherited Phase-2 diagnostic passes
(
schema_existence,sql_predicate_warnings,compound_arity_diagnostics) still fire on DML'ssql_exprslots.
DA gate (written):
- Does
auto_column_overriddenproduce a Warning, not an Error? (Spec: WARNING — the statement still runs.) - Does
not_null_missingproduce a Warning? (Same.) - Are the catalog wordings engine-neutral?
3j — Dispatch wiring on shared entry words
Scope (in):
- Convert
data::INSERT,data::UPDATE,data::DELETEshapes toChoice(SQL_shape, DSL_shape). - Remove the experimental separate
data::SqlInsert/SqlUpdate/SqlDeleteCommandNodes that 3a-3i used as scaffolding. - Verify DSL-only inputs still match the DSL branch in both modes.
- Verify SQL-only inputs match the SQL branch in Advanced mode and fail with "this is SQL" in Simple mode.
Exit gate:
- All existing DSL INSERT/UPDATE/DELETE tests still green (unmodified inputs, unmodified outputs).
- All Phase-3 SQL DML tests still green via the new dispatch
path (the SQL branch is reached through
data::INSERTetc., not throughdata::SqlInsert). - A structurally-ambiguous input (
delete from t where id = 1) in Advanced mode routes through the SQL branch (verifiable by checking that the cascade-summary code path fired — DSL and SQL paths share the formatter but the entry point differs). - A DSL-specific input (
delete from t --all-rows) in Advanced mode falls through to the DSL branch.
DA gate (written):
- Verify that no test was changed to make 3j pass — only the shape's structure changes; the observable behaviour is identical.
- Is the test count higher than before 3j? It should be (the scaffolding CommandNodes had their own tests; now the shared CommandNodes carry both).
3k — Verification sweep
Scope (in):
- Fill in the cross-cut verification matrix
(
docs/plans/<date>-adr-0033-phase-3.md). - Produce the phase-exit verification report.
- DA final review (genuine, not rubber-stamp — see Phase 2's
process lesson in
docs/handoff/20260520-handoff-28.md§4).
Exit gate:
- Every matrix row green with a concrete
file::functiontest reference. - Final test run: zero failures, zero skips, no regressions from the Phase-2 baseline of 1446 / 0 / 1.
- Clippy clean.
- Report at
docs/handoff/<date>-phase-3-verification.mddocuments all autonomous decisions with explicit user-confirmation pointers and the DA's written critiques (not just the verdict).
DA gate (written):
- Did filling the matrix surface gaps that need closing? (Phase 2 surfaced THREE; expect at least one for Phase 3.)
- Are any sub-phase exit gates above weaker than they should be in hindsight?
- Did any sub-phase make an autonomous decision that's not in this ADR? If so, flag for retroactive user approval before phase-exit.
Out of scope (clarifications already established)
- No engine type names in or out (ADR-0030 §5 / §7).
- No
STRICT, no SQLite storage clauses (ADR-0030 §7). - No transaction control (ADR-0030 §13 OOS-4).
- No multi-statement batches (ADR-0030 §13 OOS-4).
- No
EXPLAINof SQL DML (ADR-0030 §13 OOS-2 — the DSLexplainstill wraps what it already wraps).
Amendment 1 — Dispatch mechanism: category-grouped dispatch supersedes Node::Guard (2026-05-21)
This amendment supersedes §2's "Mode-gating mechanism" subsection and revises sub-phase 3a's scope. It was written during 3a implementation, after tracing the proposed mechanism against the actual walker code, and is recorded with explicit user approval before any code landed.
The finding — option (a) does not work as §2 frames it
§2 selected option (a): a standalone zero-byte Node::Guard(fn)
placed as the first element of the SQL branch's Seq, inside a
Choice(SQL_shape, DSL_shape). §2 framed this as the "smallest
mechanism change" that "reuses the existing validator path" with
the failure surfacing as the same WalkOutcome::ValidationFailed
as the entry-word gate — implying no walk_choice change.
Tracing the four required behaviours against
src/dsl/walker/driver.rs shows that framing is inaccurate, and
that any guard-in-Choice mechanism is forced to modify
walk_choice:
-
walk_choicereturns immediately on a branchFailed(driver.rswalk_choice) — it only falls through to the next branch onNoMatch. So in Simple mode,delete from t where id = 1(which is valid DSL) would hit the SQL branch's guard, the guard would fail, and the user would see the "this is SQL" hint instead of the statement running as DSL. That contradicts §2's own "Simple mode → DSL only" promise. Fixing it requires awalk_choicechange regardless of how the guard node is shaped. -
walk_seqtreats aNoMatchatidx > 0as a hardFailed(driver.rswalk_seq). A zero-byte Guard atidx 0still advancesidxto 1, pushing the SQL branch's first real token toidx 1. So in Advanced mode,delete from t --all-rows(DSL-only) cannot fall through to DSL — the SQL branch returnsFailed, andwalk_choicewon't try the DSL branch. The literal form needs an additionalwalk_seqsemantics change (idx == 0→ "no bytes consumed yet").
This is exactly the R1 risk the ADR budgeted for ("if walk_seq
commits the Seq before the validator runs … the Choice can't
fall through").
The replacement — category-grouped, mode-aware dispatch
Rather than gate a Choice branch from inside the recursive
grammar, the dispatch decision moves up to the command
dispatcher (walker::walk), which is where mode gating
already lives: the existing whole-command gate
(is_advanced_only + the Simple-mode short-circuit in walk)
is precisely this pattern, hardcoded for select / with. The
amendment generalises that existing code instead of inventing a
parallel in-grammar mechanism.
Concretely:
-
Each registry entry carries a category —
CommandCategory::{Simple, Advanced}. The category lives at the registration site (theREGISTRYtable), not as a field on everyCommandNodeliteral, because category is a dispatcher concern, not intrinsic to a command's grammar.select/with(and, from 3b, the SQLinsert/update/deletenodes) areAdvanced; everything else isSimple. This subsumesADVANCED_ONLY_ENTRIES:is_advanced_only(entry)becomes "every registry candidate for this entry word isAdvanced". -
A shared entry word has a node in both groups — e.g. a
SimpleDSLinsertnode and anAdvancedSQLinsertnode, both with entry wordinsert. The entry-word lookup returns all candidates; the dispatcher selects by mode. -
Mode-aware selection in
walk(no combinator changes):- Simple mode considers
Simplecandidates. If a Simple candidate exists, it is committed. If none exists but anAdvancedcandidate does (a pure-SQL entry word likeselect), the dispatcher emits theadvanced_mode.sql_in_simpleValidationFailed— identical to today's whole-command gate. - Advanced mode tries
Advancedcandidate(s) first, then theSimplecandidate as a fallback. The first candidate that fully matches wins (delete from t where id = 1→ SQL;delete from t --all-rows→ falls through to DSL because the SQL shape doesn't match--all-rows).
- Simple mode considers
-
No
Node::Guard, nowalk_choice/walk_seqchange. The recursive combinators are untouched; each candidate is walked cleanly and independently. Speculative evaluation runs on a scratchWalkContext; only the committed candidate is walked into the caller's context, so post-walk context state (completion / hint accumulators) always reflects the chosen candidate.
The two details (user-approved)
-
"This is SQL" hint in Simple mode for SQL-shaped input. For SQL-only input in Simple mode (e.g.
delete … returning *), the Simple candidate won't match. To keep parity withselect/with, the dispatcher speculatively checks whether theAdvancedcandidate would match; if so, it emits theadvanced_mode.sql_in_simplehint rather than a generic DSL parse error. Lives entirely inwalk. -
Advanced-mode completion is SQL-first, DSL as full-line fallback. The
Choiceapproach would have unioned SQL and DSL continuations mid-typing for free; the grouping approach surfaces one candidate's walk for completion / hints. The decision is to show SQL completions primarily in Advanced mode and fall to DSL only on a full DSL-shaped line — simpler and matching the "SQL-first, DSL as familiar fallback" intent. No unioned completion.
Consequences of the amendment
Node::Guardis NOT added. §2's mechanism caveat, Consequences bullet 1 ("the walker gains a new node variantNode::Guard"), and the original 3a scope are withdrawn.REGISTRY's element type changes to carry the category alongside each&CommandNode. The handful of registry iteration sites (entry-word listing,command_for_entry_word, the completion entry-word filters,note_help) destructure the tuple; no behavioural change for the current single-candidate registry.- The
Commandvariants (§10), worker handlers, RETURNING flag, cascade-summary plan, shortid auto-fill, and diagnostics are unaffected — this amendment is purely about how the dispatcher routes a shared entry word to the right grammar. - Sub-phase 3a now validates the dispatch mechanism, not the
guard: a smoke registry with a shared entry word (a
Simpleand anAdvancednode using distinguishable tokens) exercises the Simple/Advanced selection, the fallback, and the "this is SQL" path. The 3a exit gate's four cases map onto the new mechanism unchanged in spirit. - 3b adds the first real shared entry words to
REGISTRY(the SQLinsert/update/deletenodes). At that point the completion entry-word lists will need to de-duplicate the shared primary (two candidates, one entry word) — tracked for 3b, not 3a. - The mechanism remains reusable for Phase 4 DDL's analogous
shared-entry problem (
create, …): tag the SQL DDL nodesAdvanced; the dispatcher handles the rest.
See also
- ADR-0005 — the ten-type vocabulary INSERT works with.
- ADR-0006 — the auto-snapshot before destructive ops.
- ADR-0014 — the DSL DML model + cascade-summary precedent.
- ADR-0015 — the persistence write-through path.
- ADR-0018 — the DSL INSERT forms (A / B / C).
- ADR-0019 — the friendly-error layer DML errors route through.
- ADR-0027 — the validity indicator surfaces DML diagnostics.
- ADR-0030 — the SQL surface in advanced mode; Phase 3 is this ADR's commission.
- ADR-0031 — the SQL expression grammar, consumed by DML WHERE / SET / VALUES / RETURNING.
- ADR-0032 — the SQL SELECT grammar;
INSERT … SELECTand RETURNING reuse its projection-list and scope-frame machinery.