# ADR-0033: The full SQL DML grammar — `INSERT` / `UPDATE` / `DELETE` ## Status Proposed ## Context ADR-0030 commissions advanced mode as a body of SQL grammar inside the unified grammar tree. Phase 1 ("foundations + first `SELECT`") and Phase 2 ("`SELECT` — full") have shipped through ADR-0031 and ADR-0032 respectively. The next slice on ADR-0030's roadmap is **Phase 3 — DML**: `INSERT` / `UPDATE` / `DELETE` as validated SQL. ADR-0030 §3 fixed the headline scope: > **`INSERT`** (single- and multi-row), **`UPDATE`**, **`DELETE`**. ADR-0030 §4 fixed the execute path: > **DML and `SELECT`** → executed as the **validated SQL itself** > (re-rendered canonically from the matched parse, or the validated > original text). … For DML the worker — knowing the statement kind > and target table from the parse — runs the statement and > re-persists that table's CSV. ADR-0030 §11 fixed the persistence model. ADR-0030 §12 fixed the safety posture: SQL `UPDATE`/`DELETE` without `WHERE` executes as written (no `--all-rows` rail). ADR-0030 §3 also flagged this ADR's existence in advance: > Settle multi-row `INSERT` and `shortid` auto-fill on a SQL > `INSERT`. This ADR fixes the **grammar shape**, the **dispatch architecture** for shared entry words, the **diagnostic surface** new to DML, and the **execute-path** Command variants. It does not revisit ADR-0030's controlling decisions; references in this text to ADR-0030 §X mean "that decision is controlling." ### What ADR-0030, ADR-0031, ADR-0032 already fix - One walker, grammar-as-text execution, ambient assistance for free (ADR-0030 §1/§4/§6/§8). - `sql_expr` is the shared expression grammar — UPDATE `SET` expressions, WHERE predicates, INSERT VALUES tuples, RETURNING projection items all consume it (ADR-0031, ADR-0032 §5/§6). - `Node::ScopedSubgrammar` + `from_scope_stack` carry CTEs and per-statement scope into nested constructs (ADR-0032 §10). - The diagnostic pipeline — `schema_existence_diagnostics`, `sql_predicate_warnings`, `compound_arity_diagnostics`, `pending_diagnostics` accumulator — applies uniformly to DML's `sql_expr` slots (ADR-0032 §11). - The friendly-error layer translates engine messages through the `engine.*` catalog (ADR-0019, ADR-0030 §7, ADR-0032 §11.5). - The `__rdbms_*` rejection at every table-source slot (ADR-0030 §6, ADR-0032 §1/§4). - `MAX_SUBGRAMMAR_DEPTH = 64` (ADR-0026) is the recursion budget. ## Decision ### 1. The top-level statement shapes ``` insert_statement := INSERT INTO table_name [ '(' column_name_list ')' ] ( values_clause | sql_select_compound ) [ on_conflict_clause ] [ returning_clause ] [ ';' ] update_statement := UPDATE table_name SET assignment_list [ where_clause ] [ returning_clause ] [ ';' ] delete_statement := DELETE FROM table_name [ where_clause ] [ returning_clause ] [ ';' ] ``` Where: ``` values_clause := VALUES value_tuple ( ',' value_tuple )* value_tuple := '(' sql_expr ( ',' sql_expr )* ')' assignment_list := assignment ( ',' assignment )* assignment := column_name '=' sql_expr returning_clause := RETURNING projection_list on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ] ( DO NOTHING | DO UPDATE SET assignment_list [ where_clause ] ) ``` - `table_name` is the existing `Tables` ident slot, with `reject_internal_table` validator (ADR-0030 §6). - `column_name_list` is comma-separated `Columns` idents — the same shape `data::INSERT` (DSL) uses for Form A. Reuses the existing pattern; no new node infrastructure. - `where_clause` is `WHERE Subgrammar(&sql_expr::SQL_OR_EXPR)`. - `projection_list` is the Phase-2 projection list, re-used unchanged. The body of an `INSERT … SELECT` recurses through `Node::Subgrammar(&sql_select::SQL_SELECT_COMPOUND)` — the same recursion point Phase 2's CTE bodies, scalar subqueries, and `IN`/`EXISTS` bodies use. The Phase-2 SELECT grammar already admits an optional leading `WITH` (Amendment 2 / commit `fd25904`), so `INSERT INTO t WITH x AS (…) SELECT * FROM x` works for free. ### 2. Dispatch — DSL vs SQL on shared entry words The entry words `insert`, `update`, `delete` are **shared** between DSL and SQL. This ADR fixes the dispatch rule **and** the mechanism that implements it (a known-risk item — see "Open implementation risks" below). **Simple mode → DSL only.** SQL constructs (multi-row VALUES, INSERT…SELECT, RETURNING, UPSERT) attempted in simple mode produce a `ValidationFailed` "this is SQL" hint, matching the gating that `select` / `with` already produce (ADR-0030 §2). **Advanced mode → SQL first, DSL fallback.** Each shared entry word's `CommandNode.shape` becomes a `Choice` between the SQL shape and the DSL shape, declared in that order: ``` shape: Node::Choice(&[SQL_INSERT_SHAPE, DSL_INSERT_SHAPE]) ``` The walker's `Choice` is first-match-wins. The SQL shape's constructs are a strict superset of the DSL shape's for these statements (multi-row VALUES, INSERT…SELECT, RETURNING, UPSERT are SQL-only; DSL's `--all-rows` flag is DSL-only). Any structurally-ambiguous input — `delete from t where id = 1` — matches the SQL shape first and routes through the SQL machinery. This is the design point an earlier draft of ADR-0030 §2 left open ("choice order or a disambiguator decides"). The chosen disambiguator is `Choice` order: **SQL > DSL in Advanced**. A DSL-specific construct (e.g., `--all-rows`) won't match the SQL shape and falls through to DSL — accommodating, not surprising. Cascade-summary parity (§7) and `shortid` auto-fill parity (§6) mean a DSL-style INSERT/UPDATE/DELETE in Advanced mode produces identical observable behaviour whether it routes through the SQL or DSL path. The user's mental model can be "Advanced mode runs SQL; DSL syntax still works as a familiar fallback." #### Mode-gating mechanism The existing `is_advanced_only` infrastructure (`src/dsl/grammar/mod.rs::ADVANCED_ONLY_ENTRIES`) gates by **entry word**, fires in the walker's `dispatch_command_node` path (`walker/mod.rs:1707`), and emits the "this is SQL" ValidationFailed verdict. It can't be reused to gate a Choice *branch* by mode. Mechanism options for branch-level gating: - **(a) Mode-aware validator on the SQL branch.** Each SQL-shape Seq starts with a no-op `Node::Validated` whose validator inspects `ctx.mode`; in Simple mode it returns `ValidationError(message_key: "advanced_mode.sql_in_simple")` before any structural tokens commit. Trade-off: clean, reuses the existing validator path; the failure shape is the same `WalkOutcome::ValidationFailed` as the entry-word gate. - **(b) Mode flag on `Node::Choice`.** A new walker capability: `Node::Choice` branches can carry a min-mode marker. Heavier (touches the walker's `walk_choice` driver) but lets the grammar declare gating without per-branch boilerplate. - **(c) Two CommandNodes per entry word; registry picks by mode.** The dispatcher consults `ctx.mode` and skips the SQL CommandNode in Simple mode. Heavier (registry semantics change) and surfaces a "multiple CommandNodes per entry word" pattern the rest of the codebase doesn't use. **Default choice: (a) the validator approach.** Smallest mechanism change. **Walker-capability caveat:** validators today are an `Ident { validator: Option }` field — there is no standalone "guard" node that runs a validator without consuming input. Sub-phase 3a's first job is to add `Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)` to the walker — a zero-consumption node whose only effect is to fail the Seq with a ValidationError in Simple mode. The existing internal-table rejection (`reject_internal_table`) shows the wiring pattern; `Node::Guard` is the "no-byte-consuming version" of it. ### 3. Multi-row `INSERT … VALUES` ``` INSERT INTO orders (customer_id, total) VALUES (1, 99.50), (2, 12.00), (3, 6.25) ``` The grammar's `values_clause` repeats `value_tuple` with a comma separator (`Node::Repeated { inner: value_tuple, separator: Some(COMMA), min: 1 }`). Each tuple matches the same arity as the `(column_name_list)` (see §8 for the arity diagnostic). ### 4. `INSERT … SELECT` ``` INSERT INTO archive SELECT * FROM orders WHERE created < '2025-01-01' INSERT INTO summary (id, total) WITH t AS (…) SELECT id, sum(total) FROM t GROUP BY id ``` A `Choice` between `values_clause` and `Subgrammar(&sql_select::SQL_SELECT_COMPOUND)` after the optional column list. The Phase-2 SELECT machinery (CTEs, JOINs, subqueries, set ops) applies as the row source for free. Arity verification: at parse time, the walker counts the declared column list (or, when absent, the target table's column count from the schema cache) against the SELECT's projection list arity. Mismatch emits `diagnostic.insert_arity_mismatch` (see §8). ### 5. `RETURNING` `RETURNING projection_list` admits the Phase-2 projection_list unchanged — column refs, `*`, `t.*`, expressions with `AS` aliases, computed expressions. Execution: the `Command::Sql{Insert,Update,Delete}` variants carry a `returning: bool` flag set by the walker's `ast_builder` when a `RETURNING` clause matched. The worker reads this flag (no SQL text re-parsing) and routes: - `returning == true` → prepare + step the statement, collect rows into a `DataResult` (per-result-column type recovery via the same column-origin path Phase 2 introduced, ADR-0032 §12 + Amendment 1). - `returning == false` → execute the statement, collect the affected-row count. The flag is sufficient because the *shape* of the projection is already in the SQL the engine prepares; the worker doesn't need to re-parse to know which columns to read. ### 6. `shortid` auto-fill — worker post-fills Parity with the DSL `do_insert` handler (ADR-0005 / ADR-0018). For an `INSERT` whose `(column_name_list)` omits one or more `shortid` columns, the worker: 1. Inspects the table metadata to find the auto-generated columns (`serial`, `shortid`) NOT in the user's column list. 2. For each row in the VALUES list (single or multi-row), and for `INSERT … SELECT` each yielded row, synthesises a fresh `shortid` value (the existing `shortid::generate` path). 3. Constructs the parameterised INSERT (or executes a rewrite of the SQL with the values bound positionally) and runs it. The engine sees a fully-specified row; the user's submitted SQL is unmodified in `history.log` (ADR-0030 §11). The walker recognises an explicit value provided for an auto-gen column — that's an explicit override, not a missing slot — and emits a WARNING (§8b) but lets it through. `serial` columns rely on SQLite's `INTEGER PRIMARY KEY` rowid auto-increment — no worker post-fill needed; the engine populates the value on its own. ### 7. Cascade summary on `DELETE` Parity with the DSL `do_delete` handler (ADR-0014). When `Command::SqlDelete { target_table, sql, … }` runs: 1. **Pre-execution cascade pre-count.** For each child table that has an FK referencing `target_table` with `ON DELETE CASCADE` (or `SET NULL`), the worker runs a pre-count query of the form `SELECT count(*) FROM WHERE IN (SELECT FROM WHERE )`. The `` is extracted from the SQL — see "Predicate extraction" below. 2. The worker executes the DELETE. 3. **Post-execution summary.** Per-relationship row counts + cascade impact, formatted by the same cascade-summary formatter the DSL path uses. **Predicate extraction.** The DSL handler reuses the typed `Expr` AST to construct the pre-count subqueries. For SQL DELETE the walker captures the WHERE clause's source byte range from the matched path and re-injects that text into the pre-count SQL. This is grammar-as-text reuse, consistent with ADR-0030 §4's posture; no AST construction required. When the SQL DELETE has no WHERE, the pre-count is unbounded ("all rows of `target_table`"). Cost: one extra count(*) per child table; cheaper than the DSL parallel path on a typical schema. The cascade-summary formatter itself (`format_cascade_summary` in the DSL handler) is shared with `do_sql_delete` — same output, same i18n keys. ### 8. Diagnostics new to Phase 3 Three new keys, classified per ADR-0027's ERROR / WARNING model. #### 8.1. `diagnostic.insert_arity_mismatch` (ERROR) Walker-level pre-flight when the declared `(column_name_list)`'s arity disagrees with a `value_tuple`'s arity (or, for `INSERT … SELECT`, with the SELECT's projection list arity, counted at parse time via the same projection-counting infrastructure Phase-2's `compound_arity_diagnostics` uses). For multi-row VALUES, **each** offending row emits its own diagnostic on that row's `value_tuple` span. A query with five rows and three wrong arities produces three diagnostics; the two right rows are silent. Pre-empts the engine's less-helpful "X values for Y columns" error. For `INSERT … SELECT`, the diagnostic anchors on the SELECT's first `projection_item` span. The Phase-2 projection-count infrastructure (the comma-counting at depth = leg-depth used by `compound_arity_diagnostics`) is the underlying mechanism. #### 8.2. `diagnostic.auto_column_overridden` (WARNING) Walker-level when an INSERT explicitly assigns a value to a column whose type is `serial` or `shortid`. Pedagogical: the explicit value bypasses the auto-counter / generator, which can collide with future auto-generated values. Catalog wording: `"column \`{column}\` is auto-generated (\`{type}\`); providing an explicit value bypasses the auto-counter and may collide with later auto-generated values"`. The statement still runs — the warning is advisory, matching the Phase-2 predicate-warning posture (ADR-0027 §1 / ADR-0032 §11.6). #### 8.3. `diagnostic.not_null_missing` (WARNING) Walker-level when an INSERT's `(column_name_list)` omits a column declared `NOT NULL` with no `DEFAULT`. Pedagogical nudge — the engine will reject the statement, but the walker can pre-flight the missing column at parse time. Catalog wording: `"column \`{column}\` is required (\`NOT NULL\`, no default); the statement will fail when run"`. The walker doesn't BLOCK the statement (it's structurally valid). The engine does. The friendly layer translates the engine error via the existing NOT-NULL path. #### 8.4. Inherited diagnostics The Phase-2 diagnostic passes apply to DML's `sql_expr` slots without further work: - `schema_existence_diagnostics` flags unknown columns in WHERE, SET, RETURNING, VALUES tuples. - `sql_predicate_warnings` flags `= NULL`, `LIKE`-on-numeric, type-mismatched comparisons in WHERE and CASE expressions. - The `pending_diagnostics` accumulator path stays available for in-walk emits. ### 9. UPSERT — `ON CONFLICT … DO NOTHING / DO UPDATE` ``` INSERT INTO t (id, name) VALUES (1, 'x') ON CONFLICT (id) DO UPDATE SET name = excluded.name INSERT INTO t (id, name) VALUES (1, 'x') ON CONFLICT DO NOTHING ``` Grammar shape: ``` on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ] ( DO NOTHING | DO UPDATE SET assignment_list [ where_clause ] ) ``` The optional `(column_name_list)` is the conflict target — which columns' unique constraint to react to. Standard SQL allows this list to be empty (`ON CONFLICT DO NOTHING` is shorthand for "any conflict"). The DO UPDATE branch reuses the same `assignment_list` and optional `where_clause` shapes as the top-level UPDATE statement. Inside DO UPDATE expressions, the keyword `excluded` is a pseudo-table reference to the would-have-been-inserted row. **Note:** `excluded` is the SQLite + PostgreSQL spelling; standard SQL (`MERGE … WHEN MATCHED`) uses a different model. The playground commits to the `excluded` form because it is what SQLite supports and what the project's pedagogical target dialect uses (ADR-0001 / ADR-0002 — the engine is SQLite, even though its name is hidden from user-facing strings). This is a deliberate carve-out from ADR-0030 §7's engine-neutral posture, justified by the absence of a standard-SQL UPSERT spelling that both SQLite and PostgreSQL share. The walker admits `excluded` as a table alias for ident-completion purposes inside the DO UPDATE branch — its columns are the target table's columns. ### 10. Execute path — three typed Command variants `Command` gains three variants: ``` Command::SqlInsert { sql: String, source: Option, // for history.log target_table: String, // for re-persistence listed_columns: Option>, // None if no col-list provided } Command::SqlUpdate { sql: String, source: Option, target_table: String, } Command::SqlDelete { sql: String, source: Option, target_table: String, } ``` The walker's `ast_builder` for each shape extracts `target_table` and `listed_columns` from the matched path. The `sql` field is the validated source text the engine executes. `Database` gains three request types: ``` Request::RunSqlInsert { command, reply } Request::RunSqlUpdate { command, reply } Request::RunSqlDelete { command, reply } ``` Each worker handler: 1. Applies statement-specific pre-execution work (shortid auto-fill for `RunSqlInsert`; cascade pre-count for `RunSqlDelete`). 2. Prepares + executes the statement. If RETURNING is present, collects the result set as a `DataResult`. 3. Re-persists the target table's CSV (ADR-0030 §11) using the existing persistence write-through machinery. 4. Returns the result (DataResult for RETURNING; affected-row count + cascade summary for plain DELETE; affected-row count for plain INSERT/UPDATE). History logging: the worker writes the literal submitted line to `history.log` (ADR-0030 §11) before execution, exactly as `run_select` does. ### 11. Engine-error translation Three new catalog keys (the friendly-error layer's engine.* translation set, ADR-0019 / ADR-0030 §7): | Key | Engine condition | |-------------------------------------------|-----------------------------------------------------------------| | `engine.unique_constraint_failed` | The engine's `UNIQUE constraint failed: t.c` (already exists; will be re-used as-is for UPSERT-without-conflict-target cases) | | `engine.not_null_constraint_failed` | Already exists; covers SQL-side NOT-NULL failures uniformly | | `engine.foreign_key_constraint_failed` | Already exists; SQL DELETE that violates FK without cascade | No NEW engine.* keys are required for Phase 3 — the existing FK / UNIQUE / NOT-NULL translation paths cover the engine-side DML errors. Phase 3 reuses them. ### 12. Result-column type recovery on RETURNING A RETURNING projection that's a bare column reference recovers its playground type via the same column-origin metadata path Phase 2 introduced (`resolve_select_column_types` in `db.rs`, ADR-0032 §12 + Amendment 1). Computed expressions in RETURNING stay typeless. The renderer uses neutral alignment for typeless columns, consistent with SELECT. ### 13. Out of scope | ID | OOS | |-------|--------------------------------------------------------------------------------------| | OOS-1 | `INSERT INTO t DEFAULT VALUES` (the project's planned seed feature covers this) | | OOS-2 | `INSERT OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` (SQLite-specific) | | OOS-3 | `UPDATE … FROM other_table` (SQLite/PostgreSQL extension) | | OOS-4 | `WITH x AS (…) UPDATE …` / `WITH x AS (…) DELETE …` (deferrable; SELECT-side WITH is in scope) | | OOS-5 | Indexed-by / not-indexed table-source modifiers (SQLite-specific) | | OOS-6 | Multi-statement batches (already OOS via ADR-0030 §13 OOS-4) | OOS-4 is a deliberate choice: WITH-prefixed DML is standard SQL since SQL:2003, but its execution semantics (the WITH-defined CTE is visible to the DML's WHERE / SET / RETURNING expressions) require additional scope-frame plumbing that doesn't yet exist on the DML side. The Phase-2 grammar admits WITH at the SELECT level; Phase 3 keeps WITH a SELECT-only prefix. A later phase can lift this restriction with a clean ADR Amendment. ## Consequences - The walker gains a new node variant `Node::Guard` (§2 mechanism caveat) — a zero-byte-consumption gating node that can fail its enclosing Seq with a `ValidationError`. The walker driver's exhaustive `walk_node_inner` match grows one arm. Reusable beyond DML (Phase 4 DDL's similar shared-entry problem is a likely future consumer). - `Command` gains three variants (`SqlInsert`, `SqlUpdate`, `SqlDelete`); every exhaustive `match Command` callsite picks up three new arms (the recurring ADR-0028/0029 gotcha). - The DSL `data::INSERT` / `data::UPDATE` / `data::DELETE` CommandNodes' shapes become `Choice(SQL_shape, DSL_shape)` with the SQL branch mode-gated via a no-op leading validator (R1). - Three new walker diagnostics with three new catalog keys (`diagnostic.insert_arity_mismatch`, `diagnostic.auto_column_overridden`, `diagnostic.not_null_missing`). - Three new `Request` variants (`RunSqlInsert`/`RunSqlUpdate`/`RunSqlDelete`) and three new worker handlers (`do_sql_insert`/`do_sql_update`/`do_sql_delete`). - The persistence write-through path gains per-DML-kind entry points; the cascade-affected re-persistence in `do_sql_delete` is new (cascade-affected child tables are also re-persisted). - `RETURNING` lands a write-statement-returns-rows code path the project hasn't had before. Worker handlers either return affected-row count or a `DataResult` keyed by the `returning: bool` flag. - The cascade-summary formatter is reused between the DSL and SQL DELETE paths; the predicate-extraction byte-range injection (§7) is new code with the R2 mitigation. - `shortid` auto-fill code in `do_sql_insert` mirrors the DSL `do_insert` mechanism but takes a `listed_columns` slice from the parse rather than a typed `Command::Insert`. - The auto-snapshot machinery (ADR-0006) is unchanged but fires for SQL DML the same way it does for DSL DML — the snapshot trigger lives in the worker handlers. - All Phase-2 tests stay green (the SQL grammar pieces this ADR builds on are unchanged). - The DSL-vs-SQL dispatch mechanism (3a) is a foundation other projects in this codebase may need (e.g., Phase 4 DDL). Land it generically so it's reusable. ## Initial DA review A first-pass DA review of this ADR's draft surfaced ten critiques. The seven that surfaced real gaps were resolved in the body before this status moved to Proposed: 1. OOS-4 (WITH-prefixed DML) under-justified → R4 risk describes the mechanism gap concretely. 2. Dispatch mechanism hand-waved → §2's "Mode-gating mechanism" subsection enumerates three options and commits to the validator approach with R1 as the mitigation if it fails. 3. RETURNING execution hand-waved → §5 commits to a `returning: bool` flag on the Command variants. 4. Cascade-summary code-sharing claim too optimistic → §7's "Predicate extraction" subsection details the byte-range injection mechanism; R2 budgets for the performance-pathology case. 5. `excluded` keyword's engine-neutrality status → §9 carves it out explicitly as a justified non-neutral choice. 6. Multi-row INSERT arity diagnostic emit pattern unspecified → §8.1 clarifies "each offending row emits its own diagnostic" plus the INSERT…SELECT span. 7. Sub-phase ordering put dispatch (the riskiest piece) last → reordered so 3a lands the mechanism FIRST on a no-op shape. The remaining three critiques were either process points (DA gates at sub-phase boundaries — added per-sub-phase in Implementation notes) or already-covered (verification matrix — will live in the plan doc, matching Phase 2's pattern). Per CLAUDE.md / the Phase 2 process lesson, the DA review of THIS ADR will get a second pass before status moves to Accepted. Pin: do not rubber-stamp. ## Open implementation risks Recorded explicitly so the implementation plan can budget for them, not to allow silent drift later. ### R1. Mode-gated `Choice` branch mechanism unproven §2's option (a) — a no-op validator at the head of the SQL shape's Seq, rejecting in Simple mode — is plausible but untested. If the walker's `walk_seq` commits the Seq before the validator runs, the SQL branch's NoMatch becomes Failed and the Choice can't fall through to the DSL branch. **Mitigation:** sub-phase 3a lands the mechanism end-to-end on the smallest viable shape (a single `SqlNop` CommandNode whose shape is `Choice(SQL_branch[mode-gated-validator + token], DSL_branch[different-token])`) before any DML grammar lands. If the validator approach doesn't work, the alternatives (a mode-aware `Node::Choice` variant, or split-CommandNode dispatch) get evaluated before sub-phases 3b/3c/3d depend on the mechanism. ### R2. Cascade-summary predicate extraction §7 says the worker re-injects the WHERE-clause source bytes into pre-count subqueries. This is straightforward when the WHERE clause is a top-level predicate, but a future user might write WHERE clauses that themselves contain subqueries (e.g., `DELETE FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE …)`). The byte-range injection still works structurally, but a performance pathology is possible if the subquery is expensive and the cascade child count is large (the same subquery runs N+1 times). **Mitigation:** sub-phase 3c benchmarks the pre-count overhead on a representative schema before declaring the path acceptable. If the overhead is prohibitive, fall back to "report cascade count post-execution only" (less informative, but cheaper). ### R3. RETURNING result-column type recovery on multi-row INSERT §5 says result columns get the same column-origin type recovery SELECT uses. For a single-row INSERT … RETURNING this works identically to a SELECT. For multi-row INSERT, the engine emits one result row per inserted row; the column-origin metadata is the same for all (per ADR-0032 Amendment 1, column-origin is a property of the prepared statement, not the rows). No risk here — calling it out so the implementer doesn't re-derive the question. The empty-table type-recovery test already pins this invariant. ### R4. `INSERT … SELECT` with a CTE row source The Phase-2 SELECT grammar already admits `WITH … SELECT …` at the compound level (ADR-0032 §10.3 / Amendment 2). The `SQL_SELECT_COMPOUND` recursion point §1 uses pulls this in for free. `INSERT INTO archive WITH t AS (…) SELECT * FROM t` parses and runs. A user might reasonably expect WITH-prefixed DML to also work (see §13 OOS-4). That's explicitly out of scope for Phase 3; the discrepancy is documented here so the implementer doesn't need to figure it out from the grammar shape. ## Implementation notes Phased sub-plan with **explicit exit gates per sub-phase**. The implementation plan doc (`docs/plans/-adr-0033-phase-3.md`, modelled on `docs/plans/20260520-adr-0032-phase-2.md`) refines these and adds the cross-cut verification matrix in 3i. **Ordering rationale.** The dispatch mechanism (3a) is the foundation; if it fails to work, every later sub-phase has to be reworked. Land it first on a no-op shape, then build the DML grammar on top. ### 3a — `Node::Guard` + mode-gated Choice mechanism **Scope (in):** - Add a new walker node variant `Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)` (per §2 mechanism caveat) — zero-byte consumption, fails the enclosing Seq with `ValidationFailed` when the validator returns Err. - Implement `walk_guard` in `src/dsl/walker/driver.rs`. - Add the `reject_sql_in_simple_mode` guard function. - Build a smoke-test grammar: an experimental `CommandNode` with `shape: Choice(Seq[Guard(reject_sql_in_simple_mode), SQL_branch_token], DSL_branch_token)`. Verify the Choice falls through cleanly in Simple mode (DSL wins) and routes SQL-first in Advanced mode. **Scope (out):** - Any real DML grammar (3b onwards). - Wiring the existing `data::INSERT/UPDATE/DELETE` shapes (3j). **Exit gate (required green tests):** - `Node::Guard` admits via `walk_guard`; the walker driver's match arm exists and is tested. - Smoke-test grammar in Simple mode → DSL branch matches. - Smoke-test grammar in Advanced mode + SQL-shape input → SQL branch matches. - Smoke-test grammar in Simple mode + SQL-shape input → `ValidationFailed` with `message_key: "advanced_mode.sql_in_simple"`. - Smoke-test grammar in Advanced mode + DSL-shape input → DSL branch matches (Choice fall-through works even when SQL branch is admissible). - All Phase-2 tests (1446 baseline) still green. - Clippy clean. **DA gate (written):** - Does the Guard actually fire BEFORE the Seq commits? Failure mode: the Seq processes the Guard, the Guard fails, but the Seq has already "committed" to its branch and the Choice can't fall through. Test specifically: Advanced mode + an input that the DSL branch matches but the SQL branch's remaining tokens reject → must fall through to DSL, not Failed. - Is the Guard mechanism reusable beyond DML (e.g., for Phase 4 DDL's similar dispatch)? If yes, document the pattern in `src/dsl/grammar/mod.rs` doc comments so the next phase doesn't reinvent it. - If the Guard mechanism doesn't work (R1's worst case), the sub-phase REOPENS with mechanism option (b) or (c) evaluated before any DML grammar lands on top of broken scaffolding. ### 3b — INSERT grammar + minimal execution **Scope (in):** - `SQL_INSERT_SHAPE`: single-row + multi-row VALUES; the optional `(column_name_list)`; the `__rdbms_*` rejection on the table-name slot. - `Command::SqlInsert { sql, source, target_table, listed_columns, returning: false }` — `returning` always false in 3b. - `Request::RunSqlInsert` + `do_sql_insert` worker handler (no shortid auto-fill yet — explicit values required). - Persistence write-through on the target table after execution. - History-log of the literal submitted line. **Scope (out):** - `INSERT … SELECT` (3c). - `RETURNING` (3e). - `UPSERT` (3f). - `shortid` auto-fill (3d). - Diagnostics (3g). - DSL-vs-SQL dispatch wiring on `data::INSERT` (3h — development uses a separate `data::SqlInsert` CommandNode). **Exit gate (required green tests):** - **Single-row INSERT** runs end-to-end; affected-row count surfaces; CSV is re-persisted. - **Multi-row INSERT** (`INSERT INTO t VALUES (1, 'a'), (2, 'b')`) runs; both rows land; CSV reflects them. - **`(column_name_list)`** variant works (`INSERT INTO t (a, b) VALUES (1, 'x')`). - **`__rdbms_*` rejection** at the INSERT target-table slot (per ADR-0030 §6 / ADR-0032 §1). - **History line** is the literal submitted SQL. - All Phase-2 tests (1446 baseline) still green. **DA gate (written):** - Does an INSERT failing mid-multi-row leave persistence in a consistent state? (Engine transactional behaviour applies; worker must not re-persist on failure.) - Does the `target_table` extraction handle case-folding the same way the rest of the codebase does? - Are the existing DSL INSERT tests still green? Any DSL-side regression means 3b broke something it shouldn't have. ### 3c — INSERT … SELECT **Scope (in):** - Add the row-source `Choice` to `SQL_INSERT_SHAPE`: `values_clause | Subgrammar(&sql_select::SQL_SELECT_COMPOUND)`. - Update `do_sql_insert` to handle the SELECT-driven path (no special worker work needed beyond preparing the composite statement; the engine handles the insert-from-query flow). **Exit gate:** - `INSERT INTO archive SELECT * FROM orders WHERE …` runs. - `INSERT INTO target (a, b) SELECT x, y FROM source` runs. - `INSERT INTO t WITH x AS (…) SELECT * FROM x` runs (R4 invariant — Phase-2 CTE prefix on SELECT applies). - Schema-existence diagnostics fire on the SELECT's projection (Phase-2 machinery, free). **DA gate (written):** - Does the WITH-prefixed SELECT row source actually parse through SQL_SELECT_COMPOUND, or does the INSERT prefix break the recursion? Test on a real `WITH … INSERT … SELECT`. - Does the cascade-summary path (3e DELETE) accidentally apply to INSERTs whose SELECT touches the target table? (It must not — INSERTs have no cascade.) ### 3d — `shortid` auto-fill (worker) **Scope (in):** - Detect omitted `shortid` columns at execution time in `do_sql_insert`. - Synthesise fresh shortid values; bind them positionally. - Cover both VALUES and SELECT row sources (for SELECT, the shortid columns are added to the projection's bind list before execution). **Exit gate:** - `INSERT INTO t (name) VALUES ('x')` where `t` has a `pk shortid` column auto-fills that PK. - An explicit value for a shortid column is respected (override behaviour — warning fires per 3g but the value is preserved). - Multi-row INSERT produces distinct shortids per row. - INSERT … SELECT auto-fills shortids for each yielded row. **DA gate (written):** - Are the synthesised shortids actually unique across a multi-row INSERT? (The generator should be called per-row, not once.) - Does the auto-fill respect the column's case-preserved name? - What happens if the table has both `serial` and `shortid` PK columns? (Spec: serial via SQLite rowid; shortid via worker post-fill. Verify no double-fill collision.) ### 3e — UPDATE grammar + execution **Scope (in):** - `SQL_UPDATE_SHAPE`: target table, SET assignment list, optional WHERE. - `Command::SqlUpdate`, `Request::RunSqlUpdate`, `do_sql_update`. - Persistence write-through on the target table. **Exit gate:** - Single-column UPDATE with WHERE runs; affected-row count surfaces; CSV reflects changes. - Multi-column UPDATE (`SET a = 1, b = 2`) runs. - UPDATE without WHERE runs (per ADR-0030 §12 — no `--all-rows` guard). - Schema-existence diagnostic fires on unknown columns in SET and WHERE. **DA gate (written):** - Does the SET assignment expression accept the full `sql_expr` (function calls, subqueries, CASE)? Verify on a representative expression. - Does an UPDATE that touches an auto-gen column produce the 3g auto-gen warning? ### 3f — DELETE grammar + execution + cascade summary **Scope (in):** - `SQL_DELETE_SHAPE`: target table, optional WHERE. - `Command::SqlDelete`, `Request::RunSqlDelete`, `do_sql_delete`. - Cascade-summary pre-count and post-execution formatting per §7. - Persistence write-through on target + affected child tables (cascade-affected tables also need re-persistence). **Exit gate:** - DELETE with WHERE runs; affected-row count + cascade summary surface; CSVs of target AND cascade-affected children are re-persisted. - DELETE without WHERE runs (no `--all-rows` rail). - Cascade summary matches the DSL `do_delete` output on the same schema/data. - The cascade-summary code path is shared (no duplicate formatter). **DA gate (written):** - R2 mitigation: does the predicate-extraction byte-range injection work on a WHERE containing a subquery? Test `DELETE FROM a WHERE id IN (SELECT id FROM b WHERE …)`. - Does the cascade pre-count run in a way that doesn't see the post-DELETE state? (Order matters: pre-count BEFORE execute.) ### 3g — RETURNING **Scope (in):** - Add `RETURNING projection_list` to all three top-level shapes. - Walker captures `returning: bool` on the Command variants. - Worker handlers route to `DataResult` collection when `returning == true`. - Result-column type recovery via the Phase-2 column-origin path. **Exit gate:** - `INSERT … RETURNING *` returns the inserted row(s) as a DataResult; affected-row count is implicit (rows.len()). - `UPDATE … RETURNING id, new_value` returns the modified columns. - `DELETE … RETURNING *` returns what was just deleted (BEFORE the row is gone). - Result-column types recovered for bare-column refs (per ADR-0032 Amendment 1). - The cascade summary on a `DELETE … RETURNING` is preserved alongside the result set (both surface to the user). **DA gate (written):** - Does the RETURNING result-set rendering use the same DataResult path SELECT uses? Or did a DML-specific copy sneak in? - Does `RETURNING expr AS alias` admit aliases the same way the projection list does? ### 3h — UPSERT (ON CONFLICT … DO NOTHING / DO UPDATE) **Scope (in):** - Add `on_conflict_clause` to `SQL_INSERT_SHAPE`. - Admit `excluded` as a table alias inside DO UPDATE expressions (§9's clarification). - Walker captures the UPSERT shape; worker executes as-is (no special pre-processing). **Exit gate:** - `INSERT … ON CONFLICT (id) DO NOTHING` runs on a conflicting row; affected-row count is 0. - `INSERT … ON CONFLICT (id) DO UPDATE SET v = excluded.v` runs; the conflicting row is updated to the new value. - `ON CONFLICT DO NOTHING` (no target spec) handles any conflict. - `excluded.col` completes / highlights as a column ref to the target table's columns inside the DO UPDATE expressions. **DA gate (written):** - Does ON CONFLICT … DO UPDATE trigger persistence write-through? (Yes; the target row is modified.) - Is the `excluded` alias scoped to ONLY the DO UPDATE branch, not leaking into the outer INSERT? ### 3i — Diagnostics **Scope (in):** - `diagnostic.insert_arity_mismatch` (§8.1). - `diagnostic.auto_column_overridden` (§8.2). - `diagnostic.not_null_missing` (§8.3). - Per-key positive + negative tests + catalog entries. **Exit gate:** - One positive + one negative test per new key (the ADR-0032 §2d exit gate pattern): - `insert_arity_mismatch`: `INSERT INTO t (a, b) VALUES (1, 2, 3)` fires; matched arity doesn't. - `auto_column_overridden`: `INSERT INTO t (id, name) VALUES (5, 'x')` (id is `serial`) fires; omitting `id` doesn't. - `not_null_missing`: `INSERT INTO t (a) VALUES (1)` (b is NOT NULL no default) fires; including `b` doesn't. - Multi-row arity mismatch emits per-row, not just on the first offending row. - The inherited Phase-2 diagnostic passes (`schema_existence`, `sql_predicate_warnings`, `compound_arity_diagnostics`) still fire on DML's `sql_expr` slots. **DA gate (written):** - Does `auto_column_overridden` produce a Warning, not an Error? (Spec: WARNING — the statement still runs.) - Does `not_null_missing` produce a Warning? (Same.) - Are the catalog wordings engine-neutral? ### 3j — Dispatch wiring on shared entry words **Scope (in):** - Convert `data::INSERT`, `data::UPDATE`, `data::DELETE` shapes to `Choice(SQL_shape, DSL_shape)`. - Remove the experimental separate `data::SqlInsert` / `SqlUpdate` / `SqlDelete` CommandNodes that 3a-3i used as scaffolding. - Verify DSL-only inputs still match the DSL branch in both modes. - Verify SQL-only inputs match the SQL branch in Advanced mode and fail with "this is SQL" in Simple mode. **Exit gate:** - All existing DSL INSERT/UPDATE/DELETE tests still green (unmodified inputs, unmodified outputs). - All Phase-3 SQL DML tests still green via the new dispatch path (the SQL branch is reached through `data::INSERT` etc., not through `data::SqlInsert`). - A structurally-ambiguous input (`delete from t where id = 1`) in Advanced mode routes through the SQL branch (verifiable by checking that the cascade-summary code path fired — DSL and SQL paths share the formatter but the entry point differs). - A DSL-specific input (`delete from t --all-rows`) in Advanced mode falls through to the DSL branch. **DA gate (written):** - Verify that no test was changed to make 3j pass — only the shape's structure changes; the observable behaviour is identical. - Is the test count higher than before 3j? It should be (the scaffolding CommandNodes had their own tests; now the shared CommandNodes carry both). ### 3k — Verification sweep **Scope (in):** - Fill in the cross-cut verification matrix (`docs/plans/-adr-0033-phase-3.md`). - Produce the phase-exit verification report. - DA final review (genuine, not rubber-stamp — see Phase 2's process lesson in `docs/handoff/20260520-handoff-28.md` §4). **Exit gate:** - Every matrix row green with a concrete `file::function` test reference. - Final test run: zero failures, zero skips, no regressions from the Phase-2 baseline of 1446 / 0 / 1. - Clippy clean. - Report at `docs/handoff/-phase-3-verification.md` documents all autonomous decisions with explicit user-confirmation pointers and the DA's written critiques (not just the verdict). **DA gate (written):** - Did filling the matrix surface gaps that need closing? (Phase 2 surfaced THREE; expect at least one for Phase 3.) - Are any sub-phase exit gates above weaker than they should be in hindsight? - Did any sub-phase make an autonomous decision that's not in this ADR? If so, flag for retroactive user approval before phase-exit. ## Out of scope (clarifications already established) - **No engine type names** in or out (ADR-0030 §5 / §7). - **No `STRICT`**, no SQLite storage clauses (ADR-0030 §7). - **No transaction control** (ADR-0030 §13 OOS-4). - **No multi-statement batches** (ADR-0030 §13 OOS-4). - **No `EXPLAIN` of SQL DML** (ADR-0030 §13 OOS-2 — the DSL `explain` still wraps what it already wraps). ## See also - ADR-0005 — the ten-type vocabulary INSERT works with. - ADR-0006 — the auto-snapshot before destructive ops. - ADR-0014 — the DSL DML model + cascade-summary precedent. - ADR-0015 — the persistence write-through path. - ADR-0018 — the DSL INSERT forms (A / B / C). - ADR-0019 — the friendly-error layer DML errors route through. - ADR-0027 — the validity indicator surfaces DML diagnostics. - ADR-0030 — the SQL surface in advanced mode; Phase 3 is this ADR's commission. - ADR-0031 — the SQL expression grammar, consumed by DML WHERE / SET / VALUES / RETURNING. - ADR-0032 — the SQL SELECT grammar; `INSERT … SELECT` and RETURNING reuse its projection-list and scope-frame machinery.