# ADR-0033: The full SQL DML grammar — `INSERT` / `UPDATE` / `DELETE`

## Status

Proposed

## Context

ADR-0030 commissions advanced mode as a body of SQL grammar
inside the unified grammar tree. Phase 1 ("foundations + first
`SELECT`") and Phase 2 ("`SELECT` — full") have shipped through
ADR-0031 and ADR-0032 respectively. The next slice on ADR-0030's
roadmap is **Phase 3 — DML**: `INSERT` / `UPDATE` / `DELETE` as
validated SQL.

ADR-0030 §3 fixed the headline scope:

> **`INSERT`** (single- and multi-row), **`UPDATE`**, **`DELETE`**.

ADR-0030 §4 fixed the execute path:

> **DML and `SELECT`** → executed as the **validated SQL itself**
> (re-rendered canonically from the matched parse, or the validated
> original text). … For DML the worker — knowing the statement kind
> and target table from the parse — runs the statement and
> re-persists that table's CSV.

ADR-0030 §11 fixed the persistence model. ADR-0030 §12 fixed the
safety posture: SQL `UPDATE`/`DELETE` without `WHERE` executes as
written (no `--all-rows` rail).

ADR-0030 §3 also flagged this ADR's existence in advance:

> Settle multi-row `INSERT` and `shortid` auto-fill on a SQL
> `INSERT`.

This ADR fixes the **grammar shape**, the **dispatch
architecture** for shared entry words, the **diagnostic surface**
new to DML, and the **execute-path** Command variants. It does
not revisit ADR-0030's controlling decisions; references in this
text to ADR-0030 §X mean "that decision is controlling."

### What ADR-0030, ADR-0031, ADR-0032 already fix

- One walker, grammar-as-text execution, ambient assistance for
  free (ADR-0030 §1/§4/§6/§8).
- `sql_expr` is the shared expression grammar — UPDATE `SET`
  expressions, WHERE predicates, INSERT VALUES tuples, RETURNING
  projection items all consume it (ADR-0031, ADR-0032 §5/§6).
- `Node::ScopedSubgrammar` + `from_scope_stack` carry CTEs and
  per-statement scope into nested constructs (ADR-0032 §10).
- The diagnostic pipeline — `schema_existence_diagnostics`,
  `sql_predicate_warnings`, `compound_arity_diagnostics`,
  `pending_diagnostics` accumulator — applies uniformly to DML's
  `sql_expr` slots (ADR-0032 §11).
- The friendly-error layer translates engine messages through the
  `engine.*` catalog (ADR-0019, ADR-0030 §7, ADR-0032 §11.5).
- The `__rdbms_*` rejection at every table-source slot
  (ADR-0030 §6, ADR-0032 §1/§4).
- `MAX_SUBGRAMMAR_DEPTH = 64` (ADR-0026) is the recursion budget.

## Decision

### 1. The top-level statement shapes

```
insert_statement   := INSERT INTO table_name
                      [ '(' column_name_list ')' ]
                      ( values_clause | sql_select_compound )
                      [ on_conflict_clause ]
                      [ returning_clause ]
                      [ ';' ]

update_statement   := UPDATE table_name
                      SET assignment_list
                      [ where_clause ]
                      [ returning_clause ]
                      [ ';' ]

delete_statement   := DELETE FROM table_name
                      [ where_clause ]
                      [ returning_clause ]
                      [ ';' ]
```

Where:

```
values_clause      := VALUES value_tuple ( ',' value_tuple )*
value_tuple        := '(' sql_expr ( ',' sql_expr )* ')'
assignment_list    := assignment ( ',' assignment )*
assignment         := column_name '=' sql_expr
returning_clause   := RETURNING projection_list
on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ]
                      ( DO NOTHING
                      | DO UPDATE SET assignment_list [ where_clause ] )
```

- `table_name` is the existing `Tables` ident slot, with
  `reject_internal_table` validator (ADR-0030 §6).
- `column_name_list` is comma-separated `Columns` idents — the
  same shape `data::INSERT` (DSL) uses for Form A. Reuses the
  existing pattern; no new node infrastructure.
- `where_clause` is `WHERE Subgrammar(&sql_expr::SQL_OR_EXPR)`.
- `projection_list` is the Phase-2 projection list, re-used
  unchanged.

The body of an `INSERT … SELECT` recurses through
`Node::Subgrammar(&sql_select::SQL_SELECT_COMPOUND)` — the same
recursion point Phase 2's CTE bodies, scalar subqueries, and
`IN`/`EXISTS` bodies use. The Phase-2 SELECT grammar already
admits an optional leading `WITH` (Amendment 2 / commit
`fd25904`), so `INSERT INTO t WITH x AS (…) SELECT * FROM x`
works for free.

### 2. Dispatch — DSL vs SQL on shared entry words

The entry words `insert`, `update`, `delete` are **shared**
between DSL and SQL. This ADR fixes the dispatch rule **and**
the mechanism that implements it (a known-risk item — see
"Open implementation risks" below).

**Simple mode → DSL only.** SQL constructs (multi-row VALUES,
INSERT…SELECT, RETURNING, UPSERT) attempted in simple mode
produce a `ValidationFailed` "this is SQL" hint, matching the
gating that `select` / `with` already produce (ADR-0030 §2).

**Advanced mode → SQL first, DSL fallback.** Each shared entry
word's `CommandNode.shape` becomes a `Choice` between the SQL
shape and the DSL shape, declared in that order:

```
shape: Node::Choice(&[SQL_INSERT_SHAPE, DSL_INSERT_SHAPE])
```

The walker's `Choice` is first-match-wins. The SQL shape's
constructs are a strict superset of the DSL shape's for these
statements (multi-row VALUES, INSERT…SELECT, RETURNING, UPSERT
are SQL-only; DSL's `--all-rows` flag is DSL-only). Any
structurally-ambiguous input — `delete from t where id = 1` —
matches the SQL shape first and routes through the SQL
machinery.

This is the design point an earlier draft of ADR-0030 §2 left
open ("choice order or a disambiguator decides"). The chosen
disambiguator is `Choice` order: **SQL > DSL in Advanced**. A
DSL-specific construct (e.g., `--all-rows`) won't match the SQL
shape and falls through to DSL — accommodating, not surprising.

Cascade-summary parity (§7) and `shortid` auto-fill parity (§6)
mean a DSL-style INSERT/UPDATE/DELETE in Advanced mode produces
identical observable behaviour whether it routes through the SQL
or DSL path. The user's mental model can be "Advanced mode runs
SQL; DSL syntax still works as a familiar fallback."

#### Mode-gating mechanism

The existing `is_advanced_only` infrastructure
(`src/dsl/grammar/mod.rs::ADVANCED_ONLY_ENTRIES`) gates by
**entry word**, fires in the walker's `dispatch_command_node`
path (`walker/mod.rs:1707`), and emits the "this is SQL"
ValidationFailed verdict. It can't be reused to gate a Choice
*branch* by mode.

Mechanism options for branch-level gating:

- **(a) Mode-aware validator on the SQL branch.** Each
  SQL-shape Seq starts with a no-op `Node::Validated` whose
  validator inspects `ctx.mode`; in Simple mode it returns
  `ValidationError(message_key: "advanced_mode.sql_in_simple")`
  before any structural tokens commit. Trade-off: clean, reuses
  the existing validator path; the failure shape is the same
  `WalkOutcome::ValidationFailed` as the entry-word gate.
- **(b) Mode flag on `Node::Choice`.** A new walker capability:
  `Node::Choice` branches can carry a min-mode marker. Heavier
  (touches the walker's `walk_choice` driver) but lets the
  grammar declare gating without per-branch boilerplate.
- **(c) Two CommandNodes per entry word; registry picks by
  mode.** The dispatcher consults `ctx.mode` and skips the SQL
  CommandNode in Simple mode. Heavier (registry semantics
  change) and surfaces a "multiple CommandNodes per entry word"
  pattern the rest of the codebase doesn't use.

**Default choice: (a) the validator approach.** Smallest
mechanism change. **Walker-capability caveat:** validators
today are an `Ident { validator: Option<fn> }` field — there
is no standalone "guard" node that runs a validator without
consuming input. Sub-phase 3a's first job is to add
`Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)`
to the walker — a zero-consumption node whose only effect is
to fail the Seq with a ValidationError in Simple mode. The
existing internal-table rejection (`reject_internal_table`)
shows the wiring pattern; `Node::Guard` is the
"no-byte-consuming version" of it.

### 3. Multi-row `INSERT … VALUES`

```
INSERT INTO orders (customer_id, total) VALUES (1, 99.50), (2, 12.00), (3, 6.25)
```

The grammar's `values_clause` repeats `value_tuple` with a comma
separator (`Node::Repeated { inner: value_tuple, separator:
Some(COMMA), min: 1 }`). Each tuple matches the same arity as
the `(column_name_list)` (see §8 for the arity diagnostic).

### 4. `INSERT … SELECT`

```
INSERT INTO archive SELECT * FROM orders WHERE created < '2025-01-01'
INSERT INTO summary (id, total) WITH t AS (…) SELECT id, sum(total) FROM t GROUP BY id
```

A `Choice` between `values_clause` and
`Subgrammar(&sql_select::SQL_SELECT_COMPOUND)` after the
optional column list. The Phase-2 SELECT machinery (CTEs, JOINs,
subqueries, set ops) applies as the row source for free.

Arity verification: at parse time, the walker counts the
declared column list (or, when absent, the target table's column
count from the schema cache) against the SELECT's projection
list arity. Mismatch emits `diagnostic.insert_arity_mismatch`
(see §8).

### 5. `RETURNING`

`RETURNING projection_list` admits the Phase-2 projection_list
unchanged — column refs, `*`, `t.*`, expressions with `AS`
aliases, computed expressions.

Execution: the `Command::Sql{Insert,Update,Delete}` variants
carry a `returning: bool` flag set by the walker's
`ast_builder` when a `RETURNING` clause matched. The worker
reads this flag (no SQL text re-parsing) and routes:

- `returning == true` → prepare + step the statement,
  collect rows into a `DataResult` (per-result-column type
  recovery via the same column-origin path Phase 2 introduced,
  ADR-0032 §12 + Amendment 1).
- `returning == false` → execute the statement,
  collect the affected-row count.

The flag is sufficient because the *shape* of the projection
is already in the SQL the engine prepares; the worker doesn't
need to re-parse to know which columns to read.

### 6. `shortid` auto-fill — worker post-fills

Parity with the DSL `do_insert` handler (ADR-0005 / ADR-0018).

For an `INSERT` whose `(column_name_list)` omits one or more
`shortid` columns, the worker:

1. Inspects the table metadata to find the auto-generated columns
   (`serial`, `shortid`) NOT in the user's column list.
2. For each row in the VALUES list (single or multi-row), and for
   `INSERT … SELECT` each yielded row, synthesises a fresh
   `shortid` value (the existing `shortid::generate` path).
3. Constructs the parameterised INSERT (or executes a rewrite of
   the SQL with the values bound positionally) and runs it.

The engine sees a fully-specified row; the user's submitted SQL
is unmodified in `history.log` (ADR-0030 §11). The walker
recognises an explicit value provided for an auto-gen column —
that's an explicit override, not a missing slot — and emits a
WARNING (§8b) but lets it through.

`serial` columns rely on SQLite's `INTEGER PRIMARY KEY` rowid
auto-increment — no worker post-fill needed; the engine
populates the value on its own.

### 7. Cascade summary on `DELETE`

Parity with the DSL `do_delete` handler (ADR-0014).

When `Command::SqlDelete { target_table, sql, … }` runs:

1. **Pre-execution cascade pre-count.** For each child table
   that has an FK referencing `target_table` with
   `ON DELETE CASCADE` (or `SET NULL`), the worker runs a
   pre-count query of the form `SELECT count(*) FROM <child>
   WHERE <fk_col> IN (SELECT <pk_col> FROM <target_table>
   WHERE <user_where_predicate>)`. The `<user_where_predicate>`
   is extracted from the SQL — see "Predicate extraction" below.
2. The worker executes the DELETE.
3. **Post-execution summary.** Per-relationship row counts +
   cascade impact, formatted by the same cascade-summary
   formatter the DSL path uses.

**Predicate extraction.** The DSL handler reuses the typed
`Expr` AST to construct the pre-count subqueries. For SQL
DELETE the walker captures the WHERE clause's source byte
range from the matched path and re-injects that text into
the pre-count SQL. This is grammar-as-text reuse, consistent
with ADR-0030 §4's posture; no AST construction required.

When the SQL DELETE has no WHERE, the pre-count is unbounded
("all rows of `target_table`"). Cost: one extra count(*) per
child table; cheaper than the DSL parallel path on a typical
schema.

The cascade-summary formatter itself (`format_cascade_summary`
in the DSL handler) is shared with `do_sql_delete` — same
output, same i18n keys.

### 8. Diagnostics new to Phase 3

Three new keys, classified per ADR-0027's ERROR / WARNING model.

#### 8.1. `diagnostic.insert_arity_mismatch` (ERROR)

Walker-level pre-flight when the declared `(column_name_list)`'s
arity disagrees with a `value_tuple`'s arity (or, for `INSERT …
SELECT`, with the SELECT's projection list arity, counted at
parse time via the same projection-counting infrastructure
Phase-2's `compound_arity_diagnostics` uses).

For multi-row VALUES, **each** offending row emits its own
diagnostic on that row's `value_tuple` span. A query with five
rows and three wrong arities produces three diagnostics; the
two right rows are silent. Pre-empts the engine's less-helpful
"X values for Y columns" error.

For `INSERT … SELECT`, the diagnostic anchors on the SELECT's
first `projection_item` span. The Phase-2 projection-count
infrastructure (the comma-counting at depth = leg-depth used by
`compound_arity_diagnostics`) is the underlying mechanism.

#### 8.2. `diagnostic.auto_column_overridden` (WARNING)

Walker-level when an INSERT explicitly assigns a value to a
column whose type is `serial` or `shortid`. Pedagogical: the
explicit value bypasses the auto-counter / generator, which can
collide with future auto-generated values.

Catalog wording: `"column \`{column}\` is auto-generated
(\`{type}\`); providing an explicit value bypasses the
auto-counter and may collide with later auto-generated values"`.

The statement still runs — the warning is advisory, matching
the Phase-2 predicate-warning posture (ADR-0027 §1 /
ADR-0032 §11.6).

#### 8.3. `diagnostic.not_null_missing` (WARNING)

Walker-level when an INSERT's `(column_name_list)` omits a
column declared `NOT NULL` with no `DEFAULT`. Pedagogical
nudge — the engine will reject the statement, but the
walker can pre-flight the missing column at parse time.

Catalog wording: `"column \`{column}\` is required (\`NOT NULL\`,
no default); the statement will fail when run"`.

The walker doesn't BLOCK the statement (it's structurally
valid). The engine does. The friendly layer translates the
engine error via the existing NOT-NULL path.

#### 8.4. Inherited diagnostics

The Phase-2 diagnostic passes apply to DML's `sql_expr` slots
without further work:

- `schema_existence_diagnostics` flags unknown columns in WHERE,
  SET, RETURNING, VALUES tuples.
- `sql_predicate_warnings` flags `= NULL`, `LIKE`-on-numeric,
  type-mismatched comparisons in WHERE and CASE expressions.
- The `pending_diagnostics` accumulator path stays available
  for in-walk emits.

### 9. UPSERT — `ON CONFLICT … DO NOTHING / DO UPDATE`

```
INSERT INTO t (id, name) VALUES (1, 'x')
ON CONFLICT (id) DO UPDATE SET name = excluded.name

INSERT INTO t (id, name) VALUES (1, 'x')
ON CONFLICT DO NOTHING
```

Grammar shape:

```
on_conflict_clause := ON CONFLICT [ '(' column_name_list ')' ]
                      ( DO NOTHING
                      | DO UPDATE SET assignment_list [ where_clause ] )
```

The optional `(column_name_list)` is the conflict target —
which columns' unique constraint to react to. Standard SQL allows
this list to be empty (`ON CONFLICT DO NOTHING` is shorthand for
"any conflict").

The DO UPDATE branch reuses the same `assignment_list` and
optional `where_clause` shapes as the top-level UPDATE statement.

Inside DO UPDATE expressions, the keyword `excluded` is a
pseudo-table reference to the would-have-been-inserted row.
**Note:** `excluded` is the SQLite + PostgreSQL spelling;
standard SQL (`MERGE … WHEN MATCHED`) uses a different model.
The playground commits to the `excluded` form because it is what
SQLite supports and what the project's pedagogical target dialect
uses (ADR-0001 / ADR-0002 — the engine is SQLite, even though
its name is hidden from user-facing strings). This is a
deliberate carve-out from ADR-0030 §7's engine-neutral posture,
justified by the absence of a standard-SQL UPSERT spelling that
both SQLite and PostgreSQL share.

The walker admits `excluded` as a table alias for
ident-completion purposes inside the DO UPDATE branch — its
columns are the target table's columns.

### 10. Execute path — three typed Command variants

`Command` gains three variants:

```
Command::SqlInsert {
    sql: String,
    source: Option<String>,         // for history.log
    target_table: String,           // for re-persistence
    listed_columns: Option<Vec<String>>,  // None if no col-list provided
}

Command::SqlUpdate {
    sql: String,
    source: Option<String>,
    target_table: String,
}

Command::SqlDelete {
    sql: String,
    source: Option<String>,
    target_table: String,
}
```

The walker's `ast_builder` for each shape extracts
`target_table` and `listed_columns` from the matched path. The
`sql` field is the validated source text the engine executes.

`Database` gains three request types:

```
Request::RunSqlInsert { command, reply }
Request::RunSqlUpdate { command, reply }
Request::RunSqlDelete { command, reply }
```

Each worker handler:

1. Applies statement-specific pre-execution work (shortid
   auto-fill for `RunSqlInsert`; cascade pre-count for
   `RunSqlDelete`).
2. Prepares + executes the statement. If RETURNING is present,
   collects the result set as a `DataResult`.
3. Re-persists the target table's CSV (ADR-0030 §11) using the
   existing persistence write-through machinery.
4. Returns the result (DataResult for RETURNING; affected-row
   count + cascade summary for plain DELETE; affected-row count
   for plain INSERT/UPDATE).

History logging: the worker writes the literal submitted line
to `history.log` (ADR-0030 §11) before execution, exactly as
`run_select` does.

### 11. Engine-error translation

Three new catalog keys (the friendly-error layer's engine.*
translation set, ADR-0019 / ADR-0030 §7):

| Key                                       | Engine condition                                                |
|-------------------------------------------|-----------------------------------------------------------------|
| `engine.unique_constraint_failed`         | The engine's `UNIQUE constraint failed: t.c` (already exists; will be re-used as-is for UPSERT-without-conflict-target cases) |
| `engine.not_null_constraint_failed`       | Already exists; covers SQL-side NOT-NULL failures uniformly     |
| `engine.foreign_key_constraint_failed`    | Already exists; SQL DELETE that violates FK without cascade     |

No NEW engine.* keys are required for Phase 3 — the existing FK
/ UNIQUE / NOT-NULL translation paths cover the engine-side DML
errors. Phase 3 reuses them.

### 12. Result-column type recovery on RETURNING

A RETURNING projection that's a bare column reference recovers
its playground type via the same column-origin metadata path
Phase 2 introduced (`resolve_select_column_types` in `db.rs`,
ADR-0032 §12 + Amendment 1).

Computed expressions in RETURNING stay typeless. The renderer
uses neutral alignment for typeless columns, consistent with
SELECT.

### 13. Out of scope

| ID    | OOS                                                                                  |
|-------|--------------------------------------------------------------------------------------|
| OOS-1 | `INSERT INTO t DEFAULT VALUES` (the project's planned seed feature covers this)      |
| OOS-2 | `INSERT OR REPLACE` / `OR IGNORE` / `OR ABORT` / `OR FAIL` / `OR ROLLBACK` (SQLite-specific) |
| OOS-3 | `UPDATE … FROM other_table` (SQLite/PostgreSQL extension)                           |
| OOS-4 | `WITH x AS (…) UPDATE …` / `WITH x AS (…) DELETE …` (deferrable; SELECT-side WITH is in scope) |
| OOS-5 | Indexed-by / not-indexed table-source modifiers (SQLite-specific)                   |
| OOS-6 | Multi-statement batches (already OOS via ADR-0030 §13 OOS-4)                        |

OOS-4 is a deliberate choice: WITH-prefixed DML is standard SQL
since SQL:2003, but its execution semantics (the WITH-defined
CTE is visible to the DML's WHERE / SET / RETURNING expressions)
require additional scope-frame plumbing that doesn't yet exist
on the DML side. The Phase-2 grammar admits WITH at the
SELECT level; Phase 3 keeps WITH a SELECT-only prefix. A later
phase can lift this restriction with a clean ADR Amendment.

## Consequences

- The walker gains a new node variant `Node::Guard` (§2
  mechanism caveat) — a zero-byte-consumption gating node that
  can fail its enclosing Seq with a `ValidationError`. The
  walker driver's exhaustive `walk_node_inner` match grows one
  arm. Reusable beyond DML (Phase 4 DDL's similar shared-entry
  problem is a likely future consumer).
- `Command` gains three variants (`SqlInsert`, `SqlUpdate`,
  `SqlDelete`); every exhaustive `match Command` callsite picks
  up three new arms (the recurring ADR-0028/0029 gotcha).
- The DSL `data::INSERT` / `data::UPDATE` / `data::DELETE`
  CommandNodes' shapes become `Choice(SQL_shape, DSL_shape)`
  with the SQL branch mode-gated via a no-op leading validator
  (R1).
- Three new walker diagnostics with three new catalog keys
  (`diagnostic.insert_arity_mismatch`,
  `diagnostic.auto_column_overridden`,
  `diagnostic.not_null_missing`).
- Three new `Request` variants
  (`RunSqlInsert`/`RunSqlUpdate`/`RunSqlDelete`) and three new
  worker handlers
  (`do_sql_insert`/`do_sql_update`/`do_sql_delete`).
- The persistence write-through path gains per-DML-kind entry
  points; the cascade-affected re-persistence in `do_sql_delete`
  is new (cascade-affected child tables are also re-persisted).
- `RETURNING` lands a write-statement-returns-rows code path
  the project hasn't had before. Worker handlers either return
  affected-row count or a `DataResult` keyed by the
  `returning: bool` flag.
- The cascade-summary formatter is reused between the DSL and
  SQL DELETE paths; the predicate-extraction byte-range
  injection (§7) is new code with the R2 mitigation.
- `shortid` auto-fill code in `do_sql_insert` mirrors the DSL
  `do_insert` mechanism but takes a `listed_columns` slice from
  the parse rather than a typed `Command::Insert`.
- The auto-snapshot machinery (ADR-0006) is unchanged but
  fires for SQL DML the same way it does for DSL DML — the
  snapshot trigger lives in the worker handlers.
- All Phase-2 tests stay green (the SQL grammar pieces this ADR
  builds on are unchanged).
- The DSL-vs-SQL dispatch mechanism (3a) is a foundation other
  projects in this codebase may need (e.g., Phase 4 DDL). Land
  it generically so it's reusable.

## Initial DA review

A first-pass DA review of this ADR's draft surfaced ten
critiques. The seven that surfaced real gaps were resolved in
the body before this status moved to Proposed:

1. OOS-4 (WITH-prefixed DML) under-justified → R4 risk
   describes the mechanism gap concretely.
2. Dispatch mechanism hand-waved → §2's "Mode-gating
   mechanism" subsection enumerates three options and commits
   to the validator approach with R1 as the mitigation if it
   fails.
3. RETURNING execution hand-waved → §5 commits to a
   `returning: bool` flag on the Command variants.
4. Cascade-summary code-sharing claim too optimistic → §7's
   "Predicate extraction" subsection details the byte-range
   injection mechanism; R2 budgets for the
   performance-pathology case.
5. `excluded` keyword's engine-neutrality status → §9 carves
   it out explicitly as a justified non-neutral choice.
6. Multi-row INSERT arity diagnostic emit pattern unspecified
   → §8.1 clarifies "each offending row emits its own
   diagnostic" plus the INSERT…SELECT span.
7. Sub-phase ordering put dispatch (the riskiest piece) last
   → reordered so 3a lands the mechanism FIRST on a no-op
   shape.

The remaining three critiques were either process points (DA
gates at sub-phase boundaries — added per-sub-phase in
Implementation notes) or already-covered (verification matrix —
will live in the plan doc, matching Phase 2's pattern).

Per CLAUDE.md / the Phase 2 process lesson, the DA review of
THIS ADR will get a second pass before status moves to
Accepted. Pin: do not rubber-stamp.

## Open implementation risks

Recorded explicitly so the implementation plan can budget for
them, not to allow silent drift later.

### R1. Mode-gated `Choice` branch mechanism unproven

§2's option (a) — a no-op validator at the head of the SQL
shape's Seq, rejecting in Simple mode — is plausible but
untested. If the walker's `walk_seq` commits the Seq before the
validator runs, the SQL branch's NoMatch becomes Failed and
the Choice can't fall through to the DSL branch.

**Mitigation:** sub-phase 3a lands the mechanism end-to-end on
the smallest viable shape (a single `SqlNop` CommandNode whose
shape is `Choice(SQL_branch[mode-gated-validator + token],
DSL_branch[different-token])`) before any DML grammar lands. If
the validator approach doesn't work, the alternatives (a
mode-aware `Node::Choice` variant, or split-CommandNode
dispatch) get evaluated before sub-phases 3b/3c/3d depend on
the mechanism.

### R2. Cascade-summary predicate extraction

§7 says the worker re-injects the WHERE-clause source bytes into
pre-count subqueries. This is straightforward when the WHERE
clause is a top-level predicate, but a future user might write
WHERE clauses that themselves contain subqueries (e.g.,
`DELETE FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE …)`).
The byte-range injection still works structurally, but a
performance pathology is possible if the subquery is expensive
and the cascade child count is large (the same subquery runs N+1
times).

**Mitigation:** sub-phase 3c benchmarks the pre-count overhead
on a representative schema before declaring the path acceptable.
If the overhead is prohibitive, fall back to "report cascade
count post-execution only" (less informative, but cheaper).

### R3. RETURNING result-column type recovery on multi-row INSERT

§5 says result columns get the same column-origin type recovery
SELECT uses. For a single-row INSERT … RETURNING this works
identically to a SELECT. For multi-row INSERT, the engine emits
one result row per inserted row; the column-origin metadata is
the same for all (per ADR-0032 Amendment 1, column-origin is a
property of the prepared statement, not the rows).

No risk here — calling it out so the implementer doesn't
re-derive the question. The empty-table type-recovery test
already pins this invariant.

### R4. `INSERT … SELECT` with a CTE row source

The Phase-2 SELECT grammar already admits `WITH … SELECT …` at
the compound level (ADR-0032 §10.3 / Amendment 2). The
`SQL_SELECT_COMPOUND` recursion point §1 uses pulls this in for
free. `INSERT INTO archive WITH t AS (…) SELECT * FROM t` parses
and runs.

A user might reasonably expect WITH-prefixed DML to also work
(see §13 OOS-4). That's explicitly out of scope for Phase 3;
the discrepancy is documented here so the implementer doesn't
need to figure it out from the grammar shape.

## Implementation notes

Phased sub-plan with **explicit exit gates per sub-phase**. The
implementation plan doc (`docs/plans/<date>-adr-0033-phase-3.md`,
modelled on `docs/plans/20260520-adr-0032-phase-2.md`) refines
these and adds the cross-cut verification matrix in 3i.

**Ordering rationale.** The dispatch mechanism (3a) is the
foundation; if it fails to work, every later sub-phase has to
be reworked. Land it first on a no-op shape, then build the DML
grammar on top.

### 3a — `Node::Guard` + mode-gated Choice mechanism

**Scope (in):**
- Add a new walker node variant
  `Node::Guard(fn(&WalkContext) -> Result<(), ValidationError>)`
  (per §2 mechanism caveat) — zero-byte consumption, fails the
  enclosing Seq with `ValidationFailed` when the validator
  returns Err.
- Implement `walk_guard` in `src/dsl/walker/driver.rs`.
- Add the `reject_sql_in_simple_mode` guard function.
- Build a smoke-test grammar: an experimental `CommandNode`
  with `shape: Choice(Seq[Guard(reject_sql_in_simple_mode),
  SQL_branch_token], DSL_branch_token)`. Verify the Choice
  falls through cleanly in Simple mode (DSL wins) and routes
  SQL-first in Advanced mode.

**Scope (out):**
- Any real DML grammar (3b onwards).
- Wiring the existing `data::INSERT/UPDATE/DELETE` shapes (3j).

**Exit gate (required green tests):**
- `Node::Guard` admits via `walk_guard`; the walker driver's
  match arm exists and is tested.
- Smoke-test grammar in Simple mode → DSL branch matches.
- Smoke-test grammar in Advanced mode + SQL-shape input → SQL
  branch matches.
- Smoke-test grammar in Simple mode + SQL-shape input →
  `ValidationFailed` with `message_key:
  "advanced_mode.sql_in_simple"`.
- Smoke-test grammar in Advanced mode + DSL-shape input →
  DSL branch matches (Choice fall-through works even when SQL
  branch is admissible).
- All Phase-2 tests (1446 baseline) still green.
- Clippy clean.

**DA gate (written):**
- Does the Guard actually fire BEFORE the Seq commits? Failure
  mode: the Seq processes the Guard, the Guard fails, but the
  Seq has already "committed" to its branch and the Choice
  can't fall through. Test specifically: Advanced mode + an
  input that the DSL branch matches but the SQL branch's
  remaining tokens reject → must fall through to DSL, not
  Failed.
- Is the Guard mechanism reusable beyond DML (e.g., for
  Phase 4 DDL's similar dispatch)? If yes, document the
  pattern in `src/dsl/grammar/mod.rs` doc comments so the
  next phase doesn't reinvent it.
- If the Guard mechanism doesn't work (R1's worst case), the
  sub-phase REOPENS with mechanism option (b) or (c)
  evaluated before any DML grammar lands on top of broken
  scaffolding.

### 3b — INSERT grammar + minimal execution

**Scope (in):**
- `SQL_INSERT_SHAPE`: single-row + multi-row VALUES; the
  optional `(column_name_list)`; the `__rdbms_*` rejection on
  the table-name slot.
- `Command::SqlInsert { sql, source, target_table,
  listed_columns, returning: false }` — `returning` always
  false in 3b.
- `Request::RunSqlInsert` + `do_sql_insert` worker handler
  (no shortid auto-fill yet — explicit values required).
- Persistence write-through on the target table after
  execution.
- History-log of the literal submitted line.

**Scope (out):**
- `INSERT … SELECT` (3c).
- `RETURNING` (3e).
- `UPSERT` (3f).
- `shortid` auto-fill (3d).
- Diagnostics (3g).
- DSL-vs-SQL dispatch wiring on `data::INSERT` (3h —
  development uses a separate `data::SqlInsert` CommandNode).

**Exit gate (required green tests):**
- **Single-row INSERT** runs end-to-end; affected-row count
  surfaces; CSV is re-persisted.
- **Multi-row INSERT** (`INSERT INTO t VALUES (1, 'a'), (2, 'b')`)
  runs; both rows land; CSV reflects them.
- **`(column_name_list)`** variant works (`INSERT INTO t (a, b)
  VALUES (1, 'x')`).
- **`__rdbms_*` rejection** at the INSERT target-table slot
  (per ADR-0030 §6 / ADR-0032 §1).
- **History line** is the literal submitted SQL.
- All Phase-2 tests (1446 baseline) still green.

**DA gate (written):**
- Does an INSERT failing mid-multi-row leave persistence in a
  consistent state? (Engine transactional behaviour applies;
  worker must not re-persist on failure.)
- Does the `target_table` extraction handle case-folding the
  same way the rest of the codebase does?
- Are the existing DSL INSERT tests still green? Any DSL-side
  regression means 3b broke something it shouldn't have.

### 3c — INSERT … SELECT

**Scope (in):**
- Add the row-source `Choice` to `SQL_INSERT_SHAPE`:
  `values_clause | Subgrammar(&sql_select::SQL_SELECT_COMPOUND)`.
- Update `do_sql_insert` to handle the SELECT-driven path
  (no special worker work needed beyond preparing the
  composite statement; the engine handles the
  insert-from-query flow).

**Exit gate:**
- `INSERT INTO archive SELECT * FROM orders WHERE …` runs.
- `INSERT INTO target (a, b) SELECT x, y FROM source` runs.
- `INSERT INTO t WITH x AS (…) SELECT * FROM x` runs
  (R4 invariant — Phase-2 CTE prefix on SELECT applies).
- Schema-existence diagnostics fire on the SELECT's projection
  (Phase-2 machinery, free).

**DA gate (written):**
- Does the WITH-prefixed SELECT row source actually parse
  through SQL_SELECT_COMPOUND, or does the INSERT prefix
  break the recursion? Test on a real `WITH … INSERT … SELECT`.
- Does the cascade-summary path (3e DELETE) accidentally apply
  to INSERTs whose SELECT touches the target table? (It must
  not — INSERTs have no cascade.)

### 3d — `shortid` auto-fill (worker)

**Scope (in):**
- Detect omitted `shortid` columns at execution time in
  `do_sql_insert`.
- Synthesise fresh shortid values; bind them positionally.
- Cover both VALUES and SELECT row sources (for SELECT, the
  shortid columns are added to the projection's bind list
  before execution).

**Exit gate:**
- `INSERT INTO t (name) VALUES ('x')` where `t` has a `pk
  shortid` column auto-fills that PK.
- An explicit value for a shortid column is respected
  (override behaviour — warning fires per 3g but the value
  is preserved).
- Multi-row INSERT produces distinct shortids per row.
- INSERT … SELECT auto-fills shortids for each yielded row.

**DA gate (written):**
- Are the synthesised shortids actually unique across a
  multi-row INSERT? (The generator should be called per-row,
  not once.)
- Does the auto-fill respect the column's case-preserved name?
- What happens if the table has both `serial` and `shortid`
  PK columns? (Spec: serial via SQLite rowid; shortid via
  worker post-fill. Verify no double-fill collision.)

### 3e — UPDATE grammar + execution

**Scope (in):**
- `SQL_UPDATE_SHAPE`: target table, SET assignment list,
  optional WHERE.
- `Command::SqlUpdate`, `Request::RunSqlUpdate`,
  `do_sql_update`.
- Persistence write-through on the target table.

**Exit gate:**
- Single-column UPDATE with WHERE runs; affected-row count
  surfaces; CSV reflects changes.
- Multi-column UPDATE (`SET a = 1, b = 2`) runs.
- UPDATE without WHERE runs (per ADR-0030 §12 — no
  `--all-rows` guard).
- Schema-existence diagnostic fires on unknown columns in
  SET and WHERE.

**DA gate (written):**
- Does the SET assignment expression accept the full
  `sql_expr` (function calls, subqueries, CASE)? Verify on a
  representative expression.
- Does an UPDATE that touches an auto-gen column produce the
  3g auto-gen warning?

### 3f — DELETE grammar + execution + cascade summary

**Scope (in):**
- `SQL_DELETE_SHAPE`: target table, optional WHERE.
- `Command::SqlDelete`, `Request::RunSqlDelete`,
  `do_sql_delete`.
- Cascade-summary pre-count and post-execution formatting
  per §7.
- Persistence write-through on target + affected child tables
  (cascade-affected tables also need re-persistence).

**Exit gate:**
- DELETE with WHERE runs; affected-row count + cascade
  summary surface; CSVs of target AND cascade-affected
  children are re-persisted.
- DELETE without WHERE runs (no `--all-rows` rail).
- Cascade summary matches the DSL `do_delete` output on the
  same schema/data.
- The cascade-summary code path is shared (no duplicate
  formatter).

**DA gate (written):**
- R2 mitigation: does the predicate-extraction byte-range
  injection work on a WHERE containing a subquery? Test
  `DELETE FROM a WHERE id IN (SELECT id FROM b WHERE …)`.
- Does the cascade pre-count run in a way that doesn't see
  the post-DELETE state? (Order matters: pre-count BEFORE
  execute.)

### 3g — RETURNING

**Scope (in):**
- Add `RETURNING projection_list` to all three top-level
  shapes.
- Walker captures `returning: bool` on the Command variants.
- Worker handlers route to `DataResult` collection when
  `returning == true`.
- Result-column type recovery via the Phase-2 column-origin
  path.

**Exit gate:**
- `INSERT … RETURNING *` returns the inserted row(s) as a
  DataResult; affected-row count is implicit (rows.len()).
- `UPDATE … RETURNING id, new_value` returns the modified
  columns.
- `DELETE … RETURNING *` returns what was just deleted
  (BEFORE the row is gone).
- Result-column types recovered for bare-column refs
  (per ADR-0032 Amendment 1).
- The cascade summary on a `DELETE … RETURNING` is preserved
  alongside the result set (both surface to the user).

**DA gate (written):**
- Does the RETURNING result-set rendering use the same
  DataResult path SELECT uses? Or did a DML-specific copy
  sneak in?
- Does `RETURNING expr AS alias` admit aliases the same way
  the projection list does?

### 3h — UPSERT (ON CONFLICT … DO NOTHING / DO UPDATE)

**Scope (in):**
- Add `on_conflict_clause` to `SQL_INSERT_SHAPE`.
- Admit `excluded` as a table alias inside DO UPDATE
  expressions (§9's clarification).
- Walker captures the UPSERT shape; worker executes as-is
  (no special pre-processing).

**Exit gate:**
- `INSERT … ON CONFLICT (id) DO NOTHING` runs on a
  conflicting row; affected-row count is 0.
- `INSERT … ON CONFLICT (id) DO UPDATE SET v = excluded.v`
  runs; the conflicting row is updated to the new value.
- `ON CONFLICT DO NOTHING` (no target spec) handles any
  conflict.
- `excluded.col` completes / highlights as a column ref to the
  target table's columns inside the DO UPDATE expressions.

**DA gate (written):**
- Does ON CONFLICT … DO UPDATE trigger persistence
  write-through? (Yes; the target row is modified.)
- Is the `excluded` alias scoped to ONLY the DO UPDATE
  branch, not leaking into the outer INSERT?

### 3i — Diagnostics

**Scope (in):**
- `diagnostic.insert_arity_mismatch` (§8.1).
- `diagnostic.auto_column_overridden` (§8.2).
- `diagnostic.not_null_missing` (§8.3).
- Per-key positive + negative tests + catalog entries.

**Exit gate:**
- One positive + one negative test per new key (the
  ADR-0032 §2d exit gate pattern):
  - `insert_arity_mismatch`: `INSERT INTO t (a, b) VALUES
    (1, 2, 3)` fires; matched arity doesn't.
  - `auto_column_overridden`: `INSERT INTO t (id, name)
    VALUES (5, 'x')` (id is `serial`) fires; omitting `id`
    doesn't.
  - `not_null_missing`: `INSERT INTO t (a) VALUES (1)`
    (b is NOT NULL no default) fires; including `b` doesn't.
- Multi-row arity mismatch emits per-row, not just on the
  first offending row.
- The inherited Phase-2 diagnostic passes
  (`schema_existence`, `sql_predicate_warnings`,
  `compound_arity_diagnostics`) still fire on DML's
  `sql_expr` slots.

**DA gate (written):**
- Does `auto_column_overridden` produce a Warning, not an
  Error? (Spec: WARNING — the statement still runs.)
- Does `not_null_missing` produce a Warning? (Same.)
- Are the catalog wordings engine-neutral?

### 3j — Dispatch wiring on shared entry words

**Scope (in):**
- Convert `data::INSERT`, `data::UPDATE`, `data::DELETE`
  shapes to `Choice(SQL_shape, DSL_shape)`.
- Remove the experimental separate `data::SqlInsert` /
  `SqlUpdate` / `SqlDelete` CommandNodes that 3a-3i used as
  scaffolding.
- Verify DSL-only inputs still match the DSL branch in both
  modes.
- Verify SQL-only inputs match the SQL branch in Advanced
  mode and fail with "this is SQL" in Simple mode.

**Exit gate:**
- All existing DSL INSERT/UPDATE/DELETE tests still green
  (unmodified inputs, unmodified outputs).
- All Phase-3 SQL DML tests still green via the new dispatch
  path (the SQL branch is reached through `data::INSERT` etc.,
  not through `data::SqlInsert`).
- A structurally-ambiguous input (`delete from t where id =
  1`) in Advanced mode routes through the SQL branch
  (verifiable by checking that the cascade-summary code path
  fired — DSL and SQL paths share the formatter but the entry
  point differs).
- A DSL-specific input (`delete from t --all-rows`) in
  Advanced mode falls through to the DSL branch.

**DA gate (written):**
- Verify that no test was changed to make 3j pass — only the
  shape's structure changes; the observable behaviour is
  identical.
- Is the test count higher than before 3j? It should be (the
  scaffolding CommandNodes had their own tests; now the
  shared CommandNodes carry both).

### 3k — Verification sweep

**Scope (in):**
- Fill in the cross-cut verification matrix
  (`docs/plans/<date>-adr-0033-phase-3.md`).
- Produce the phase-exit verification report.
- DA final review (genuine, not rubber-stamp — see Phase 2's
  process lesson in `docs/handoff/20260520-handoff-28.md` §4).

**Exit gate:**
- Every matrix row green with a concrete `file::function`
  test reference.
- Final test run: zero failures, zero skips, no regressions
  from the Phase-2 baseline of 1446 / 0 / 1.
- Clippy clean.
- Report at `docs/handoff/<date>-phase-3-verification.md`
  documents all autonomous decisions with explicit
  user-confirmation pointers and the DA's written critiques
  (not just the verdict).

**DA gate (written):**
- Did filling the matrix surface gaps that need closing?
  (Phase 2 surfaced THREE; expect at least one for Phase 3.)
- Are any sub-phase exit gates above weaker than they should
  be in hindsight?
- Did any sub-phase make an autonomous decision that's not
  in this ADR? If so, flag for retroactive user approval
  before phase-exit.

## Out of scope (clarifications already established)

- **No engine type names** in or out (ADR-0030 §5 / §7).
- **No `STRICT`**, no SQLite storage clauses (ADR-0030 §7).
- **No transaction control** (ADR-0030 §13 OOS-4).
- **No multi-statement batches** (ADR-0030 §13 OOS-4).
- **No `EXPLAIN` of SQL DML** (ADR-0030 §13 OOS-2 — the DSL
  `explain` still wraps what it already wraps).

## See also

- ADR-0005 — the ten-type vocabulary INSERT works with.
- ADR-0006 — the auto-snapshot before destructive ops.
- ADR-0014 — the DSL DML model + cascade-summary precedent.
- ADR-0015 — the persistence write-through path.
- ADR-0018 — the DSL INSERT forms (A / B / C).
- ADR-0019 — the friendly-error layer DML errors route through.
- ADR-0027 — the validity indicator surfaces DML diagnostics.
- ADR-0030 — the SQL surface in advanced mode; Phase 3 is this
  ADR's commission.
- ADR-0031 — the SQL expression grammar, consumed by DML
  WHERE / SET / VALUES / RETURNING.
- ADR-0032 — the SQL SELECT grammar; `INSERT … SELECT` and
  RETURNING reuse its projection-list and scope-frame machinery.