
# ADR-0030: Advanced mode — the standard-SQL surface

## Status

Accepted

## Context

ADR-0003 split the input field into two modes. **Simple mode**
(the default) takes the teaching DSL; **advanced mode** was
specified to take "raw SQL, including DDL and queries". The DSL
half is fully built (ADR-0009, ADR-0023/0024, and everything
since); advanced mode is still a **placeholder** — a submitted
line is echoed back unexecuted.

Requirement `Q1` commits to a *defined* SQL subset, and `Q2` to
rejecting out-of-subset syntax clearly; `Q4` is the subset
specification, which this ADR provides. Two constraints shape
every decision below; both come from how this project already works.

1. **The engine is an implementation detail.** ADR-0002
   established that the database product is never named in
   user-facing strings. Advanced mode must *extend* that
   posture: it is a way to work with **standard SQL**, as
   independent of the storage engine as we can make it — not a
   console onto the engine. The engine's type names, its
   `STRICT` keyword, its dialect quirks, and its raw error text
   must not surface. And handing typed text straight to the
   engine would bypass the typed executor that keeps the
   internal metadata tables (ADR-0012/0013) in sync, writes
   `project.yaml` + CSV (ADR-0015), and preserves the
   playground's rich type vocabulary (ADR-0005).

2. **Assistance comes from one place.** Completion, syntax
   highlighting, hint-panel prose, the `[ERR]`/`[WRN]`
   indicator, and per-command parse-error usage all derive
   from a single **unified grammar tree** walked incrementally
   (ADR-0022/0023/0024 — explicitly "the single source of
   truth"). A *batch* SQL parser — the kind `sqlparser-rs`
   (reserved in ADR-0001) is — produces an AST and nothing
   else: it cannot say what is valid at the cursor, cannot
   drive completion, highlighting, or hints. Parsing SQL with
   such a library would leave advanced mode either *without*
   the ambient assistance the DSL has, or dependent on a
   second, parallel assistance system — both contrary to
   ADR-0023/0024.

The decision: **SQL is not parsed by a separate library. SQL
becomes additional grammar within the unified tree**, walked by
the same walker as the DSL. Advanced mode is not a different
parser — it is the same parser with more grammar unlocked.

## Decision

### 1. SQL lives in the unified grammar tree

SQL statements are authored as `CommandNode` / `Node` grammar
in the ADR-0024 tree and parsed by the existing walker. The
consequence is the whole point: completion, highlighting,
hint prose, the validity indicator, and parse-error usage
**work for SQL exactly as for the DSL, for free**, because
they are all walker outputs (§8).

`sqlparser-rs` is therefore *not* used as the parser;
ADR-0001's reservation of it is superseded. (An implementer
may retain it narrowly as a test oracle — parse the same SQL,
compare — but it is not on the execution path.)

The honest cost: the supported SQL is exactly what we author
into the tree — we are, in effect, writing a SQL grammar. This
is the project's largest single feature to date. The target is
the full teaching-relevant standard-SQL surface (§3); scope is
cut only on *demonstrated* difficulty, as a deliberate
escalation to the user, never silently.

### 2. Mode gates the grammar

There is one grammar tree. **Simple mode** exposes the DSL
subset of it; **advanced mode** additionally exposes the SQL
forms.

- Shared entry words — `create`, `drop`, `insert`, `update`,
  `delete` — carry both a DSL form and a SQL form as `Choice`
  branches under one `CommandNode` (mechanically how `add`
  already holds four sub-commands today). `select` is a new,
  SQL-only entry word.
- SQL branches are mode-tagged; the walker presents the
  DSL-only view in simple mode and the full view in advanced.
- The `:` one-shot escape and `mode advanced` unlock the SQL
  view for a line / persistently — unchanged from ADR-0003.
- Because the grammar *knows* a node is SQL (it is tagged,
  merely gated), a simple-mode line that matches a gated SQL
  form yields a precise hint — "this is SQL; switch with `mode
  advanced`, or prefix the line with `:`" — rather than a
  generic parse error. This satisfies `M1`'s "recognised as
  SQL" promise.

The DSL stays usable in advanced mode (the superset rule):
nothing a learner already knows stops working.
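
The gating rule can be sketched in a few lines. This is a minimal
illustration, not the ADR-0024 types: `Mode`, `Surface`, `Branch`,
`Gate`, `gate`, and `visible_branches` are hypothetical names, and a
real `Choice` branch carries far more than a name and a tag.

```rust
/// Which input mode the line is parsed in (ADR-0003).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced,
}

/// Hypothetical tag on a grammar branch: DSL form or SQL form.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Surface {
    Dsl,
    Sql,
}

/// Hypothetical stand-in for one `Choice` branch under a shared entry word.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Branch {
    pub name: &'static str,
    pub surface: Surface,
}

/// Outcome of viewing a matched branch through the mode gate.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Visible,
    /// The grammar recognised SQL, but the current mode hides it.
    GatedSql { hint: &'static str },
}

pub const SQL_HINT: &str =
    "this is SQL; switch with `mode advanced`, or prefix the line with `:`";

/// DSL branches are always visible (the superset rule). SQL branches are
/// visible in advanced mode, or for one line when `one_shot` is set
/// (the `:` escape).
pub fn gate(branch: Branch, mode: Mode, one_shot: bool) -> Gate {
    match (branch.surface, mode, one_shot) {
        (Surface::Dsl, _, _) => Gate::Visible,
        (Surface::Sql, Mode::Advanced, _) => Gate::Visible,
        (Surface::Sql, Mode::Simple, true) => Gate::Visible,
        (Surface::Sql, Mode::Simple, false) => Gate::GatedSql { hint: SQL_HINT },
    }
}

/// The branches the walker would offer (e.g. for completion) in a mode.
pub fn visible_branches(all: &[Branch], mode: Mode) -> Vec<Branch> {
    all.iter()
        .copied()
        .filter(|b| gate(*b, mode, false) == Gate::Visible)
        .collect()
}
```

Because the gated case is a distinct outcome rather than a parse
failure, the caller can show the hint instead of a generic error.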

### 3. The supported SQL surface (`Q4`)

The target is the teaching-relevant standard-SQL surface,
authored into the tree with **no pre-emptive cuts**:

- **`SELECT`** — the full query surface: projection, `WHERE`,
  inner/outer `JOIN`s, `GROUP BY` / `HAVING`, aggregate
  functions, `ORDER BY`, `LIMIT` / `OFFSET`, scalar and
  correlated subqueries, `UNION` / `INTERSECT` / `EXCEPT`, and
  common table expressions (`WITH`).
- **`INSERT`** (single- and multi-row), **`UPDATE`**,
  **`DELETE`**.
- **`CREATE` / `DROP` / `ALTER TABLE`**, **`CREATE` / `DROP
  INDEX`**.
- A **SQL expression grammar** — arithmetic, function calls,
  `CASE`, the comparison / `LIKE` / `IN` / `BETWEEN` / `IS
  NULL` predicate set, subquery expressions — the superset of
  ADR-0026's `WHERE` grammar, shared by `WHERE`, `HAVING`,
  `CHECK`, `SELECT` projections, and `DEFAULT`.

Out of the surface: views, triggers, transaction control
(`BEGIN`/`COMMIT`/…), `PRAGMA`, `ATTACH`/`DETACH`, `VACUUM`,
virtual tables, multi-statement batches. One statement per
submission; a trailing `;` is tolerated.

The **SQL expression grammar** and the **full `SELECT`
grammar** are each large enough to warrant their own focused
ADR when implemented — the precedent is ADR-0026 for the
`WHERE` grammar. ADR-0030 fixes the *architecture*; those
ADRs fix the detailed grammar.

### 4. Execution — DDL through `Command`, DML and `SELECT` as validated SQL

The walker parsing a SQL statement yields a matched parse.
From it:

- **DDL** → a `Command` (`CreateTable`, `DropTable`,
  `AddColumn`, `AddConstraint`, `AddIndex`, …). DDL *must* run
  the typed executor, because that is what keeps the metadata
  tables, the playground type vocabulary, and `STRICT` intact.
  The `CommandNode`'s `ast_builder` is the SQL → `Command`
  translator.
- **DML and `SELECT`** → executed as the **validated SQL
  itself** (re-rendered canonically from the matched parse, or
  the validated original text). They change no schema, so
  modelling them as a typed `Command` buys nothing. For DML
  the worker — knowing the statement kind and target table
  from the parse — runs the statement and re-persists that
  table's CSV; `SELECT` is read-only, run and rendered (§6).

This split is also what makes advanced mode genuinely *full*.
Because DML / `SELECT` / `CHECK` expressions are **not**
lowered into the DSL's deliberately-limited `Expr` (ADR-0026),
advanced mode delivers the full SQL expression surface —
arithmetic, functions, subqueries, nested boolean operands —
that `docs/simple-mode-limitations.md` records as the inverse
of the simple-mode subset. The DSL `Expr` is the *DSL's*
representation; the SQL surface does not round-trip through it.
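
The DDL / DML / `SELECT` split can be sketched as a single dispatch
function. The types here are hypothetical and heavily reduced:
`Matched` stands in for whatever the walker's matched parse exposes,
`Plan` for the request handed to the worker, and this `Command` has
only the variants needed for the illustration.

```rust
/// Hypothetical, heavily reduced stand-in for the project's `Command`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    CreateTable { table: String },
    DropTable { table: String },
    /// Carries the validated SQL text.
    Select { sql: String },
}

/// Hypothetical summary of what the walker matched for a SQL statement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Matched {
    CreateTable { table: String },
    DropTable { table: String },
    Insert { table: String },
    Update { table: String },
    Delete { table: String },
    Select,
}

/// What the worker is asked to do with the statement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Plan {
    /// DDL: run the typed executor so metadata and types stay in sync.
    Typed(Command),
    /// DML: run the validated SQL, then re-persist `repersist`'s CSV.
    ValidatedDml { sql: String, repersist: String },
    /// `SELECT`: read-only, run and rendered.
    ReadOnly(Command),
}

pub fn plan(matched: Matched, validated_sql: &str) -> Plan {
    match matched {
        Matched::CreateTable { table } => Plan::Typed(Command::CreateTable { table }),
        Matched::DropTable { table } => Plan::Typed(Command::DropTable { table }),
        // All three DML kinds take the same path; only the target table matters.
        Matched::Insert { table } | Matched::Update { table } | Matched::Delete { table } => {
            Plan::ValidatedDml {
                sql: validated_sql.to_string(),
                repersist: table,
            }
        }
        Matched::Select => Plan::ReadOnly(Command::Select {
            sql: validated_sql.to_string(),
        }),
    }
}
```

Note that only the DDL arms construct a schema-changing `Command`; the
DML and `SELECT` arms carry the SQL text through untouched.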

> **Forward note (2026-05-26, ADR-0036 Accepted).** ADR-0036 **augments**
> the "DML → validated SQL text" half of this section — it does **not**
> change the execution model. Advanced-mode DML still executes verbatim;
> what's added is that "validated" now also means *value*-validated: the
> **literal data values** in `INSERT`/`UPDATE` are checked against the
> playground type system (and retained for error reporting) **before** the
> verbatim statement runs, sharing the DSL's per-type validators. No
> binding, no reconstruction, no command-identity change (ADR-0033 Am3
> stands). The `SELECT` half and the full-expression-surface rationale
> above **stand**: expressions and queries remain verbatim text (ADR-0026's
> limited `Expr` is *not* imposed on SQL). Dividing line: a static literal
> value (validate it) vs an engine-evaluated expression-or-query.

### 5. Type vocabulary — the playground's, not the engine's

Advanced-mode DDL uses the playground's own ten-type
vocabulary (ADR-0005). There is **no fallback to engine
storage types**: a column created in advanced mode is a
first-class `serial` / `decimal` / `date` / … exactly as a
DSL-created one, with the same metadata row.

The type-name slot accepts the playground keywords directly
(`text`, `int`, `real`, `decimal`, `bool`, `date`,
`datetime`, `blob`, `serial`, `shortid`) and standard-SQL
aliases that map onto them — `integer`/`smallint`/`bigint` →
`int`; `varchar`/`char` → `text`; `boolean` → `bool`;
`timestamp` → `datetime`; `numeric` → `decimal`;
`float`/`double precision` → `real`; `binary`/`varbinary` →
`blob`. A length / precision argument (`varchar(255)`) is
accepted and ignored — the playground's types are
unparameterised. The engine's own type names are an internal
mapping and are neither accepted as input nor shown.
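
The alias table above is small enough to sketch directly. This is an
illustration under assumptions: `PlaygroundType` and
`resolve_type_name` are hypothetical names, matching is taken to be
case-insensitive, and a parenthesised argument is accepted (and
ignored) after any type name, not only `varchar`. In the real design
the type-name slot is grammar in the unified tree, not a string
function.

```rust
/// The ten playground types (ADR-0005); the enum name is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PlaygroundType {
    Text,
    Int,
    Real,
    Decimal,
    Bool,
    Date,
    DateTime,
    Blob,
    Serial,
    ShortId,
}

/// Resolve a type name as written in advanced-mode DDL.
/// Returns `None` for anything outside the keyword and alias lists.
pub fn resolve_type_name(raw: &str) -> Option<PlaygroundType> {
    use PlaygroundType::*;

    let lowered = raw.trim().to_ascii_lowercase();

    // Drop a trailing "(...)" length / precision argument, if present.
    let base = match lowered.find('(') {
        Some(open) if lowered.ends_with(')') => &lowered[..open],
        Some(_) => return None, // unbalanced parenthesis
        None => lowered.as_str(),
    };

    // Collapse runs of whitespace so "double   precision" still matches.
    let name = base.split_whitespace().collect::<Vec<_>>().join(" ");

    Some(match name.as_str() {
        "text" | "varchar" | "char" => Text,
        "int" | "integer" | "smallint" | "bigint" => Int,
        "real" | "float" | "double precision" => Real,
        "decimal" | "numeric" => Decimal,
        "bool" | "boolean" => Bool,
        "date" => Date,
        "datetime" | "timestamp" => DateTime,
        "blob" | "binary" | "varbinary" => Blob,
        "serial" => Serial,
        "shortid" => ShortId,
        _ => return None,
    })
}
```

With this sketch, `resolve_type_name("VARCHAR(255)")` and
`resolve_type_name("text")` both yield `Some(PlaygroundType::Text)`,
so the metadata row is identical whichever spelling the user typed.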

### 6. `SELECT` — the read-only query path

`SELECT` touches no metadata, no persistence, no types. It is
carried as `Command::Select` holding the validated SQL; the
worker (`Request::RunSelect`) prepares and runs it, producing
the existing `DataResult`, which renders through the existing
data-table renderer (the one `show data` uses, ADR-0016).
Columns that carry no playground type — computed expressions,
joined columns — render with neutral alignment; the result is
capped like `show data`, with `LIMIT` suggested for large
outputs. A reference to an internal `__rdbms_*` table is
rejected by the grammar (those tables are not in scope).
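
One way to picture the table-scope rule is a resolver for the
table-name slot. This is a sketch under assumptions: `resolve_table`
and `TableRefError` are hypothetical, the `__rdbms_` prefix is inferred
from the `__rdbms_*` pattern above, and case-insensitive matching is an
illustrative choice rather than something this ADR specifies.

```rust
/// Prefix shared by the internal metadata tables.
pub const INTERNAL_PREFIX: &str = "__rdbms_";

#[derive(Debug, PartialEq, Eq)]
pub enum TableRefError {
    /// The name points at an internal table, which is never in scope.
    Internal,
    /// The name matches no user table.
    Unknown,
}

/// Resolve a table reference against the user-visible tables only.
/// On success, returns the table name as stored in the schema.
pub fn resolve_table<'a>(
    name: &str,
    user_tables: &[&'a str],
) -> Result<&'a str, TableRefError> {
    // Reject internal tables before consulting the schema at all.
    if name.to_ascii_lowercase().starts_with(INTERNAL_PREFIX) {
        return Err(TableRefError::Internal);
    }
    user_tables
        .iter()
        .copied()
        .find(|t| t.eq_ignore_ascii_case(name))
        .ok_or(TableRefError::Unknown)
}
```

Keeping `Internal` distinct from `Unknown` lets the friendly-error
layer word the two refusals differently without naming the engine.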

### 7. Engine neutrality

- **No engine type names** in or out (§5).
- **No `STRICT`**, no storage options. `STRICT` is applied
  internally by `do_create_table`; the user neither writes nor
  sees it. It is simply not part of the authored grammar, so
  typing it is an ordinary parse error — not a SQLite feature
  surfaced to the learner.
- **Engine-neutral errors.** SQL parse errors, out-of-subset
  refusals, and execution failures all route through the
  friendly-error layer (ADR-0019); the engine's raw message
  and product name never appear.
- **Honest limitation.** The grammar enforces the *structural*
  subset exactly. *Expression-level* neutrality is best-effort:
  an exotic engine-specific function the grammar admits and the
  engine then rejects surfaces an engine-neutral error rather
  than being caught up front. A function allowlist is a
  possible future hardening (§13).

### 8. Ambient assistance comes for free

Because SQL is grammar in the unified tree (§1), the walker
gives SQL the same assistance as the DSL, with no SQL-specific
assistance code:

- **Syntax highlighting** of SQL keywords, identifiers,
  literals.
- **Tab completion** of SQL keywords, and of schema names
  (tables, columns) drawn from the same `SchemaCache` the DSL
  completion already uses.
- **Hint-panel prose** at each grammar slot.
- The **`[ERR]`/`[WRN]` validity indicator** (ADR-0027).
- **Per-command parse-error usage** (ADR-0021).

This is the reason for §1: assistance and a batch parser are
incompatible; assistance and the unified grammar tree are the
same thing.

### 9. Parse errors and the unsupported surface (`Q2`)

A construct not in the authored grammar is an ordinary walker
parse error; the ADR-0021 per-command usage machinery and the
ADR-0027 indicator apply, with engine-neutral wording. There
is no separate "valid SQL but unsupported" classifier — that
would require the batch parser §1 dropped; the walker's
expected-set drives the message instead.
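
A sketch of how an expected-set alone can produce the message, with no
knowledge of what the rejected construct "would have been". The
function name and the exact wording are hypothetical; the real text
comes from the ADR-0019 / ADR-0021 layers.

```rust
/// Build an engine-neutral parse-error message from the walker's
/// expected-set. `found` is the offending token, or `None` if the
/// input ended early.
pub fn parse_error_message(found: Option<&str>, expected: &[&str]) -> String {
    // Sort and de-duplicate so the message is stable across runs.
    let mut names: Vec<&str> = expected.to_vec();
    names.sort_unstable();
    names.dedup();

    let wanted = match names.as_slice() {
        [] => "end of statement".to_string(),
        [only] => format!("`{only}`"),
        [init @ .., last] => {
            let head: Vec<String> = init.iter().map(|n| format!("`{n}`")).collect();
            format!("{} or `{last}`", head.join(", "))
        }
    };

    match found {
        Some(tok) => format!("unexpected `{tok}`; expected {wanted}"),
        None => format!("the statement ends early; expected {wanted}"),
    }
}
```

For example, an unexpected `STRICT` where the walker expected `WHERE`,
`ORDER`, or `LIMIT` is reported purely in terms of those three words;
nothing in the message says that `STRICT` means something to the
engine.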

### 10. The DSL → SQL teaching bridge

When a **DSL** command runs **in advanced mode**, its output
includes the equivalent SQL — so a learner who knows the
simple-mode form reads off how to express it in SQL.

- It is a `Command` → SQL renderer: the inverse of §4's DDL
  translator.
- It fires only for commands entered via the DSL form, and
  only in advanced mode (a command the user already typed as
  SQL is not echoed back; simple mode is left uncluttered).
- It renders as a distinct, de-emphasised output line beneath
  the `[ok]` summary, using the `OutputLine` styled-runs
  mechanism (ADR-0028).
- App-level commands have no SQL form and are not echoed.
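
The firing rules above can be sketched as a small function. All names
are hypothetical (`Origin`, `teaching_echo`, `render_sql`, and the
three-variant `Command`), and the renderer ignores identifier quoting
and stores the filter as ready-made SQL text, which the real
`Command` → SQL renderer would not.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced,
}

/// Which surface the user typed the command in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    Dsl,
    Sql,
}

/// Hypothetical, heavily reduced stand-in for the project's `Command`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    DropTable { table: String },
    DeleteRows { table: String, filter: Option<String> },
    /// An app-level command with no SQL form.
    Help,
}

/// Render the equivalent SQL, or `None` when the command has none.
fn render_sql(cmd: &Command) -> Option<String> {
    match cmd {
        Command::DropTable { table } => Some(format!("DROP TABLE {table};")),
        Command::DeleteRows { table, filter: Some(f) } => {
            Some(format!("DELETE FROM {table} WHERE {f};"))
        }
        Command::DeleteRows { table, filter: None } => {
            Some(format!("DELETE FROM {table};"))
        }
        Command::Help => None,
    }
}

/// The echo fires only for DSL-entered commands, only in advanced mode.
pub fn teaching_echo(cmd: &Command, origin: Origin, mode: Mode) -> Option<String> {
    if origin != Origin::Dsl || mode != Mode::Advanced {
        return None;
    }
    render_sql(cmd)
}
```

Returning `Option<String>` keeps the three "no echo" cases (SQL origin,
simple mode, app-level command) on one code path for the output layer.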

### 11. Persistence, metadata, history, replay

- **DDL** → `Command` → the typed executor, so `project.yaml`,
  the metadata tables, and `history.log` stay correct with no
  new code (§4).
- **DML** → the worker re-persists the affected table's CSV
  after running the statement.
- **`history.log`** records the **literal submitted line**: a
  statement typed as SQL is logged as that SQL. The replay
  format is therefore app-enterable syntax, with no divergence
  between what was typed and what is replayed.
- **Replay** re-runs each log line through the one walker with
  the advanced view active, so a project whose history mixes
  DSL and SQL replays faithfully.
- **`project.yaml`** stays a structured schema snapshot; its
  embedded expressions (a column `CHECK`) are stored as SQL
  the user could re-enter in advanced mode — one syntax, not a
  third.

### 12. Safety in advanced mode

Advanced mode carries **fewer rails** by design. The DSL's
`WHERE`-or-`--all-rows` guard on `update`/`delete` (ADR-0014)
is a simple-mode teaching aid; a SQL `DELETE FROM t` with no
`WHERE` executes as written. The safety net is the
auto-snapshot before destructive operations (ADR-0006), which
fires regardless of which surface produced the statement; the
mode's visual distinction (ADR-0003) is the user's signal
until then.
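
The difference in rails can be sketched as one decision function. The
names are hypothetical, and one point is an assumption rather than
something this ADR settles: the sketch keys the guard on the *surface*
the statement was typed in, so a DSL `delete` keeps its guard even when
entered in advanced mode.

```rust
/// Which surface produced the destructive statement.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Surface {
    Dsl,
    Sql,
}

/// Hypothetical summary of an `update` / `delete` about to run.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Destructive {
    pub surface: Surface,
    pub has_where: bool,
    /// The DSL's `--all-rows` flag; meaningless for SQL.
    pub all_rows_flag: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The DSL guard (ADR-0014) refuses the statement.
    Refuse(&'static str),
    /// Take the auto-snapshot (ADR-0006), then execute as written.
    SnapshotThenRun,
}

pub fn decide(op: Destructive) -> Decision {
    // Assumption: the guard applies to the DSL form only.
    let unguarded_dsl = op.surface == Surface::Dsl && !op.has_where && !op.all_rows_flag;
    if unguarded_dsl {
        Decision::Refuse("add a `where` clause, or pass `--all-rows`")
    } else {
        // Every statement that runs is preceded by the snapshot,
        // whichever surface produced it.
        Decision::SnapshotThenRun
    }
}
```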

### 13. Out of scope

- **OOS-1.** `CREATE VIEW` / `TRIGGER`. Views are anticipated
  by the items panel's design (`S2`) but need their own model.
- **OOS-2.** `EXPLAIN` of advanced-mode SQL queries. The DSL
  `explain` (ADR-0028) still works for what it already wraps.
  **(Superseded by ADR-0039, 2026-05-27 — this was a *deferred*
  scope exclusion, not a principled rejection; EXPLAIN over advanced
  SQL is now in scope, as a deferred follow-up.)**
- **OOS-3.** A function/expression allowlist for full
  expression-level engine neutrality (§7) — best-effort now.
- **OOS-4.** Multi-statement batches and transaction control.
- **OOS-5.** A SQL → DSL echo (the reverse of §10).

## Consequences

- The unified grammar tree gains a large body of SQL grammar.
  The `Node` taxonomy and the walker may need extension to
  carry it (e.g. deeper recursion for subqueries / CTEs) — a
  known risk, addressed per phase.
- `sqlparser-rs` is **not** adopted as the parser; ADR-0001's
  reservation is superseded. `Q1`'s wording ("SQL parsed via
  `sqlparser-rs`") is superseded — SQL is parsed by the
  unified walker.
- `Command` gains a `Select` variant; every exhaustive `match
  Command` gains an arm (the recurring ADR-0028/0029 gotcha).
- The `Database` worker gains a `RunSelect` request and a
  "run validated DML, re-persist the table" request; DDL
  reuses the existing typed requests unchanged.
- Mode-gating is added to the grammar / walker.
- The metadata, persistence, and type machinery is reused
  unchanged for DDL — the payoff of routing DDL through
  `Command`.
- This is the project's largest single feature so far. The
  phased plan keeps each step independently shippable;
  scope-cutting, if a slice proves disproportionate, is an
  explicit escalation, never a silent trim.
- `Q4` is satisfied by this ADR; `Q1` / `Q2` are unblocked and
  reframed around the unified walker; `M1` gains its
  "recognised as SQL" hint.

## Implementation notes

Phased; each phase independently shippable and test-guarded.
The two large grammar slices each warrant their own focused
ADR when taken up (ADR-0026-style).

1. **Foundations + first `SELECT`.** Mode-gate the grammar
   (advanced unlocks the SQL nodes). Author the core SQL
   **expression grammar** — the ADR-0026 superset — as its own
   ADR. A single-table `SELECT` (projection, `WHERE`, `ORDER
   BY`, `LIMIT`) as a SQL `CommandNode` → `Command::Select` →
   worker `RunSelect` → the existing renderer. Replace the
   placeholder echo; add the simple-mode "this is SQL" hint.
   This proves the path end-to-end *with full walker
   assistance*.
2. **`SELECT` — full.** `JOIN`s, `GROUP BY`/`HAVING`,
   aggregates, subqueries, `UNION`, CTEs. The big grammar
   phase — its own ADR.
3. **DML.** `INSERT` / `UPDATE` / `DELETE` grammar; the
   execute-as-validated-SQL path; the worker re-persist step;
   settle multi-row `INSERT` and `shortid` auto-fill on a SQL
   `INSERT`.
4. **DDL.** `CREATE` / `DROP` / `ALTER TABLE`, `CREATE` /
   `DROP INDEX` grammar → `Command`; the §5 type-name map; FK
   clauses → `AddRelationship`; this phase may also land
   table-rename (`C1`).
5. **The DSL → SQL teaching echo** (§10).
6. **Polish.** `help sql`; an engine-neutral error sweep;
   typing-surface / matrix coverage; the `DOC1` SQL-surface
   reference page.

## See also

- ADR-0001 — reserved `sqlparser-rs`; that reservation is
  superseded here (§1).
- ADR-0002 — the engine is an implementation detail; "no
  engine name in user-facing strings" — §7 extends it.
- ADR-0003 — the simple / advanced mode model this builds on.
- ADR-0005 — the ten-type vocabulary advanced DDL uses (§5).
- ADR-0009 — the DSL conventions; the DSL stays usable in
  advanced mode.
- ADR-0012 / ADR-0013 — the metadata tables the `Command` core
  keeps in sync, inherited for free (§4, §11).
- ADR-0014 — the data-operation model and the `--all-rows`
  guard advanced mode deliberately relaxes (§12).
- ADR-0015 — persistence write-through and replay, reused and
  made surface-agnostic (§11).
- ADR-0016 — the data-table renderer `SELECT` results reuse
  (§6).
- ADR-0019 — the friendly-error layer all SQL errors route
  through (§7, §9).
- ADR-0021 — per-command parse-error usage, free for SQL (§9).
- ADR-0022 — ambient typing assistance; §8 is its extension to
  SQL.
- ADR-0023 / ADR-0024 — the unified grammar tree SQL becomes
  part of (§1, §2).
- ADR-0026 — the `WHERE` expression grammar the SQL expression
  grammar is the superset of (§3).
- ADR-0027 — the validity indicator, free for SQL (§8).
- ADR-0028 — the `OutputLine` styled-runs the teaching echo
  uses (§10).