Files
rdbms-playground/docs/adr/0030-advanced-mode-sql-surface.md
T
claude@clouddev1 9f15f386d5 docs(adr): design the DSL→SQL teaching echo (ADR-0038) + dependencies
Realises ADR-0030 §10 (the DSL→SQL teaching bridge) as a /runda'd design
set, before implementation:

- ADR-0037 (new): execution-time mode side-channel — SubmissionMode
  {Simple, Advanced, AdvancedOneShot} threaded Action→worker, output-only;
  redeems ADR-0033 Amendment 3's deferred follow-up. Replay stays silent.
- ADR-0038 (new): the teaching echo + full catalogue (Buckets A/B/C),
  the copy-paste round-trip contract, the three-category framework, and
  the Value→SQL-literal renderer. DDL + show-data centric (overlapping
  DML is SQL-first, so already SQL). Build-order deps recorded.
- ADR-0035 Amendment 2: standard-first dialect stance + ALTER COLUMN
  SET/DROP NOT NULL, SET/DROP DEFAULT, ISO SET DATA TYPE gap-fill.
- ADR-0033 Amendment 4: reclassifies the `update … --all-rows`
  non-fall-back as a bug; it now falls back to the DSL Update and echoes
  (keyed on adjacent `--`; spaced arithmetic preserved).
- ADR-0039 (new): EXPLAIN over advanced SQL — decision recorded, build
  deferred; supersedes ADR-0030 §13 OOS-2.
- ADR-0000: out-of-scope discipline (deferred vs rejected). README index
  updated for all of the above.

Reconcile CLAUDE.md: simple-mode column ops are implemented, not pending
(requirements.md C2/B2 already [x]).
2026-05-27 20:44:38 +00:00

399 lines
18 KiB
Markdown

# ADR-0030: Advanced mode — the standard-SQL surface
## Status
Accepted
## Context
ADR-0003 split the input field into two modes. **Simple mode**
(the default) takes the teaching DSL; **advanced mode** was
specified to take "raw SQL, including DDL and queries". The DSL
half is fully built (ADR-0009, ADR-0023/0024, and everything
since); advanced mode is still a **placeholder** — a submitted
line is echoed back unexecuted.
Requirement `Q1` commits to a *defined* SQL subset, `Q2` to
rejecting out-of-subset syntax clearly, `Q4` is the subset
specification — this ADR. Two constraints shape every decision
below; both come from how this project already works.
1. **The engine is an implementation detail.** ADR-0002
established that the database product is never named in
user-facing strings. Advanced mode must *extend* that
posture: it is a way to work with **standard SQL**, as
independent of the storage engine as we can make it — not a
console onto the engine. The engine's type names, its
`STRICT` keyword, its dialect quirks, and its raw error text
must not surface. And handing typed text straight to the
engine would bypass the typed executor that keeps the
internal metadata tables (ADR-0012/0013) in sync, writes
`project.yaml` + CSV (ADR-0015), and preserves the
playground's rich type vocabulary (ADR-0005).
2. **Assistance comes from one place.** Completion, syntax
highlighting, hint-panel prose, the `[ERR]`/`[WRN]`
indicator, and per-command parse-error usage all derive
from a single **unified grammar tree** walked incrementally
(ADR-0022/0023/0024 — explicitly "the single source of
truth"). A *batch* SQL parser — the kind `sqlparser-rs`
(reserved in ADR-0001) is — produces an AST and nothing
else: it cannot say what is valid at the cursor, cannot
drive completion, highlighting, or hints. Parsing SQL with
such a library would leave advanced mode either *without*
the ambient assistance the DSL has, or dependent on a
second, parallel assistance system — both contrary to
ADR-0023/0024.
The decision: **SQL is not parsed by a separate library. SQL
becomes additional grammar within the unified tree**, walked by
the same walker as the DSL. Advanced mode is not a different
parser — it is the same parser with more grammar unlocked.
## Decision
### 1. SQL lives in the unified grammar tree
SQL statements are authored as `CommandNode` / `Node` grammar
in the ADR-0024 tree and parsed by the existing walker. The
consequence is the whole point: completion, highlighting,
hint prose, the validity indicator, and parse-error usage
**work for SQL exactly as for the DSL, for free**, because
they are all walker outputs (§8).
`sqlparser-rs` is therefore *not* used as the parser;
ADR-0001's reservation of it is superseded. (An implementer
may retain it narrowly as a test oracle — parse the same SQL,
compare — but it is not on the execution path.)
The honest cost: the supported SQL is exactly what we author
into the tree — we are, in effect, writing a SQL grammar. This
is the project's largest single feature to date. The target is
the full teaching-relevant standard-SQL surface (§3); scope is
cut only on *demonstrated* difficulty, as a deliberate
escalation to the user, never silently.
### 2. Mode gates the grammar
There is one grammar tree. **Simple mode** exposes the DSL
subset of it; **advanced mode** additionally exposes the SQL
forms.
- Shared entry words — `create`, `drop`, `insert`, `update`,
`delete` — carry both a DSL form and a SQL form as `Choice`
branches under one `CommandNode` (mechanically how `add`
already holds four sub-commands today). `select` is a new,
SQL-only entry word.
- SQL branches are mode-tagged; the walker presents the
DSL-only view in simple mode and the full view in advanced.
- The `:` one-shot escape and `mode advanced` unlock the SQL
view for a line / persistently — unchanged from ADR-0003.
- Because the grammar *knows* a node is SQL (it is tagged,
merely gated), a simple-mode line that matches a gated SQL
form yields a precise hint — "this is SQL; switch with `mode
advanced`, or prefix the line with `:`" — rather than a
generic parse error. This satisfies `M1`'s "recognised as
SQL" promise.
The DSL stays usable in advanced mode (the superset rule):
nothing a learner already knows stops working.
### 3. The supported SQL surface (`Q4`)
The target is the teaching-relevant standard-SQL surface,
authored into the tree with **no pre-emptive cuts**:
- **`SELECT`** — the full query surface: projection, `WHERE`,
inner/outer `JOIN`s, `GROUP BY` / `HAVING`, aggregate
functions, `ORDER BY`, `LIMIT` / `OFFSET`, scalar and
correlated subqueries, `UNION` / `INTERSECT` / `EXCEPT`, and
common table expressions (`WITH`).
- **`INSERT`** (single- and multi-row), **`UPDATE`**,
**`DELETE`**.
- **`CREATE` / `DROP` / `ALTER TABLE`**, **`CREATE` / `DROP
INDEX`**.
- A **SQL expression grammar** — arithmetic, function calls,
`CASE`, the comparison / `LIKE` / `IN` / `BETWEEN` / `IS
NULL` predicate set, subquery expressions — the superset of
ADR-0026's `WHERE` grammar, shared by `WHERE`, `HAVING`,
`CHECK`, `SELECT` projections, and `DEFAULT`.
Out of the surface: views, triggers, transaction control
(`BEGIN`/`COMMIT`/…), `PRAGMA`, `ATTACH`/`DETACH`, `VACUUM`,
virtual tables, multi-statement batches. One statement per
submission; a trailing `;` is tolerated.
The **SQL expression grammar** and the **full `SELECT`
grammar** are each large enough to warrant their own focused
ADR when implemented — the precedent is ADR-0026 for the
`WHERE` grammar. ADR-0030 fixes the *architecture*; those
ADRs fix the detailed grammar.
### 4. Execution — DDL through `Command`, DML and `SELECT` as validated SQL
The walker parsing a SQL statement yields a matched parse.
From it:
- **DDL** → a `Command` (`CreateTable`, `DropTable`,
`AddColumn`, `AddConstraint`, `AddIndex`, …). DDL *must* run
the typed executor, because that is what keeps the metadata
tables, the playground type vocabulary, and `STRICT` intact.
The `CommandNode`'s `ast_builder` is the SQL → `Command`
translator.
- **DML and `SELECT`** → executed as the **validated SQL
itself** (re-rendered canonically from the matched parse, or
the validated original text). They change no schema, so
modelling them as a typed `Command` buys nothing. For DML
the worker — knowing the statement kind and target table
from the parse — runs the statement and re-persists that
table's CSV; `SELECT` is read-only, run and rendered (§6).
This split is also what makes advanced mode genuinely *full*.
Because DML / `SELECT` / `CHECK` expressions are **not**
lowered into the DSL's deliberately-limited `Expr` (ADR-0026),
advanced mode delivers the full SQL expression surface —
arithmetic, functions, subqueries, nested boolean operands —
that `docs/simple-mode-limitations.md` records as the inverse
of the simple-mode subset. The DSL `Expr` is the *DSL's*
representation; the SQL surface does not round-trip through it.
> **Forward note (2026-05-26, ADR-0036 Accepted).** ADR-0036 **augments**
> the "DML → validated SQL text" half of this section — it does **not**
> change the execution model. Advanced-mode DML still executes verbatim;
> what's added is that "validated" now also means *value*-validated: the
> **literal data values** in `INSERT`/`UPDATE` are checked against the
> playground type system (and retained for error reporting) **before** the
> verbatim statement runs, sharing the DSL's per-type validators. No
> binding, no reconstruction, no command-identity change (ADR-0033 Am3
> stands). The `SELECT` half and the full-expression-surface rationale
> above **stand**: expressions and queries remain verbatim text (ADR-0026's
> limited `Expr` is *not* imposed on SQL). Dividing line: a static literal
> value (validate it) vs an engine-evaluated expression-or-query.
### 5. Type vocabulary — the playground's, not the engine's
Advanced-mode DDL uses the playground's own ten-type
vocabulary (ADR-0005). There is **no fallback to engine
storage types**: a column created in advanced mode is a
first-class `serial` / `decimal` / `date` / … exactly as a
DSL-created one, with the same metadata row.
The type-name slot accepts the playground keywords directly
(`text`, `int`, `real`, `decimal`, `bool`, `date`,
`datetime`, `blob`, `serial`, `shortid`) and standard-SQL
aliases that map onto them — `integer`/`smallint`/`bigint` →
`int`; `varchar`/`char` → `text`; `boolean` → `bool`;
`timestamp` → `datetime`; `numeric` → `decimal`;
`float`/`double precision` → `real`; `binary`/`varbinary` →
`blob`. A length / precision argument (`varchar(255)`) is
accepted and ignored — the playground's types are
unparameterised. The engine's own type names are an internal
mapping and are neither accepted as input nor shown.
### 6. `SELECT` — the read-only query path
`SELECT` touches no metadata, no persistence, no types. It is
carried as `Command::Select` holding the validated SQL; the
worker (`Request::RunSelect`) prepares and runs it, producing
the existing `DataResult`, which renders through the existing
data-table renderer (the one `show data` uses, ADR-0016).
Columns that carry no playground type — computed expressions,
joined columns — render with neutral alignment; the result is
capped like `show data`, with `LIMIT` suggested for large
outputs. A reference to an internal `__rdbms_*` table is
rejected by the grammar (those tables are not in scope).
### 7. Engine neutrality
- **No engine type names** in or out (§5).
- **No `STRICT`**, no storage options. `STRICT` is applied
internally by `do_create_table`; the user neither writes nor
sees it. It is simply not part of the authored grammar, so
typing it is an ordinary parse error — not a SQLite feature
surfaced to the learner.
- **Engine-neutral errors.** SQL parse errors, out-of-subset
refusals, and execution failures all route through the
friendly-error layer (ADR-0019); the engine's raw message
and product name never appear.
- **Honest limitation.** The grammar enforces the *structural*
subset exactly. *Expression-level* neutrality is best-effort:
an exotic engine-specific function the grammar admits and the
engine then rejects surfaces an engine-neutral error rather
than being caught up front. A function allowlist is a
possible future hardening (§13).
### 8. Ambient assistance comes for free
Because SQL is grammar in the unified tree (§1), the walker
gives SQL — with no SQL-specific assistance code — the same as
the DSL:
- **Syntax highlighting** of SQL keywords, identifiers,
literals.
- **Tab completion** of SQL keywords, and of schema names
(tables, columns) drawn from the same `SchemaCache` the DSL
completion already uses.
- **Hint-panel prose** at each grammar slot.
- The **`[ERR]`/`[WRN]` validity indicator** (ADR-0027).
- **Per-command parse-error usage** (ADR-0021).
This is the reason for §1: assistance and a batch parser are
incompatible; assistance and the unified grammar tree are the
same thing.
### 9. Parse errors and the unsupported surface (`Q2`)
A construct not in the authored grammar is an ordinary walker
parse error; the ADR-0021 per-command usage machinery and the
ADR-0027 indicator apply, with engine-neutral wording. There
is no separate "valid SQL but unsupported" classifier — that
would require the batch parser §1 dropped; the walker's
expected-set drives the message instead.
### 10. The DSL → SQL teaching bridge
When a **DSL** command runs **in advanced mode**, its output
includes the equivalent SQL — so a learner who knows the
simple-mode form reads off how to express it in SQL.
- It is a `Command` → SQL renderer: the inverse of §4's DDL
translator.
- It fires only for commands entered via the DSL form, and
only in advanced mode (a command the user already typed as
SQL is not echoed back; simple mode is left uncluttered).
- It renders as a distinct, de-emphasised output line beneath
the `[ok]` summary, using the `OutputLine` styled-runs
mechanism (ADR-0028).
- App-level commands have no SQL form and are not echoed.
### 11. Persistence, metadata, history, replay
- **DDL** → `Command` → the typed executor, so `project.yaml`,
the metadata tables, and `history.log` stay correct with no
new code (§4).
- **DML** → the worker re-persists the affected table's CSV
after running the statement.
- **`history.log`** records the **literal submitted line** — a
statement typed as SQL is logged as that SQL. The replay
format is therefore app-enterable syntax, no divergence.
- **Replay** re-runs each log line through the one walker with
the advanced view active, so a project whose history mixes
DSL and SQL replays faithfully.
- **`project.yaml`** stays a structured schema snapshot; its
embedded expressions (a column `CHECK`) are stored as SQL
the user could re-enter in advanced mode — one syntax, not a
third.
### 12. Safety in advanced mode
Advanced mode carries **fewer rails** by design. The DSL's
`WHERE`-or-`--all-rows` guard on `update`/`delete` (ADR-0014)
is a simple-mode teaching aid; a SQL `DELETE FROM t` with no
`WHERE` executes as written. The safety net is the
auto-snapshot before destructive operations (ADR-0006), which
fires regardless of which surface produced the statement; the
mode's visual distinction (ADR-0003) is the user's signal
until then.
### 13. Out of scope
- **OOS-1.** `CREATE VIEW` / `TRIGGER`. Views are anticipated
by the items panel's design (`S2`) but need their own model.
- **OOS-2.** `EXPLAIN` of advanced-mode SQL queries. The DSL
`explain` (ADR-0028) still works for what it already wraps.
**(Superseded by ADR-0039, 2026-05-27 — this was a *deferred*
scope exclusion, not a principled rejection; EXPLAIN over advanced
SQL is now in scope, as a deferred follow-up.)**
- **OOS-3.** A function/expression allowlist for full
expression-level engine neutrality (§7) — best-effort now.
- **OOS-4.** Multi-statement batches and transaction control.
- **OOS-5.** A SQL → DSL echo (the reverse of §10).
## Consequences
- The unified grammar tree gains a large body of SQL grammar.
The `Node` taxonomy and the walker may need extension to
carry it (e.g. deeper recursion for subqueries / CTEs) — a
known risk, addressed per phase.
- `sqlparser-rs` is **not** adopted as the parser; ADR-0001's
reservation is superseded. `Q1`'s wording ("SQL parsed via
`sqlparser-rs`") is superseded — SQL is parsed by the
unified walker.
- `Command` gains a `Select` variant; every exhaustive `match
Command` gains an arm (the recurring ADR-0028/0029 gotcha).
- The `Database` worker gains a `RunSelect` request and a
"run validated DML, re-persist the table" request; DDL
reuses the existing typed requests unchanged.
- Mode-gating is added to the grammar / walker.
- The metadata, persistence, and type machinery is reused
unchanged for DDL — the payoff of routing DDL through
`Command`.
- This is the project's largest single feature so far. The
phased plan keeps each step independently shippable;
scope-cutting, if a slice proves disproportionate, is an
explicit escalation, never a silent trim.
- `Q4` is satisfied by this ADR; `Q1` / `Q2` are unblocked and
reframed around the unified walker; `M1` gains its
"recognised as SQL" hint.
## Implementation notes
Phased; each phase independently shippable and test-guarded.
The two large grammar slices each warrant their own focused
ADR when taken up (ADR-0026-style).
1. **Foundations + first `SELECT`.** Mode-gate the grammar
(advanced unlocks the SQL nodes). Author the core SQL
**expression grammar** — the ADR-0026 superset — as its own
ADR. A single-table `SELECT` (projection, `WHERE`, `ORDER
BY`, `LIMIT`) as a SQL `CommandNode` → `Command::Select` →
worker `RunSelect` → the existing renderer. Replace the
placeholder echo; add the simple-mode "this is SQL" hint.
This proves the path end-to-end *with full walker
assistance*.
2. **`SELECT` — full.** `JOIN`s, `GROUP BY`/`HAVING`,
aggregates, subqueries, `UNION`, CTEs. The big grammar
phase — its own ADR.
3. **DML.** `INSERT` / `UPDATE` / `DELETE` grammar; the
execute-as-validated-SQL path; the worker re-persist step;
settle multi-row `INSERT` and `shortid` auto-fill on a SQL
`INSERT`.
4. **DDL.** `CREATE` / `DROP` / `ALTER TABLE`, `CREATE` /
`DROP INDEX` grammar → `Command`; the §5 type-name map; FK
clauses → `AddRelationship`; may land table-rename (`C1`).
5. **The DSL → SQL teaching echo** (§10).
6. **Polish.** `help sql`; an engine-neutral error sweep;
typing-surface / matrix coverage; the `DOC1` SQL-surface
reference page.
## See also
- ADR-0001 — reserved `sqlparser-rs`; that reservation is
superseded here (§1).
- ADR-0002 — the engine is an implementation detail; "no
engine name in user-facing strings" — §7 extends it.
- ADR-0003 — the simple / advanced mode model this builds on.
- ADR-0005 — the ten-type vocabulary advanced DDL uses (§5).
- ADR-0009 — the DSL conventions; the DSL stays usable in
advanced mode.
- ADR-0012 / ADR-0013 — the metadata tables the `Command` core
keeps in sync, inherited for free (§4, §11).
- ADR-0014 — the data-operation model and the `--all-rows`
guard advanced mode deliberately relaxes (§12).
- ADR-0015 — persistence write-through and replay, reused and
made surface-agnostic (§11).
- ADR-0016 — the data-table renderer `SELECT` results reuse
(§6).
- ADR-0019 — the friendly-error layer all SQL errors route
through (§7, §9).
- ADR-0021 — per-command parse-error usage, free for SQL (§9).
- ADR-0022 — ambient typing assistance; §8 is its extension to
SQL.
- ADR-0023 / ADR-0024 — the unified grammar tree SQL becomes
part of (§1, §2).
- ADR-0026 — the `WHERE` expression grammar the SQL expression
grammar is the superset of (§3).
- ADR-0027 — the validity indicator, free for SQL (§8).
- ADR-0028 — the `OutputLine` styled-runs the teaching echo
uses (§10).