Files
rdbms-playground/docs/adr/0030-advanced-mode-sql-surface.md
claude@clouddev1 9f15f386d5 docs(adr): design the DSL→SQL teaching echo (ADR-0038) + dependencies
Realises ADR-0030 §10 (the DSL→SQL teaching bridge) as a /runda'd design
set, before implementation:

- ADR-0037 (new): execution-time mode side-channel — SubmissionMode
  {Simple, Advanced, AdvancedOneShot} threaded Action→worker, output-only;
  redeems ADR-0033 Amendment 3's deferred follow-up. Replay stays silent.
- ADR-0038 (new): the teaching echo + full catalogue (Buckets A/B/C),
  the copy-paste round-trip contract, the three-category framework, and
  the Value→SQL-literal renderer. DDL + show-data centric (overlapping
  DML is SQL-first, so already SQL). Build-order deps recorded.
- ADR-0035 Amendment 2: standard-first dialect stance + ALTER COLUMN
  SET/DROP NOT NULL, SET/DROP DEFAULT, ISO SET DATA TYPE gap-fill.
- ADR-0033 Amendment 4: reclassifies the `update … --all-rows`
  non-fall-back as a bug; it now falls back to the DSL Update and echoes
  (keyed on adjacent `--`; spaced arithmetic preserved).
- ADR-0039 (new): EXPLAIN over advanced SQL — decision recorded, build
  deferred; supersedes ADR-0030 §13 OOS-2.
- ADR-0000: out-of-scope discipline (deferred vs rejected). README index
  updated for all of the above.

Reconcile CLAUDE.md: simple-mode column ops are implemented, not pending
(requirements.md C2/B2 already [x]).
2026-05-27 20:44:38 +00:00

18 KiB

ADR-0030: Advanced mode — the standard-SQL surface

Status

Accepted

Context

ADR-0003 split the input field into two modes. Simple mode (the default) takes the teaching DSL; advanced mode was specified to take "raw SQL, including DDL and queries". The DSL half is fully built (ADR-0009, ADR-0023/0024, and everything since); advanced mode is still a placeholder — a submitted line is echoed back unexecuted.

Requirement Q1 commits to a defined SQL subset, Q2 to rejecting out-of-subset syntax clearly, Q4 is the subset specification — this ADR. Two constraints shape every decision below; both come from how this project already works.

  1. The engine is an implementation detail. ADR-0002 established that the database product is never named in user-facing strings. Advanced mode must extend that posture: it is a way to work with standard SQL, as independent of the storage engine as we can make it — not a console onto the engine. The engine's type names, its STRICT keyword, its dialect quirks, and its raw error text must not surface. And handing typed text straight to the engine would bypass the typed executor that keeps the internal metadata tables (ADR-0012/0013) in sync, writes project.yaml + CSV (ADR-0015), and preserves the playground's rich type vocabulary (ADR-0005).

  2. Assistance comes from one place. Completion, syntax highlighting, hint-panel prose, the [ERR]/[WRN] indicator, and per-command parse-error usage all derive from a single unified grammar tree walked incrementally (ADR-0022/0023/0024 — explicitly "the single source of truth"). A batch SQL parser — the kind sqlparser-rs (reserved in ADR-0001) is — produces an AST and nothing else: it cannot say what is valid at the cursor, cannot drive completion, highlighting, or hints. Parsing SQL with such a library would leave advanced mode either without the ambient assistance the DSL has, or dependent on a second, parallel assistance system — both contrary to ADR-0023/0024.

The decision: SQL is not parsed by a separate library. SQL becomes additional grammar within the unified tree, walked by the same walker as the DSL. Advanced mode is not a different parser — it is the same parser with more grammar unlocked.

Decision

1. SQL lives in the unified grammar tree

SQL statements are authored as CommandNode / Node grammar in the ADR-0024 tree and parsed by the existing walker. The consequence is the whole point: completion, highlighting, hint prose, the validity indicator, and parse-error usage work for SQL exactly as for the DSL, for free, because they are all walker outputs (§8).

sqlparser-rs is therefore not used as the parser; ADR-0001's reservation of it is superseded. (An implementer may retain it narrowly as a test oracle — parse the same SQL, compare — but it is not on the execution path.)

The honest cost: the supported SQL is exactly what we author into the tree — we are, in effect, writing a SQL grammar. This is the project's largest single feature to date. The target is the full teaching-relevant standard-SQL surface (§3); scope is cut only on demonstrated difficulty, as a deliberate escalation to the user, never silently.

2. Mode gates the grammar

There is one grammar tree. Simple mode exposes the DSL subset of it; advanced mode additionally exposes the SQL forms.

  • Shared entry words — create, drop, insert, update, delete — carry both a DSL form and a SQL form as Choice branches under one CommandNode (mechanically how add already holds four sub-commands today). select is a new, SQL-only entry word.
  • SQL branches are mode-tagged; the walker presents the DSL-only view in simple mode and the full view in advanced.
  • The : one-shot escape and mode advanced unlock the SQL view for a line / persistently — unchanged from ADR-0003.
  • Because the grammar knows a node is SQL (it is tagged, merely gated), a simple-mode line that matches a gated SQL form yields a precise hint — "this is SQL; switch with mode advanced, or prefix the line with :" — rather than a generic parse error. This satisfies M1's "recognised as SQL" promise.

The DSL stays usable in advanced mode (the superset rule): nothing a learner already knows stops working.

3. The supported SQL surface (Q4)

The target is the teaching-relevant standard-SQL surface, authored into the tree with no pre-emptive cuts:

  • SELECT — the full query surface: projection, WHERE, inner/outer JOINs, GROUP BY / HAVING, aggregate functions, ORDER BY, LIMIT / OFFSET, scalar and correlated subqueries, UNION / INTERSECT / EXCEPT, and common table expressions (WITH).
  • INSERT (single- and multi-row), UPDATE, DELETE.
  • CREATE / DROP / ALTER TABLE, CREATE / DROP INDEX.
  • A SQL expression grammar — arithmetic, function calls, CASE, the comparison / LIKE / IN / BETWEEN / IS NULL predicate set, subquery expressions — the superset of ADR-0026's WHERE grammar, shared by WHERE, HAVING, CHECK, SELECT projections, and DEFAULT.

Out of the surface: views, triggers, transaction control (BEGIN/COMMIT/…), PRAGMA, ATTACH/DETACH, VACUUM, virtual tables, multi-statement batches. One statement per submission; a trailing ; is tolerated.

The SQL expression grammar and the full SELECT grammar are each large enough to warrant their own focused ADR when implemented — the precedent is ADR-0026 for the WHERE grammar. ADR-0030 fixes the architecture; those ADRs fix the detailed grammar.

4. Execution — DDL through Command, DML and SELECT as validated SQL

The walker parsing a SQL statement yields a matched parse. From it:

  • DDL → a Command (CreateTable, DropTable, AddColumn, AddConstraint, AddIndex, …). DDL must run the typed executor, because that is what keeps the metadata tables, the playground type vocabulary, and STRICT intact. The CommandNode's ast_builder is the SQL → Command translator.
  • DML and SELECT → executed as the validated SQL itself (re-rendered canonically from the matched parse, or the validated original text). They change no schema, so modelling them as a typed Command buys nothing. For DML the worker — knowing the statement kind and target table from the parse — runs the statement and re-persists that table's CSV; SELECT is read-only, run and rendered (§6).

This split is also what makes advanced mode genuinely full. Because DML / SELECT / CHECK expressions are not lowered into the DSL's deliberately-limited Expr (ADR-0026), advanced mode delivers the full SQL expression surface — arithmetic, functions, subqueries, nested boolean operands — that docs/simple-mode-limitations.md records as the inverse of the simple-mode subset. The DSL Expr is the DSL's representation; the SQL surface does not round-trip through it.

Forward note (2026-05-26, ADR-0036 Accepted). ADR-0036 augments the "DML → validated SQL text" half of this section — it does not change the execution model. Advanced-mode DML still executes verbatim; what's added is that "validated" now also means value-validated: the literal data values in INSERT/UPDATE are checked against the playground type system (and retained for error reporting) before the verbatim statement runs, sharing the DSL's per-type validators. No binding, no reconstruction, no command-identity change (ADR-0033 Am3 stands). The SELECT half and the full-expression-surface rationale above stand: expressions and queries remain verbatim text (ADR-0026's limited Expr is not imposed on SQL). Dividing line: a static literal value (validate it) vs an engine-evaluated expression-or-query.

5. Type vocabulary — the playground's, not the engine's

Advanced-mode DDL uses the playground's own ten-type vocabulary (ADR-0005). There is no fallback to engine storage types: a column created in advanced mode is a first-class serial / decimal / date / … exactly as a DSL-created one, with the same metadata row.

The type-name slot accepts the playground keywords directly (text, int, real, decimal, bool, date, datetime, blob, serial, shortid) and standard-SQL aliases that map onto them — integer/smallint/bigintint; varchar/chartext; booleanbool; timestampdatetime; numericdecimal; float/double precisionreal; binary/varbinaryblob. A length / precision argument (varchar(255)) is accepted and ignored — the playground's types are unparameterised. The engine's own type names are an internal mapping and are neither accepted as input nor shown.

6. SELECT — the read-only query path

SELECT touches no metadata, no persistence, no types. It is carried as Command::Select holding the validated SQL; the worker (Request::RunSelect) prepares and runs it, producing the existing DataResult, which renders through the existing data-table renderer (the one show data uses, ADR-0016). Columns that carry no playground type — computed expressions, joined columns — render with neutral alignment; the result is capped like show data, with LIMIT suggested for large outputs. A reference to an internal __rdbms_* table is rejected by the grammar (those tables are not in scope).

7. Engine neutrality

  • No engine type names in or out (§5).
  • No STRICT, no storage options. STRICT is applied internally by do_create_table; the user neither writes nor sees it. It is simply not part of the authored grammar, so typing it is an ordinary parse error — not a SQLite feature surfaced to the learner.
  • Engine-neutral errors. SQL parse errors, out-of-subset refusals, and execution failures all route through the friendly-error layer (ADR-0019); the engine's raw message and product name never appear.
  • Honest limitation. The grammar enforces the structural subset exactly. Expression-level neutrality is best-effort: an exotic engine-specific function the grammar admits and the engine then rejects surfaces an engine-neutral error rather than being caught up front. A function allowlist is a possible future hardening (§13).

8. Ambient assistance comes for free

Because SQL is grammar in the unified tree (§1), the walker gives SQL — with no SQL-specific assistance code — the same as the DSL:

  • Syntax highlighting of SQL keywords, identifiers, literals.
  • Tab completion of SQL keywords, and of schema names (tables, columns) drawn from the same SchemaCache the DSL completion already uses.
  • Hint-panel prose at each grammar slot.
  • The [ERR]/[WRN] validity indicator (ADR-0027).
  • Per-command parse-error usage (ADR-0021).

This is the reason for §1: assistance and a batch parser are incompatible; assistance and the unified grammar tree are the same thing.

9. Parse errors and the unsupported surface (Q2)

A construct not in the authored grammar is an ordinary walker parse error; the ADR-0021 per-command usage machinery and the ADR-0027 indicator apply, with engine-neutral wording. There is no separate "valid SQL but unsupported" classifier — that would require the batch parser §1 dropped; the walker's expected-set drives the message instead.

10. The DSL → SQL teaching bridge

When a DSL command runs in advanced mode, its output includes the equivalent SQL — so a learner who knows the simple-mode form reads off how to express it in SQL.

  • It is a Command → SQL renderer: the inverse of §4's DDL translator.
  • It fires only for commands entered via the DSL form, and only in advanced mode (a command the user already typed as SQL is not echoed back; simple mode is left uncluttered).
  • It renders as a distinct, de-emphasised output line beneath the [ok] summary, using the OutputLine styled-runs mechanism (ADR-0028).
  • App-level commands have no SQL form and are not echoed.

11. Persistence, metadata, history, replay

  • DDLCommand → the typed executor, so project.yaml, the metadata tables, and history.log stay correct with no new code (§4).
  • DML → the worker re-persists the affected table's CSV after running the statement.
  • history.log records the literal submitted line — a statement typed as SQL is logged as that SQL. The replay format is therefore app-enterable syntax, no divergence.
  • Replay re-runs each log line through the one walker with the advanced view active, so a project whose history mixes DSL and SQL replays faithfully.
  • project.yaml stays a structured schema snapshot; its embedded expressions (a column CHECK) are stored as SQL the user could re-enter in advanced mode — one syntax, not a third.

12. Safety in advanced mode

Advanced mode carries fewer rails by design. The DSL's WHERE-or---all-rows guard on update/delete (ADR-0014) is a simple-mode teaching aid; a SQL DELETE FROM t with no WHERE executes as written. The safety net is the auto-snapshot before destructive operations (ADR-0006), which fires regardless of which surface produced the statement; the mode's visual distinction (ADR-0003) is the user's signal until then.

13. Out of scope

  • OOS-1. CREATE VIEW / TRIGGER. Views are anticipated by the items panel's design (S2) but need their own model.
  • OOS-2. EXPLAIN of advanced-mode SQL queries. The DSL explain (ADR-0028) still works for what it already wraps. (Superseded by ADR-0039, 2026-05-27 — this was a deferred scope exclusion, not a principled rejection; EXPLAIN over advanced SQL is now in scope, as a deferred follow-up.)
  • OOS-3. A function/expression allowlist for full expression-level engine neutrality (§7) — best-effort now.
  • OOS-4. Multi-statement batches and transaction control.
  • OOS-5. A SQL → DSL echo (the reverse of §10).

Consequences

  • The unified grammar tree gains a large body of SQL grammar. The Node taxonomy and the walker may need extension to carry it (e.g. deeper recursion for subqueries / CTEs) — a known risk, addressed per phase.
  • sqlparser-rs is not adopted as the parser; ADR-0001's reservation is superseded. Q1's wording ("SQL parsed via sqlparser-rs") is superseded — SQL is parsed by the unified walker.
  • Command gains a Select variant; every exhaustive match Command gains an arm (the recurring ADR-0028/0029 gotcha).
  • The Database worker gains a RunSelect request and a "run validated DML, re-persist the table" request; DDL reuses the existing typed requests unchanged.
  • Mode-gating is added to the grammar / walker.
  • The metadata, persistence, and type machinery is reused unchanged for DDL — the payoff of routing DDL through Command.
  • This is the project's largest single feature so far. The phased plan keeps each step independently shippable; scope-cutting, if a slice proves disproportionate, is an explicit escalation, never a silent trim.
  • Q4 is satisfied by this ADR; Q1 / Q2 are unblocked and reframed around the unified walker; M1 gains its "recognised as SQL" hint.

Implementation notes

Phased; each phase independently shippable and test-guarded. The two large grammar slices each warrant their own focused ADR when taken up (ADR-0026-style).

  1. Foundations + first SELECT. Mode-gate the grammar (advanced unlocks the SQL nodes). Author the core SQL expression grammar — the ADR-0026 superset — as its own ADR. A single-table SELECT (projection, WHERE, ORDER BY, LIMIT) as a SQL CommandNodeCommand::Select → worker RunSelect → the existing renderer. Replace the placeholder echo; add the simple-mode "this is SQL" hint. This proves the path end-to-end with full walker assistance.
  2. SELECT — full. JOINs, GROUP BY/HAVING, aggregates, subqueries, UNION, CTEs. The big grammar phase — its own ADR.
  3. DML. INSERT / UPDATE / DELETE grammar; the execute-as-validated-SQL path; the worker re-persist step; settle multi-row INSERT and shortid auto-fill on a SQL INSERT.
  4. DDL. CREATE / DROP / ALTER TABLE, CREATE / DROP INDEX grammar → Command; the §5 type-name map; FK clauses → AddRelationship; may land table-rename (C1).
  5. The DSL → SQL teaching echo (§10).
  6. Polish. help sql; an engine-neutral error sweep; typing-surface / matrix coverage; the DOC1 SQL-surface reference page.

See also

  • ADR-0001 — reserved sqlparser-rs; that reservation is superseded here (§1).
  • ADR-0002 — the engine is an implementation detail; "no engine name in user-facing strings" — §7 extends it.
  • ADR-0003 — the simple / advanced mode model this builds on.
  • ADR-0005 — the ten-type vocabulary advanced DDL uses (§5).
  • ADR-0009 — the DSL conventions; the DSL stays usable in advanced mode.
  • ADR-0012 / ADR-0013 — the metadata tables the Command core keeps in sync, inherited for free (§4, §11).
  • ADR-0014 — the data-operation model and the --all-rows guard advanced mode deliberately relaxes (§12).
  • ADR-0015 — persistence write-through and replay, reused and made surface-agnostic (§11).
  • ADR-0016 — the data-table renderer SELECT results reuse (§6).
  • ADR-0019 — the friendly-error layer all SQL errors route through (§7, §9).
  • ADR-0021 — per-command parse-error usage, free for SQL (§9).
  • ADR-0022 — ambient typing assistance; §8 is its extension to SQL.
  • ADR-0023 / ADR-0024 — the unified grammar tree SQL becomes part of (§1, §2).
  • ADR-0026 — the WHERE expression grammar the SQL expression grammar is the superset of (§3).
  • ADR-0027 — the validity indicator, free for SQL (§8).
  • ADR-0028 — the OutputLine styled-runs the teaching echo uses (§10).