rdbms-playground/docs/adr/0035-advanced-mode-sql-ddl.md
claude@clouddev1 631074ff9c feat: ADR-0035 4a — SQL CREATE TABLE command, worker, and exit gate
Command + builder + worker for advanced-mode SQL CREATE TABLE
(sub-phase 4a), executed structurally through do_create_table:

- Command::SqlCreateTable + build_sql_create_table (ddl.rs): aliases via
  from_sql_name (incl. double precision), column- and table-level
  PRIMARY KEY, redundant-flag de-dup off a sole PK, IF NOT EXISTS.
  Advanced REGISTRY entry on the shared `create` word (SQL-first, DSL
  fallback); no-PK tables allowed (user-confirmed).
- Worker (db.rs): Request::SqlCreateTable + CreateOutcome + snapshot_then
  (one undo step); IF NOT EXISTS no-op (no snapshot, but journalled, like
  read-only commands). do_create_table inline-PK rule aligned with the
  rebuild generator schema_to_ddl — no round-trip DDL drift; serial
  autoincrement is independent of inline-PK (verified by round-trip
  tests).
- Runtime/App: dispatch + CommandOutcome::SchemaSkipped +
  AppEvent::DslCreateSkipped (structure + "already exists — skipped"
  note). Friendly catalog keys added (engine-neutral).

DEFAULT/CHECK/table-level UNIQUE are absent from the 4a grammar (parse
error with usage skeleton; friendly message + support land in the 4a.2
constraint slice) — user-confirmed.

Tests: type resolver, grammar shape, builder (incl. the PK
detection bug they caught), and tests/sql_create_table.rs (worker
round-trip, serial autoincrement first/non-first across rebuild, IF NOT
EXISTS no-op + journalling, no-PK table, one undo step) + a replay-as-
write test. 1739 pass / 0 fail / 1 ignored; clippy clean.

Exit gate: ADR-0035 Proposed -> Accepted (validated end-to-end by 4a);
README + requirements.md Q1 updated.
2026-05-25 10:04:28 +00:00

# ADR-0035: Advanced-mode SQL DDL
## Status
Accepted. Design agreed with the user (2026-05-24); the approach is
**validated end-to-end by sub-phase 4a** (`CREATE TABLE`, implemented
2026-05-25 — plan `docs/plans/20260524-adr-0035-sql-ddl-4a.md`), so the
decision is accepted while the remaining sub-phases (**4a.2, 4b–4i**,
§13) continue. This is **Phase 4** of the ADR-0030 roadmap (the
advanced-mode SQL surface), the peer of ADR-0031 (expression grammar),
ADR-0032 (`SELECT`), and ADR-0033 (DML). It **clarifies ADR-0030 §4**
on how DDL is represented and executed.

**Refinements (2026-05-24, pre-implementation `/runda` round,
user-confirmed).** Two open micro-calls were settled before 4a:
(1) `IF [NOT] EXISTS` is **admitted** as a no-op-that-succeeds-with-a-note
rather than refused — it is a near-universal cross-vendor idiom
(PostgreSQL, MySQL/MariaDB, SQLite, Oracle 23ai), not an
engine-specific spelling, so it belongs in the standard surface
(§3/§4/§12/§13); (2) `INTEGER PRIMARY KEY` maps to a **plain `int`**
primary key, *not* auto-increment — `serial` remains the sole
auto-increment type (§3).
## Context
ADR-0030 fixed the *architecture* of advanced mode — SQL authored as
grammar in the unified tree (not a separate batch parser), with the
playground's own type vocabulary and metadata model — and noted that
each large grammar piece gets its own focused ADR. Phases 1–3 shipped:
the SQL expression grammar (ADR-0031), full `SELECT` (ADR-0032), and
DML — `INSERT`/`UPDATE`/`DELETE` (ADR-0033). Phase 4 is **DDL**:
`CREATE` / `DROP` / `ALTER TABLE` and `CREATE` / `DROP INDEX`.

Two things from the earlier phases shape this one:
1. **The advanced surface gets its *own* commands.** ADR-0033
established that a SQL statement produces a distinct command
(`SqlInsert` / `SqlUpdate` / `SqlDelete`), separate from the
simple-mode typed command for the same verb. Those DML commands
execute as **validated SQL run verbatim** — possible only because
DML changes no schema and touches no metadata.
2. **DDL cannot run verbatim.** If `CREATE TABLE Orders (id INTEGER)`
executed as-is, the engine would make the table, but the
playground would lose what the user meant: that `id` is `serial`,
that a `REFERENCES` clause is a *named relationship*, that `STRICT`
applies, that the ten-type vocabulary governs. Recovering that
needs the parsed statement either way.

ADR-0030 §4 said "DDL → a `Command` … run the typed executor." That
remains right in spirit — DDL is *structurally* executed, not raw —
but it predates the DML build and read as "reuse the simple-mode
`CreateTable` variant." This ADR clarifies it: **DDL gets its own
advanced commands too**, executed structurally (not verbatim). The
"verbatim" execution of the DML commands is an implementation
convenience available only because nothing about DML required
otherwise — not an architectural rule.

Requirements touched: realizes `Q4` for DDL; closes the advanced-mode
side of table/column/index/constraint/relationship operations; lands
the table-rename half of `C1` (advanced mode only).
## Decision
### 1. Own per-statement SQL DDL commands (clarifies ADR-0030 §4)
New `Command` variants, one per statement kind — granularity mirrors
the DML phase:
- `SqlCreateTable`
- `SqlAlterTable`
- `SqlDropTable`
- `SqlCreateIndex`
- `SqlDropIndex`

They are produced by the unified grammar's `ast_builder`s in advanced
mode. Unlike the DML `Sql*` commands they **execute structurally**:
the handler reads the parsed structure and performs the schema change
through the playground's metadata-maintaining machinery — writing
`__rdbms_playground_columns` / `__rdbms_playground_relationships`,
applying `STRICT`, using the ten-type vocabulary — so an
advanced-mode-created object is a first-class playground object,
identical to a simple-mode-created one (ADR-0030 §5).

**Simple mode is untouched.** The existing typed commands
(`CreateTable`, `AddColumn`, `AddRelationship`, …) and their grammar
are unchanged; advanced SQL DDL is purely additive.

**Execution sharing (per the user's steer).** The SQL DDL handlers
**reuse the low-level schema/metadata helpers** — the table builder,
the metadata writers, the rebuild-table primitive (ADR-0013) — where
the underlying operation is genuinely the same, so the two surfaces
cannot drift. Where the SQL path is genuinely different (e.g. a
`CREATE TABLE` that declares several inline foreign keys, which has no
simple-mode shape), it is implemented directly **for clarity rather
than bending the simple-mode command shapes to absorb it**. Shared
where it works; separate where it doesn't.
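
The split between verbatim DML and structural DDL can be pictured with a small sketch. Only the variant names below come from ADR-0033 and this ADR; the `Execution` enum and `execution_of` function are hypothetical names introduced for illustration, and the real `Command` variants presumably carry parsed payloads that are omitted here.

```rust
/// Hypothetical, trimmed-down view of the command set: only the variant
/// names come from the ADRs; payloads are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    // DML (ADR-0033)
    SqlInsert,
    SqlUpdate,
    SqlDelete,
    // DDL (this ADR)
    SqlCreateTable,
    SqlAlterTable,
    SqlDropTable,
    SqlCreateIndex,
    SqlDropIndex,
}

/// Illustrative only: how a command reaches the database.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Execution {
    /// Validated SQL is handed to the engine as written.
    Verbatim,
    /// The handler reads the parsed structure and maintains metadata itself.
    Structural,
}

pub fn execution_of(cmd: Command) -> Execution {
    match cmd {
        Command::SqlInsert | Command::SqlUpdate | Command::SqlDelete => Execution::Verbatim,
        Command::SqlCreateTable
        | Command::SqlAlterTable
        | Command::SqlDropTable
        | Command::SqlCreateIndex
        | Command::SqlDropIndex => Execution::Structural,
    }
}
```

Because the `match` is exhaustive, adding a sixth DDL variant later would fail to compile until its execution style is decided, which is one way to keep the rule from being forgotten.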
### 2. Dispatch — shared entry words, advanced-only `alter`
`create` and `drop` are already simple-mode entry words. They reuse
the **category-grouped, mode-aware dispatch** from ADR-0033
Amendment 1: each appears in both the `Simple` and `Advanced` groups
of the `REGISTRY`; in advanced mode the SQL node is tried first and
falls back to the simple node when the SQL shape doesn't match. So in
advanced mode `CREATE TABLE T (id serial)` parses as SQL while
`create table T with pk id(serial)` still parses as the simple form —
exactly as `insert` behaves today. `alter` is a **new advanced-only
entry word** (`CommandCategory::Advanced`); simple mode keeps its
`add column` / `drop column` / `rename column` / `change column`
verbs and gains no `alter`.
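
A toy sketch of the SQL-first, simple-fallback dispatch described above. The real dispatch goes through the `REGISTRY` and the grammar walker; here `try_sql` and `try_simple` are hypothetical stand-ins that only look at the first few words of a line, and `Mode` and `Parsed` are illustrative types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced,
}

/// Which surface accepted the line, plus the command name it would build.
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed {
    Sql(&'static str),
    Simple(&'static str),
}

/// Stand-in for the advanced SQL nodes (assumption: real parsing is a grammar walk).
fn try_sql(line: &str) -> Option<Parsed> {
    let lower = line.to_ascii_lowercase();
    let words: Vec<&str> = lower.split_whitespace().collect();
    match words.as_slice() {
        ["create", "table", _name, rest, ..] if rest.starts_with('(') => {
            Some(Parsed::Sql("SqlCreateTable"))
        }
        ["alter", "table", ..] => Some(Parsed::Sql("SqlAlterTable")),
        _ => None,
    }
}

/// Stand-in for the simple-mode nodes: only `create table <name> with ...`.
fn try_simple(line: &str) -> Option<Parsed> {
    let lower = line.to_ascii_lowercase();
    let words: Vec<&str> = lower.split_whitespace().collect();
    match words.as_slice() {
        ["create", "table", _name, "with", ..] => Some(Parsed::Simple("CreateTable")),
        _ => None,
    }
}

pub fn dispatch(mode: Mode, line: &str) -> Option<Parsed> {
    match mode {
        // Simple mode never sees the SQL nodes, so `alter` is unknown there.
        Mode::Simple => try_simple(line),
        // Advanced mode tries SQL first and falls back to the simple node.
        Mode::Advanced => try_sql(line).or_else(|| try_simple(line)),
    }
}
```

With these stand-ins, `dispatch(Mode::Advanced, "create table T with pk id(serial)")` falls through to the simple node, while `dispatch(Mode::Simple, "ALTER TABLE T RENAME TO U")` yields `None`.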
### 3. Type vocabulary (restates ADR-0030 §5)
The type-name slot accepts the playground keywords directly (`text`,
`int`, `real`, `decimal`, `bool`, `date`, `datetime`, `blob`,
`serial`, `shortid`) **and** standard-SQL aliases mapped onto them:
`integer`/`smallint`/`bigint` → `int`; `varchar`/`char` → `text`;
`boolean` → `bool`; `timestamp` → `datetime`; `numeric` → `decimal`;
`float`/`double precision` → `real`; `binary`/`varbinary` → `blob`. A
length/precision argument (`varchar(255)`, `numeric(10,2)`) is
**accepted and ignored** — the playground's types are
unparameterised. Engine storage-type names are neither accepted as
input nor shown (§9).

The map is purely **lexical**: `INTEGER PRIMARY KEY` becomes a plain
`int` primary key — it is **not** treated as auto-increment, unlike
the engine's rowid-alias idiom. Auto-increment is reached only through
the explicit `serial` type (`id serial primary key`). This keeps the
engine's storage behaviour from leaking into the standard surface and
matches ADR-0005's single-auto-increment-type model.
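
A minimal sketch of such a lexical resolver, under stated assumptions: the `Type` enum and `resolve_type` are hypothetical names (the project's own resolver may differ), type names are treated as case-insensitive, and the slot text arrives as one string such as `varchar(255)` or `double precision`.

```rust
/// The ten playground types (names from section 3; the enum itself is illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Text,
    Int,
    Real,
    Decimal,
    Bool,
    Date,
    Datetime,
    Blob,
    Serial,
    Shortid,
}

/// Resolve a type-name slot to a playground type, or `None` if unknown.
pub fn resolve_type(raw: &str) -> Option<Type> {
    // A length/precision argument is accepted and ignored: cut at the first '('.
    let base = match raw.find('(') {
        Some(i) => &raw[..i],
        None => raw,
    };
    // Lower-case and collapse whitespace so "DOUBLE   PRECISION" still matches.
    let name = base
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_lowercase();
    Some(match name.as_str() {
        "text" | "varchar" | "char" => Type::Text,
        // `integer` is a plain int: the mapping never produces `Serial` by itself.
        "int" | "integer" | "smallint" | "bigint" => Type::Int,
        "real" | "float" | "double precision" => Type::Real,
        "decimal" | "numeric" => Type::Decimal,
        "bool" | "boolean" => Type::Bool,
        "date" => Type::Date,
        "datetime" | "timestamp" => Type::Datetime,
        "blob" | "binary" | "varbinary" => Type::Blob,
        "serial" => Type::Serial,
        "shortid" => Type::Shortid,
        // Engine storage names and anything else are not part of the vocabulary.
        _ => return None,
    })
}
```

Since the function only sees the type name, a following `PRIMARY KEY` cannot change its result, which is the point of keeping the map lexical.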
### 4. The DDL surface (full; `Q4`, no pre-emptive cuts)
**`CREATE TABLE <name> ( <element>, … )`**
- **Column elements**: `<name> <type> [constraints…]`, where the
column constraints are the ADR-0029 set spelled in SQL: `NOT NULL`,
`UNIQUE`, `PRIMARY KEY`, `DEFAULT <expr>`, `CHECK (<expr>)`, and an
inline `REFERENCES <T>(<col>) [ON DELETE …] [ON UPDATE …]` (§5).
- **Table elements**: `PRIMARY KEY (<col>, …)` (single **and
compound**), `UNIQUE (<col>, …)`, `CHECK (<expr>)`,
`[CONSTRAINT <name>] FOREIGN KEY (<col>) REFERENCES <T>(<col>)
[ON DELETE …] [ON UPDATE …]` (§5).
- `CHECK` and `DEFAULT` expressions reuse the ADR-0031 `sql_expr`
grammar (the same fragment `WHERE`/`HAVING`/projections use).
- `CREATE TABLE IF NOT EXISTS <name> …` is admitted: when the table
already exists the statement is a **no-op that succeeds with a note**
("table already exists — skipped") instead of the plain-form
"table already exists" error. `IF NOT EXISTS` is a near-universal
cross-vendor idiom, not an engine-specific spelling, so it is part of
the standard surface (refines §12).

**`DROP TABLE [IF EXISTS] <name>`** → `SqlDropTable`. Cascade of inbound
relationships follows the existing `drop table` semantics. `IF EXISTS`
is admitted (universal across the major engines): dropping an absent
table is then a **no-op that succeeds with a note** instead of the
plain-form "no such table" error.

**`ALTER TABLE <name> <action>`** → `SqlAlterTable`, where `<action>`
covers, mapping to the existing low-level operations:

| SQL action | Underlying operation |
|---|---|
| `ADD COLUMN <name> <type> [constraints]` | add-column (ADR-0013 rebuild where needed) |
| `DROP COLUMN <name>` | drop-column |
| `RENAME COLUMN <old> TO <new>` | rename-column |
| `ALTER COLUMN <name> TYPE <type>` | change-column-type (§5 conversion) |
| `ADD [CONSTRAINT <name>] <table-constraint>` | add-constraint / add-relationship (FK) |
| `DROP CONSTRAINT <name>` | drop-constraint |
| `RENAME TO <new>` | **table rename (§6, new low-level op)** |

**`CREATE [UNIQUE] INDEX [<name>] ON <table> (<col>, …)`** →
`SqlCreateIndex`, mapped to the ADR-0025 index machinery; `UNIQUE`
sets the index's uniqueness (a small extension to ADR-0025's index
model if it does not already carry the flag, called out in §13).

**`DROP INDEX <name>`** → `SqlDropIndex`.
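
The `IF NOT EXISTS` / `IF EXISTS` behaviour for tables can be sketched against an in-memory set of table names. This is an illustration only: `Outcome`, `create_table`, and `drop_table` are hypothetical names, the message strings are placeholders rather than the catalog wording, and the real worker also handles snapshots and journalling.

```rust
use std::collections::BTreeSet;

/// Result of a create/drop that did not fail.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The schema changed.
    Applied,
    /// Nothing changed; the statement still succeeds and carries a note.
    Skipped(String),
}

pub fn create_table(
    tables: &mut BTreeSet<String>,
    name: &str,
    if_not_exists: bool,
) -> Result<Outcome, String> {
    if tables.contains(name) {
        return if if_not_exists {
            Ok(Outcome::Skipped(format!("table {} already exists, skipped", name)))
        } else {
            Err(format!("table {} already exists", name))
        };
    }
    tables.insert(name.to_string());
    Ok(Outcome::Applied)
}

pub fn drop_table(
    tables: &mut BTreeSet<String>,
    name: &str,
    if_exists: bool,
) -> Result<Outcome, String> {
    if tables.remove(name) {
        return Ok(Outcome::Applied);
    }
    if if_exists {
        Ok(Outcome::Skipped(format!("no table {}, skipped", name)))
    } else {
        Err(format!("no such table {}", name))
    }
}
```

Keeping `Skipped` distinct from `Applied` lets a caller decide, for instance, not to take an undo snapshot when nothing changed.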
### 5. Foreign keys → named relationships
A `REFERENCES` / `FOREIGN KEY` clause is the SQL spelling of an
ADR-0013 relationship. Because `SqlCreateTable` is its own command
carrying the whole parsed structure, a `CREATE TABLE` that declares
FK columns **creates the table and its relationship metadata
together** — one statement, one command, one transaction, **one undo
step** (§10). No decomposition into separate commands is needed.
- `ON DELETE` / `ON UPDATE` → the ADR-0013 referential actions.
- A `CONSTRAINT <name> FOREIGN KEY …` names the relationship; an
unnamed FK is auto-named by the existing ADR-0013 convention.
- `ALTER TABLE child ADD [CONSTRAINT <name>] FOREIGN KEY (<col>)
REFERENCES <P>(<col>) …` adds a relationship to an existing table
(the clean 1:1 with add-relationship).
- FK column type compatibility follows `Type::fk_target_type`
(ADR-0011) unchanged.
### 6. Table rename — advanced mode only (`C1`)
`ALTER TABLE <old> RENAME TO <new>` is **advanced-mode only**; there
is no simple-mode rename-table verb. It needs a genuinely new
low-level operation (none exists today): within one transaction,
rename the table in the database, rename its `data/<table>.csv` file,
and update every metadata row that names it — the column-metadata
rows, and **both ends of any relationship** in
`__rdbms_playground_relationships` that references the old name. Name
validation and `__rdbms_*` rejection apply to the target. This closes
the rename half of `C1` for the advanced surface.
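
The bookkeeping a table rename has to keep in step can be sketched on an in-memory model. Everything here is an assumption made for illustration: `Catalog`, `Relationship`, and `rename_table` are hypothetical, the "transaction" is imitated by editing a copy and committing it at the end, and the name validation is reduced to the `__rdbms_` prefix check.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Relationship {
    pub name: String,
    pub child_table: String,
    pub parent_table: String,
}

/// Toy stand-in for the database, the metadata tables, and the CSV layer.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Catalog {
    pub tables: Vec<String>,
    pub column_rows: Vec<(String, String)>, // (table, column)
    pub relationships: Vec<Relationship>,
    pub csv_files: Vec<String>,
}

pub fn rename_table(cat: &mut Catalog, old: &str, new: &str) -> Result<(), String> {
    if new.is_empty() || new.starts_with("__rdbms_") {
        return Err(format!("invalid table name {}", new));
    }
    if !cat.tables.iter().any(|t| t.as_str() == old) {
        return Err(format!("no such table {}", old));
    }
    if cat.tables.iter().any(|t| t.as_str() == new) {
        return Err(format!("table {} already exists", new));
    }
    // Edit a copy and commit at the end: all four updates land, or none do.
    let mut next = cat.clone();
    for t in next.tables.iter_mut() {
        if t.as_str() == old {
            *t = new.to_string();
        }
    }
    for (table, _column) in next.column_rows.iter_mut() {
        if table.as_str() == old {
            *table = new.to_string();
        }
    }
    // Both ends of a relationship may name the table (including self-references).
    for rel in next.relationships.iter_mut() {
        if rel.child_table.as_str() == old {
            rel.child_table = new.to_string();
        }
        if rel.parent_table.as_str() == old {
            rel.parent_table = new.to_string();
        }
    }
    let old_csv = format!("data/{}.csv", old);
    for f in next.csv_files.iter_mut() {
        if f.as_str() == old_csv.as_str() {
            *f = format!("data/{}.csv", new);
        }
    }
    *cat = next;
    Ok(())
}
```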
### 7. Column type conversion — one engine, mode-appropriate policy
The per-cell classification of ADR-0017 (clean / lossy / incompatible,
plus static refusals for playground-type-specific targets such as
`→ serial` and `↔ blob`) is a property of the **type set**, shared by
both modes. The policy on the *lossy* tier differs by mode:

| Tier | Simple mode | Advanced mode (`ALTER COLUMN … TYPE`) |
|---|---|---|
| **clean** | auto-convert | auto-convert |
| **incompatible** | refuse (friendly) | refuse (friendly) — real SQL errors too |
| **static-refused** (`→serial`, `↔blob`, …) | refuse | refuse — our own types have no SQL meaning to mirror |
| **lossy** (`3.14`→`3`) | **refuse by default**; `--force-conversion` opts in | **perform it** (what SQL does), with a post-op "N values converted with loss" note; **no force flag** |

Rationale: **simple mode protects up front; advanced mode trusts the
user like SQL does and lets `undo` catch regrets.** A lossy advanced
conversion is snapshotted (§10), so it is one `undo` away — there is
no silent *irreversible* loss, and no need to drop to simple mode to
"force". Conversions that exist only in the playground's vocabulary
stay protected in both modes. The simple-mode `--force-conversion` /
`--dont-convert` flags are unchanged and have **no SQL spelling**
(advanced mode always performs the conversion); the Postgres `USING
<expr>` clause is **not** adopted (§12).
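
The policy table above reduces to a small decision function. A sketch, assuming hypothetical names (`Tier`, `Mode`, `Decision`, `decide`) and treating `force_conversion` as the simple-mode `--force-conversion` flag; the flag is ignored in advanced mode because it has no SQL spelling.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Clean,
    Lossy,
    Incompatible,
    StaticRefused,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Simple,
    Advanced,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Convert,
    /// Convert, then report "N values converted with loss".
    ConvertWithLossNote,
    Refuse,
}

pub fn decide(tier: Tier, mode: Mode, force_conversion: bool) -> Decision {
    match (tier, mode) {
        (Tier::Clean, _) => Decision::Convert,
        (Tier::Incompatible, _) | (Tier::StaticRefused, _) => Decision::Refuse,
        // Simple mode protects up front unless the user opts in.
        // (How simple mode reports a forced lossy conversion is outside this sketch.)
        (Tier::Lossy, Mode::Simple) => {
            if force_conversion {
                Decision::Convert
            } else {
                Decision::Refuse
            }
        }
        // Advanced mode performs the conversion and relies on undo for regrets.
        (Tier::Lossy, Mode::Advanced) => Decision::ConvertWithLossNote,
    }
}
```

Only the lossy row depends on the mode; the other three tiers give the same answer on both surfaces, which matches the claim that classification belongs to the type set.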
### 8. Constraints
Column- and table-level constraints map to the ADR-0029 model:
`NOT NULL`, `UNIQUE`, `PRIMARY KEY` (incl. compound, table-level),
`DEFAULT <expr>`, `CHECK (<expr>)`. A populated-column constraint
addition reuses ADR-0029's pre-flight dry-run guard. `CHECK` /
`DEFAULT` expressions are stored as the SQL the user could re-enter in
advanced mode (ADR-0030 §11) — one syntax, not a third.
### 9. Engine neutrality (ADR-0030 §7)
No engine type names in or out (§3). `STRICT` is applied internally by
the create path; it is not in the authored grammar, so typing it is an
ordinary parse error, not a surfaced engine feature. Parse errors,
out-of-subset refusals, and execution failures route through the
friendly-error layer (ADR-0019) with engine-neutral wording.
### 10. Persistence, metadata, history, replay, undo
- Structural execution keeps `project.yaml`, the metadata tables, and
the CSV layer correct with the same guarantees as the simple-mode
path (ADR-0015 §6 ordering preserved).
- `history.log` records the **literal submitted SQL line**; replay
re-runs it through the one walker with the advanced view active.
`create` / `drop` / `alter` are **schema-write entry words, not in
ADR-0034 Amendment 1's app-lifecycle skip set**, so SQL DDL
**replays as a write** (re-applied) with **no replay-filter change**
— unlike `undo` / `redo`, which had to be added to that skip set.
- **Undo (ADR-0006):** each SQL DDL statement is a user mutation
carrying a `source`, so it is snapshotted by the worker hook and is
**one undo step** — including a `CREATE TABLE` with foreign keys,
precisely because it is a single command (§5) rather than a
decomposed sequence.
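
Why no replay-filter change is needed can be shown with a sketch of a skip-set check. This is an assumption-laden illustration: `SKIPPED_ON_REPLAY` and `skipped_on_replay` are hypothetical names, and the list holds only the two entry words this ADR mentions (`undo`, `redo`); the actual skip set is defined by ADR-0034 Amendment 1 and may contain more.

```rust
/// Partial, illustrative skip set: only the words named in this ADR.
const SKIPPED_ON_REPLAY: &[&str] = &["undo", "redo"];

/// True when a history line's entry word is in the app-lifecycle skip set.
pub fn skipped_on_replay(line: &str) -> bool {
    match line.split_whitespace().next() {
        Some(word) => {
            let word = word.to_ascii_lowercase();
            SKIPPED_ON_REPLAY.iter().any(|w| *w == word.as_str())
        }
        // An empty line has no entry word, so there is nothing to skip.
        None => false,
    }
}
```

`create`, `drop`, and `alter` are simply absent from the set, so a recorded `CREATE TABLE` line is not filtered and gets re-applied on replay.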
### 11. Ambient assistance comes for free (ADR-0030 §8)
Because the DDL is grammar in the unified tree, the walker
**mechanisms** apply with no DDL-specific assistance code: syntax
highlighting, the `[ERR]`/`[WRN]` validity indicator (ADR-0027), the
per-command parse-error usage skeleton (ADR-0021), and the completion
engine.

What each grammar node still **authors** (this is writing the grammar,
not bolting assistance on afterwards): the correct `IdentSource` on
every schema-name slot — so `ALTER TABLE`/`DROP TABLE`/`DROP INDEX`
and `REFERENCES T(col)` / `CREATE INDEX ON T (cols)` complete from the
`SchemaCache`; the per-node hint + usage catalog keys (as the
app-command nodes carry `help_id` / `usage_ids`); and the
DDL-specific walker diagnostics with their catalog keys — the DDL
peers of the DML diagnostics ADR-0033 added (e.g. unknown type,
column-already-exists, FK column-type mismatch, the §7 lossy-conversion
note). The integration is structural, not free of authoring.
### 12. Out of scope
- Per ADR-0030 §3: views, triggers, transaction control, `PRAGMA`,
`ATTACH`/`DETACH`, `VACUUM`, virtual tables, multi-statement
batches. One statement per submission; a trailing `;` is tolerated.
- The Postgres `USING <expr>` conversion clause (§7) — heavy
(per-row expression evaluation), dialect-specific, and unable to
express playground-type targets.
- The simple-mode `--dont-convert` semantics have no SQL form
(advanced `ALTER COLUMN TYPE` always converts).
- The **DSL → SQL teaching echo** (ADR-0030 §10) is Phase 5, a
separate ADR — not this one.
- Engine-specific DDL spellings (`AUTOINCREMENT`, `WITHOUT ROWID`,
collations) — the grammar admits the standard surface; extras are
ordinary parse errors. (`IF [NOT] EXISTS` was **reclassified into
scope** — see §4 — as a near-universal cross-vendor idiom rather
than an engine-specific spelling.)
### 13. Phased implementation plan
Sub-phases, each opening with the smallest end-to-end slice and each
with an explicit exit gate + a written Devil's-Advocate gate, mirroring
ADR-0033's structure:
- **4a — Dispatch + `CREATE TABLE` core.** Advanced `create`
dispatch; `SqlCreateTable` for columns + types (the §3 map, incl. the
two-word `double precision` and discarded length args) + the
**clean-reuse column constraints only** — `NOT NULL` / `UNIQUE` /
column-level `PRIMARY KEY` — + single/compound table-level
`PRIMARY KEY`, plus `IF NOT EXISTS` (no-op-with-note, §4). Reuses
`do_create_table`, whose inline-PK rule is aligned with the rebuild
generator `schema_to_ddl` (inline only a first-column single PK) so a
created table and its rebuilt form have identical DDL; `serial`
autoincrement is independent of inline-vs-table-level PK (the insert
path computes the next value), verified by round-trip tests. **No
FK** (4b); **no `DEFAULT`/`CHECK`/table-level `UNIQUE`** (4a.2).
- **4a.2 — The constraint slice.** Split out (2026-05-24,
user-confirmed) for the constraints that are *not* a clean reuse:
(1) **`CHECK`/`DEFAULT`** via the full `sql_expr` surface stored as
**raw SQL text** — needed because `sql_expr` is validate-only and
yields no `Expr` AST for `compile_check_sql`/`ColumnSpec`, so it is a
separate execution path; (2) **composite `UNIQUE(a,b)` and
multi-column table `CHECK`** — the first structures `TableSchema`
cannot already represent, needing a model + YAML round-trip +
`read_schema` detection + `do_create_table` emission extension, with
save/load/rebuild tests. Until then 4a rejects all of these as
"not yet supported". (The general rule: a DDL feature needs new
model/execution work only when it introduces a structure simple mode
could never produce, or an expression the structural helper cannot
consume — cf. the `UNIQUE`-index flag in 4d and the rename op in 4h.)
- **4b — Foreign keys in `CREATE TABLE`.** Inline `REFERENCES` +
table-level `FOREIGN KEY` → relationship metadata, one undo step.
- **4c — `DROP TABLE [IF EXISTS]`** → `SqlDropTable` (cascade parity;
`IF EXISTS` no-op-with-note, §4).
- **4d — `CREATE [UNIQUE] INDEX` / `DROP INDEX`** → `SqlCreateIndex`
/ `SqlDropIndex` (ADR-0025; the `UNIQUE` flag extension if needed).
- **4e — `ALTER TABLE` add/drop/rename column.**
- **4f — `ALTER TABLE … ALTER COLUMN TYPE`** (the §7 conversion
model + the lossy-with-note path).
- **4g — `ALTER TABLE` add/drop constraint, add foreign key.**
- **4h — `ALTER TABLE … RENAME TO`** (the §6 new low-level op).
- **4i — Verification sweep.** Typing-surface + matrix coverage,
engine-neutral error pass, undo-parity check (one step per
statement), `help`/usage for the new forms.
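
The 4a inline-PK rule ("inline only a first-column single PK") can be written as a small predicate. A sketch under assumptions: `inline_pk_column` is a hypothetical helper, and columns and key members are represented as plain name slices rather than the project's schema types.

```rust
/// Returns the column whose PRIMARY KEY may be emitted inline, if any.
/// `columns` is the table's column order; `primary_key` lists the key columns.
pub fn inline_pk_column<'a>(columns: &[&'a str], primary_key: &[&str]) -> Option<&'a str> {
    match (columns.first(), primary_key) {
        // Inline only when the key is a single column and that column is first.
        (Some(first), [only]) if first == only => Some(*first),
        // Compound keys, non-first single keys, and no-PK tables are not inlined.
        _ => None,
    }
}
```

If the create path and the rebuild generator both consult one predicate of this kind, a created table and its rebuilt form produce the same DDL, which is the drift the 4a bullet says the alignment avoids.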
## Consequences
- Advanced mode reaches DDL parity with simple mode and adds
table-rename, so a learner can build and evolve a whole schema in
standard SQL with the playground's types, metadata, and safety
intact.
- The command set grows by five `Sql*` DDL variants; the worker gains
their handlers, which lean on shared low-level helpers where the
operation matches the simple-mode path and stand alone where the
SQL surface is genuinely richer (multi-FK `CREATE TABLE`).
- One genuinely new capability — table rename — adds a low-level op
that the simple mode does not have; it must keep the CSV file name
and the relationship metadata in step with the table name.
- ADR-0030 §4 is clarified (own `Sql*` DDL commands, structurally
executed); no behaviour of the shipped DML/`SELECT` phases changes.
- The conversion model unifies simple and advanced without a force
flag in SQL, relying on `undo` (ADR-0006) as the advanced-mode
safety net — a concrete payoff of having shipped undo first.
## See also
- **ADR-0030** — the advanced-mode architecture; this is its Phase 4
and clarifies §4 (DDL representation) and restates §5 (types) / §7
(neutrality) / §8 (assistance) / §11 (persistence).
- **ADR-0033** — the DML phase; source of the category-grouped
mode-aware dispatch (Amendment 1) reused for shared entry words.
- **ADR-0031** — `sql_expr`, reused for `CHECK` / `DEFAULT`.
- **ADR-0013** — relationships + the rebuild-table primitive that the
`ALTER`/FK handlers build on.
- **ADR-0017** — the column type-change classification §7 shares.
- **ADR-0029** — column constraints; **ADR-0025** — indexes;
**ADR-0011** — FK column-type compatibility; **ADR-0005** — the
ten-type vocabulary.
- **ADR-0006** — undo; each DDL statement is one undo step (§10).