# ADR-0024: Unified grammar tree, execution plan

## Status

**Accepted.** 2026-05-14. Concrete specification for the direction proposed in ADR-0023. Where ADR-0023 captured the critique of the current parser shape and the high-level vision, this ADR specifies the data model, walker semantics, migration sequence, and cleanup steps in enough detail that implementation can proceed without further design decisions.

Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in the directory as institutional memory of why this change is happening; ADR-0024 is what gets built.

## Context

The design pass in the round-6 session (2026-05-14) worked through ADR-0023's open questions and a number of implicit decisions that hadn't been written down. Four rounds of questions, each followed by user confirmation:

1. **Round 1: foundational.** Registry shape, node taxonomy, AST output model, failure / "expected" semantics, walker API and its mapping to parse / complete / highlight / hint concerns.
2. **Round 2: concrete representation.** Multi-keyword sequences, sub-grammar reusability (static and dynamic), path-bearing commands, bare-or-with-suffix commands.
3. **Round 3: organisation and migration.** Module layout, per-command migration strategy, test discipline during migration.
4. **Round 4: smaller details.** Aliases on keyword nodes, `IdentSlot` fate, highlight palette, external-tooling exposure.

Two larger decisions emerged from the rounds and shifted the shape from ADR-0023's sketch:

- **The lexer dissolves.** The walker operates directly on source bytes ("scannerless"). The current `dsl/lexer.rs` module's responsibilities (whitespace skipping, token shape recognition, byte-span tracking) migrate into terminal-node consume functions and the walker driver. The `define_keywords!` macro is no longer needed in its current form; keyword literals live on `Word` nodes in the grammar.
- **Schema-aware parse from day one.** ADR-0023 had been cautious about coupling parse to schema state. The round-1 / round-2 discussion concluded that this caution comes from general-purpose parser tooling and doesn't apply to an interactive DSL editor where the schema *is* the context. Typed value slots consult the schema during parse; bind-time type checks remain but become belt-and-braces rather than the primary defense.

A separate critique surfaced in the design pass: my (Claude's) default pull toward "what's the safe incremental version of what general-purpose parser tooling does" repeatedly fought against the project owner's cleaner direct design. The pull is now explicitly resisted: this ADR ships the direct design, not a phased compromise.

## Decision summary

A single trie data structure declared in Rust serves as the authority for parsing, completion, syntax highlighting, parse-error usage rendering, hint-panel content, and (eventually) external-tooling exposure. The walker that consumes this trie operates directly on source bytes, with no separate lexer pass. Schema-aware narrowing flows naturally from the trie's structure: typed value slots and dynamic sub-grammars consult a per-walk context that carries the current table, the resolved column types, and a reference to the schema cache.

Migration is per-command across six phases. The legacy chumsky parser and the new walker run side-by-side during the transition; existing behavioural tests guard regressions. Phase F removes chumsky, the lexer module, the separate `UsageEntry` registry, and the expected-set introspection in `completion.rs`.

Estimated total cost: ~4 sessions: one to land the framework and migrate Phase A, two for Phases B-D, one for Phases E + F.

## Architecture

### Walker as single source of truth

```rust
pub fn walk(
    source: &str,
    bound: WalkBound,
    ctx: &mut WalkContext,
) -> WalkResult<'_>;

pub enum WalkBound {
    EndOfInput,      // parse: walk all input
    Position(usize), // complete / hint: walk up to cursor byte
}

pub struct WalkResult<'a> {
    pub outcome: WalkOutcome,
    pub matched_path: MatchedPath,
    pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
    ValidationFailed {
        position: usize,
        message_key: &'static str,
        args: Vec<(&'static str, String)>,
    },
}
```

Consumers:

- **Parse for dispatch.** `walk(source, EndOfInput, ctx)`. On `Match`, invoke `commands[command_idx].ast_builder(matched_path)` and dispatch the returned `Command`.
- **Highlighting.** `walk(source, EndOfInput, ctx).per_byte_class`. Each terminal records `(byte_range, node.highlight_class())` as it matches. Unmatched ranges (past a failure) get the `tok_error` overlay.
- **Completion at cursor.** `walk(source, Position(cursor), ctx)`, inspect `outcome.expected`. Each expected `Node` contributes candidates: `Word` → its primary literal, `Ident { source }` → schema-cache lookup, `Flag` → `--name`, value-literal slot → type-appropriate hint per `HintMode`, etc.
- **Hint panel ambient.** Same walk as completion. The hint resolver consults `WalkOutcome` variants plus the expected nodes' `HintMode` to choose between candidates rendering, prose, suppression, etc.

### Scannerless: no lexer module

Terminal nodes consume bytes directly. No pre-pass produces a token vector. The walker's driver handles whitespace skipping between siblings of a `Seq` and dispatches to each terminal's `consume(source, position)` function.

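
The driver behaviour described above (skip whitespace between siblings, then hand the position to a terminal's consume function) can be sketched as follows. This is a minimal illustration under stated assumptions, not the walker's actual code: the `Terminal` enum, `skip_ws`, and `walk_seq` are hypothetical names, only two terminal kinds are modelled, and keyword matching is assumed to be exact and case-sensitive.

```rust
/// Two hypothetical terminal kinds; the ADR's `Node` enum has more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Terminal {
    Word(&'static str),
    Punct(char),
}

/// Advance past ASCII whitespace and return the first non-whitespace offset.
fn skip_ws(source: &str, mut pos: usize) -> usize {
    let bytes = source.as_bytes();
    while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
        pos += 1;
    }
    pos
}

/// Try to consume one terminal at `pos`; returns the end offset on success.
fn consume(term: Terminal, source: &str, pos: usize) -> Option<usize> {
    let rest = source.get(pos..)?;
    match term {
        Terminal::Word(lit) => {
            if !rest.starts_with(lit) {
                return None;
            }
            let end = pos + lit.len();
            // Require a word boundary so `save` does not match inside `saved`.
            let at_boundary = source
                .as_bytes()
                .get(end)
                .map_or(true, |b| !(b.is_ascii_alphanumeric() || *b == b'_'));
            at_boundary.then_some(end)
        }
        Terminal::Punct(c) => rest.starts_with(c).then_some(pos + c.len_utf8()),
    }
}

/// Drive a flat sequence: skip whitespace between siblings, then dispatch
/// to `consume`. Returns matched byte spans, or the offset where it stopped.
fn walk_seq(seq: &[Terminal], source: &str) -> Result<Vec<(usize, usize)>, usize> {
    let mut pos = 0;
    let mut spans = Vec::with_capacity(seq.len());
    for &term in seq {
        pos = skip_ws(source, pos);
        let end = consume(term, source, pos).ok_or(pos)?;
        spans.push((pos, end));
        pos = end;
    }
    Ok(spans)
}
```

In this sketch the byte spans double as the raw material for highlighting, and the `Err` offset is where an "expected" set would be computed; both are simplifications of the `WalkResult` shape above.
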
Character-level helpers (identifier shape, digit-sequence shape, quoted-string escape handling) live in `src/dsl/walker/lex_helpers.rs`, a small shared module used by the various terminal consume functions. This is internally similar to the current lexer's logic, but it's invoked per-position by the walker rather than as a pre-pass.

`src/dsl/lexer.rs` and `src/dsl/keyword.rs` are deleted in Phase F. The keyword vocabulary is no longer a Rust enum; each keyword exists as a `Word` node in the grammar declarations.

### Node taxonomy

Thirteen node kinds. Three categories:

**Terminals** (consume bytes):

```rust
pub enum Node {
    Word {
        primary: &'static str,
        aliases: &'static [&'static str],
        // Default tok_keyword unless overridden.
        highlight_override: Option<HighlightClass>,
    },
    Punct(char),
    Ident {
        source: IdentSource,
        role: &'static str,
        highlight_override: Option<HighlightClass>,
    },
    NumberLit,
    StringLit,
    BlobLit,
    Flag(&'static str),
    BarePath,
    // Combinators ↓
}
```

**Combinators** (compose other nodes):

```rust
    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
    Repeated {
        inner: &'static Node,
        separator: Option<&'static Node>,
        min: usize,
    },
```

**Dynamic** (resolves at walk time using `WalkContext`):

```rust
    DynamicSubgrammar(fn(&WalkContext) -> Node),
}
```

`CommandNode` is the top-level entry record:

```rust
pub struct CommandNode {
    pub entry: Word,
    pub shape: Node, // usually a Seq
    pub ast_builder: fn(&MatchedPath) -> Command,
    pub dispatch: fn(&mut App, Command) -> Vec,
    pub help_id: Option<&'static str>,
    pub usage_id: Option<&'static str>,
    // Hint mode override at command level; nodes can carry their own too.
    pub hint_mode: Option<HintMode>,
}

pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];
```

### Typed value slots

Value-literal positions use typed slots built from terminals plus content validators.
One slot factory per data type:

```rust
fn int_slot() -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot() -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot() -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot() -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot() -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot() -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot() -> Node { Choice(&[BlobLit, null_word()]) }
```

`StringLit_with(validator)` is a `StringLit` terminal carrying a content validator that runs after a successful match. Same for `NumberLit_with`. A failed validator surfaces as `WalkOutcome::ValidationFailed` with the validator's catalog key.

`slot_for_type(ty: Type) -> Node` is the dispatcher: given a column type, returns the appropriate slot. Used by dynamic sub-grammars (see below).

### `WalkContext`

```rust
pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    // Current table inferred from the partial parse, e.g.
    // `insert into Customers ...` sets `current_table = "Customers"`.
    pub current_table: Option,
    // The columns of `current_table`, in declaration order, with types.
    // Populated by Ident { source: Tables } when it matches a
    // known table.
    pub current_table_columns: Option>,
    // For comma-separated value lists, which position we're at.
    pub value_position: usize,
    // For `set` clauses and `where` clauses, the column whose value
    // we're about to consume.
    pub current_column: Option,
}
```

Nodes can write to `WalkContext`:

- `Ident { source: Tables, role: "table", writes_table: true }` on match sets `ctx.current_table` to the matched identifier and resolves `ctx.current_table_columns` from the schema.
- `Ident { source: Columns, role: "column", writes_current_column: true }` on match sets `ctx.current_column` from the resolved column list.

Nodes can read from `WalkContext`:

- `DynamicSubgrammar(column_value_list)` reads `ctx.current_table_columns` and unfolds to a `Seq` of comma-separated typed slots, one per column.
- The value slot after `set col=` reads `ctx.current_column.user_type` to pick the right typed slot.

### `WalkOutcome` and "expected"

The walker keeps track of the longest prefix that matched and the position at which it failed (or completed). At a failure or incomplete position, `expected` is the set of nodes that could legally continue the walk, derived structurally from the trie, not from a separate "expected" table.

For a `Seq` mid-walk, `expected` is the next child node. For a `Choice` that hasn't committed to a branch, `expected` is all children. For an `Optional` at a position where its inner could start, `expected` includes the inner plus the next sibling.

This is the same information chumsky's `ParseError::Invalid::expected` carries today, sourced from the trie directly instead of via combinator introspection.

### `HintMode` per node

Each node may carry a `HintMode`:

```rust
pub enum HintMode {
    /// Candidates if any surface; else prose fallback.
    Default,
    /// Force the prose at this catalog key regardless of candidates.
    /// Used by NewName slots ("Type a name, then `(`").
    ForceProse(&'static str),
    /// Show only the prose; suppress Tab candidates.
    /// Used by typed value slots at empty prefix.
    ProseOnly(&'static str),
    /// Suppress prose; only candidates.
    SuppressProse,
}
```

The walker propagates each expected node's `HintMode` to the hint resolver, which dispatches accordingly. The current ad-hoc cases in `input_render.rs::ambient_hint` (value-literal slot suppression, NewName slot typing-name prose, invalid-ident overlay) migrate to node-attached `HintMode` annotations during Phase D.

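
One way the hint resolver's dispatch over `HintMode` could look is sketched below. The `Panel` and `Hint` types, the `resolve_hint` function, and the catalog keys used in the usage note are all hypothetical. The sketch also assumes one particular reading of the doc comments: `ForceProse` shows prose but leaves Tab candidates available, while `ProseOnly` drops them.

```rust
/// Mirrors the ADR's `HintMode`; everything else here is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

/// What the hint panel renders (hypothetical type).
#[derive(Debug, PartialEq)]
enum Panel {
    Candidates,
    Prose(&'static str),
    Empty,
}

/// Panel content plus the candidates still reachable via Tab.
#[derive(Debug, PartialEq)]
struct Hint {
    panel: Panel,
    tab_candidates: Vec<String>,
}

/// Resolve one expected node's mode against the candidates gathered for it.
/// `fallback_key` is the catalog key for the generic prose fallback.
fn resolve_hint(mode: HintMode, candidates: Vec<String>, fallback_key: &'static str) -> Hint {
    match mode {
        HintMode::Default => {
            let panel = if candidates.is_empty() {
                Panel::Prose(fallback_key)
            } else {
                Panel::Candidates
            };
            Hint { panel, tab_candidates: candidates }
        }
        // Assumption: forced prose still leaves Tab candidates available.
        HintMode::ForceProse(key) => Hint { panel: Panel::Prose(key), tab_candidates: candidates },
        // Prose only: Tab candidates are dropped entirely.
        HintMode::ProseOnly(key) => Hint { panel: Panel::Prose(key), tab_candidates: Vec::new() },
        HintMode::SuppressProse => {
            let panel = if candidates.is_empty() { Panel::Empty } else { Panel::Candidates };
            Hint { panel, tab_candidates: candidates }
        }
    }
}
```

Under these assumptions, a typed date slot annotated `ProseOnly("hint.date")` would render its prose and offer nothing on Tab, while a table-name slot left at `Default` would list the schema's tables whenever any match.
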
### Ranker layer

A ranker function runs between the walker's raw candidate output and the hint-panel renderer:

```rust
pub type Ranker = fn(&WalkContext, Vec) -> Vec;

pub fn identity_ranker(_: &WalkContext, c: Vec) -> Vec { c }
```

Default is `identity_ranker`: declaration order from the trie is preserved. The signature allows future enhancements (frequency-based ranking, content-aware priors for type suggestions per column name) to plug in without changing grammar declarations.

The ranker lives outside the trie. Grammar declarations are about *what's valid*; ranking is about *what's likely useful first*.

### Sub-grammars

Two flavours, no global registry:

**Static**: pure composition, a function returning a const node:

```rust
const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {
    Seq(&[
        Ident { source: Tables, role: role_table, /* ... */ },
        Punct('.'),
        Ident { source: Columns, role: role_col, /* ... */ },
    ])
}

const fn where_clause() -> Node {
    Seq(&[
        Word { primary: "where", /* ... */ },
        Ident { source: Columns, role: "filter_column", /* ... */ },
        Punct('='),
        AnyValueSlot,
    ])
}
```

**Dynamic**: context-aware, expands at walk time:

```rust
fn column_value_list(ctx: &WalkContext) -> Node {
    // `as_deref` + `&[]` avoids borrowing a temporary `Vec`, which would not compile.
    let cols = ctx.current_table_columns.as_deref().unwrap_or(&[]);
    let mut children: Vec<Node> = Vec::new();
    for (i, col) in cols.iter().enumerate() {
        if i > 0 {
            children.push(Punct(','));
        }
        children.push(slot_for_type(col.user_type));
    }
    Seq(Box::leak(children.into_boxed_slice()))
}
```

Dynamic sub-grammars return owned `Node` values that the walker treats as inline expansions. The leak above is one implementation tactic; alternatively, the walker stores the expanded node in a small per-walk arena. Both work; pick at implementation time.

### Aliases

A `Word` node carries `primary` and an `aliases` slice. The walker matches input against either; completion surfaces only the primary; help text mentions aliases prose-style if appropriate.
Highlight class is the same for both.

Round 5's `q` removal is *not* reverted by this design. `q` stays gone; adding it back would now be the single line `aliases: &["q"]` on the `quit` `Word` node, and would not surface as a separate candidate in completion (matching the round-5 user request).

### `IdentSource`

Replaces the current `dsl::ident_slot::IdentSlot`:

```rust
pub enum IdentSource {
    NewName,       // user invents; no schema lookup; ProseOnly hint
    Tables,        // existing table names
    Columns,       // existing column names (filtered by current table)
    Relationships, // existing relationship names
    Types,         // closed set from Type::all()
}
```

`Types` is new: it replaces the magic-string `TYPE_SLOT_LABEL` used today.

`src/dsl/ident_slot.rs` dissolves into `src/dsl/grammar/mod.rs`.

### Highlight class assignment

Per-byte highlight class is computed as a side effect of the walk. Each terminal records `(byte_range, class)` in `WalkResult::per_byte_class` as it matches. Unmatched ranges (past a definite failure) get the `tok_error` overlay, identical to today's behaviour.

Default classes per terminal kind:

| Terminal | Default class |
|---|---|
| `Word` | `tok_keyword` |
| `Punct` | `tok_punct` |
| `Ident` | `tok_identifier` |
| `NumberLit` | `tok_number` |
| `StringLit` | `tok_string` |
| `BlobLit` | `tok_string` |
| `Flag` | `tok_flag` |
| `BarePath` | `tok_string` |

The `highlight_override: Option<HighlightClass>` field on `Word` and `Ident` is reserved for future per-slot variants (e.g., a Tables slot in a distinct shade vs a NewName slot muted); it is left `None` everywhere in round 1. No new palette colours for the initial migration.

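
The `tok_error` overlay rule can be illustrated with a small sketch. It assumes `std::ops::Range<usize>` as a stand-in for the ADR's `ByteRange` and models only three classes; `with_error_overlay` and `class_at` are hypothetical helpers, not part of the specified API. Treating everything from the failure offset to end of input as one error span is also a simplification.

```rust
use std::ops::Range;

/// A subset of the highlight classes from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HighlightClass {
    Keyword,
    Identifier,
    Error,
}

/// Stand-in for the ADR's `(ByteRange, HighlightClass)` pair.
type Span = (Range<usize>, HighlightClass);

/// Append a `tok_error` span from the failure offset to the end of input.
/// `failed_at` is `None` when the walk matched the whole input.
fn with_error_overlay(mut spans: Vec<Span>, failed_at: Option<usize>, source_len: usize) -> Vec<Span> {
    if let Some(pos) = failed_at {
        if pos < source_len {
            spans.push((pos..source_len, HighlightClass::Error));
        }
    }
    spans
}

/// Look up the class for one byte offset; `None` means unstyled
/// (for example, whitespace between terminals).
fn class_at(spans: &[Span], byte: usize) -> Option<HighlightClass> {
    spans
        .iter()
        .find(|(range, _)| range.contains(&byte))
        .map(|(_, class)| *class)
}
```

For a 12-byte input where a keyword matched at bytes 0..4 and the walk failed at byte 5, the result is the keyword span followed by an error span over 5..12; byte 4 (the separating space) stays unstyled.
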
## Migration plan

### Code organisation

```
src/dsl/
  grammar/
    mod.rs          - Node enum, IdentSource, HintMode, HighlightClass,
                      MatchedPath, CommandNode, REGISTRY top-level
    data.rs         - insert, update, delete, show
    ddl.rs          - create, drop, add, rename, change
    app.rs          - quit, help, save/save-as, new, load, rebuild,
                      export, import, mode, messages
    shared.rs       - typed value slots (int_slot, date_slot, …),
                      qualified_column, where_clause, action_keyword,
                      column_value_list (dynamic)
    validators.rs   - content validators (integer_only_validator,
                      date_format_validator, datetime_format_validator,
                      type_name_validator, …)
  walker/
    mod.rs          - public walk() entry; orchestration
    driver.rs       - the per-node-kind dispatch
    context.rs      - WalkContext
    outcome.rs      - WalkOutcome, MatchedPath, WalkResult
    lex_helpers.rs  - identifier-shape, digit-shape, string-escape
                      helpers; shared across terminal consume fns
  parser.rs         - Phase A: becomes a router. Phase F: deleted.
  lexer.rs          - Phase F: deleted.
  keyword.rs        - Phase F: deleted.
  ident_slot.rs     - Phase F: dissolved into grammar/mod.rs.
  usage.rs          - Phase F: REGISTRY deleted; the file may go.
```

### Six-phase migration

**Phase A: Walker skeleton + app-lifecycle commands.**

- Build the walker driver, `WalkContext`, `WalkOutcome`, `MatchedPath`, the terminal consume functions.
- Migrate the app-lifecycle commands (no schema dependency, no value literals): quit, help, rebuild, save, save as, new, load, export, import, mode, messages.
- Router in `parse_command` consults the walker for migrated commands; falls back to chumsky for the rest.
- Differential test scaffolding: a test helper that, for every input in the existing test corpus, runs both parsers and asserts identical `Command` output where the input falls under a migrated command.

Exit criteria: walker handles the app-lifecycle commands end-to-end; existing tests for those commands pass via the walker path; tests for other commands still pass via chumsky.

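
The Phase A router can be pictured roughly as below. This is a sketch under loud assumptions: `Command` is reduced to three made-up variants, `MIGRATED` is an illustrative subset of entry keywords, and `walker_parse` / `legacy_parse` are stand-ins for the real walker and chumsky paths rather than their actual signatures.

```rust
/// Stand-in for the real `Command` AST (hypothetical variants).
#[derive(Debug, PartialEq)]
enum Command {
    Quit,
    Help,
    Legacy(String),
}

/// Entry keywords already migrated to the walker (illustrative subset).
const MIGRATED: &[&str] = &["quit", "help"];

/// Stand-in for the walker path.
fn walker_parse(source: &str) -> Result<Command, String> {
    match source.trim() {
        "quit" => Ok(Command::Quit),
        "help" => Ok(Command::Help),
        other => Err(format!("walker: unexpected input `{other}`")),
    }
}

/// Stand-in for the legacy chumsky path.
fn legacy_parse(source: &str) -> Result<Command, String> {
    Ok(Command::Legacy(source.trim().to_string()))
}

/// Route on the first word: migrated commands go to the walker,
/// everything else falls back to the legacy parser.
fn parse_command(source: &str) -> Result<Command, String> {
    let entry = source.split_whitespace().next().unwrap_or("");
    if MIGRATED.contains(&entry) {
        walker_parse(source)
    } else {
        legacy_parse(source)
    }
}
```

The property worth noting is that once an entry keyword is in the migrated set, errors for that command come from the walker and never fall through to the legacy parser, which keeps the two paths from masking each other during the transition.
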
**Phase B: DDL commands without value literals.**

- drop table, drop column, drop relationship.
- rename column.
- add column (without the value-literal aspect; the type slot uses `Ident { source: Types }` with a content validator).
- add 1:n relationship (referential clauses as a static sub-grammar).
- change column (type slot + flags).

These exercise schema lookups via `Ident { source: Tables }` and `Ident { source: Columns }`, and the `Types` source. No typed value slots yet, no `DynamicSubgrammar`.

Exit criteria: all DDL commands except `create table` pass via the walker; the rest still pass via chumsky.

**Phase C: `create table` with column-list value literals.**

- The `with pk` clause uses `Repeated` for the column-spec list, each spec being a `Seq(Ident{NewName}, Punct(':'), Ident{Types}-with-validator)`.
- First test of `Repeated` with separator.

Exit criteria: create table works end-to-end via the walker.

**Phase D: data commands with full schema awareness.**

- show data, show table, replay.
- insert: uses `DynamicSubgrammar(column_value_list)` for the comma-separated typed value list. Exercises full `WalkContext` propagation: `Ident { source: Tables, role: "table", writes_table: true }` resolves the column list; the dynamic sub-grammar unfolds typed slots per column.
- update: `set` clauses use `DynamicSubgrammar` to resolve the value slot's type from the column. `where` clause uses the shared sub-grammar with `AnyValueSlot` (or, optionally, also column-typed if the column resolves cleanly).
- delete: same `where` clause; otherwise simple.

This is the phase that proves the design's central claim: typed slots, dynamic sub-grammars, and schema-aware narrowing all collaborate to produce a single coherent grammar declaration per command.

Exit criteria: all data commands pass via the walker; the round-5 limitations close automatically (save Tab can offer `as`, value slots narrow by column type).

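
To make the per-column narrowing in Phase D concrete, the sketch below checks a value list against a table's column types, one typed slot per column. It is illustrative only: `Type` is cut down to three variants, values arrive pre-split (so comma and whitespace handling is skipped), text literals are assumed to use single quotes, and `slot_accepts` / `check_values` are hypothetical names rather than walker functions.

```rust
/// Illustrative subset of column types (the ADR's `Type` has more).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    Text,
    Bool,
}

/// Does `literal` fit the typed slot for `ty`? `null` is accepted everywhere,
/// mirroring the `null_word()` branch of each slot factory.
fn slot_accepts(ty: Type, literal: &str) -> bool {
    if literal == "null" {
        return true;
    }
    match ty {
        Type::Int => literal.parse::<i64>().is_ok(),
        // Assumption: text literals are single-quoted.
        Type::Text => literal.len() >= 2 && literal.starts_with('\'') && literal.ends_with('\''),
        Type::Bool => matches!(literal, "true" | "false"),
    }
}

/// Check a pre-split value list against the table's columns, in order.
/// Returns the index of the first value that is missing, mistyped, or surplus.
fn check_values(cols: &[Type], values: &[&str]) -> Result<(), usize> {
    for (i, ty) in cols.iter().enumerate() {
        match values.get(i) {
            Some(v) if slot_accepts(*ty, v) => {}
            _ => return Err(i),
        }
    }
    if values.len() > cols.len() {
        Err(cols.len())
    } else {
        Ok(())
    }
}
```

The returned index plays the role that `value_position` plays in `WalkContext`: it identifies which column's slot rejected the input, which is what lets the error wording and the hint be type-specific.
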
**Phase E: replay end-to-end.**

- replay uses `BarePath` + `StringLit` (quoted form).
- Internally replays each line through the same dispatch pipeline.

Exit criteria: replay works end-to-end via the walker; nested replay rejection still fires from the runtime, unchanged.

**Phase F: cleanup.**

- Delete `dsl/parser.rs`.
- Delete `dsl/lexer.rs`.
- Delete `dsl/keyword.rs`.
- Delete `dsl/ident_slot.rs` (already merged into `grammar/mod.rs` in Phase A).
- Delete `dsl/usage.rs::REGISTRY`.
- Delete `chumsky` dependency from `Cargo.toml`.
- Delete `parse.token.keyword.*` entries from the catalog and `keys.rs` that the walker doesn't need (the keyword vocabulary is implicit in the grammar nodes).
- Remove the differential test scaffolding from Phase A.

Exit criteria: working tree clean of legacy parser code; test suite still all-green; `cargo clippy --all-targets -- -D warnings` passes; `cargo build --release` binary not noticeably larger.

**Implementation note (2026-05-15): "Phase F minimal".** Phase F shipped as planned with one deliberate deviation: `dsl/parser.rs` was *retained*, not deleted. The chumsky + lexer pipeline is gone (chumsky dependency removed; `lexer.rs`, `keyword.rs`, `ident_slot.rs`, `usage.rs` all deleted; the `parse.token.*` catalog entries collapsed), but `parser.rs` remains as the thin router: it owns the public `parse_command` / `parse_command_with_schema` entry points and the `ParseError` type, whose `{message, position, at_eof, expected}` shape completion, hint rendering, and the input-renderer overlay all depend on. Deleting the file would only scatter that surface across `walker` / `dsl/mod.rs` for no functional gain. The differential scaffolding was never built as a live harness; it materialised as hand-curated expectation tests. `parser.rs` documents this in its own module doc comment.

### Test discipline

Three guarantees throughout migration:

1. **Full test suite green at every commit.** Migration is per-command; tests are per-behaviour.
   They don't care which parser produces a `Command`: they assert input → expected output. If a test fails mid-migration, the walker hasn't reproduced behaviour; fix the walker before continuing.
2. **Walker-specific tests for trie-only features.** Schema-aware narrowing, `WalkContext` propagation, dynamic sub-grammar expansion, `HintMode` per-node behaviour, the round-5 "save Tab offers as" gap-closing; each gets new tests as the feature lands.
3. **Differential check during the migration window.** A test helper iterates the existing input corpus, runs both parsers on inputs that fall under a migrated command, and asserts identical `Command` output. Cheap insurance against subtle divergence. Removed at Phase F cleanup.

### Cleanup pass at Phase F

Beyond deleting the legacy modules, Phase F includes catalog cleanup. The `parse.token.keyword.*` entries (40+ of them) are near-mechanical wrappers (`create: "\`create\`"`); with no external code looking up these keys (the walker renders keyword names from `Word` node literals directly), the entries can collapse. A small `format_keyword_for_error(literal) -> String` helper replaces them. The `keys.rs` declarations go with them.

Help text in `help.cli_banner` and `help.in_app_body` stays as hand-written prose. The alternative (auto-generating from the grammar) was deferred during the round-6 discussion as a separate concern; the grammar tree carries enough metadata (per-command `help_id`) for future automation but the prose documentation is still hand-curated for round 1.

## Consequences

### What we gain

- **One declaration per command.** Entry keyword, shape, AST builder, dispatch handler, usage reference, help reference all colocated. Adding a command is one block in one file.
- **No cross-file scatter.** The round-5 "10 places to remove `q`" critique is structurally addressed: there's nowhere else for keyword/usage/registry info to live but the grammar tree.
- **Schema-aware narrowing from day one.** Typed value slots reject mis-shaped input at parse time with localised error wording; completion narrows per column type; the round-5 value-literal slot hint becomes type-specific ("Type a date as 'YYYY-MM-DD'") not generic.
- **Aliases as a single annotation.** `q` could come back as one line on the `quit` `Word` node, no scatter.
- **Tests focus on behaviour, not enumeration.** Tests that hardcoded keyword lists during round 5 (we noted these in `usage.rs` and `completion.rs`) can iterate the trie registry instead, becoming structural rather than literal.
- **Drift is structurally impossible.** Completion, highlight, parse, usage, and help all derive from the same trie. No separate sources to keep in sync.

### What we accept

- **Parse depends on schema state.** A DSL command that references a non-existent table fails at parse time, not at execute time as today. This matches the user mental model when typing (the schema cache is current per ADR-0022) and yields better completion / hint experience. It does mean tests that exercised parser behaviour in isolation may now need to set up a schema cache.
- **chumsky's general-purpose features go unused.** Recovery on ambiguous input, multi-error reporting in a single pass, ambiguous-grammar handling: features chumsky offers but our DSL doesn't use. The trade is fine because our grammar is deterministic.
- **Some implementation complexity moves into the walker.** Whitespace skipping between siblings, terminal consume functions, character-level shape recognition: the lexer did some of this implicitly; the walker does it explicitly. Net code is comparable or smaller because the scatter cost goes away.

### What's out of scope for this ADR

- **External tooling integration (LSP, editor extensions).** The registry is `pub` and accessible via accessor functions, so future tooling work doesn't fight this design. No tooling is built in round 1.
- **Help text auto-generation.** Grammar tree carries `help_id` per command, but the help catalog body stays hand-curated.
- **Performance optimisation.** Walker re-runs per keystroke for completion + highlighting. Naïve implementation is acceptable; if hot-path concerns emerge later, caching / incremental walks become a separate ADR.
- **Ranker implementations.** The ranker hook exists; default is identity. Frequency-based ranking, content-aware priors for type completion ("Email → text first, Score → real"), recency: all are future work that plugs into the ranker signature without touching grammar declarations.
- **Per-slot highlight overrides.** The `highlight_override` field exists but stays `None` in round 1. Differentiating table-ident from new-name-ident visually is a future enhancement.

## References

- ADR-0023: Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
- ADR-0001: Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
- ADR-0019: Friendly error layer and i18n catalog. Catalog conventions stay; `parse.token.keyword.*` entries collapse in Phase F.
- ADR-0020: Tokenization layer for the DSL parser. Superseded by the scannerless walker.
- ADR-0021: Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
- ADR-0022: Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
- Round-6 session transcript: design pass that produced this spec.