rdbms-playground/docs/adr/0024-unified-grammar-tree-execution-plan.md
claude@clouddev1 41043d686b docs: record ADR-0024 completion, reconcile requirements.md + handoff-14
ADR-0024 audited as fully implemented. Amend the ADR with a "Phase F
minimal" implementation note (parser.rs retained as the router +
ParseError home) and update the README index line to match.

Reconcile docs/requirements.md against handoffs 10-14: refresh the
test baseline (449 -> 1006), mark U4 (replay) satisfied, correct the
A1 / H1a / H3 progress notes.

Amend handoff-14: §3 flagged items both resolved (ranker kept,
CommandNode.hint_mode removed); §4 rewritten as a concrete next-work
pointer at the reconciled requirements.md.
2026-05-15 23:03:18 +00:00


ADR-0024: Unified grammar tree — execution plan

Status

Accepted. 2026-05-14.

Concrete specification for the direction proposed in ADR-0023. Where ADR-0023 captured the critique of the current parser shape and the high-level vision, this ADR specifies the data model, walker semantics, migration sequence, and cleanup steps in enough detail that implementation can proceed without further design decisions.

Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in the directory as institutional memory of why this change is happening; ADR-0024 is what gets built.

Context

The design pass held in the round-6 session (2026-05-14) worked through ADR-0023's open questions and a number of implicit decisions that hadn't been written down. It ran as four rounds of questions, each followed by user confirmation:

  1. Round 1 — foundational. Registry shape, node taxonomy, AST output model, failure / "expected" semantics, walker API and its mapping to parse / complete / highlight / hint concerns.
  2. Round 2 — concrete representation. Multi-keyword sequences, sub-grammar reusability (static and dynamic), path-bearing commands, bare-or-with-suffix commands.
  3. Round 3 — organisation and migration. Module layout, per-command migration strategy, test discipline during migration.
  4. Round 4 — smaller details. Aliases on keyword nodes, IdentSlot fate, highlight palette, external-tooling exposure.

Two larger decisions emerged from the rounds and shifted the shape from ADR-0023's sketch:

  • The lexer dissolves. The walker operates directly on source bytes ("scannerless"). The current dsl/lexer.rs module's responsibilities (whitespace skipping, token shape recognition, byte-span tracking) migrate into terminal-node consume functions and the walker driver. The define_keywords! macro is no longer needed in its current form; keyword literals live on Word nodes in the grammar.
  • Schema-aware parse from day one. ADR-0023 had been cautious about coupling parse to schema state. The round-1 / round-2 discussion concluded that this caution comes from general-purpose parser tooling and doesn't apply to an interactive DSL editor where the schema is the context. Typed value slots consult the schema during parse; bind-time type checks remain but become belt-and-braces rather than the primary defence.

A separate critique surfaced in the design pass: my (Claude's) default pull toward "what's the safe incremental version of what general-purpose parser tooling does" repeatedly fought against the project owner's cleaner direct design. The pull is now explicitly resisted — this ADR ships the direct design, not a phased compromise.

Decision summary

A single trie data structure declared in Rust serves as the authority for parsing, completion, syntax highlighting, parse-error usage rendering, hint-panel content, and (eventually) external-tooling exposure. The walker that consumes this trie operates directly on source bytes, with no separate lexer pass. Schema-aware narrowing flows naturally from the trie's structure: typed value slots and dynamic sub-grammars consult a per-walk context that carries the current table, the resolved column types, and a reference to the schema cache.

Migration is per-command across six phases. The legacy chumsky parser and the new walker run side-by-side during the transition; existing behavioural tests guard regressions. Phase F removes chumsky, the lexer module, the separate UsageEntry registry, and the expected-set introspection in completion.rs.

Estimated total cost: ~4 sessions — one to land the framework and migrate Phase A, two for Phases B-D, one for Phases E + F.

Architecture

Walker as single source of truth

pub fn walk<'s>(
    source: &'s str,
    bound: WalkBound,
    ctx: &mut WalkContext<'_>,
) -> WalkResult<'s>;

pub enum WalkBound {
    EndOfInput,           // parse: walk all input
    Position(usize),      // complete / hint: walk up to cursor byte
}

pub struct WalkResult<'s> {
    pub outcome: WalkOutcome,
    // Assumed to borrow matched text from `source`, which is what the lifetime is for.
    pub matched_path: MatchedPath<'s>,
    pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
    ValidationFailed { position: usize, message_key: &'static str, args: Vec<(&'static str, String)> },
}

Consumers:

  • Parse for dispatch. walk(source, EndOfInput, ctx). On Match, invoke REGISTRY[command_idx].ast_builder(&matched_path) and dispatch the returned Command.
  • Highlighting. walk(source, EndOfInput, ctx).per_byte_class. Each terminal records (byte_range, node.highlight_class()) as it matches. Unmatched ranges (past a failure) get the tok_error overlay.
  • Completion at cursor. walk(source, Position(cursor), ctx), inspect outcome.expected. Each expected Node contributes candidates: Word → its primary literal, Ident { source } → schema-cache lookup, Flag → --name, value-literal slot → type-appropriate hint per HintMode, etc.
  • Hint panel ambient. Same walk as completion. The hint resolver consults WalkOutcome variants plus the expected nodes' HintMode to choose between candidates rendering, prose, suppression, etc.
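The completion consumer can be sketched as a function that turns an expected set into Tab candidates. This is a minimal sketch under several assumptions. The `Expected` enum is a hypothetical, trimmed stand-in for the `&'static Node` values carried by `WalkOutcome`. A plain list of table names stands in for the schema cache. The function name `candidates` is illustrative only.

```rust
// Hypothetical, trimmed stand-in for the expected nodes a completion walk reports.
pub enum Expected {
    Word { primary: &'static str },
    TableIdent,         // Ident { source: Tables }
    Flag(&'static str), // offered as `--name`
    ValueSlot,          // typed value slot: hint prose only, no Tab candidates
}

/// Turns the expected set at the cursor into Tab candidates that start with
/// `prefix`, keeping the order in which the expected nodes were reported.
pub fn candidates(expected: &[Expected], table_names: &[&str], prefix: &str) -> Vec<String> {
    let mut out = Vec::new();
    for e in expected {
        match e {
            Expected::Word { primary } => out.push(primary.to_string()),
            Expected::TableIdent => out.extend(table_names.iter().map(|t| t.to_string())),
            Expected::Flag(name) => out.push(format!("--{name}")),
            Expected::ValueSlot => {}
        }
    }
    out.retain(|c| c.starts_with(prefix));
    out
}
```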

Scannerless: no lexer module

Terminal nodes consume bytes directly. No pre-pass produces a Vec<Token>. The walker's driver handles whitespace skipping between siblings of a Seq and dispatches to each terminal's consume(source, position) function.

Character-level helpers (identifier shape, digit-sequence shape, quoted-string escape handling) live in src/dsl/walker/lex_helpers.rs — a small shared module used by the various terminal consume functions. This is internally similar to the current lexer's logic, but it's invoked per-position by the walker rather than as a pre-pass.

src/dsl/lexer.rs and src/dsl/keyword.rs are deleted in Phase F. The keyword vocabulary is no longer a Rust enum; each keyword exists as a Word node in the grammar declarations.
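Here is a minimal sketch of what `src/dsl/walker/lex_helpers.rs` might contain. It assumes ASCII identifiers and case-sensitive keywords; this ADR specifies neither. Each helper takes the source and a byte position and returns where the shape ends, which is the interface a terminal's `consume(source, position)` needs. All function names here are hypothetical.

```rust
/// First position at or after `pos` that is not ASCII whitespace.
pub fn skip_ws(source: &str, pos: usize) -> usize {
    let bytes = source.as_bytes();
    let mut p = pos;
    while p < bytes.len() && bytes[p].is_ascii_whitespace() {
        p += 1;
    }
    p
}

/// Identifier shape: a letter or `_`, then letters, digits, or `_`.
/// Returns the end of the identifier, or None if none starts at `pos`.
pub fn ident_end(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    match bytes.get(pos) {
        Some(b) if b.is_ascii_alphabetic() || *b == b'_' => {}
        _ => return None,
    }
    let mut p = pos + 1;
    while p < bytes.len() && (bytes[p].is_ascii_alphanumeric() || bytes[p] == b'_') {
        p += 1;
    }
    Some(p)
}

/// Digit-sequence shape: one or more ASCII digits.
pub fn digits_end(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let mut p = pos;
    while p < bytes.len() && bytes[p].is_ascii_digit() {
        p += 1;
    }
    if p > pos { Some(p) } else { None }
}

/// Consumes a Word node's literal (primary or alias). The literal must match
/// exactly and must not run on into an identifier byte.
pub fn consume_literal(source: &str, pos: usize, literal: &str) -> Option<usize> {
    let rest = source.get(pos..)?;
    if !rest.starts_with(literal) {
        return None;
    }
    let end = pos + literal.len();
    match source.as_bytes().get(end) {
        Some(b) if b.is_ascii_alphanumeric() || *b == b'_' => None,
        _ => Some(end),
    }
}
```

The word-boundary check in `consume_literal` stops `quit` from matching the prefix of `quitter`. The lexer's tokenisation used to do that job implicitly.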

Node taxonomy

Thirteen node kinds. Three categories:

pub enum Node {
    // Terminals (consume bytes)
    Word {
        primary: &'static str,
        aliases: &'static [&'static str],
        // Default tok_keyword unless overridden.
        highlight_override: Option<HighlightClass>,
    },
    Punct(char),
    Ident {
        source: IdentSource,
        role: &'static str,
        highlight_override: Option<HighlightClass>,
    },
    NumberLit,
    StringLit,
    BlobLit,
    Flag(&'static str),
    BarePath,

    // Combinators (compose other nodes)
    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
    Repeated {
        inner: &'static Node,
        separator: Option<&'static Node>,
        min: usize,
    },

    // Dynamic (resolves at walk time using WalkContext)
    DynamicSubgrammar(fn(&WalkContext) -> Node),
}

CommandNode is the top-level entry record:

pub struct CommandNode {
    pub entry: Node,                                  // a Node::Word (enum variants aren't types)
    pub shape: Node,                                  // usually a Seq
    pub ast_builder: fn(&MatchedPath) -> Command,
    pub dispatch: fn(&mut App, Command) -> Vec<Action>,
    pub help_id: Option<&'static str>,
    pub usage_id: Option<&'static str>,
    // Hint mode override at command level; nodes can carry their own too.
    pub hint_mode: Option<HintMode>,
}

pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];

Typed value slots

Value-literal positions use typed slots built from terminals plus content validators. One slot factory per data type:

fn int_slot()      -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot()     -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot()  -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot()     -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot()     -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot()     -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot()     -> Node { Choice(&[BlobLit, null_word()]) }

StringLit_with(validator) is a StringLit terminal carrying a content validator that runs after a successful match. Same for NumberLit_with. A failed validator surfaces as WalkOutcome::ValidationFailed with the validator's catalog key.

slot_for_type(ty: Type) -> Node is the dispatcher: given a column type, returns the appropriate slot. Used by dynamic sub-grammars (see below).
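Here are sketches of two content validators. They assume a validator receives the matched literal's text (for `StringLit_with`, the text between the quotes). On failure it returns the catalog key that `WalkOutcome::ValidationFailed` would carry. The `Validator` alias and both catalog keys are hypothetical.

```rust
// Assumed validator shape: matched text in, catalog key out on failure.
pub type Validator = fn(&str) -> Result<(), &'static str>;

/// integer_only_validator: optional leading '-', then one or more digits.
/// Rejects decimal forms such as `4.2`.
pub fn integer_only_validator(text: &str) -> Result<(), &'static str> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
        Ok(())
    } else {
        Err("parse.value.not_integer") // hypothetical catalog key
    }
}

/// date_format_validator: YYYY-MM-DD with month 1-12 and day 1-31.
/// A shape check only; it does not reject impossible dates like 2026-02-30.
pub fn date_format_validator(text: &str) -> Result<(), &'static str> {
    const BAD: &str = "parse.value.bad_date"; // hypothetical catalog key
    let parts: Vec<&str> = text.split('-').collect();
    if parts.len() != 3 {
        return Err(BAD);
    }
    let widths_ok = parts[0].len() == 4 && parts[1].len() == 2 && parts[2].len() == 2;
    let digits_ok = parts.iter().all(|p| p.bytes().all(|b| b.is_ascii_digit()));
    if !(widths_ok && digits_ok) {
        return Err(BAD);
    }
    let month = parts[1].parse::<u32>().map_err(|_| BAD)?;
    let day = parts[2].parse::<u32>().map_err(|_| BAD)?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Ok(())
    } else {
        Err(BAD)
    }
}
```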

WalkContext

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    // Current table inferred from the partial parse — e.g.,
    // `insert into Customers ...` sets `current_table = "Customers"`.
    pub current_table: Option<String>,
    // The columns of `current_table`, in declaration order, with types.
    // Populated by Ident { source: Tables } when it matches a
    // known table.
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    // For comma-separated value lists, which position we're at.
    pub value_position: usize,
    // For `set` clauses and `where` clauses, the column whose value
    // we're about to consume.
    pub current_column: Option<ColumnInfo>,
}

Nodes can write to WalkContext:

  • Ident { source: Tables, role: "table", writes_table: true } on match sets ctx.current_table to the matched identifier and resolves ctx.current_table_columns from the schema.
  • Ident { source: Columns, role: "column", writes_current_column: true } on match sets ctx.current_column from the resolved column list.

Nodes can read from WalkContext:

  • DynamicSubgrammar(column_value_list) reads ctx.current_table_columns and unfolds to a Seq of comma-separated typed slots — one per column.
  • The value slot after set col= reads ctx.current_column.user_type to pick the right typed slot.
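The write side can be sketched as two methods on a trimmed `WalkContext`. In this sketch `SchemaCache` is just a map from table name to columns, and `Type` has only three variants. Both are hypothetical stand-ins. The `on_table_match` / `on_column_match` method names are also illustrative.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the real schema types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Type { Int, Text, Date }

#[derive(Clone, Debug, PartialEq)]
pub struct ColumnInfo { pub name: String, pub user_type: Type }

pub struct SchemaCache { pub tables: HashMap<String, Vec<ColumnInfo>> }

// Trimmed WalkContext: value_position is omitted.
pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    pub current_column: Option<ColumnInfo>,
}

impl<'a> WalkContext<'a> {
    pub fn new(schema: &'a SchemaCache) -> Self {
        WalkContext { schema, current_table: None, current_table_columns: None, current_column: None }
    }

    /// On an `Ident { source: Tables, writes_table: true }` match: record the
    /// table and resolve its columns. An unknown table returns false.
    pub fn on_table_match(&mut self, name: &str) -> bool {
        match self.schema.tables.get(name) {
            Some(cols) => {
                self.current_table = Some(name.to_string());
                self.current_table_columns = Some(cols.clone());
                true
            }
            None => false,
        }
    }

    /// On an `Ident { source: Columns, writes_current_column: true }` match:
    /// look the column up among the current table's columns.
    pub fn on_column_match(&mut self, name: &str) -> bool {
        self.current_column = self
            .current_table_columns
            .as_ref()
            .and_then(|cols| cols.iter().find(|c| c.name == name))
            .cloned();
        self.current_column.is_some()
    }
}
```

Returning `false` for an unknown table is where parse comes to depend on schema state. The walker can fail at that position instead of deferring the error to execution.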

WalkOutcome and "expected"

The walker keeps track of the longest prefix that matched and the position at which it failed (or completed). At a failure or incomplete position, expected is the set of nodes that could legally continue the walk — derived structurally from the trie, not from a separate "expected" table.

For a Seq mid-walk, expected is the next child node. For a Choice that hasn't committed to a branch, expected is all children. For an Optional at a position where its inner could start, expected includes the inner plus the next sibling.

This is the same information chumsky's ParseError::Invalid::expected carries today, sourced from the trie directly instead of via combinator introspection.
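Here is a toy version of the structural rule. It supports only `Word`, `Seq`, `Choice`, and `Optional`, and represents words by their literal rather than by `&'static Node`. `first` computes what could start a node and whether the node can match nothing. Walking past children that can match nothing is what makes an `Optional` contribute both its inner node and the next sibling. Other terminals, `Repeated`, and dynamic sub-grammars are left out of this sketch.

```rust
// Toy grammar with just enough node kinds to show how `expected` falls out
// of the tree's structure. Words are represented by their literal.
#[derive(Debug)]
pub enum Node {
    Word(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    Optional(&'static Node),
}

/// Literals that could start `node`, plus whether `node` can match nothing.
pub fn first(node: &Node) -> (Vec<&'static str>, bool) {
    match node {
        Node::Word(w) => (vec![*w], false),
        // An uncommitted Choice expects every branch.
        Node::Choice(branches) => {
            let mut out = Vec::new();
            let mut nullable = false;
            for b in branches.iter() {
                let (f, n) = first(b);
                out.extend(f);
                nullable |= n;
            }
            (out, nullable)
        }
        // An Optional may start with its inner node, or be skipped entirely.
        Node::Optional(inner) => (first(inner).0, true),
        Node::Seq(children) => expected_in_seq(children, 0),
    }
}

/// Expected set once the first `matched` children of a Seq have matched: the
/// next child's first set, continuing past any child that can match nothing.
/// The bool says whether the rest of the Seq can be empty (input may end here).
pub fn expected_in_seq(children: &[Node], matched: usize) -> (Vec<&'static str>, bool) {
    let mut out = Vec::new();
    for child in &children[matched..] {
        let (f, nullable) = first(child);
        out.extend(f);
        if !nullable {
            return (out, false);
        }
    }
    (out, true)
}
```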

HintMode per node

Each node may carry a HintMode:

pub enum HintMode {
    /// Candidates if any surface; else prose fallback.
    Default,
    /// Force the prose at this catalog key regardless of candidates.
    /// Used by NewName slots ("Type a name, then `(`").
    ForceProse(&'static str),
    /// Show only the prose; suppress Tab candidates.
    /// Used by typed value slots at empty prefix.
    ProseOnly(&'static str),
    /// Suppress prose; only candidates.
    SuppressProse,
}

The walker propagates each expected node's HintMode to the hint resolver, which dispatches accordingly.

The current ad-hoc cases in input_render.rs::ambient_hint (value-literal slot suppression, NewName slot typing-name prose, invalid-ident overlay) migrate to node-attached HintMode annotations during Phase D.
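This is one way the hint resolver's dispatch could look, as a sketch. `Hint`, `resolve_hint`, `tab_candidates_allowed`, and the fallback-key parameter are all hypothetical names. The sketch reads the panel output as identical for `ForceProse` and `ProseOnly`. The two modes differ in whether Tab completion still offers candidates.

```rust
// Mirrors the HintMode enum above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

// Hypothetical resolver output: candidates, a prose catalog key, or nothing.
#[derive(Debug, PartialEq)]
pub enum Hint {
    Candidates(Vec<String>),
    Prose(&'static str),
    Nothing,
}

/// Chooses what the hint panel shows. `fallback` is the prose catalog key
/// used when `Default` mode has no candidates to show.
pub fn resolve_hint(mode: HintMode, candidates: Vec<String>, fallback: &'static str) -> Hint {
    match mode {
        HintMode::Default if !candidates.is_empty() => Hint::Candidates(candidates),
        HintMode::Default => Hint::Prose(fallback),
        HintMode::ForceProse(key) | HintMode::ProseOnly(key) => Hint::Prose(key),
        HintMode::SuppressProse if candidates.is_empty() => Hint::Nothing,
        HintMode::SuppressProse => Hint::Candidates(candidates),
    }
}

/// Tab completion is a separate channel: only ProseOnly suppresses it.
pub fn tab_candidates_allowed(mode: HintMode) -> bool {
    !matches!(mode, HintMode::ProseOnly(_))
}
```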

Ranker layer

A ranker function runs between the walker's raw candidate output and the hint-panel renderer:

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }

Default is identity_ranker — declaration order from the trie is preserved. The signature allows future enhancements (frequency-based ranking, content-aware priors for type suggestions per column name) to plug in without changing grammar declarations.

The ranker lives outside the trie. Grammar declarations are about what's valid; ranking is about what's likely useful first.
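To illustrate the hook (this ADR does not ship such a ranker), a content-aware prior like those listed as out of scope can be written against the `Ranker` signature without touching the grammar. `Candidate` and the `current_column_name` field on this trimmed `WalkContext` are hypothetical.

```rust
// Hypothetical, trimmed context: only the field this ranker reads.
pub struct WalkContext {
    pub current_column_name: Option<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub text: String,
}

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> {
    c
}

/// Illustrative prior: when the column's name mentions "email", move the
/// `text` type to the front. sort_by_key is stable, so all other candidates
/// keep the trie's declaration order.
pub fn email_prior_ranker(ctx: &WalkContext, mut c: Vec<Candidate>) -> Vec<Candidate> {
    let is_email = ctx
        .current_column_name
        .as_deref()
        .map_or(false, |n| n.to_ascii_lowercase().contains("email"));
    if is_email {
        c.sort_by_key(|cand| cand.text != "text"); // false sorts before true
    }
    c
}
```

The stable sort keeps `identity_ranker`'s behaviour as the baseline. The prior only reorders the one candidate it cares about.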

Sub-grammars

Two flavours, no global registry:

Static — pure composition, function returning a const node:

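// Sketch only: a const fn can't promote a slice built from its own parameters
// to 'static, so a compiling version needs a macro (or an arena / leaked Seq,
// as in the dynamic example below).
const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {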
    Seq(&[
        Ident { source: Tables, role: role_table, /* ... */ },
        Punct('.'),
        Ident { source: Columns, role: role_col, /* ... */ },
    ])
}

const fn where_clause() -> Node {
    Seq(&[
        Word { primary: "where", /* ... */ },
        Ident { source: Columns, role: "filter_column", /* ... */ },
        Punct('='),
        AnyValueSlot,
    ])
}

Dynamic — context-aware, expands at walk time:

fn column_value_list(ctx: &WalkContext) -> Node {
    // An empty slice literal is 'static, so no temporary Vec is borrowed.
    let cols: &[ColumnInfo] = ctx.current_table_columns.as_deref().unwrap_or(&[]);
    let mut children: Vec<Node> = Vec::new();
    for (i, col) in cols.iter().enumerate() {
        if i > 0 { children.push(Punct(',')); }
        children.push(slot_for_type(col.user_type));
    }
    Seq(Box::leak(children.into_boxed_slice()))
}

Dynamic sub-grammars return owned Node values that the walker treats as inline expansions. The Box::leak above is the simplest tactic, but it leaks one allocation per expansion, and the walker re-runs on every keystroke. The alternative is to store the expanded node in a small per-walk arena that is dropped with the walk. Both work functionally; pick at implementation time.

Aliases

A Word node carries primary and an aliases slice. The walker matches input against either; completion surfaces only the primary; help text mentions aliases prose-style if appropriate. Highlight class is the same for both.

Round 5's q removal is not reverted by this design. q stays gone — adding it back would now be the single line aliases: &["q"] on the quit Word node, and would not surface as a separate candidate in completion (matching the round-5 user request).
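The alias rule reduces to two small methods. They are sketched here on a hypothetical two-field `Word` struct; the real node is a `Node::Word` variant that also carries a highlight override.

```rust
// Hypothetical two-field Word; the real Node::Word also has highlight_override.
pub struct Word {
    pub primary: &'static str,
    pub aliases: &'static [&'static str],
}

impl Word {
    /// Parsing accepts the primary literal or any alias.
    pub fn matches(&self, input: &str) -> bool {
        input == self.primary || self.aliases.iter().any(|a| *a == input)
    }

    /// Completion offers only the primary, and only if it extends the prefix.
    pub fn completion(&self, prefix: &str) -> Option<&'static str> {
        if self.primary.starts_with(prefix) {
            Some(self.primary)
        } else {
            None
        }
    }
}
```

Take a quit node declared with `aliases: &["q"]`. It would accept `q` as input, but typing `q` and pressing Tab would still offer only `quit`.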

IdentSource

Replaces the current dsl::ident_slot::IdentSlot:

pub enum IdentSource {
    NewName,         // user invents; no schema lookup; ProseOnly hint
    Tables,          // existing table names
    Columns,         // existing column names (filtered by current table)
    Relationships,   // existing relationship names
    Types,           // closed set from Type::all()
}

Types is new — it replaces the magic-string TYPE_SLOT_LABEL used today. src/dsl/ident_slot.rs dissolves into src/dsl/grammar/mod.rs.

Highlight class assignment

Per-byte highlight class is computed as a side effect of the walk. Each terminal records (byte_range, class) in WalkResult::per_byte_class as it matches. Unmatched ranges (past a definite failure) get the tok_error overlay, identical to today's behaviour.
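Here is a sketch of the overlay step. It assumes terminals have already pushed their spans and that the walker reports the failure position, if any. `ByteRange` as `Range<usize>`, the four-variant `HighlightClass`, and `with_error_overlay` are hypothetical.

```rust
use std::ops::Range;

// Hypothetical stand-ins: ByteRange as a plain byte range, and a few classes.
pub type ByteRange = Range<usize>;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HighlightClass {
    Keyword,
    Identifier,
    Punct,
    Error,
}

/// Post-walk step: keep the spans matched terminals recorded, then give
/// everything from the failure position to end of input the error overlay.
pub fn with_error_overlay(
    mut spans: Vec<(ByteRange, HighlightClass)>,
    failed_at: Option<usize>,
    source_len: usize,
) -> Vec<(ByteRange, HighlightClass)> {
    if let Some(pos) = failed_at {
        if pos < source_len {
            spans.push((pos..source_len, HighlightClass::Error));
        }
    }
    spans
}
```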

Default classes per terminal kind:

Terminal     Default class
Word         tok_keyword
Punct        tok_punct
Ident        tok_identifier
NumberLit    tok_number
StringLit    tok_string
BlobLit      tok_string
Flag         tok_flag
BarePath     tok_string

The highlight_override: Option<HighlightClass> field on Word and Ident is reserved for future per-slot variants (e.g., a Tables slot in a distinct shade vs a NewName slot muted) — left None everywhere in round 1.

No new palette colours for the initial migration.

Migration plan

Code organisation

src/dsl/
  grammar/
    mod.rs           — Node enum, IdentSource, HintMode, HighlightClass,
                       MatchedPath, CommandNode, REGISTRY top-level
    data.rs          — insert, update, delete, show
    ddl.rs           — create, drop, add, rename, change
    app.rs           — quit, help, save/save-as, new, load, rebuild,
                       export, import, mode, messages
    shared.rs        — typed value slots (int_slot, date_slot, …),
                       qualified_column, where_clause, action_keyword,
                       column_value_list (dynamic)
    validators.rs    — content validators (integer_only_validator,
                       date_format_validator, datetime_format_validator,
                       type_name_validator, …)
  walker/
    mod.rs           — public walk() entry; orchestration
    driver.rs        — the per-node-kind dispatch
    context.rs       — WalkContext
    outcome.rs       — WalkOutcome, MatchedPath, WalkResult
    lex_helpers.rs   — identifier-shape, digit-shape, string-escape
                       helpers; shared across terminal consume fns
  parser.rs          — Phase A: becomes a router. Phase F: deleted.
  lexer.rs           — Phase F: deleted.
  keyword.rs         — Phase F: deleted.
  ident_slot.rs      — Phase F: dissolved into grammar/mod.rs.
  usage.rs           — Phase F: REGISTRY deleted; the file may go.

Six-phase migration

Phase A — Walker skeleton + app-lifecycle commands.

  • Build the walker driver, WalkContext, WalkOutcome, MatchedPath, the terminal consume functions.
  • Migrate the app-lifecycle commands (no schema dependency, no value literals): quit, help, rebuild, save, save as, new, load, export, import, mode, messages.
  • Router in parse_command consults the walker for migrated commands; falls back to chumsky for the rest.
  • Differential test scaffolding: a test helper that, for every input in the existing test corpus, runs both parsers and asserts identical Command output where the input falls under a migrated command.

Exit criteria: walker handles the app-lifecycle commands end-to-end; existing tests for those commands pass via the walker path; tests for other commands still pass via chumsky.

Phase B — DDL commands without value literals.

  • drop table, drop column, drop relationship.
  • rename column.
  • add column (without the value-literal aspect — type slot uses Ident { source: Types } with a content validator).
  • add 1:n relationship (referential clauses as a static sub-grammar).
  • change column (type slot + flags).

These exercise schema lookups via Ident { source: Tables } and Ident { source: Columns }, and the Types source. No typed value slots yet, no DynamicSubgrammar.

Exit criteria: all DDL commands except create table pass via the walker; the rest still pass via chumsky.

Phase C — create table with column-list value literals.

  • The with pk clause uses Repeated for the column-spec list, each spec being a Seq(Ident{NewName}, Punct(':'), Ident{Types}-with-validator).
  • First test of Repeated with separator.

Exit criteria: create table works end-to-end via the walker.
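Phase C is the first use of `Repeated` with a separator. Here is a minimal sketch of its matching loop over consume functions. `Consume`, `match_repeated`, and the single-byte separator are assumptions. A real `Repeated` would hold `&'static Node` values and report `expected` on failure.

```rust
/// A node's consume function: Some(end) if it matches at `pos`.
pub type Consume = fn(&str, usize) -> Option<usize>;

fn skip_ws(s: &str, mut p: usize) -> usize {
    while s.as_bytes().get(p).map_or(false, |b| b.is_ascii_whitespace()) {
        p += 1;
    }
    p
}

/// Repeated { inner, separator, min }: matches `inner (separator inner)*`,
/// skipping whitespace between elements. Returns (end, item count), or None
/// if fewer than `min` items matched. A trailing separator is not consumed.
pub fn match_repeated(
    s: &str,
    pos: usize,
    inner: Consume,
    separator: Option<u8>,
    min: usize,
) -> Option<(usize, usize)> {
    let mut count = 0;
    let mut end = pos;
    let mut p = skip_ws(s, pos);
    loop {
        match inner(s, p) {
            // Require progress so an inner node that matches empty can't loop forever.
            Some(e) if e > p => {
                count += 1;
                end = e;
            }
            _ => break,
        }
        let q = skip_ws(s, end);
        match separator {
            Some(sep) if s.as_bytes().get(q) == Some(&sep) => p = skip_ws(s, q + 1),
            Some(_) => break,
            None => p = q,
        }
    }
    if count >= min { Some((end, count)) } else { None }
}
```

Because a dangling separator is left unconsumed, input such as `id:int,` ends the repetition before the comma. The enclosing Seq then fails there, so the input is not silently accepted.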

Phase D — data commands with full schema awareness.

  • show data, show table, replay.
  • insert: uses DynamicSubgrammar(column_value_list) for the comma-separated typed value list. Exercises full WalkContext propagation: Ident { source: Tables, role: "table", writes_table: true } resolves the column list; the dynamic sub-grammar unfolds typed slots per column.
  • update: set clauses use DynamicSubgrammar to resolve the value slot's type from the column. where clause uses the shared sub-grammar with AnyValueSlot (or, optionally, also column-typed if the column resolves cleanly).
  • delete: same where clause; otherwise simple.

This is the phase that proves the design's central claim: typed slots, dynamic sub-grammars, and schema-aware narrowing all collaborate to produce a single coherent grammar declaration per command.

Exit criteria: all data commands pass via the walker; the round-5 limitations close automatically (save Tab can offer as, value slots narrow by column type).

Phase E — replay end-to-end.

  • replay uses BarePath + StringLit (quoted form).
  • Internally replays each line through the same dispatch pipeline.

Exit criteria: replay works end-to-end via the walker; nested replay rejection still fires from the runtime, unchanged.

Phase F — cleanup.

  • Delete dsl/parser.rs.
  • Delete dsl/lexer.rs.
  • Delete dsl/keyword.rs.
  • Delete dsl/ident_slot.rs (already merged into grammar/mod.rs in Phase A).
  • Delete dsl/usage.rs::REGISTRY.
  • Delete chumsky dependency from Cargo.toml.
  • Delete parse.token.keyword.* entries from the catalog and keys.rs that the walker doesn't need (the keyword vocabulary is implicit in the grammar nodes).
  • Remove the differential test scaffolding from Phase A.

Exit criteria: working tree clean of legacy parser code; test suite still all-green; cargo clippy --all-targets -- -D warnings passes; cargo build --release binary not noticeably larger.

Implementation note (2026-05-15) — "Phase F minimal". Phase F shipped as planned with one deliberate deviation: dsl/parser.rs was retained, not deleted. The chumsky + lexer pipeline is gone (chumsky dependency removed; lexer.rs, keyword.rs, ident_slot.rs, usage.rs all deleted; the parse.token.* catalog entries collapsed), but parser.rs remains as the thin router: it owns the public parse_command / parse_command_with_schema entry points and the ParseError type, whose {message, position, at_eof, expected} shape completion, hint rendering, and the input-renderer overlay all depend on. Deleting the file would only scatter that surface across walker / dsl/mod.rs for no functional gain. The differential scaffolding was never built as a live harness — it materialised as hand-curated expectation tests. parser.rs documents this in its own module doc comment.

Test discipline

Three guarantees throughout migration:

  1. Full test suite green at every commit. Migration is per-command; tests are per-behaviour. They don't care which parser produces a Command — they assert input → expected output. If a test fails mid-migration, the walker hasn't reproduced behaviour; fix the walker before continuing.
  2. Walker-specific tests for trie-only features. Schema-aware narrowing, WalkContext propagation, dynamic sub-grammar expansion, HintMode per-node behaviour, the round-5 "save Tab offers as" gap-closing: each gets new tests as the feature lands.
  3. Differential check during the migration window. A test helper iterates the existing input corpus, runs both parsers on inputs that fall under a migrated command, and asserts identical Command output. Cheap insurance against subtle divergence. Removed at Phase F cleanup.
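The differential check can be sketched as a filter over the corpus. `Command`, the two parser function types, and `is_migrated` are illustrative stand-ins. As the implementation note above records, the project ended up with hand-curated expectation tests instead of a live harness.

```rust
// Hypothetical stand-in for the real Command type.
#[derive(Debug, PartialEq)]
pub enum Command {
    Quit,
    Help,
}

/// Runs both parsers on every corpus input that falls under a migrated
/// command and returns the inputs on which their results disagree.
pub fn differential_mismatches<'c>(
    corpus: &[&'c str],
    is_migrated: fn(&str) -> bool,
    legacy: fn(&str) -> Result<Command, String>,
    walker: fn(&str) -> Result<Command, String>,
) -> Vec<&'c str> {
    corpus
        .iter()
        .copied()
        .filter(|input| is_migrated(*input))
        .filter(|input| legacy(*input) != walker(*input))
        .collect()
}
```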

Cleanup pass at Phase F

Beyond deleting the legacy modules, Phase F includes catalog cleanup. The parse.token.keyword.* entries (40+ of them) are near-mechanical wrappers (create: "`create`"). No external code looks up these keys, because the walker renders keyword names from Word node literals directly, so the entries can collapse. A small format_keyword_for_error(literal) -> String helper replaces them. The keys.rs declarations go with them.
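This ADR names the replacement helper but does not give its body. Here is a minimal sketch, assuming it only needs to reproduce the backtick-wrapped form the catalog entries held:

```rust
/// Renders a Word node's literal the way the old catalog entries did,
/// e.g. `create` becomes "`create`".
pub fn format_keyword_for_error(literal: &str) -> String {
    format!("`{literal}`")
}
```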

Help text in help.cli_banner and help.in_app_body stays as hand-written prose — the alternative (auto-generating from the grammar) was deferred during the round-6 discussion as a separate concern; the grammar tree carries enough metadata (per-command help_id) for future automation but the prose documentation is still hand-curated for round 1.

Consequences

What we gain

  • One declaration per command. Entry keyword, shape, AST builder, dispatch handler, usage reference, help reference all colocated. Adding a command is one block in one file.
  • No cross-file scatter. The round-5 "10 places to remove q" critique is structurally addressed: there's nowhere else for keyword/usage/registry info to live but the grammar tree.
  • Schema-aware narrowing from day one. Typed value slots reject mis-shaped input at parse time with localised error wording; completion narrows per column type; the round-5 value-literal slot hint becomes type-specific ("Type a date as 'YYYY-MM-DD'") not generic.
  • Aliases as a single annotation. q could come back as one line on the quit Word node, no scatter.
  • Tests focus on behaviour, not enumeration. Tests that hardcoded keyword lists during round 5 (we noted these in usage.rs and completion.rs) can iterate the trie registry instead, becoming structural rather than literal.
  • Drift is structurally impossible. Completion, highlight, parse, usage, and help all derive from the same trie. No separate sources to keep in sync.

What we accept

  • Parse depends on schema state. A DSL command that references a non-existent table fails at parse time, not at execute time as today. This matches the user mental model when typing (the schema cache is current per ADR-0022) and yields better completion / hint experience. It does mean tests that exercised parser behaviour in isolation may now need to set up a schema cache.
  • chumsky's general-purpose features go unused. Error recovery, multi-error reporting in a single pass, and ambiguous-grammar handling are features chumsky offers but our DSL doesn't need. The trade is fine because our grammar is deterministic.
  • Some implementation complexity moves into the walker. Whitespace skipping between siblings, terminal consume functions, character-level shape recognition — the lexer did some of this implicitly; the walker does it explicitly. Net code is comparable or smaller because the scatter cost goes away.

What's out of scope for this ADR

  • External tooling integration (LSP, editor extensions). The registry is pub and accessible via accessor functions, so future tooling work doesn't fight this design. No tooling is built in round 1.
  • Help text auto-generation. The grammar tree carries help_id per command, but the help catalog body stays hand-curated.
  • Performance optimisation. Walker re-runs per keystroke for completion + highlighting. Naïve implementation is acceptable; if hot-path concerns emerge later, caching / incremental walks become a separate ADR.
  • Ranker implementations. The ranker hook exists; default is identity. Frequency-based ranking, content-aware priors for type completion ("Email → text first, Score → real"), recency — all future work that plugs into the ranker signature without touching grammar declarations.
  • Per-slot highlight overrides. The highlight_override field exists but stays None in round 1. Differentiating table-ident from new-name-ident visually is a future enhancement.

References

  • ADR-0023 — Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
  • ADR-0001 — Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
  • ADR-0019 — Friendly error layer and i18n catalog. Catalog conventions stay; parse.token.keyword.* entries collapse in Phase F.
  • ADR-0020 — Tokenization layer for the DSL parser. Superseded by the scannerless walker.
  • ADR-0021 — Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
  • ADR-0022 — Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
  • Round-6 session transcript — design pass that produced this spec.