claude@clouddev1 41043d686b docs: record ADR-0024 completion, reconcile requirements.md + handoff-14

ADR-0024 audited as fully implemented. Amend the ADR with a "Phase F
minimal" implementation note (parser.rs retained as the router +
ParseError home) and update the README index line to match.

Reconcile docs/requirements.md against handoffs 10-14: refresh the
test baseline (449 -> 1006), mark U4 (replay) satisfied, correct the
A1 / H1a / H3 progress notes.

Amend handoff-14: §3 flagged items both resolved (ranker kept,
CommandNode.hint_mode removed); §4 rewritten as a concrete next-work
pointer at the reconciled requirements.md.

2026-05-15 23:03:18 +00:00


ADR-0024: Unified grammar tree — execution plan

Status

Accepted. 2026-05-14.

Concrete specification for the direction proposed in ADR-0023. Where ADR-0023 captured the critique of the current parser shape and the high-level vision, this ADR specifies the data model, walker semantics, migration sequence, and cleanup steps in enough detail that implementation can proceed without further design decisions.

Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in the directory as institutional memory of why this change is happening; ADR-0024 is what gets built.

Context

The design pass in the round-6 session (2026-05-14) worked through ADR-0023's open questions and a number of implicit decisions that hadn't been written down. It ran as four rounds of questions, each followed by user confirmation:

Round 1 — foundational. Registry shape, node taxonomy, AST output model, failure / "expected" semantics, walker API and its mapping to parse / complete / highlight / hint concerns.
Round 2 — concrete representation. Multi-keyword sequences, sub-grammar reusability (static and dynamic), path-bearing commands, bare-or-with-suffix commands.
Round 3 — organisation and migration. Module layout, per-command migration strategy, test discipline during migration.
Round 4 — smaller details. Aliases on keyword nodes, IdentSlot fate, highlight palette, external-tooling exposure.

Two larger decisions emerged from the rounds and shifted the shape from ADR-0023's sketch:

The lexer dissolves. The walker operates directly on source bytes ("scannerless"). The current dsl/lexer.rs module's responsibilities (whitespace skipping, token shape recognition, byte-span tracking) migrate into terminal-node consume functions and the walker driver. The define_keywords! macro is no longer needed in its current form; keyword literals live on Word nodes in the grammar.
Schema-aware parse from day one. ADR-0023 had been cautious about coupling parse to schema state. The round-1 / round-2 discussion concluded that this caution comes from general-purpose parser tooling and doesn't apply to an interactive DSL editor where the schema is the context. Typed value slots consult the schema during parse; bind-time type checks remain but become belt-and-braces rather than the primary defense.

A separate critique surfaced in the design pass: my (Claude's) default pull toward "what's the safe incremental version of what general-purpose parser tooling does" repeatedly fought against the project owner's cleaner direct design. The pull is now explicitly resisted — this ADR ships the direct design, not a phased compromise.

Decision summary

A single trie data structure declared in Rust serves as the authority for parsing, completion, syntax highlighting, parse-error usage rendering, hint-panel content, and (eventually) external-tooling exposure. The walker that consumes this trie operates directly on source bytes, with no separate lexer pass. Schema-aware narrowing flows naturally from the trie's structure: typed value slots and dynamic sub-grammars consult a per-walk context that carries the current table, the resolved column types, and a reference to the schema cache.

Migration is per-command across six phases. The legacy chumsky parser and the new walker run side-by-side during the transition; existing behavioural tests guard regressions. Phase F removes chumsky, the lexer module, the separate UsageEntry registry, and the expected-set introspection in completion.rs.

Estimated total cost: ~4 sessions — one to land the framework and migrate Phase A, two for Phases B-D, one for Phases E + F.

Architecture

Walker as single source of truth

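// With two reference inputs the result lifetime cannot be elided as `'_`;
// tying it to `source` is an assumption about what WalkResult borrows.
pub fn walk<'s>(
    source: &'s str,
    bound: WalkBound,
    ctx: &mut WalkContext<'_>,
) -> WalkResult<'s>;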

pub enum WalkBound {
    EndOfInput,           // parse: walk all input
    Position(usize),      // complete / hint: walk up to cursor byte
}

pub struct WalkResult<'a> {
    pub outcome: WalkOutcome,
    pub matched_path: MatchedPath,
    pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
    ValidationFailed { position: usize, message_key: &'static str, args: Vec<(&'static str, String)> },
}

Consumers:

Parse for dispatch. walk(source, EndOfInput, ctx). On Match, invoke commands[command_idx].ast_builder(matched_path) and dispatch the returned Command.
Highlighting. walk(source, EndOfInput, ctx).per_byte_class. Each terminal records (byte_range, node.highlight_class()) as it matches. Unmatched ranges (past a failure) get the tok_error overlay.
Completion at cursor. walk(source, Position(cursor), ctx), inspect outcome.expected. Each expected Node contributes candidates: Word → its primary literal, Ident { source } → schema-cache lookup, Flag → --name, value-literal slot → type-appropriate hint per HintMode, etc.
Hint panel ambient. Same walk as completion. The hint resolver consults WalkOutcome variants plus the expected nodes' HintMode to choose between candidates rendering, prose, suppression, etc.
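As a rough illustration of the completion consumer, the sketch below maps an expected set to Tab candidates. The Node and WalkOutcome types are cut-down stand-ins for the ones in this ADR, and the candidates helper, the AS_WORD / FORCE_FLAG statics, and the --force flag are hypothetical names used only for this example.

```rust
// Simplified stand-ins for the ADR's types; only what completion needs.
#[derive(Debug)]
pub enum Node {
    Word { primary: &'static str, aliases: &'static [&'static str] },
    Flag(&'static str),
    NumberLit,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
}

// Hypothetical grammar nodes used by the usage example.
pub static AS_WORD: Node = Node::Word { primary: "as", aliases: &[] };
pub static FORCE_FLAG: Node = Node::Flag("force");
pub static NUMBER: Node = Node::NumberLit;

/// Hypothetical helper: turn the expected set into Tab candidates.
/// Words contribute their primary literal, flags contribute `--name`,
/// and value-literal slots contribute nothing in this sketch.
pub fn candidates(outcome: &WalkOutcome) -> Vec<String> {
    let expected = match outcome {
        WalkOutcome::Match { .. } => return Vec::new(),
        WalkOutcome::Incomplete { expected, .. }
        | WalkOutcome::Mismatch { expected, .. } => expected,
    };
    expected
        .iter()
        .filter_map(|node| match node {
            Node::Word { primary, .. } => Some(primary.to_string()),
            Node::Flag(name) => Some(format!("--{name}")),
            Node::NumberLit => None,
        })
        .collect()
}
```

The schema-backed Ident sources and HintMode handling are left out here; they would add further arms to the same match.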

Scannerless: no lexer module

Terminal nodes consume bytes directly. No pre-pass produces a Vec<Token>. The walker's driver handles whitespace skipping between siblings of a Seq and dispatches to each terminal's consume(source, position) function.
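A minimal sketch of the scannerless idea, assuming a grammar that is just a flat sequence of keyword literals. The Outcome enum and walk_words function are hypothetical simplifications of WalkOutcome and the real driver; word-boundary checks and all non-keyword terminals are omitted.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Match,
    Incomplete { position: usize, expected: &'static str },
    Mismatch { position: usize, expected: &'static str, found_byte: u8 },
}

/// Placeholder "expected" label for trailing input after the last word.
pub const END_OF_INPUT: &str = "<end of input>";

fn skip_ws(bytes: &[u8], mut pos: usize) -> usize {
    while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
        pos += 1;
    }
    pos
}

/// Walk a flat sequence of keyword literals directly over the source bytes.
pub fn walk_words(source: &str, words: &[&'static str]) -> Outcome {
    let bytes = source.as_bytes();
    let mut pos = 0;
    for &word in words {
        // Driver responsibility: skip whitespace between siblings of the Seq.
        pos = skip_ws(bytes, pos);
        let rest = &bytes[pos..];
        if rest.starts_with(word.as_bytes()) {
            // Terminal responsibility: consume the literal's bytes.
            pos += word.len();
        } else if word.as_bytes().starts_with(rest) {
            // Input ended before or partway through the word.
            return Outcome::Incomplete { position: pos, expected: word };
        } else {
            return Outcome::Mismatch { position: pos, expected: word, found_byte: bytes[pos] };
        }
    }
    pos = skip_ws(bytes, pos);
    if pos == bytes.len() {
        Outcome::Match
    } else {
        Outcome::Mismatch { position: pos, expected: END_OF_INPUT, found_byte: bytes[pos] }
    }
}
```

No token vector exists at any point: the only state is the byte position, which is what lets the same walk stop early at a cursor offset.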

Character-level helpers (identifier shape, digit-sequence shape, quoted-string escape handling) live in src/dsl/walker/lex_helpers.rs — a small shared module used by the various terminal consume functions. This is internally similar to the current lexer's logic, but it's invoked per-position by the walker rather than as a pre-pass.
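The helpers in lex_helpers.rs might look roughly like the following. The function names and the exact identifier shape ([A-Za-z_][A-Za-z0-9_]*) are assumptions for illustration; the ADR does not pin them down.

```rust
/// Skip ASCII whitespace starting at `pos`; returns the first non-blank offset.
pub fn skip_whitespace(source: &str, pos: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = pos;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Identifier shape assumed here: [A-Za-z_][A-Za-z0-9_]*.
/// Returns the end offset, or None if no identifier starts at `pos`.
pub fn consume_ident(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let first = *bytes.get(pos)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut end = pos + 1;
    while end < bytes.len() && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_') {
        end += 1;
    }
    Some(end)
}

/// Digit-sequence shape: one or more ASCII digits starting at `pos`.
pub fn consume_digits(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let mut end = pos;
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    if end > pos { Some(end) } else { None }
}
```

Each helper takes a position and returns an end offset, so a terminal's consume function can call it at whatever byte the walker has reached.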

src/dsl/lexer.rs and src/dsl/keyword.rs are deleted in Phase F. The keyword vocabulary is no longer a Rust enum; each keyword exists as a Word node in the grammar declarations.

Node taxonomy

Thirteen node kinds. Three categories:

Terminals (consume bytes):

pub enum Node {
    Word {
        primary: &'static str,
        aliases: &'static [&'static str],
        // Default tok_keyword unless overridden.
        highlight_override: Option<HighlightClass>,
    },
    Punct(char),
    Ident {
        source: IdentSource,
        role: &'static str,
        highlight_override: Option<HighlightClass>,
    },
    NumberLit,
    StringLit,
    BlobLit,
    Flag(&'static str),
    BarePath,
    // Combinators ↓
}

Combinators (compose other nodes):

    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
    Repeated {
        inner: &'static Node,
        separator: Option<&'static Node>,
        min: usize,
    },

Dynamic (resolves at walk time using WalkContext):

    DynamicSubgrammar(fn(&WalkContext) -> Node),
}

CommandNode is the top-level entry record:

pub struct CommandNode {
    pub entry: Node,                                  // always a Node::Word
    pub shape: Node,                                  // usually a Seq
    pub ast_builder: fn(&MatchedPath) -> Command,
    pub dispatch: fn(&mut App, Command) -> Vec<Action>,
    pub help_id: Option<&'static str>,
    pub usage_id: Option<&'static str>,
    // Hint mode override at command level; nodes can carry their own too.
    pub hint_mode: Option<HintMode>,
}

pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];

Typed value slots

Value-literal positions use typed slots built from terminals plus content validators. One slot factory per data type:

fn int_slot()      -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot()     -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot()  -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot()     -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot()     -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot()     -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot()     -> Node { Choice(&[BlobLit, null_word()]) }

StringLit_with(validator) is a StringLit terminal carrying a content validator that runs after a successful match. Same for NumberLit_with. A failed validator surfaces as WalkOutcome::ValidationFailed with the validator's catalog key.
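A sketch of what two content validators could look like, assuming a validator is a plain function from the matched text to either success or a catalog key. The Validator alias and both catalog keys are hypothetical; the date check covers shape and simple ranges only, not calendar validity.

```rust
/// Assumed validator shape: Ok(()) on success, or the catalog key of the error.
pub type Validator = fn(&str) -> Result<(), &'static str>;

/// Accepts an optional leading minus sign followed by ASCII digits only.
pub fn integer_only_validator(text: &str) -> Result<(), &'static str> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
        Ok(())
    } else {
        Err("parse.validate.integer") // hypothetical catalog key
    }
}

/// Shape check only: YYYY-MM-DD with month 01-12 and day 01-31.
pub fn date_format_validator(text: &str) -> Result<(), &'static str> {
    const KEY: &str = "parse.validate.date"; // hypothetical catalog key
    let b = text.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return Err(KEY);
    }
    let all_digits = |r: std::ops::Range<usize>| b[r].iter().all(|c| c.is_ascii_digit());
    if !(all_digits(0..4) && all_digits(5..7) && all_digits(8..10)) {
        return Err(KEY);
    }
    // All ten bytes are ASCII at this point, so str slicing is safe.
    let month: u32 = text[5..7].parse().map_err(|_| KEY)?;
    let day: u32 = text[8..10].parse().map_err(|_| KEY)?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Ok(())
    } else {
        Err(KEY)
    }
}
```

The walker would run the validator on the literal's content after the terminal matched, and map an Err key into WalkOutcome::ValidationFailed.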

slot_for_type(ty: Type) -> Node is the dispatcher: given a column type, returns the appropriate slot. Used by dynamic sub-grammars (see below).

`WalkContext`

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    // Current table inferred from the partial parse — e.g.,
    // `insert into Customers ...` sets `current_table = "Customers"`.
    pub current_table: Option<String>,
    // The columns of `current_table`, in declaration order, with types.
    // Populated by Ident { source: Tables } when it matches a
    // known table.
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    // For comma-separated value lists, which position we're at.
    pub value_position: usize,
    // For `set` clauses and `where` clauses, the column whose value
    // we're about to consume.
    pub current_column: Option<ColumnInfo>,
}

Nodes can write to WalkContext:

Ident { source: Tables, role: "table", writes_table: true } on match sets ctx.current_table to the matched identifier and resolves ctx.current_table_columns from the schema.
Ident { source: Columns, role: "column", writes_current_column: true } on match sets ctx.current_column from the resolved column list.

Nodes can read from WalkContext:

DynamicSubgrammar(column_value_list) reads ctx.current_table_columns and unfolds to a Seq of comma-separated typed slots — one per column.
The value slot after set col= reads ctx.current_column.user_type to pick the right typed slot.
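The write side of this flow can be sketched as below. SchemaCache is modelled as a plain HashMap, the Type enum is trimmed to three variants, and on_table_matched / on_column_matched are hypothetical names for what the writes_table and writes_current_column behaviours would do on a match.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Type { Int, Text, Date }

#[derive(Clone, Debug, PartialEq)]
pub struct ColumnInfo { pub name: String, pub user_type: Type }

/// Stand-in for the real SchemaCache: table name -> columns in declaration order.
pub struct SchemaCache { pub tables: HashMap<String, Vec<ColumnInfo>> }

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    pub current_column: Option<ColumnInfo>,
}

/// On a table-ident match: record the table and resolve its columns.
/// Returns false for an unknown table and leaves the context untouched.
pub fn on_table_matched(ctx: &mut WalkContext<'_>, ident: &str) -> bool {
    match ctx.schema.tables.get(ident) {
        Some(cols) => {
            ctx.current_table = Some(ident.to_string());
            ctx.current_table_columns = Some(cols.clone());
            true
        }
        None => false,
    }
}

/// On a column-ident match: record the column so the next value slot
/// can pick a typed slot from `user_type`.
pub fn on_column_matched(ctx: &mut WalkContext<'_>, ident: &str) -> bool {
    let found = ctx
        .current_table_columns
        .as_ref()
        .and_then(|cols| cols.iter().find(|c| c.name == ident))
        .cloned();
    let ok = found.is_some();
    ctx.current_column = found;
    ok
}
```

A value slot reached afterwards would read ctx.current_column and dispatch on its user_type.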

`WalkOutcome` and "expected"

The walker keeps track of the longest prefix that matched and the position at which it failed (or completed). At a failure or incomplete position, expected is the set of nodes that could legally continue the walk — derived structurally from the trie, not from a separate "expected" table.

For a Seq mid-walk, expected is the next child node. For a Choice that hasn't committed to a branch, expected is all children. For an Optional at a position where its inner could start, expected includes the inner plus the next sibling.
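These three rules can be read as a small recursive function over the tree. The sketch below uses a trimmed Node enum (tuple-style Word, no Repeated or dynamic nodes); expected_at_start, nullable, and the two example statics are hypothetical names, and the Punct('.') sibling is an arbitrary stand-in.

```rust
#[derive(Debug, PartialEq)]
pub enum Node {
    Word(&'static str),
    Punct(char),
    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
}

/// True if the node can match without consuming any input.
fn nullable(node: &Node) -> bool {
    match node {
        Node::Word(_) | Node::Punct(_) => false,
        Node::Choice(alts) => alts.iter().any(nullable),
        Node::Seq(children) => children.iter().all(nullable),
        Node::Optional(_) => true,
    }
}

/// Collect the terminals that could legally come next when the walk
/// is positioned at the start of `node`.
pub fn expected_at_start(node: &'static Node, out: &mut Vec<&'static Node>) {
    match node {
        Node::Word(_) | Node::Punct(_) => out.push(node),
        Node::Choice(alts) => {
            for alt in alts.iter() {
                expected_at_start(alt, out);
            }
        }
        Node::Optional(inner) => expected_at_start(*inner, out),
        Node::Seq(children) => {
            for child in children.iter() {
                expected_at_start(child, out);
                // Only look past a child that may match nothing.
                if !nullable(child) {
                    break;
                }
            }
        }
    }
}

// What may follow `drop`: an uncommitted Choice exposes every branch.
pub static DROP_TARGET: Node =
    Node::Choice(&[Node::Word("table"), Node::Word("column"), Node::Word("relationship")]);

// An Optional followed by a sibling exposes both the inner and the sibling.
pub static OPTIONAL_THEN_SIBLING: Node =
    Node::Seq(&[Node::Optional(&Node::Word("as")), Node::Punct('.')]);
```

Because the result is computed from the nodes themselves, there is no second table that could fall out of step with the grammar.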

This is the same information chumsky's ParseError::Invalid::expected carries today, sourced from the trie directly instead of via combinator introspection.

`HintMode` per node

Each node may carry a HintMode:

pub enum HintMode {
    /// Candidates if any surface; else prose fallback.
    Default,
    /// Force the prose at this catalog key regardless of candidates.
    /// Used by NewName slots ("Type a name, then `(`").
    ForceProse(&'static str),
    /// Show only the prose; suppress Tab candidates.
    /// Used by typed value slots at empty prefix.
    ProseOnly(&'static str),
    /// Suppress prose; only candidates.
    SuppressProse,
}

The walker propagates each expected node's HintMode to the hint resolver, which dispatches accordingly.
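One possible reading of that dispatch is sketched below. The Hint struct, the resolve_hint function, and the fallback-key parameter are assumptions; in particular, treating ForceProse as "show prose but keep Tab candidates" and ProseOnly as "show prose and drop them" is an interpretation of the doc comments above, not confirmed behaviour.

```rust
pub enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

/// Hypothetical resolver output: an optional prose catalog key plus
/// the candidates that Tab may cycle through.
#[derive(Debug, PartialEq)]
pub struct Hint {
    pub prose_key: Option<&'static str>,
    pub tab_candidates: Vec<String>,
}

pub fn resolve_hint(mode: &HintMode, candidates: Vec<String>, fallback_key: &'static str) -> Hint {
    match mode {
        // Default: candidates win; prose is only a fallback.
        HintMode::Default if candidates.is_empty() => {
            Hint { prose_key: Some(fallback_key), tab_candidates: candidates }
        }
        HintMode::Default | HintMode::SuppressProse => {
            Hint { prose_key: None, tab_candidates: candidates }
        }
        // ForceProse: always show the prose, candidates stay available.
        HintMode::ForceProse(key) => Hint { prose_key: Some(*key), tab_candidates: candidates },
        // ProseOnly: show the prose and suppress Tab candidates.
        HintMode::ProseOnly(key) => Hint { prose_key: Some(*key), tab_candidates: Vec::new() },
    }
}
```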

The current ad-hoc cases in input_render.rs::ambient_hint (value-literal slot suppression, NewName slot typing-name prose, invalid-ident overlay) migrate to node-attached HintMode annotations during Phase D.

Ranker layer

A ranker function runs between the walker's raw candidate output and the hint-panel renderer:

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }

Default is identity_ranker — declaration order from the trie is preserved. The signature allows future enhancements (frequency-based ranking, content-aware priors for type suggestions per column name) to plug in without changing grammar declarations.
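To show how a different ranker would plug into the same signature, here is a sketch with a purely illustrative alphabetical ranker. WalkContext is reduced to an empty stand-in, the Candidate struct's text field is assumed, and alphabetical_ranker and rank_with are hypothetical names.

```rust
/// Stand-in; the real context carries schema state.
pub struct WalkContext;

/// Assumed candidate shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate { pub text: String }

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

/// Default: keep declaration order from the trie.
pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }

/// Illustrative alternative: sort candidates by their text.
pub fn alphabetical_ranker(_: &WalkContext, mut c: Vec<Candidate>) -> Vec<Candidate> {
    c.sort_by(|a, b| a.text.cmp(&b.text));
    c
}

/// The hint-panel side only sees the Ranker type, never a concrete function.
pub fn rank_with(ranker: Ranker, ctx: &WalkContext, raw: Vec<Candidate>) -> Vec<String> {
    ranker(ctx, raw).into_iter().map(|c| c.text).collect()
}
```

Swapping rankers changes only the argument passed to rank_with; the grammar declarations that produced the raw candidates are untouched.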

The ranker lives outside the trie. Grammar declarations are about what's valid; ranking is about what's likely useful first.

Sub-grammars

Two flavours, no global registry:

Static — pure composition, function returning a const node:

const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {
    Seq(&[
        Ident { source: Tables, role: role_table, /* ... */ },
        Punct('.'),
        Ident { source: Columns, role: role_col, /* ... */ },
    ])
}

const fn where_clause() -> Node {
    Seq(&[
        Word { primary: "where", /* ... */ },
        Ident { source: Columns, role: "filter_column", /* ... */ },
        Punct('='),
        AnyValueSlot,
    ])
}

Dynamic — context-aware, expands at walk time:

fn column_value_list(ctx: &WalkContext) -> Node {
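    // as_deref + &[] avoids borrowing a temporary Vec that is dropped
    // at the end of the statement.
    let cols: &[ColumnInfo] = ctx.current_table_columns.as_deref().unwrap_or(&[]);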
    let mut children: Vec<Node> = Vec::new();
    for (i, col) in cols.iter().enumerate() {
        if i > 0 { children.push(Punct(',')); }
        children.push(slot_for_type(col.user_type));
    }
    Seq(Box::leak(children.into_boxed_slice()))
}

Dynamic sub-grammars return owned Node values that the walker treats as inline expansions. The leak above is one implementation tactic — alternatively, the walker stores the expanded node in a small per-walk arena. Both work; pick at implementation time.

Aliases

A Word node carries primary and an aliases slice. The walker matches input against either; completion surfaces only the primary; help text mentions aliases prose-style if appropriate. Highlight class is the same for both.
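A sketch of alias matching, assuming case-sensitive literals and a whole-word boundary check. Word is modelled as a standalone struct rather than the Node::Word variant, the consume and completion method names are hypothetical, and the quit / q pairing is the illustration used in this section, not a statement that the alias exists.

```rust
pub struct Word {
    pub primary: &'static str,
    pub aliases: &'static [&'static str],
}

impl Word {
    /// Returns the end offset if the primary or any alias matches at `pos`
    /// as a whole word (not followed by an identifier character).
    pub fn consume(&self, source: &str, pos: usize) -> Option<usize> {
        let rest = source.get(pos..)?;
        for lit in std::iter::once(self.primary).chain(self.aliases.iter().copied()) {
            if rest.starts_with(lit) {
                let end = pos + lit.len();
                let at_boundary =
                    !source[end..].starts_with(|c: char| c.is_alphanumeric() || c == '_');
                if at_boundary {
                    return Some(end);
                }
            }
        }
        None
    }

    /// Completion surfaces only the primary literal, never an alias.
    pub fn completion(&self) -> &'static str {
        self.primary
    }
}
```

With `Word { primary: "quit", aliases: &["q"] }`, both `quit` and `q` would be accepted as input while completion would still offer only `quit`.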

Round 5's q removal is not reverted by this design. q stays gone — adding it back would now be the single line aliases: &["q"] on the quit Word node, and would not surface as a separate candidate in completion (matching the round-5 user request).

`IdentSource`

Replaces the current dsl::ident_slot::IdentSlot:

pub enum IdentSource {
    NewName,         // user invents; no schema lookup; ProseOnly hint
    Tables,          // existing table names
    Columns,         // existing column names (filtered by current table)
    Relationships,   // existing relationship names
    Types,           // closed set from Type::all()
}

Types is new — it replaces the magic-string TYPE_SLOT_LABEL used today. src/dsl/ident_slot.rs dissolves into src/dsl/grammar/mod.rs.

Highlight class assignment

Per-byte highlight class is computed as a side effect of the walk. Each terminal records (byte_range, class) in WalkResult::per_byte_class as it matches. Unmatched ranges (past a definite failure) get the tok_error overlay, identical to today's behaviour.
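The error overlay could be applied as a final step over the recorded ranges, roughly as follows. ByteRange is approximated with std::ops::Range<usize>, HighlightClass is trimmed to three variants, and overlay_error is a hypothetical helper name.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HighlightClass { Keyword, Identifier, Error }

/// Append a tok_error-style range covering everything from the failure
/// position to the end of the source. Ranges recorded by terminals that
/// matched before the failure are kept as they are.
pub fn overlay_error(
    mut classes: Vec<(Range<usize>, HighlightClass)>,
    failure_pos: Option<usize>,
    source_len: usize,
) -> Vec<(Range<usize>, HighlightClass)> {
    if let Some(pos) = failure_pos {
        if pos < source_len {
            classes.push((pos..source_len, HighlightClass::Error));
        }
    }
    classes
}
```

For the input `drop tabel X`, a walk that matched `drop` at bytes 0..4 and failed at byte 5 would end up with the keyword range plus an error range 5..12.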

Default classes per terminal kind:

| Terminal | Default class |
|---|---|
| `Word` | `tok_keyword` |
| `Punct` | `tok_punct` |
| `Ident` | `tok_identifier` |
| `NumberLit` | `tok_number` |
| `StringLit` | `tok_string` |
| `BlobLit` | `tok_string` |
| `Flag` | `tok_flag` |
| `BarePath` | `tok_string` |

The highlight_override: Option<HighlightClass> field on Word and Ident is reserved for future per-slot variants (e.g., a Tables slot in a distinct shade vs a NewName slot muted) — left None everywhere in round 1.

No new palette colours for the initial migration.

Migration plan

Code organisation

src/dsl/
  grammar/
    mod.rs           — Node enum, IdentSource, HintMode, HighlightClass,
                       MatchedPath, CommandNode, REGISTRY top-level
    data.rs          — insert, update, delete, show
    ddl.rs           — create, drop, add, rename, change
    app.rs           — quit, help, save/save-as, new, load, rebuild,
                       export, import, mode, messages
    shared.rs        — typed value slots (int_slot, date_slot, …),
                       qualified_column, where_clause, action_keyword,
                       column_value_list (dynamic)
    validators.rs    — content validators (integer_only_validator,
                       date_format_validator, datetime_format_validator,
                       type_name_validator, …)
  walker/
    mod.rs           — public walk() entry; orchestration
    driver.rs        — the per-node-kind dispatch
    context.rs       — WalkContext
    outcome.rs       — WalkOutcome, MatchedPath, WalkResult
    lex_helpers.rs   — identifier-shape, digit-shape, string-escape
                       helpers; shared across terminal consume fns
  parser.rs          — Phase A: becomes a router. Phase F: deleted.
  lexer.rs           — Phase F: deleted.
  keyword.rs         — Phase F: deleted.
  ident_slot.rs      — Phase F: dissolved into grammar/mod.rs.
  usage.rs           — Phase F: REGISTRY deleted; the file may go.

Six-phase migration

Phase A — Walker skeleton + app-lifecycle commands.

Build the walker driver, WalkContext, WalkOutcome, MatchedPath, the terminal consume functions.
Migrate the app-lifecycle commands (no schema dependency, no value literals): quit, help, rebuild, save, save as, new, load, export, import, mode, messages.
Router in parse_command consults the walker for migrated commands; falls back to chumsky for the rest.
Differential test scaffolding: a test helper that, for every input in the existing test corpus, runs both parsers and asserts identical Command output where the input falls under a migrated command.

Exit criteria: walker handles the app-lifecycle commands end-to-end; existing tests for those commands pass via the walker path; tests for other commands still pass via chumsky.

Phase B — DDL commands without value literals.

drop table, drop column, drop relationship.
rename column.
add column (without the value-literal aspect — type slot uses Ident { source: Types } with a content validator).
add 1:n relationship (referential clauses as a static sub-grammar).
change column (type slot + flags).

These exercise schema lookups via Ident { source: Tables } and Ident { source: Columns }, and the Types source. No typed value slots yet, no DynamicSubgrammar.

Exit criteria: all DDL commands except create table pass via the walker; the rest still pass via chumsky.

Phase C — create table with column-list value literals.

The with pk clause uses Repeated for the column-spec list, each spec being a Seq(Ident{NewName}, Punct(':'), Ident{Types}-with-validator).
First test of Repeated with separator.

Exit criteria: create table works end-to-end via the walker.

Phase D — data commands with full schema awareness.

show data, show table, replay.
insert: uses DynamicSubgrammar(column_value_list) for the comma-separated typed value list. Exercises full WalkContext propagation: Ident { source: Tables, role: "table", writes_table: true } resolves the column list; the dynamic sub-grammar unfolds typed slots per column.
update: set clauses use DynamicSubgrammar to resolve the value slot's type from the column. where clause uses the shared sub-grammar with AnyValueSlot (or, optionally, also column-typed if the column resolves cleanly).
delete: same where clause; otherwise simple.

This is the phase that proves the design's central claim: typed slots, dynamic sub-grammars, and schema-aware narrowing all collaborate to produce a single coherent grammar declaration per command.

Exit criteria: all data commands pass via the walker; the round-5 limitations close automatically (save Tab can offer as, value slots narrow by column type).

Phase E — replay end-to-end.

replay uses BarePath + StringLit (quoted form).
Internally replays each line through the same dispatch pipeline.

Exit criteria: replay works end-to-end via the walker; nested replay rejection still fires from the runtime, unchanged.

Phase F — cleanup.

Delete dsl/parser.rs.
Delete dsl/lexer.rs.
Delete dsl/keyword.rs.
Delete dsl/ident_slot.rs (already merged into grammar/mod.rs in Phase A).
Delete dsl/usage.rs::REGISTRY.
Delete chumsky dependency from Cargo.toml.
Delete parse.token.keyword.* entries from the catalog and keys.rs that the walker doesn't need (the keyword vocabulary is implicit in the grammar nodes).
Remove the differential test scaffolding from Phase A.

Exit criteria: working tree clean of legacy parser code; test suite still all-green; cargo clippy --all-targets -- -D warnings passes; cargo build --release binary not noticeably larger.

Implementation note (2026-05-15) — "Phase F minimal". Phase F shipped as planned with one deliberate deviation: dsl/parser.rs was retained, not deleted. The chumsky + lexer pipeline is gone (chumsky dependency removed; lexer.rs, keyword.rs, ident_slot.rs, usage.rs all deleted; the parse.token.* catalog entries collapsed), but parser.rs remains as the thin router: it owns the public parse_command / parse_command_with_schema entry points and the ParseError type, whose {message, position, at_eof, expected} shape completion, hint rendering, and the input-renderer overlay all depend on. Deleting the file would only scatter that surface across walker / dsl/mod.rs for no functional gain. The differential scaffolding was never built as a live harness — it materialised as hand-curated expectation tests. parser.rs documents this in its own module doc comment.

Test discipline

Three guarantees throughout migration:

Full test suite green at every commit. Migration is per-command; tests are per-behaviour. They don't care which parser produces a Command — they assert input → expected output. If a test fails mid-migration, the walker hasn't reproduced behaviour; fix the walker before continuing.
Walker-specific tests for trie-only features. Schema-aware narrowing, WalkContext propagation, dynamic sub-grammar expansion, HintMode per-node behaviour, the round-5 "save Tab offers as" gap-closing: each gets new tests as the feature lands.
Differential check during the migration window. A test helper iterates the existing input corpus, runs both parsers on inputs that fall under a migrated command, and asserts identical Command output. Cheap insurance against subtle divergence. Removed at Phase F cleanup.
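The differential helper described in the third guarantee could take a shape like the one below. Everything here is a toy: the Command enum has two variants, legacy_parse and walker_parse are stand-ins for the chumsky and walker paths, and walker_parse is deliberately wrong about leading whitespace so the check has a divergence to report.

```rust
#[derive(Debug, PartialEq)]
pub enum Command { Quit, Help }

pub type Parser = fn(&str) -> Option<Command>;

/// Toy stand-in for the legacy parser: tolerant of surrounding whitespace.
pub fn legacy_parse(input: &str) -> Option<Command> {
    match input.trim() {
        "quit" => Some(Command::Quit),
        "help" => Some(Command::Help),
        _ => None,
    }
}

/// Toy stand-in for the walker path, deliberately stricter about whitespace.
pub fn walker_parse(input: &str) -> Option<Command> {
    match input {
        "quit" => Some(Command::Quit),
        "help" => Some(Command::Help),
        _ => None,
    }
}

/// Whether the input's entry keyword belongs to an already-migrated command.
pub fn is_migrated(input: &str) -> bool {
    matches!(input.split_whitespace().next(), Some("quit" | "help"))
}

/// Returns every corpus input, restricted to migrated commands, on which
/// the two parsers produce different results.
pub fn differential_check<'a>(
    corpus: &[&'a str],
    legacy: Parser,
    walker: Parser,
    migrated: fn(&str) -> bool,
) -> Vec<&'a str> {
    corpus
        .iter()
        .copied()
        .filter(|input| migrated(input))
        .filter(|input| legacy(input) != walker(input))
        .collect()
}
```

A test would assert that the returned list is empty; inputs for commands still on the legacy path are skipped rather than compared.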

Cleanup pass at Phase F

Beyond deleting the legacy modules, Phase F includes catalog cleanup. The parse.token.keyword.* entries (40+ of them) are near-mechanical wrappers (create: "`create`"); with no external code looking up these keys (the walker renders keyword names from Word node literals directly), the entries can collapse. A small format_keyword_for_error(literal) -> String helper replaces them. The keys.rs declarations go with them.
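Given that the catalog entries only wrap the keyword in backticks, the replacement helper is probably close to a one-liner. This sketch assumes the backtick wrapping is the whole of the formatting; the actual helper may differ.

```rust
/// Render a keyword literal the way the old per-keyword catalog entries did:
/// wrapped in backticks for use inside an error message.
pub fn format_keyword_for_error(literal: &str) -> String {
    format!("`{literal}`")
}
```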

Help text in help.cli_banner and help.in_app_body stays as hand-written prose — the alternative (auto-generating from the grammar) was deferred during the round-6 discussion as a separate concern; the grammar tree carries enough metadata (per-command help_id) for future automation but the prose documentation is still hand-curated for round 1.

Consequences

What we gain

One declaration per command. Entry keyword, shape, AST builder, dispatch handler, usage reference, help reference all colocated. Adding a command is one block in one file.
No cross-file scatter. The round-5 "10 places to remove q" critique is structurally addressed: there's nowhere else for keyword/usage/registry info to live but the grammar tree.
Schema-aware narrowing from day one. Typed value slots reject mis-shaped input at parse time with localised error wording; completion narrows per column type; the round-5 value-literal slot hint becomes type-specific ("Type a date as 'YYYY-MM-DD'") not generic.
Aliases as a single annotation. q could come back as one line on the quit Word node, no scatter.
Tests focus on behaviour, not enumeration. Tests that hardcoded keyword lists during round 5 (we noted these in usage.rs and completion.rs) can iterate the trie registry instead, becoming structural rather than literal.
Drift is structurally impossible. Completion, highlight, parse, usage, and help all derive from the same trie. No separate sources to keep in sync.

What we accept

Parse depends on schema state. A DSL command that references a non-existent table fails at parse time, not at execute time as today. This matches the user mental model when typing (the schema cache is current per ADR-0022) and yields better completion / hint experience. It does mean tests that exercised parser behaviour in isolation may now need to set up a schema cache.
chumsky's general-purpose features go unused. Recovery on ambiguous input, multi-error reporting in a single pass, ambiguous-grammar handling — features chumsky offers but our DSL doesn't use. The trade is fine because our grammar is deterministic.
Some implementation complexity moves into the walker. Whitespace skipping between siblings, terminal consume functions, character-level shape recognition — the lexer did some of this implicitly; the walker does it explicitly. Net code is comparable or smaller because the scatter cost goes away.

What's out of scope for this ADR

External tooling integration (LSP, editor extensions). The registry is pub and accessible via accessor functions, so future tooling work doesn't fight this design. No tooling is built in round 1.
Help text auto-generation. Grammar tree carries help_id per node, but the help catalog body stays hand-curated.
Performance optimisation. Walker re-runs per keystroke for completion + highlighting. Naïve implementation is acceptable; if hot-path concerns emerge later, caching / incremental walks become a separate ADR.
Ranker implementations. The ranker hook exists; default is identity. Frequency-based ranking, content-aware priors for type completion ("Email → text first, Score → real"), recency — all future work that plugs into the ranker signature without touching grammar declarations.
Per-slot highlight overrides. The highlight_override field exists but stays None in round 1. Differentiating table-ident from new-name-ident visually is a future enhancement.

References

ADR-0023 — Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
ADR-0001 — Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
ADR-0019 — Friendly error layer and i18n catalog. Catalog conventions stay; parse.token.keyword.* entries collapse in Phase F.
ADR-0020 — Tokenization layer for the DSL parser. Superseded by the scannerless walker.
ADR-0021 — Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
ADR-0022 — Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
Round-6 session transcript — design pass that produced this spec.
