ADR-0024: Unified grammar tree — execution plan
Status
Accepted. 2026-05-14.
Concrete specification for the direction proposed in ADR-0023. Where ADR-0023 captured the critique of the current parser shape and the high-level vision, this ADR specifies the data model, walker semantics, migration sequence, and cleanup steps in enough detail that implementation can proceed without further design decisions.
Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in the directory as institutional memory of why this change is happening; ADR-0024 is what gets built.
Context
The design pass that landed in the round-6 session (2026-05-14) worked through ADR-0023's open questions and a number of implicit decisions that hadn't been written down. It ran as four rounds of questions, each followed by user confirmation:
- Round 1: foundational. Registry shape, node taxonomy, AST output model, failure / "expected" semantics, walker API and its mapping to parse / complete / highlight / hint concerns.
- Round 2: concrete representation. Multi-keyword sequences, sub-grammar reusability (static and dynamic), path-bearing commands, bare-or-with-suffix commands.
- Round 3: organisation and migration. Module layout, per-command migration strategy, test discipline during migration.
- Round 4: smaller details. Aliases on keyword nodes, IdentSlot fate, highlight palette, external-tooling exposure.
Two larger decisions emerged from the rounds and shifted the shape from ADR-0023's sketch:
- The lexer dissolves. The walker operates directly on source bytes ("scannerless"). The current dsl/lexer.rs module's responsibilities (whitespace skipping, token shape recognition, byte-span tracking) migrate into terminal-node consume functions and the walker driver. The define_keywords! macro is no longer needed in its current form; keyword literals live on Word nodes in the grammar.
- Schema-aware parse from day one. ADR-0023 had been cautious about coupling parse to schema state. The round-1 / round-2 discussion concluded that this caution comes from general-purpose parser tooling and doesn't apply to an interactive DSL editor where the schema is the context. Typed value slots consult the schema during parse; bind-time type checks remain but become belt-and-braces rather than the primary defense.
A separate critique surfaced in the design pass: my (Claude's) default pull toward "what's the safe incremental version of what general-purpose parser tooling does" repeatedly fought against the project owner's cleaner direct design. The pull is now explicitly resisted — this ADR ships the direct design, not a phased compromise.
Decision summary
A single trie data structure declared in Rust serves as the authority for parsing, completion, syntax highlighting, parse-error usage rendering, hint-panel content, and (eventually) external-tooling exposure. The walker that consumes this trie operates directly on source bytes, with no separate lexer pass. Schema-aware narrowing flows naturally from the trie's structure: typed value slots and dynamic sub-grammars consult a per-walk context that carries the current table, the resolved column types, and a reference to the schema cache.
Migration is per-command across six phases. The legacy
chumsky parser and the new walker run side-by-side during the
transition; existing behavioural tests guard regressions.
Phase F removes chumsky, the lexer module, the separate
UsageEntry registry, and the expected-set introspection
in completion.rs.
Estimated total cost: ~4 sessions — one to land the framework and migrate Phase A, two for Phases B-D, one for Phases E + F.
Architecture
Walker as single source of truth
pub fn walk(
source: &str,
bound: WalkBound,
ctx: &mut WalkContext,
) -> WalkResult<'_>;
pub enum WalkBound {
EndOfInput, // parse: walk all input
Position(usize), // complete / hint: walk up to cursor byte
}
pub struct WalkResult<'a> {
pub outcome: WalkOutcome,
pub matched_path: MatchedPath,
pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}
pub enum WalkOutcome {
Match { command_idx: usize },
Incomplete { position: usize, expected: Vec<&'static Node> },
Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
ValidationFailed { position: usize, message_key: &'static str, args: Vec<(&'static str, String)> },
}
Consumers:
- Parse for dispatch. walk(source, EndOfInput, ctx). On Match, invoke commands[command_idx].ast_builder(matched_path) and dispatch the returned Command.
- Highlighting. walk(source, EndOfInput, ctx).per_byte_class. Each terminal records (byte_range, node.highlight_class()) as it matches. Unmatched ranges (past a failure) get the tok_error overlay.
- Completion at cursor. walk(source, Position(cursor), ctx), inspect outcome.expected. Each expected Node contributes candidates: Word → its primary literal, Ident { source } → schema-cache lookup, Flag → --name, value-literal slot → type-appropriate hint per HintMode, etc.
- Hint panel ambient. Same walk as completion. The hint resolver consults WalkOutcome variants plus the expected nodes' HintMode to choose between candidates rendering, prose, suppression, etc.
Scannerless: no lexer module
Terminal nodes consume bytes directly. No pre-pass produces a
Vec<Token>. The walker's driver handles whitespace skipping
between siblings of a Seq and dispatches to each terminal's
consume(source, position) function.
Character-level helpers (identifier shape, digit-sequence shape,
quoted-string escape handling) live in
src/dsl/walker/lex_helpers.rs — a small shared module used
by the various terminal consume functions. This is internally
similar to the current lexer's logic, but it's invoked per-position
by the walker rather than as a pre-pass.
src/dsl/lexer.rs and src/dsl/keyword.rs are deleted in
Phase F. The keyword vocabulary is no longer a Rust enum; each
keyword exists as a Word node in the grammar declarations.
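As an illustration of what "terminal nodes consume bytes directly" could look like, here is a minimal sketch of three helpers in the spirit of lex_helpers.rs. The function names, the ASCII-only identifier shape, and the case-sensitive keyword match are assumptions made for the example, not the project's actual rules.

```rust
/// Skip ASCII whitespace starting at `pos`; returns the first
/// non-whitespace offset (or the end of input).
pub fn skip_whitespace(source: &str, pos: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = pos;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Assumed identifier-continuation shape: ASCII letters, digits, underscore.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Try to consume the keyword `literal` at `pos`. Succeeds only when the
/// keyword is followed by a non-identifier byte or end of input, so `drop`
/// does not match the prefix of `dropper`. Returns the end offset.
pub fn consume_word(source: &str, pos: usize, literal: &str) -> Option<usize> {
    let bytes = source.as_bytes();
    let rest = bytes.get(pos..)?;
    if !rest.starts_with(literal.as_bytes()) {
        return None;
    }
    let end = pos + literal.len();
    match bytes.get(end) {
        Some(&b) if is_ident_byte(b) => None,
        _ => Some(end),
    }
}

/// Consume an identifier of the assumed shape `[A-Za-z_][A-Za-z0-9_]*`.
/// Returns the end offset on success.
pub fn consume_ident(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let first = *bytes.get(pos)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut end = pos + 1;
    while end < bytes.len() && is_ident_byte(bytes[end]) {
        end += 1;
    }
    Some(end)
}
```

Each helper takes a position and returns the new position, which is what lets the driver call them per node rather than tokenising the whole input up front.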
Node taxonomy
Thirteen node kinds. Three categories:
Terminals (consume bytes):
pub enum Node {
Word {
primary: &'static str,
aliases: &'static [&'static str],
// Default tok_keyword unless overridden.
highlight_override: Option<HighlightClass>,
},
Punct(char),
Ident {
source: IdentSource,
role: &'static str,
highlight_override: Option<HighlightClass>,
},
NumberLit,
StringLit,
BlobLit,
Flag(&'static str),
BarePath,
// Combinators ↓
}
Combinators (compose other nodes):
Choice(&'static [Node]),
Seq(&'static [Node]),
Optional(&'static Node),
Repeated {
inner: &'static Node,
separator: Option<&'static Node>,
min: usize,
},
Dynamic (resolves at walk time using WalkContext):
DynamicSubgrammar(fn(&WalkContext) -> Node),
}
CommandNode is the top-level entry record:
pub struct CommandNode {
pub entry: Word,
pub shape: Node, // usually a Seq
pub ast_builder: fn(&MatchedPath) -> Command,
pub dispatch: fn(&mut App, Command) -> Vec<Action>,
pub help_id: Option<&'static str>,
pub usage_id: Option<&'static str>,
// Hint mode override at command level; nodes can carry their own too.
pub hint_mode: Option<HintMode>,
}
pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];
Typed value slots
Value-literal positions use typed slots built from terminals plus content validators. One slot factory per data type:
fn int_slot() -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot() -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot() -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot() -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot() -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot() -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot() -> Node { Choice(&[BlobLit, null_word()]) }
StringLit_with(validator) is a StringLit terminal carrying
a content validator that runs after a successful match. Same
for NumberLit_with. A failed validator surfaces as
WalkOutcome::ValidationFailed with the validator's catalog
key.
slot_for_type(ty: Type) -> Node is the dispatcher: given a
column type, returns the appropriate slot. Used by dynamic
sub-grammars (see below).
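A content validator of the kind described above might look roughly like the following sketch. It assumes the date format is 'YYYY-MM-DD', that a validator receives the unquoted string content, and that it returns a catalog key on failure; the key name `parse.value.date_format` and the calendar check are hypothetical.

```rust
/// Hypothetical catalog key surfaced through WalkOutcome::ValidationFailed.
pub const DATE_FORMAT_KEY: &str = "parse.value.date_format";

/// Validate `content` (the text between the quotes) as YYYY-MM-DD.
/// Returns the catalog key on failure.
pub fn date_format_validator(content: &str) -> Result<(), &'static str> {
    let b = content.as_bytes();
    // Shape check first: ten bytes, dashes at 4 and 7, digits elsewhere.
    let shape_ok = b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && b.iter()
            .enumerate()
            .all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit());
    if !shape_ok {
        return Err(DATE_FORMAT_KEY);
    }
    // The shape check guarantees ASCII, so slicing by byte offset is safe.
    let year: u32 = content[0..4].parse().map_err(|_| DATE_FORMAT_KEY)?;
    let month: u32 = content[5..7].parse().map_err(|_| DATE_FORMAT_KEY)?;
    let day: u32 = content[8..10].parse().map_err(|_| DATE_FORMAT_KEY)?;

    let leap = year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    let max_day = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if leap {
                29
            } else {
                28
            }
        }
        _ => return Err(DATE_FORMAT_KEY),
    };
    if day >= 1 && day <= max_day {
        Ok(())
    } else {
        Err(DATE_FORMAT_KEY)
    }
}
```

Because the validator runs only after the StringLit terminal has matched, it never has to deal with quoting or escapes, only with content.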
WalkContext
pub struct WalkContext<'a> {
pub schema: &'a SchemaCache,
// Current table inferred from the partial parse — e.g.,
// `insert into Customers ...` sets `current_table = "Customers"`.
pub current_table: Option<String>,
// The columns of `current_table`, in declaration order, with types.
// Populated by Ident { source: Tables } when it matches a
// known table.
pub current_table_columns: Option<Vec<ColumnInfo>>,
// For comma-separated value lists, which position we're at.
pub value_position: usize,
// For `set` clauses and `where` clauses, the column whose value
// we're about to consume.
pub current_column: Option<ColumnInfo>,
}
Nodes can write to WalkContext:
- Ident { source: Tables, role: "table", writes_table: true } on match sets ctx.current_table to the matched identifier and resolves ctx.current_table_columns from the schema.
- Ident { source: Columns, role: "column", writes_current_column: true } on match sets ctx.current_column from the resolved column list.
Nodes can read from WalkContext:
- DynamicSubgrammar(column_value_list) reads ctx.current_table_columns and unfolds to a Seq of comma-separated typed slots, one per column.
- The value slot after set col= reads ctx.current_column.user_type to pick the right typed slot.
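The write side of this contract can be sketched as two small methods on the context. This is a reduced model under stated assumptions: SchemaCache is stood in for by a HashMap, user_type is a String rather than the project's Type, and the method names on_table_matched / on_column_matched are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct ColumnInfo {
    pub name: String,
    pub user_type: String, // stand-in for the real column type enum
}

/// Stand-in for the real schema cache: table name -> columns in order.
pub struct SchemaCache {
    pub tables: HashMap<String, Vec<ColumnInfo>>,
}

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    pub current_column: Option<ColumnInfo>,
}

impl<'a> WalkContext<'a> {
    pub fn new(schema: &'a SchemaCache) -> Self {
        WalkContext {
            schema,
            current_table: None,
            current_table_columns: None,
            current_column: None,
        }
    }

    /// Called when a table identifier matches. Records the table and its
    /// columns; returns false (context untouched) for an unknown table.
    pub fn on_table_matched(&mut self, name: &str) -> bool {
        match self.schema.tables.get(name) {
            Some(cols) => {
                self.current_table = Some(name.to_string());
                self.current_table_columns = Some(cols.clone());
                true
            }
            None => false,
        }
    }

    /// Called when a column identifier matches. Looks the column up in the
    /// already-resolved column list of the current table.
    pub fn on_column_matched(&mut self, name: &str) -> bool {
        let found = self
            .current_table_columns
            .as_ref()
            .and_then(|cols| cols.iter().find(|c| c.name == name).cloned());
        match found {
            Some(col) => {
                self.current_column = Some(col);
                true
            }
            None => false,
        }
    }
}
```

A later value slot can then read current_column.user_type without knowing which earlier node populated it.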
WalkOutcome and "expected"
The walker keeps track of the longest prefix that matched and
the position at which it failed (or completed). At a failure
or incomplete position, expected is the set of nodes that
could legally continue the walk — derived structurally from
the trie, not from a separate "expected" table.
For a Seq mid-walk, expected is the next child node.
For a Choice that hasn't committed to a branch, expected
is all children. For an Optional at a position where its
inner could start, expected includes the inner plus the
next sibling.
This is the same information chumsky's
ParseError::Invalid::expected carries today, sourced from
the trie directly instead of via combinator introspection.
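The structural derivation of expected can be sketched on a reduced node type. The enum below is a cut-down stand-in for the ADR's Node, the `save [as] <path>` grammar is a hypothetical toy, and the function names are invented for the example.

```rust
/// Reduced stand-in for the ADR's `Node`: only the kinds needed to show
/// how `expected` falls out of the trie structure.
#[derive(Debug)]
pub enum Node {
    Word(&'static str),
    Path, // stand-in for BarePath
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    Optional(&'static Node),
}

/// Push a label for every terminal that could start `node`.
/// Returns true when `node` can match the empty string.
pub fn first_terminals(node: &'static Node, out: &mut Vec<&'static str>) -> bool {
    match *node {
        Node::Word(w) => {
            out.push(w);
            false
        }
        Node::Path => {
            out.push("<path>");
            false
        }
        Node::Seq(children) => expected_in_seq(children, 0, out),
        Node::Choice(children) => {
            let mut nullable = false;
            for child in children {
                nullable |= first_terminals(child, out);
            }
            nullable
        }
        Node::Optional(inner) => {
            first_terminals(inner, out);
            true
        }
    }
}

/// Expected set inside a Seq whose first `next` children already matched:
/// the next child, plus following siblings for as long as each child
/// before them can match empty. Returns true if the rest is skippable.
pub fn expected_in_seq(
    children: &'static [Node],
    next: usize,
    out: &mut Vec<&'static str>,
) -> bool {
    for child in children.iter().skip(next) {
        if !first_terminals(child, out) {
            return false;
        }
    }
    true
}

/// Hypothetical toy grammar: `save [as] <path>`.
pub static SAVE: Node = Node::Seq(&[
    Node::Word("save"),
    Node::Optional(&Node::Word("as")),
    Node::Path,
]);

/// Convenience wrapper: expected labels after `matched` children of a Seq.
pub fn expected_after(seq: &'static Node, matched: usize) -> Vec<&'static str> {
    let mut out = Vec::new();
    match *seq {
        Node::Seq(children) => {
            expected_in_seq(children, matched, &mut out);
        }
        _ => {
            first_terminals(seq, &mut out);
        }
    }
    out
}
```

With this toy grammar, the position right after `save` yields both `as` and the path slot, which is the Optional rule stated above: the inner node plus the next sibling.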
HintMode per node
Each node may carry a HintMode:
pub enum HintMode {
/// Candidates if any surface; else prose fallback.
Default,
/// Force the prose at this catalog key regardless of candidates.
/// Used by NewName slots ("Type a name, then `(`").
ForceProse(&'static str),
/// Show only the prose; suppress Tab candidates.
/// Used by typed value slots at empty prefix.
ProseOnly(&'static str),
/// Suppress prose; only candidates.
SuppressProse,
}
The walker propagates each expected node's HintMode to the
hint resolver, which dispatches accordingly.
The current ad-hoc cases in input_render.rs::ambient_hint
(value-literal slot suppression, NewName slot typing-name
prose, invalid-ident overlay) migrate to node-attached
HintMode annotations during Phase D.
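One possible shape for the resolver's dispatch over HintMode is sketched below. The Hint struct, the fallback-key parameter, and the reading of ForceProse as "show prose but keep Tab candidates" are assumptions drawn from the doc comments on the enum, not confirmed behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

/// Hypothetical resolver output: an optional prose catalog key plus the
/// candidates that remain available to Tab.
#[derive(Debug, PartialEq)]
pub struct Hint {
    pub prose_key: Option<&'static str>,
    pub candidates: Vec<String>,
}

pub fn resolve_hint(
    mode: HintMode,
    candidates: Vec<String>,
    fallback_key: &'static str,
) -> Hint {
    match mode {
        // Default: candidates if any surface, else the prose fallback.
        HintMode::Default if candidates.is_empty() => Hint {
            prose_key: Some(fallback_key),
            candidates,
        },
        // Default with candidates, or SuppressProse: candidates only.
        HintMode::Default | HintMode::SuppressProse => Hint {
            prose_key: None,
            candidates,
        },
        // ForceProse: prose regardless of candidates (candidates kept).
        HintMode::ForceProse(key) => Hint {
            prose_key: Some(key),
            candidates,
        },
        // ProseOnly: prose shown, Tab candidates suppressed.
        HintMode::ProseOnly(key) => Hint {
            prose_key: Some(key),
            candidates: Vec::new(),
        },
    }
}
```

Keeping the dispatch in one match makes each ad-hoc case from ambient_hint a one-line annotation on the node instead of a branch in the renderer.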
Ranker layer
A ranker function runs between the walker's raw candidate output and the hint-panel renderer:
pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;
pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }
Default is identity_ranker — declaration order from the
trie is preserved. The signature allows future enhancements
(frequency-based ranking, content-aware priors for type
suggestions per column name) to plug in without changing
grammar declarations.
The ranker lives outside the trie. Grammar declarations are about what's valid; ranking is about what's likely useful first.
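To show how a non-identity ranker could plug into the same signature, here is a sketch of a content-aware prior. The reduced WalkContext (a single current_column_name field), the Candidate struct, and the "column named like email prefers text" heuristic are all hypothetical stand-ins.

```rust
/// Reduced stand-in for the real WalkContext.
pub struct WalkContext {
    pub current_column_name: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub text: String,
}

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

/// Default: declaration order from the trie is preserved.
pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> {
    c
}

/// Hypothetical prior: when the column name looks like an email field,
/// move the `text` candidate to the front and leave the rest in
/// declaration order.
pub fn email_prior_ranker(ctx: &WalkContext, mut c: Vec<Candidate>) -> Vec<Candidate> {
    let looks_like_email = ctx
        .current_column_name
        .as_deref()
        .map_or(false, |name| name.to_ascii_lowercase().contains("email"));
    if looks_like_email {
        // sort_by_key is stable and `false < true`, so `text` sorts first
        // while every other candidate keeps its relative position.
        c.sort_by_key(|cand| cand.text != "text");
    }
    c
}
```

Because both functions coerce to the Ranker type, swapping one for the other touches no grammar declaration.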
Sub-grammars
Two flavours, no global registry:
Static — pure composition, function returning a const node:
const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {
Seq(&[
Ident { source: Tables, role: role_table, /* ... */ },
Punct('.'),
Ident { source: Columns, role: role_col, /* ... */ },
])
}
const fn where_clause() -> Node {
Seq(&[
Word { primary: "where", /* ... */ },
Ident { source: Columns, role: "filter_column", /* ... */ },
Punct('='),
AnyValueSlot,
])
}
Dynamic — context-aware, expands at walk time:
fn column_value_list(ctx: &WalkContext) -> Node {
// as_deref + &[] avoids borrowing a temporary Vec that is dropped at the end of the statement.
let cols = ctx.current_table_columns.as_deref().unwrap_or(&[]);
let mut children: Vec<Node> = Vec::new();
for (i, col) in cols.iter().enumerate() {
if i > 0 { children.push(Punct(',')); }
children.push(slot_for_type(col.user_type));
}
Seq(Box::leak(children.into_boxed_slice()))
}
Dynamic sub-grammars return owned Node values that the
walker treats as inline expansions. The leak above is one
implementation tactic — alternatively, the walker stores the
expanded node in a small per-walk arena. Both work; pick at
implementation time.
Aliases
A Word node carries primary and an aliases slice. The
walker matches input against either; completion surfaces only
the primary; help text mentions aliases prose-style if
appropriate. Highlight class is the same for both.
Round 5's q removal is not reverted by this design. q
stays gone — adding it back would now be the single line
aliases: &["q"] on the quit Word node, and would not
surface as a separate candidate in completion (matching the
round-5 user request).
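The asymmetry between matching (primary or alias) and completion (primary only) can be sketched as follows. The standalone Word struct, the method names, and the QUIT constant carrying a `q` alias are hypothetical; per the text above, `q` is not actually part of the grammar.

```rust
/// Standalone stand-in for the `Word` node's matching data.
pub struct Word {
    pub primary: &'static str,
    pub aliases: &'static [&'static str],
}

impl Word {
    /// The walker accepts the primary literal or any alias.
    pub fn matches(&self, token: &str) -> bool {
        token == self.primary || self.aliases.iter().any(|a| *a == token)
    }

    /// Completion surfaces only the primary, and only when the typed
    /// prefix is a prefix of it; aliases never appear as candidates.
    pub fn completion(&self, prefix: &str) -> Option<&'static str> {
        self.primary.starts_with(prefix).then_some(self.primary)
    }
}

/// Hypothetical: what re-adding `q` would look like.
pub const QUIT: Word = Word {
    primary: "quit",
    aliases: &["q"],
};
```

Typing `q` is accepted as input, yet the only candidate offered for the prefix `q` is `quit`.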
IdentSource
Replaces the current dsl::ident_slot::IdentSlot:
pub enum IdentSource {
NewName, // user invents; no schema lookup; ProseOnly hint
Tables, // existing table names
Columns, // existing column names (filtered by current table)
Relationships, // existing relationship names
Types, // closed set from Type::all()
}
Types is new — it replaces the magic-string TYPE_SLOT_LABEL
used today. src/dsl/ident_slot.rs dissolves into
src/dsl/grammar/mod.rs.
Highlight class assignment
Per-byte highlight class is computed as a side effect of the
walk. Each terminal records (byte_range, class) in
WalkResult::per_byte_class as it matches. Unmatched ranges
(past a definite failure) get the tok_error overlay,
identical to today's behaviour.
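A minimal sketch of the overlay step, assuming the error overlay runs from the failure position to the end of input and that ByteRange is a plain `Range<usize>`; the reduced HighlightClass enum and the function name are stand-ins for illustration.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HighlightClass {
    Keyword,
    Identifier,
    Error,
}

pub type ByteRange = Range<usize>;

/// Append an error span covering everything from `failure` to the end of
/// `source`. Spans recorded by matched terminals are left untouched.
pub fn with_error_overlay(
    mut spans: Vec<(ByteRange, HighlightClass)>,
    source: &str,
    failure: Option<usize>,
) -> Vec<(ByteRange, HighlightClass)> {
    if let Some(pos) = failure {
        if pos < source.len() {
            spans.push((pos..source.len(), HighlightClass::Error));
        }
    }
    spans
}
```

For the input `drop tabel T` with a failure at byte 5, the keyword span for `drop` is kept and bytes 5..12 receive the error class.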
Default classes per terminal kind:
| Terminal | Default class |
|---|---|
| Word | tok_keyword |
| Punct | tok_punct |
| Ident | tok_identifier |
| NumberLit | tok_number |
| StringLit | tok_string |
| BlobLit | tok_string |
| Flag | tok_flag |
| BarePath | tok_string |
The highlight_override: Option<HighlightClass> field on
Word and Ident is reserved for future per-slot variants
(e.g., a Tables slot in a distinct shade vs a NewName slot
muted) — left None everywhere in round 1.
No new palette colours for the initial migration.
Migration plan
Code organisation
src/dsl/
grammar/
mod.rs — Node enum, IdentSource, HintMode, HighlightClass,
MatchedPath, CommandNode, REGISTRY top-level
data.rs — insert, update, delete, show
ddl.rs — create, drop, add, rename, change
app.rs — quit, help, save/save-as, new, load, rebuild,
export, import, mode, messages
shared.rs — typed value slots (int_slot, date_slot, …),
qualified_column, where_clause, action_keyword,
column_value_list (dynamic)
validators.rs — content validators (integer_only_validator,
date_format_validator, datetime_format_validator,
type_name_validator, …)
walker/
mod.rs — public walk() entry; orchestration
driver.rs — the per-node-kind dispatch
context.rs — WalkContext
outcome.rs — WalkOutcome, MatchedPath, WalkResult
lex_helpers.rs — identifier-shape, digit-shape, string-escape
helpers; shared across terminal consume fns
parser.rs — Phase A: becomes a router. Phase F: deleted.
lexer.rs — Phase F: deleted.
keyword.rs — Phase F: deleted.
ident_slot.rs — Phase F: dissolved into grammar/mod.rs.
usage.rs — Phase F: REGISTRY deleted; the file may go.
Six-phase migration
Phase A — Walker skeleton + app-lifecycle commands.
- Build the walker driver, WalkContext, WalkOutcome, MatchedPath, the terminal consume functions.
- Migrate the app-lifecycle commands (no schema dependency, no value literals): quit, help, rebuild, save, save as, new, load, export, import, mode, messages.
- Router in parse_command consults the walker for migrated commands; falls back to chumsky for the rest.
- Differential test scaffolding: a test helper that, for every input in the existing test corpus, runs both parsers and asserts identical Command output where the input falls under a migrated command.
Exit criteria: walker handles the app-lifecycle commands end-to-end; existing tests for those commands pass via the walker path; tests for other commands still pass via chumsky.
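The Phase A router can be pictured as a dispatch on the entry word. In this sketch the Command and ParseError types are reduced stand-ins, walker_parse and legacy_parse are toy placeholders for the two real pipelines, and the MIGRATED list is an illustrative subset.

```rust
#[derive(Debug, PartialEq)]
pub enum Command {
    Quit,
    Help,
    /// Placeholder for whatever the legacy parser would produce.
    Legacy(String),
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

/// Entry words already served by the walker (illustrative subset).
const MIGRATED: &[&str] = &["quit", "help"];

/// Toy stand-in for the walker path.
fn walker_parse(source: &str) -> Result<Command, ParseError> {
    match source.trim() {
        "quit" => Ok(Command::Quit),
        "help" => Ok(Command::Help),
        other => Err(ParseError {
            message: format!("unexpected input: {}", other),
        }),
    }
}

/// Toy stand-in for the legacy (chumsky) path.
fn legacy_parse(source: &str) -> Result<Command, ParseError> {
    Ok(Command::Legacy(source.trim().to_string()))
}

/// Route by entry word: migrated commands go to the walker, everything
/// else falls back to the legacy parser.
pub fn parse_command(source: &str) -> Result<Command, ParseError> {
    let entry = source.split_whitespace().next().unwrap_or("");
    if MIGRATED.iter().any(|m| *m == entry) {
        walker_parse(source)
    } else {
        legacy_parse(source)
    }
}
```

Note that once an entry word is routed to the walker, a walker failure is reported as is; the sketch does not retry the legacy parser, which keeps error wording for migrated commands on a single path.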
Phase B — DDL commands without value literals.
- drop table, drop column, drop relationship.
- rename column.
- add column (without the value-literal aspect; the type slot uses Ident { source: Types } with a content validator).
- add 1:n relationship (referential clauses as a static sub-grammar).
- change column (type slot + flags).
These exercise schema lookups via Ident { source: Tables }
and Ident { source: Columns }, and the Types source. No
typed value slots yet, no DynamicSubgrammar.
Exit criteria: all DDL commands except create table pass
via the walker; the rest still pass via chumsky.
Phase C — create table with column-list value literals.
- The with pk clause uses Repeated for the column-spec list, each spec being a Seq(Ident{NewName}, Punct(':'), Ident{Types}-with-validator).
- First test of Repeated with separator.
Exit criteria: create table works end-to-end via the walker.
Phase D — data commands with full schema awareness.
- show data, show table, replay.
- insert: uses DynamicSubgrammar(column_value_list) for the comma-separated typed value list. Exercises full WalkContext propagation: Ident { source: Tables, role: "table", writes_table: true } resolves the column list; the dynamic sub-grammar unfolds typed slots per column.
- update: set clauses use DynamicSubgrammar to resolve the value slot's type from the column. The where clause uses the shared sub-grammar with AnyValueSlot (or, optionally, also column-typed if the column resolves cleanly).
- delete: same where clause; otherwise simple.
This is the phase that proves the design's central claim: typed slots, dynamic sub-grammars, and schema-aware narrowing all collaborate to produce a single coherent grammar declaration per command.
Exit criteria: all data commands pass via the walker; the
round-5 limitations close automatically (save Tab can offer
as, value slots narrow by column type).
Phase E — replay end-to-end.
- replay uses BarePath + StringLit (quoted form).
- Internally replays each line through the same dispatch pipeline.
Exit criteria: replay works end-to-end via the walker; nested replay rejection still fires from the runtime, unchanged.
Phase F — cleanup.
- Delete dsl/parser.rs.
- Delete dsl/lexer.rs.
- Delete dsl/keyword.rs.
- Delete dsl/ident_slot.rs (already merged into grammar/mod.rs in Phase A).
- Delete dsl/usage.rs::REGISTRY.
- Delete the chumsky dependency from Cargo.toml.
- Delete parse.token.keyword.* entries from the catalog and keys.rs that the walker doesn't need (the keyword vocabulary is implicit in the grammar nodes).
- Remove the differential test scaffolding from Phase A.
Exit criteria: working tree clean of legacy parser code;
test suite still all-green; cargo clippy --all-targets -- -D warnings passes; cargo build --release binary not
noticeably larger.
Implementation note (2026-05-15) — "Phase F minimal".
Phase F shipped as planned with one deliberate deviation:
dsl/parser.rs was retained, not deleted. The chumsky +
lexer pipeline is gone (chumsky dependency removed; lexer.rs,
keyword.rs, ident_slot.rs, usage.rs all deleted; the
parse.token.* catalog entries collapsed), but parser.rs
remains as the thin router: it owns the public parse_command
/ parse_command_with_schema entry points and the ParseError
type, whose {message, position, at_eof, expected} shape
completion, hint rendering, and the input-renderer overlay all
depend on. Deleting the file would only scatter that surface
across walker / dsl/mod.rs for no functional gain. The
differential scaffolding was never built as a live harness —
it materialised as hand-curated expectation tests. parser.rs
documents this in its own module doc comment.
Test discipline
Three guarantees throughout migration:
- Full test suite green at every commit. Migration is per-command; tests are per-behaviour. They don't care which parser produces a Command; they assert input → expected output. If a test fails mid-migration, the walker hasn't reproduced behaviour; fix the walker before continuing.
- Walker-specific tests for trie-only features. Schema-aware narrowing, WalkContext propagation, dynamic sub-grammar expansion, HintMode per-node behaviour, the round-5 "save Tab offers as" gap-closing: each gets new tests as the feature lands.
- Differential check during the migration window. A test helper iterates the existing input corpus, runs both parsers on inputs that fall under a migrated command, and asserts identical Command output. Cheap insurance against subtle divergence. Removed at Phase F cleanup.
Cleanup pass at Phase F
Beyond deleting the legacy modules, Phase F includes catalog
cleanup. The parse.token.keyword.* entries (40+ of them) are
near-mechanical wrappers (create: "`create`"); with no external code
looking up these keys (the walker renders keyword names from Word node
literals directly), the entries can collapse. A small
format_keyword_for_error(literal) -> String helper replaces them. The
keys.rs declarations go with them.
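Given that the catalog entries only wrap the keyword in backticks, the replacement helper could be as small as the sketch below. The exact output format is inferred from the wrapper shape described above and should be treated as an assumption.

```rust
/// Render a keyword literal for use in an error message by wrapping it
/// in backticks, mirroring the shape of the collapsed catalog entries.
pub fn format_keyword_for_error(literal: &str) -> String {
    format!("`{}`", literal)
}
```

A caller would pass the primary literal of the expected Word node, so no per-keyword catalog key is needed.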
Help text in help.cli_banner and help.in_app_body stays
as hand-written prose — the alternative (auto-generating from
the grammar) was deferred during the round-6 discussion as a
separate concern; the grammar tree carries enough metadata
(per-command help_id) for future automation but the prose
documentation is still hand-curated for round 1.
Consequences
What we gain
- One declaration per command. Entry keyword, shape, AST builder, dispatch handler, usage reference, help reference all colocated. Adding a command is one block in one file.
- No cross-file scatter. The round-5 "10 places to remove q" critique is structurally addressed: there's nowhere else for keyword/usage/registry info to live but the grammar tree.
- Schema-aware narrowing from day one. Typed value slots reject mis-shaped input at parse time with localised error wording; completion narrows per column type; the round-5 value-literal slot hint becomes type-specific ("Type a date as 'YYYY-MM-DD'") not generic.
- Aliases as a single annotation. q could come back as one line on the quit Word node, no scatter.
- Tests focus on behaviour, not enumeration. Tests that hardcoded keyword lists during round 5 (we noted these in usage.rs and completion.rs) can iterate the trie registry instead, becoming structural rather than literal.
- Drift is structurally impossible. Completion, highlight, parse, usage, and help all derive from the same trie. No separate sources to keep in sync.
What we accept
- Parse depends on schema state. A DSL command that references a non-existent table fails at parse time, not at execute time as today. This matches the user mental model when typing (the schema cache is current per ADR-0022) and yields better completion / hint experience. It does mean tests that exercised parser behaviour in isolation may now need to set up a schema cache.
- chumsky's general-purpose features go unused. Recovery on ambiguous input, multi-error reporting in a single pass, ambiguous-grammar handling — features chumsky offers but our DSL doesn't use. The trade is fine because our grammar is deterministic.
- Some implementation complexity moves into the walker. Whitespace skipping between siblings, terminal consume functions, character-level shape recognition — the lexer did some of this implicitly; the walker does it explicitly. Net code is comparable or smaller because the scatter cost goes away.
What's out of scope for this ADR
- External tooling integration (LSP, editor extensions). The registry is pub and accessible via accessor functions, so future tooling work doesn't fight this design. No tooling is built in round 1.
- Help text auto-generation. Grammar tree carries help_id per node, but the help catalog body stays hand-curated.
- Performance optimisation. Walker re-runs per keystroke for completion + highlighting. Naïve implementation is acceptable; if hot-path concerns emerge later, caching / incremental walks become a separate ADR.
- Ranker implementations. The ranker hook exists; default is identity. Frequency-based ranking, content-aware priors for type completion ("Email → text first, Score → real"), and recency are all future work that plugs into the ranker signature without touching grammar declarations.
- Per-slot highlight overrides. The highlight_override field exists but stays None in round 1. Differentiating table-ident from new-name-ident visually is a future enhancement.
References
- ADR-0023: Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
- ADR-0001: Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
- ADR-0019: Friendly error layer and i18n catalog. Catalog conventions stay; parse.token.keyword.* entries collapse in Phase F.
- ADR-0020: Tokenization layer for the DSL parser. Superseded by the scannerless walker.
- ADR-0021: Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
- ADR-0022: Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
- Round-6 session transcript: design pass that produced this spec.