# ADR-0024: Unified grammar tree — execution plan

## Status

**Accepted.** 2026-05-14.

Concrete specification for the direction proposed in ADR-0023.
Where ADR-0023 captured the critique of the current parser
shape and the high-level vision, this ADR specifies the data
model, walker semantics, migration sequence, and cleanup steps
in enough detail that implementation can proceed without
further design decisions.

Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in
the directory as institutional memory of why this change is
happening; ADR-0024 is what gets built.

## Context

The design pass held in the round-6 session (2026-05-14)
worked through ADR-0023's open questions and a number of
implicit decisions that hadn't been written down. It ran in four
rounds of questions, each followed by user confirmation:

1. **Round 1 — foundational.** Registry shape, node taxonomy,
   AST output model, failure / "expected" semantics, walker
   API and its mapping to parse / complete / highlight / hint
   concerns.
2. **Round 2 — concrete representation.** Multi-keyword
   sequences, sub-grammar reusability (static and dynamic),
   path-bearing commands, bare-or-with-suffix commands.
3. **Round 3 — organisation and migration.** Module layout,
   per-command migration strategy, test discipline during
   migration.
4. **Round 4 — smaller details.** Aliases on keyword nodes,
   `IdentSlot` fate, highlight palette, external-tooling
   exposure.

Two larger decisions emerged from the rounds and shifted the
shape from ADR-0023's sketch:

- **The lexer dissolves.** The walker operates directly on
  source bytes ("scannerless"). The current `dsl/lexer.rs`
  module's responsibilities (whitespace skipping, token shape
  recognition, byte-span tracking) migrate into terminal-node
  consume functions and the walker driver. The `define_keywords!`
  macro is no longer needed in its current form; keyword
  literals live on `Word` nodes in the grammar.
- **Schema-aware parse from day one.** ADR-0023 had been
  cautious about coupling parse to schema state. The round-1
  / round-2 discussion concluded that this caution comes from
  general-purpose parser tooling and doesn't apply to an
  interactive DSL editor where the schema *is* the context.
  Typed value slots consult the schema during parse; bind-time
  type checks remain but become belt-and-braces rather than
  the primary defense.

A separate critique surfaced in the design pass: my (Claude's)
default pull toward "what's the safe incremental version of
what general-purpose parser tooling does" repeatedly fought
against the project owner's cleaner direct design. The pull
is now explicitly resisted — this ADR ships the direct design,
not a phased compromise.

## Decision summary

A single trie data structure declared in Rust serves as the
authority for parsing, completion, syntax highlighting,
parse-error usage rendering, hint-panel content, and
(eventually) external-tooling exposure. The walker that
consumes this trie operates directly on source bytes — no
separate lexer pass. Schema-aware narrowing flows naturally
from the trie's structure: typed value slots and dynamic
sub-grammars consult a per-walk context that carries the
current table, the resolved column types, and a reference to
the schema cache.

Migration is per-command across six phases. The legacy
chumsky parser and the new walker run side-by-side during the
transition; existing behavioural tests guard regressions.
Phase F removes chumsky, the lexer module, the separate
`UsageEntry` registry, and the expected-set introspection
in `completion.rs`.

Estimated total cost: ~4 sessions — one to land the framework
and migrate Phase A, two for Phases B-D, one for Phases E + F.

## Architecture

### Walker as single source of truth

```rust
// Signature sketch; the body is elided.
pub fn walk<'s>(
    source: &'s str,
    bound: WalkBound,
    ctx: &mut WalkContext,
) -> WalkResult<'s>;

pub enum WalkBound {
    EndOfInput,      // parse: walk all input
    Position(usize), // complete / hint: walk up to cursor byte
}

pub struct WalkResult<'s> {
    pub outcome: WalkOutcome,
    // Assumed to borrow matched text from `source`.
    pub matched_path: MatchedPath<'s>,
    pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
    ValidationFailed { position: usize, message_key: &'static str, args: Vec<(&'static str, String)> },
}
```

Consumers:

- **Parse for dispatch.** `walk(source, EndOfInput, ctx)`. On
  `Match`, invoke `REGISTRY[command_idx].ast_builder(&matched_path)`
  and dispatch the returned `Command`.
- **Highlighting.** `walk(source, EndOfInput, ctx).per_byte_class`.
  Each terminal records `(byte_range, node.highlight_class())`
  as it matches. Unmatched ranges (past a failure) get the
  `tok_error` overlay.
- **Completion at cursor.** `walk(source, Position(cursor), ctx)`,
  then inspect `outcome.expected`. Each expected `Node` contributes
  candidates: `Word` → its primary literal, `Ident { source }`
  → schema-cache lookup, `Flag` → `--name`, value-literal slot
  → type-appropriate hint per `HintMode`, etc.
- **Ambient hint panel.** Same walk as completion. The hint
  resolver consults `WalkOutcome` variants plus the expected
  nodes' `HintMode` to choose between rendering candidates,
  prose, suppression, etc.

### Scannerless: no lexer module

Terminal nodes consume bytes directly. No pre-pass produces a
`Vec<Token>`. The walker's driver handles whitespace skipping
between siblings of a `Seq` and dispatches to each terminal's
`consume(source, position)` function.

Character-level helpers (identifier shape, digit-sequence shape,
quoted-string escape handling) live in
`src/dsl/walker/lex_helpers.rs` — a small shared module used
by the various terminal consume functions. This is internally
similar to the current lexer's logic, but it's invoked per-position
by the walker rather than as a pre-pass.

`src/dsl/lexer.rs` and `src/dsl/keyword.rs` are deleted in
Phase F. The keyword vocabulary is no longer a Rust enum; each
keyword exists as a `Word` node in the grammar declarations.

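Here is a minimal sketch of byte-level terminal consume functions. Several details are assumptions for illustration and do not come from the actual `lex_helpers.rs` API: the names `skip_ws`, `consume_word`, and `consume_ident`; the `Option<usize>` return value (the byte position just past the match); the ASCII-only identifier shape; and case-sensitive keyword matching.

```rust
/// Advance past ASCII whitespace; a driver would call this between `Seq` siblings.
pub fn skip_ws(source: &str, mut pos: usize) -> usize {
    let bytes = source.as_bytes();
    while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
        pos += 1;
    }
    pos
}

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Match `literal` at `pos`, requiring a word boundary after it.
/// Returns the byte position just past the match.
pub fn consume_word(source: &str, pos: usize, literal: &str) -> Option<usize> {
    if !source.get(pos..)?.starts_with(literal) {
        return None;
    }
    let end = pos + literal.len();
    match source.as_bytes().get(end) {
        Some(&b) if is_ident_byte(b) => None, // `quitter` is not `quit`
        _ => Some(end),
    }
}

/// Match an identifier: an ASCII letter or `_`, then letters, digits, or `_`.
pub fn consume_ident(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let first = *bytes.get(pos)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut end = pos + 1;
    while end < bytes.len() && is_ident_byte(bytes[end]) {
        end += 1;
    }
    Some(end)
}
```

A token-based lexer provides the word-boundary check in `consume_word` for free. A scannerless walker has to do it explicitly. Without it, `quitter` would match the `quit` keyword and leave `ter` behind.
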
### Node taxonomy

Thirteen node kinds. Three categories:

**Terminals** (consume bytes):

```rust
pub enum Node {
    Word {
        primary: &'static str,
        aliases: &'static [&'static str],
        // Default tok_keyword unless overridden.
        highlight_override: Option<HighlightClass>,
    },
    Punct(char),
    Ident {
        source: IdentSource,
        role: &'static str,
        highlight_override: Option<HighlightClass>,
    },
    NumberLit,
    StringLit,
    BlobLit,
    Flag(&'static str),
    BarePath,
    // Combinators ↓
```

**Combinators** (compose other nodes):

```rust
    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
    Repeated {
        inner: &'static Node,
        separator: Option<&'static Node>,
        min: usize,
    },
    // Dynamic ↓
```

**Dynamic** (resolves at walk time using `WalkContext`):

```rust
    DynamicSubgrammar(fn(&WalkContext) -> Node),
}
```

`CommandNode` is the top-level entry record:

```rust
pub struct CommandNode {
    pub entry: Node, // always a `Node::Word`
    pub shape: Node, // usually a Seq
    pub ast_builder: fn(&MatchedPath) -> Command,
    pub dispatch: fn(&mut App, Command) -> Vec<Action>,
    pub help_id: Option<&'static str>,
    pub usage_id: Option<&'static str>,
    // Hint mode override at command level; nodes can carry their own too.
    pub hint_mode: Option<HintMode>,
}

pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];
```

### Typed value slots

Value-literal positions use typed slots built from terminals
plus content validators. One slot factory per data type:

```rust
fn int_slot() -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot() -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot() -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot() -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot() -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot() -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot() -> Node { Choice(&[BlobLit, null_word()]) }
```

`StringLit_with(validator)` is a `StringLit` terminal carrying
a content validator that runs after a successful match. Same
for `NumberLit_with`. A failed validator surfaces as
`WalkOutcome::ValidationFailed` with the validator's catalog
key.

`slot_for_type(ty: Type) -> Node` is the dispatcher: given a
column type, returns the appropriate slot. Used by dynamic
sub-grammars (see below).

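The sketch below shows one way content validators could work. It assumes a validator is a plain `fn(&str) -> Result<(), &'static str>` that receives the matched text and returns a catalog key on failure. For `StringLit`, that text is assumed to be the unquoted contents. The catalog keys and the trimmed-down `WalkOutcome` are hypothetical placeholders.

```rust
/// Inspects a terminal's matched text; `Err` carries a catalog key.
pub type Validator = fn(&str) -> Result<(), &'static str>;

pub fn integer_only_validator(text: &str) -> Result<(), &'static str> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
        Ok(())
    } else {
        Err("parse.value.expected_integer") // hypothetical key
    }
}

/// Accepts `YYYY-MM-DD` with month 1..=12 and day 1..=31 (shape only).
pub fn date_format_validator(text: &str) -> Result<(), &'static str> {
    const KEY: &str = "parse.value.expected_date"; // hypothetical key
    let parts: Vec<&str> = text.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 || parts[1].len() != 2 || parts[2].len() != 2 {
        return Err(KEY);
    }
    if !parts.iter().all(|p| p.bytes().all(|b| b.is_ascii_digit())) {
        return Err(KEY);
    }
    let month: u32 = parts[1].parse().map_err(|_| KEY)?;
    let day: u32 = parts[2].parse().map_err(|_| KEY)?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) { Ok(()) } else { Err(KEY) }
}

#[derive(Debug, PartialEq)]
pub enum WalkOutcome {
    Match,
    ValidationFailed { position: usize, message_key: &'static str },
}

/// After a terminal matches `text` at `position`, run its validator, if any.
pub fn check(position: usize, text: &str, validator: Option<Validator>) -> WalkOutcome {
    match validator.map(|v| v(text)) {
        Some(Err(message_key)) => WalkOutcome::ValidationFailed { position, message_key },
        _ => WalkOutcome::Match,
    }
}
```

The date check is deliberately shallow and accepts `2024-02-30`. Calendar validity could live in the same function or be left to the bind-time checks.
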
### `WalkContext`

```rust
pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    // Current table inferred from the partial parse — e.g.,
    // `insert into Customers ...` sets `current_table = "Customers"`.
    pub current_table: Option<String>,
    // The columns of `current_table`, in declaration order, with types.
    // Populated by Ident { source: Tables } when it matches a
    // known table.
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    // For comma-separated value lists, which position we're at.
    pub value_position: usize,
    // For `set` clauses and `where` clauses, the column whose value
    // we're about to consume.
    pub current_column: Option<ColumnInfo>,
}
```

Nodes can write to `WalkContext`:

- `Ident { source: Tables, role: "table", writes_table: true }`
  on match sets `ctx.current_table` to the matched identifier
  and resolves `ctx.current_table_columns` from the schema.
- `Ident { source: Columns, role: "column", writes_current_column: true }`
  on match sets `ctx.current_column` from the resolved column list.

Nodes can read from `WalkContext`:

- `DynamicSubgrammar(column_value_list)` reads
  `ctx.current_table_columns` and unfolds to a `Seq` of
  comma-separated typed slots — one per column.
- The value slot after `set col=` reads `ctx.current_column.user_type`
  to pick the right typed slot.

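The write side could look roughly like this sketch, which uses a `HashMap`-backed stand-in for `SchemaCache`. The `write_table`, `write_current_column`, and `current_column_type` methods and the three-variant `Type` are hypothetical. The struct fields mirror the `WalkContext` declaration above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Type { Int, Text, Date }

#[derive(Clone, Debug, PartialEq)]
pub struct ColumnInfo { pub name: String, pub user_type: Type }

/// Stand-in for the real schema cache: table name -> columns in order.
pub struct SchemaCache { pub tables: HashMap<String, Vec<ColumnInfo>> }

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    pub value_position: usize,
    pub current_column: Option<ColumnInfo>,
}

impl<'a> WalkContext<'a> {
    pub fn new(schema: &'a SchemaCache) -> Self {
        WalkContext {
            schema,
            current_table: None,
            current_table_columns: None,
            value_position: 0,
            current_column: None,
        }
    }

    /// What a `writes_table` Ident does on match; `false` means unknown table.
    pub fn write_table(&mut self, name: &str) -> bool {
        match self.schema.tables.get(name) {
            Some(cols) => {
                self.current_table = Some(name.to_string());
                self.current_table_columns = Some(cols.clone());
                true
            }
            None => false,
        }
    }

    /// What a `writes_current_column` Ident does on match.
    pub fn write_current_column(&mut self, name: &str) -> bool {
        let found = self
            .current_table_columns
            .as_ref()
            .and_then(|cols| cols.iter().find(|c| c.name == name).cloned());
        let ok = found.is_some();
        self.current_column = found;
        ok
    }

    /// The reader side: which typed slot `set col=` should use next.
    pub fn current_column_type(&self) -> Option<Type> {
        self.current_column.as_ref().map(|c| c.user_type)
    }
}
```

When `write_table` returns `false` for an unknown table, the walker can fail at parse time, which is the trade-off listed under *What we accept*.
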
### `WalkOutcome` and "expected"

The walker keeps track of the longest prefix that matched and
the position at which it failed (or completed). At a failure
or incomplete position, `expected` is the set of nodes that
could legally continue the walk — derived structurally from
the trie, not from a separate "expected" table.

For a `Seq` mid-walk, `expected` is the next child node.
For a `Choice` that hasn't committed to a branch, `expected`
is all children. For an `Optional` at a position where its
inner could start, `expected` includes the inner plus the
next sibling.

This is the same information chumsky's
`ParseError::Invalid::expected` carries today, sourced from
the trie directly instead of via combinator introspection.

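To make the structural derivation of `expected` concrete, here is a toy walker over four node kinds: `Word`, `Seq`, `Choice`, and `Optional`. It is a sketch, not the planned driver. It has no `Ident`, value slots, context, or highlighting. The `pending` bookkeeping for skipped `Optional`s is one assumed way to get "inner plus next sibling". Completion at a cursor is approximated by walking `&src[..cursor]`, so any failure at end of input reads as `Incomplete`.

```rust
// A toy four-kind node; the planned taxonomy has thirteen kinds.
#[derive(Debug, PartialEq)]
pub enum Node {
    Word(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    Optional(&'static Node),
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Match,
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node> },
}

// Success: end position, plus nodes from skipped `Optional`s that could
// also have started there. Failure: stop position and what could go there.
struct Step { end: usize, pending: Vec<&'static Node> }
struct Fail { position: usize, expected: Vec<&'static Node> }

fn skip_ws(src: &str, mut pos: usize) -> usize {
    while src.as_bytes().get(pos).map_or(false, |b| b.is_ascii_whitespace()) {
        pos += 1;
    }
    pos
}

fn walk_node(node: &'static Node, src: &str, pos: usize) -> Result<Step, Fail> {
    let start = skip_ws(src, pos);
    match node {
        Node::Word(lit) => {
            let end = start + lit.len();
            let boundary = !src.as_bytes().get(end).map_or(false, |b| b.is_ascii_alphanumeric());
            if src[start..].starts_with(*lit) && boundary {
                Ok(Step { end, pending: Vec::new() })
            } else {
                Err(Fail { position: start, expected: vec![node] })
            }
        }
        // Seq: children in order; a failing child reports itself as expected.
        Node::Seq(children) => {
            let (mut cur, mut pending) = (pos, Vec::new());
            for child in children.iter() {
                match walk_node(child, src, cur) {
                    Ok(step) if step.end == cur => pending.extend(step.pending),
                    Ok(step) => { cur = step.end; pending = step.pending; }
                    Err(mut fail) => {
                        // Optionals skipped just before the failure are expected too.
                        if fail.position == skip_ws(src, cur) {
                            pending.append(&mut fail.expected);
                            fail.expected = pending;
                        }
                        return Err(fail);
                    }
                }
            }
            Ok(Step { end: cur, pending })
        }
        // Choice: the first matching branch commits; otherwise keep the
        // furthest failure, merging expectations from branches that tie.
        Node::Choice(options) => {
            let mut best: Option<Fail> = None;
            for option in options.iter() {
                let fail = match walk_node(option, src, pos) { Ok(step) => return Ok(step), Err(f) => f };
                best = Some(match best {
                    Some(mut b) if b.position == fail.position => {
                        b.expected.extend(fail.expected);
                        b
                    }
                    Some(b) if b.position > fail.position => b,
                    _ => fail,
                });
            }
            Err(best.expect("Choice needs at least one option"))
        }
        // Optional: failing before consuming anything means "skip".
        Node::Optional(inner) => match walk_node(inner, src, pos) {
            Err(fail) if fail.position == start => Ok(Step { end: pos, pending: fail.expected }),
            other => other,
        },
    }
}

pub fn walk(root: &'static Node, src: &str) -> Outcome {
    match walk_node(root, src, 0) {
        Ok(step) if skip_ws(src, step.end) == src.len() => Outcome::Match,
        // Trailing input: only skipped optionals could have continued there.
        Ok(step) => Outcome::Mismatch { position: skip_ws(src, step.end), expected: step.pending },
        Err(f) if f.position >= src.len() => Outcome::Incomplete { position: f.position, expected: f.expected },
        Err(f) => Outcome::Mismatch { position: f.position, expected: f.expected },
    }
}

pub static GRAMMAR: Node = Node::Choice(&[
    Node::Seq(&[Node::Word("drop"), Node::Choice(&[Node::Word("table"), Node::Word("column")])]),
    Node::Seq(&[Node::Word("save"), Node::Optional(&Node::Word("as"))]),
]);
```

Here are a few walks over `GRAMMAR`:

- `"drop "` yields `Incomplete` at byte 5, expecting `table` and `column`, because the uncommitted `Choice` contributes all its children.
- `""` yields `Incomplete` at byte 0, expecting `drop` and `save`.
- `"save x"` yields `Mismatch` at byte 5, expecting `as`. That expectation comes from the skipped `Optional`.
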
### `HintMode` per node

Each node may carry a `HintMode`:

```rust
pub enum HintMode {
    /// Candidates if any surface; else prose fallback.
    Default,
    /// Force the prose at this catalog key regardless of candidates.
    /// Used by NewName slots ("Type a name, then `(`").
    ForceProse(&'static str),
    /// Show only the prose; suppress Tab candidates.
    /// Used by typed value slots at empty prefix.
    ProseOnly(&'static str),
    /// Suppress prose; only candidates.
    SuppressProse,
}
```

The walker propagates each expected node's `HintMode` to the
hint resolver, which dispatches accordingly.

The current ad-hoc cases in `input_render.rs::ambient_hint`
(value-literal slot suppression, NewName slot typing-name
prose, invalid-ident overlay) migrate to node-attached
`HintMode` annotations during Phase D.

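This sketch shows how the hint resolver might dispatch on `HintMode`. Three parts of it are assumptions drawn from the doc comments above, not settled behaviour: the `Hint` struct, the `fallback_key` parameter, and the reading that `ForceProse` keeps candidates while `ProseOnly` drops them. The catalog keys in the usage are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

/// What the hint panel renders; `prose_key` is a catalog key.
#[derive(Debug, PartialEq)]
pub struct Hint {
    pub prose_key: Option<&'static str>,
    pub candidates: Vec<String>,
}

/// `fallback_key` is the generic prose `Default` uses when nothing surfaces.
pub fn resolve_hint(mode: HintMode, candidates: Vec<String>, fallback_key: &'static str) -> Hint {
    match mode {
        HintMode::Default if candidates.is_empty() => Hint { prose_key: Some(fallback_key), candidates },
        HintMode::Default | HintMode::SuppressProse => Hint { prose_key: None, candidates },
        // Prose wins, but Tab candidates stay available.
        HintMode::ForceProse(key) => Hint { prose_key: Some(key), candidates },
        // Prose only: Tab candidates are suppressed.
        HintMode::ProseOnly(key) => Hint { prose_key: Some(key), candidates: Vec::new() },
    }
}
```
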
### Ranker layer

A ranker function runs between the walker's raw candidate
output and the hint-panel renderer:

```rust
pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }
```

Default is `identity_ranker` — declaration order from the
trie is preserved. The signature allows future enhancements
(frequency-based ranking, content-aware priors for type
suggestions per column name) to plug in without changing
grammar declarations.

The ranker lives outside the trie. Grammar declarations are
about *what's valid*; ranking is about *what's likely useful
first*.

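The following hypothetical content-aware ranker plugs into the `Ranker` signature. It echoes the "Email → text first, Score → real" idea that *What's out of scope* lists as future work. The stand-in `WalkContext` carries an assumed `new_column_name` field that the real context does not declare. The point is only that ranking can change without touching grammar declarations.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate { pub text: &'static str }

/// Stand-in context; `new_column_name` is a hypothetical field.
pub struct WalkContext { pub new_column_name: Option<String> }

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }

/// Illustrative prior: map a column-name hint to a preferred type.
fn preferred_type(column: &str) -> Option<&'static str> {
    let lower = column.to_ascii_lowercase();
    if lower.contains("email") {
        Some("text")
    } else if lower.contains("score") {
        Some("real")
    } else {
        None
    }
}

/// Moves the preferred candidate, if present, to the front.
pub fn type_prior_ranker(ctx: &WalkContext, mut c: Vec<Candidate>) -> Vec<Candidate> {
    if let Some(pref) = ctx.new_column_name.as_deref().and_then(preferred_type) {
        // `false` sorts before `true`, so the preferred type comes first.
        c.sort_by_key(|cand| cand.text != pref);
    }
    c
}
```

`sort_by_key` is a stable sort, so every non-preferred candidate keeps its declaration order. That is the same guarantee `identity_ranker` gives.
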
### Sub-grammars

Two flavours, no global registry:

**Static** — pure composition, function returning a const node:

```rust
const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {
    Seq(&[
        Ident { source: Tables, role: role_table, /* ... */ },
        Punct('.'),
        Ident { source: Columns, role: role_col, /* ... */ },
    ])
}

const fn where_clause() -> Node {
    Seq(&[
        Word { primary: "where", /* ... */ },
        Ident { source: Columns, role: "filter_column", /* ... */ },
        Punct('='),
        AnyValueSlot,
    ])
}
```

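One compile-time caveat applies to the static sketches above. `where_clause()` is fine as written: it takes no parameters, so every slice element is a constant and Rust promotes `&[...]` to `'static`. `qualified_column` is not. Its slice contains the function's parameters, which blocks promotion, so a `const fn` that returns `Seq(&[...])` built from them is rejected as returning a reference to a temporary.

Below is a sketch of one workaround, assuming role names are fixed per use site: expand the composition with `macro_rules!` into `const` items. The trimmed `Node`, the role strings, and the `COLUMN_PAIR` composition are illustrative only. The value slot is omitted from `where_clause`.

```rust
#[derive(Debug)]
pub enum IdentSource { Tables, Columns }

#[derive(Debug)]
pub enum Node {
    Word(&'static str),
    Punct(char),
    Ident { source: IdentSource, role: &'static str },
    Seq(&'static [Node]),
}

/// Expands to a node whose slice is all constants, so it is promoted to `'static`.
macro_rules! qualified_column {
    ($role_table:expr, $role_col:expr) => {
        Node::Seq(&[
            Node::Ident { source: IdentSource::Tables, role: $role_table },
            Node::Punct('.'),
            Node::Ident { source: IdentSource::Columns, role: $role_col },
        ])
    };
}

pub const SOURCE_COLUMN: Node = qualified_column!("source_table", "source_column");
pub const TARGET_COLUMN: Node = qualified_column!("target_table", "target_column");

/// Composed constants can themselves be composed (hypothetical example).
pub const COLUMN_PAIR: Node = Node::Seq(&[SOURCE_COLUMN, Node::Punct(','), TARGET_COLUMN]);

/// A parameterless `const fn` compiles: every element is a constant.
pub const fn where_clause() -> Node {
    Node::Seq(&[
        Node::Word("where"),
        Node::Ident { source: IdentSource::Columns, role: "filter_column" },
        Node::Punct('='),
    ])
}
```
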
**Dynamic** — context-aware, expands at walk time:

```rust
fn column_value_list(ctx: &WalkContext) -> Node {
    // Borrow the columns as a slice; `&[]` is a 'static empty fallback.
    let cols: &[ColumnInfo] = ctx.current_table_columns.as_deref().unwrap_or(&[]);
    let mut children: Vec<Node> = Vec::new();
    for (i, col) in cols.iter().enumerate() {
        if i > 0 { children.push(Punct(',')); }
        children.push(slot_for_type(col.user_type));
    }
    Seq(Box::leak(children.into_boxed_slice()))
}
```

Dynamic sub-grammars return owned `Node` values that the
walker treats as inline expansions. The leak above is one
implementation tactic — alternatively, the walker stores the
expanded node in a small per-walk arena. Both work; pick at
implementation time.

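Here is a runnable version of the expansion for a concrete column list, using the leak tactic. `Node::Slot(Type)` stands in for the typed slot factories, and the three-variant `Type` is illustrative. The column slice is borrowed with `Option::as_deref` and an empty-slice fallback. Writing `unwrap_or(&Vec::new())` instead would borrow a temporary that is dropped at the end of the statement, and the code would not compile.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Type { Int, Text, Date }

#[derive(Debug, PartialEq)]
pub enum Node {
    Punct(char),
    Slot(Type), // stand-in for int_slot(), text_slot(), date_slot(), ...
    Seq(&'static [Node]),
}

pub struct ColumnInfo { pub name: String, pub user_type: Type }
pub struct WalkContext { pub current_table_columns: Option<Vec<ColumnInfo>> }

fn slot_for_type(ty: Type) -> Node { Node::Slot(ty) }

pub fn column_value_list(ctx: &WalkContext) -> Node {
    let cols: &[ColumnInfo] = ctx.current_table_columns.as_deref().unwrap_or(&[]);
    let mut children = Vec::with_capacity(cols.len() * 2);
    for (i, col) in cols.iter().enumerate() {
        if i > 0 {
            children.push(Node::Punct(','));
        }
        children.push(slot_for_type(col.user_type));
    }
    // One leaked allocation per expansion (the "leak" tactic).
    Node::Seq(Box::leak(children.into_boxed_slice()))
}
```

The walker re-runs on every keystroke for completion and highlighting. With the leak tactic, every expansion therefore allocates memory that is never freed for the life of the process. A per-walk arena avoids that growth, which may tip the choice at implementation time.
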
### Aliases

A `Word` node carries `primary` and an `aliases` slice. The
walker matches input against either; completion surfaces only
the primary; help text mentions aliases prose-style if
appropriate. Highlight class is the same for both.

Round 5's `q` removal is *not* reverted by this design. `q`
stays gone — adding it back would now be the single line
`aliases: &["q"]` on the `quit` `Word` node, and would not
surface as a separate candidate in completion (matching the
round-5 user request).

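This sketch of the alias rule is split into a walker side, which matches the primary or any alias, and a completion side, which offers primaries only. The simplified `Word` struct and the `complete` function are hypothetical. The `quit`/`q` pair models the one-line re-add described above, not current behaviour.

```rust
pub struct Word {
    pub primary: &'static str,
    pub aliases: &'static [&'static str],
}

impl Word {
    /// Walker side: the primary or any alias matches, whole token only.
    pub fn matches(&self, token: &str) -> bool {
        token == self.primary || self.aliases.iter().any(|a| *a == token)
    }
}

/// Completion side: only primaries are offered, filtered by the typed prefix.
pub fn complete(words: &[Word], prefix: &str) -> Vec<&'static str> {
    words.iter().map(|w| w.primary).filter(|p| p.starts_with(prefix)).collect()
}
```

The alias check uses `iter().any` rather than `contains(&token)` to sidestep a lifetime mismatch. `contains` would require the borrowed input to be a `&'static str`.
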
### `IdentSource`

Replaces the current `dsl::ident_slot::IdentSlot`:

```rust
pub enum IdentSource {
    NewName,       // user invents; no schema lookup; ProseOnly hint
    Tables,        // existing table names
    Columns,       // existing column names (filtered by current table)
    Relationships, // existing relationship names
    Types,         // closed set from Type::all()
}
```

`Types` is new — it replaces the magic-string `TYPE_SLOT_LABEL`
used today. `src/dsl/ident_slot.rs` dissolves into
`src/dsl/grammar/mod.rs`.

### Highlight class assignment

Per-byte highlight class is computed as a side effect of the
walk. Each terminal records `(byte_range, class)` in
`WalkResult::per_byte_class` as it matches. Unmatched ranges
(past a definite failure) get the `tok_error` overlay,
identical to today's behaviour.

Default classes per terminal kind:

| Terminal | Default class |
|---|---|
| `Word` | `tok_keyword` |
| `Punct` | `tok_punct` |
| `Ident` | `tok_identifier` |
| `NumberLit` | `tok_number` |
| `StringLit` | `tok_string` |
| `BlobLit` | `tok_string` |
| `Flag` | `tok_flag` |
| `BarePath` | `tok_string` |

The `highlight_override: Option<HighlightClass>` field on
`Word` and `Ident` is reserved for future per-slot variants
(e.g., a Tables slot in a distinct shade vs a NewName slot
muted) — left `None` everywhere in round 1.

No new palette colours for the initial migration.

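This sketch covers the post-walk overlay step plus the default palette mapping from the table above. The `HighlightClass` variant names, `palette_key`, and `with_error_overlay` are hypothetical. The `tok_*` strings are the class names from the table.

```rust
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HighlightClass { Keyword, Punct, Identifier, Number, Str, Flag, Error }

impl HighlightClass {
    pub fn palette_key(self) -> &'static str {
        match self {
            HighlightClass::Keyword => "tok_keyword",
            HighlightClass::Punct => "tok_punct",
            HighlightClass::Identifier => "tok_identifier",
            HighlightClass::Number => "tok_number",
            // StringLit, BlobLit, and BarePath all default to this class.
            HighlightClass::Str => "tok_string",
            HighlightClass::Flag => "tok_flag",
            HighlightClass::Error => "tok_error",
        }
    }
}

/// After the walk: overlay `tok_error` from the failure position to the end.
pub fn with_error_overlay(
    mut spans: Vec<(Range<usize>, HighlightClass)>,
    failed_at: Option<usize>,
    source_len: usize,
) -> Vec<(Range<usize>, HighlightClass)> {
    if let Some(pos) = failed_at {
        // Defensive: drop anything recorded past the failure point.
        spans.retain(|(range, _)| range.end <= pos);
        if pos < source_len {
            spans.push((pos..source_len, HighlightClass::Error));
        }
    }
    spans
}
```

For `drop tabel` (10 bytes), the walk records `0..4` as a keyword and fails at byte 5. The overlay then marks `5..10` as `tok_error`.
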
## Migration plan

### Code organisation

```
src/dsl/
  grammar/
    mod.rs          — Node enum, IdentSource, HintMode, HighlightClass,
                      MatchedPath, CommandNode, REGISTRY top-level
    data.rs         — insert, update, delete, show
    ddl.rs          — create, drop, add, rename, change
    app.rs          — quit, help, save/save-as, new, load, rebuild,
                      export, import, mode, messages
    shared.rs       — typed value slots (int_slot, date_slot, …),
                      qualified_column, where_clause, action_keyword,
                      column_value_list (dynamic)
    validators.rs   — content validators (integer_only_validator,
                      date_format_validator, datetime_format_validator,
                      type_name_validator, …)
  walker/
    mod.rs          — public walk() entry; orchestration
    driver.rs       — the per-node-kind dispatch
    context.rs      — WalkContext
    outcome.rs      — WalkOutcome, MatchedPath, WalkResult
    lex_helpers.rs  — identifier-shape, digit-shape, string-escape
                      helpers; shared across terminal consume fns
  parser.rs         — Phase A: becomes a router. Phase F: deleted.
  lexer.rs          — Phase F: deleted.
  keyword.rs        — Phase F: deleted.
  ident_slot.rs     — Phase A: merged into grammar/mod.rs. Phase F: deleted.
  usage.rs          — Phase F: REGISTRY deleted; the file may go.
```

### Six-phase migration

**Phase A — Walker skeleton + app-lifecycle commands.**

- Build the walker driver, `WalkContext`, `WalkOutcome`,
  `MatchedPath`, the terminal consume functions.
- Migrate the app-lifecycle commands (no schema dependency,
  no value literals): quit, help, rebuild, save, save as, new,
  load, export, import, mode, messages.
- Router in `parse_command` consults the walker for migrated
  commands; falls back to chumsky for the rest.
- Differential test scaffolding: a test helper that, for every
  input in the existing test corpus, runs both parsers and
  asserts identical `Command` output where the input falls
  under a migrated command.

Exit criteria: walker handles the app-lifecycle commands
end-to-end; existing tests for those commands pass via the
walker path; tests for other commands still pass via chumsky.

**Phase B — DDL commands without value literals.**

- drop table, drop column, drop relationship.
- rename column.
- add column (without the value-literal aspect — type slot
  uses `Ident { source: Types }` with a content validator).
- add 1:n relationship (referential clauses as a static
  sub-grammar).
- change column (type slot + flags).

These exercise schema lookups via `Ident { source: Tables }`
and `Ident { source: Columns }`, and the `Types` source. No
typed value slots yet, no `DynamicSubgrammar`.

Exit criteria: all DDL commands except `create table` pass
via the walker; the rest still pass via chumsky.

**Phase C — `create table` with column-list value literals.**

- The `with pk` clause uses `Repeated` for the column-spec
  list, each spec being a
  `Seq(Ident{NewName}, Punct(':'), Ident{Types}-with-validator)`.
- First test of `Repeated` with separator.

Exit criteria: create table works end-to-end via the walker.

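This sketch shows the `Repeated { inner, separator, min }` semantics for the column-spec list, written as plain functions over bytes. Several pieces are assumptions: the `Consume` alias, the `repeated`, `column_spec`, and `comma` names, and the closed type list (taken from the typed-slot factories). A real `Ident { source: Types }` slot would validate against `Type::all()` instead.

```rust
/// Consumes one item at `pos`; returns the byte position just past it.
pub type Consume = fn(&str, usize) -> Option<usize>;

const TYPES: &[&str] = &["int", "real", "decimal", "bool", "text", "date", "datetime", "blob"];

fn skip_ws(src: &str, mut pos: usize) -> usize {
    while src.as_bytes().get(pos).map_or(false, |b| b.is_ascii_whitespace()) {
        pos += 1;
    }
    pos
}

/// `inner (separator inner)*`, succeeding only if at least `min` items
/// matched. Returns `(end, count)`.
pub fn repeated(src: &str, pos: usize, inner: Consume, separator: Option<Consume>, min: usize) -> Option<(usize, usize)> {
    let (mut end, mut cur, mut count) = (pos, pos, 0);
    loop {
        let item_start = skip_ws(src, cur);
        match inner(src, item_start) {
            // Require progress so an empty match cannot loop forever.
            Some(item_end) if item_end > item_start => { end = item_end; count += 1; }
            _ => break,
        }
        cur = match separator {
            Some(sep) => match sep(src, skip_ws(src, end)) {
                Some(sep_end) => sep_end,
                None => break,
            },
            None => end,
        };
    }
    (count >= min).then_some((end, count))
}

fn ident(src: &str, pos: usize) -> Option<usize> {
    let b = src.as_bytes();
    if !b.get(pos)?.is_ascii_alphabetic() {
        return None;
    }
    let mut end = pos + 1;
    while b.get(end).map_or(false, |c| c.is_ascii_alphanumeric() || *c == b'_') {
        end += 1;
    }
    Some(end)
}

/// `name : type`; the "with validator" step accepts only known type names.
pub fn column_spec(src: &str, pos: usize) -> Option<usize> {
    let name_end = ident(src, pos)?;
    let colon = skip_ws(src, name_end);
    if src.as_bytes().get(colon) != Some(&b':') {
        return None;
    }
    let ty_start = skip_ws(src, colon + 1);
    let ty_end = ident(src, ty_start)?;
    let ty = &src[ty_start..ty_end];
    TYPES.iter().any(|known| *known == ty).then_some(ty_end)
}

pub fn comma(src: &str, pos: usize) -> Option<usize> {
    (src.as_bytes().get(pos) == Some(&b',')).then_some(pos + 1)
}
```

In this sketch, a trailing separator such as `id:int,` leaves the comma unconsumed, so the enclosing `Seq` reports it. A real walker would more likely report `Incomplete` expecting another column spec, which is what completion wants.
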
**Phase D — data commands with full schema awareness.**

- show data, show table.
- insert: uses `DynamicSubgrammar(column_value_list)` for the
  comma-separated typed value list. Exercises full
  `WalkContext` propagation:
  `Ident { source: Tables, role: "table", writes_table: true }`
  resolves the column list; the dynamic sub-grammar unfolds
  typed slots per column.
- update: `set` clauses use `DynamicSubgrammar` to resolve the
  value slot's type from the column. The `where` clause uses the
  shared sub-grammar with `AnyValueSlot` (or, optionally, also
  column-typed if the column resolves cleanly).
- delete: same `where` clause; otherwise simple.

This is the phase that proves the design's central claim:
typed slots, dynamic sub-grammars, and schema-aware narrowing
all collaborate to produce a single coherent grammar
declaration per command.

Exit criteria: all data commands pass via the walker; the
round-5 limitations close automatically (save Tab can offer
`as`, value slots narrow by column type).

**Phase E — replay end-to-end.**

- replay uses `BarePath` + `StringLit` (quoted form).
- Internally replays each line through the same dispatch
  pipeline.

Exit criteria: replay works end-to-end via the walker; nested
replay rejection still fires from the runtime, unchanged.

**Phase F — cleanup.**

- Delete `dsl/parser.rs`.
- Delete `dsl/lexer.rs`.
- Delete `dsl/keyword.rs`.
- Delete `dsl/ident_slot.rs` (already merged into
  `grammar/mod.rs` in Phase A).
- Delete `dsl/usage.rs::REGISTRY`.
- Delete `chumsky` dependency from `Cargo.toml`.
- Delete `parse.token.keyword.*` entries from the catalog and
  `keys.rs` that the walker doesn't need (the keyword
  vocabulary is implicit in the grammar nodes).
- Remove the differential test scaffolding from Phase A.

Exit criteria: working tree clean of legacy parser code;
test suite still all-green;
`cargo clippy --all-targets -- -D warnings` passes;
`cargo build --release` binary not noticeably larger.

### Test discipline

Three guarantees throughout migration:

1. **Full test suite green at every commit.** Migration is
   per-command; tests are per-behaviour. They don't care
   which parser produces a `Command` — they assert input →
   expected output. If a test fails mid-migration, the
   walker hasn't reproduced behaviour; fix the walker
   before continuing.
2. **Walker-specific tests for trie-only features.** Schema-aware
   narrowing, `WalkContext` propagation, dynamic sub-grammar
   expansion, `HintMode` per-node behaviour, the
   round-5 "save Tab offers as" gap-closing — each gets new
   tests as the feature lands.
3. **Differential check during the migration window.** A
   test helper iterates the existing input corpus, runs both
   parsers on inputs that fall under a migrated command, and
   asserts identical `Command` output. Cheap insurance
   against subtle divergence. Removed at Phase F cleanup.

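This sketch of the differential helper from guarantee 3 is generic over the command type, so it assumes nothing about `Command`. Three aspects are assumptions: the `is_migrated` predicate, the closure-based parser arguments, and treating "both parsers reject" as agreement, since error wording may legitimately differ between the two.

```rust
use std::fmt::Debug;

/// Runs both parsers on every migrated input and returns a description of
/// each divergence; an empty result means the walker agrees with chumsky.
pub fn differential_check<C, E>(
    corpus: &[&str],
    is_migrated: impl Fn(&str) -> bool,
    legacy: impl Fn(&str) -> Result<C, E>,
    walker: impl Fn(&str) -> Result<C, E>,
) -> Vec<String>
where
    C: PartialEq + Debug,
    E: Debug,
{
    let mut divergences = Vec::new();
    for input in corpus.iter().copied().filter(|i| is_migrated(*i)) {
        let (old, new) = (legacy(input), walker(input));
        let same = match (&old, &new) {
            (Ok(a), Ok(b)) => a == b,
            (Err(_), Err(_)) => true, // both reject; wording may differ
            _ => false,
        };
        if !same {
            divergences.push(format!("{input:?}: legacy={old:?} walker={new:?}"));
        }
    }
    divergences
}
```

The helper collects divergences instead of asserting on the first one, so a failing run lists every input that drifted.
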
### Cleanup pass at Phase F

Beyond deleting the legacy modules, Phase F includes catalog
cleanup. The `parse.token.keyword.*` entries (40+ of them) are
near-mechanical wrappers (`` create: "`create`" ``); with no
external code looking up these keys (the walker renders
keyword names from `Word` node literals directly), the
entries can collapse. A small
`format_keyword_for_error(literal) -> String` helper replaces
them. The `keys.rs` declarations go with them.

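This sketch of the replacement helper assumes it reproduces the backtick-wrapped shape of the catalog entries it replaces. `render_expected` is a hypothetical caller. It shows how an expected set of `Word` literals could be rendered without per-keyword catalog keys. Its comma joiner is a placeholder, because localised list wording would still come from the catalog.

```rust
/// Wraps a keyword literal the way the old catalog entries did:
/// `create` becomes "`create`".
pub fn format_keyword_for_error(literal: &str) -> String {
    format!("`{literal}`")
}

/// Hypothetical caller: render the literals of an expected set.
pub fn render_expected(words: &[&str]) -> String {
    words
        .iter()
        .map(|w| format_keyword_for_error(w))
        .collect::<Vec<_>>()
        .join(", ")
}
```
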
Help text in `help.cli_banner` and `help.in_app_body` stays
as hand-written prose — the alternative (auto-generating from
the grammar) was deferred during the round-6 discussion as a
separate concern; the grammar tree carries enough metadata
(per-command `help_id`) for future automation but the prose
documentation is still hand-curated for round 1.

## Consequences

### What we gain

- **One declaration per command.** Entry keyword, shape, AST
  builder, dispatch handler, usage reference, help reference
  all colocated. Adding a command is one block in one file.
- **No cross-file scatter.** The round-5 "10 places to remove
  `q`" critique is structurally addressed: there's nowhere
  else for keyword/usage/registry info to live but the
  grammar tree.
- **Schema-aware narrowing from day one.** Typed value slots
  reject mis-shaped input at parse time with localised error
  wording; completion narrows per column type; the round-5
  value-literal slot hint becomes type-specific
  ("Type a date as 'YYYY-MM-DD'") rather than generic.
- **Aliases as a single annotation.** `q` could come back as
  one line on the `quit` `Word` node, no scatter.
- **Tests focus on behaviour, not enumeration.** Tests that
  hardcoded keyword lists during round 5 (we noted these in
  `usage.rs` and `completion.rs`) can iterate the trie
  registry instead, becoming structural rather than
  literal.
- **Drift is structurally impossible.** Completion, highlight,
  parse, usage, and help all derive from the same trie. No
  separate sources to keep in sync.

### What we accept

- **Parse depends on schema state.** A DSL command that
  references a non-existent table fails at parse time, not
  at execute time as today. This matches the user's mental
  model when typing (the schema cache is current per
  ADR-0022) and yields a better completion / hint
  experience. It does mean tests that exercised parser
  behaviour in isolation may now need to set up a schema
  cache.
- **chumsky's general-purpose features go unused.** Recovery
  on ambiguous input, multi-error reporting in a single
  pass, ambiguous-grammar handling — features chumsky offers
  but our DSL doesn't use. The trade is fine because our
  grammar is deterministic.
- **Some implementation complexity moves into the walker.**
  Whitespace skipping between siblings, terminal consume
  functions, character-level shape recognition — the lexer
  did some of this implicitly; the walker does it
  explicitly. Net code is comparable or smaller because the
  scatter cost goes away.

### What's out of scope for this ADR

- **External tooling integration (LSP, editor extensions).**
  The registry is `pub` and accessible via accessor
  functions, so future tooling work doesn't fight this design.
  No tooling is built in round 1.
- **Help text auto-generation.** The grammar tree carries a
  `help_id` per command (`CommandNode::help_id`), but the help
  catalog body stays hand-curated.
- **Performance optimisation.** The walker re-runs per keystroke
  for completion + highlighting. A naïve implementation is
  acceptable; if hot-path concerns emerge later, caching /
  incremental walks become a separate ADR.
- **Ranker implementations.** The ranker hook exists; default
  is identity. Frequency-based ranking, content-aware priors
  for type completion ("Email → text first, Score → real"),
  recency — all future work that plugs into the ranker
  signature without touching grammar declarations.
- **Per-slot highlight overrides.** The `highlight_override`
  field exists but stays `None` in round 1. Differentiating
  table-ident from new-name-ident visually is a future
  enhancement.

## References

- ADR-0023 — Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
- ADR-0001 — Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
- ADR-0019 — Friendly error layer and i18n catalog. Catalog conventions stay; `parse.token.keyword.*` entries collapse in Phase F.
- ADR-0020 — Tokenization layer for the DSL parser. Superseded by the scannerless walker.
- ADR-0021 — Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
- ADR-0022 — Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
- Round-6 session transcript — design pass that produced this spec.