add ADR-0024: unified grammar tree execution plan (accepted)
Concrete specification for the direction in ADR-0023, landed during
the round-6 design pass. Resolves all four rounds of open design
questions: walker as single source of truth, scannerless terminal
vocabulary (~8 building blocks), typed value slots with content
validators, WalkContext for schema-aware narrowing from day one,
WalkOutcome multi-purpose return, HintMode per-node, ranker as
separate layer, static + dynamic sub-grammars, aliases as Word
annotations, IdentSource taxonomy, six-phase per-command migration
with chumsky and walker side-by-side during the transition.

Key shifts from ADR-0023's sketch:

- Lexer dissolves entirely. Walker operates on bytes directly.
  dsl/lexer.rs, dsl/keyword.rs go away in Phase F.
- Schema-aware parse from day one (not phased). Typed value slots
  reject mis-shaped input at parse time with localised wording.
  Completion narrows per column type.
- Sub-grammars: static (fn() -> Node) for composition; dynamic
  (fn(&WalkContext) -> Node) for schema-dependent expansion. No
  global named registry.
- Path-bearing commands: BarePath becomes a routine non-whitespace
  terminal. Paths with spaces require quoting via StringLit (UX
  simplification, aligns with standard CLI convention).
- 13-node taxonomy: Word, Punct, Ident, NumberLit, StringLit,
  BlobLit, Flag, BarePath, Choice, Seq, Optional, Repeated,
  DynamicSubgrammar.

Migration plan: Phase A (walker scaffolding + app-lifecycle
commands), Phase B (DDL without value literals), Phase C (create
table), Phase D (data commands with full schema awareness -- the
design's central claim landing), Phase E (replay), Phase F (delete
chumsky + lexer + legacy parser modules, simplify catalog).
Estimated ~4 sessions total.

Also: rename ADR-0023 from 0023-proposed-unified-grammar-tree.md to
0023-unified-grammar-tree.md (git mv preserves history) and update
its status to reflect the direction-accepted-but-superseded-for-
execution-detail relationship with ADR-0024. Index updated.
+16
-13
@@ -1,21 +1,24 @@
-# ADR-0023: Unified declarative grammar tree (proposed direction)
+# ADR-0023: Unified declarative grammar tree (direction)

 ## Status

-**Proposed.**
+**Accepted in direction, superseded for execution detail by
+ADR-0024.** 2026-05-14.

-Not yet accepted. Captures a researched direction for a future
-refactor that supersedes the parts of ADR-0001 (chumsky as the
-DSL parser), ADR-0019 (separated catalog declaration), ADR-0020
-(lexer + keyword macro), ADR-0021 (per-command usage registry),
-and ADR-0022 (completion via expected-set introspection,
-highlighting via lexer) that this ADR identifies as accreted
-rather than designed.
+This ADR captures the architectural critique (the "10-place
+edit" scatter problem with the current parser shape) and the
+direction (a unified declarative grammar tree). The round-6
+design pass turned that direction into a concrete specification,
+which ships as ADR-0024. ADR-0024 makes some refinements
+beyond what's sketched here — notably the decision to drop the
+lexer module entirely (scannerless walker) and to put schema-
+aware narrowing into round 1 rather than phasing it. Read
+ADR-0024 for the executable plan; this ADR remains for the
+institutional memory of why the change is happening.

-Filename carries the `-proposed-` segment so the status is
-visible at directory listing time; on acceptance, rename to
-`0023-unified-grammar-tree.md` via `git mv` (history
-preserved).
+The filename was renamed from `0023-proposed-unified-grammar-tree.md`
+to `0023-unified-grammar-tree.md` when the direction was
+accepted. History is preserved through the `git mv`.

 ## Context

@@ -0,0 +1,701 @@
# ADR-0024: Unified grammar tree — execution plan

## Status

**Accepted.** 2026-05-14.

Concrete specification for the direction proposed in ADR-0023.
Where ADR-0023 captured the critique of the current parser
shape and the high-level vision, this ADR specifies the data
model, walker semantics, migration sequence, and cleanup steps
in enough detail that implementation can proceed without
further design decisions.

Supersedes ADR-0023's "Proposed" status. ADR-0023 stays in
the directory as institutional memory of why this change is
happening; ADR-0024 is what gets built.

## Context

The design pass in the round-6 session (2026-05-14)
worked through ADR-0023's open questions and a number of
implicit decisions that hadn't been written down. Four rounds
of questions, each followed by user confirmation:

1. **Round 1 — foundational.** Registry shape, node taxonomy,
   AST output model, failure / "expected" semantics, walker
   API and its mapping to parse / complete / highlight / hint
   concerns.
2. **Round 2 — concrete representation.** Multi-keyword
   sequences, sub-grammar reusability (static and dynamic),
   path-bearing commands, bare-or-with-suffix commands.
3. **Round 3 — organisation and migration.** Module layout,
   per-command migration strategy, test discipline during
   migration.
4. **Round 4 — smaller details.** Aliases on keyword nodes,
   `IdentSlot` fate, highlight palette, external-tooling
   exposure.

Two larger decisions emerged from the rounds and shifted the
shape from ADR-0023's sketch:

- **The lexer dissolves.** The walker operates directly on
  source bytes ("scannerless"). The current `dsl/lexer.rs`
  module's responsibilities (whitespace skipping, token shape
  recognition, byte-span tracking) migrate into terminal-node
  consume functions and the walker driver. The `define_keywords!`
  macro is no longer needed in its current form; keyword
  literals live on `Word` nodes in the grammar.
- **Schema-aware parse from day one.** ADR-0023 had been
  cautious about coupling parse to schema state. The round-1
  / round-2 discussion concluded that this caution comes from
  general-purpose parser tooling and doesn't apply to an
  interactive DSL editor where the schema *is* the context.
  Typed value slots consult the schema during parse; bind-time
  type checks remain but become belt-and-braces rather than
  the primary defense.

A separate critique surfaced in the design pass: my (Claude's)
default pull toward "what's the safe incremental version of
what general-purpose parser tooling does" repeatedly fought
against the project owner's cleaner direct design. The pull
is now explicitly resisted — this ADR ships the direct design,
not a phased compromise.

## Decision summary

A single trie data structure declared in Rust serves as the
authority for parsing, completion, syntax highlighting, parse-
error usage rendering, hint-panel content, and (eventually)
external-tooling exposure. The walker that consumes this trie
operates directly on source bytes — no separate lexer pass.
Schema-aware narrowing flows naturally from the trie's
structure: typed value slots and dynamic sub-grammars consult
a per-walk context that carries the current table, the
resolved column types, and a reference to the schema cache.

Migration is per-command across six phases. The legacy
chumsky parser and the new walker run side-by-side during the
transition; existing behavioural tests guard regressions.
Phase F removes chumsky, the lexer module, the separate
`UsageEntry` registry, and the expected-set introspection
in `completion.rs`.

Estimated total cost: ~4 sessions — one to land the framework
and migrate Phase A, two for Phases B-D, one for Phases E + F.

## Architecture

### Walker as single source of truth

```rust
// `'a` ties the result to `source`. With two reference inputs the
// return lifetime cannot be elided, so it is spelled out.
pub fn walk<'a>(
    source: &'a str,
    bound: WalkBound,
    ctx: &mut WalkContext,
) -> WalkResult<'a>;

pub enum WalkBound {
    EndOfInput,      // parse: walk all input
    Position(usize), // complete / hint: walk up to cursor byte
}

pub struct WalkResult<'a> {
    pub outcome: WalkOutcome,
    pub matched_path: MatchedPath,
    pub per_byte_class: Vec<(ByteRange, HighlightClass)>,
}

pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static Node> },
    Mismatch { position: usize, expected: Vec<&'static Node>, found_byte: u8 },
    ValidationFailed {
        position: usize,
        message_key: &'static str,
        args: Vec<(&'static str, String)>,
    },
}
```

Consumers:

- **Parse for dispatch.** `walk(source, EndOfInput, ctx)`. On
  `Match`, invoke `commands[command_idx].ast_builder(matched_path)`
  and dispatch the returned `Command`.
- **Highlighting.** `walk(source, EndOfInput, ctx).per_byte_class`.
  Each terminal records `(byte_range, node.highlight_class())`
  as it matches. Unmatched ranges (past a failure) get the
  `tok_error` overlay.
- **Completion at cursor.** `walk(source, Position(cursor), ctx)`,
  inspect `outcome.expected`. Each expected `Node` contributes
  candidates: `Word` → its primary literal, `Ident { source }`
  → schema-cache lookup, `Flag` → `--name`, value-literal slot
  → type-appropriate hint per `HintMode`, etc.
- **Hint panel ambient.** Same walk as completion. The hint
  resolver consults `WalkOutcome` variants plus the expected
  nodes' `HintMode` to choose between candidates rendering,
  prose, suppression, etc.

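
All four consumers above branch on the same `WalkOutcome`. The following is a minimal sketch of that branching, assuming a pared-down outcome type in which `expected` holds display labels rather than `&'static Node` and `ValidationFailed` drops its `args`. The `describe` helper and its message wording are hypothetical, not part of this ADR.

```rust
// Simplified stand-in for the ADR's WalkOutcome: `expected` holds labels
// instead of `&'static Node` so the sketch stays self-contained.
#[derive(Debug)]
pub enum WalkOutcome {
    Match { command_idx: usize },
    Incomplete { position: usize, expected: Vec<&'static str> },
    Mismatch { position: usize, expected: Vec<&'static str>, found_byte: u8 },
    ValidationFailed { position: usize, message_key: &'static str },
}

/// Hypothetical helper: turn an outcome into the one-line status a
/// parse-for-dispatch consumer might log or render.
pub fn describe(outcome: &WalkOutcome) -> String {
    match outcome {
        WalkOutcome::Match { command_idx } => format!("dispatch command #{command_idx}"),
        WalkOutcome::Incomplete { position, expected } => format!(
            "incomplete at byte {position}; expected one of: {}",
            expected.join(", ")
        ),
        WalkOutcome::Mismatch { position, expected, found_byte } => format!(
            "unexpected '{}' at byte {position}; expected one of: {}",
            *found_byte as char,
            expected.join(", ")
        ),
        WalkOutcome::ValidationFailed { position, message_key } => {
            format!("validation failed at byte {position} ({message_key})")
        }
    }
}
```

A completion consumer would read the same `expected` list from `Incomplete` but turn it into candidates instead of a message.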
### Scannerless: no lexer module

Terminal nodes consume bytes directly. No pre-pass produces a
`Vec<Token>`. The walker's driver handles whitespace skipping
between siblings of a `Seq` and dispatches to each terminal's
`consume(source, position)` function.

Character-level helpers (identifier shape, digit-sequence shape,
quoted-string escape handling) live in
`src/dsl/walker/lex_helpers.rs` — a small shared module used
by the various terminal consume functions. This is internally
similar to the current lexer's logic, but it's invoked per-position
by the walker rather than as a pre-pass.

`src/dsl/lexer.rs` and `src/dsl/keyword.rs` are deleted in
Phase F. The keyword vocabulary is no longer a Rust enum; each
keyword exists as a `Word` node in the grammar declarations.

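
To make "terminals consume bytes directly" concrete, here is a rough sketch of what per-position consume helpers could look like. The function names (`skip_ws`, `consume_ident`, `consume_word`), the ASCII identifier shape, and the case-insensitive keyword match are all assumptions made for illustration; the ADR only fixes the general `consume(source, position)` idea.

```rust
/// Skip ASCII whitespace starting at `pos`; returns the first non-space offset.
pub fn skip_ws(source: &str, mut pos: usize) -> usize {
    let bytes = source.as_bytes();
    while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
        pos += 1;
    }
    pos
}

/// Consume an identifier-shaped run ([A-Za-z_][A-Za-z0-9_]*) at `pos`.
/// Returns the end offset, or None if no identifier starts here.
pub fn consume_ident(source: &str, pos: usize) -> Option<usize> {
    let bytes = source.as_bytes();
    let first = *bytes.get(pos)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut end = pos + 1;
    while end < bytes.len() && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_') {
        end += 1;
    }
    Some(end)
}

/// Consume a keyword literal (case-insensitive, an assumption of this sketch).
/// Matching the whole identifier run enforces a word boundary after the keyword.
pub fn consume_word(source: &str, pos: usize, literal: &str) -> Option<usize> {
    let end = consume_ident(source, pos)?;
    source[pos..end].eq_ignore_ascii_case(literal).then_some(end)
}
```

A driver walking a `Seq` would alternate `skip_ws` with one consume call per child, threading the returned offset forward.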
### Node taxonomy

Thirteen node kinds. Three categories:

**Terminals** (consume bytes):

```rust
pub enum Node {
    Word {
        primary: &'static str,
        aliases: &'static [&'static str],
        // Default tok_keyword unless overridden.
        highlight_override: Option<HighlightClass>,
    },
    Punct(char),
    Ident {
        source: IdentSource,
        role: &'static str,
        highlight_override: Option<HighlightClass>,
    },
    NumberLit,
    StringLit,
    BlobLit,
    Flag(&'static str),
    BarePath,
    // Combinators ↓
}
```

**Combinators** (compose other nodes):

```rust
    Choice(&'static [Node]),
    Seq(&'static [Node]),
    Optional(&'static Node),
    Repeated {
        inner: &'static Node,
        separator: Option<&'static Node>,
        min: usize,
    },
```

**Dynamic** (resolves at walk time using `WalkContext`):

```rust
    DynamicSubgrammar(fn(&WalkContext) -> Node),
}
```

`CommandNode` is the top-level entry record:

```rust
pub struct CommandNode {
    pub entry: Word,
    pub shape: Node, // usually a Seq
    pub ast_builder: fn(&MatchedPath) -> Command,
    pub dispatch: fn(&mut App, Command) -> Vec<Action>,
    pub help_id: Option<&'static str>,
    pub usage_id: Option<&'static str>,
    // Hint mode override at command level; nodes can carry their own too.
    pub hint_mode: Option<HintMode>,
}

pub const REGISTRY: &[CommandNode] = &[ /* ... */ ];
```

### Typed value slots

Value-literal positions use typed slots built from terminals
plus content validators. One slot factory per data type:

```rust
fn int_slot() -> Node { Choice(&[NumberLit_with(integer_only_validator), null_word()]) }
fn real_slot() -> Node { Choice(&[NumberLit, null_word()]) }
fn decimal_slot() -> Node { Choice(&[NumberLit_with(decimal_validator), null_word()]) }
fn bool_slot() -> Node { Choice(&[Word("true", &[]), Word("false", &[]), null_word()]) }
fn text_slot() -> Node { Choice(&[StringLit, null_word()]) }
fn date_slot() -> Node { Choice(&[StringLit_with(date_format_validator), null_word()]) }
fn datetime_slot() -> Node { Choice(&[StringLit_with(datetime_format_validator), null_word()]) }
fn blob_slot() -> Node { Choice(&[BlobLit, null_word()]) }
```

`StringLit_with(validator)` is a `StringLit` terminal carrying
a content validator that runs after a successful match. Same
for `NumberLit_with`. A failed validator surfaces as
`WalkOutcome::ValidationFailed` with the validator's catalog
key.

`slot_for_type(ty: Type) -> Node` is the dispatcher: given a
column type, returns the appropriate slot. Used by dynamic
sub-grammars (see below).

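
The validator signature is not pinned down in this section, so the sketch below assumes one plausible shape: a function from the matched text to `Ok(())` or a catalog key. The two catalog keys are hypothetical placeholders, and the date check is shape-only (it does not know days-per-month or leap years).

```rust
/// Assumed validator shape: Ok on success, or a catalog key naming the failure.
pub type Validator = fn(&str) -> Result<(), &'static str>;

/// Accepts an optional minus sign followed by digits only (no '.', no exponent).
pub fn integer_only_validator(text: &str) -> Result<(), &'static str> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
        Ok(())
    } else {
        Err("parse.value.int_expected") // hypothetical catalog key
    }
}

/// Accepts the shape YYYY-MM-DD with month in 1..=12 and day in 1..=31.
pub fn date_format_validator(text: &str) -> Result<(), &'static str> {
    const KEY: &str = "parse.value.date_expected"; // hypothetical catalog key
    let parts: Vec<&str> = text.split('-').collect();
    let well_formed = parts.len() == 3
        && [4usize, 2, 2]
            .iter()
            .zip(&parts)
            .all(|(len, p)| p.len() == *len && p.bytes().all(|b| b.is_ascii_digit()));
    if !well_formed {
        return Err(KEY);
    }
    let month: u32 = parts[1].parse().map_err(|_| KEY)?;
    let day: u32 = parts[2].parse().map_err(|_| KEY)?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Ok(())
    } else {
        Err(KEY)
    }
}
```

Under this assumption, the walker would map an `Err(key)` straight into `WalkOutcome::ValidationFailed { message_key: key, .. }`.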
### `WalkContext`

```rust
pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    // Current table inferred from the partial parse — e.g.,
    // `insert into Customers ...` sets `current_table = "Customers"`.
    pub current_table: Option<String>,
    // The columns of `current_table`, in declaration order, with types.
    // Populated by Ident { source: Tables } when it matches a
    // known table.
    pub current_table_columns: Option<Vec<ColumnInfo>>,
    // For comma-separated value lists, which position we're at.
    pub value_position: usize,
    // For `set` clauses and `where` clauses, the column whose value
    // we're about to consume.
    pub current_column: Option<ColumnInfo>,
}
```

Nodes can write to `WalkContext`:

- `Ident { source: Tables, role: "table", writes_table: true }`
  on match sets `ctx.current_table` to the matched identifier
  and resolves `ctx.current_table_columns` from the schema.
- `Ident { source: Columns, role: "column", writes_current_column: true }`
  on match sets `ctx.current_column` from the resolved column list.

Nodes can read from `WalkContext`:

- `DynamicSubgrammar(column_value_list)` reads
  `ctx.current_table_columns` and unfolds to a `Seq` of
  comma-separated typed slots — one per column.
- The value slot after `set col=` reads `ctx.current_column.user_type`
  to pick the right typed slot.

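
The write-then-read flow described above can be sketched with stand-in types. Here `SchemaCache` is assumed to be a plain map from table name to columns, `Type` is cut down to three variants, and `on_table_match` / `value_slot_types` are hypothetical helper names; the real context also carries `value_position` and `current_column`, omitted here.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Type { Int, Text, Date }

#[derive(Clone, Debug, PartialEq)]
pub struct ColumnInfo { pub name: String, pub user_type: Type }

/// Stand-in for the ADR's SchemaCache: table name -> columns in declaration order.
pub type SchemaCache = HashMap<String, Vec<ColumnInfo>>;

pub struct WalkContext<'a> {
    pub schema: &'a SchemaCache,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<ColumnInfo>>,
}

/// Write side: what a table-writing `Ident` might do on match.
/// Returns false when the table is unknown, leaving the context untouched.
pub fn on_table_match(ctx: &mut WalkContext, ident: &str) -> bool {
    match ctx.schema.get(ident) {
        Some(cols) => {
            ctx.current_table = Some(ident.to_string());
            ctx.current_table_columns = Some(cols.clone());
            true
        }
        None => false,
    }
}

/// Read side: the slot types a dynamic value list would unfold to, one per column.
pub fn value_slot_types(ctx: &WalkContext) -> Vec<Type> {
    ctx.current_table_columns
        .as_deref()
        .unwrap_or(&[])
        .iter()
        .map(|c| c.user_type)
        .collect()
}
```

With a `Customers` table declared as `(id: Int, joined: Date)`, matching the table name first makes the read side yield `[Int, Date]`; before the match it yields an empty list.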
### `WalkOutcome` and "expected"

The walker keeps track of the longest prefix that matched and
the position at which it failed (or completed). At a failure
or incomplete position, `expected` is the set of nodes that
could legally continue the walk — derived structurally from
the trie, not from a separate "expected" table.

For a `Seq` mid-walk, `expected` is the next child node.
For a `Choice` that hasn't committed to a branch, `expected`
is all children. For an `Optional` at a position where its
inner could start, `expected` includes the inner plus the
next sibling.

This is the same information chumsky's
`ParseError::Invalid::expected` carries today, sourced from
the trie directly instead of via combinator introspection.

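
The three rules above (`Seq`, `Choice`, `Optional`) amount to a small structural recursion. The sketch below assumes a reduced `Node` with only `Word` terminals and returns keyword literals rather than node references; `SHOW_SHAPE` and its words are a made-up example, not a command from the registry.

```rust
pub enum Node {
    Word(&'static str),
    Seq(&'static [Node]),
    Choice(&'static [Node]),
    Optional(&'static Node),
}

/// True when the node can match without consuming any input.
fn nullable(node: &Node) -> bool {
    match node {
        Node::Word(_) => false,
        Node::Seq(children) => children.iter().all(nullable),
        Node::Choice(children) => children.iter().any(nullable),
        Node::Optional(_) => true,
    }
}

/// Terminals that could legally come next when the walk stands at the start of `node`.
pub fn expected_at_start(node: &Node) -> Vec<&'static str> {
    match node {
        Node::Word(w) => vec![*w],
        Node::Choice(children) => children.iter().flat_map(expected_at_start).collect(),
        Node::Optional(inner) => expected_at_start(inner),
        Node::Seq(children) => expected_in_seq(children, 0),
    }
}

/// Expected set for a `Seq` whose first `done` children have matched:
/// the next child, plus following siblings for as long as children are skippable.
pub fn expected_in_seq(children: &[Node], done: usize) -> Vec<&'static str> {
    let mut out = Vec::new();
    for child in &children[done.min(children.len())..] {
        out.extend(expected_at_start(child));
        if !nullable(child) {
            break;
        }
    }
    out
}

/// Hypothetical shape for illustration: `show [all] tables`.
pub static SHOW_SHAPE: [Node; 3] = [
    Node::Word("show"),
    Node::Optional(&Node::Word("all")),
    Node::Word("tables"),
];
```

After `show` has matched, `expected_in_seq(&SHOW_SHAPE, 1)` contains both `all` (the optional's inner) and `tables` (the next sibling), which is the `Optional` rule from the paragraph above.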
### `HintMode` per node

Each node may carry a `HintMode`:

```rust
pub enum HintMode {
    /// Candidates if any surface; else prose fallback.
    Default,
    /// Force the prose at this catalog key regardless of candidates.
    /// Used by NewName slots ("Type a name, then `(`").
    ForceProse(&'static str),
    /// Show only the prose; suppress Tab candidates.
    /// Used by typed value slots at empty prefix.
    ProseOnly(&'static str),
    /// Suppress prose; only candidates.
    SuppressProse,
}
```

The walker propagates each expected node's `HintMode` to the
hint resolver, which dispatches accordingly.

The current ad-hoc cases in `input_render.rs::ambient_hint`
(value-literal slot suppression, NewName slot typing-name
prose, invalid-ident overlay) migrate to node-attached
`HintMode` annotations during Phase D.

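
One way the hint resolver's dispatch could look is sketched below. The `Hint` output type, the `resolve_hint` name, and the `fallback_key` parameter are assumptions for illustration. The sketch only models what the panel shows, so `ForceProse` and `ProseOnly` collapse to the same result here; the Tab-candidate suppression that distinguishes `ProseOnly` is not modelled.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HintMode {
    Default,
    ForceProse(&'static str),
    ProseOnly(&'static str),
    SuppressProse,
}

/// What the hint panel ends up showing (hypothetical output type).
#[derive(Debug, PartialEq)]
pub enum Hint {
    Candidates(Vec<String>),
    Prose(&'static str),
    Nothing,
}

/// Resolve one expected node's mode against its candidates.
/// `fallback_key` is the catalog key used when `Default` finds no candidates.
pub fn resolve_hint(mode: HintMode, candidates: Vec<String>, fallback_key: &'static str) -> Hint {
    match mode {
        HintMode::Default if candidates.is_empty() => Hint::Prose(fallback_key),
        HintMode::Default => Hint::Candidates(candidates),
        HintMode::ForceProse(key) | HintMode::ProseOnly(key) => Hint::Prose(key),
        HintMode::SuppressProse if candidates.is_empty() => Hint::Nothing,
        HintMode::SuppressProse => Hint::Candidates(candidates),
    }
}
```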
### Ranker layer

A ranker function runs between the walker's raw candidate
output and the hint-panel renderer:

```rust
pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }
```

Default is `identity_ranker` — declaration order from the
trie is preserved. The signature allows future enhancements
(frequency-based ranking, content-aware priors for type
suggestions per column name) to plug in without changing
grammar declarations.

The ranker lives outside the trie. Grammar declarations are
about *what's valid*; ranking is about *what's likely useful
first*.

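
As an illustration of how a non-identity ranker could plug into that signature, the sketch below moves prefix matches to the front. It is not one of the rankers this ADR plans; `Candidate` and `WalkContext` are reduced stand-ins, and the `typed_prefix` field is hypothetical.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate { pub text: String }

/// Stand-in context; `typed_prefix` is a hypothetical field for this sketch.
pub struct WalkContext { pub typed_prefix: String }

pub type Ranker = fn(&WalkContext, Vec<Candidate>) -> Vec<Candidate>;

pub fn identity_ranker(_: &WalkContext, c: Vec<Candidate>) -> Vec<Candidate> { c }

/// Moves candidates starting with the typed prefix to the front.
/// `sort_by_key` is stable, so declaration order survives within each group.
pub fn prefix_first_ranker(ctx: &WalkContext, mut c: Vec<Candidate>) -> Vec<Candidate> {
    // `false` sorts before `true`, so matches (key == false) come first.
    c.sort_by_key(|cand| !cand.text.starts_with(&ctx.typed_prefix));
    c
}
```

Because both functions share the `Ranker` type, swapping one for the other needs no change to any grammar declaration.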
### Sub-grammars

Two flavours, no global registry:

**Static** — pure composition, function returning a const node:

```rust
const fn qualified_column(role_table: &'static str, role_col: &'static str) -> Node {
    Seq(&[
        Ident { source: Tables, role: role_table, /* ... */ },
        Punct('.'),
        Ident { source: Columns, role: role_col, /* ... */ },
    ])
}

const fn where_clause() -> Node {
    Seq(&[
        Word { primary: "where", /* ... */ },
        Ident { source: Columns, role: "filter_column", /* ... */ },
        Punct('='),
        AnyValueSlot,
    ])
}
```

**Dynamic** — context-aware, expands at walk time:

```rust
fn column_value_list(ctx: &WalkContext) -> Node {
    // `as_deref` + `&[]` avoids borrowing a temporary `Vec`, which
    // `unwrap_or(&Vec::new())` would do (and fail to compile).
    let cols: &[ColumnInfo] = ctx.current_table_columns.as_deref().unwrap_or(&[]);
    let mut children: Vec<Node> = Vec::new();
    for (i, col) in cols.iter().enumerate() {
        if i > 0 { children.push(Punct(',')); }
        children.push(slot_for_type(col.user_type));
    }
    Seq(Box::leak(children.into_boxed_slice()))
}
```

Dynamic sub-grammars return owned `Node` values that the
walker treats as inline expansions. The leak above is one
implementation tactic — alternatively, the walker stores the
expanded node in a small per-walk arena. Both work; pick at
implementation time.

### Aliases

A `Word` node carries `primary` and an `aliases` slice. The
walker matches input against either; completion surfaces only
the primary; help text mentions aliases prose-style if
appropriate. Highlight class is the same for both.

Round 5's `q` removal is *not* reverted by this design. `q`
stays gone — adding it back would now be the single line
`aliases: &["q"]` on the `quit` `Word` node, and would not
surface as a separate candidate in completion (matching the
round-5 user request).

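
The asymmetry between matching (primary or alias) and completion (primary only) can be sketched as follows. `Word` is shown as a standalone struct rather than a `Node` variant, exact-case matching is an assumption, and the method names are hypothetical.

```rust
pub struct Word {
    pub primary: &'static str,
    pub aliases: &'static [&'static str],
}

impl Word {
    /// True when `input` is the primary literal or any alias.
    pub fn matches(&self, input: &str) -> bool {
        input == self.primary || self.aliases.iter().any(|a| *a == input)
    }

    /// Completion surfaces only the primary, and only for a matching prefix.
    /// Aliases never appear as candidates.
    pub fn completion(&self, prefix: &str) -> Option<&'static str> {
        self.primary.starts_with(prefix).then_some(self.primary)
    }
}
```

Using the `q` illustration from the text: a `Word` with `primary: "quit"` and `aliases: &["q"]` would accept both `quit` and `q` as input, while typing `q` and pressing Tab would offer only `quit`.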
### `IdentSource`

Replaces the current `dsl::ident_slot::IdentSlot`:

```rust
pub enum IdentSource {
    NewName,       // user invents; no schema lookup; ProseOnly hint
    Tables,        // existing table names
    Columns,       // existing column names (filtered by current table)
    Relationships, // existing relationship names
    Types,         // closed set from Type::all()
}
```

`Types` is new — it replaces the magic-string `TYPE_SLOT_LABEL`
used today. `src/dsl/ident_slot.rs` dissolves into
`src/dsl/grammar/mod.rs`.

### Highlight class assignment

Per-byte highlight class is computed as a side effect of the
walk. Each terminal records `(byte_range, class)` in
`WalkResult::per_byte_class` as it matches. Unmatched ranges
(past a definite failure) get the `tok_error` overlay,
identical to today's behaviour.

Default classes per terminal kind:

| Terminal | Default class |
|---|---|
| `Word` | `tok_keyword` |
| `Punct` | `tok_punct` |
| `Ident` | `tok_identifier` |
| `NumberLit` | `tok_number` |
| `StringLit` | `tok_string` |
| `BlobLit` | `tok_string` |
| `Flag` | `tok_flag` |
| `BarePath` | `tok_string` |

The `highlight_override: Option<HighlightClass>` field on
`Word` and `Ident` is reserved for future per-slot variants
(e.g., a Tables slot in a distinct shade vs a NewName slot
muted) — left `None` everywhere in round 1.

No new palette colours for the initial migration.

## Migration plan

### Code organisation

```
src/dsl/
  grammar/
    mod.rs          — Node enum, IdentSource, HintMode, HighlightClass,
                      MatchedPath, CommandNode, REGISTRY top-level
    data.rs         — insert, update, delete, show
    ddl.rs          — create, drop, add, rename, change
    app.rs          — quit, help, save/save-as, new, load, rebuild,
                      export, import, mode, messages
    shared.rs       — typed value slots (int_slot, date_slot, …),
                      qualified_column, where_clause, action_keyword,
                      column_value_list (dynamic)
    validators.rs   — content validators (integer_only_validator,
                      date_format_validator, datetime_format_validator,
                      type_name_validator, …)
  walker/
    mod.rs          — public walk() entry; orchestration
    driver.rs       — the per-node-kind dispatch
    context.rs      — WalkContext
    outcome.rs      — WalkOutcome, MatchedPath, WalkResult
    lex_helpers.rs  — identifier-shape, digit-shape, string-escape
                      helpers; shared across terminal consume fns
  parser.rs         — Phase A: becomes a router. Phase F: deleted.
  lexer.rs          — Phase F: deleted.
  keyword.rs        — Phase F: deleted.
  ident_slot.rs     — Phase F: dissolved into grammar/mod.rs.
  usage.rs          — Phase F: REGISTRY deleted; the file may go.
```

### Six-phase migration

**Phase A — Walker skeleton + app-lifecycle commands.**

- Build the walker driver, `WalkContext`, `WalkOutcome`,
  `MatchedPath`, the terminal consume functions.
- Migrate the app-lifecycle commands (no schema dependency,
  no value literals): quit, help, rebuild, save, save as, new,
  load, export, import, mode, messages.
- Router in `parse_command` consults the walker for migrated
  commands; falls back to chumsky for the rest.
- Differential test scaffolding: a test helper that, for every
  input in the existing test corpus, runs both parsers and
  asserts identical `Command` output where the input falls
  under a migrated command.

Exit criteria: walker handles the app-lifecycle commands
end-to-end; existing tests for those commands pass via the
walker path; tests for other commands still pass via chumsky.

**Phase B — DDL commands without value literals.**

- drop table, drop column, drop relationship.
- rename column.
- add column (without the value-literal aspect — type slot
  uses `Ident { source: Types }` with a content validator).
- add 1:n relationship (referential clauses as a static
  sub-grammar).
- change column (type slot + flags).

These exercise schema lookups via `Ident { source: Tables }`
and `Ident { source: Columns }`, and the `Types` source. No
typed value slots yet, no `DynamicSubgrammar`.

Exit criteria: all DDL commands except `create table` pass
via the walker; the rest still pass via chumsky.

**Phase C — `create table` with column-list value literals.**

- The `with pk` clause uses `Repeated` for the column-spec
  list, each spec being a `Seq(Ident{NewName}, Punct(':'),
  Ident{Types}-with-validator)`.
- First test of `Repeated` with separator.

Exit criteria: create table works end-to-end via the walker.

**Phase D — data commands with full schema awareness.**

- show data, show table (replay is handled separately in
  Phase E).
- insert: uses `DynamicSubgrammar(column_value_list)` for the
  comma-separated typed value list. Exercises full
  `WalkContext` propagation: `Ident { source: Tables, role:
  "table", writes_table: true }` resolves the column list;
  the dynamic sub-grammar unfolds typed slots per column.
- update: `set` clauses use `DynamicSubgrammar` to resolve the
  value slot's type from the column. `where` clause uses the
  shared sub-grammar with `AnyValueSlot` (or, optionally, also
  column-typed if the column resolves cleanly).
- delete: same `where` clause; otherwise simple.

This is the phase that proves the design's central claim:
typed slots, dynamic sub-grammars, and schema-aware narrowing
all collaborate to produce a single coherent grammar
declaration per command.

Exit criteria: all data commands pass via the walker; the
round-5 limitations close automatically (save Tab can offer
`as`, value slots narrow by column type).

**Phase E — replay end-to-end.**

- replay uses `BarePath` + `StringLit` (quoted form).
- Internally replays each line through the same dispatch
  pipeline.

Exit criteria: replay works end-to-end via the walker; nested
replay rejection still fires from the runtime, unchanged.

**Phase F — cleanup.**

- Delete `dsl/parser.rs`.
- Delete `dsl/lexer.rs`.
- Delete `dsl/keyword.rs`.
- Delete `dsl/ident_slot.rs` (already merged into
  `grammar/mod.rs` in Phase A).
- Delete `dsl/usage.rs::REGISTRY`.
- Delete `chumsky` dependency from `Cargo.toml`.
- Delete `parse.token.keyword.*` entries from the catalog and
  `keys.rs` that the walker doesn't need (the keyword
  vocabulary is implicit in the grammar nodes).
- Remove the differential test scaffolding from Phase A.

Exit criteria: working tree clean of legacy parser code;
test suite still all-green; `cargo clippy --all-targets --
-D warnings` passes; `cargo build --release` binary not
noticeably larger.

### Test discipline

Three guarantees throughout migration:

1. **Full test suite green at every commit.** Migration is
   per-command; tests are per-behaviour. They don't care
   which parser produces a `Command` — they assert input →
   expected output. If a test fails mid-migration, the
   walker hasn't reproduced behaviour; fix the walker
   before continuing.
2. **Walker-specific tests for trie-only features.** Schema-
   aware narrowing, `WalkContext` propagation, dynamic sub-
   grammar expansion, `HintMode` per-node behaviour, the
   round-5 "save Tab offers as" gap-closing — each gets new
   tests as the feature lands.
3. **Differential check during the migration window.** A
   test helper iterates the existing input corpus, runs both
   parsers on inputs that fall under a migrated command, and
   asserts identical `Command` output. Cheap insurance
   against subtle divergence. Removed at Phase F cleanup.

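
The differential check in guarantee 3 might be sketched as a generic helper like the one below. The function name, the `is_migrated` predicate, and the `Result<C, String>` parser shape are assumptions; the real helper would call the chumsky parser and the walker and compare `Command` values. Note that this version also flags inputs where both parsers fail with different error text, which may be stricter than needed.

```rust
/// Run both parsers over the corpus and collect inputs where they disagree.
/// `is_migrated` gates the comparison to commands the walker already handles.
pub fn differential_mismatches<C: PartialEq>(
    corpus: &[&str],
    is_migrated: impl Fn(&str) -> bool,
    legacy: impl Fn(&str) -> Result<C, String>,
    walker: impl Fn(&str) -> Result<C, String>,
) -> Vec<String> {
    corpus
        .iter()
        .copied()
        .filter(|input| is_migrated(*input))
        .filter(|input| legacy(*input) != walker(*input))
        .map(str::to_string)
        .collect()
}
```

A test would then assert that the returned list is empty, printing the offending inputs on failure.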
### Cleanup pass at Phase F

Beyond deleting the legacy modules, Phase F includes catalog
cleanup. The `parse.token.keyword.*` entries (40+ of them) are
near-mechanical wrappers (`create: "\`create\`"`); with no
external code looking up these keys (the walker renders
keyword names from `Word` node literals directly), the
entries can collapse. A small `format_keyword_for_error(literal)
-> String` helper replaces them. The `keys.rs` declarations
go with them.

Help text in `help.cli_banner` and `help.in_app_body` stays
as hand-written prose — the alternative (auto-generating from
the grammar) was deferred during the round-6 discussion as a
separate concern; the grammar tree carries enough metadata
(per-command `help_id`) for future automation but the prose
documentation is still hand-curated for round 1.

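
Given that the catalog entries being replaced are backtick wrappers, the helper could be as small as the sketch below. Only `format_keyword_for_error` is named in this ADR; its body here is a guess from the `create` example, and `format_expected` is a hypothetical companion showing how an expected-set might be rendered with it.

```rust
/// Wrap a keyword literal in backticks, matching the shape of the
/// `parse.token.keyword.*` catalog entries it replaces.
pub fn format_keyword_for_error(literal: &str) -> String {
    format!("`{literal}`")
}

/// Hypothetical companion: render an expected-set as "`a`, `b` or `c`".
pub fn format_expected(literals: &[&str]) -> String {
    let quoted: Vec<String> = literals
        .iter()
        .map(|l| format_keyword_for_error(l))
        .collect();
    match quoted.split_last() {
        None => String::new(),
        Some((only, [])) => only.clone(),
        Some((last, rest)) => format!("{} or {last}", rest.join(", ")),
    }
}
```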
## Consequences

### What we gain

- **One declaration per command.** Entry keyword, shape, AST
  builder, dispatch handler, usage reference, help reference
  all colocated. Adding a command is one block in one file.
- **No cross-file scatter.** The round-5 "10 places to remove
  `q`" critique is structurally addressed: there's nowhere
  else for keyword/usage/registry info to live but the
  grammar tree.
- **Schema-aware narrowing from day one.** Typed value slots
  reject mis-shaped input at parse time with localised error
  wording; completion narrows per column type; the round-5
  value-literal slot hint becomes type-specific
  ("Type a date as 'YYYY-MM-DD'") not generic.
- **Aliases as a single annotation.** `q` could come back as
  one line on the `quit` `Word` node, no scatter.
- **Tests focus on behaviour, not enumeration.** Tests that
  hardcoded keyword lists during round 5 (we noted these in
  `usage.rs` and `completion.rs`) can iterate the trie
  registry instead, becoming structural rather than
  literal.
- **Drift is structurally impossible.** Completion, highlight,
  parse, usage, and help all derive from the same trie. No
  separate sources to keep in sync.

### What we accept

- **Parse depends on schema state.** A DSL command that
  references a non-existent table fails at parse time, not
  at execute time as today. This matches the user mental
  model when typing (the schema cache is current per
  ADR-0022) and yields better completion / hint
  experience. It does mean tests that exercised parser
  behaviour in isolation may now need to set up a schema
  cache.
- **chumsky's general-purpose features go unused.** Recovery
  on ambiguous input, multi-error reporting in a single
  pass, ambiguous-grammar handling — features chumsky offers
  but our DSL doesn't use. The trade is fine because our
  grammar is deterministic.
- **Some implementation complexity moves into the walker.**
  Whitespace skipping between siblings, terminal consume
  functions, character-level shape recognition — the lexer
  did some of this implicitly; the walker does it
  explicitly. Net code is comparable or smaller because the
  scatter cost goes away.

### What's out of scope for this ADR

- **External tooling integration (LSP, editor extensions).**
  The registry is `pub` and accessible via accessor
  functions, so future tooling work doesn't fight this design.
  No tooling is built in round 1.
- **Help text auto-generation.** Grammar tree carries
  `help_id` per node, but the help catalog body stays
  hand-curated.
- **Performance optimisation.** Walker re-runs per keystroke
  for completion + highlighting. Naïve implementation is
  acceptable; if hot-path concerns emerge later, caching /
  incremental walks become a separate ADR.
- **Ranker implementations.** The ranker hook exists; default
  is identity. Frequency-based ranking, content-aware priors
  for type completion ("Email → text first, Score → real"),
  recency — all future work that plugs into the ranker
  signature without touching grammar declarations.
- **Per-slot highlight overrides.** The `highlight_override`
  field exists but stays `None` in round 1. Differentiating
  table-ident from new-name-ident visually is a future
  enhancement.

## References

- ADR-0023 — Unified declarative grammar tree (Proposed direction). Superseded by this ADR for execution detail.
- ADR-0001 — Language and TUI framework (chumsky choice). Phase F removes the chumsky dependency.
- ADR-0019 — Friendly error layer and i18n catalog. Catalog conventions stay; `parse.token.keyword.*` entries collapse in Phase F.
- ADR-0020 — Tokenization layer for the DSL parser. Superseded by the scannerless walker.
- ADR-0021 — Parser-as-source-of-truth for H1a. Usage info migrates from a separate registry to grammar nodes.
- ADR-0022 — Ambient typing assistance. The walker subsumes the expected-set introspection that powered completion in that ADR.
- Round-6 session transcript — design pass that produced this spec.
+2
-1
@@ -28,4 +28,5 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0020 — Tokenization layer for the DSL parser](0020-tokenization-layer-for-the-dsl-parser.md)
 - [ADR-0021 — Parser-as-source-of-truth for H1a (per-command usage in parse errors)](0021-parser-as-source-of-truth-for-h1a.md)
 - [ADR-0022 — Ambient typing assistance: colour, hint panel, completion (I3 + I4)](0022-ambient-typing-assistance.md)
-- [ADR-0023 — Unified declarative grammar tree](0023-proposed-unified-grammar-tree.md) — **Proposed** (researched direction, not yet accepted)
+- [ADR-0023 — Unified declarative grammar tree](0023-unified-grammar-tree.md) — direction (superseded for execution detail by ADR-0024)
+- [ADR-0024 — Unified grammar tree: execution plan](0024-unified-grammar-tree-execution-plan.md) — **Accepted**, the executable spec