add ADR-0024: unified grammar tree execution plan (accepted)
Concrete specification for the direction in ADR-0023, landed during the
round-6 design pass. Resolves all four rounds of open design questions:
walker as single source of truth, scannerless terminal vocabulary (~8
building blocks), typed value slots with content validators, WalkContext
for schema-aware narrowing from day one, WalkOutcome multi-purpose
return, HintMode per-node, ranker as separate layer, static + dynamic
sub-grammars, aliases as Word annotations, IdentSource taxonomy,
six-phase per-command migration with chumsky and walker side-by-side
during the transition.

Key shifts from ADR-0023's sketch:

- Lexer dissolves entirely. Walker operates on bytes directly.
  dsl/lexer.rs, dsl/keyword.rs go away in Phase F.
- Schema-aware parse from day one (not phased). Typed value slots
  reject mis-shaped input at parse time with localised wording.
  Completion narrows per column type.
- Sub-grammars: static (fn() -> Node) for composition; dynamic
  (fn(&WalkContext) -> Node) for schema-dependent expansion. No global
  named registry.
- Path-bearing commands: BarePath becomes a routine non-whitespace
  terminal. Paths with spaces require quoting via StringLit (UX
  simplification, aligns with standard CLI convention).
- 13-node taxonomy: Word, Punct, Ident, NumberLit, StringLit, BlobLit,
  Flag, BarePath, Choice, Seq, Optional, Repeated, DynamicSubgrammar.

Migration plan: Phase A (walker scaffolding + app-lifecycle commands),
Phase B (DDL without value literals), Phase C (create table), Phase D
(data commands with full schema awareness -- the design's central claim
landing), Phase E (replay), Phase F (delete chumsky + lexer + legacy
parser modules, simplify catalog). Estimated ~4 sessions total.

Also: rename ADR-0023 from 0023-proposed-unified-grammar-tree.md to
0023-unified-grammar-tree.md (git mv preserves history) and update its
status to reflect the direction-accepted-but-superseded-for-execution-detail
relationship with ADR-0024. Index updated.
@@ -0,0 +1,430 @@
# ADR-0023: Unified declarative grammar tree (direction)

## Status

**Accepted in direction, superseded for execution detail by
ADR-0024.** 2026-05-14.

This ADR captures the architectural critique (the "10-place
edit" scatter problem with the current parser shape) and the
direction (a unified declarative grammar tree). The round-6
design pass turned that direction into a concrete specification,
which ships as ADR-0024. ADR-0024 makes some refinements
beyond what's sketched here — notably the decision to drop the
lexer module entirely (scannerless walker) and to build
schema-aware narrowing in from day one rather than phasing it.
Read ADR-0024 for the executable plan; this ADR remains for the
institutional memory of why the change is happening.

The file was renamed from `0023-proposed-unified-grammar-tree.md`
to `0023-unified-grammar-tree.md` when the direction was
accepted. History is preserved through the `git mv`.

## Context

### What hurt

The round-5 session removed the (small, accidental) `q` alias
for the `quit` command. Removing one keyword required edits in
ten places:

1. `define_keywords!` entry in `src/dsl/keyword.rs`
2. Parser combinator branch in `src/dsl/parser.rs`
3. `UsageEntry` row in `src/dsl/usage.rs::REGISTRY`
4. Hardcoded keyword array in `usage.rs::every_command_has_a_registry_entry` test
5. Hardcoded keyword array in `usage.rs::entry_keywords_alphabetised_returns_unique_sorted_commands` test
6. `KEYS_AND_PLACEHOLDERS` declaration in `src/friendly/keys.rs`
7. `parse.token.keyword.q` entry in `src/friendly/strings/en-US.yaml`
8. `help.cli_banner` prose in `en-US.yaml`
9. `help.in_app_body` prose in `en-US.yaml`
10. Hardcoded keyword array in `completion.rs::empty_input_offers_app_command_entry_keywords` test

Adding a brand-new app command in the same session required a
similar number of touches across the same files, plus a
`Command` enum extension, dispatch handler wiring, and a
per-command usage template entry. The pattern is: every new
command or keyword incurs a cross-file scatter of typically 7
files for a normal addition, 10+ when tests and help text
also need updating.

### Why this happened

The current architecture is the accretion of features added
across separate ADRs, each locally sensible:

- **ADR-0001** chose chumsky for parsing — a general-purpose
  parser-combinator library oriented at programming-language
  grammars with expression precedence, error recovery, and
  ambiguous-grammar handling.
- **ADR-0019** introduced the i18n catalog as a flat YAML +
  Rust-side validator (`keys.rs`) for two-sided typo
  protection.
- **ADR-0020** added the lexer + `define_keywords!` macro
  when the unified-token requirement bit. The macro
  consolidated *keyword* definitions but didn't tackle the
  broader command surface.
- **ADR-0021** added a per-command `UsageEntry` registry so
  parse errors could surface usage templates.
- **ADR-0022** added completion by *introspecting* chumsky's
  expected-token set at parse failure points, rather than by
  consulting the grammar declaration directly.

Each step solved its presenting problem. None of them
restructured the grammar declaration to be the single source
of truth for completion + highlighting + parse + help + i18n.

Requirements for completion, highlighting, help, and i18n
were all known from project start. A design pass at the
beginning asking "what unified data structure carries all of
this?" would not have landed at the current scattered shape.
This is a critique of process: the trajectory was not
inevitable.

### What chumsky earned

For the DSL we actually built — deterministic prefix-keyword
commands with a small set of clauses — chumsky's general-purpose
machinery is mostly unused:

- We have no arbitrary expression grammar (no arithmetic
  precedence, no function calls, no recursion).
- We have no multi-error recovery requirement; we fail on the
  first error and ask the user to fix it.
- We have no ambiguous-grammar handling needs.

Chumsky's `try_map` custom-error machinery is exercised (for
"unknown type", "tables need at least one column", flag
mutual-exclusion). These are pre-shape and post-shape
validators that fit naturally into chumsky's combinator
model. They could be expressed equally cleanly as per-node
validator functions in a trie-based design — the placement
matters less than the shape.

### What a unified grammar would look like

The proposed structure (sketched by the project owner during
the round-5 design discussion):

```js
const commandGrammar = {
  add: {
    helpId: "add-command",
    shortHelp: "Add structure to a table or relationship",
    highlightType: "top-level-command",
    cont: {
      "1:n": {
        highlightType: "sub-command",
        cont: {
          relationship: {
            cont: {
              from: {
                cont: {
                  // qualified column slot: <Table>.<Column>
                  // bound to parent_table + parent_column
                  // through the command's extractor.
                  // …
                  completionSource: "table-names",
                }
              }
            }
          }
        }
      }
    }
  },
  import: {
    helpId: "import-command",
    cont: {
      completionSource: "current-folder-zip-file-names",
      // …
    }
  },
  // …
};
```

This declaration carries everything the current system spreads
across many files: grammar shape (`cont`), completion sources
(`completionSource`), highlight classes (`highlightType`),
help references (`helpId`). The same declaration drives
completion (walk to current node, list its children), syntax
highlighting (each node's class), parse-error usage rendering
(walk to failure point, list valid continuations), and AST
construction (per-command extractor walks the matched path).

The shape generalises to most of the SQL surface a teaching
tool would expose. Where-clauses and similar reusable chunks can
be named and registered separately, then referenced by ID
from anywhere they're needed. True expression grammars
(arithmetic, function calls, precedence climbing) — if
they're ever needed — fit as opaque leaf nodes whose
*structure* the trie validates, with the actual interpretation
delegated to a downstream module or simply passed through to
the SQL engine.

## Proposed direction

### Data model

A single grammar registry, structurally similar to the
sketch above, declared once in Rust. Per node:

- `entry: &'static str` — the literal that selects this
  branch (for keyword nodes) OR a typed slot descriptor (for
  literal / identifier / completion-source nodes).
- `cont: &'static [Node]` — child nodes representing valid
  continuations from this point. Empty for terminal nodes.
- `highlight: HighlightClass` — colour role for the input
  pane and echo line. Inherits from parent if not specified.
- `completion_source: Option<CompletionSource>` — for
  identifier slots, the schema-cache key or static list
  that drives Tab candidates and known-set validity checks.
- `help_id: Option<&'static str>` — reference into the help
  catalog (decoupled from grammar so wording changes don't
  touch grammar).
- `validator: Option<NodeValidator>` — per-position
  validator function (e.g., "this identifier must be a
  valid new name", or "this slot can occur at most once").
- `extractor_role: Option<&'static str>` — names the role
  this slot plays in the command's typed output (e.g.,
  `"parent_table"`). Read by the command's extractor at
  AST-construction time. Optional because the *positional*
  shape of a command's tree is usually enough — the
  extractor knows the command's structure and reads the
  walked path in order.

Per command (top-level node):

- `ast_builder: fn(WalkedPath) -> Command` — walks the
  matched path and produces the typed AST variant. Replaces
  the per-command chumsky combinator's `.map(...)` closure.
- `dispatch: fn(&mut App, Command) -> Vec<Action>` — the
  dispatch handler. Replaces the per-command `match` arm in
  `dispatch_input` / `dispatch_app_command`.
- `help_id` — root help reference for the command family.
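
As a rough illustration of the per-node fields listed above, the sketch below declares a small `drop` subtree as a `static`. It is not the project's actual code: the enum variants, the `NodeValidator` signature, the `LEAF` base constant, the `drop-command` help id, and the `"<table>"` slot spelling are all assumptions made for the example, and `highlight` is modelled as an `Option` to express "inherit from parent".

```rust
// Hypothetical sketch: field names follow the list above; the concrete
// types (enum variants, validator signature) are assumptions.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HighlightClass {
    TopLevelCommand,
    SubCommand,
    Identifier,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CompletionSource {
    TableNames,
}

/// Assumed validator shape: Ok(()) or a catalog key describing the failure.
pub type NodeValidator = fn(&str) -> Result<(), &'static str>;

pub struct Node {
    pub entry: &'static str,
    pub cont: &'static [Node],
    /// None means "inherit from the parent node".
    pub highlight: Option<HighlightClass>,
    pub completion_source: Option<CompletionSource>,
    pub help_id: Option<&'static str>,
    pub validator: Option<NodeValidator>,
    pub extractor_role: Option<&'static str>,
}

/// Base value so each declaration only spells out the fields it sets.
pub const LEAF: Node = Node {
    entry: "",
    cont: &[],
    highlight: None,
    completion_source: None,
    help_id: None,
    validator: None,
    extractor_role: None,
};

/// `drop column`, `drop relationship`, and `drop table <Table>`.
pub static DROP: Node = Node {
    entry: "drop",
    highlight: Some(HighlightClass::TopLevelCommand),
    help_id: Some("drop-command"), // hypothetical help id
    cont: &[
        Node { entry: "column", highlight: Some(HighlightClass::SubCommand), ..LEAF },
        Node { entry: "relationship", highlight: Some(HighlightClass::SubCommand), ..LEAF },
        Node {
            entry: "table",
            highlight: Some(HighlightClass::SubCommand),
            cont: &[Node {
                entry: "<table>", // typed slot, spelled as a placeholder here
                highlight: Some(HighlightClass::Identifier),
                completion_source: Some(CompletionSource::TableNames),
                extractor_role: Some("table"),
                ..LEAF
            }],
            ..LEAF
        },
    ],
    ..LEAF
};
```

Because the whole tree is a plain `static`, anything that needs the grammar (completion, highlighting, tests) can read `DROP.cont` directly without a registration step.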

### Named sub-grammars

For composable chunks (where-clauses, projection lists,
qualified column references, value literals), the registry
supports named sub-grammars:

```rust
register_subgrammar("where_clause", &[
    // structure declaration …
]);

// Referenced from any command:
SubgrammarRef("where_clause"),
```

The walker treats a `SubgrammarRef` as a transparent
expansion at parse / completion / highlight time. The
extractor reads the sub-grammar's matched path and applies
the sub-grammar's own AST-fragment builder.
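
One way to picture the "transparent expansion" described above is a function that splices each referenced sub-grammar into the node sequence before walking it. The sketch below assumes a simplified two-variant `Node` and a `Registry` type backed by a `HashMap`; both are hypothetical stand-ins, and the `condition` sub-grammar is invented for the example.

```rust
use std::collections::HashMap;

// Hypothetical, simplified node type: a literal/slot or a named reference.
#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Word(&'static str),
    SubgrammarRef(&'static str),
}

#[derive(Default)]
pub struct Registry {
    subgrammars: HashMap<&'static str, Vec<Node>>,
}

impl Registry {
    pub fn register_subgrammar(&mut self, name: &'static str, nodes: &[Node]) {
        self.subgrammars.insert(name, nodes.to_vec());
    }

    /// Replace every `SubgrammarRef` with the nodes it names, recursively.
    /// Unknown names are reported as errors. Cyclic references are not
    /// detected in this sketch and would recurse without bound.
    pub fn expand(&self, nodes: &[Node]) -> Result<Vec<Node>, String> {
        let mut out = Vec::new();
        for node in nodes {
            match node {
                Node::SubgrammarRef(name) => {
                    let body = self
                        .subgrammars
                        .get(name)
                        .ok_or_else(|| format!("unknown sub-grammar: {name}"))?;
                    out.extend(self.expand(body)?);
                }
                other => out.push(other.clone()),
            }
        }
        Ok(out)
    }
}
```

Expanding eagerly keeps the walker unaware of references; a real implementation might instead expand lazily at the point the walker reaches the reference.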

### Walker functions

A single walker module exposes:

- `complete(input, cursor) -> Vec<Candidate>` — walk the
  trie alongside the typed prefix; at the cursor's
  position, return the union of (a) literal children of the
  current node, (b) candidates from the active node's
  `completion_source`. Replaces `completion.rs`.
- `highlight(input) -> Vec<StyledRun>` — walk producing the
  highlight class per token range. Replaces the
  ad-hoc lookups in `input_render.rs`.
- `parse(input) -> Result<Command, ParseError>` — walk
  consuming tokens, running per-node `validator`s, applying
  the command's `ast_builder` at completion. Replaces
  `dsl::parser::parse_command`.
- `usage_at(input, position) -> UsageBlock` — walk to the
  failure point, render the valid continuations as a
  usage template. Replaces `usage::matched_entry`.

All four operations read the same registry.
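
To make the `complete` walk concrete, here is a minimal sketch under simplifying assumptions: the cursor is taken to be at the end of the input, tokens are whitespace-separated, and only literal children are offered (no `completion_source` lookups). The two-field `Node` and the tiny `GRAMMAR` table are illustrative, not the real registry.

```rust
// Hypothetical minimal node: just the literal and its continuations.
pub struct Node {
    pub entry: &'static str,
    pub cont: &'static [Node],
}

pub static GRAMMAR: &[Node] = &[
    Node { entry: "save", cont: &[Node { entry: "as", cont: &[] }] },
    Node {
        entry: "show",
        cont: &[
            Node { entry: "tables", cont: &[] },
            Node { entry: "table", cont: &[] },
        ],
    },
    Node { entry: "quit", cont: &[] },
];

/// Literal continuations valid at the end of `input` (cursor assumed at end).
pub fn complete(input: &str) -> Vec<&'static str> {
    let ends_with_space = input.ends_with(char::is_whitespace);
    let mut words: Vec<&str> = input.split_whitespace().collect();
    // The word under the cursor is a prefix filter, unless the input ends
    // in whitespace, in which case every child of the current node is valid.
    let prefix = if ends_with_space { "" } else { words.pop().unwrap_or("") };

    let mut level: &'static [Node] = GRAMMAR;
    for word in words {
        match level.iter().find(|n| n.entry == word) {
            Some(node) => level = node.cont,
            None => return Vec::new(), // walked off the tree: nothing to offer
        }
    }
    level
        .iter()
        .map(|n| n.entry)
        .filter(|e| e.starts_with(prefix))
        .collect()
}
```

With this table, `complete("save ")` offers `as`, and `complete("s")` offers both `save` and `show`; the same loop, stopped at a failure point instead of the cursor, is the core of a `usage_at`-style lookup.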

### Value-leaf parsers

Literal types (number, string, date, bool) and identifier
shape validators (new-name checks) remain as small standalone
functions referenced by leaf nodes. They don't pretend to be
parser combinators — they're predicate-plus-builder pairs.
The chumsky machinery they currently use is shed.
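
A predicate-plus-builder pair could look roughly like the following. This is a sketch under assumptions: the `Value` enum, the `LeafParser` struct, the single-quote string syntax, and the integer-only number type are all invented for the example, and the date literal is left out for brevity.

```rust
#[derive(Debug, PartialEq)]
pub enum Value {
    Number(i64),
    Bool(bool),
    Text(String),
}

/// Hypothetical leaf descriptor: a cheap shape check plus a builder.
pub struct LeafParser {
    pub name: &'static str,
    pub accepts: fn(&str) -> bool,
    pub build: fn(&str) -> Option<Value>,
}

fn is_bool(s: &str) -> bool {
    matches!(s, "true" | "false")
}
fn build_bool(s: &str) -> Option<Value> {
    s.parse::<bool>().ok().map(Value::Bool)
}

fn is_number(s: &str) -> bool {
    s.parse::<i64>().is_ok()
}
fn build_number(s: &str) -> Option<Value> {
    s.parse::<i64>().ok().map(Value::Number)
}

// Assumed string syntax: single quotes, no escape handling.
fn is_text(s: &str) -> bool {
    s.len() >= 2 && s.starts_with('\'') && s.ends_with('\'')
}
fn build_text(s: &str) -> Option<Value> {
    // Strip the surrounding quotes; `get` yields None on a malformed range.
    let end = s.len().checked_sub(1)?;
    s.get(1..end).map(|inner| Value::Text(inner.to_string()))
}

pub static LEAVES: &[LeafParser] = &[
    LeafParser { name: "bool", accepts: is_bool, build: build_bool },
    LeafParser { name: "number", accepts: is_number, build: build_number },
    LeafParser { name: "string", accepts: is_text, build: build_text },
];

/// Try each leaf in order; the first predicate that accepts builds the value.
pub fn parse_value(token: &str) -> Option<Value> {
    LEAVES
        .iter()
        .find(|leaf| (leaf.accepts)(token))
        .and_then(|leaf| (leaf.build)(token))
}
```

Keeping the predicate separate from the builder lets completion and highlighting ask "does this look like a number?" without constructing a value.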

### i18n integration

The catalog (`en-US.yaml`) stays. The `keys.rs`
`KEYS_AND_PLACEHOLDERS` validator stays. The
`parse.token.keyword.*` entries can collapse to a default
formatter (every keyword renders as `` `{literal}` `` unless
the catalog explicitly overrides for a specific keyword).
Adding a normal keyword no longer requires a catalog entry
unless its wording deviates from the default.

The grammar tree references `help_id` strings, not catalog
keys directly, so help wording lives in its own catalog
section without touching grammar declarations.
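
The default-formatter rule for `parse.token.keyword.*` can be sketched as a lookup with a fallback. The catalog is modelled here as a plain `HashMap<String, String>`, which is an assumption; the function name `keyword_label` is hypothetical.

```rust
use std::collections::HashMap;

/// Render a keyword for user-facing messages: use the catalog override if
/// one exists under `parse.token.keyword.<literal>`, otherwise fall back to
/// the literal wrapped in backticks.
pub fn keyword_label(catalog: &HashMap<String, String>, literal: &str) -> String {
    let key = format!("parse.token.keyword.{literal}");
    match catalog.get(&key) {
        Some(text) => text.clone(),
        None => format!("`{literal}`"),
    }
}
```

Under this rule a keyword such as `quit` needs no catalog entry at all, while a keyword whose wording should differ gets exactly one override line.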

## Trade-offs

### What we give up

- **chumsky as the DSL parser.** The library dependency would
  stay only if it were still used elsewhere (it isn't,
  currently). The recovery and ambiguous-grammar features go
  unused, so losing access to them costs nothing concrete.
- **Single-file grammar entry per command.** The current
  per-command combinator in `parser.rs` was always a
  separate function; in the new model the command's
  `ast_builder` is colocated with its grammar declaration.
  This is a gain, not a loss, but it means every command's
  current parser function gets rewritten.
- **No automatic backtracking on alternative branches.**
  The trie design is greedy (the first child node that
  matches wins). For our deterministic grammar this is fine
  — `drop column`, `drop relationship`, `drop table` are
  disambiguated by their second keyword, so the walker
  picks the right branch on the first token after `drop`.
  Pathological grammars that require backtracking are out
  of scope.
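
The greedy, no-backtracking behaviour can be sketched as a loop that commits to the first matching child at each step and reports the valid continuations when nothing matches. The `Node` and `WalkError` types below are hypothetical simplifications, and input is assumed to be pre-split into tokens.

```rust
pub struct Node {
    pub entry: &'static str,
    pub cont: &'static [Node],
}

pub static DROP: Node = Node {
    entry: "drop",
    cont: &[
        Node { entry: "column", cont: &[] },
        Node { entry: "relationship", cont: &[] },
        Node { entry: "table", cont: &[] },
    ],
};

#[derive(Debug, PartialEq)]
pub struct WalkError {
    /// Index of the token that matched no child.
    pub position: usize,
    /// Literals that would have been accepted at that position.
    pub expected: Vec<&'static str>,
}

/// Greedy walk: commit to the first matching child, never revisit a choice.
pub fn walk(root: &'static Node, tokens: &[&str]) -> Result<Vec<&'static str>, WalkError> {
    let mut path = Vec::new();
    let mut level: &'static [Node] = std::slice::from_ref(root);
    for (position, token) in tokens.iter().enumerate() {
        match level.iter().find(|n| n.entry == *token) {
            Some(node) => {
                path.push(node.entry);
                level = node.cont;
            }
            None => {
                return Err(WalkError {
                    position,
                    expected: level.iter().map(|n| n.entry).collect(),
                })
            }
        }
    }
    Ok(path)
}
```

Because each sibling under `drop` has a distinct literal, the first match is the only possible match, which is why giving up backtracking costs nothing for this grammar.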

### What we gain

- **One block per command.** Adding a new command =
  declare a top-level node with its `cont`, `ast_builder`,
  `dispatch`, and `help_id`. No edits to a separate
  registry, no edits to a separate catalog list, no edits
  to a separate dispatch match, no edits to tests (which
  iterate the registry).
- **Adding a keyword = one node literal.** No
  `define_keywords!` macro entry, no `parse.token.keyword.*`
  catalog entry (default formatter handles it), no
  `keys.rs` declaration (the same default handles it).
- **Completion + highlight + parse + usage rendering all
  come from one source.** Drift is structurally impossible
  because they all walk the same tree.
- **Aliases as a single annotation.** A keyword node
  declares `aliases: &["q"]` and the walker accepts any
  of them; no new variant, no new dispatch wiring.
- **Tests focus on behaviour, not enumeration.** Tests
  that previously asserted on hardcoded keyword lists
  iterate the registry. Adding/removing a command leaves
  test code untouched.
- **Documentation discoverability.** The grammar
  registry IS the spec. Reading `commandGrammar` tells you
  every command, every option, every continuation.
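
Two of the gains above, aliases as an annotation and tests that iterate the registry, can be illustrated together. The `Keyword` struct, the `COMMANDS` table, and both helper functions are hypothetical; the point is only that the check reads the table instead of hardcoding a keyword list.

```rust
use std::collections::HashSet;

/// Hypothetical keyword node: one canonical literal plus optional aliases.
pub struct Keyword {
    pub literal: &'static str,
    pub aliases: &'static [&'static str],
}

impl Keyword {
    pub fn matches(&self, token: &str) -> bool {
        self.literal == token || self.aliases.iter().any(|a| *a == token)
    }
}

pub static COMMANDS: &[Keyword] = &[
    Keyword { literal: "quit", aliases: &["q"] },
    Keyword { literal: "help", aliases: &[] },
    Keyword { literal: "save", aliases: &[] },
];

/// Map any accepted spelling to its canonical literal.
pub fn resolve(token: &str) -> Option<&'static str> {
    COMMANDS.iter().find(|k| k.matches(token)).map(|k| k.literal)
}

/// Registry-driven check: every literal and alias must be unique.
/// Returns the spellings that appear more than once.
pub fn duplicate_spellings() -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for k in COMMANDS {
        for s in std::iter::once(&k.literal).chain(k.aliases) {
            if !seen.insert(*s) {
                dups.push(*s);
            }
        }
    }
    dups
}
```

A test asserting that `duplicate_spellings()` is empty never needs editing when a command is added or removed, since it only ever reads `COMMANDS`.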

### Migration cost

Estimated at ~4 sessions:

- Session 1: design the walker + registry data model in
  detail; build a stub with one command migrated end-to-end
  alongside the existing chumsky path.
- Session 2: migrate the data-command family (create,
  drop, add, rename, change, show, insert, update,
  delete, replay). Tests at each step verify the
  walker-driven parse produces the same `Command` as the
  current chumsky parse.
- Session 3: migrate the app-command family (quit, help,
  rebuild, save / save as, new, load, export, import,
  mode, messages). Drop the parallel chumsky path.
- Session 4: clean up — remove dead modules
  (`usage.rs::REGISTRY`, expected-set introspection in
  `completion.rs`, the ad-hoc lookups in `input_render.rs`),
  remove `keys.rs` entries that the default formatter now
  covers, simplify the catalog.

Steady-state cost after migration: a new command is one
block. A new keyword is one node literal. A typo in either
fails the test suite because tests iterate the registry.

## Why not now

The project has a non-trivial feature backlog (Query DSL,
constraint management, V-series UX projects, A1 CI workflow,
multi-line input, readline shortcuts, undo/snapshot, …).
Doing this refactor now would freeze feature work for ~4
sessions and would interact disruptively with any in-flight
ADRs that touch grammar surfaces.

The "scatter cost" is bearable for the near-term command
count. Most commands are already in place; we're not
adding ten new ones in the next few sessions. Each new
command incurs ~7-file scatter; that's a modest
recurring tax, not a crisis.

The right moment to execute is when:

- Feature backlog quiets down, OR
- Cumulative scatter cost from new commands becomes
  visibly painful, OR
- The grammar needs to extend in ways the current shape
  fights against (e.g., a real SQL parser landing in
  advanced mode would benefit from this restructuring
  more than from another bolt-on).

Until then: leave it.

## Migration plan (when executed)

Per-command migration, not big-bang. The new walker runs
alongside the chumsky path during the transition. Each
command is migrated in sequence:

1. Declare the command's grammar node in the new registry.
2. Write its `ast_builder`. Verify it produces the same
   `Command` variant as the chumsky parser for every
   existing test input.
3. Route the command's entry keyword to the walker. The
   chumsky parser's branch is gated off for that keyword.
4. Run the full test suite. If green, commit.
5. Move to the next command.

When all commands are migrated:

1. Delete the chumsky parser combinator module.
2. Delete the expected-set introspection completion path.
3. Delete the `UsageEntry` registry.
4. Simplify `keys.rs` and the catalog per the default-formatter
   rules.
5. Delete the chumsky dependency.

Tests cover behaviour throughout — every command has
existing tests asserting both successful parses and error
messages. The migration is safe because the test suite
guards regressions at each step.
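
The side-by-side routing and the "same `Command` as the chumsky parser" check from the per-command steps above might be wired up along these lines. Everything here is a stand-in: `legacy_parse` substitutes for the chumsky-backed parser, `walker_parse` for the new walker, and the `Command` variants and `MIGRATED` list are invented for the example.

```rust
#[derive(Debug, PartialEq)]
pub enum Command {
    Quit,
    Help,
    DropTable(String),
}

/// Stand-in for the existing chumsky-backed parser.
pub fn legacy_parse(input: &str) -> Result<Command, String> {
    match input.split_whitespace().collect::<Vec<_>>().as_slice() {
        ["quit"] => Ok(Command::Quit),
        ["help"] => Ok(Command::Help),
        ["drop", "table", name] => Ok(Command::DropTable((*name).to_string())),
        _ => Err(format!("cannot parse: {input}")),
    }
}

/// Stand-in for the new walker; it only knows the migrated commands.
pub fn walker_parse(input: &str) -> Result<Command, String> {
    match input.split_whitespace().collect::<Vec<_>>().as_slice() {
        ["quit"] => Ok(Command::Quit),
        _ => Err(format!("cannot parse: {input}")),
    }
}

/// Entry keywords already routed to the walker (step 3).
pub const MIGRATED: &[&str] = &["quit"];

fn entry_keyword(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}

fn is_migrated(input: &str) -> bool {
    let entry = entry_keyword(input);
    MIGRATED.iter().any(|k| *k == entry)
}

/// Route by entry keyword: migrated commands go to the walker, the rest
/// stay on the legacy path.
pub fn parse(input: &str) -> Result<Command, String> {
    if is_migrated(input) {
        walker_parse(input)
    } else {
        legacy_parse(input)
    }
}

/// Step 2's check: for migrated commands, both parsers must agree on
/// successes and on error text. Returns the inputs where they differ.
pub fn parity_failures(inputs: &[&str]) -> Vec<String> {
    let mut failures = Vec::new();
    for &input in inputs {
        if is_migrated(input) && walker_parse(input) != legacy_parse(input) {
            failures.push(input.to_string());
        }
    }
    failures
}
```

Migrating the next command then amounts to teaching `walker_parse` about it, adding its entry keyword to `MIGRATED`, and confirming that `parity_failures` stays empty over the existing test inputs.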

## Open questions (resolve at design-pass time)

- Should the registry be a const declaration or built at
  runtime (e.g., from a static slice)? Const composes with
  testing; runtime allows reuse across crates. The
  registry as it stands works as `const`.
- Should `extractor_role` annotations be mandatory or
  positional-fallback? Mandatory is explicit; positional
  is terser. Recommend positional with `extractor_role`
  as escape hatch.
- How do we represent multi-keyword sequences (`save as`,
  `on delete`)? As a nested `cont` chain, or as a
  composite keyword? Recommend nested `cont` — it falls
  out of the same data model and surfaces correctly in
  completion (after `save`, candidate `as`; after `on`,
  candidate `delete`).
- How do we expose grammar to external tooling (LSP,
  syntax highlighter for editor integration)? The
  registry as a single `pub static` is trivially
  introspectable; serialising it to JSON for external
  consumption is mechanical.

## References

- ADR-0001 — Language and TUI framework (chumsky choice)
- ADR-0019 — Friendly error layer and i18n catalog
- ADR-0020 — Tokenization layer for the DSL parser
- ADR-0021 — Parser-as-source-of-truth for H1a
- ADR-0022 — Ambient typing assistance (I3 + I4 unified)
- Round-5 session transcript — design discussion that
  produced the trie sketch and the critique of the
  current shape.