d6e138169f
Captures the architectural critique surfaced during round-5 manual testing — that adding a keyword or command currently requires edits in 7-10 files across parser, completion, usage registry, catalog, and tests — and the proposed direction: a single declarative trie registry that drives parse, completion, highlight, and usage rendering from one source. Status: Proposed. Not yet accepted. Filename carries the `-proposed-` segment so status is visible at directory-listing time; rename to `0023-unified-grammar-tree.md` on acceptance. Estimated cost: ~4 sessions, per-command migration. Why not now: feature backlog and bearable scatter cost. Right moment to execute when backlog quiets or scatter cost becomes visibly painful.
428 lines
16 KiB
Markdown
428 lines
16 KiB
Markdown
# ADR-0023: Unified declarative grammar tree (proposed direction)
|
|
|
|
## Status
|
|
|
|
**Proposed.**
|
|
|
|
Not yet accepted. Captures a researched direction for a future
|
|
refactor that supersedes the parts of ADR-0001 (chumsky as the
|
|
DSL parser), ADR-0019 (separated catalog declaration), ADR-0020
|
|
(lexer + keyword macro), ADR-0021 (per-command usage registry),
|
|
and ADR-0022 (completion via expected-set introspection,
|
|
highlighting via lexer) that this ADR identifies as accreted
|
|
rather than designed.
|
|
|
|
Filename carries the `-proposed-` segment so the status is
|
|
visible at directory listing time; on acceptance, rename to
|
|
`0023-unified-grammar-tree.md` via `git mv` (history
|
|
preserved).
|
|
|
|
## Context
|
|
|
|
### What hurt
|
|
|
|
The round-5 session removed the (small, accidental) `q` alias
|
|
for the `quit` command. Removing one keyword required edits in
|
|
ten places:
|
|
|
|
1. `define_keywords!` entry in `src/dsl/keyword.rs`
|
|
2. Parser combinator branch in `src/dsl/parser.rs`
|
|
3. `UsageEntry` row in `src/dsl/usage.rs::REGISTRY`
|
|
4. Hardcoded keyword array in `usage.rs::every_command_has_a_registry_entry` test
|
|
5. Hardcoded keyword array in `usage.rs::entry_keywords_alphabetised_returns_unique_sorted_commands` test
|
|
6. `KEYS_AND_PLACEHOLDERS` declaration in `src/friendly/keys.rs`
|
|
7. `parse.token.keyword.q` entry in `src/friendly/strings/en-US.yaml`
|
|
8. `help.cli_banner` prose in `en-US.yaml`
|
|
9. `help.in_app_body` prose in `en-US.yaml`
|
|
10. Hardcoded keyword array in `completion.rs::empty_input_offers_app_command_entry_keywords` test
|
|
|
|
Adding a brand-new app command in the same session required a
|
|
similar number of touches across the same files plus
|
|
`Command` enum extension, dispatch handler wiring, and a
|
|
per-command usage template entry. The pattern is: every new
|
|
command or keyword incurs a cross-file scatter of typically 7
|
|
files for a normal addition, 10+ when tests and help text
|
|
catch the change.
|
|
|
|
### Why this happened
|
|
|
|
The current architecture is the accretion of features added
|
|
across separate ADRs, each locally sensible:
|
|
|
|
- **ADR-0001** chose chumsky for parsing — a general-purpose
|
|
parser-combinator library oriented at programming-language
|
|
grammars with expression precedence, error recovery, and
|
|
ambiguous-grammar handling.
|
|
- **ADR-0019** introduced the i18n catalog as a flat YAML +
|
|
Rust-side validator (`keys.rs`) for two-sided typo
|
|
protection.
|
|
- **ADR-0020** added the lexer + `define_keywords!` macro
|
|
when the unified-token requirement bit. The macro
|
|
consolidated *keyword* definitions but didn't tackle the
|
|
broader command surface.
|
|
- **ADR-0021** added a per-command `UsageEntry` registry so
|
|
parse errors could surface usage templates.
|
|
- **ADR-0022** added completion by *introspecting* chumsky's
|
|
expected-token set at parse failure points, rather than by
|
|
consulting the grammar declaration directly.
|
|
|
|
Each step solved its presenting problem. None of them
|
|
restructured the grammar declaration to be the single source
|
|
of truth for completion + highlighting + parse + help + i18n.
|
|
|
|
Requirements for completion, highlighting, help, and i18n
|
|
were all known from project start. A design pass at the
|
|
beginning asking "what unified data structure carries all of
|
|
this?" would not have landed at the current scattered shape.
|
|
The trajectory is a process critique, not an inevitability.
|
|
|
|
### What chumsky earned
|
|
|
|
For the DSL we actually built — deterministic prefix-keyword
|
|
commands with a small set of clauses — chumsky's general-purpose
|
|
machinery is mostly unused:
|
|
|
|
- We have no arbitrary expression grammar (no arithmetic
|
|
precedence, no function calls, no recursion).
|
|
- We have no multi-error recovery requirement; we fail on the
|
|
first error and ask the user to fix it.
|
|
- We have no ambiguous-grammar handling needs.
|
|
|
|
Chumsky's `try_map` custom-error machinery is exercised (for
|
|
"unknown type", "tables need at least one column", flag
|
|
mutual-exclusion). These are pre-shape and post-shape
|
|
validators that fit naturally into chumsky's combinator
|
|
model. They could be expressed equally cleanly as per-node
|
|
validator functions in a trie-based design — the placement
|
|
matters less than the shape.
|
|
|
|
### What a unified grammar would look like
|
|
|
|
The proposed structure (sketched by the project owner during
|
|
the round-5 design discussion):
|
|
|
|
```js
|
|
commandGrammar = {
|
|
add: {
|
|
helpId: "add-command",
|
|
shortHelp: "Add structure to a table or relationship",
|
|
highlightType: "top-level-command",
|
|
cont: {
|
|
"1:n": {
|
|
highlightType: "sub-command",
|
|
cont: {
|
|
relationship: {
|
|
cont: {
|
|
from: {
|
|
cont: {
|
|
// qualified column slot: <Table>.<Column>
|
|
// bound to parent_table + parent_column
|
|
// through the command's extractor.
|
|
...
|
|
completionSource: "table-names",
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
},
|
|
import: {
|
|
helpId: "import-command",
|
|
cont: {
|
|
completionSource: "current-folder-zip-file-names",
|
|
// …
|
|
}
|
|
},
|
|
// …
|
|
}
|
|
```
|
|
|
|
This declaration carries everything the current system spreads
|
|
across many files: grammar shape (`cont`), completion sources
|
|
(`completionSource`), highlight classes (`highlightType`),
|
|
help references (`helpId`). The same declaration drives
|
|
completion (walk to current node, list its children), syntax
|
|
highlighting (each node's class), parse-error usage rendering
|
|
(walk to failure point, list valid continuations), and AST
|
|
construction (per-command extractor walks the matched path).
|
|
|
|
The shape generalises to most SQL surface a teaching tool
|
|
would expose. Where-clauses and similar reusable chunks can
|
|
be named and registered separately, then referenced by ID
|
|
from anywhere they're needed. True expression grammars
|
|
(arithmetic, function calls, precedence climbing) — if
|
|
they're ever needed — fit as opaque leaf nodes whose
|
|
*structure* the trie validates, with the actual interpretation
|
|
delegated to a downstream module or simply passed through to
|
|
the SQL engine.
|
|
|
|
## Proposed direction
|
|
|
|
### Data model
|
|
|
|
A single grammar registry, structurally similar to the
|
|
sketch above, declared once in Rust. Per node:
|
|
|
|
- `entry: &'static str` — the literal that selects this
|
|
branch (for keyword nodes) OR a typed slot descriptor (for
|
|
literal / identifier / completion-source nodes).
|
|
- `cont: &'static [Node]` — child nodes representing valid
|
|
continuations from this point. Empty for terminal nodes.
|
|
- `highlight: HighlightClass` — colour role for the input
|
|
pane and echo line. Inherits from parent if not specified.
|
|
- `completion_source: Option<CompletionSource>` — for
|
|
identifier slots, the schema-cache key or static list
|
|
that drives Tab candidates and known-set validity checks.
|
|
- `help_id: Option<&'static str>` — reference into the help
|
|
catalog (decoupled from grammar so wording changes don't
|
|
touch grammar).
|
|
- `validator: Option<NodeValidator>` — per-position
|
|
validator function (e.g., "this identifier must be a
|
|
valid new name", or "this slot can occur at most once").
|
|
- `extractor_role: Option<&'static str>` — names the role
|
|
this slot plays in the command's typed output (e.g.,
|
|
`"parent_table"`). Read by the command's extractor at
|
|
AST-construction time. Optional because the *positional*
|
|
shape of a command's tree is usually enough — the
|
|
extractor knows the command's structure and reads the
|
|
walked path in order.
|
|
|
|
Per command (top-level node):
|
|
|
|
- `ast_builder: fn(WalkedPath) -> Command` — walks the
|
|
matched path and produces the typed AST variant. Replaces
|
|
the per-command chumsky combinator's `.map(...)` closure.
|
|
- `dispatch: fn(&mut App, Command) -> Vec<Action>` — the
|
|
dispatch handler. Replaces the per-command `match` arm in
|
|
`dispatch_input` / `dispatch_app_command`.
|
|
- `help_id` — root help reference for the command family.
|
|
|
|
### Named sub-grammars
|
|
|
|
For composable chunks (where-clauses, projection lists,
|
|
qualified column references, value literals), the registry
|
|
supports named sub-grammars:
|
|
|
|
```rust
|
|
register_subgrammar("where_clause", &[
|
|
// structure declaration …
|
|
]);
|
|
|
|
// Referenced from any command:
|
|
SubgrammarRef("where_clause"),
|
|
```
|
|
|
|
The walker treats a `SubgrammarRef` as a transparent
|
|
expansion at parse / completion / highlight time. The
|
|
extractor reads the sub-grammar's matched path and applies
|
|
the sub-grammar's own AST-fragment builder.
|
|
|
|
### Walker functions
|
|
|
|
A single walker module exposes:
|
|
|
|
- `complete(input, cursor) -> Vec<Candidate>` — walk the
|
|
trie alongside the typed prefix; at the cursor's
|
|
position, return the union of (a) literal children of the
|
|
current node, (b) candidates from the active node's
|
|
`completion_source`. Replaces `completion.rs`.
|
|
- `highlight(input) -> Vec<StyledRun>` — walk producing the
|
|
highlight class per token range. Replaces the
|
|
ad-hoc lookups in `input_render.rs`.
|
|
- `parse(input) -> Result<Command, ParseError>` — walk
|
|
consuming tokens, running per-node `validator`s, applying
|
|
the command's `ast_builder` at completion. Replaces
|
|
`dsl::parser::parse_command`.
|
|
- `usage_at(input, position) -> UsageBlock` — walk to the
|
|
failure point, render the valid continuations as a
|
|
usage template. Replaces `usage::matched_entry`.
|
|
|
|
All four operations read the same registry.
|
|
|
|
### Value-leaf parsers
|
|
|
|
Literal types (number, string, date, bool) and identifier
|
|
shape validators (new-name checks) remain as small standalone
|
|
functions referenced by leaf nodes. They don't pretend to be
|
|
parser combinators — they're predicate-plus-builder pairs.
|
|
The chumsky machinery they currently use is shed.
|
|
|
|
### i18n integration
|
|
|
|
The catalog (en-US.yaml) stays. The `keys.rs`
|
|
`KEYS_AND_PLACEHOLDERS` validator stays. The
|
|
`parse.token.keyword.*` entries can collapse to a default
|
|
formatter (every keyword renders as `` `{literal}` `` unless
|
|
the catalog explicitly overrides for a specific keyword).
|
|
Adding a normal keyword no longer requires a catalog entry
|
|
unless its wording deviates from the default.
|
|
|
|
The grammar tree references `help_id` strings, not catalog
|
|
keys directly, so help wording lives in its own catalog
|
|
section without touching grammar declarations.
|
|
|
|
## Trade-offs
|
|
|
|
### What we give up
|
|
|
|
- **chumsky as the DSL parser.** Library dependency stays
|
|
if it's still used elsewhere (it isn't, currently). The
|
|
recovery and ambiguous-grammar features go unused, so
|
|
losing access to them costs nothing concrete.
|
|
- **Single-file grammar entry per command.** The current
|
|
per-command combinator in `parser.rs` was always a
|
|
separate function; in the new model the command's
|
|
`ast_builder` is colocated with its grammar declaration.
|
|
This is a gain, not a loss, but it means every command's
|
|
current parser function gets rewritten.
|
|
- **No automatic backtracking on alternative branches.**
|
|
The trie design is greedy (the first child node that
|
|
matches wins). For our deterministic grammar this is fine
|
|
— `drop column`, `drop relationship`, `drop table` are
|
|
disambiguated by their second keyword, so the walker
|
|
picks the right branch on the first token after `drop`.
|
|
Pathological grammars that require backtracking are out
|
|
of scope.
|
|
|
|
### What we gain
|
|
|
|
- **One block per command.** Adding a new command =
|
|
declare a top-level node with its `cont`, `ast_builder`,
|
|
`dispatch`, and `help_id`. No edits to a separate
|
|
registry, no edits to a separate catalog list, no edits
|
|
to a separate dispatch match, no edits to tests (which
|
|
iterate the registry).
|
|
- **Adding a keyword = one node literal.** No
|
|
`define_keywords!` macro entry, no `parse.token.keyword.*`
|
|
catalog entry (default formatter handles it), no
|
|
`keys.rs` declaration (the same default handles it).
|
|
- **Completion + highlight + parse + usage rendering all
|
|
come from one source.** Drift is structurally impossible
|
|
because they all walk the same tree.
|
|
- **Aliases as a single annotation.** A keyword node
|
|
declares `aliases: &["q"]` and the walker accepts any
|
|
of them; no new variant, no new dispatch wiring.
|
|
- **Tests focus on behaviour, not enumeration.** Tests
|
|
that previously asserted on hardcoded keyword lists
|
|
iterate the registry. Adding/removing a command leaves
|
|
test code untouched.
|
|
- **Documentation discoverability.** The grammar
|
|
registry IS the spec. Reading `commandGrammar` tells you
|
|
every command, every option, every continuation.
|
|
|
|
### Migration cost
|
|
|
|
Estimated at ~4 sessions:
|
|
|
|
- Session 1: design the walker + registry data model in
|
|
detail; build a stub with one command migrated end-to-end
|
|
alongside the existing chumsky path.
|
|
- Session 2: migrate the data-command family (create,
|
|
drop, add, rename, change, show, insert, update,
|
|
delete, replay). Tests at each step verify the
|
|
walker-driven parse produces the same `Command` as the
|
|
current chumsky parse.
|
|
- Session 3: migrate the app-command family (quit, help,
|
|
rebuild, save / save as, new, load, export, import,
|
|
mode, messages). Drop the parallel chumsky path.
|
|
- Session 4: clean up — remove dead modules
|
|
(`usage.rs::REGISTRY`, expected-set introspection in
|
|
`completion.rs`, the ad-hoc lookups in `input_render.rs`),
|
|
remove `keys.rs` entries that the default formatter now
|
|
covers, simplify the catalog.
|
|
|
|
Steady-state cost after migration: a new command is one
|
|
block. A new keyword is one node literal. A typo in either
|
|
fails the test suite because tests iterate the registry.
|
|
|
|
## Why not now
|
|
|
|
The project has a non-trivial feature backlog (Query DSL,
|
|
constraint management, V-series UX projects, A1 CI workflow,
|
|
multi-line input, readline shortcuts, undo/snapshot, …).
|
|
Doing this refactor now would freeze feature work for ~4
|
|
sessions and would interact disruptively with any in-flight
|
|
ADRs that touch grammar surfaces.
|
|
|
|
The "scatter cost" is bearable for the near-term command
|
|
count. Most commands are already in place; we're not
|
|
adding ten new ones in the next few sessions. Each new
|
|
command incurs ~7-file scatter; that's a modest
|
|
recurring tax, not a crisis.
|
|
|
|
The right moment to execute is when:
|
|
|
|
- Feature backlog quiets down, OR
|
|
- Cumulative scatter cost from new commands becomes
|
|
visibly painful, OR
|
|
- The grammar needs to extend in ways the current shape
|
|
fights against (e.g., a real SQL parser landing in
|
|
advanced mode would benefit from this restructuring
|
|
more than from another bolt-on).
|
|
|
|
Until then: leave it.
|
|
|
|
## Migration plan (when executed)
|
|
|
|
Per-command migration, not big-bang. The new walker runs
|
|
alongside the chumsky path during the transition. Each
|
|
command is migrated in sequence:
|
|
|
|
1. Declare the command's grammar node in the new registry.
|
|
2. Write its `ast_builder`. Verify it produces the same
|
|
`Command` variant as the chumsky parser for every
|
|
existing test input.
|
|
3. Route the command's entry keyword to the walker. The
|
|
chumsky parser's branch is gated off for that keyword.
|
|
4. Run the full test suite. If green, commit.
|
|
5. Move to the next command.
|
|
|
|
When all commands are migrated:
|
|
|
|
1. Delete the chumsky parser combinator module.
|
|
2. Delete the expected-set introspection completion path.
|
|
3. Delete the `UsageEntry` registry.
|
|
4. Simplify `keys.rs` and the catalog per the default-formatter
|
|
rules.
|
|
5. Delete the chumsky dependency.
|
|
|
|
Tests cover behaviour throughout — every command has
|
|
existing tests asserting both successful parses and error
|
|
messages. The migration is safe because the test suite
|
|
guards regressions at each step.
|
|
|
|
## Open questions (resolve at design-pass time)
|
|
|
|
- Should the registry be a const declaration or built at
|
|
runtime (e.g., from a static slice)? Const composes with
|
|
testing; runtime allows reuse across crates. The
|
|
registry as it stands works as `const`.
|
|
- Should `extractor_role` annotations be mandatory or
|
|
positional-fallback? Mandatory is explicit; positional
|
|
is terser. Recommend positional with `extractor_role`
|
|
as escape hatch.
|
|
- How do we represent multi-keyword sequences (`save as`,
|
|
`on delete`)? As a nested `cont` chain, or as a
|
|
composite keyword? Recommend nested `cont` — it falls
|
|
out of the same data model and surfaces correctly in
|
|
completion (after `save`, candidate `as`; after `on`,
|
|
candidate `delete`).
|
|
- How do we expose grammar to external tooling (LSP,
|
|
syntax highlighter for editor integration)? The
|
|
registry as a single `pub static` is trivially
|
|
introspectable; serialising it to JSON for external
|
|
consumption is mechanical.
|
|
|
|
## References
|
|
|
|
- ADR-0001 — Language and TUI framework (chumsky choice)
|
|
- ADR-0019 — Friendly error layer and i18n catalog
|
|
- ADR-0020 — Tokenization layer for the DSL parser
|
|
- ADR-0021 — Parser-as-source-of-truth for H1a
|
|
- ADR-0022 — Ambient typing assistance (I3 + I4 unified)
|
|
- Round-5 session transcript — design discussion that
|
|
produced the trie sketch and the critique of the
|
|
current shape.
|