# ADR-0021: Parser-as-source-of-truth for H1a (per-command usage in parse errors)

## Status

**Mechanism superseded by ADR-0024; H1a scope continued in ADR-0042.**
Accepted then superseded.

> **Superseding note (2026-06-03).** The *intent* of this ADR (surface
> the grammar of the command at the point of error, not just the next
> token) survived and is largely delivered. The *mechanism* did not.
> This ADR specifies a `chumsky`-based design: a separate `UsageEntry`
> registry in `src/dsl/usage.rs`, `parse.token.*` catalog keys driven
> by chumsky's `RichPattern` expected sets, and a renderer over
> chumsky output. ADR-0024 (unified grammar tree) replaced chumsky with
> a scannerless walker and **folded usage info onto the grammar nodes
> themselves**: `usage_ids` live on each `CommandNode`, the per-command
> `parse.usage.*` templates and the `parse.available_commands` fallback
> ship as designed here, and the expected-set vocabulary
> (`format_expectation` in `parser.rs`) renders directly from walker
> `Expectation` variants, with no `UsageEntry` registry, no `parse.token.*`
> keys, and no `src/dsl/usage.rs`.
>
> So: the §1 usage registry, §3 "deepest consumed keyword" mechanism,
> §4 `parse.token.*` catalog, and §7 validator details below describe
> code that does not exist. What shipped equivalently: §1's per-command
> templates (as `usage_ids` + `parse.usage.*`), §2's three-block render
> (echo+caret / structural error / usage), and §5's available-commands
> fallback. **ADR-0042 picks up H1a from here**: it records what is
> actually shipped and defines the remaining systematic-pass scope
> against the grammar-tree architecture. Read ADR-0042 for the live
> plan; this ADR remains as the design rationale for the pedagogy goal.

---

*Original status (historical):* Accepted. Builds on ADR-0020 (tokenization layer). Addresses H1a from `requirements.md`, the parse-error pedagogy gap that ADR-0019's friendly-error layer left untouched.
Cross-references ADR-0019 (i18n catalog conventions; H1a's output goes through the same catalog) and ADR-0009 (DSL syntax conventions; usage templates render in the project's documented surface form).

## Context

ADR-0019 dramatically improved engine-error wording. Parse-error wording is now the visibly-weakest user surface, and the user-reported gap was concrete: typing `create` produces

```
parse error: after `create`, expected `table`
```

The error is *structurally* correct (chumsky has consumed `create` and is now looking for the next required token) but *pedagogically* silent. A learner who got this far typed `create` because they'd been told that's how new tables are made; what they need next is the shape of the command, not a single missing-token pointer.

Comparable observations apply across the whole DSL surface:

- `add` → expected `column` or `1` (uninformative; user needs the shape of `add column …` AND `add 1:n relationship …`).
- `update Customers` → expected `set` (true; but `update`'s full grammar with `set …`, `where …`, `--all-rows` is what the user wants illustrated).
- `frobulate Customers` → expected one of `create`, `drop`, `add`, `rename`, `change`, `show`, `insert`, `update`, `delete`, `replay` (true after ADR-0020; the available-commands list is now informative, but the no-prefix case wants its own framing: "available commands" rather than "expected").

H1a's job is to close that gap by surfacing the **grammar** of the command at the point of error, not just the next token.

### What ADR-0020 supplies

ADR-0020 lands the lexer + parser-over-tokens architecture. What that buys H1a:

- **Aggregated `expected` sets at the failure point** (top-level `choice` failures now list every command-starting keyword, not just one). The user-visible "available commands" list becomes correct without any work in this ADR.
- **Token-kind error patterns** (`RichPattern` over tokens instead of over characters).
  Each pattern renders via a stable catalog key, with no per-character humanising.
- **A canonical entry-token for each command** (the first `Keyword(_)` consumed). H1a keys per-command usage templates off this token.

### What this ADR adds on top

- A registry of per-command **usage templates** (one declaration per command).
- A renderer that composes the parse error with: caret + structural error wording + matching usage template(s).
- New catalog keys under `parse.usage.*` (templates) and `parse.token.*` (single-token rendering for expected-set joins).
- A "no commands consumed" fallback that renders an available-commands list under a different prefix ("available commands:" rather than "expected:") for the zero-prefix case.

## Decision

### 1. Per-command usage template registry

Each command parser is paired with a `UsageEntry`:

```rust
pub struct UsageEntry {
    /// First keyword that distinguishes this command. Used
    /// as the registry key.
    pub entry: Keyword,
    /// Catalog key for the grammar template body (under
    /// `parse.usage.*`). One key per command.
    pub catalog_key: &'static str,
}
```

The registry is a `&'static [UsageEntry]` declared in one place (`src/dsl/usage.rs`). Lookup: given a consumed entry keyword, return all entries whose `entry == keyword`. For `Keyword::Add` the registry returns the `add column` and `add 1:n relationship` entries; for `Keyword::Drop` it returns `drop table`, `drop column`, `drop relationship`; for unique-entry keywords (e.g. `Keyword::Create` today) it returns one.

The catalog key is what gets translated. Template bodies live in `src/friendly/strings/en-US.yaml` under `parse.usage.*`:

```yaml
parse:
  usage:
    create_table: "create table with pk [:[, ...]]"
    drop_table: "drop table "
    add_column: "add column [to] [table] : ()"
    add_relationship: |
      add 1:n relationship [as ] from .to .[on delete ] [on update ] [--create-fk]
    rename_column: "rename column [in] [table] : to "
    change_column: |
      change column [in] [table] : () [--force-conversion | --dont-convert]
    show_data: "show data "
    show_table: "show table "
    insert: "insert into [([, ...])] [values] ([, ...])"
    update: "update set =[, ...] (where = | --all-rows)"
    delete: "delete from (where = | --all-rows)"
    drop_column: "drop column [from] [table] : "
    drop_relationship: |
      drop relationship
      drop relationship from .to .
    replay: "replay | replay ''"
```

(Wording is illustrative; exact phrasing settled at implementation time. The bracket convention `[...]` for optional parts and angle-bracket `<...>` for placeholders matches ADR-0009's documentation surface.)

### 2. The renderer composes three blocks

A parse error renders as:

```
running:
  ^ ← caret (existing, unchanged)
parse error:
usage: ← when multiple entries share the entry keyword
```

Block 1 (the echo + caret) is unchanged from today.

Block 2 is the structural or content error. ADR-0020 guarantees the structural error is now properly aggregated ("expected `data` or `table`" not "expected `table`"). The content errors (unknown type, mutually-exclusive flags) are unchanged in voice.

Block 3 (usage:) is new. It is rendered if and only if **at least one keyword token was consumed** before the parser failed AND that keyword is a registered entry. If no keyword was consumed (e.g., `frobulate Customers`, where `frobulate` is an `Identifier`, not a `Keyword`), Block 3 is replaced with the no-prefix fallback (§5). If multiple entries match (e.g., the `add` family), all are listed under a single `usage:` prefix, one per line.

### 3. Identifying the consumed entry keyword

The parser surfaces, alongside the `ParseError`, the **deepest successfully-consumed keyword token**. Mechanism:

- `parse_tokens` returns `(Result, ParseDiagnostics)` where `ParseDiagnostics` carries the furthest position chumsky reached AND a snapshot of the consumed prefix.
- The renderer walks the consumed prefix backward to find the first `Keyword(_)` token. (Almost always the first token, but a future grammar where a command starts with a literal (none today) would still resolve correctly.)

This logic lives in `src/dsl/usage.rs::matched_entry()` so the registry and the lookup sit together.

### 4. `parse.token.*`: single-token catalog vocabulary

Chumsky's expected-set rendering needs a name for each token kind. Today `humanise()` hand-codes these (`describe_pattern` returns "`create`", "identifier", etc.). ADR-0021 moves the vocabulary into the catalog:

```yaml
parse:
  token:
    # Keywords: one entry per Keyword enum variant.
    keyword.create: "`create`"
    keyword.table: "`table`"
    keyword.with: "`with`"
    # ... one per Keyword variant ...

    # Punctuation.
    punct.colon: "`:`"
    punct.open_paren: "`(`"
    punct.close_paren: "`)`"
    punct.comma: "`,`"
    punct.equals: "`=`"
    punct.dot: "`.`"

    # Token-class labels.
    identifier: "identifier"
    number: "number"
    string_literal: "string literal"
    flag: "flag (--name)"
    end_of_input: "end of input"

    # Lexer-error tokens.
    error.unterminated_string: "unterminated string starting at column {column}"
    error.unknown_char: "unrecognised character {found}"
```

Joining ("`a`, `b`, or `c`") stays in code (`oxford_or` from the current humanise machinery, lifted intact). Wording of each token is in the catalog.

`parse.error` (existing wrapper key) stays. Its `{detail}` placeholder is filled by:

```
{consumed_prefix} expected {oxford_or(expected)}, found {found_token}
```

with each piece sourced from the catalog and joined in code.

`parse.caret` (existing) and `parse.empty` (existing) unchanged.

### 5. No-prefix fallback: "available commands"

When the parser fails with **no keyword consumed**, the "expected" set lists every top-level command-starting keyword. That's correct but the framing should be "available commands" rather than "expected".
Renderer detects this case (consumed-keyword count == 0) and substitutes Block 3 with:

```
available commands: create, drop, add, rename, change, show, insert, update, delete, replay
```

via a new catalog key:

```yaml
parse:
  available_commands: "available commands: {commands}"
```

The list is the alphabetised set of `entry` keywords from the usage registry, each rendered via its `parse.token.keyword.*` catalog entry (so the strings are catalog-sourced, not hard-coded).

This case only fires when the user typed something the parser couldn't classify as any known command keyword, the "frobulate Customers" case. It's both rarer and more useful than the with-prefix case: a user this lost benefits more from the full menu than from a missing-token pointer.

### 6. Anchor-phrase compliance (ADR-0019 §10)

ADR-0019's anchor-phrase list contains nine substrings the catalog commits to keeping stable. None are parse-error-specific, so this ADR doesn't add to the list. The existing parser test that asserts on "unknown type" and "expected one of" substrings stays; those come from `Type::from_str`'s custom error message which ADR-0020 §4 keeps unchanged.

The current structural-error tests assert on substrings like "after `show data`", "expected identifier", "found end of input", "after `change column Rich`", "expected `:`". The new render shape preserves all of these: the rendering template is `{prefix} expected {set}, found {token}` and the prefix / set / token come from the catalog with the same wording. Tests should port unchanged or with at most minor adjustments.

### 7. Catalog validator covers the new keys

ADR-0019 §8.6's `KEYS_AND_PLACEHOLDERS` validator extends to cover:

- Every `parse.usage.` key referenced from the registry exists.
- Every `parse.token.keyword.` key for every `Keyword` enum variant exists.
- Every `parse.token.punct.` key for every `Punct` variant exists.
- The `parse.token.{identifier, number, string_literal, flag, end_of_input}` keys exist.
- The `parse.token.error.*` keys exist for every `LexErrorKind` variant.
- The `parse.available_commands` key exists.
- No format specifiers (already enforced).
- No engine vocabulary (already enforced).

### 8. The `usage:` block respects the verbosity setting?

No. The `messages (short|verbose)` setting (ADR-0019) governs *engine-error* verbosity (whether to render the hint block of a `FriendlyError`). Parse errors don't go through `FriendlyError`; they have their own render path, and the usage block is always shown.

Rationale: a learner toggling to `messages short` is signalling they recognise the engine-error patterns and want less explanation around those; they're not signalling that they want less parse-help. Parse errors mean the user couldn't even formulate a runnable command; that's exactly the moment to maximise pedagogical surface, regardless of the engine-error verbosity preference.

If experience shows this is wrong, a future amendment can gate the usage block on a separate setting. Doesn't need to be designed now.

## Out of scope

1. **Tab completion (I3) and syntax highlighting (I4)** themselves. ADR-0020 §9-10 commits to the parser contract; ADR-0021 doesn't extend it.
2. **Schema-aware suggestions** ("did you mean `Customers`?" when the user typed `Customrs`). Useful but a separate feature; would land in I3 territory (completion + spell check share a candidate list).
3. **Suggested fixes** ("change `crete` to `create`"). Same bucket as schema-aware suggestions.
4. **Multi-error reporting.** Today and after this ADR, the parser reports the first error and stops. Recovery-based multi-error parsing is out of scope and re-opens with I3's ADR (ADR-0020 §11).
5. **Persisting the verbosity setting** (which doesn't affect parse errors anyway, per §8). ADR-0019 deferred it to a future settings ADR.
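
As a rough illustration of the Decision sections, the §1 registry lookup and the §4 `oxford_or` join could look something like the following. This is a minimal sketch of the design as originally specified, not shipped code (per the superseding note, no `UsageEntry` registry exists). The `Keyword` enum is cut down to three variants, and `REGISTRY` and `usage_for` are hypothetical names introduced here; only `UsageEntry`, its two fields, the name `oxford_or`, and the catalog keys come from this ADR.

```rust
// Sketch only: a cut-down model of the §1 usage registry and the §4 join.
// `REGISTRY` and `usage_for` are hypothetical names, not project code.

/// Reduced stand-in for the DSL's `Keyword` enum (assumption: three variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Keyword {
    Create,
    Add,
    Drop,
}

/// Same shape as the `UsageEntry` struct in §1.
#[derive(Debug, PartialEq, Eq)]
pub struct UsageEntry {
    /// First keyword that distinguishes this command.
    pub entry: Keyword,
    /// Catalog key for the template body under `parse.usage.*`.
    pub catalog_key: &'static str,
}

/// One row per command; several rows may share an entry keyword.
pub static REGISTRY: &[UsageEntry] = &[
    UsageEntry { entry: Keyword::Create, catalog_key: "parse.usage.create_table" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_column" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_relationship" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_table" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_column" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_relationship" },
];

/// Return the catalog keys of every entry registered under `keyword`,
/// in declaration order.
pub fn usage_for(keyword: Keyword) -> Vec<&'static str> {
    REGISTRY
        .iter()
        .filter(|e| e.entry == keyword)
        .map(|e| e.catalog_key)
        .collect()
}

/// Join already-rendered token names as "a", "a or b", or "a, b, or c".
pub fn oxford_or(items: &[&str]) -> String {
    match items {
        [] => String::new(),
        [only] => only.to_string(),
        [first, second] => format!("{first} or {second}"),
        [init @ .., last] => format!("{}, or {}", init.join(", "), last),
    }
}
```

Under these assumptions, `usage_for(Keyword::Add)` yields both `add` templates, which is what lets Block 3 list the whole family, and `oxford_or` covers the one-, two-, and many-item wordings quoted in §2 and §4. Returning catalog keys rather than rendered strings keeps translation in the catalog layer, in line with §1.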
## Consequences

### Positive

- **Per-command usage at point of failure.** A learner who types `create` sees the full `create table` grammar instead of "expected `table`". The user-reported gap closes.
- **Aggregated `available commands` for cold starts.** `frobulate Customers` now lists the ten command-starting keywords under a sensible framing.
- **Vocabulary lives in the catalog, not in code.** Renaming a keyword's user-facing wording is one YAML edit. Adding a new keyword adds two lines (registry + token-name key); the validator catches both if forgotten.
- **The render path simplifies.** `humanise()` shrinks to a small composer over catalog lookups: no per-character description, no `RichPattern` walking, no prefer-custom-over-structural switching (the latter becomes "render the structural error and append the usage template").
- **Composes with ADR-0019's `FriendlyError`.** Engine errors and parse errors are rendered through different paths but both go through the catalog, so vocabulary drift between them is impossible.

### Costs

- **A second registry to keep in sync** with the parser. The validator (§7) catches missing usage entries / missing token keys at test time, but adding a new command means three steps (parser combinator, usage-registry entry, catalog YAML edit). Mitigation: a unit test asserts every command in the parser has a registry entry (catches forgotten entries; matches the friendly-module pattern).
- **Catalog grows by ~30-40 entries** (one usage template per command, one keyword name per `Keyword` variant, a handful of token-class names, a handful of error names). Each entry is one line of YAML; total catalog grows from ~170 entries to ~210. Within budget.
- **Wording iteration** on the usage templates will probably happen post-merge. This is normal for pedagogical text and the catalog makes it cheap.

### Neutral

- **Public parser API is unchanged.** `parse_command(&str)` signature stable.
  The new `lex` and `parse_tokens` functions exposed by ADR-0020 are the I3/I4 hook; ADR-0021 doesn't add to that surface.
- **`AppEvent` shape unchanged.** Parse errors continue to flow through `dispatch_dsl`'s existing path (push echo, push caret, push error). This ADR's render changes are internal to that function plus the `t!()` calls inside it.

## Implementation notes

### Order of operations (within the joint ADR-0020 + ADR-0021 implementation session)

1. Land ADR-0020 (lexer + parser refactor + minimal humaniser).
2. Add `src/dsl/usage.rs` with the registry struct, the static table, and `matched_entry()`.
3. Populate `parse.usage.*` and `parse.token.*` catalog sections.
4. Extend `friendly::keys::KEYS_AND_PLACEHOLDERS` with the new keys.
5. Rewrite `dispatch_dsl`'s error-render arm in `app.rs` to compose the three blocks per §2 (or §5 fallback).
6. Add tests:
   - Unit: every registered usage entry resolves through the catalog. Every `Keyword` variant has a `parse.token.keyword.*` entry.
   - Integration (`tests/parse_error_pedagogy.rs`, new): `create`, `add`, `update Customers`, `frobulate Customers`, `create table` (no PK clause), `insert into T` (no values), each producing the expected three-block output.
7. Update or port the two existing structural-error tests in `parser.rs::tests` to the new render shape.

### Things that interact subtly

- **The "deepest consumed keyword" mechanism** (§3) walks the prefix once per parse failure. Cheap; no perf concern. But it must not pick up keywords from inside content that is itself part of a partial AST (e.g. an identifier the user is typing that happens to be the first letters of a keyword); since the lexer commits to identifier-vs-keyword classification before the parser sees tokens, this isn't a real risk. Documented inline.
- **Multiple usage entries per `add` / `drop`** are rendered under one `usage:` prefix per §2.
  This is one of the pedagogically-best parts of the change: the user gets the full family rather than guessing which sibling they wanted.
- **`replay`'s special-case parsing** (ADR-0020 §6) is invisible to the usage layer. The user typing `replay` with no path gets the `parse.usage.replay` template.
- **`messages` is an app-level command, not a DSL command**, so it is not in the parser registry and doesn't appear in `available commands:`. Same posture as `mode`, `help`, `quit`. Documented in the registry's prelude.
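
The "deepest consumed keyword" point can be made concrete with a toy model. The sketch below is illustrative only and is not the `matched_entry()` function named in §3: `classify` and `entry_keyword` are hypothetical names, the keyword list is cut down to four words, and it reads §3 as "take the earliest `Keyword(_)` in the consumed prefix". It also assumes lowercase, whole-word keyword matching, which this ADR does not specify.

```rust
// Sketch only: a toy model of §3's entry-keyword lookup.
// `classify` and `entry_keyword` are hypothetical names, and the
// keyword list is reduced to four words for illustration.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Keyword {
    Create,
    Add,
    Column,
    Table,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Token {
    Keyword(Keyword),
    Identifier(String),
}

/// Lexer-side classification: a word is a keyword only on an exact match
/// (assumption: lowercase, whole-word matching). Anything else, including
/// a partially typed keyword such as "cre", is an identifier.
pub fn classify(word: &str) -> Token {
    match word {
        "create" => Token::Keyword(Keyword::Create),
        "add" => Token::Keyword(Keyword::Add),
        "column" => Token::Keyword(Keyword::Column),
        "table" => Token::Keyword(Keyword::Table),
        other => Token::Identifier(other.to_string()),
    }
}

/// Earliest keyword in the consumed prefix, or `None` when the prefix
/// holds no keyword at all (the §5 "available commands" case).
pub fn entry_keyword(consumed: &[Token]) -> Option<Keyword> {
    consumed.iter().find_map(|t| match t {
        Token::Keyword(k) => Some(*k),
        Token::Identifier(_) => None,
    })
}
```

With this model, the prefix for `add column` resolves to `Keyword::Add`, while `frobulate Customers` lexes to two identifiers and resolves to `None`. Because classification happens in `classify` before the lookup runs, a half-typed keyword can never be mistaken for an entry keyword, which is the property the first bullet of this section relies on.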