ADR-0021: Parser-as-source-of-truth for H1a (per-command usage in parse errors)
Status
Mechanism superseded by ADR-0024; H1a scope continued in ADR-0042. Accepted then superseded.
Superseding note (2026-06-03). The intent of this ADR (surface the grammar of the command at the point of error, not just the next token) survived and is largely delivered. The mechanism did not. This ADR specifies a chumsky-based design: a separate UsageEntry registry in src/dsl/usage.rs, parse.token.* catalog keys driven by chumsky's RichPattern<Token> expected sets, and a renderer over chumsky output. ADR-0024 (unified grammar tree) replaced chumsky with a scannerless walker and folded usage info onto the grammar nodes themselves: usage_ids live on each CommandNode, the per-command parse.usage.* templates and the parse.available_commands fallback ship as designed here, and the expected-set vocabulary (format_expectation in parser.rs) renders directly from walker Expectation variants. There is no UsageEntry registry, no parse.token.* keys, and no src/dsl/usage.rs.
So: the §1 usage registry, §3 "deepest consumed keyword" mechanism, §4 parse.token.* catalog, and §7 validator details below describe code that does not exist. What shipped equivalently: §1's per-command templates (as usage_ids + parse.usage.*), §2's three-block render (echo+caret / structural error / usage), and §5's available-commands fallback. ADR-0042 picks up H1a from here: it records what is actually shipped and defines the remaining systematic-pass scope against the grammar-tree architecture. Read ADR-0042 for the live plan; this ADR remains as the design rationale for the pedagogy goal.
Original status (historical): Accepted.
Builds on ADR-0020 (tokenization layer). Addresses H1a from
requirements.md — the parse-error pedagogy gap that
ADR-0019's friendly-error layer left untouched.
Cross-references ADR-0019 (i18n catalog conventions; H1a's output goes through the same catalog) and ADR-0009 (DSL syntax conventions; usage templates render in the project's documented surface form).
Context
ADR-0019 dramatically improved engine-error wording.
Parse-error wording is now the visibly-weakest user surface —
the user-reported gap was concrete: typing create produces
parse error: after `create`, expected `table`
The error is structurally correct (chumsky has consumed
create and is now looking for the next required token) but
pedagogically silent. A learner who got this far typed
create because they'd been told that's how new tables are
made; what they need next is the shape of the command, not
a single missing-token pointer.
Comparable observations apply across the whole DSL surface:
- add → expected column or 1 (uninformative; user needs the shape of add column … AND add 1:n relationship …).
- update Customers → expected set (true; but update's full grammar with set …, where …, --all-rows is what the user wants illustrated).
- frobulate Customers → expected one of create, drop, add, rename, change, show, insert, update, delete, replay (true after ADR-0020; the available-commands list is now informative, but the no-prefix case wants its own framing: "available commands" rather than "expected").
H1a's job is to close that gap by surfacing the grammar of the command at the point of error, not just the next token.
What ADR-0020 supplies
ADR-0020 lands the lexer + parser-over-tokens architecture. What that buys H1a:
- Aggregated expected sets at the failure point (top-level choice failures now list every command-starting keyword, not just one). The user-visible "available commands" list becomes correct without any work in this ADR.
- Token-kind error patterns (RichPattern<Token> instead of RichPattern<char>). Each pattern renders via a stable catalog key, with no per-character humanising.
- A canonical entry-token for each command (the first Keyword(_) consumed). H1a keys per-command usage templates off this token.
What this ADR adds on top
- A registry of per-command usage templates (one declaration per command).
- A renderer that composes the parse error with: caret + structural error wording + matching usage template(s).
- New catalog keys under parse.usage.* (templates) and parse.token.* (single-token rendering for expected-set joins).
- A "no commands consumed" fallback that renders an available-commands list under a different prefix ("available commands:" rather than "expected:") for the zero-prefix case.
Decision
1. Per-command usage template registry
Each command parser is paired with a UsageEntry:
pub struct UsageEntry {
    /// First keyword that distinguishes this command. Used
    /// as the registry key.
    pub entry: Keyword,
    /// Catalog key for the grammar template body (under
    /// `parse.usage.*`). One key per command.
    pub catalog_key: &'static str,
}
The registry is a &'static [UsageEntry] declared in one
place (src/dsl/usage.rs). Lookup: given a consumed entry
keyword, return all entries whose entry == keyword. For
Keyword::Add the registry returns the add column and
add 1:n relationship entries; for Keyword::Drop it
returns drop table, drop column, drop relationship;
for unique-entry keywords (e.g. Keyword::Create today) it
returns one.
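The lookup described above can be sketched as follows. This is a minimal illustration of the design as specified here, not shipped code (per the superseding note, the registry never landed). The three-variant Keyword enum, the USAGE_REGISTRY name, and the entries_for function are assumptions made for the example; only UsageEntry and the catalog key names come from this ADR.

```rust
/// Hypothetical subset of the DSL keyword enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keyword {
    Create,
    Drop,
    Add,
}

pub struct UsageEntry {
    /// First keyword that distinguishes this command (the registry key).
    pub entry: Keyword,
    /// Catalog key for the grammar template body.
    pub catalog_key: &'static str,
}

/// One row per command. Several rows may share the same entry keyword.
/// The table name is an assumption; the ADR only fixes its type.
pub static USAGE_REGISTRY: &[UsageEntry] = &[
    UsageEntry { entry: Keyword::Create, catalog_key: "parse.usage.create_table" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_table" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_column" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_relationship" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_column" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_relationship" },
];

/// Return every entry registered under `keyword`, in declaration order.
pub fn entries_for(keyword: Keyword) -> Vec<&'static UsageEntry> {
    USAGE_REGISTRY
        .iter()
        .filter(|e| e.entry == keyword)
        .collect()
}
```

A linear scan is assumed to be sufficient because the table holds one row per command and is only consulted on parse failure.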
The catalog key is what gets translated. Template bodies
live in src/friendly/strings/en-US.yaml under
parse.usage.*:
parse:
  usage:
    create_table: "create table <Name> with pk [<col>:<type>[, ...]]"
    drop_table: "drop table <Name>"
    add_column: "add column [to] [table] <Table>: <Name> (<Type>)"
    add_relationship: |
      add 1:n relationship [as <Name>]
      from <Parent>.<col> to <Child>.<col>
      [on delete <action>] [on update <action>]
      [--create-fk]
    rename_column: "rename column [in] [table] <Table>: <Old> to <New>"
    change_column: |
      change column [in] [table] <Table>: <Name> (<Type>)
      [--force-conversion | --dont-convert]
    show_data: "show data <Table>"
    show_table: "show table <Table>"
    insert: "insert into <Table> [(<col>[, ...])] [values] (<value>[, ...])"
    update: "update <Table> set <col>=<value>[, ...] (where <col>=<value> | --all-rows)"
    delete: "delete from <Table> (where <col>=<value> | --all-rows)"
    drop_column: "drop column [from] [table] <Table>: <Name>"
    drop_relationship: |
      drop relationship <Name>
      drop relationship from <Parent>.<col> to <Child>.<col>
    replay: "replay <path> | replay '<path with spaces>'"
(Wording is illustrative; exact phrasing settled at
implementation time. The bracket convention [...] for
optional parts and angle-bracket <...> for placeholders
matches ADR-0009's documentation surface.)
2. The renderer composes three blocks
A parse error renders as:
running: <user input>
         ^                 ← caret (existing, unchanged)
parse error: <structural-or-content message>
usage: <template1>
       <template2>         ← when multiple entries share the entry keyword
Block 1 (the echo + caret) is unchanged from today.
Block 2 is the structural or content error. ADR-0020
guarantees the structural error is now properly aggregated
("expected data or table" not "expected table"). The
content errors (unknown type, mutually-exclusive flags) are
unchanged in voice.
Block 3 (usage:) is new. It is rendered if and only if at
least one keyword token was consumed before the parser
failed AND that keyword is a registered entry. If no keyword
was consumed (e.g., frobulate Customers, where frobulate
is an Identifier, not a Keyword), Block 3 is replaced
with the no-prefix fallback (§5).
If multiple entries match (e.g., the add family), all are
listed under a single usage: prefix, one per line.
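The composition rule above can be sketched as a small function. This is an illustration under stated assumptions, not the project's renderer: the Block3 enum and compose_parse_error are hypothetical names, the echo and caret lines are taken as already rendered (Block 1 is unchanged by this ADR), and the "parse error:", "usage:", and "available commands:" prefixes are hard-coded here although the design sources them from the catalog.

```rust
/// Block 3 of the render: either the usage templates for the consumed
/// entry keyword, or the no-prefix fallback from section 5.
/// (Hypothetical type, introduced for this sketch.)
pub enum Block3<'a> {
    Usage(&'a [&'a str]),
    AvailableCommands(&'a [&'a str]),
}

/// Compose the three blocks into output lines.
/// `echo` and `caret` are passed in pre-rendered.
pub fn compose_parse_error(
    echo: &str,
    caret: &str,
    message: &str,
    block3: &Block3,
) -> Vec<String> {
    let mut lines = vec![
        echo.to_string(),
        caret.to_string(),
        format!("parse error: {message}"),
    ];
    match block3 {
        Block3::Usage(templates) => {
            for (i, template) in templates.iter().enumerate() {
                // The first template carries the prefix; siblings align under it.
                let prefix = if i == 0 { "usage: " } else { "       " };
                lines.push(format!("{prefix}{template}"));
            }
        }
        Block3::AvailableCommands(commands) => {
            lines.push(format!("available commands: {}", commands.join(", ")));
        }
    }
    lines
}
```

The sketch treats each template as a single line; multi-line templates such as the add 1:n relationship form would need their continuation lines indented as well.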
3. Identifying the consumed entry keyword
The parser surfaces, alongside the ParseError, the
deepest successfully-consumed keyword token. Mechanism:
- parse_tokens returns (Result<Command, ParseError>, ParseDiagnostics) where ParseDiagnostics carries the furthest position chumsky reached AND a snapshot of the consumed prefix.
- The renderer walks the consumed prefix backward to find the first Keyword(_) token. (Almost always the first token, but a future grammar where a command starts with a literal, of which there are none today, would still resolve correctly.)
This logic lives in src/dsl/usage.rs::matched_entry() so
the registry and the lookup sit together.
4. parse.token.* — single-token catalog vocabulary
Chumsky's expected-set rendering needs a name for each token
kind. Today humanise() hand-codes these
(describe_pattern returns "create", "identifier", etc.).
ADR-0021 moves the vocabulary into the catalog:
parse:
  token:
    # Keywords: one entry per Keyword enum variant.
    keyword.create: "`create`"
    keyword.table: "`table`"
    keyword.with: "`with`"
    # ... one per Keyword variant ...
    # Punctuation.
    punct.colon: "`:`"
    punct.open_paren: "`(`"
    punct.close_paren: "`)`"
    punct.comma: "`,`"
    punct.equals: "`=`"
    punct.dot: "`.`"
    # Token-class labels.
    identifier: "identifier"
    number: "number"
    string_literal: "string literal"
    flag: "flag (--name)"
    end_of_input: "end of input"
    # Lexer-error tokens.
    error.unterminated_string: "unterminated string starting at column {column}"
    error.unknown_char: "unrecognised character {found}"
Joining ("a, b, or c") stays in code (oxford_or from
the current humanise machinery, lifted intact). Wording of
each token is in the catalog.
parse.error (existing wrapper key) stays. Its {detail}
placeholder is filled by:
{consumed_prefix} expected {oxford_or(expected)}, found {found_token}
— each piece sourced from the catalog, joined in code.
parse.caret (existing) and parse.empty (existing)
unchanged.
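The split between catalog wording and in-code joining can be illustrated with a short sketch. The oxford_or name comes from this ADR, but its body here is an assumption about what the existing helper does, based on the "a, b, or c" shape quoted above; parse_error_detail is a hypothetical name for the step that fills the {detail} placeholder.

```rust
/// Join alternatives as "a", "a or b", or "a, b, or c".
/// Assumed behaviour; the real helper lives in the humanise machinery.
pub fn oxford_or(items: &[&str]) -> String {
    match items {
        [] => String::new(),
        [only] => only.to_string(),
        [a, b] => format!("{a} or {b}"),
        [init @ .., last] => format!("{}, or {last}", init.join(", ")),
    }
}

/// Fill the detail template
/// `{consumed_prefix} expected {oxford_or(expected)}, found {found_token}`.
/// Each argument is assumed to be catalog-sourced text already.
pub fn parse_error_detail(consumed_prefix: &str, expected: &[&str], found: &str) -> String {
    format!(
        "{consumed_prefix} expected {}, found {found}",
        oxford_or(expected)
    )
}
```

Keeping the join in code means the catalog only ever holds single-token wording, so a translation does not have to encode list grammar in a template.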
5. No-prefix fallback: "available commands"
When the parser fails with no keyword consumed, the "expected" set lists every top-level command-starting keyword. That's correct but the framing should be "available commands" rather than "expected".
Renderer detects this case (consumed-keyword count == 0) and substitutes Block 3 with:
available commands: create, drop, add, rename, change,
show, insert, update, delete, replay
via a new catalog key:
parse:
  available_commands: "available commands: {commands}"
The list is the alphabetised set of entry keywords from
the usage registry, each rendered via its parse.token.keyword.*
catalog entry (so the strings are catalog-sourced, not
hard-coded).
This case only fires when the user typed something the parser couldn't classify as any known command keyword — the "frobulate Customers" case. It's both rarer and more useful than the with-prefix case: a user this lost benefits more from the full menu than from a missing-token pointer.
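A minimal sketch of building the fallback line, assuming the entry keywords have already been rendered to their catalog strings. The available_commands function name is hypothetical, and str::replace stands in for whatever placeholder substitution the catalog layer actually performs. The sketch follows the "alphabetised" wording of this section, so its ordering differs from the sample list shown above.

```rust
use std::collections::BTreeSet;

/// Build the "available commands" line from the registry's entry keywords.
/// `entry_keywords` may contain repeats, since `add` and `drop` each have
/// several usage entries.
pub fn available_commands(entry_keywords: &[&str], template: &str) -> String {
    // A BTreeSet both de-duplicates and yields the names in sorted order.
    let unique: BTreeSet<&str> = entry_keywords.iter().copied().collect();
    let joined = unique.into_iter().collect::<Vec<_>>().join(", ");
    template.replace("{commands}", &joined)
}
```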
6. Anchor-phrase compliance (ADR-0019 §10)
ADR-0019's anchor-phrase list contains nine substrings the
catalog commits to keeping stable. None are parse-error-specific,
so this ADR doesn't add to the list. The existing parser
test that asserts on "unknown type" and "expected one of"
substrings stays — those come from Type::from_str's custom
error message which ADR-0020 §4 keeps unchanged.
The current structural-error tests assert on substrings like
"after show data", "expected identifier", "found end of
input", "after change column Rich", "expected :". The
new render shape preserves all of these — the rendering
template is {prefix} expected {set}, found {token} and
the prefix / set / token come from the catalog with the same
wording. Tests should port unchanged or with at most minor
adjustments.
7. Catalog validator covers the new keys
ADR-0019 §8.6's KEYS_AND_PLACEHOLDERS validator extends
to cover:
- Every parse.usage.<command> key referenced from the registry exists.
- Every parse.token.keyword.<variant> key for every Keyword enum variant exists.
- Every parse.token.punct.<variant> key for every Punct variant exists.
- The parse.token.{identifier, number, string_literal, flag, end_of_input} keys exist.
- The parse.token.error.* keys exist for every LexErrorKind variant.
- The parse.available_commands key exists.
- No format specifiers (already enforced).
- No engine vocabulary (already enforced).
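One of the checks above, the keyword coverage rule, might look like the following. This is a sketch under assumptions: the three-variant Keyword enum, the hand-maintained ALL list, the catalog_key method, and missing_keyword_keys are all hypothetical, and the catalog is modelled as a plain set of key strings rather than the real KEYS_AND_PLACEHOLDERS structure, whose shape this ADR does not show.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the keyword enum, for illustration only.
#[derive(Debug, Clone, Copy)]
pub enum Keyword {
    Create,
    Table,
    With,
}

impl Keyword {
    /// Hand-maintained list of variants (an assumption of this sketch).
    pub const ALL: &'static [Keyword] = &[Keyword::Create, Keyword::Table, Keyword::With];

    /// Catalog key that must exist for this variant.
    pub fn catalog_key(self) -> &'static str {
        match self {
            Keyword::Create => "parse.token.keyword.create",
            Keyword::Table => "parse.token.keyword.table",
            Keyword::With => "parse.token.keyword.with",
        }
    }
}

/// Return the keyword keys that the catalog does not define.
/// An empty result means the coverage rule holds.
pub fn missing_keyword_keys(catalog: &HashSet<&str>) -> Vec<&'static str> {
    Keyword::ALL
        .iter()
        .map(|k| k.catalog_key())
        .filter(|key| !catalog.contains(*key))
        .collect()
}
```

Because catalog_key is an exhaustive match, adding a variant without a key fails to compile; forgetting to add it to ALL is the gap a real validator would still need to close.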
8. The usage: block respects the verbosity setting?
No. The messages (short|verbose) setting (ADR-0019)
governs engine-error verbosity (whether to render the
hint block of a FriendlyError). Parse errors don't go
through FriendlyError; they have their own render path,
and the usage block is always shown. Rationale: a learner
toggling to messages short is signalling they recognise
the engine-error patterns and want less explanation around
those — they're not signalling that they want less
parse-help. Parse errors mean the user couldn't even
formulate a runnable command; that's exactly the moment to
maximise pedagogical surface, regardless of the
engine-error verbosity preference.
If experience shows this is wrong, a future amendment can gate the usage block on a separate setting. That does not need to be designed now.
Out of scope
- Tab completion (I3) and syntax highlighting (I4) themselves. ADR-0020 §9-10 commits to the parser contract; ADR-0021 doesn't extend it.
- Schema-aware suggestions ("did you mean Customers?" when the user typed Customrs). Useful but a separate feature; would land in I3 territory (completion + spell check share a candidate list).
- Suggested fixes ("change crete to create"). Same bucket as schema-aware suggestions.
- Multi-error reporting. Today and after this ADR, the parser reports the first error and stops. Recovery-based multi-error parsing is out of scope and re-opens with I3's ADR (ADR-0020 §11).
- Persisting the verbosity setting (which doesn't affect parse errors anyway, per §8). ADR-0019 deferred it to a future settings ADR.
Consequences
Positive
- Per-command usage at point of failure. A learner who types create sees the full create table grammar instead of "expected table". The user-reported gap closes.
- Aggregated available commands for cold starts. frobulate Customers now lists the ten command-starting keywords under a sensible framing.
- Vocabulary lives in the catalog, not in code. Renaming a keyword's user-facing wording is one YAML edit. Adding a new keyword adds two lines (registry + token-name key); the validator catches both if forgotten.
- The render path simplifies. humanise() shrinks to a small composer over catalog lookups: no per-character description, no RichPattern walking, no prefer-custom-over-structural switching (the latter becomes "render the structural error and append the usage template").
- Composes with ADR-0019's FriendlyError. Engine errors and parse errors are rendered through different paths but both go through the catalog, so vocabulary drift between them is impossible.
Costs
- A second registry to keep in sync with the parser. The validator (§7) catches missing usage entries / missing token keys at test time, but adding a new command means three steps (parser combinator, usage-registry entry, catalog YAML edit). Mitigation: a unit test asserts every command in the parser has a registry entry (catches forgotten entries; matches the friendly-module pattern).
- Catalog grows by ~30-40 entries (one usage template per command, one keyword name per Keyword variant, a handful of token-class names, a handful of error names). Each entry is one line of YAML; total catalog grows from ~170 entries to ~210. Within budget.
- Wording iteration on the usage templates will probably happen post-merge. This is normal for pedagogical text and the catalog makes it cheap.
Neutral
- Public parser API is unchanged. parse_command(&str) signature stable. The new lex and parse_tokens functions exposed by ADR-0020 are the I3/I4 hook; ADR-0021 doesn't add to that surface.
- AppEvent shape unchanged. Parse errors continue to flow through dispatch_dsl's existing path (push echo, push caret, push error). This ADR's render changes are internal to that function plus the t!() calls inside it.
Implementation notes
Order of operations (within the joint ADR-0020 + ADR-0021 implementation session)
- Land ADR-0020 (lexer + parser refactor + minimal humaniser).
- Add src/dsl/usage.rs with the registry struct, the static table, and matched_entry().
- Populate parse.usage.* and parse.token.* catalog sections.
- Extend friendly::keys::KEYS_AND_PLACEHOLDERS with the new keys.
- Rewrite dispatch_dsl's error-render arm in app.rs to compose the three blocks per §2 (or §5 fallback).
- Add tests:
  - Unit: every registered usage entry resolves through the catalog. Every Keyword variant has a parse.token.keyword.* entry.
  - Integration (tests/parse_error_pedagogy.rs, new): create, add, update Customers, frobulate Customers, create table (no PK clause), insert into T (no values), each producing the expected three-block output.
- Update or port the two existing structural-error tests in parser.rs::tests to the new render shape.
Things that interact subtly
- The "deepest consumed keyword" mechanism (§3) walks the prefix once per parse failure. Cheap; no perf concern. But it must not pick up keywords from inside content that is itself part of a partial AST (e.g. an identifier the user is typing that happens to be the first letters of a keyword); since the lexer commits to identifier-vs-keyword classification before the parser sees tokens, this isn't a real risk. Documented inline.
- Multiple usage entries per add / drop are rendered under one usage: prefix per §2. This is one of the pedagogically-best parts of the change: the user gets the full family rather than guessing which sibling they wanted.
- replay's special-case parsing (ADR-0020 §6) is invisible to the usage layer. The user typing replay with no path gets the parse.usage.replay template.
- messages is an app-level command, not a DSL command, so it is not in the parser registry and doesn't appear in available commands:. Same posture as mode, help, quit. Documented in the registry's prelude.