ADR-0020/0021 specified a chumsky-based H1a; ADR-0024 replaced chumsky with the scannerless walker, leaving both obsolete. Mark them superseded (kept as institutional memory) and add ADR-0042, which restates H1a against the architecture as built. ADR-0042 records that H1a is substantially shipped already — per-command usage block, available-commands fallback, source-derived ident slot labels, curated parse.custom.* near-miss messages, and schema-aware [ERR] diagnostics — and defines the remaining work: a verified per-command near-miss matrix (the definition of done), friendlier literal expectation labels that add role context while keeping the exact literal visible, and advanced-mode SQL parse parity (RETURNING scope, CROSS JOIN ON, INSERT…SELECT count), kept distinct from ADR-0019 §OOS-2 engine-error sanitisation. - docs/adr/0020,0021: superseded notes + README entries - docs/adr/0042: new ADR - docs/adr/README.md: index upkeep (ADR-0000 rule)
# ADR-0021: Parser-as-source-of-truth for H1a (per-command usage in parse errors)

## Status

**Mechanism superseded by ADR-0024; H1a scope continued in ADR-0042.**
Accepted then superseded.

> **Superseding note (2026-06-03).** The *intent* of this ADR — surface
> the grammar of the command at the point of error, not just the next
> token — survived and is largely delivered. The *mechanism* did not.
> This ADR specifies a `chumsky`-based design: a separate `UsageEntry`
> registry in `src/dsl/usage.rs`, `parse.token.*` catalog keys driven
> by chumsky's `RichPattern<Token>` expected sets, and a renderer over
> chumsky output. ADR-0024 (unified grammar tree) replaced chumsky with
> a scannerless walker and **folded usage info onto the grammar nodes
> themselves**: `usage_ids` live on each `CommandNode`, the per-command
> `parse.usage.*` templates and the `parse.available_commands` fallback
> ship as designed here, and the expected-set vocabulary
> (`format_expectation` in `parser.rs`) renders directly from walker
> `Expectation` variants — no `UsageEntry` registry, no `parse.token.*`
> keys, no `src/dsl/usage.rs`.
>
> So: the §1 usage registry, §3 "deepest consumed keyword" mechanism,
> §4 `parse.token.*` catalog, and §7 validator details below describe
> code that does not exist. What shipped equivalently: §1's per-command
> templates (as `usage_ids` + `parse.usage.*`), §2's three-block render
> (echo+caret / structural error / usage), and §5's available-commands
> fallback. **ADR-0042 picks up H1a from here** — it records what is
> actually shipped and defines the remaining systematic-pass scope
> against the grammar-tree architecture. Read ADR-0042 for the live
> plan; this ADR remains as the design rationale for the pedagogy goal.

---

*Original status (historical):* Accepted.

Builds on ADR-0020 (tokenization layer). Addresses H1a from
`requirements.md` — the parse-error pedagogy gap that
ADR-0019's friendly-error layer left untouched.

Cross-references ADR-0019 (i18n catalog conventions; H1a's
output goes through the same catalog) and ADR-0009 (DSL
syntax conventions; usage templates render in the project's
documented surface form).

## Context

ADR-0019 dramatically improved engine-error wording.
Parse-error wording is now the visibly-weakest user surface —
the user-reported gap was concrete: typing `create` produces

```
parse error: after `create`, expected `table`
```

The error is *structurally* correct (chumsky has consumed
`create` and is now looking for the next required token) but
*pedagogically* silent. A learner who got this far typed
`create` because they'd been told that's how new tables are
made; what they need next is the shape of the command, not
a single missing-token pointer.

Comparable observations apply across the whole DSL surface:

- `add` → expected `column` or `1` (uninformative; user
  needs the shape of `add column …` AND `add 1:n
  relationship …`).
- `update Customers` → expected `set` (true; but `update`'s
  full grammar with `set …`, `where …`, `--all-rows` is what
  the user wants illustrated).
- `frobulate Customers` → expected one of `create`, `drop`,
  `add`, `rename`, `change`, `show`, `insert`, `update`,
  `delete`, `replay` (true after ADR-0020; the available-commands
  list is now informative, but the no-prefix case wants its
  own framing — "available commands" rather than "expected").

H1a's job is to close that gap by surfacing the **grammar**
of the command at the point of error, not just the next
token.

### What ADR-0020 supplies

ADR-0020 lands the lexer + parser-over-tokens architecture.
What that buys H1a:

- **Aggregated `expected` sets at the failure point** (top-level
  `choice` failures now list every command-starting keyword,
  not just one). The user-visible "available commands" list
  becomes correct without any work in this ADR.
- **Token-kind error patterns** (`RichPattern<Token>` instead
  of `RichPattern<char>`). Each pattern renders via a stable
  catalog key — no per-character humanising.
- **A canonical entry-token for each command** (the first
  `Keyword(_)` consumed). H1a keys per-command usage
  templates off this token.

### What this ADR adds on top

- A registry of per-command **usage templates** (one
  declaration per command).
- A renderer that composes the parse error with: caret +
  structural error wording + matching usage template(s).
- New catalog keys under `parse.usage.*` (templates) and
  `parse.token.*` (single-token rendering for expected-set
  joins).
- A "no commands consumed" fallback that renders an
  available-commands list under a different prefix
  ("available commands:" rather than "expected:") for the
  zero-prefix case.

## Decision

### 1. Per-command usage template registry

Each command parser is paired with a `UsageEntry`:

```rust
pub struct UsageEntry {
    /// First keyword that distinguishes this command. Used
    /// as the registry key.
    pub entry: Keyword,
    /// Catalog key for the grammar template body (under
    /// `parse.usage.*`). One key per command.
    pub catalog_key: &'static str,
}
```

The registry is a `&'static [UsageEntry]` declared in one
place (`src/dsl/usage.rs`). Lookup: given a consumed entry
keyword, return all entries whose `entry == keyword`. For
`Keyword::Add` the registry returns the `add column` and
`add 1:n relationship` entries; for `Keyword::Drop` it
returns `drop table`, `drop column`, `drop relationship`;
for unique-entry keywords (e.g. `Keyword::Create` today) it
returns one.

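The lookup described above could be sketched roughly as follows. Only `UsageEntry` and its two fields come from this ADR; the three-variant `Keyword` enum, the `USAGE_REGISTRY` name, the `entries_for` helper, and the exact catalog key strings are assumptions made for illustration.

```rust
/// Hypothetical subset of the lexer's keyword enum; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keyword {
    Create,
    Drop,
    Add,
}

/// Shape taken from this ADR.
pub struct UsageEntry {
    pub entry: Keyword,
    pub catalog_key: &'static str,
}

/// Illustrative static table. The name `USAGE_REGISTRY` is an assumption.
pub static USAGE_REGISTRY: &[UsageEntry] = &[
    UsageEntry { entry: Keyword::Create, catalog_key: "parse.usage.create_table" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_table" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_column" },
    UsageEntry { entry: Keyword::Drop, catalog_key: "parse.usage.drop_relationship" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_column" },
    UsageEntry { entry: Keyword::Add, catalog_key: "parse.usage.add_relationship" },
];

/// Returns every entry registered under `keyword`, in declaration order.
/// An empty result means the keyword does not start any registered command.
pub fn entries_for(keyword: Keyword) -> Vec<&'static UsageEntry> {
    USAGE_REGISTRY
        .iter()
        .filter(|e| e.entry == keyword)
        .collect()
}
```

A linear scan is used here because the table holds only a handful of entries; a map keyed by `Keyword` would work equally well if the registry grew.
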
The catalog key is what gets translated. Template bodies
live in `src/friendly/strings/en-US.yaml` under
`parse.usage.*`:

```yaml
parse:
  usage:
    create_table: "create table <Name> with pk [<col>:<type>[, ...]]"
    drop_table: "drop table <Name>"
    add_column: "add column [to] [table] <Table>: <Name> (<Type>)"
    add_relationship: |
      add 1:n relationship [as <Name>]
        from <Parent>.<col> to <Child>.<col>
        [on delete <action>] [on update <action>]
        [--create-fk]
    rename_column: "rename column [in] [table] <Table>: <Old> to <New>"
    change_column: |
      change column [in] [table] <Table>: <Name> (<Type>)
        [--force-conversion | --dont-convert]
    show_data: "show data <Table>"
    show_table: "show table <Table>"
    insert: "insert into <Table> [(<col>[, ...])] [values] (<value>[, ...])"
    update: "update <Table> set <col>=<value>[, ...] (where <col>=<value> | --all-rows)"
    delete: "delete from <Table> (where <col>=<value> | --all-rows)"
    drop_column: "drop column [from] [table] <Table>: <Name>"
    drop_relationship: |
      drop relationship <Name>
      drop relationship from <Parent>.<col> to <Child>.<col>
    replay: "replay <path> | replay '<path with spaces>'"
```

(Wording is illustrative; exact phrasing settled at
implementation time. The bracket convention `[...]` for
optional parts and angle-bracket `<...>` for placeholders
matches ADR-0009's documentation surface.)

### 2. The renderer composes three blocks

A parse error renders as:

```
running: <user input>
              ^                  ← caret (existing, unchanged)
parse error: <structural-or-content message>
usage: <template1>
       <template2>               ← when multiple entries share the entry keyword
```

Block 1 (the echo + caret) is unchanged from today.

Block 2 is the structural or content error. ADR-0020
guarantees the structural error is now properly aggregated
("expected `data` or `table`" not "expected `table`"). The
content errors (unknown type, mutually-exclusive flags) are
unchanged in voice.

Block 3 (usage:) is new. It is rendered if and only if **at
least one keyword token was consumed** before the parser
failed AND that keyword is a registered entry. If no keyword
was consumed (e.g., `frobulate Customers`, where `frobulate`
is an `Identifier`, not a `Keyword`), Block 3 is replaced
with the no-prefix fallback (§5).

If multiple entries match (e.g., the `add` family), all are
listed under a single `usage:` prefix, one per line.

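A minimal sketch of the three-block composition, under stated assumptions: the `Tail` enum, the `render_parse_error` function, the caret-column arithmetic, and the continuation-line indent are all hypothetical, and the templates are passed in as already-resolved strings rather than looked up in the catalog. Multi-line templates are not handled here.

```rust
/// Hypothetical: what follows the structural error line.
pub enum Tail<'a> {
    /// Block 3: one or more usage templates sharing an entry keyword.
    Usage(&'a [&'a str]),
    /// The no-prefix fallback from §5.
    AvailableCommands(&'a [&'a str]),
}

/// Composes echo + caret, the structural error, and the tail into output lines.
/// `caret_col` is assumed to be a zero-based column within `input`.
pub fn render_parse_error(
    input: &str,
    caret_col: usize,
    message: &str,
    tail: Tail<'_>,
) -> Vec<String> {
    const ECHO_PREFIX: &str = "running: ";
    let mut lines = Vec::new();

    // Block 1: echo the input and point a caret at the failure column.
    lines.push(format!("{}{}", ECHO_PREFIX, input));
    lines.push(format!("{}^", " ".repeat(ECHO_PREFIX.len() + caret_col)));

    // Block 2: structural or content error.
    lines.push(format!("parse error: {}", message));

    // Block 3, or its fallback.
    match tail {
        Tail::Usage(templates) => {
            for (i, template) in templates.iter().enumerate() {
                // Only the first template carries the `usage:` prefix.
                let prefix = if i == 0 { "usage: " } else { "       " };
                lines.push(format!("{}{}", prefix, template));
            }
        }
        Tail::AvailableCommands(commands) => {
            lines.push(format!("available commands: {}", commands.join(", ")));
        }
    }
    lines
}
```

Making the caller choose between `Tail::Usage` and `Tail::AvailableCommands` keeps the "was a registered keyword consumed?" decision out of the formatting code.
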
### 3. Identifying the consumed entry keyword

The parser surfaces, alongside the `ParseError`, the
**deepest successfully-consumed keyword token**. Mechanism:

- `parse_tokens` returns `(Result<Command, ParseError>,
  ParseDiagnostics)` where `ParseDiagnostics` carries the
  furthest position chumsky reached AND a snapshot of the
  consumed prefix.
- The renderer walks the consumed prefix backward to find the
  first `Keyword(_)` token. (Almost always the first token,
  but a future grammar where a command starts with a
  literal — none today — would still resolve correctly.)

This logic lives in `src/dsl/usage.rs::matched_entry()` so
the registry and the lookup sit together.

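One way to read the prefix walk is sketched below. It interprets "first `Keyword(_)` token" as the earliest keyword in the consumed prefix, since that is the token the registry keys on; that reading is an assumption. The `Token` and `Keyword` enums are cut-down stand-ins, and `entry_keyword` is a hypothetical name, not the `matched_entry()` this ADR refers to.

```rust
/// Hypothetical subset of the keyword enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keyword {
    Create,
    Table,
    Add,
    Column,
}

/// Hypothetical subset of the token enum produced by the lexer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Keyword(Keyword),
    Identifier(String),
}

/// Returns the earliest keyword in the consumed prefix, if any.
/// `None` corresponds to the zero-prefix case handled by the §5 fallback.
pub fn entry_keyword(consumed: &[Token]) -> Option<Keyword> {
    consumed.iter().find_map(|token| match token {
        Token::Keyword(k) => Some(*k),
        _ => None,
    })
}
```

Because the lexer has already classified each token, an identifier that merely starts with the letters of a keyword never matches the `Token::Keyword` arm.
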
### 4. `parse.token.*` — single-token catalog vocabulary

Chumsky's expected-set rendering needs a name for each token
kind. Today `humanise()` hand-codes these
(`describe_pattern` returns "`create`", "identifier", etc.).
ADR-0021 moves the vocabulary into the catalog:

```yaml
parse:
  token:
    # Keywords — one entry per Keyword enum variant.
    keyword.create: "`create`"
    keyword.table: "`table`"
    keyword.with: "`with`"
    # ... one per Keyword variant ...

    # Punctuation.
    punct.colon: "`:`"
    punct.open_paren: "`(`"
    punct.close_paren: "`)`"
    punct.comma: "`,`"
    punct.equals: "`=`"
    punct.dot: "`.`"

    # Token-class labels.
    identifier: "identifier"
    number: "number"
    string_literal: "string literal"
    flag: "flag (--name)"
    end_of_input: "end of input"

    # Lexer-error tokens.
    error.unterminated_string: "unterminated string starting at column {column}"
    error.unknown_char: "unrecognised character {found}"
```

Joining ("`a`, `b`, or `c`") stays in code (`oxford_or` from
the current humanise machinery, lifted intact). Wording of
each token is in the catalog.

`parse.error` (existing wrapper key) stays. Its `{detail}`
placeholder is filled by:

```
{consumed_prefix} expected {oxford_or(expected)}, found {found_token}
```

— each piece sourced from the catalog, joined in code.

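The joining step and the `{detail}` composition could look roughly like this. The project's actual `oxford_or` is not shown in this ADR, so the body below is a reconstruction from the two examples given ("`data` or `table`" for two items, "`a`, `b`, or `c`" for three); `structural_detail` is a hypothetical helper, and the token wordings are passed in as already-resolved catalog strings.

```rust
/// Joins already-rendered token names: "a", "a or b", "a, b, or c".
/// Reconstructed from the examples in this ADR; the real function may differ.
pub fn oxford_or(items: &[&str]) -> String {
    match items {
        [] => String::new(),
        [only] => (*only).to_string(),
        [a, b] => format!("{} or {}", a, b),
        [init @ .., last] => format!("{}, or {}", init.join(", "), last),
    }
}

/// Hypothetical composer for the `{detail}` placeholder of `parse.error`.
/// `prefix` is the rendered consumed prefix, e.g. "after `show`,".
pub fn structural_detail(prefix: &str, expected: &[&str], found: &str) -> String {
    format!("{} expected {}, found {}", prefix, oxford_or(expected), found)
}
```

Keeping the join in code and the per-token wording in the catalog means a translator can reword `identifier` or `end of input` without touching the list punctuation.
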
`parse.caret` (existing) and `parse.empty` (existing)
unchanged.

### 5. No-prefix fallback: "available commands"

When the parser fails with **no keyword consumed**, the
"expected" set lists every top-level command-starting
keyword. That's correct, but the framing should be
"available commands" rather than "expected".

The renderer detects this case (consumed-keyword count == 0)
and replaces Block 3 with:

```
available commands: create, drop, add, rename, change,
                    show, insert, update, delete, replay
```

via a new catalog key:

```yaml
parse:
  available_commands: "available commands: {commands}"
```

The list is the alphabetised set of `entry` keywords from
the usage registry, each rendered via its `parse.token.keyword.*`
catalog entry (so the strings are catalog-sourced, not
hard-coded).

This case only fires when the user typed something the
parser couldn't classify as any known command keyword — the
"frobulate Customers" case. It's both rarer and more useful
than the with-prefix case: a user this lost benefits more
from the full menu than from a missing-token pointer.

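The list-building step in this section can be sketched as below, taking the "alphabetised set" wording literally: duplicates (such as the several `drop` entries) collapse and the result is sorted. The `available_commands` function is hypothetical, and the keyword strings are assumed to have been resolved through the catalog before the call.

```rust
use std::collections::BTreeSet;

/// Builds the fallback line from the registry's entry keywords.
/// `template` is the catalog string containing a `{commands}` placeholder.
pub fn available_commands(entry_keywords: &[&str], template: &str) -> String {
    // A BTreeSet both de-duplicates and yields its items in sorted order.
    let unique: BTreeSet<&str> = entry_keywords.iter().copied().collect();
    let list = unique.into_iter().collect::<Vec<_>>().join(", ");
    template.replace("{commands}", &list)
}
```

Plain string replacement stands in for whatever placeholder substitution the catalog layer actually performs.
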
### 6. Anchor-phrase compliance (ADR-0019 §10)

ADR-0019's anchor-phrase list contains nine substrings the
catalog commits to keeping stable. None are parse-error-specific,
so this ADR doesn't add to the list. The existing parser
test that asserts on "unknown type" and "expected one of"
substrings stays — those come from `Type::from_str`'s custom
error message which ADR-0020 §4 keeps unchanged.

The current structural-error tests assert on substrings like
"after `show data`", "expected identifier", "found end of
input", "after `change column Rich`", "expected `:`". The
new render shape preserves all of these — the rendering
template is `{prefix} expected {set}, found {token}` and
the prefix / set / token come from the catalog with the same
wording. Tests should port unchanged or with at most minor
adjustments.

### 7. Catalog validator covers the new keys

ADR-0019 §8.6's `KEYS_AND_PLACEHOLDERS` validator extends
to cover:

- Every `parse.usage.<command>` key referenced from the
  registry exists.
- Every `parse.token.keyword.<variant>` key for every
  `Keyword` enum variant exists.
- Every `parse.token.punct.<variant>` key for every `Punct`
  variant exists.
- The `parse.token.{identifier, number, string_literal,
  flag, end_of_input}` keys exist.
- The `parse.token.error.*` keys exist for every
  `LexErrorKind` variant.
- The `parse.available_commands` key exists.
- No format specifiers (already enforced).
- No engine vocabulary (already enforced).

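A rough sketch of the key-existence part of these checks follows. It does not reproduce the real `KEYS_AND_PLACEHOLDERS` validator, whose shape is defined in ADR-0019; `KEYWORD_VARIANTS`, `required_token_keys`, and `missing_keys` are hypothetical names, and the catalog is modelled as a plain set of flattened key strings.

```rust
use std::collections::HashSet;

/// Hypothetical: lower-case names of every `Keyword` variant (truncated here).
const KEYWORD_VARIANTS: &[&str] = &["create", "table", "with"];

/// Builds the list of catalog keys the bullets above require.
pub fn required_token_keys() -> Vec<String> {
    let mut keys: Vec<String> = KEYWORD_VARIANTS
        .iter()
        .map(|variant| format!("parse.token.keyword.{}", variant))
        .collect();
    for class in ["identifier", "number", "string_literal", "flag", "end_of_input"] {
        keys.push(format!("parse.token.{}", class));
    }
    keys.push("parse.available_commands".to_string());
    keys
}

/// Returns the required keys that are absent from the catalog, in required order.
pub fn missing_keys(catalog: &HashSet<String>) -> Vec<String> {
    required_token_keys()
        .into_iter()
        .filter(|key| !catalog.contains(key))
        .collect()
}
```

A test that asserts `missing_keys(..)` is empty would fail with the exact list of forgotten keys, which is the failure mode the validator is meant to surface at test time.
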
### 8. Does the `usage:` block respect the verbosity setting?

No. The `messages (short|verbose)` setting (ADR-0019)
governs *engine-error* verbosity (whether to render the
hint block of a `FriendlyError`). Parse errors don't go
through `FriendlyError`; they have their own render path,
and the usage block is always shown. Rationale: a learner
toggling to `messages short` is signalling they recognise
the engine-error patterns and want less explanation around
those — they're not signalling that they want less
parse-help. Parse errors mean the user couldn't even
formulate a runnable command; that's exactly the moment to
maximise pedagogical surface, regardless of the
engine-error verbosity preference.

If experience shows this is wrong, a future amendment can
gate the usage block on a separate setting. That doesn't
need to be designed now.

## Out of scope

1. **Tab completion (I3) and syntax highlighting (I4)**
   themselves. ADR-0020 §9-10 commits to the parser
   contract; ADR-0021 doesn't extend it.
2. **Schema-aware suggestions** ("did you mean `Customers`?"
   when the user typed `Customrs`). Useful but a separate
   feature; would land in I3 territory (completion + spell
   check share a candidate list).
3. **Suggested fixes** ("change `crete` to `create`"). Same
   bucket as schema-aware suggestions.
4. **Multi-error reporting.** Today and after this ADR, the
   parser reports the first error and stops. Recovery-based
   multi-error parsing is out of scope and re-opens with
   I3's ADR (ADR-0020 §11).
5. **Persisting the verbosity setting** (which doesn't
   affect parse errors anyway, per §8). ADR-0019 deferred it
   to a future settings ADR.

## Consequences

### Positive

- **Per-command usage at point of failure.** A learner who
  types `create` sees the full `create table` grammar
  instead of "expected `table`". The user-reported gap
  closes.
- **Aggregated `available commands` for cold starts.**
  `frobulate Customers` now lists the ten command-starting
  keywords under a sensible framing.
- **Vocabulary lives in the catalog, not in code.** Renaming
  a keyword's user-facing wording is one YAML edit. Adding
  a new keyword adds two lines (registry + token-name key);
  the validator catches both if forgotten.
- **The render path simplifies.** `humanise()` shrinks to a
  small composer over catalog lookups — no per-character
  description, no `RichPattern` walking, no
  prefer-custom-over-structural switching (the latter
  becomes "render the structural error and append the usage
  template").
- **Composes with ADR-0019's `FriendlyError`.** Engine
  errors and parse errors are rendered through different
  paths but both go through the catalog, so vocabulary
  drift between them is impossible.

### Costs

- **A second registry to keep in sync** with the parser. The
  validator (§7) catches missing usage entries / missing
  token keys at test time, but adding a new command means
  three steps (parser combinator, usage-registry entry,
  catalog YAML edit). Mitigation: a unit test asserts every
  command in the parser has a registry entry (catches
  forgotten entries; matches the friendly-module pattern).
- **Catalog grows by ~30-40 entries** (one usage template
  per command, one keyword name per `Keyword` variant, a
  handful of token-class names, a handful of error names).
  Each entry is one line of YAML; total catalog grows from
  ~170 entries to ~210. Within budget.
- **Wording iteration** on the usage templates will probably
  happen post-merge. This is normal for pedagogical text
  and the catalog makes it cheap.

### Neutral

- **Public parser API is unchanged.** `parse_command(&str)`
  signature stable. The new `lex` and `parse_tokens`
  functions exposed by ADR-0020 are the I3/I4 hook;
  ADR-0021 doesn't add to that surface.
- **`AppEvent` shape unchanged.** Parse errors continue to
  flow through `dispatch_dsl`'s existing path (push echo,
  push caret, push error). This ADR's render changes are
  internal to that function plus the `t!()` calls inside it.

## Implementation notes

### Order of operations (within the joint ADR-0020 + ADR-0021 implementation session)

1. Land ADR-0020 (lexer + parser refactor + minimal
   humaniser).
2. Add `src/dsl/usage.rs` with the registry struct, the
   static table, and `matched_entry()`.
3. Populate `parse.usage.*` and `parse.token.*` catalog
   sections.
4. Extend `friendly::keys::KEYS_AND_PLACEHOLDERS` with the
   new keys.
5. Rewrite `dispatch_dsl`'s error-render arm in `app.rs` to
   compose the three blocks per §2 (or §5 fallback).
6. Add tests:
   - Unit: every registered usage entry resolves through the
     catalog. Every `Keyword` variant has a `parse.token.keyword.*`
     entry.
   - Integration (`tests/parse_error_pedagogy.rs`, new):
     `create`, `add`, `update Customers`, `frobulate
     Customers`, `create table` (no PK clause), `insert into
     T` (no values), each producing the expected
     three-block output.
7. Update or port the two existing structural-error tests
   in `parser.rs::tests` to the new render shape.

### Things that interact subtly

- **The "deepest consumed keyword" mechanism** (§3) walks
  the prefix once per parse failure. Cheap; no perf concern.
  But it must not pick up keywords from inside content that
  is itself part of a partial AST (e.g. an identifier the
  user is typing that happens to be the first letters of a
  keyword); since the lexer commits to identifier-vs-keyword
  classification before the parser sees tokens, this isn't a
  real risk. Documented inline.
- **Multiple usage entries per `add` / `drop`** are rendered
  under one `usage:` prefix per §2. This is one of the
  pedagogically-best parts of the change: the user gets the
  full family rather than guessing which sibling they
  wanted.
- **`replay`'s special-case parsing** (ADR-0020 §6) is
  invisible to the usage layer. The user typing `replay`
  with no path gets the `parse.usage.replay` template.
- **`messages` is an app-level command, not a DSL command**,
  so it is not in the parser registry and doesn't appear in
  `available commands:`. Same posture as `mode`, `help`,
  `quit`. Documented in the registry's prelude.