docs: ADR-0042 — continue H1a parse-error pedagogy on the grammar tree
ADR-0020/0021 specified a chumsky-based H1a; ADR-0024 replaced chumsky with the scannerless walker, leaving both obsolete. Mark them superseded (kept as institutional memory) and add ADR-0042, which restates H1a against the architecture as built. ADR-0042 records that H1a is substantially shipped already — per-command usage block, available-commands fallback, source-derived ident slot labels, curated parse.custom.* near-miss messages, and schema-aware [ERR] diagnostics — and defines the remaining work: a verified per-command near-miss matrix (the definition of done), friendlier literal expectation labels that add role context while keeping the exact literal visible, and advanced-mode SQL parse parity (RETURNING scope, CROSS JOIN ON, INSERT…SELECT count), kept distinct from ADR-0019 §OOS-2 engine-error sanitisation. - docs/adr/0020,0021: superseded notes + README entries - docs/adr/0042: new ADR - docs/adr/README.md: index upkeep (ADR-0000 rule)
This commit is contained in:
@@ -0,0 +1,298 @@
|
||||
# ADR-0042: H1a parse-error pedagogy in the grammar-tree era
|
||||
|
||||
## Status
|
||||
|
||||
**Accepted** — 2026-06-03.
|
||||
|
||||
Continues H1a (`requirements.md`) from **ADR-0021**, whose
|
||||
chumsky-based mechanism was superseded by **ADR-0024** (unified
|
||||
grammar tree). ADR-0021's *intent* — surface the grammar of the
|
||||
command at the point of error, not just the next token — is
|
||||
re-stated here against the architecture as actually built, with
|
||||
an inventory of what already ships and a definition of done for
|
||||
the remaining work.
|
||||
|
||||
Cross-references ADR-0019 (friendly-error layer + i18n catalog
|
||||
conventions; H1a output shares the catalog), ADR-0022 (ambient
|
||||
typing assistance, which shares the walker's expected-set
|
||||
machinery), ADR-0024 (the grammar tree), and ADR-0009 (DSL
|
||||
surface conventions; usage templates render in the documented
|
||||
surface form).
|
||||
|
||||
## Context
|
||||
|
||||
### Why a new ADR rather than amending ADR-0021
|
||||
|
||||
ADR-0021 specifies a `UsageEntry` registry in `src/dsl/usage.rs`,
|
||||
`parse.token.*` catalog keys, and a renderer over chumsky's
|
||||
`RichPattern<Token>` expected sets. None of that exists. ADR-0024
|
||||
removed chumsky from the project, deleted `usage.rs`, and folded
|
||||
usage information onto the grammar nodes themselves. Amending
|
||||
ADR-0021 in place would force every reader to mentally translate a
|
||||
dead mechanism; a fresh ADR records the live state directly.
|
||||
ADR-0020 and ADR-0021 keep their superseding notes and remain as
|
||||
institutional memory.
|
||||
|
||||
### What H1a is
|
||||
|
||||
When a learner types something near-correct, the error should
|
||||
*name the missing keyword or clause* and *show the shape of the
|
||||
command*, rather than point a caret at the unexpected character.
|
||||
The user-reported gap: typing `create` once produced
|
||||
`parse error: after \`create\`, expected \`table\`` — structurally
|
||||
true, pedagogically silent.
|
||||
|
||||
### What already ships (the baseline — do not re-build)
|
||||
|
||||
Verified against code on 2026-06-03. The grammar-tree migration
|
||||
delivered most of ADR-0021's intent through different machinery:
|
||||
|
||||
1. **Per-command usage block.** Every `CommandNode` carries
|
||||
`usage_ids: &'static [&'static str]`
|
||||
(`src/dsl/grammar/mod.rs`). On any parse error the renderer
|
||||
emits a `usage:` block listing every form of the matched
|
||||
command family — 38 templates under `parse.usage.*`
|
||||
(`src/friendly/strings/en-US.yaml:499-571`), resolved by
|
||||
`grammar::usage_keys_for_input` and rendered by
|
||||
`render_usage_block` (`src/app.rs:2560`).
|
||||
|
||||
2. **Available-commands fallback.** When no command keyword was
|
||||
consumed, the block becomes
|
||||
`available commands: …` (`parse.available_commands`,
|
||||
`en-US.yaml:493`; `app.rs:2593`).
|
||||
|
||||
3. **Structural error names the consumed prefix and expected
|
||||
set.** `format_walker_error` (`src/dsl/parser.rs:289`) renders
|
||||
`after \`<consumed>\`, expected <set>, found <token|end of
|
||||
input>`, distinguishing incomplete-at-EOF (`at_eof = true`,
|
||||
more input would help) from a definite mid-input mismatch.
|
||||
|
||||
4. **Friendly slot labels for identifiers.** `format_expectation`
|
||||
(`src/dsl/parser.rs:262`) renders `Ident` slots by source —
|
||||
"table name", "column name", "relationship name", "index
|
||||
name", "type" — instead of a bare "identifier" (ADR-0022 stage
|
||||
8c).
|
||||
|
||||
5. **Curated custom messages** for high-value near-misses under
|
||||
`parse.custom.*` (`en-US.yaml:443-478`): `create_table_needs_pk`,
|
||||
`insert_form_a_missing_values` ("looks like Form A — add
|
||||
`values (...)`"), `change_column_flags_exclusive`,
|
||||
`bind_type_mismatch`, the redundant-constraint and
|
||||
alter-add-primary-key cases, etc.
|
||||
|
||||
6. **Schema-aware pre-flight diagnostics** that light the
|
||||
`[ERR]` validity indicator *at typing time* (ADR-0027 /
|
||||
ADR-0033 / ADR-0036): INSERT arity for Forms A/B/C, unknown
|
||||
table/column, type mismatch, `= NULL`, NOT-NULL-missing, and —
|
||||
on the advanced-SQL surface — `cte_arity_mismatch`,
|
||||
`compound_arity_mismatch`, and `projection_alias_misplaced`
|
||||
(`diagnostic.*`, `en-US.yaml:577-620`; walker logic in
|
||||
`src/dsl/walker/mod.rs`).
|
||||
|
||||
7. **Ambient "Next:" hints** and the **simple→advanced cross-mode
|
||||
pointer** (ADR-0022 / `advanced_alternative_note`,
|
||||
`src/input_render.rs`).
|
||||
|
||||
So H1a is *substantially* delivered at the intent level. The
|
||||
handoff's two canonical examples already behave: `insert into T
|
||||
('Oli')` → custom Form-A message; `update T set x=1` → structural
|
||||
"expected `where` or `--all-rows`" + usage block.
|
||||
|
||||
### What remains — the genuine gap
|
||||
|
||||
The remaining work is **systematic verification plus targeted
|
||||
polish**, not a missing feature:
|
||||
|
||||
- **No enumerated coverage guarantee.** Coverage is curated
|
||||
case-by-case; nothing asserts that *every required slot in every
|
||||
command* produces a pedagogically-sound near-miss message.
|
||||
- **Literal expectations render terse.** `Word`/`Literal`/`Punct`/
|
||||
`Flag` slots come out as backticked literals (`` `where` ``,
|
||||
`` `=` ``, `` `--all-rows` ``). Correct, but a learner is helped
|
||||
more by a short prose gloss in select high-value positions.
|
||||
- **Advanced-mode SQL parse pedagogy is thinner** than the DSL
|
||||
surface (RETURNING scope, CTE-arity diagnostic positioning,
|
||||
`CROSS JOIN … ON`, INSERT…SELECT column-count). No other ADR or
|
||||
open issue covers this (ADR-0019 §OOS-2 covers advanced-SQL
|
||||
*engine-error sanitisation* — a different layer).
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Definition of done — a verified near-miss matrix
|
||||
|
||||
H1a is "done" when there is a test matrix that, for **every
|
||||
command in the REGISTRY**, exercises its salient near-miss inputs
|
||||
and asserts the rendered output reads pedagogically. "Salient
|
||||
near-misses" per command means at minimum:
|
||||
|
||||
- the bare entry keyword alone (`create`, `add`, `update`);
|
||||
- each required clause omitted (e.g. `update T set x=1` with no
|
||||
filter rail; `insert into T (cols)` with no `values`);
|
||||
- a wrong token where a specific slot is expected (e.g. a number
|
||||
where a table name belongs);
|
||||
- the zero-prefix / unknown-command case (available-commands
|
||||
fallback).
|
||||
|
||||
The matrix lives in the existing surfaces — `tests/typing_surface/`
|
||||
(snapshot-based, the standalone `typing_surface_matrix` binary) for
|
||||
the typing-time hint/validity view, and
|
||||
`tests/it/parse_error_pedagogy.rs` (the consolidated `it` binary)
|
||||
for the submit-time rendered three-block output. New integration
|
||||
tests go in `tests/it/` per the handoff-57 §3 layout rule — **not**
|
||||
as new top-level `tests/*.rs`.
|
||||
|
||||
Work is **test-first**: add the matrix entry, observe the current
|
||||
rendering, and only then adjust wording/labels where it reads
|
||||
poorly. A near-miss whose current rendering is already good is
|
||||
locked by a snapshot, not rewritten.
|
||||
|
||||
### 2. Friendlier literal expectation labels
|
||||
|
||||
`format_expectation` gains, for high-value keyword/punct positions,
|
||||
an optional prose gloss while **always keeping the exact literal
|
||||
visible** — a learner must still see the precise token to type.
|
||||
The principle: a label may *add* role context, never *replace* the
|
||||
literal.
|
||||
|
||||
Illustrative target (final wording settled per-case against the
|
||||
matrix, as is normal for pedagogical text):
|
||||
|
||||
- `expected \`where\` or \`--all-rows\`` →
|
||||
`expected a filter clause: \`where …\` or \`--all-rows\``
|
||||
- `expected \`values\`` (after a Form-A column list) →
|
||||
already covered by `parse.custom.insert_form_a_missing_values`;
|
||||
the matrix confirms it fires.
|
||||
|
||||
Mechanism (illustrative, finalised at implementation time): a
|
||||
grammar `Word`/`Punct` node may carry an optional expectation-label
|
||||
key, mirroring how `Ident` slots derive a label from
|
||||
`IdentSource`. Absent an override, rendering is unchanged (the
|
||||
backticked literal). This keeps the change additive and per-slot —
|
||||
no blanket reword that would churn the anchor-phrase tests
|
||||
needlessly.
|
||||
|
||||
New glosses are catalog-sourced (`parse.expect.*` or reuse of
|
||||
`parse.usage.*` fragments — chosen at implementation time) so
|
||||
wording stays in `en-US.yaml`, not in code, consistent with
|
||||
ADR-0019.
|
||||
|
||||
### 3. Advanced-mode SQL parse pedagogy — in scope
|
||||
|
||||
The same matrix discipline (§1) extends to the advanced-mode SQL
|
||||
surface. Two of the relevant arity diagnostics **already exist** and
|
||||
must not be re-built — `cte_arity_mismatch` and
|
||||
`compound_arity_mismatch` (`en-US.yaml:590-591`); for these the work
|
||||
is matrix coverage and, for CTE, auditing whether the diagnostic is
|
||||
*positioned at the CTE name* (easiest to fix) rather than the body.
|
||||
The genuine **absences** the pass adds are: `RETURNING` column
|
||||
scope, `CROSS JOIN` rejecting an `ON` clause, and INSERT…SELECT
|
||||
projection/target column-count (verified absent from the catalog
|
||||
2026-06-03). Each gets a matrix entry; fixes land as walker
|
||||
diagnostics or `parse.custom.*` messages following the existing
|
||||
patterns. The `:` one-shot escape (a simple-mode line run once in
|
||||
advanced mode) is part of the advanced surface and gets at least one
|
||||
near-miss matrix entry.
|
||||
|
||||
This stays clear of ADR-0019 §OOS-2 (advanced-SQL *engine-error*
|
||||
sanitisation): §OOS-2 reworks errors raised by *executing* SQL;
|
||||
H1a here concerns errors raised while *parsing* it. If a near-miss
|
||||
turns out to be an engine error rather than a parse error, it is
|
||||
out of H1a scope and noted against §OOS-2 instead.
|
||||
|
||||
### 4. Catalog and anchor-phrase discipline
|
||||
|
||||
All new or reworded user-facing strings go through the i18n catalog
|
||||
(`en-US.yaml`) and the `KEYS_AND_PLACEHOLDERS` validator, per
|
||||
ADR-0019. No engine vocabulary in any string (CLAUDE.md).
|
||||
|
||||
Two anchor styles constrain §2's glosses and both are preserved by
|
||||
its "literal always visible" rule:
|
||||
|
||||
- The **substring assertions** in `src/dsl/parser.rs` tests
|
||||
("after `…`", "expected table name", "found end of input",
|
||||
"unknown type", "expected one of").
|
||||
- The **substring assertions** in `tests/it/parse_error_pedagogy.rs`,
|
||||
which check for backticked literals and usage fragments
|
||||
(e.g. `` `column` ``, `` `1` ``, "create table", "with pk"). This
|
||||
test is `.contains()`-based, not snapshot-based, so a §2 gloss
|
||||
that dropped the bare literal would fail it — which is precisely
|
||||
the regression §2's rule prevents.
|
||||
|
||||
The snapshot-based `tests/typing_surface/` matrix will re-baseline
|
||||
on any §2 wording change (expected; reviewed via `cargo insta`),
|
||||
but the two substring suites above must stay green without edits to
|
||||
their assertions.
|
||||
|
||||
## Out of scope
|
||||
|
||||
1. **Advanced-SQL engine-error sanitisation** — ADR-0019 §OOS-2.
|
||||
2. **Tab completion (I3) and syntax highlighting (I4)** as
|
||||
features — they share the walker but are separate ADRs.
|
||||
3. **Schema-aware "did you mean `Customers`?" spell-correction** —
|
||||
ADR-0021's out-of-scope §2; belongs with I3.
|
||||
4. **Multi-error reporting.** The walker reports the first error
|
||||
and stops; unchanged.
|
||||
5. **`messages`-style verbosity gating of the usage block.** Per
|
||||
ADR-0021 §8 the usage block is always shown; parse errors are
|
||||
exactly when pedagogical surface should be maximal. Unchanged.
|
||||
6. **Auto-generating usage/help text from the grammar.** ADR-0024
|
||||
left help prose hand-curated; templates stay hand-written.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- H1a gains an explicit, enumerated definition of done instead of
|
||||
an open-ended "systematic pass still pending".
|
||||
- The matrix becomes a regression lock: future grammar changes
|
||||
that degrade a near-miss message fail a snapshot.
|
||||
- Literal-label glosses close the last terse-wording gap without a
|
||||
blanket reword.
|
||||
- The advanced-SQL surface reaches parity with the DSL surface for
|
||||
the audience that has switched to raw SQL.
|
||||
|
||||
### Costs
|
||||
|
||||
- Wording iteration across many near-miss cases — but cheap,
|
||||
catalog-driven, and snapshot-guarded.
|
||||
- The §2 per-node label field is one more annotation a new command
|
||||
may set (optional; default unchanged).
|
||||
- Snapshot volume grows; acceptable given the existing ~160-entry
|
||||
typing-surface matrix.
|
||||
|
||||
### Neutral
|
||||
|
||||
- No public API change. `parse_command*` signatures, the
|
||||
`ParseError` shape, and the three-block render path are all
|
||||
unchanged; this ADR adds wording, labels, and tests within them.
|
||||
|
||||
## Implementation notes
|
||||
|
||||
Order of operations (test-first throughout):
|
||||
|
||||
1. Enumerate the per-command near-miss matrix (§1) as failing/asserting
|
||||
tests in `tests/typing_surface/` + `tests/it/parse_error_pedagogy.rs`.
|
||||
Capture current rendering as the starting baseline.
|
||||
2. Triage: which entries read poorly? Only those get wording work.
|
||||
3. Add the optional expectation-label mechanism (§2) and apply it
|
||||
to the high-value keyword/punct positions surfaced in triage.
|
||||
4. Advanced-SQL near-miss audit + fixes (§3), distinguishing parse
|
||||
from engine errors as they arise.
|
||||
5. Catalog validator + anchor-phrase checks stay green (§4).
|
||||
6. Update `requirements.md` H1a with the matrix as the done-marker;
|
||||
flip to `[x]` only when the matrix is complete and green.
|
||||
|
||||
## See also
|
||||
|
||||
- ADR-0021 — Parser-as-source-of-truth for H1a (mechanism
|
||||
superseded; intent continued here).
|
||||
- ADR-0020 — Tokenization layer (superseded by the scannerless
|
||||
walker).
|
||||
- ADR-0024 — Unified grammar tree (the architecture H1a is built
|
||||
on).
|
||||
- ADR-0022 — Ambient typing assistance (shares the expected-set
|
||||
machinery).
|
||||
- ADR-0019 — Friendly-error layer and i18n catalog (§OOS-2 is the
|
||||
adjacent engine-error scope).
|
||||
- ADR-0009 — DSL command-syntax conventions (usage surface form).
|
||||
- `requirements.md` — H1a tracking entry.
|
||||
Reference in New Issue
Block a user