# ADR-0042: H1a parse-error pedagogy in the grammar-tree era ## Status **Accepted** — 2026-06-03. Continues H1a (`requirements.md`) from **ADR-0021**, whose chumsky-based mechanism was superseded by **ADR-0024** (unified grammar tree). ADR-0021's *intent* — surface the grammar of the command at the point of error, not just the next token — is re-stated here against the architecture as actually built, with an inventory of what already ships and a definition of done for the remaining work. Cross-references ADR-0019 (friendly-error layer + i18n catalog conventions; H1a output shares the catalog), ADR-0022 (ambient typing assistance, which shares the walker's expected-set machinery), ADR-0024 (the grammar tree), and ADR-0009 (DSL surface conventions; usage templates render in the documented surface form). ## Context ### Why a new ADR rather than amending ADR-0021 ADR-0021 specifies a `UsageEntry` registry in `src/dsl/usage.rs`, `parse.token.*` catalog keys, and a renderer over chumsky's `RichPattern` expected sets. None of that exists. ADR-0024 removed chumsky from the project, deleted `usage.rs`, and folded usage information onto the grammar nodes themselves. Amending ADR-0021 in place would force every reader to mentally translate a dead mechanism; a fresh ADR records the live state directly. ADR-0020 and ADR-0021 keep their superseding notes and remain as institutional memory. ### What H1a is When a learner types something near-correct, the error should *name the missing keyword or clause* and *show the shape of the command*, rather than point a caret at the unexpected character. The user-reported gap: typing `create` once produced `parse error: after \`create\`, expected \`table\`` — structurally true, pedagogically silent. ### What already ships (the baseline — do not re-build) Verified against code on 2026-06-03. The grammar-tree migration delivered most of ADR-0021's intent through different machinery: 1. **Per-command usage block.** Every `CommandNode` carries `usage_ids: &'static [&'static str]` (`src/dsl/grammar/mod.rs`). On any parse error the renderer emits a `usage:` block listing every form of the matched command family — 38 templates under `parse.usage.*` (`src/friendly/strings/en-US.yaml:499-571`), resolved by `grammar::usage_keys_for_input` and rendered by `render_usage_block` (`src/app.rs:2560`). 2. **Available-commands fallback.** When no command keyword was consumed, the block becomes `available commands: …` (`parse.available_commands`, `en-US.yaml:493`; `app.rs:2593`). 3. **Structural error names the consumed prefix and expected set.** `format_walker_error` (`src/dsl/parser.rs:289`) renders `after \`\`, expected , found `, distinguishing incomplete-at-EOF (`at_eof = true`, more input would help) from a definite mid-input mismatch. 4. **Friendly slot labels for identifiers.** `format_expectation` (`src/dsl/parser.rs:262`) renders `Ident` slots by source — "table name", "column name", "relationship name", "index name", "type" — instead of a bare "identifier" (ADR-0022 stage 8c). 5. **Curated custom messages** for high-value near-misses under `parse.custom.*` (`en-US.yaml:443-478`): `create_table_needs_pk`, `insert_form_a_missing_values` ("looks like Form A — add `values (...)`"), `change_column_flags_exclusive`, `bind_type_mismatch`, the redundant-constraint and alter-add-primary-key cases, etc. 6. **Schema-aware pre-flight diagnostics** that light the `[ERR]` validity indicator *at typing time* (ADR-0027 / ADR-0033 / ADR-0036): INSERT arity for Forms A/B/C, unknown table/column, type mismatch, `= NULL`, NOT-NULL-missing, and — on the advanced-SQL surface — `cte_arity_mismatch`, `compound_arity_mismatch`, and `projection_alias_misplaced` (`diagnostic.*`, `en-US.yaml:577-620`; walker logic in `src/dsl/walker/mod.rs`). 7. **Ambient "Next:" hints** and the **simple→advanced cross-mode pointer** (ADR-0022 / `advanced_alternative_note`, `src/input_render.rs`). So H1a is *substantially* delivered at the intent level. The handoff's two canonical examples already behave: `insert into T ('Oli')` → custom Form-A message; `update T set x=1` → structural "expected `where` or `--all-rows`" + usage block. ### What remains — the genuine gap The remaining work is **systematic verification plus targeted polish**, not a missing feature: - **No enumerated coverage guarantee.** Coverage is curated case-by-case; nothing asserts that *every required slot in every command* produces a pedagogically-sound near-miss message. - **Literal expectations render terse.** `Word`/`Literal`/`Punct`/ `Flag` slots come out as backticked literals (`` `where` ``, `` `=` ``, `` `--all-rows` ``). Correct, but a learner is helped more by a short prose gloss in select high-value positions. - **Advanced-mode SQL parse pedagogy is thinner** than the DSL surface (RETURNING scope, CTE-arity diagnostic positioning, `CROSS JOIN … ON`, INSERT…SELECT column-count). No other ADR or open issue covers this (ADR-0019 §OOS-2 covers advanced-SQL *engine-error sanitisation* — a different layer). ## Decision ### 1. Definition of done — a verified near-miss matrix H1a is "done" when there is a test matrix that, for **every command in the REGISTRY**, exercises its salient near-miss inputs and asserts the rendered output reads pedagogically. "Salient near-misses" per command means at minimum: - the bare entry keyword alone (`create`, `add`, `update`); - each required clause omitted (e.g. `update T set x=1` with no filter rail; `insert into T (cols)` with no `values`); - a wrong token where a specific slot is expected (e.g. a number where a table name belongs); - the zero-prefix / unknown-command case (available-commands fallback). The matrix lives in the existing surfaces — `tests/typing_surface/` (snapshot-based, the standalone `typing_surface_matrix` binary) for the typing-time hint/validity view, and `tests/it/parse_error_pedagogy.rs` (the consolidated `it` binary) for the submit-time rendered three-block output. New integration tests go in `tests/it/` per the handoff-57 §3 layout rule — **not** as new top-level `tests/*.rs`. Work is **test-first**: add the matrix entry, observe the current rendering, and only then adjust wording/labels where it reads poorly. A near-miss whose current rendering is already good is locked by a snapshot, not rewritten. ### 2. Friendlier literal expectation labels `format_expectation` gains, for high-value keyword/punct positions, an optional prose gloss while **always keeping the exact literal visible** — a learner must still see the precise token to type. The principle: a label may *add* role context, never *replace* the literal. Illustrative target (final wording settled per-case against the matrix, as is normal for pedagogical text): - `expected \`where\` or \`--all-rows\`` → `expected a filter clause: \`where …\` or \`--all-rows\`` - `expected \`values\`` (after a Form-A column list) → already covered by `parse.custom.insert_form_a_missing_values`; the matrix confirms it fires. Mechanism (illustrative, finalised at implementation time): a grammar `Word`/`Punct` node may carry an optional expectation-label key, mirroring how `Ident` slots derive a label from `IdentSource`. Absent an override, rendering is unchanged (the backticked literal). This keeps the change additive and per-slot — no blanket reword that would churn the anchor-phrase tests needlessly. New glosses are catalog-sourced (`parse.expect.*` or reuse of `parse.usage.*` fragments — chosen at implementation time) so wording stays in `en-US.yaml`, not in code, consistent with ADR-0019. ### 3. Advanced-mode SQL parse pedagogy — in scope The same matrix discipline (§1) extends to the advanced-mode SQL surface. Two of the relevant arity diagnostics **already exist** and must not be re-built — `cte_arity_mismatch` and `compound_arity_mismatch` (`en-US.yaml:590-591`); for these the work is matrix coverage and, for CTE, auditing whether the diagnostic is *positioned at the CTE name* (easiest to fix) rather than the body. The genuine **absences** the pass adds are: `RETURNING` column scope, `CROSS JOIN` rejecting an `ON` clause, and INSERT…SELECT projection/target column-count (verified absent from the catalog 2026-06-03). Each gets a matrix entry; fixes land as walker diagnostics or `parse.custom.*` messages following the existing patterns. The `:` one-shot escape (a simple-mode line run once in advanced mode) is part of the advanced surface and gets at least one near-miss matrix entry. This stays clear of ADR-0019 §OOS-2 (advanced-SQL *engine-error* sanitisation): §OOS-2 reworks errors raised by *executing* SQL; H1a here concerns errors raised while *parsing* it. If a near-miss turns out to be an engine error rather than a parse error, it is out of H1a scope and noted against §OOS-2 instead. ### 4. Catalog and anchor-phrase discipline All new or reworded user-facing strings go through the i18n catalog (`en-US.yaml`) and the `KEYS_AND_PLACEHOLDERS` validator, per ADR-0019. No engine vocabulary in any string (CLAUDE.md). Two anchor styles constrain §2's glosses and both are preserved by its "literal always visible" rule: - The **substring assertions** in `src/dsl/parser.rs` tests ("after `…`", "expected table name", "found end of input", "unknown type", "expected one of"). - The **substring assertions** in `tests/it/parse_error_pedagogy.rs`, which check for backticked literals and usage fragments (e.g. `` `column` ``, `` `1` ``, "create table", "with pk"). This test is `.contains()`-based, not snapshot-based, so a §2 gloss that dropped the bare literal would fail it — which is precisely the regression §2's rule prevents. The snapshot-based `tests/typing_surface/` matrix will re-baseline on any §2 wording change (expected; reviewed via `cargo insta`), but the two substring suites above must stay green without edits to their assertions. ## Out of scope 1. **Advanced-SQL engine-error sanitisation** — ADR-0019 §OOS-2. 2. **Tab completion (I3) and syntax highlighting (I4)** as features — they share the walker but are separate ADRs. 3. **Schema-aware "did you mean `Customers`?" spell-correction** — ADR-0021's out-of-scope §2; belongs with I3. 4. **Multi-error reporting.** The walker reports the first error and stops; unchanged. 5. **`messages`-style verbosity gating of the usage block.** Per ADR-0021 §8 the usage block is always shown; parse errors are exactly when pedagogical surface should be maximal. Unchanged. 6. **Auto-generating usage/help text from the grammar.** ADR-0024 left help prose hand-curated; templates stay hand-written. ## Consequences ### Positive - H1a gains an explicit, enumerated definition of done instead of an open-ended "systematic pass still pending". - The matrix becomes a regression lock: future grammar changes that degrade a near-miss message fail a snapshot. - Literal-label glosses close the last terse-wording gap without a blanket reword. - The advanced-SQL surface reaches parity with the DSL surface for the audience that has switched to raw SQL. ### Costs - Wording iteration across many near-miss cases — but cheap, catalog-driven, and snapshot-guarded. - The §2 per-node label field is one more annotation a new command may set (optional; default unchanged). - Snapshot volume grows; acceptable given the existing ~160-entry typing-surface matrix. ### Neutral - No public API change. `parse_command*` signatures, the `ParseError` shape, and the three-block render path are all unchanged; this ADR adds wording, labels, and tests within them. ## Implementation notes Order of operations (test-first throughout): 1. Enumerate the per-command near-miss matrix (§1) as failing/asserting tests in `tests/typing_surface/` + `tests/it/parse_error_pedagogy.rs`. Capture current rendering as the starting baseline. 2. Triage: which entries read poorly? Only those get wording work. 3. Add the optional expectation-label mechanism (§2) and apply it to the high-value keyword/punct positions surfaced in triage. 4. Advanced-SQL near-miss audit + fixes (§3), distinguishing parse from engine errors as they arise. 5. Catalog validator + anchor-phrase checks stay green (§4). 6. Update `requirements.md` H1a with the matrix as the done-marker; flip to `[x]` only when the matrix is complete and green. ## See also - ADR-0021 — Parser-as-source-of-truth for H1a (mechanism superseded; intent continued here). - ADR-0020 — Tokenization layer (superseded by the scannerless walker). - ADR-0024 — Unified grammar tree (the architecture H1a is built on). - ADR-0022 — Ambient typing assistance (shares the expected-set machinery). - ADR-0019 — Friendly-error layer and i18n catalog (§OOS-2 is the adjacent engine-error scope). - ADR-0009 — DSL command-syntax conventions (usage surface form). - `requirements.md` — H1a tracking entry.