Empirically re-checking ADR §3's advanced-SQL "gaps" reversed two of three; the code survey that produced the list was wrong:
- INSERT…SELECT column-count: already handled (verdict=Error, "the column list names N column(s) but M value(s) are given"; insert_select_arity_mismatch_fires).
- RETURNING scope: already handled (completion offers the table's columns; `returning <unknown>` → unknown_column diagnostic).

The one genuine residual is fixed: `select … cross join b on …` rejected the ON with a bare "expected end of input". Add parse.cross_join_no_on ("a CROSS JOIN has no ON clause — it pairs every row; for a join condition use `JOIN … ON`, or filter with `WHERE`"), rendered when the failing token is `on` and the most recent consumed join is a CROSS join (a precise signature: every other join requires `on`, so `on` is expected there, not a failure). Render-only in format_walker_error; two misfire guards locked (plain join still asks for ON; a stray `on` with no join does not fire).

ADR-0042 §3 corrected + Implementation-outcome records the advanced-SQL re-check and the user-confirmed low-priority residual (submit-time expression first-set at non-projection positions, where typing-time completion already offers the right candidates).

Full suite green (lib 1578 / it 388 / typing_surface_matrix 192); clippy clean.
ADR-0042: H1a parse-error pedagogy in the grammar-tree era
Status
Accepted — 2026-06-03.
Continues H1a (requirements.md) from ADR-0021, whose
chumsky-based mechanism was superseded by ADR-0024 (unified
grammar tree). ADR-0021's intent — surface the grammar of the
command at the point of error, not just the next token — is
re-stated here against the architecture as actually built, with
an inventory of what already ships and a definition of done for
the remaining work.
Cross-references ADR-0019 (friendly-error layer + i18n catalog conventions; H1a output shares the catalog), ADR-0022 (ambient typing assistance, which shares the walker's expected-set machinery), ADR-0024 (the grammar tree), and ADR-0009 (DSL surface conventions; usage templates render in the documented surface form).
Context
Why a new ADR rather than amending ADR-0021
ADR-0021 specifies a UsageEntry registry in src/dsl/usage.rs,
parse.token.* catalog keys, and a renderer over chumsky's
RichPattern<Token> expected sets. None of that exists. ADR-0024
removed chumsky from the project, deleted usage.rs, and folded
usage information onto the grammar nodes themselves. Amending
ADR-0021 in place would force every reader to mentally translate a
dead mechanism; a fresh ADR records the live state directly.
ADR-0020 and ADR-0021 keep their superseding notes and remain as
institutional memory.
What H1a is
When a learner types something near-correct, the error should
name the missing keyword or clause and show the shape of the
command, rather than point a caret at the unexpected character.
The user-reported gap: typing `create` once produced
"parse error: after `create`, expected `table`", which is structurally
true but pedagogically silent.
What already ships (the baseline — do not re-build)
Verified against code on 2026-06-03. The grammar-tree migration delivered most of ADR-0021's intent through different machinery:
- Per-command usage block. Every `CommandNode` carries `usage_ids: &'static [&'static str]` (`src/dsl/grammar/mod.rs`). On any parse error the renderer emits a `usage:` block listing every form of the matched command family: 38 templates under `parse.usage.*` (`src/friendly/strings/en-US.yaml:499-571`), resolved by `grammar::usage_keys_for_input` and rendered by `render_usage_block` (`src/app.rs:2560`).
- Available-commands fallback. When no command keyword was consumed, the block becomes `available commands: …` (`parse.available_commands`, `en-US.yaml:493`; `app.rs:2593`).
- Structural error names the consumed prefix and expected set. `format_walker_error` (`src/dsl/parser.rs:289`) renders "after `…`, expected …, found <token|end of input>", distinguishing incomplete-at-EOF (`at_eof = true`, more input would help) from a definite mid-input mismatch.
- Friendly slot labels for identifiers. `format_expectation` (`src/dsl/parser.rs:262`) renders `Ident` slots by source ("table name", "column name", "relationship name", "index name", "type") instead of a bare "identifier" (ADR-0022 stage 8c).
- Curated custom messages for high-value near-misses under `parse.custom.*` (`en-US.yaml:443-478`): `create_table_needs_pk`, `insert_form_a_missing_values` ("looks like Form A — add `values (...)`"), `change_column_flags_exclusive`, `bind_type_mismatch`, the redundant-constraint and alter-add-primary-key cases, etc.
- Schema-aware pre-flight diagnostics that light the `[ERR]` validity indicator at typing time (ADR-0027 / ADR-0033 / ADR-0036): INSERT arity for Forms A/B/C, unknown table/column, type mismatch, `= NULL`, NOT-NULL-missing, and, on the advanced-SQL surface, `cte_arity_mismatch`, `compound_arity_mismatch`, and `projection_alias_misplaced` (`diagnostic.*`, `en-US.yaml:577-620`; walker logic in `src/dsl/walker/mod.rs`).
- Ambient "Next:" hints and the simple→advanced cross-mode pointer (ADR-0022 / `advanced_alternative_note`, `src/input_render.rs`).
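The structural message shape in the list above can be sketched as follows. This is a minimal illustration, not the code in `src/dsl/parser.rs`: the `WalkerError` struct, its field names, `render_structural`, and the choice to backtick the found token are all assumptions made for the example. Only the "after …, expected …, found …" shape and the end-of-input distinction come from this ADR.

```rust
/// Hypothetical stand-in for the walker's error data (names are assumptions).
pub struct WalkerError<'a> {
    /// Text consumed successfully before the failure.
    pub consumed_prefix: &'a str,
    /// Already-rendered expectation labels, e.g. "`table`" or "table name".
    pub expected: Vec<String>,
    /// The offending token, or `None` when the input simply ran out.
    pub found: Option<&'a str>,
}

impl WalkerError<'_> {
    /// True when more input could still complete the command.
    pub fn at_eof(&self) -> bool {
        self.found.is_none()
    }
}

/// Join labels as "a", "a or b", or "a, b, or c".
fn join_expected(labels: &[String]) -> String {
    match labels {
        [] => String::new(),
        [one] => one.clone(),
        [a, b] => format!("{a} or {b}"),
        [init @ .., last] => format!("{}, or {last}", init.join(", ")),
    }
}

/// Render the structural line: consumed prefix, expected set, found token.
pub fn render_structural(err: &WalkerError) -> String {
    let found = match err.found {
        Some(tok) => format!("`{tok}`"),
        None => String::from("end of input"),
    };
    format!(
        "after `{}`, expected {}, found {}",
        err.consumed_prefix,
        join_expected(&err.expected),
        found
    )
}
```

Keeping `found` as an `Option` makes the incomplete-at-EOF case a property of the data rather than a separate flag, which is one way (among others) to keep the two renderings from drifting apart.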
So H1a is substantially delivered at the intent level. The
handoff's two canonical examples already behave: insert into T ('Oli') → custom Form-A message; update T set x=1 → structural
"expected where or --all-rows" + usage block.
What remains — the genuine gap
The remaining work is systematic verification plus targeted polish, not a missing feature:
- No enumerated coverage guarantee. Coverage is curated case-by-case; nothing asserts that every required slot in every command produces a pedagogically-sound near-miss message.
- Literal expectations render terse. `Word`/`Literal`/`Punct`/`Flag` slots come out as backticked literals (`where`, `=`, `--all-rows`). Correct, but a learner is helped more by a short prose gloss in select high-value positions.
- Advanced-mode SQL parse pedagogy is thinner than the DSL surface (RETURNING scope, CTE-arity diagnostic positioning, CROSS JOIN … ON, INSERT…SELECT column-count). No other ADR or open issue covers this (ADR-0019 §OOS-2 covers advanced-SQL engine-error sanitisation, a different layer).
Decision
1. Definition of done — a verified near-miss matrix
H1a is "done" when there is a test matrix that, for every command in the REGISTRY, exercises its salient near-miss inputs and asserts the rendered output reads pedagogically. "Salient near-misses" per command means at minimum:
- the bare entry keyword alone (`create`, `add`, `update`);
- each required clause omitted (e.g. `update T set x=1` with no filter rail; `insert into T (cols)` with no `values`);
- a wrong token where a specific slot is expected (e.g. a number where a table name belongs);
- the zero-prefix / unknown-command case (available-commands fallback).
The matrix lives in the existing surfaces — tests/typing_surface/
(snapshot-based, the standalone typing_surface_matrix binary) for
the typing-time hint/validity view, and
tests/it/parse_error_pedagogy.rs (the consolidated it binary)
for the submit-time rendered three-block output. New integration
tests go in tests/it/ per the handoff-57 §3 layout rule — not
as new top-level tests/*.rs.
Work is test-first: add the matrix entry, observe the current rendering, and only then adjust wording/labels where it reads poorly. A near-miss whose current rendering is already good is locked by a snapshot, not rewritten.
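A table-driven shape for the submit-time half of the matrix might look like the sketch below. The `NearMiss` struct and `missing_fragments` helper are hypothetical names, and the renderer is passed in as a closure because the real entry point in `tests/it/parse_error_pedagogy.rs` is not shown in this ADR. The point is only the discipline: one row per near-miss, asserted by substring.

```rust
/// One matrix row: a near-miss input and the fragments its rendered
/// error must contain. (Hypothetical type, for illustration only.)
pub struct NearMiss {
    pub input: &'static str,
    pub must_contain: &'static [&'static str],
}

/// Return every (input, fragment) pair whose fragment is absent from the
/// rendering produced by `render`. An empty result means the matrix is green.
pub fn missing_fragments<F>(
    matrix: &[NearMiss],
    render: F,
) -> Vec<(&'static str, &'static str)>
where
    F: Fn(&str) -> String,
{
    let mut missing = Vec::new();
    for case in matrix {
        let rendered = render(case.input);
        for fragment in case.must_contain {
            if !rendered.contains(*fragment) {
                missing.push((case.input, *fragment));
            }
        }
    }
    missing
}
```

Collecting every miss instead of stopping at the first keeps a triage pass cheap: one run lists all rows that read poorly.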
2. Friendlier literal expectation labels
format_expectation gains, for high-value keyword/punct positions,
an optional prose gloss while always keeping the exact literal
visible — a learner must still see the precise token to type.
The principle: a label may add role context, never replace the
literal.
Illustrative target (final wording settled per-case against the matrix, as is normal for pedagogical text):
- "expected `where` or `--all-rows`" → "expected a filter clause: `where …` or `--all-rows`"
- "expected `values`" (after a Form-A column list) → already covered by `parse.custom.insert_form_a_missing_values`; the matrix confirms it fires.
Mechanism (illustrative, finalised at implementation time): a
grammar Word/Punct node may carry an optional expectation-label
key, mirroring how Ident slots derive a label from
IdentSource. Absent an override, rendering is unchanged (the
backticked literal). This keeps the change additive and per-slot —
no blanket reword that would churn the anchor-phrase tests
needlessly.
New glosses are catalog-sourced (parse.expect.* or reuse of
parse.usage.* fragments — chosen at implementation time) so
wording stays in en-US.yaml, not in code, consistent with
ADR-0019.
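One possible shape for the optional label, under stated assumptions: `WordSlot`, `label_key`, `render_expectation`, and the `parse.expect.filter` key are hypothetical, and the catalog is modelled as a plain `HashMap` rather than the project's real catalog type. The sketch only demonstrates the rule that a gloss adds role context while the backticked literal is always emitted.

```rust
use std::collections::HashMap;

/// Hypothetical literal-slot node; the real grammar node type is not
/// shown in this ADR.
pub struct WordSlot {
    pub literal: &'static str,
    /// Optional catalog key for a prose gloss (assumed naming).
    pub label_key: Option<&'static str>,
}

/// Render one expectation. The literal is always present in the output;
/// a gloss, when configured and found, is prepended as role context.
pub fn render_expectation(slot: &WordSlot, catalog: &HashMap<&str, &str>) -> String {
    let literal = format!("`{}`", slot.literal);
    let gloss = slot.label_key.and_then(|key| catalog.get(key));
    match gloss {
        Some(text) => format!("{text}: {literal}"),
        // No override, or a key missing from the catalog: unchanged rendering.
        None => literal,
    }
}
```

Falling back to the bare literal when the key is missing keeps the change additive: a slot without a label renders exactly as before.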
3. Advanced-mode SQL parse pedagogy — in scope
The same matrix discipline (§1) extends to the advanced-mode SQL
surface. Two of the relevant arity diagnostics already exist and
must not be re-built — cte_arity_mismatch and
compound_arity_mismatch (en-US.yaml:590-591); for these the work
is matrix coverage and, for CTE, auditing whether the diagnostic is
positioned at the CTE name (easiest to fix) rather than the body.
The remaining items were re-checked empirically at implementation
time (2026-06-05) and most turned out already handled — see the
Implementation-outcome section's advanced-SQL paragraph for the
corrected picture. The : one-shot escape (a simple-mode line run
once in advanced mode) is part of the advanced surface and is
covered by the mode-aware usage threading (G3).
This stays clear of ADR-0019 §OOS-2 (advanced-SQL engine-error sanitisation): §OOS-2 reworks errors raised by executing SQL; H1a here concerns errors raised while parsing it. If a near-miss turns out to be an engine error rather than a parse error, it is out of H1a scope and noted against §OOS-2 instead.
4. Catalog and anchor-phrase discipline
All new or reworded user-facing strings go through the i18n catalog
(en-US.yaml) and the KEYS_AND_PLACEHOLDERS validator, per
ADR-0019. No engine vocabulary in any string (CLAUDE.md).
Two anchor styles constrain §2's glosses and both are preserved by its "literal always visible" rule:
- The substring assertions in `src/dsl/parser.rs` tests ("after…", "expected table name", "found end of input", "unknown type", "expected one of").
- The substring assertions in `tests/it/parse_error_pedagogy.rs`, which check for backticked literals and usage fragments (e.g. `column`, `1`, "create table", "with pk"). This test is `.contains()`-based, not snapshot-based, so a §2 gloss that dropped the bare literal would fail it, which is precisely the regression §2's rule prevents.
The snapshot-based tests/typing_surface/ matrix will re-baseline
on any §2 wording change (expected; reviewed via cargo insta),
but the two substring suites above must stay green without edits to
their assertions.
Implementation outcome (2026-06-05)
The baseline capture (§Implementation notes step 1) triaged four
gaps; all four are fixed test-first, locked by the near-miss matrix
in tests/it/parse_error_pedagogy.rs:
- G1: the bare `1` cardinality literal opening `add 1:n relationship …` rendered cryptically. Render it as `1:n relationship` in `format_expectation` (error wording only; completion still offers the literal `1`).
- G2: bare `select` dumped the 14-item expression first-set. Collapse it to "a projection: `*`, a column, or an expression" in `format_walker_error`, detected by the `distinct`+`all` quantifier pair being jointly expectable, a signature unique to a projection start (empirically verified not to misfire at `count(`, `union`, `union all`, `select distinct`, or mid-list). Render-only; the completion/hint layer still expands the full set.
- G3: the usage block was mode-blind (`render_usage_block` resolved shared entry words to the first-registered Simple node). `usage_key(s)_for_input` gain mode-aware `_in_mode` variants.

  Decision (user-confirmed, after the DA pass). In advanced mode the DSL forms remain valid input via fallback (verified: `create table Foo with pk`, `drop column from table T: c`, `drop relationship r`, `add column …` all parse and dispatch in advanced mode). So the advanced usage block shows every form valid in the mode, mode-primary (SQL) first, then the DSL fallback forms; a usage hint must never hide input that works. (An initial implementation that showed SQL-only was flagged by the DA pass as hiding `create table … with pk` / `drop column …` and corrected.) Simple mode shows DSL forms only; the SQL-only forms hit the "this is SQL" rail and are unreachable.
- G4: `with` borrowed `select`'s usage; it gains its own `parse.usage.with` CTE template.
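The G3 ordering rule can be sketched as below. Everything named here (`Mode`, `Surface`, `UsageForm`, `usage_keys_in_mode`, and the example keys in the usage note) is hypothetical and stands in for the real `_in_mode` variants, whose signatures this ADR does not give. The sketch only encodes the decision: Simple shows DSL forms, Advanced shows SQL forms first and then the DSL fallback forms.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Mode {
    Simple,
    Advanced,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Surface {
    Dsl,
    Sql,
}

/// One usage template and the surface it belongs to (hypothetical type).
pub struct UsageForm {
    pub key: &'static str,
    pub surface: Surface,
}

/// Usage keys to show in `mode`, preserving registration order within
/// each surface. Advanced mode never hides a DSL form that still parses.
pub fn usage_keys_in_mode(forms: &[UsageForm], mode: Mode) -> Vec<&'static str> {
    let mut keys = Vec::new();
    if mode == Mode::Advanced {
        // Mode-primary (SQL) forms come first in advanced mode.
        keys.extend(forms.iter().filter(|f| f.surface == Surface::Sql).map(|f| f.key));
    }
    // DSL forms: the whole list in simple mode, the fallback tail in advanced.
    keys.extend(forms.iter().filter(|f| f.surface == Surface::Dsl).map(|f| f.key));
    keys
}
```

With a DSL form registered before a SQL form for the same entry word, simple mode would list only the DSL key, while advanced mode would list the SQL key followed by the DSL key.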
Advanced-SQL pedagogy (§3) — empirical re-check (2026-06-05).
§3 (drafted from a code survey) listed RETURNING scope,
CROSS JOIN … ON, and INSERT…SELECT column-count as absences.
Verifying each against the running app reversed two of three:
- INSERT…SELECT column-count is already handled: a count mismatch fires `verdict = Error` with "the column list names N column(s) but M value(s) are given" (walker test `insert_select_arity_mismatch_fires`). Not a gap.
- RETURNING scope is already handled: at a bare `returning` position completion offers the table's columns; `returning <unknown>` fires the `unknown_column` diagnostic. Not a gap.
- `CROSS JOIN … ON` was a genuine residual: the grammar rejects the `on` but the structural error said only "expected end of input". Fixed: `parse.cross_join_no_on` renders "a CROSS JOIN has no ON clause — …" when the failing token is `on` and the most recent consumed join is a CROSS join (a precise signature: every other join requires `on`, so there `on` is expected, not a failure). Render-only, no grammar change; two misfire guards (plain join still asks for `on`; a stray `on` with no join does not fire).

The CTE/compound arity diagnostics noted above remain present and correct.
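The CROSS JOIN signature described above reduces to a two-part predicate. The sketch below is illustrative only: `JoinKind` and `cross_join_no_on_applies` are hypothetical names, and the case-insensitive comparison of the failing token is an assumption, not something this ADR states.

```rust
/// Kind of the most recently consumed join (hypothetical type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum JoinKind {
    Plain,
    Cross,
}

/// True when the CROSS JOIN message should replace the bare structural
/// error: the failing token is `on` AND the last consumed join is CROSS.
pub fn cross_join_no_on_applies(
    failing_token: Option<&str>,
    last_join: Option<JoinKind>,
) -> bool {
    // Assumption: keyword matching ignores ASCII case.
    let failed_on_on = matches!(failing_token, Some(tok) if tok.eq_ignore_ascii_case("on"));
    failed_on_on && last_join == Some(JoinKind::Cross)
}
```

Both misfire guards fall out of the conjunction: a stray `on` with no join has `last_join == None`, and a plain join never satisfies the `Cross` comparison.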
Known low-priority residual (user-confirmed to defer). At
submit time, an incomplete expression position that is not a
SELECT projection (bare `where`, `returning`, `having`, `set col=`) still renders the raw ~14-item expression first-set; only the
SELECT projection is glossed (G2, keyed on the distinct+all
quantifier pair). This is low-impact because typing-time
completion already offers the correct candidates (columns,
functions, expression keywords) at those positions. Generalising the
gloss was considered and deferred — the payoff is small and a
broader render-side collapse adds misfire surface.
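For reference, the G2 signature that this residual deliberately does not generalise can be sketched as a membership test on the expected set. The function name, the constant, and the representation of the expected set as a slice of keyword strings are assumptions; the gloss wording is the one recorded under G2.

```rust
/// Gloss text recorded for G2 in this ADR.
pub const PROJECTION_GLOSS: &str = "a projection: `*`, a column, or an expression";

/// Collapse the expected set to the projection gloss only when both
/// quantifiers are jointly expectable; otherwise leave rendering alone.
/// (Hypothetical helper; the expected-set representation is assumed.)
pub fn collapse_projection_start(expected: &[&str]) -> Option<&'static str> {
    let has = |word: &str| expected.iter().any(|e| *e == word);
    if has("distinct") && has("all") {
        Some(PROJECTION_GLOSS)
    } else {
        None
    }
}
```

Requiring both words is what keeps the check narrow: a position that can accept only one of the two quantifiers does not match, so the raw set is rendered there.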
Coverage: the matrix covers, in both modes, every entry word's bare
/ missing-clause / wrong-token near-misses, the app-lifecycle
trailing-junk cases, and the committed multi-form variants
(add index / add constraint / add 1:n relationship, drop index / drop constraint / drop relationship, show table,
change column …, create index, alter table … add / … drop).
The committed forms were audited 2026-06-05 and each renders its own
form-specific missing-keyword message + usage (e.g. add index →
"expected on or as"; drop constraint → "expected not,
unique, default, or check"), regression-locked in
near_miss_matrix_committed_multiforms.
Out of scope
- Advanced-SQL engine-error sanitisation — ADR-0019 §OOS-2.
- Tab completion (I3) and syntax highlighting (I4) as features — they share the walker but are separate ADRs.
- Schema-aware "did you mean `Customers`?" spell-correction: ADR-0021's out-of-scope §2; belongs with I3.
- Multi-error reporting. The walker reports the first error and stops; unchanged.
- `messages`-style verbosity gating of the usage block. Per ADR-0021 §8 the usage block is always shown; parse errors are exactly when pedagogical surface should be maximal. Unchanged.
- Auto-generating usage/help text from the grammar. ADR-0024 left help prose hand-curated; templates stay hand-written.
Consequences
Positive
- H1a gains an explicit, enumerated definition of done instead of an open-ended "systematic pass still pending".
- The matrix becomes a regression lock: future grammar changes that degrade a near-miss message fail a snapshot.
- Literal-label glosses close the last terse-wording gap without a blanket reword.
- The advanced-SQL surface reaches parity with the DSL surface for the audience that has switched to raw SQL.
Costs
- Wording iteration across many near-miss cases — but cheap, catalog-driven, and snapshot-guarded.
- The §2 per-node label field is one more annotation a new command may set (optional; default unchanged).
- Snapshot volume grows; acceptable given the existing ~160-entry typing-surface matrix.
Neutral
- No public API change. `parse_command*` signatures, the `ParseError` shape, and the three-block render path are all unchanged; this ADR adds wording, labels, and tests within them.
Implementation notes
Order of operations (test-first throughout):
1. Enumerate the per-command near-miss matrix (§1) as failing/asserting tests in `tests/typing_surface/` + `tests/it/parse_error_pedagogy.rs`. Capture current rendering as the starting baseline.
2. Triage: which entries read poorly? Only those get wording work.
3. Add the optional expectation-label mechanism (§2) and apply it to the high-value keyword/punct positions surfaced in triage.
4. Advanced-SQL near-miss audit + fixes (§3), distinguishing parse from engine errors as they arise.
5. Catalog validator + anchor-phrase checks stay green (§4).
6. Update `requirements.md` H1a with the matrix as the done-marker; flip to `[x]` only when the matrix is complete and green.
See also
- ADR-0021 — Parser-as-source-of-truth for H1a (mechanism superseded; intent continued here).
- ADR-0020 — Tokenization layer (superseded by the scannerless walker).
- ADR-0024 — Unified grammar tree (the architecture H1a is built on).
- ADR-0022 — Ambient typing assistance (shares the expected-set machinery).
- ADR-0019 — Friendly-error layer and i18n catalog (§OOS-2 is the adjacent engine-error scope).
- ADR-0009 — DSL command-syntax conventions (usage surface form).
- `requirements.md`: H1a tracking entry.