diff --git a/docs/adr/0020-tokenization-layer-for-the-dsl-parser.md b/docs/adr/0020-tokenization-layer-for-the-dsl-parser.md
new file mode 100644
index 0000000..fab1700
--- /dev/null
+++ b/docs/adr/0020-tokenization-layer-for-the-dsl-parser.md
@@ -0,0 +1,638 @@
+# ADR-0020: Tokenization layer for the DSL parser
+
+## Status
+
+Accepted.
+
+Amends ADR-0001 (language and TUI framework) by adding a
+tokenization layer between the source string and the chumsky
+grammar. The chumsky choice itself is unchanged; what changes
+is the input to chumsky — `&[Token]` instead of `&str`.
+
+Foundation for ADR-0021 (parser-as-source-of-truth pedagogy /
+H1a). Also the load-bearing piece for I3 (tab completion) and
+I4 (syntax highlighting) once those land.
+
+## Context
+
+### What the user sees today
+
+Typing `create` at the prompt produces:
+
+```
+parse error: after `create`, expected `table`
+```
+
+Typing `frobulate Customers` produces:
+
+```
+parse error: expected `create`, found `frobulate`
+```
+
+Both are technically true and both are bad. The first hides
+that `create` is the entry to a single command (`create
+table …`) that the user almost certainly does not yet know
+the shape of. The second points at one of ten possible
+command-starting keywords and silently picks one. A user
+hitting either error gets a sharper "what comes next?" answer
+from a 1980s `lex`/`yacc` toolchain than from this app.
+
+### Two technical roots
+
+The bad UX traces to two parser-level decisions, both
+correctable but not without architectural change:
+
+1. **`keyword_ci` emits `Rich::custom` errors instead of
+   structural ones.** Chumsky's `choice` combinator merges the
+   `expected` sets of its alternatives when they all fail —
+   that is exactly what produces "expected `data` or `table`"
+   instead of just "expected `table`". Custom errors don't
+   participate in that merge: only one wins, deterministically
+   the first. So our top-level `choice((create_table, drop_*,
+   add_*, …))` collapses to whichever branch reported its
+   custom error first, throwing the others away.
+
+2. **The parser has no concept of a "command" beyond its
+   chumsky combinator graph.** There is no place to attach a
+   per-command grammar template (the "what does `create
+   table` actually want?" answer). Adding one is possible but
+   awkward without an explicit handle on "the entry token
+   for this command".
+
+ADR-0021 addresses (2). This ADR addresses (1) by removing
+the underlying cause: `keyword_ci`'s `try_map → Rich::custom`
+shape only exists because the parser operates on raw
+characters and has to hand-write keyword recognition. Over a
+token stream, a keyword is just a token, and `just(...)` over
+tokens aggregates naturally.
+
+### Bespoke machinery in `parser.rs`
+
+`parser.rs` carries ~180 lines of error-handling helpers
+(`humanise()`, `consumed_context()`, `oxford_or()`,
+`describe_pattern()`, `describe_char()`, `format_expected_found()`,
+`first_custom_message()`, the `prefer-custom-over-structural`
+selection in `into_parse_error()`). Most of it exists because
+chumsky-over-`&str` produces character-level error patterns
+that need humanising before a user can read them
+(`RichPattern::Token('e')` → "`e`" is not what the user
+needed to know).
+
+Over a token stream, the patterns are token kinds — `Keyword(Create)`,
+`Identifier`, `Punct(Colon)` — which render directly via the
+i18n catalog without per-character translation. Roughly half
+of that machinery dissolves.
+
+### Honest history
+
+ADR-0001 chose Rust + chumsky for the DSL. It did not name
+"no lexer" as a design choice — the no-lexer shape grew
+incrementally inside `parser.rs` without an ADR.
+
+The known requirements at the time included **H1a** (friendly
+error layer for parse errors), **I3** (tab completion), and
+**I4** (syntax highlighting). All three are easier with a
+token stream than without one — H1a needs aggregated expected
+sets, I3 needs "what tokens are valid at cursor", I4 needs
+token classification on potentially-invalid input. The
+lexer-shape question should have been considered against
+these requirements when the parser was built. It wasn't.
+
+The user has previously raised this — pointing out that
+`lex`/`yacc` already handled this kind of error reporting
+better, and asking why we weren't getting comparable
+behaviour. That observation was correct on the merits and
+should have triggered an ADR amendment then. ADR-0020 is that
+amendment, late but in the right place.
+
+This is not a "we now realise" framing. It is a "we should
+have decided this earlier and didn't" framing. The cost of
+acting on it now is real but bounded; the cost of acting on
+it after the query DSL or constraint-management commands have
+landed would be substantially larger. We have no users; the
+agility cost of refactoring is at its lowest.
+
+### What is and isn't this ADR
+
+This ADR is purely about the **input layer to the DSL
+parser**. It does not specify per-command usage rendering,
+catalog keys for parse-error wording, or the renderer
+composition of caret + structural error + usage hint. Those
+are ADR-0021's scope. The two ADRs share an implementation
+session; the dependency runs one way (ADR-0021 needs
+ADR-0020's tokens).
+
+This ADR also does not touch SQL parsing in advanced mode.
+That path uses `sqlparser-rs` (per ADR-0001) and has its own
+tokenization built in. A future ADR for the SQL subset (Q4)
+will decide whether to share the DSL lexer's token model or
+keep `sqlparser-rs`'s token surface as-is — that's not
+prejudged here.
+
+## Decision
+
+### 1. Two-phase parse: lex → parse
+
+```rust
+pub fn parse_command(input: &str) -> Result<Command, ParseError> {
+    let tokens = lex(input)?;          // Stage 1
+    parse_tokens(&tokens, input)       // Stage 2
+}
+```
+
+Stage 1 (`lex`) produces a span-tagged token stream. Stage 2
+(`parse_tokens`) is a chumsky parser whose input type is
+`&[Token]` instead of `&str`. The two stages are separately
+testable; the lexer has its own test surface that doesn't
+exercise the parser.
+
+`parse_tokens` takes the original `&str` as a second
+argument purely so the parser can consume bare path arguments
+for the `replay` command directly from source (see §6). All
+other parser logic operates over the token slice.
+
+### 2. Token model
+
+```rust
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct Token {
+    pub kind: TokenKind,
+    pub span: Span,           // (start, end) byte offsets
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum TokenKind {
+    Keyword(Keyword),         // case-folded reserved word
+    Identifier(String),       // case-preserving (ADR-0009)
+    Number(String),           // raw text; numeric parse is a parser concern
+    StringLiteral(String),    // unquoted, escapes processed
+    Punct(Punct),             // : ( ) , = . - one char each
+    Flag(String),             // --name (e.g. "--all-rows")
+    Error(LexError),          // unrecognised char / unterminated string
+                              //   — a token kind, not a Result variant
+}
+
+// Keyword is a closed set declared via a macro that is the
+// single source of truth (§2a). Punct follows the same pattern.
+```
+
+The `Keyword` and `Punct` enums are not hand-declared. They
+come out of macros described in §2a — one declaration site
+that generates the enum, the lex-side string→variant
+mapping, the variant→literal rendering, and the catalog-key
+derivation in one place.
+
+Notes on the model:
+
+- **Keywords are an enum, not a string.** This makes
+  `kw(Keyword::Create)` a single-token chumsky match with
+  exact identity — fastest path, cleanest error patterns,
+  no string allocation in the hot path.
+- **Type names stay as identifiers.** `int`, `text`,
+  `serial`, `varchar` all lex as `Identifier(_)`. The
+  parser-level `type_keyword()` continues to call
+  `Type::from_str` which produces the existing "unknown type
+  'varchar' (expected one of: text, int, real, …)"
+  message. That custom-error path is correct and stays. The
+  closed-set Keyword enum is for the *grammar's* reserved
+  words — words whose presence determines which command is
+  being parsed. Type names are *content*.
+- **Number is raw text, not parsed.** `Value::Number(String)`
+  per ADR-0014 stays string-backed; the lexer doesn't try to
+  validate or convert. `1`, `-3.14`, `1e10`, `1abc` all
+  produce candidates for the parser to decide on. (The
+  current parser rejects `1abc` already; that doesn't change.)
+- **`StringLiteral` is post-escape.** The lexer processes
+  `''` → `'` per the existing string syntax. The original
+  span covers the surrounding quotes; the payload is the
+  unescaped content. (Same convention as the current
+  `string_literal()` parser.)
+- **`Flag(String)` is `--name` exactly.** No further parsing.
+  The parser matches `Flag(s)` and checks `s == "all-rows"`
+  etc. against a small set.
+- **`Error(_)` is a token kind, not a `Result` variant.**
+  Lex always succeeds in producing `Vec<Token>` — even on
+  unterminated strings or unrecognised characters, a
+  diagnostic Error token is emitted in place. Rationale: I4
+  (syntax highlighting) needs to colour partial / invalid
+  input; treating lex errors as data instead of control flow
+  lets the highlighter walk a token stream uniformly. The
+  parser sees `Error(_)` tokens and raises a structural
+  error that names the underlying cause via the catalog.
+
+### 2a. Single source of truth for keywords and punctuation
+
+The `Keyword` enum, the lexer's reserved-word table, the
+variant→literal rendering, and the catalog-key derivation
+all come from one declaration. A `define_keywords!`
+`macro_rules!` invocation in `src/dsl/keyword.rs`:
+
+```rust
+macro_rules! define_keywords {
+    ( $( $variant:ident => $literal:literal ),+ $(,)? ) => {
+        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+        pub enum Keyword { $( $variant ),+ }
+
+        impl Keyword {
+            pub const ALL: &'static [(Keyword, &'static str)] = &[
+                $( (Keyword::$variant, $literal) ),+
+            ];
+
+            /// Lex-side mapping. Case-insensitive per ADR-0009.
+            pub fn from_word(s: &str) -> Option<Keyword> {
+                Self::ALL.iter()
+                    .find(|(_, lit)| s.eq_ignore_ascii_case(lit))
+                    .map(|(kw, _)| *kw)
+            }
+
+            /// Canonical lowercase literal for this variant.
+            pub fn as_str(self) -> &'static str {
+                Self::ALL.iter()
+                    .find(|(kw, _)| *kw == self)
+                    .map(|(_, lit)| *lit)
+                    .unwrap()
+            }
+
+            /// `parse.token.keyword.<lit>` — the catalog key
+            /// the renderer looks up for the expected-set
+            /// vocabulary (ADR-0021 §4).
+            pub fn catalog_token_key(self) -> String {
+                format!("parse.token.keyword.{}", self.as_str())
+            }
+        }
+    };
+}
+
+define_keywords! {
+    Create => "create",
+    Drop   => "drop",
+    Add    => "add",
+    // ... one line per keyword ...
+}
+```
+
+`Punct` follows the same pattern via `define_punct!` —
+Colon, OpenParen, CloseParen, Comma, Equals, Dot, each
+generated from one line of the invocation.
+
+Adding a new keyword is **one line** in the
+`define_keywords!` invocation, plus one line in the
+en-US YAML under `parse.token.keyword.<lit>` (the catalog
+validator catches a missing entry at test time per ADR-0021
+§7). Everything else — every parser combinator that
+references the keyword, every usage-registry entry, every
+catalog-key lookup — is a *use* of the source of truth, not
+a duplicate, and is compile-time-checked by the type system
+(typos in `kw(Keyword::Creaate)` don't compile).
+
+### 3. Span carriage
+
+Every `Token` carries `(start, end)` byte offsets in the
+original source string. The lexer is byte-exact for
+multi-byte UTF-8 sequences (ASCII-only is the realistic
+input but the lexer does not panic on Unicode in identifiers
+or string literals).
+
+`ParseError::Invalid` continues to carry `position: usize`,
+which is the `start` offset of the failing token (or the
+end-of-input position when the failure is at EOF). Caret
+rendering (in `app.rs`) is unchanged — same byte-position
+contract.
+
+### 4. Lexer-vs-parser error split
+
+The lexer is responsible for:
+
+- **Tokenization shape errors**: unterminated string literal,
+  unrecognised character. Both surface as `TokenKind::Error(_)`
+  tokens with span coverage of the offending region.
+- **Nothing else.** Specifically: type names, command
+  keywords, value validity, flag names, numeric overflow,
+  identifier collisions are all parser- or higher-layer
+  concerns.
+
+The parser is responsible for:
+
+- **Grammar-shape errors**: missing tokens, wrong tokens,
+  unexpected tokens, end-of-input mid-command. These come
+  out of chumsky's structural error machinery and aggregate
+  across `choice` naturally.
+- **Content errors via `try_map`**: unknown type name
+  ("unknown type 'varchar'"), mutually-exclusive flags
+  ("`--force-conversion` and `--dont-convert` are mutually
+  exclusive"), "with pk needs at least one column",
+  referential-clause repeated. These keep their hand-written
+  messages — the `Rich::custom` path is correct for content,
+  it was only wrong for keyword-shape.
+
+The render layer (humanise) handles both uniformly via the
+catalog (ADR-0021).
+
+### 5. The chumsky-over-tokens combinator surface
+
+Chumsky 0.13 parses arbitrary input types, not just `&str`.
+The combinator surface for `Parser<'a, &'a [Token], …>` is
+the same as for `&str`: `just`, `choice`, `then`,
+`then_ignore`, `ignore_then`, `or_not`, `repeated`,
+`separated_by`, `try_map`, `labelled`. The only differences:
+
+- `just(Token { kind: Keyword::Create, .. })` doesn't work
+  literally because spans differ. We define helpers
+  `kw(Keyword::Create)`, `punct(Punct::Colon)`, `ident()`,
+  `number()`, `string_lit()`, `flag(name)` that match by
+  kind and ignore span on the input side, returning either
+  `()` or the carried payload.
+- `padded()` (which strips whitespace) is no longer needed —
+  the lexer skips whitespace already. This simplifies a lot
+  of combinator chains (`just(':').padded()` → `punct(Punct::Colon)`).
+- `text::keyword(...)`, `any().filter(...)`, etc. — all the
+  character-level helpers — are gone from the parser. They
+  belong to the lexer if anywhere.
+
+The parser keeps producing `Rich<Token>` errors. The
+`expected` set members are `RichPattern<Token>`; the
+catalog's `parse.token.*` keys (ADR-0021 §4) translate them.
+
+### 6. The `replay` path-argument wart
+
+The current grammar admits bare paths: `replay history.log`,
+`replay /tmp/seed.commands`. Bare paths contain `.` and `/`,
+which are punctuation tokens. A naive token-based parser
+would see `Keyword(Replay) Identifier("history") Punct(Dot)
+Identifier("log")` and have to reassemble — annoyingly
+context-dependent.
+
+Decision: keep the existing bare-path UX. The parser
+special-cases the `replay` command at one point: after
+matching `Keyword(Replay)`, it consumes its argument
+**directly from the original source string** by reading from
+the next token's start byte to end-of-input (or to the next
+unambiguous terminator, of which today there is none in the
+DSL). The lexer's job for the path argument is essentially
+to identify *where the path starts*; the parser does the
+rest by source-slicing.
+
+The quoted-path form (`replay 'my project/seed.commands'`)
+still goes through the lexer's normal `StringLiteral` path
+and is matched as `Keyword(Replay) StringLiteral(s)`.
+
+This special-casing is documented inline in the parser. It
+costs ~10 lines of code and avoids a breaking change to the
+DSL surface. No other DSL command takes a free-form
+filesystem-path argument; if a future command does, it
+either uses the same special-case or accepts only quoted
+paths.
+
+### 7. Whitespace
+
+The lexer skips ASCII whitespace (` \t\r\n`) between tokens.
+This honours ADR-0009's "whitespace is liberal" rule
+unchanged — the user can put any amount of whitespace
+between any two tokens, and the parser doesn't see it.
+
+The lexer does **not** track whitespace as tokens. I4 (syntax
+highlighting) recovers whitespace as the gaps between token
+spans, computed at render time.
+
+### 8. Comments
+
+The DSL has no comment syntax today and this ADR doesn't
+introduce one. If a comment syntax is added later (likely
+`--` line comments to match SQL, but that conflicts with
+flag prefixes — this is a real design decision a future ADR
+will need to settle), the lexer is the right layer to skip
+them. Out of scope here.
+
+### 9. I3 (tab completion) hook
+
+This ADR commits to making the parser's expected-token-set
+queryable at any point in the input. Concretely:
+
+- `parse_tokens(&[Token], &str)` returns
+  `(Result<Command, ParseError>, ParseDiagnostics)` where
+  `ParseDiagnostics` carries the raw chumsky output —
+  including the merged `expected` set at the failure point.
+- Truncating the token stream at the cursor and re-parsing
+  yields the expected-token-set at the cursor position. I3
+  uses this for completion.
+- The lexer is independently useful to I3: tokenizing the
+  in-progress input lets the completion logic see what
+  *partial* token the cursor is in (mid-keyword, mid-identifier,
+  mid-string).
+
+The full I3 surface (cursor positioning, completion menu,
+disambiguation, completion of identifiers from schema) is
+out of scope here. ADR-0020 commits only to the parser
+contract.
+
+### 10. I4 (syntax highlighting) hook
+
+The token kinds are the natural classification for syntax
+highlighting:
+
+- `Keyword(_)` → keyword colour.
+- `Identifier(_)` → identifier colour.
+- `Number(_)`, `StringLiteral(_)` → literal colours
+  (probably distinct from each other).
+- `Punct(_)` → punctuation colour (or no special colouring).
+- `Flag(_)` → flag colour.
+- `Error(_)` → error highlight (red underline / squiggle).
+
+The lexer succeeds on partial / invalid input (§2). I4 does
+not need a successful parse to render highlighting — only
+successful tokenization. This is the load-bearing property
+for "live" highlighting as the user types.
+
+The colour scheme, theme integration, render layer, and
+performance budget are out of scope here. ADR-0020 commits
+only to "the lexer always produces a token stream over which
+highlighting can iterate".
+
+### 11. Recovery-based partial AST — explicitly deferred
+
+Chumsky supports parser combinators with recovery, producing
+partial ASTs alongside errors. Useful for I3 ("the user has
+typed `create table Foo with pk` and now I want to know
+what the partial AST is so I can suggest column-spec
+completions") but the design space is large (which recovery
+strategies, what the partial AST shape is, how downstream
+consumers handle it).
+
+This ADR keeps the parser non-recovering: a failed parse
+returns `Err(ParseError)` and no partial AST. I3's ADR will
+decide whether recovery is needed; if so, the change is
+local to `parse_tokens` and doesn't ripple.
+
+### 12. Migration of existing parser tests
+
+The 50+ unit tests in `dsl/parser.rs::tests` are the spec of
+the current grammar. The migration is mechanical:
+
+- Tests that call `parse_command(input)` keep doing so —
+  `parse_command` is the public boundary and its signature
+  doesn't change.
+- Tests that assert on `ParseError::Invalid { message, ..  }`
+  may need wording updates if the new error layer rewords
+  them, but anchor substrings ("unknown type", "specified
+  twice", "mutually exclusive", "varchar", "expected one
+  of") stay intact (those come from `try_map` content errors
+  that survive unchanged).
+- Two existing tests assert on chumsky's structural wording:
+  `structural_error_for_show_data_without_arg`
+  ("after `show data`", "expected identifier", "found end of
+  input") and `structural_error_for_change_column_with_swapped_args`
+  ("after `change column Rich`", "expected `:`"). The new
+  rendering preserves the same shape — `after <prefix>,
+  expected <set>, found <token>` — so these tests should
+  port with at most minor adjustments. ADR-0021 specifies
+  the rendering precisely.
+
+A new test surface for the lexer itself: lex output for
+representative inputs, lex error tokens for unterminated
+strings + unknown chars, span correctness.
+
+## Out of scope
+
+Deliberately deferred to keep this ADR focused:
+
+1. **Per-command usage templates.** ADR-0021.
+2. **`parse.usage.*` and `parse.token.*` catalog keys.**
+   ADR-0021.
+3. **Error renderer composition** (caret + structural error
+   + usage hint). ADR-0021.
+4. **I3 completion UI** + cursor logic + identifier
+   completion from schema. Future I3 ADR.
+5. **I4 colour scheme** + theme integration. Future I4 ADR.
+6. **Recovery-based partial AST.** §11 — re-opens with I3.
+7. **Comment syntax.** §8.
+8. **Sharing the lexer with `sqlparser-rs`** (advanced-mode
+   SQL). The two parsers stay separate today; a future SQL
+   subset ADR may revisit.
+
+## Consequences
+
+### Positive
+
+- **Aggregation works.** Top-level `choice((create_*, drop_*,
+  add_*, …))` failures emit "expected `create`, `drop`,
+  `add`, …" structurally. The user sees the family of
+  available commands rather than one branch's report.
+- **The bespoke `humanise()` machinery shrinks.** Roughly
+  half the helpers in `parser.rs` (~80-100 lines) are no
+  longer needed because token-level error patterns render
+  directly via the catalog. Less code is less code to
+  maintain.
+- **I3 / I4 inherit a clean foundation.** Their ADRs can
+  focus on UX/UI rather than re-litigating parser shape.
+- **Lex errors and parse errors share a render path.**
+  Unterminated strings and missing keywords both surface
+  through the same catalog-driven layer.
+- **The token stream is a natural API for future tooling**:
+  schema-aware highlighting, structural editing, command
+  history rendering with token-level colour.
+
+### Costs
+
+- **One-time migration of `dsl/parser.rs`.** Every combinator
+  rewrites against `&[Token]`. Estimated 600-900 lines of
+  parser.rs change including the lexer module; the lexer
+  itself is probably 200-300 lines plus 150-250 lines of
+  unit tests.
+- **Catalog grows by one entry per keyword and per punct.**
+  The macro-driven `Keyword` / `Punct` enums (§2a) collapse
+  the enum + lex table + as-str + catalog-key derivation into
+  one declaration site, so adding a keyword is one line of
+  Rust + one line of YAML. The catalog validator enforces
+  completeness at test time (ADR-0021 §7). Compared to today
+  — where adding a keyword means a new `keyword_ci("...")`
+  call site (or several, if used in multiple commands) — the
+  per-keyword cost is comparable; what shifts is *where* the
+  edit happens. A unit test additionally asserts every
+  `Keyword::ALL` variant is referenced by some parser
+  combinator, catching dead enum entries.
+- **Span model needs care for UTF-8.** Byte offsets are the
+  contract; the lexer must split tokens at UTF-8 boundaries.
+  Identifier/string tests should include at least one
+  multi-byte case.
+- **`replay` special-case** in the parser (§6). One command,
+  ~10 lines, documented inline. Acceptable; not a precedent
+  for other commands.
+- **Tests churn.** Two structural-error tests need new wording
+  (mechanical port). Existing content-error tests stay as-is.
+
+### Neutral
+
+- **chumsky stays.** No framework change, no new dependency.
+  The parser still expresses the grammar declaratively; only
+  the input atoms change.
+- **`Command` AST is unchanged.** The parser produces the
+  same `Command` enum it does today; downstream code (runtime,
+  app, db) is untouched.
+- **Public API of `dsl::parser` is unchanged.**
+  `parse_command(&str) → Result<Command, ParseError>` keeps
+  its signature. ADR-0021 may extend the public surface (e.g.
+  expose `lex()` and `parse_tokens()` for I3/I4) but does so
+  additively.
+
+## Implementation notes
+
+These are sketch-level — implementation will produce more
+detailed work, but they're enough that a session picking
+this up has direction.
+
+### Order of operations
+
+1. **Lexer module.** New file `src/dsl/lexer.rs`. Token /
+   TokenKind / Keyword / Punct types. `lex(input: &str) →
+   Vec<Token>` (always succeeds; embeds `Error(_)` tokens
+   for shape errors). Unit tests for representative inputs,
+   span correctness, error-token cases, multi-byte UTF-8.
+2. **Token-aware combinator helpers.** Probably in
+   `src/dsl/parser.rs` next to the existing combinators.
+   `kw(Keyword)`, `punct(Punct)`, `ident()`, `number()`,
+   `string_lit()`, `flag(&str)`, `eof()`. Each parses a
+   single token by kind and returns its payload (or `()`).
+3. **Rewrite `command_parser()` and its sub-parsers.** One
+   sub-parser at a time; run the existing tests after each.
+   Aim for green-after-every-step rather than a big-bang
+   port.
+4. **The `replay` source-slice special case** (§6).
+5. **Strip `humanise()` machinery** that's no longer needed.
+   The render path in `into_parse_error()` simplifies.
+   ADR-0021 owns the new render shape; until it lands, keep
+   a minimal humaniser that produces the existing wording.
+6. **Public API check.** `parse_command` signature unchanged.
+   Add `lex(&str) → Vec<Token>` and `parse_tokens(&[Token],
+   &str) → Result<Command, ParseError>` as `pub` so I3/I4
+   can hook in later (their ADRs will use these).
+
+### Things that interact subtly
+
+- **The `try_map` content errors** survive unchanged. They
+  fire on tokens (e.g. `try_map` on a parsed `Identifier(s)`
+  that's expected to be a type name) but their messages and
+  classification are identical to today. The catalog
+  vocabulary they use ("unknown type", "expected one of",
+  "mutually exclusive", "specified twice") stays.
+- **The `1:n` cardinality** lexes as `Number("1")`,
+  `Punct(Colon)`, `Identifier("n"|"N")`. The parser composes
+  these into the relationship-cardinality assertion as
+  today; no special token kind for it.
+- **Negative number literals.** Lexer emits `Punct(Minus)`?
+  No — there is no Minus in the punct set above because the
+  current grammar has no need for a unary `-` outside number
+  literals. Decision: the lexer recognises `-` as part of a
+  number literal when followed by an ASCII digit, producing a
+  single `Number("-5")` token. A bare `-` not followed by a
+  digit is a `Punct(Minus)` only if a `Minus` variant is
+  added — for now, treat as `Error(UnknownChar)`. This
+  matches the current grammar (which only accepts `-` as a
+  number sign).
+- **The `--` flag prefix vs. a future `--` line comment.**
+  Today `--all-rows` etc. are flags. If a future ADR
+  introduces SQL-style `--` line comments in advanced mode
+  only, the lexer may need a mode parameter. ADR-0020 doesn't
+  pre-empt that.
+- **The hard-coded `"running: "` prefix in `app.rs`** for
+  caret padding: unchanged. The parser still reports a
+  byte-position; the caret math is the same.
diff --git a/docs/adr/0021-parser-as-source-of-truth-for-h1a.md b/docs/adr/0021-parser-as-source-of-truth-for-h1a.md
new file mode 100644
index 0000000..244f7f4
--- /dev/null
+++ b/docs/adr/0021-parser-as-source-of-truth-for-h1a.md
@@ -0,0 +1,452 @@
+# ADR-0021: Parser-as-source-of-truth for H1a (per-command usage in parse errors)
+
+## Status
+
+Accepted.
+
+Builds on ADR-0020 (tokenization layer). Addresses H1a from
+`requirements.md` — the parse-error pedagogy gap that
+ADR-0019's friendly-error layer left untouched.
+
+Cross-references ADR-0019 (i18n catalog conventions; H1a's
+output goes through the same catalog) and ADR-0009 (DSL
+syntax conventions; usage templates render in the project's
+documented surface form).
+
+## Context
+
+ADR-0019 dramatically improved engine-error wording.
+Parse-error wording is now the visibly-weakest user surface —
+the user-reported gap was concrete: typing `create` produces
+
+```
+parse error: after `create`, expected `table`
+```
+
+The error is *structurally* correct (chumsky has consumed
+`create` and is now looking for the next required token) but
+*pedagogically* silent. A learner who got this far typed
+`create` because they'd been told that's how new tables are
+made; what they need next is the shape of the command, not
+a single missing-token pointer.
+
+Comparable observations apply across the whole DSL surface:
+
+- `add`  → expected `column` or `1` (uninformative; user
+  needs the shape of `add column …` AND `add 1:n
+  relationship …`).
+- `update Customers`  → expected `set` (true; but `update`'s
+  full grammar with `set …`, `where …`, `--all-rows` is what
+  the user wants illustrated).
+- `frobulate Customers`  → expected one of `create`, `drop`,
+  `add`, `rename`, `change`, `show`, `insert`, `update`,
+  `delete`, `replay` (true after ADR-0020; the available-commands
+  list is now informative, but the no-prefix case wants its
+  own framing — "available commands" rather than "expected").
+
+H1a's job is to close that gap by surfacing the **grammar**
+of the command at the point of error, not just the next
+token.
+
+### What ADR-0020 supplies
+
+ADR-0020 lands the lexer + parser-over-tokens architecture.
+What that buys H1a:
+
+- **Aggregated `expected` sets at the failure point** (top-level
+  `choice` failures now list every command-starting keyword,
+  not just one). The user-visible "available commands" list
+  becomes correct without any work in this ADR.
+- **Token-kind error patterns** (`RichPattern<Token>` instead
+  of `RichPattern<char>`). Each pattern renders via a stable
+  catalog key — no per-character humanising.
+- **A canonical entry-token for each command** (the first
+  `Keyword(_)` consumed). H1a keys per-command usage
+  templates off this token.
+
+### What this ADR adds on top
+
+- A registry of per-command **usage templates** (one
+  declaration per command).
+- A renderer that composes the parse error with: caret +
+  structural error wording + matching usage template(s).
+- New catalog keys under `parse.usage.*` (templates) and
+  `parse.token.*` (single-token rendering for expected-set
+  joins).
+- A "no commands consumed" fallback that renders an
+  available-commands list under a different prefix
+  ("available commands:" rather than "expected:") for the
+  zero-prefix case.
+
+## Decision
+
+### 1. Per-command usage template registry
+
+Each command parser is paired with a `UsageEntry`:
+
+```rust
+pub struct UsageEntry {
+    /// First keyword that distinguishes this command. Used
+    /// as the registry key.
+    pub entry: Keyword,
+    /// Catalog key for the grammar template body (under
+    /// `parse.usage.*`). One key per command.
+    pub catalog_key: &'static str,
+}
+```
+
+The registry is a `&'static [UsageEntry]` declared in one
+place (`src/dsl/usage.rs`). Lookup: given a consumed entry
+keyword, return all entries whose `entry == keyword`. For
+`Keyword::Add` the registry returns the `add column` and
+`add 1:n relationship` entries; for `Keyword::Drop` it
+returns `drop table`, `drop column`, `drop relationship`;
+for unique-entry keywords (e.g. `Keyword::Create` today) it
+returns one.
+
+The catalog key is what gets translated. Template bodies
+live in `src/friendly/strings/en-US.yaml` under
+`parse.usage.*`:
+
+```yaml
+parse:
+  usage:
+    create_table: "create table <Name> with pk [<col>:<type>[, ...]]"
+    drop_table:   "drop table <Name>"
+    add_column:   "add column [to] [table] <Table>: <Name> (<Type>)"
+    add_relationship: |
+      add 1:n relationship [as <Name>]
+        from <Parent>.<col> to <Child>.<col>
+        [on delete <action>] [on update <action>]
+        [--create-fk]
+    rename_column: "rename column [in] [table] <Table>: <Old> to <New>"
+    change_column: |
+      change column [in] [table] <Table>: <Name> (<Type>)
+        [--force-conversion | --dont-convert]
+    show_data:    "show data <Table>"
+    show_table:   "show table <Table>"
+    insert:       "insert into <Table> [(<col>[, ...])] [values] (<value>[, ...])"
+    update:       "update <Table> set <col>=<value>[, ...] (where <col>=<value> | --all-rows)"
+    delete:       "delete from <Table> (where <col>=<value> | --all-rows)"
+    drop_column:  "drop column [from] [table] <Table>: <Name>"
+    drop_relationship: |
+      drop relationship <Name>
+      drop relationship from <Parent>.<col> to <Child>.<col>
+    replay:       "replay <path>  |  replay '<path with spaces>'"
+```
+
+(Wording is illustrative; exact phrasing settled at
+implementation time. The bracket convention `[...]` for
+optional parts and angle-bracket `<...>` for placeholders
+matches ADR-0009's documentation surface.)
+
+### 2. The renderer composes three blocks
+
+A parse error renders as:
+
+```
+running: <user input>
+        ^               ← caret (existing, unchanged)
+parse error: <structural-or-content message>
+usage: <template1>
+       <template2>      ← when multiple entries share the entry keyword
+```
+
+Block 1 (the echo + caret) is unchanged from today.
+
+Block 2 is the structural or content error. ADR-0020
+guarantees the structural error is now properly aggregated
+("expected `data` or `table`" not "expected `table`"). The
+content errors (unknown type, mutually-exclusive flags) are
+unchanged in voice.
+
+Block 3 (usage:) is new. It is rendered if and only if **at
+least one keyword token was consumed** before the parser
+failed AND that keyword is a registered entry. If no keyword
+was consumed (e.g., `frobulate Customers`, where `frobulate`
+is an `Identifier`, not a `Keyword`), Block 3 is replaced
+with the no-prefix fallback (§5).
+
+If multiple entries match (e.g., the `add` family), all are
+listed under a single `usage:` prefix, one per line.
+
+### 3. Identifying the consumed entry keyword
+
+The parser surfaces, alongside the `ParseError`, the
+**deepest successfully-consumed keyword token**. Mechanism:
+
+- `parse_tokens` returns `(Result<Command, ParseError>,
+  ParseDiagnostics)` where `ParseDiagnostics` carries the
+  furthest position chumsky reached AND a snapshot of the
+  consumed prefix.
+- The renderer walks the consumed prefix backward to find the
+  first `Keyword(_)` token. (Almost always the first token,
+  but a future grammar where a command starts with a
+  literal — none today — would still resolve correctly.)
+
+This logic lives in `src/dsl/usage.rs::matched_entry()` so
+the registry and the lookup sit together.
+
+### 4. `parse.token.*` — single-token catalog vocabulary
+
+Chumsky's expected-set rendering needs a name for each token
+kind. Today `humanise()` hand-codes these
+(`describe_pattern` returns "`create`", "identifier", etc.).
+ADR-0021 moves the vocabulary into the catalog:
+
+```yaml
+parse:
+  token:
+    # Keywords — one entry per Keyword enum variant.
+    keyword.create: "`create`"
+    keyword.table:  "`table`"
+    keyword.with:   "`with`"
+    # ... one per Keyword variant ...
+
+    # Punctuation.
+    punct.colon:        "`:`"
+    punct.open_paren:   "`(`"
+    punct.close_paren:  "`)`"
+    punct.comma:        "`,`"
+    punct.equals:       "`=`"
+    punct.dot:          "`.`"
+
+    # Token-class labels.
+    identifier:     "identifier"
+    number:         "number"
+    string_literal: "string literal"
+    flag:           "flag (--name)"
+    end_of_input:   "end of input"
+
+    # Lexer-error tokens.
+    error.unterminated_string: "unterminated string starting at column {column}"
+    error.unknown_char:        "unrecognised character {found}"
+```
+
+Joining ("`a`, `b`, or `c`") stays in code (`oxford_or` from
+the current humanise machinery, lifted intact). Wording of
+each token is in the catalog.
+
+`parse.error` (existing wrapper key) stays. Its `{detail}`
+placeholder is filled by:
+
+```
+{consumed_prefix} expected {oxford_or(expected)}, found {found_token}
+```
+
+— each piece sourced from the catalog, joined in code.
+
+`parse.caret` (existing) and `parse.empty` (existing)
+unchanged.
+
+### 5. No-prefix fallback: "available commands"
+
+When the parser fails with **no keyword consumed**, the
+"expected" set lists every top-level command-starting
+keyword. That's correct but the framing should be
+"available commands" rather than "expected".
+
+Renderer detects this case (consumed-keyword count == 0) and
+substitutes Block 3 with:
+
+```
+available commands: create, drop, add, rename, change,
+                    show, insert, update, delete, replay
+```
+
+via a new catalog key:
+
+```yaml
+parse:
+  available_commands: "available commands: {commands}"
+```
+
+The list is the alphabetised set of `entry` keywords from
+the usage registry, each rendered via its `parse.token.keyword.*`
+catalog entry (so the strings are catalog-sourced, not
+hard-coded).
+
+This case only fires when the user typed something the
+parser couldn't classify as any known command keyword — the
+"frobulate Customers" case. It's both rarer and more useful
+than the with-prefix case: a user this lost benefits more
+from the full menu than from a missing-token pointer.
+
+### 6. Anchor-phrase compliance (ADR-0019 §10)
+
+ADR-0019's anchor-phrase list contains nine substrings the
+catalog commits to keeping stable. None are parse-error-specific,
+so this ADR doesn't add to the list. The existing parser
+test that asserts on "unknown type" and "expected one of"
+substrings stays — those come from `Type::from_str`'s custom
+error message which ADR-0020 §4 keeps unchanged.
+
+The current structural-error tests assert on substrings like
+"after `show data`", "expected identifier", "found end of
+input", "after `change column Rich`", "expected `:`". The
+new render shape preserves all of these — the rendering
+template is `{prefix} expected {set}, found {token}` and
+the prefix / set / token come from the catalog with the same
+wording. Tests should port unchanged or with at most minor
+adjustments.
+
+### 7. Catalog validator covers the new keys
+
+ADR-0019 §8.6's `KEYS_AND_PLACEHOLDERS` validator extends
+to cover:
+
+- Every `parse.usage.<command>` key referenced from the
+  registry exists.
+- Every `parse.token.keyword.<variant>` key for every
+  `Keyword` enum variant exists.
+- Every `parse.token.punct.<variant>` key for every `Punct`
+  variant exists.
+- The `parse.token.{identifier, number, string_literal,
+  flag, end_of_input}` keys exist.
+- The `parse.token.error.*` keys exist for every
+  `LexErrorKind` variant.
+- The `parse.available_commands` key exists.
+- No format specifiers (already enforced).
+- No engine vocabulary (already enforced).
+
+### 8. The `usage:` block respects the verbosity setting?
+
+No. The `messages (short|verbose)` setting (ADR-0019)
+governs *engine-error* verbosity (whether to render the
+hint block of a `FriendlyError`). Parse errors don't go
+through `FriendlyError`; they have their own render path,
+and the usage block is always shown. Rationale: a learner
+toggling to `messages short` is signalling they recognise
+the engine-error patterns and want less explanation around
+those — they're not signalling that they want less
+parse-help. Parse errors mean the user couldn't even
+formulate a runnable command; that's exactly the moment to
+maximise pedagogical surface, regardless of the
+engine-error verbosity preference.
+
+If experience shows this is wrong, a future amendment can
+gate the usage block on a separate setting. Doesn't need
+to be designed now.
+
+## Out of scope
+
+1. **Tab completion (I3) and syntax highlighting (I4)**
+   themselves. ADR-0020 §9-10 commits to the parser
+   contract; ADR-0021 doesn't extend it.
+2. **Schema-aware suggestions** ("did you mean `Customers`?"
+   when the user typed `Customrs`). Useful but a separate
+   feature; would land in I3 territory (completion + spell
+   check share a candidate list).
+3. **Suggested fixes** ("change `crete` to `create`"). Same
+   bucket as schema-aware suggestions.
+4. **Multi-error reporting.** Today and after this ADR, the
+   parser reports the first error and stops. Recovery-based
+   multi-error parsing is out of scope and re-opens with
+   I3's ADR (ADR-0020 §11).
+5. **Persisting the verbosity setting** (which doesn't
+   affect parse errors anyway, per §8). ADR-0019 deferred it
+   to a future settings ADR.
+
+## Consequences
+
+### Positive
+
+- **Per-command usage at point of failure.** A learner who
+  types `create` sees the full `create table` grammar
+  instead of "expected `table`". The user-reported gap
+  closes.
+- **Aggregated `available commands` for cold starts.**
+  `frobulate Customers` now lists the ten command-starting
+  keywords under a sensible framing.
+- **Vocabulary lives in the catalog, not in code.** Renaming
+  a keyword's user-facing wording is one YAML edit. Adding
+  a new keyword adds two lines (registry + token-name key);
+  the validator catches both if forgotten.
+- **The render path simplifies.** `humanise()` shrinks to a
+  small composer over catalog lookups — no per-character
+  description, no `RichPattern` walking, no
+  prefer-custom-over-structural switching (the latter
+  becomes "render the structural error and append the usage
+  template").
+- **Composes with ADR-0019's `FriendlyError`.** Engine
+  errors and parse errors are rendered through different
+  paths but both go through the catalog, so vocabulary
+  drift between them is impossible.
+
+### Costs
+
+- **A second registry to keep in sync** with the parser. The
+  validator (§7) catches missing usage entries / missing
+  token keys at test time, but adding a new command means
+  three steps (parser combinator, usage-registry entry,
+  catalog YAML edit). Mitigation: a unit test asserts every
+  command in the parser has a registry entry (catches
+  forgotten entries; matches the friendly-module pattern).
+- **Catalog grows by ~30-40 entries** (one usage template
+  per command, one keyword name per `Keyword` variant, a
+  handful of token-class names, a handful of error names).
+  Each entry is one line of YAML; total catalog grows from
+  ~170 entries to ~210. Within budget.
+- **Wording iteration** on the usage templates will probably
+  happen post-merge. This is normal for pedagogical text
+  and the catalog makes it cheap.
+
+### Neutral
+
+- **Public parser API is unchanged.** `parse_command(&str)`
+  signature stable. The new `lex` and `parse_tokens`
+  functions exposed by ADR-0020 are the I3/I4 hook;
+  ADR-0021 doesn't add to that surface.
+- **`AppEvent` shape unchanged.** Parse errors continue to
+  flow through `dispatch_dsl`'s existing path (push echo,
+  push caret, push error). This ADR's render changes are
+  internal to that function plus the `t!()` calls inside it.
+
+## Implementation notes
+
+### Order of operations (within the joint ADR-0020 + ADR-0021 implementation session)
+
+1. Land ADR-0020 (lexer + parser refactor + minimal
+   humaniser).
+2. Add `src/dsl/usage.rs` with the registry struct, the
+   static table, and `matched_entry()`.
+3. Populate `parse.usage.*` and `parse.token.*` catalog
+   sections.
+4. Extend `friendly::keys::KEYS_AND_PLACEHOLDERS` with the
+   new keys.
+5. Rewrite `dispatch_dsl`'s error-render arm in `app.rs` to
+   compose the three blocks per §2 (or §5 fallback).
+6. Add tests:
+   - Unit: every registered usage entry resolves through the
+     catalog. Every `Keyword` variant has a `parse.token.keyword.*`
+     entry.
+   - Integration (`tests/parse_error_pedagogy.rs`, new):
+     `create`, `add`, `update Customers`, `frobulate
+     Customers`, `create table` (no PK clause), `insert into
+     T` (no values), each producing the expected
+     three-block output.
+7. Update or port the two existing structural-error tests
+   in `parser.rs::tests` to the new render shape.
+
+### Things that interact subtly
+
+- **The "deepest consumed keyword" mechanism** (§3) walks
+  the prefix once per parse failure. Cheap; no perf concern.
+  But it must not pick up keywords from inside content that
+  is itself part of a partial AST (e.g. an identifier the
+  user is typing that happens to be the first letters of a
+  keyword); since the lexer commits to identifier-vs-keyword
+  classification before the parser sees tokens, this isn't a
+  real risk. Documented inline.
+- **Multiple usage entries per `add` / `drop`** are rendered
+  under one `usage:` prefix per §2. This is one of the
+  pedagogically-best parts of the change: the user gets the
+  full family rather than guessing which sibling they
+  wanted.
+- **`replay`'s special-case parsing** (ADR-0020 §6) is
+  invisible to the usage layer. The user typing `replay`
+  with no path gets the `parse.usage.replay` template.
+- **`messages` is an app-level command, not a DSL command**,
+  so it is not in the parser registry and doesn't appear in
+  `available commands:`. Same posture as `mode`, `help`,
+  `quit`. Documented in the registry's prelude.
diff --git a/docs/adr/README.md b/docs/adr/README.md
index 48cdcca..5b03a68 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -25,3 +25,5 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0017 — Column type-change compatibility](0017-column-type-change-compatibility.md)
 - [ADR-0018 — Auto-fill contracts for `serial` and `shortid` columns](0018-auto-fill-contracts-for-serial-and-shortid.md)
 - [ADR-0019 — Friendly error layer (H1) and i18n message catalog](0019-friendly-error-layer-and-i18n.md)
+- [ADR-0020 — Tokenization layer for the DSL parser](0020-tokenization-layer-for-the-dsl-parser.md)
+- [ADR-0021 — Parser-as-source-of-truth for H1a (per-command usage in parse errors)](0021-parser-as-source-of-truth-for-h1a.md)