ADR-0024 Phase A: walker framework + app-lifecycle commands

Stand up the unified-grammar tree walker alongside the existing
chumsky parser and migrate the eleven app-lifecycle commands
(quit, help, rebuild, save / save as, new, load, export, import,
mode, messages) end-to-end. The router in parse_tokens consults
the walker first; non-migrated commands still fall through to
chumsky.

Scope:
- src/dsl/grammar/{mod,app}.rs: Node enum (13 kinds), Word /
  IdentSource / HintMode / HighlightClass / ValidationError /
  CommandNode types, REGISTRY of the eleven app commands.
- src/dsl/walker/{mod,driver,context,outcome,lex_helpers}.rs:
  scannerless byte-level walker, per-node-kind dispatch with
  Choice/Seq/Optional backtracking, WalkContext (Phase B-D
  schema fields stubbed), WalkOutcome with Match/Incomplete/
  Mismatch/ValidationFailed.
- src/dsl/parser.rs: try_walker_route() runs first in
  parse_tokens; bridge converts WalkOutcome to ParseError
  preserving catalog wording (mode.unknown / messages.unknown
  surface verbatim via friendly::translate). Legacy
  try_parse_app_path_command deleted; chumsky's bare-keyword
  app branches remain unreachable until Phase F sweep.
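
A rough sketch of the walker-first routing and the outcome bridge
described above. Only the four WalkOutcome variant names, the
mode.unknown key, and the names try_walker_route / translate come
from this commit; the payloads, the ParseError shape, the
"simple" mode value, the placeholder wording, and the
legacy_parse stand-in for the chumsky path are all assumptions
made for illustration.

```rust
// Hypothetical payloads: the commit names the variants, not their fields.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum WalkOutcome {
    Match(String),                  // the recognised command
    Incomplete(Vec<&'static str>),  // input ended early; what could follow
    Mismatch(usize),                // byte offset where the walk diverged
    ValidationFailed(&'static str), // catalog key, e.g. "mode.unknown"
}

#[derive(Debug, PartialEq)]
struct ParseError(String);

// Stand-in for friendly::translate; the real catalog wording is not shown here.
fn translate(key: &str) -> String {
    format!("<catalog wording for {}>", key)
}

// Convert a walker outcome into the parser's result type.
fn bridge(outcome: WalkOutcome) -> Result<String, ParseError> {
    match outcome {
        WalkOutcome::Match(cmd) => Ok(cmd),
        WalkOutcome::Incomplete(expected) => Err(ParseError(format!(
            "incomplete input; expected one of: {}",
            expected.join(", ")
        ))),
        WalkOutcome::Mismatch(at) => {
            Err(ParseError(format!("unexpected input at byte {}", at)))
        }
        // Catalog wording passes through the translator untouched.
        WalkOutcome::ValidationFailed(key) => Err(ParseError(translate(key))),
    }
}

// Toy router: claims only `quit` and `mode`; `None` means "not migrated".
fn try_walker_route(input: &str) -> Option<WalkOutcome> {
    let mut words = input.split_whitespace();
    match words.next()? {
        "quit" => Some(WalkOutcome::Match("quit".to_string())),
        "mode" => Some(match words.next() {
            None => WalkOutcome::Incomplete(vec!["simple"]),
            Some("simple") => WalkOutcome::Match("mode simple".to_string()),
            Some(_) => WalkOutcome::ValidationFailed("mode.unknown"),
        }),
        _ => None,
    }
}

// Stand-in for the chumsky fall-through path.
fn legacy_parse(input: &str) -> Result<String, ParseError> {
    Ok(format!("legacy:{}", input.trim()))
}

// Walker first; anything it does not claim falls through to the legacy parser.
fn parse_tokens_sketch(input: &str) -> Result<String, ParseError> {
    match try_walker_route(input) {
        Some(outcome) => bridge(outcome),
        None => legacy_parse(input),
    }
}
```

In this sketch the walker signals "not mine" by returning None,
so a validation failure on a migrated command is never retried
against the legacy parser.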

Walker design choices worth noting:
- mode <value> / messages <value> use Choice(Word, Word, Ident)
  so known keywords appear in the expected-set; the trailing
  Ident catch-all funnels unknown values into the friendly
  validator that always errors with the catalog wording.
- save / save as is one CommandNode (Optional(Word("as"))) -
  closes the round-5 "save Tab can't offer as" limitation
  structurally.
- Path-bearing UX shipped per ADR-0024: BarePath terminates at
  whitespace; paths with spaces use the (not-yet-wired) quoted
  form. Existing tests pass on the new shape.
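
To make the first two choices concrete, here is a minimal sketch
of a tree walker with five node kinds. It is not the code in
src/dsl/grammar or src/dsl/walker: the real Node enum has 13
kinds, and the walk signature, the "simple" / "advanced" mode
values, and the error strings below are assumptions for
illustration only.

```rust
#[derive(Debug, Clone)]
enum Node {
    Word(&'static str),
    Ident,
    Choice(Vec<Node>),
    Seq(Vec<Node>),
    Optional(Box<Node>),
}

fn skip_ws(src: &str, pos: usize) -> usize {
    let b = src.as_bytes();
    let mut i = pos;
    while i < b.len() && b[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

// End of an ASCII identifier starting at `pos`, if there is one.
fn ident_end(src: &str, pos: usize) -> Option<usize> {
    let b = src.as_bytes();
    let first = *b.get(pos)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut i = pos + 1;
    while i < b.len() && (b[i].is_ascii_alphanumeric() || b[i] == b'_') {
        i += 1;
    }
    Some(i)
}

// Walk `node` from `pos`. Every terminal that was tried and failed is
// recorded in `expected`; identifiers taken by `Ident` go to `captured`.
fn walk(
    node: &Node,
    src: &str,
    pos: usize,
    expected: &mut Vec<&'static str>,
    captured: &mut Vec<String>,
) -> Option<usize> {
    match node {
        Node::Word(w) => {
            let start = skip_ws(src, pos);
            match ident_end(src, start) {
                Some(end) if src[start..end].eq_ignore_ascii_case(*w) => Some(end),
                _ => {
                    expected.push(*w);
                    None
                }
            }
        }
        Node::Ident => {
            let start = skip_ws(src, pos);
            match ident_end(src, start) {
                Some(end) => {
                    captured.push(src[start..end].to_string());
                    Some(end)
                }
                None => {
                    expected.push("<identifier>");
                    None
                }
            }
        }
        // First alternative that matches wins; each retry starts at `pos`.
        Node::Choice(alts) => {
            for alt in alts {
                if let Some(end) = walk(alt, src, pos, expected, captured) {
                    return Some(end);
                }
            }
            None
        }
        Node::Seq(items) => {
            let mut p = pos;
            for item in items {
                p = walk(item, src, p, expected, captured)?;
            }
            Some(p)
        }
        // A failed optional consumes nothing but still records what it wanted.
        Node::Optional(inner) => Some(walk(inner, src, pos, expected, captured).unwrap_or(pos)),
    }
}

// `mode <value>`: known words first, then the Ident catch-all whose
// validator always fails. Trailing input is not checked in this sketch.
fn parse_mode(src: &str) -> Result<(), String> {
    let values = vec![Node::Word("simple"), Node::Word("advanced"), Node::Ident];
    let grammar = Node::Seq(vec![Node::Word("mode"), Node::Choice(values)]);
    let (mut expected, mut captured) = (Vec::new(), Vec::new());
    match walk(&grammar, src, 0, &mut expected, &mut captured) {
        None => Err(format!("expected one of: {}", expected.join(", "))),
        Some(_) => match captured.pop() {
            Some(unknown) => Err(format!("mode.unknown: {}", unknown)),
            None => Ok(()),
        },
    }
}

fn save_grammar() -> Node {
    Node::Seq(vec![Node::Word("save"), Node::Optional(Box::new(Node::Word("as")))])
}
```

Walking save_grammar() over "save" succeeds and still leaves
"as" in the expected-set, which is what lets completion offer it
after a bare save.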

Tests:
- 28 new walker-specific tests in dsl::walker::tests covering
  every app-lifecycle command, friendly-error wording for
  mode/messages unknown values, trailing-garbage detection,
  whitespace tolerance, and routing fall-through.
- Total: 805 passed, 0 failed, 1 ignored (was 777 / 1).
- cargo clippy --all-targets -- -D warnings clean.
@@ -0,0 +1,99 @@
//! Byte-level helpers for the scannerless walker (ADR-0024
//! §scannerless).
//!
//! Each helper takes the source string and a byte position and,
//! on a match, returns `Some` of the post-token end (or of a
//! `(start, end)` span); `None` means no match at that position.
//! Helpers are pure and span-exact; multi-byte UTF-8 within
//! identifiers and string literals is handled byte-correctly.
//!
//! These helpers internally mirror the logic of the legacy
//! `dsl::lexer` module but are invoked per-position by the
//! walker rather than as a pre-pass.

/// Return the byte index of the first non-whitespace byte at or
/// after `start`. If the rest is all whitespace, returns
/// `source.len()`.
pub fn skip_whitespace(source: &str, start: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = start;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Identifier shape: ASCII letter or `_` to start, then ASCII
/// alphanumeric or `_`. Returns `Some((start, end))` on match.
pub fn consume_ident(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    let first = *bytes.get(start)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut i = start + 1;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_alphanumeric() || b == b'_' {
            i += 1;
        } else {
            break;
        }
    }
    Some((start, i))
}

/// Try to match `keyword` at `position` case-insensitively.
///
/// The match must end at a non-identifier byte (or end-of-input)
/// so that `save` doesn't half-match the prefix of `saved`.
/// Returns the end byte index on match.
pub fn match_keyword(source: &str, position: usize, keyword: &str) -> Option<usize> {
    let bytes = source.as_bytes();
    let kw_bytes = keyword.as_bytes();
    if position + kw_bytes.len() > bytes.len() {
        return None;
    }
    for (offset, &kb) in kw_bytes.iter().enumerate() {
        let sb = bytes[position + offset];
        if !sb.eq_ignore_ascii_case(&kb) {
            return None;
        }
    }
    let end = position + kw_bytes.len();
    if end < bytes.len() {
        let next = bytes[end];
        if next.is_ascii_alphanumeric() || next == b'_' {
            return None;
        }
    }
    Some(end)
}

/// Bare-path token: a non-whitespace run.
///
/// Per ADR-0024 the path-bearing UX dropped the "spaces don't
/// need quoting" feature; paths with spaces use `StringLit`.
/// Phase A's `import` / `export` slots use this.
pub fn consume_bare_path(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    if start >= bytes.len() || bytes[start].is_ascii_whitespace() {
        return None;
    }
    let mut i = start;
    while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    Some((start, i))
}

/// Match a single ASCII punctuation character at `position`.
#[allow(dead_code)]
pub fn match_punct(source: &str, position: usize, ch: char) -> Option<usize> {
    let bytes = source.as_bytes();
    if ch.is_ascii() && position < bytes.len() && bytes[position] == ch as u8 {
        Some(position + 1)
    } else {
        None
    }
}
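
A possible way to chain these helpers for a path-bearing command
is sketched below. `scan_import` is a hypothetical function, not
part of this commit, and the three helpers are repeated in
condensed form only so the block compiles on its own; the real
walker drives them through grammar nodes rather than a
hand-written sequence.

```rust
fn skip_whitespace(source: &str, start: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = start;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

// Case-insensitive keyword match that must end on an identifier boundary.
fn match_keyword(source: &str, position: usize, keyword: &str) -> Option<usize> {
    let bytes = source.as_bytes();
    let end = position.checked_add(keyword.len())?;
    let candidate = bytes.get(position..end)?;
    if !candidate.eq_ignore_ascii_case(keyword.as_bytes()) {
        return None;
    }
    match bytes.get(end) {
        Some(b) if b.is_ascii_alphanumeric() || *b == b'_' => None,
        _ => Some(end),
    }
}

// Non-empty run of non-whitespace bytes starting at `start`.
fn consume_bare_path(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    let mut i = start;
    while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    if i == start {
        None
    } else {
        Some((start, i))
    }
}

// Hypothetical driver: `import <bare-path>` with nothing but whitespace after.
fn scan_import(source: &str) -> Option<&str> {
    let pos = skip_whitespace(source, 0);
    let pos = match_keyword(source, pos, "import")?;
    let pos = skip_whitespace(source, pos);
    let (start, end) = consume_bare_path(source, pos)?;
    // Trailing-garbage check: only whitespace may follow the path.
    if skip_whitespace(source, end) != source.len() {
        return None;
    }
    Some(&source[start..end])
}
```

Because every helper stops on an ASCII byte or at end of input,
the returned offsets stay on UTF-8 character boundaries, so
slicing the source with them is safe even for non-ASCII paths.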