50b3542050
Stand up the unified-grammar tree walker alongside the existing
chumsky parser and migrate the eleven app-lifecycle commands
(quit, help, rebuild, save / save as, new, load, export, import,
mode, messages) end-to-end. The router in parse_tokens consults
the walker first; non-migrated commands still fall through to
chumsky.
Scope:
- src/dsl/grammar/{mod,app}.rs: Node enum (13 kinds), Word /
IdentSource / HintMode / HighlightClass / ValidationError /
CommandNode types, REGISTRY of the eleven app commands.
- src/dsl/walker/{mod,driver,context,outcome,lex_helpers}.rs:
scannerless byte-level walker, per-node-kind dispatch with
Choice/Seq/Optional backtracking, WalkContext (Phase B-D
schema fields stubbed), WalkOutcome with Match/Incomplete/
Mismatch/ValidationFailed.
- src/dsl/parser.rs: try_walker_route() runs first in
parse_tokens; bridge converts WalkOutcome to ParseError
preserving catalog wording (mode.unknown / messages.unknown
surface verbatim via friendly::translate). Legacy
try_parse_app_path_command deleted; chumsky's bare-keyword
app branches remain unreachable until Phase F sweep.
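The routing order described above (walker first, chumsky as fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in src/dsl/parser.rs: `Parsed`, `ParseError`, `walk`, `legacy_parse`, the payload types on `WalkOutcome`, the mode values `basic` / `advanced`, and the "incomplete command" wording are all hypothetical stand-ins. Only the four outcome names and the `mode.unknown` catalog key come from the commit text. It also assumes that `Mismatch` is the outcome that triggers the fall-through.

```rust
// Hypothetical, simplified stand-ins for the real walker/parser types.
#[derive(Debug, PartialEq)]
enum WalkOutcome {
    Match(String),                  // command recognised (payload is illustrative)
    Incomplete,                     // input is a valid prefix of a command
    Mismatch,                       // not a walker-owned command
    ValidationFailed(&'static str), // catalog key for a friendly error
}

#[derive(Debug, PartialEq)]
enum Parsed {
    Walker(String),
    Legacy(String),
}

#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Toy walker: knows only `quit` and `mode <value>`.
/// The values `basic` / `advanced` are made up for this sketch.
fn walk(input: &str) -> WalkOutcome {
    let mut words = input.split_whitespace();
    let first = words.next().map(|w| w.to_ascii_lowercase());
    match first.as_deref() {
        Some("quit") => WalkOutcome::Match("quit".to_string()),
        Some("mode") => match words.next() {
            None => WalkOutcome::Incomplete,
            Some(v) if v == "basic" || v == "advanced" => {
                WalkOutcome::Match(format!("mode {}", v))
            }
            Some(_) => WalkOutcome::ValidationFailed("mode.unknown"),
        },
        _ => WalkOutcome::Mismatch,
    }
}

/// Bridge: `None` means "not ours, let the legacy parser try".
fn try_walker_route(input: &str) -> Option<Result<Parsed, ParseError>> {
    match walk(input) {
        WalkOutcome::Match(cmd) => Some(Ok(Parsed::Walker(cmd))),
        WalkOutcome::Incomplete => Some(Err(ParseError("incomplete command".to_string()))),
        WalkOutcome::ValidationFailed(key) => Some(Err(ParseError(key.to_string()))),
        WalkOutcome::Mismatch => None,
    }
}

/// Stand-in for the chumsky parser: accepts anything.
fn legacy_parse(input: &str) -> Result<Parsed, ParseError> {
    Ok(Parsed::Legacy(input.trim().to_string()))
}

/// Walker first; fall through to the legacy parser on `Mismatch`.
fn parse_tokens(input: &str) -> Result<Parsed, ParseError> {
    try_walker_route(input).unwrap_or_else(|| legacy_parse(input))
}
```

Returning `Option<Result<..>>` from the bridge keeps "the walker rejected this input" distinct from "the walker does not own this command", so a migrated command with bad arguments never leaks into the legacy parser.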
Walker design choices worth noting:
- mode <value> / messages <value> use Choice(Word, Word, Ident)
so known keywords appear in the expected-set; the trailing
Ident catch-all funnels unknown values into the friendly
validator that always errors with the catalog wording.
- save / save as is one CommandNode (Optional(Word("as"))) -
closes the round-5 "save Tab can't offer as" limitation
structurally.
- Path-bearing UX shipped per ADR-0024: BarePath terminates at
whitespace; paths with spaces use the (not-yet-wired) quoted
form. Existing tests pass on the new shape.
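To make the two grammar shapes above concrete, here is a small sketch of a node tree with ordered Choice, Seq and Optional over byte positions. It is a simplification and not the Node enum from src/dsl/grammar: the real enum has 13 kinds, while this one has five, and the helper names `walk`, `matches`, `save_command`, `mode_command` and the mode values `basic` / `advanced` are invented for illustration. The always-failing validator attached to the trailing Ident is omitted; the sketch only shows that an unknown value still matches structurally.

```rust
// Simplified, hypothetical node set (the real enum has 13 kinds).
enum Node {
    Word(&'static str),
    Ident,
    Seq(Vec<Node>),
    Choice(Vec<Node>),
    Optional(Box<Node>),
}

fn skip_ws(src: &str, mut i: usize) -> usize {
    let b = src.as_bytes();
    while i < b.len() && b[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Try to match `node` at byte `pos`; return the end position on success.
fn walk(node: &Node, src: &str, pos: usize) -> Option<usize> {
    let bytes = src.as_bytes();
    match node {
        Node::Word(kw) => {
            let start = skip_ws(src, pos);
            let end = start + kw.len();
            let head = bytes.get(start..end)?;
            if !head.eq_ignore_ascii_case(kw.as_bytes()) {
                return None;
            }
            // Keyword must not run into further identifier bytes.
            match bytes.get(end) {
                Some(&b) if is_ident_byte(b) => None,
                _ => Some(end),
            }
        }
        Node::Ident => {
            let start = skip_ws(src, pos);
            let first = *bytes.get(start)?;
            if !(first.is_ascii_alphabetic() || first == b'_') {
                return None;
            }
            let len = bytes[start..].iter().take_while(|&&b| is_ident_byte(b)).count();
            Some(start + len)
        }
        // Each child starts where the previous one ended; any failure fails the Seq.
        Node::Seq(children) => children.iter().try_fold(pos, |p, c| walk(c, src, p)),
        // Every alternative restarts from `pos`; the first success wins.
        Node::Choice(alts) => alts.iter().find_map(|alt| walk(alt, src, pos)),
        // A failed inner node consumes nothing.
        Node::Optional(inner) => Some(walk(inner, src, pos).unwrap_or(pos)),
    }
}

/// Whole-input match: trailing non-whitespace counts as garbage.
fn matches(node: &Node, src: &str) -> bool {
    walk(node, src, 0).map_or(false, |end| skip_ws(src, end) == src.len())
}

/// `save` and `save as` as one command.
fn save_command() -> Node {
    Node::Seq(vec![Node::Word("save"), Node::Optional(Box::new(Node::Word("as")))])
}

/// `mode <value>` with known keywords first and an Ident catch-all last.
fn mode_command() -> Node {
    Node::Seq(vec![
        Node::Word("mode"),
        Node::Choice(vec![Node::Word("basic"), Node::Word("advanced"), Node::Ident]),
    ])
}
```

Because `as` lives inside the same command node as `save`, a completion engine that enumerates what may follow `save` can see it directly. Likewise, listing the known values as Word alternatives ahead of the Ident catch-all is what lets them show up in an expected-set.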
Tests:
- 28 new walker-specific tests in dsl::walker::tests covering
every app-lifecycle command, friendly-error wording for
mode/messages unknown values, trailing-garbage detection,
whitespace tolerance, and routing fall-through.
- Total: 805 passed, 0 failed, 1 ignored (was 777 / 1).
- cargo clippy --all-targets -- -D warnings clean.
100 lines
3.1 KiB
Rust
//! Byte-level helpers for the scannerless walker (ADR-0024
//! §scannerless).
//!
//! Each helper takes the source string and a byte position,
//! returns either `Some(end_position)` (matched, post-token end)
//! or `None` (didn't match here). Helpers are pure and span-
//! exact; multi-byte UTF-8 within identifiers and string
//! literals is handled byte-correctly.
//!
//! These helpers internally mirror the logic of the legacy
//! `dsl::lexer` module but are invoked per-position by the
//! walker rather than as a pre-pass.

/// Return the byte index of the first non-whitespace byte at or
/// after `start`. If the rest is all whitespace, returns
/// `source.len()`.
pub fn skip_whitespace(source: &str, start: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = start;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Identifier shape: ASCII letter or `_` to start, then ASCII
/// alphanumeric or `_`. Returns `Some((start, end))` on match.
pub fn consume_ident(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    let first = *bytes.get(start)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut i = start + 1;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_alphanumeric() || b == b'_' {
            i += 1;
        } else {
            break;
        }
    }
    Some((start, i))
}

/// Try to match `keyword` at `position` case-insensitively.
///
/// The match must end at a non-identifier byte (or end-of-input)
/// so that `save` doesn't half-match the prefix of `saved`.
/// Returns the end byte index on match.
pub fn match_keyword(source: &str, position: usize, keyword: &str) -> Option<usize> {
    let bytes = source.as_bytes();
    let kw_bytes = keyword.as_bytes();
    if position + kw_bytes.len() > bytes.len() {
        return None;
    }
    for (offset, &kb) in kw_bytes.iter().enumerate() {
        let sb = bytes[position + offset];
        if !sb.eq_ignore_ascii_case(&kb) {
            return None;
        }
    }
    let end = position + kw_bytes.len();
    if end < bytes.len() {
        let next = bytes[end];
        if next.is_ascii_alphanumeric() || next == b'_' {
            return None;
        }
    }
    Some(end)
}

/// Bare-path token: a non-whitespace run.
///
/// Per ADR-0024 the path-bearing UX dropped the "spaces don't
/// need quoting" feature; paths with spaces use `StringLit`.
/// Phase A's `import` / `export` slots use this.
pub fn consume_bare_path(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    if start >= bytes.len() || bytes[start].is_ascii_whitespace() {
        return None;
    }
    let mut i = start;
    while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    Some((start, i))
}

/// Match a single ASCII punctuation character at `position`.
#[allow(dead_code)]
pub fn match_punct(source: &str, position: usize, ch: char) -> Option<usize> {
    let bytes = source.as_bytes();
    if ch.is_ascii() && position < bytes.len() && bytes[position] == ch as u8 {
        Some(position + 1)
    } else {
        None
    }
}
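The helpers above are meant to be chained position-by-position. As a rough usage sketch, the following recognises a `<keyword> <bare-path>` command such as `export out/data.csv`. The helper bodies are condensed copies of the ones in this file so the block compiles on its own; `parse_path_command` is a hypothetical caller written for this example, not a function in the walker, and its rule that whitespace must separate keyword and path is an assumption.

```rust
fn skip_whitespace(source: &str, start: usize) -> usize {
    let bytes = source.as_bytes();
    let mut i = start;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

fn match_keyword(source: &str, position: usize, keyword: &str) -> Option<usize> {
    let bytes = source.as_bytes();
    let end = position + keyword.len();
    let head = bytes.get(position..end)?;
    if !head.eq_ignore_ascii_case(keyword.as_bytes()) {
        return None;
    }
    // Reject a keyword that runs straight into more identifier bytes.
    match bytes.get(end) {
        Some(&b) if b.is_ascii_alphanumeric() || b == b'_' => None,
        _ => Some(end),
    }
}

fn consume_bare_path(source: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    if start >= bytes.len() || bytes[start].is_ascii_whitespace() {
        return None;
    }
    let mut i = start;
    while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    Some((start, i))
}

/// Hypothetical caller: match `<keyword> <bare-path>` and return the path slice.
fn parse_path_command<'a>(source: &'a str, keyword: &str) -> Option<&'a str> {
    let start = skip_whitespace(source, 0);
    let after_kw = match_keyword(source, start, keyword)?;
    let path_start = skip_whitespace(source, after_kw);
    if path_start == after_kw {
        // Assumed rule: keyword and path must be separated by whitespace.
        return None;
    }
    let (s, e) = consume_bare_path(source, path_start)?;
    if skip_whitespace(source, e) != source.len() {
        // Trailing garbage, e.g. an unquoted path containing a space.
        return None;
    }
    // `s` and `e` sit next to ASCII whitespace or at the ends of the input,
    // so slicing here cannot split a multi-byte UTF-8 sequence.
    Some(&source[s..e])
}
```

For example, `parse_path_command("  EXPORT out/data.csv ", "export")` yields `Some("out/data.csv")`, while `"export my file.csv"` is rejected because the bare path stops at the first space and `file.csv` is left over.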