Insert grammar: Form C type-awareness via lookahead (ADR-0024 §Phase D)

Form C (`insert into T (vals)`) shares the `(` opener with Form A,
so its paren was parsed as an untyped Repeated(Choice(literal, ident)):
values were neither type- nor count-checked at parse time
(handoff-12 §2.2).

New Node::Lookahead variant: a factory that also receives the source
and the current byte position, so it can peek ahead. The insert
first-paren factory inspects the first token after `(`. A value
literal routes the contents through the typed column_value_list
(Form B dispatch contract: per-non-auto-column typed slots); an
identifier or empty paren routes to a Form A column-name list. Form C
therefore gets the same per-column typed slots, hints, and parse-time
type/count checking as Form B.
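
The routing rule above can be sketched in isolation. This is a minimal
illustration, not the code in this commit: `ParenForm` and
`classify_first_paren` are hypothetical names, and the hand-rolled
token checks stand in for the real `lex_helpers` functions, whose exact
behaviour (for example on signed numbers or double-quoted strings) is
not shown here.

```rust
/// Which grammar the first paren of `insert into T (` routes to.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum ParenForm {
    /// `(col1, col2) values (...)`: a column-name list.
    FormA,
    /// `(1, 'x', null)`: a bare value list.
    FormC,
}

/// Classify the first token at or after byte offset `pos`.
/// Simplified stand-in for the real lexer helpers (assumption):
/// strings start with `'`, numbers start with an ASCII digit.
fn classify_first_paren(source: &str, pos: usize) -> ParenForm {
    let rest = source.get(pos..).unwrap_or("").trim_start();
    let first = match rest.chars().next() {
        Some(c) => c,
        None => return ParenForm::FormA, // nothing typed yet
    };
    // A string or number literal signals a value list.
    if first == '\'' || first.is_ascii_digit() {
        return ParenForm::FormC;
    }
    // Identifier-shaped token: only the three keyword literals
    // count as values; anything else is a column name.
    if first.is_ascii_alphabetic() || first == '_' {
        let end = rest
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(rest.len());
        let word = &rest[..end];
        let is_literal = ["null", "true", "false"]
            .iter()
            .any(|kw| word.eq_ignore_ascii_case(kw));
        return if is_literal { ParenForm::FormC } else { ParenForm::FormA };
    }
    // Punctuation such as `)`: treat as Form A.
    ParenForm::FormA
}
```

Note that the keyword check compares the whole identifier, so a column
named `nullable` still classifies as Form A.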

Splitting Forms A and C into explicit Choice branches is impossible
here: committed-choice semantics commit as soon as `(` matches.
Lookahead is the only route, and DynamicSubgrammar factories cannot
see the source, hence the new variant. Node::Lookahead is not
memoized, because its output depends on the source, but it returns
only a small node (a Repeated, or a thin DynamicSubgrammar wrapper
that delegates to the memoized column_value_list), so the per-walk
allocation stays small.
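
Why one factory kind can be cached and the other cannot is easier to
see in a toy model. The sketch below is an assumption-laden
simplification: `Ctx`, `Shape`, and `Resolver` are hypothetical
stand-ins, and the `HashMap` cache only approximates the walker's real
memoization, which is not shown in this commit.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the walker context: a schema key only.
#[derive(Clone, PartialEq, Eq, Hash)]
struct Ctx {
    table: String,
}

/// Hypothetical stand-in for the node a factory returns.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Shape {
    ColumnNames,
    TypedValues,
}

/// Pure in `ctx`: the same context always yields the same shape,
/// so the result can be cached per context.
fn dynamic_factory(_ctx: &Ctx) -> Shape {
    Shape::TypedValues
}

/// Depends on the source as well: caching by `ctx` alone would
/// hand back a stale shape for a different input.
fn lookahead_factory(_ctx: &Ctx, source: &str, pos: usize) -> Shape {
    let rest = source.get(pos..).unwrap_or("").trim_start();
    match rest.chars().next() {
        Some(c) if c.is_ascii_digit() || c == '\'' => Shape::TypedValues,
        _ => Shape::ColumnNames,
    }
}

struct Resolver {
    memo: HashMap<Ctx, Shape>,
}

impl Resolver {
    fn new() -> Self {
        Resolver { memo: HashMap::new() }
    }

    /// Memoized: the factory runs once per distinct context.
    fn resolve_dynamic(&mut self, ctx: &Ctx) -> Shape {
        self.memo
            .entry(ctx.clone())
            .or_insert_with(|| dynamic_factory(ctx))
            .clone()
    }

    /// Not memoized: the factory runs on every call.
    fn resolve_lookahead(&self, ctx: &Ctx, source: &str, pos: usize) -> Shape {
        lookahead_factory(ctx, source, pos)
    }
}
```

With a single context, repeated `resolve_dynamic` calls leave one cache
entry, while `resolve_lookahead` gives different shapes for `(1, 2)`
and `(a, b)`. A cache keyed on the context alone could not represent
that.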

`insert into T (` now cleanly shows Form A column candidates instead
of mixed Form-A/C suggestions. Form C matrix tests updated for the
type-aware behaviour.
claude@clouddev1
2026-05-15 22:27:53 +00:00
parent 9bbb96e735
commit 90e3f5dbfb
18 changed files with 411 additions and 262 deletions
+79 -55
@@ -18,9 +18,10 @@
use crate::dsl::command::{Command, RowFilter};
use crate::dsl::grammar::{
CommandNode, HintMode, IdentSource, Node, ValidationError, Word,
CommandNode, IdentSource, Node, ValidationError, Word,
shared::{column_value_list, current_column_value},
};
use crate::dsl::walker::context::WalkContext;
use crate::dsl::value::Value;
use crate::dsl::walker::outcome::{MatchedItem, MatchedKind, MatchedPath};
@@ -52,25 +53,6 @@ const TABLE_NAME_INSERT: Node = Node::Ident {
writes_user_listed_column: false,
};
// `value_literal` — null / true / false / number / string. The
// chumsky-side equivalent (`value_literal()` in dsl/parser.rs).
const VALUE_LITERAL_CHOICES: &[Node] = &[
Node::Word(Word::keyword("null")),
Node::Word(Word::keyword("true")),
Node::Word(Word::keyword("false")),
Node::NumberLit { validator: None },
Node::StringLit,
];
const VALUE_LITERAL_INNER: Node = Node::Choice(VALUE_LITERAL_CHOICES);
/// Value-literal slot with the `ProseOnly` HintMode
/// (ADR-0024 §HintMode-per-node) — the hint resolver surfaces
/// the generic "Type a value: …" prose rather than the
/// misleading `null`/`true`/`false` candidate trio.
const VALUE_LITERAL: Node = Node::Hinted {
mode: HintMode::ProseOnly("hint.value_literal_slot"),
inner: &VALUE_LITERAL_INNER,
};
// =================================================================
// show — `show (data|table) <T>`
// =================================================================
@@ -97,43 +79,85 @@ const SHOW_SHAPE: Node = Node::Choice(SHOW_CHOICES);
// =================================================================
//
// Forms A (with column list) and C (bare value list) both start
// with `(`. To avoid the walker's "first commit wins" semantics
// rejecting Form C when the inner content is values rather than
// column names, the inside of the first paren is parsed as a
// repeated `Choice(Ident, ValueLiteral)`. The AST builder then
// disambiguates: if a `values` keyword follows the first paren,
// the inner content was column names; otherwise it was values.
// with `(`. The walker's "first commit wins" Choice semantics
// can't pick between them after the `(` matches, so the first
// paren's contents are resolved by a `Node::Lookahead` factory
// (`insert_first_paren`): it peeks the first token to decide.
//
// - First token is a value literal (number / string /
// null / true / false) → Form C → the typed `column_value_list`
// (same dispatch contract as Form B — ADR-0024 §Phase D Form-C
// type-awareness). Form C values are now type-checked at parse
// time, not only at bind time.
// - Otherwise (column-name identifier, or an empty paren) →
// Form A → a repeated column-name list. The idents write
// `WalkContext::user_listed_columns` so the trailing
// `values (…)` slots mirror the user's selection.
const INSERT_PAREN_ITEM_CHOICES: &[Node] = &[
// VALUE_LITERAL first so that `true`/`false`/`null` match
// their Word branch rather than the broader Ident{Columns}
// catch-all (consume_ident doesn't filter against the
// keyword set; without this ordering, `(true)` would lex
// as a column-name list).
VALUE_LITERAL,
Node::Ident {
source: IdentSource::Columns,
role: "insert_first_item",
validator: None,
highlight_override: None,
writes_table: false,
writes_column: false,
// Form A signal: when the user lists explicit columns
// in `insert into <T> (col1, col2, …)`, the walker
// appends each matched name to
// `WalkContext::user_listed_columns`. The inner
// `values (…)` slot list then mirrors that user
// selection instead of the auto-filtered default
// (ADR-0024 §Phase D §column_value_list).
writes_user_listed_column: true,
},
];
const INSERT_PAREN_ITEM: Node = Node::Choice(INSERT_PAREN_ITEM_CHOICES);
const INSERT_PAREN_LIST: Node = Node::Repeated {
inner: &INSERT_PAREN_ITEM,
separator: Some(&Node::Punct(',')),
min: 1,
/// Form A's column-name slot. `static` (not `const`) so the
/// `insert_first_paren` factory can take a `&'static` reference
/// to it when building the repeated list at walk time.
static FORM_A_COLUMN: Node = Node::Ident {
source: IdentSource::Columns,
role: "insert_first_item",
validator: None,
highlight_override: None,
writes_table: false,
writes_column: false,
writes_user_listed_column: true,
};
static INSERT_COMMA: Node = Node::Punct(',');
/// First-paren resolver (ADR-0024 §Phase D Form-C type-awareness).
/// Peeks the first token after `(` to route to Form A's
/// column-name list or Form C's typed value list.
fn insert_first_paren(_ctx: &WalkContext, source: &str, pos: usize) -> Node {
if first_paren_item_is_value_literal(source, pos) {
// Form C — bare value list. `column_value_list` with no
// user-listed columns dispatches per non-auto-generated
// column, exactly as Form B does.
Node::DynamicSubgrammar(column_value_list)
} else {
// Form A (or Form A in progress / empty paren).
Node::Repeated {
inner: &FORM_A_COLUMN,
separator: Some(&INSERT_COMMA),
min: 1,
}
}
}
/// True when the first token after the insert `(` is a
/// value literal — the signal that the paren is a Form C value
/// list rather than a Form A column-name list. An empty paren
/// or an identifier-shaped token (a column name) returns false.
fn first_paren_item_is_value_literal(source: &str, pos: usize) -> bool {
use crate::dsl::walker::lex_helpers::{
consume_ident, consume_number_literal, consume_string_literal,
skip_whitespace,
};
let p = skip_whitespace(source, pos);
if p >= source.len() {
return false; // end of input (nothing typed after `(` yet): treat as Form A
}
if consume_string_literal(source, p).is_some() {
return true;
}
if consume_number_literal(source, p).is_some() {
return true;
}
if let Some((s, e)) = consume_ident(source, p) {
let word = &source[s..e];
// `null` / `true` / `false` are value literals; any
// other identifier is a column name (Form A).
return word.eq_ignore_ascii_case("null")
|| word.eq_ignore_ascii_case("true")
|| word.eq_ignore_ascii_case("false");
}
false // punctuation (e.g. `)`) — treat as Form A
}
const INSERT_PAREN_LIST: Node = Node::Lookahead(insert_first_paren);
/// Schema-aware value list: when the walker has a populated
/// `current_table_columns`, unfolds to a `Seq` of typed slots
+12 -1
@@ -286,9 +286,20 @@ pub enum Node {
min: usize,
},
/// Resolves at walk time using the active `WalkContext`.
/// Phase D+ uses this for `column_value_list`.
/// Phase D+ uses this for `column_value_list`. The factory
/// is pure in `ctx`, so the walker memoizes the resolution
/// (one leak per distinct schema shape).
#[allow(dead_code)]
DynamicSubgrammar(fn(&WalkContext) -> Self),
/// Like `DynamicSubgrammar` but the factory also sees the
/// source and the current byte position, so it can look
/// ahead. Used by the insert first-paren to discriminate
/// Form A (`(cols) values (...)`) from Form C (`(vals)`)
/// before walking the contents — Form C then routes through
/// the typed `column_value_list` (ADR-0024 §Phase D, Form C
/// type-awareness). Not memoized: the output depends on the
/// source, not just `ctx`.
Lookahead(fn(&WalkContext, &str, usize) -> Self),
/// Typed value-literal slot (ADR-0024 §Phase D §typed-value-slots).
///
/// Walks `inner` to consume the literal but records the
+14
@@ -212,6 +212,20 @@ fn walk_node_inner(
let resolved = resolve_dynamic(*factory, ctx);
walk_node(source, pos, resolved, ctx, path, per_byte)
}
Node::Lookahead(factory) => {
// ADR-0024 §Phase D Form-C type-awareness: the
// factory peeks the source at `pos` (e.g. to tell a
// Form A column list from a Form C value list) and
// returns the shape to walk. Not memoized — the
// result depends on the source — but the factory
// returns a small node (a Repeated, or a thin
// DynamicSubgrammar wrapper that delegates to the
// memoized `column_value_list`), so the per-walk
// leak is a few bytes, not a whole typed tree.
let resolved: &'static Node =
Box::leak(Box::new(factory(ctx, source, pos)));
walk_node(source, pos, resolved, ctx, path, per_byte)
}
Node::TypedValueSlot {
ty,
column_name,