ADR-0024 Phase D (full): schema-aware value typing

Schema-aware typed value slots — the central design claim of
ADR-0024 §Phase D. Insert / update / delete value slots now
dispatch on the user-facing column type at parse time, rejecting
mis-shaped input with localised wording instead of waiting for
the bind-time error.

What changed:

**SchemaCache extension** (`src/completion.rs`):
- New `TableColumn { name, user_type }` for per-table column
  metadata.
- `SchemaCache.table_columns: HashMap<String, Vec<TableColumn>>`.
- `SchemaCache::columns_for_table(name)` — case-insensitive
  lookup, mirrors the walker's case-insensitive entry-word
  resolution.
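
A minimal sketch of what a case-insensitive `columns_for_table`
could look like. The `Type` enum here is a cut-down stand-in, and
the exact-hit-then-scan strategy is an assumption for
illustration, not necessarily what `src/completion.rs` does:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the project's `Type` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Int,
    Text,
    Bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableColumn {
    pub name: String,
    pub user_type: Type,
}

#[derive(Debug, Default)]
pub struct SchemaCache {
    pub table_columns: HashMap<String, Vec<TableColumn>>,
}

impl SchemaCache {
    /// Case-insensitive lookup: try an exact key hit first, then
    /// fall back to a linear ASCII-case-insensitive scan of the keys.
    pub fn columns_for_table(&self, name: &str) -> Option<&[TableColumn]> {
        if let Some(cols) = self.table_columns.get(name) {
            return Some(cols.as_slice());
        }
        self.table_columns
            .iter()
            .find(|(key, _)| key.eq_ignore_ascii_case(name))
            .map(|(_, cols)| cols.as_slice())
    }
}
```

Returning a borrowed slice matches how the walker later copies
the columns with `<[_]>::to_vec`.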

**WalkContext schema plumbing** (`src/dsl/walker/context.rs`):
- `WalkContext<'a>` gains a lifetime and a `schema: Option<&'a
  SchemaCache>`. `WalkContext::new()` keeps the schemaless
  default; `with_schema(s)` is the new schema-aware constructor.

**Parser entry point** (`src/dsl/parser.rs`):
- `parse_command_with_schema(input, schema)` is the new public
  schema-aware variant. `parse_command(input)` becomes a thin
  wrapper that delegates with `None` for back-compat.
- Internal `try_walker_route` accepts an `Option<&SchemaCache>`
  and threads it into the WalkContext.

**Node::Ident writes_table/writes_column** (`src/dsl/grammar/mod.rs`):
- Two new fields on `Node::Ident`. When `writes_table: true` and
  `source: Tables`, the walker writes the matched ident's name
  into `current_table` and resolves `current_table_columns`
  against the schema cache. When `writes_column: true` and
  `source: Columns`, the walker writes the resolved
  `TableColumn` into `current_column`.
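
The two writes can be sketched in isolation as below. The helper
names `on_table_ident` and `on_column_ident` are hypothetical
(the real logic lives inside `walk_ident`), and the types are
simplified stand-ins:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the project's `Type` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Int,
    Text,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableColumn {
    pub name: String,
    pub user_type: Type,
}

#[derive(Debug, Default)]
pub struct SchemaCache {
    pub table_columns: HashMap<String, Vec<TableColumn>>,
}

#[derive(Debug, Default)]
pub struct WalkContext<'a> {
    pub schema: Option<&'a SchemaCache>,
    pub current_table: Option<String>,
    pub current_table_columns: Option<Vec<TableColumn>>,
    pub current_column: Option<TableColumn>,
}

/// Effect of a table ident with `writes_table: true`: record the
/// name, then resolve its columns if a schema is attached.
pub fn on_table_ident(ctx: &mut WalkContext<'_>, text: &str) {
    ctx.current_table = Some(text.to_string());
    ctx.current_table_columns = ctx.schema.and_then(|s| {
        s.table_columns
            .iter()
            .find(|(key, _)| key.eq_ignore_ascii_case(text))
            .map(|(_, cols)| cols.clone())
    });
}

/// Effect of a column ident with `writes_column: true`: resolve
/// against the columns already stored by the table ident.
pub fn on_column_ident(ctx: &mut WalkContext<'_>, text: &str) {
    ctx.current_column = ctx.current_table_columns.as_ref().and_then(|cols| {
        cols.iter()
            .find(|c| c.name.eq_ignore_ascii_case(text))
            .cloned()
    });
}
```

With no schema attached, `current_table` is still recorded but
both column fields stay `None`, which is what triggers the
schemaless fallback.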

**Walker driver DynamicSubgrammar dispatch** (`src/dsl/walker/driver.rs`):
- The `Node::DynamicSubgrammar(factory)` branch now resolves the
  factory at walk time and `Box::leak`s the result so its inner
  static-slice fields (Choice/Seq) have the lifetime the walker
  expects (per ADR-0024 §sub-grammars). Each walk leaks memory
  bounded by the command's shape complexity, and the leaks
  accumulate over the process lifetime; a per-walk arena is a
  future optimisation.
- `walk_ident` extends to perform the schema writes when the
  flags are set.
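
To illustrate why the leak is needed, here is a pared-down
sketch. `Node`, `Ctx`, `value_list`, and `count_literals` are all
hypothetical simplifications of the real grammar and walker; the
point is only that a factory-built node must become `&'static`
before code written against static grammars can recurse into it:

```rust
/// Hypothetical, pared-down grammar node. `Seq` holds a static
/// slice, as the statically declared grammars do.
pub enum Node {
    Literal(&'static str),
    Seq(&'static [Node]),
    Dynamic(fn(&Ctx) -> Node),
}

/// Hypothetical walk context: only the column count matters here.
#[derive(Debug, Default)]
pub struct Ctx {
    pub column_count: usize,
}

/// Factory: one value slot per column. The backing storage is
/// leaked so the slice can live inside `Node::Seq`.
pub fn value_list(ctx: &Ctx) -> Node {
    let slots: Vec<Node> = (0..ctx.column_count)
        .map(|_| Node::Literal("value"))
        .collect();
    Node::Seq(Box::leak(slots.into_boxed_slice()))
}

/// Toy "walker" that counts literal terminals. `Dynamic` nodes
/// are resolved at walk time and leaked to obtain `&'static Node`.
pub fn count_literals(node: &'static Node, ctx: &Ctx) -> usize {
    match node {
        Node::Literal(_) => 1,
        Node::Seq(items) => items.iter().map(|n| count_literals(n, ctx)).sum(),
        Node::Dynamic(factory) => {
            let resolved: &'static Node = Box::leak(Box::new(factory(ctx)));
            count_literals(resolved, ctx)
        }
    }
}
```

Every call that reaches the `Dynamic` arm allocates and never
frees, which is the cost a per-walk arena would remove.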

**Typed value slot factories + dynamic sub-grammars** (`src/dsl/grammar/shared.rs`):
- `int_slot` / `real_slot` / `decimal_slot` / `bool_slot` /
  `text_slot` / `date_slot` / `datetime_slot` / `blob_slot` —
  one per `Type`. Each accepts the appropriate literal kind plus
  `null`; integer-only validator rejects `3.14` at int columns;
  decimal validator pins numeric shape.
- `slot_for_type(ty) -> Node` is the dispatcher.
- `current_column_value(ctx) -> Node` is the dynamic sub-grammar
  for `set col = …` and `where col = …` values; reads
  `current_column` and dispatches via `slot_for_type`.
- `column_value_list(ctx) -> Node` is the dynamic sub-grammar
  for `insert into T values (…)`; reads `current_table_columns`
  and unfolds a Seq of typed slots separated by commas.
- Both fall back to the schemaless `VALUE_LITERAL` choice when
  the context lacks the schema-resolved entries — keeps
  schemaless `parse_command` callers (tests, replay path)
  working.
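
The dispatch and fallback can be sketched as follows. This is a
simplified model, not the real factories: `Slot`, `accepts`, and
`current_column_slot` are hypothetical names, only four types are
covered, literals are checked as raw strings, and the error
wording is made up:

```rust
// Hypothetical subset of the project's `Type` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Int,
    Real,
    Text,
    Bool,
}

/// Hypothetical slot kinds; `AnyLiteral` models the schemaless
/// `VALUE_LITERAL` fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    Int,
    Real,
    Text,
    Bool,
    AnyLiteral,
}

/// One slot per column type.
pub fn slot_for_type(ty: Type) -> Slot {
    match ty {
        Type::Int => Slot::Int,
        Type::Real => Slot::Real,
        Type::Text => Slot::Text,
        Type::Bool => Slot::Bool,
    }
}

/// Typed slot when the column resolved, generic slot otherwise.
pub fn current_column_slot(column_type: Option<Type>) -> Slot {
    column_type.map_or(Slot::AnyLiteral, slot_for_type)
}

/// Shape check for a raw literal. `null` passes at every slot.
pub fn accepts(slot: Slot, literal: &str) -> Result<(), String> {
    if literal == "null" {
        return Ok(());
    }
    let ok = match slot {
        // Integer-only: "3.14" fails to parse as i64.
        Slot::Int => literal.parse::<i64>().is_ok(),
        // Anything `f64::from_str` accepts.
        Slot::Real => literal.parse::<f64>().is_ok(),
        Slot::Bool => matches!(literal, "true" | "false"),
        // Single-quoted string with both quotes present.
        Slot::Text => {
            literal.len() >= 2 && literal.starts_with('\'') && literal.ends_with('\'')
        }
        Slot::AnyLiteral => true,
    };
    if ok {
        Ok(())
    } else {
        Err(format!("expected {slot:?} value, found {literal}"))
    }
}
```

Under this model `accepts(Slot::Int, "3.14")` is an error while
`accepts(current_column_slot(None), "3.14")` succeeds, mirroring
the typed rejection and the schemaless fallback.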

**Data-command grammar wires the new types** (`src/dsl/grammar/data.rs`):
- `TABLE_NAME_INSERT` / `TABLE_NAME_WRITES` (new): table-name
  slots that set `writes_table: true`. Used by insert / update /
  delete to populate `current_table_columns`.
- `SET_COLUMN` / `FILTER_COLUMN` (new): column-name slots in
  `set col=…` / `where col=…` set `writes_column: true`.
- `INSERT_VALUES_LIST` becomes `DynamicSubgrammar(column_value_list)`.
- `UPDATE_ASSIGNMENT` and `WHERE_CLAUSE` use
  `PER_COLUMN_VALUE = DynamicSubgrammar(current_column_value)`.

**Runtime plumbs schema-with-types** (`src/runtime.rs`):
- `refresh_schema_cache` calls `describe_table` for each table
  and populates `SchemaCache::table_columns` with
  `TableColumn { name, user_type }` entries. Best-effort: a
  `describe_table` miss leaves that table unpopulated and the
  walker falls back to schemaless dispatch.
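
A rough sketch of the best-effort population loop. The function
name `populate_table_columns`, the closure-based `describe`
parameter, and the `(name, type)` string pairs are assumptions
standing in for the real `describe_table` call and `TableColumn`:

```rust
use std::collections::HashMap;

/// Column metadata as (name, user-facing type name) pairs.
pub type Columns = Vec<(String, String)>;

/// Describe every table; a failing `describe` call just leaves
/// that table out of the map instead of aborting the refresh.
pub fn populate_table_columns<F>(tables: &[&str], mut describe: F) -> HashMap<String, Columns>
where
    F: FnMut(&str) -> Result<Columns, String>,
{
    let mut table_columns = HashMap::new();
    for &table in tables {
        if let Ok(cols) = describe(table) {
            table_columns.insert(table.to_string(), cols);
        }
    }
    table_columns
}
```

A table missing from the resulting map resolves to no columns at
walk time, so its value slots take the schemaless path.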

**App dispatches with schema** (`src/app.rs`):
- `dispatch_dsl` routes through
  `parse_command_with_schema(…, &self.schema_cache)` so live
  typing/dispatch sees the typed slots. The replay path stays
  schemaless (deferred; replay bind-time errors still catch type
  mismatches).

**Catalog** (`src/friendly/strings/en-US.yaml`, `src/friendly/keys.rs`):
- New `parse.custom.bind_type_mismatch` entry with `{found}` and
  `{expected}` placeholders. Surfaced by the int_slot /
  decimal_slot validators.

Tests:
- 11 new walker-side Phase D tests cover insert / update /
  delete with schemas — typed acceptance per column, decimal
  rejection at int columns, null acceptance at any slot,
  multi-assignment per-column dispatch, schemaless fallback.
- The pre-existing `parse_command(input)` test suite (no
  schema) still passes — the fallback path is behaviour-
  preserving.
- 828 passing total, 0 failing, 1 ignored. Clippy clean.
claude@clouddev1
2026-05-15 17:45:56 +00:00
parent 85817791dc
commit abebd7944f
14 changed files with 754 additions and 74 deletions
+49 -30
@@ -1,43 +1,62 @@
//! `WalkContext` — per-walk mutable state that flows through the
//! walker (ADR-0024 §WalkContext).
//! walker (ADR-0024 §WalkContext, §Phase D).
//!
//! Phase A keeps this minimal: app-lifecycle commands have no
//! schema dependency. The `current_table`, `current_table_columns`,
//! and schema-cache pointer become populated as Phase B-D land
//! the schema-aware DDL/data commands.
//! Phase D plumbed a schema reference through the context so
//! schema-aware nodes (`Ident { source: Tables }` writing
//! `current_table`, `DynamicSubgrammar` reading
//! `current_table_columns`) can resolve real entities at walk
//! time. Pre-Phase-D `default()` callers (tests, the chumsky-
//! era `parse_command(input)` signature) still work — the
//! schema slot is `None` and dynamic dispatch falls back to a
//! generic value-literal slot.
/// Per-walk state. Cheap to construct; `default()` is the right
/// shape for app-lifecycle commands.
use crate::completion::{SchemaCache, TableColumn};
/// Per-walk state.
///
/// Carries an optional schema reference (so callers without a
/// schema continue to work) plus mutable accumulators that
/// nodes can write to during the walk:
///
/// - `current_table` / `current_table_columns` — populated when
/// an `Ident { source: Tables }` node with `writes_table:
/// true` matches a known table.
/// - `current_column` — populated by `Ident { source: Columns,
/// writes_column: true }` for `set col = …` / `where col =
/// …` slots so the next value-slot picks the column's typed
/// sub-grammar.
#[derive(Debug, Default)]
pub struct WalkContext {
/// Table whose name an `Ident { source: Tables, writes_table:
/// true }` matched earlier in the walk. Phase B+ writes this.
pub struct WalkContext<'a> {
pub schema: Option<&'a SchemaCache>,
pub current_table: Option<String>,
/// Columns of `current_table`, resolved against the schema
/// cache when the table identifier matched. Phase D+ uses
/// this to drive the dynamic `column_value_list` sub-grammar.
#[allow(dead_code)]
pub current_table_columns: Option<Vec<ColumnInfo>>,
/// For `set col=…` and `where col=…`, the column whose value
/// is about to be consumed. Phase D+ writes this so the value
/// slot picks the right typed sub-grammar.
#[allow(dead_code)]
pub current_column: Option<ColumnInfo>,
pub current_table_columns: Option<Vec<TableColumn>>,
pub current_column: Option<TableColumn>,
}
impl WalkContext {
impl<'a> WalkContext<'a> {
/// Schemaless walk context — the legacy default used by
/// pre-Phase-D callers and tests that don't care about
/// schema-aware narrowing.
#[must_use]
pub fn new() -> Self {
Self::default()
}
/// Schema-aware walk context. Dynamic sub-grammars read
/// `schema` (via `current_table_columns`) to unfold typed
/// per-column value slots.
#[must_use]
pub const fn with_schema(schema: &'a SchemaCache) -> Self {
Self {
schema: Some(schema),
current_table: None,
current_table_columns: None,
current_column: None,
}
}
}
/// Schema info for a single column. Phase D+ populates this from
/// the schema cache; Phase A leaves it unused.
#[derive(Debug, Clone)]
/// Convenience alias so non-walker modules don't reach
/// across to `completion::TableColumn` directly.
#[allow(dead_code)]
pub struct ColumnInfo {
pub name: String,
pub user_type: crate::dsl::types::Type,
}
pub type ColumnInfo = TableColumn;
+53 -7
@@ -92,21 +92,44 @@ pub fn walk_node(
role,
validator,
highlight_override: _,
} => walk_ident(source, pos, *src, role, *validator, path, per_byte),
writes_table,
writes_column,
} => walk_ident(
source,
pos,
*src,
role,
*validator,
*writes_table,
*writes_column,
ctx,
path,
per_byte,
),
Node::NumberLit { validator } => walk_number_lit(source, pos, *validator, path, per_byte),
Node::Literal(literal) => walk_literal(source, pos, literal, path, per_byte),
Node::StringLit => walk_string_lit(source, pos, path, per_byte),
Node::BlobLit | Node::DynamicSubgrammar(_) => {
// Phase A-D: not exercised yet. Reaching this branch
// means a future-phase grammar got declared without
// the walker support landing — surface as a hard
// failure so tests catch it loudly rather than
// silently mis-parsing.
Node::BlobLit => {
// BlobLit terminals are declared but no current grammar
// node uses them. Reaching this branch means a future
// grammar declared a BlobLit without walker support
// landing — surface as a hard failure so tests catch
// it loudly rather than silently mis-parsing.
NodeWalkResult::Failed {
position: pos,
kind: FailureKind::Mismatch { expected: vec![] },
}
}
Node::DynamicSubgrammar(factory) => {
// ADR-0024 §sub-grammars: resolve the inner Node at
// walk time using the active `WalkContext`, then
// recursively walk it. `Box::leak` per-walk gives the
// inner static-slice fields (Choice/Seq) the lifetime
// they require; the leak is bounded by command-shape
// complexity per walk.
let resolved: &'static Node = Box::leak(Box::new(factory(ctx)));
walk_node(source, pos, resolved, ctx, path, per_byte)
}
Node::Flag(name) => walk_flag(source, pos, name, path, per_byte),
Node::Repeated {
inner,
@@ -185,12 +208,16 @@ fn walk_punct(
}
}
#[allow(clippy::too_many_arguments)]
fn walk_ident(
source: &str,
position: usize,
src: crate::dsl::grammar::IdentSource,
role: &'static str,
validator: Option<crate::dsl::grammar::IdentValidator>,
writes_table: bool,
writes_column: bool,
ctx: &mut WalkContext,
path: &mut MatchedPath,
per_byte: &mut Vec<ByteClass>,
) -> NodeWalkResult {
@@ -209,6 +236,25 @@ fn walk_ident(
kind: FailureKind::Validation(err),
};
}
// ADR-0024 §Phase D: schema-aware writes. When the ident is
// a Tables source with `writes_table`, resolve the matched
// name against the schema cache and populate current_table /
// current_table_columns so subsequent dynamic sub-grammars
// can read them. `writes_column` resolves against the
// already-populated `current_table_columns`.
if writes_table && matches!(src, crate::dsl::grammar::IdentSource::Tables) {
ctx.current_table = Some(text.clone());
ctx.current_table_columns = ctx
.schema
.and_then(|s| s.columns_for_table(&text).map(<[_]>::to_vec));
}
if writes_column && matches!(src, crate::dsl::grammar::IdentSource::Columns) {
ctx.current_column = ctx.current_table_columns.as_ref().and_then(|cols| {
cols.iter()
.find(|c| c.name.eq_ignore_ascii_case(&text))
.cloned()
});
}
path.push(MatchedItem {
kind: MatchedKind::Ident { role },
text,
+230 -2
@@ -152,10 +152,10 @@ pub fn expected_at_input(source: &str) -> Vec<outcome::Expectation> {
/// walker's error.
/// - `(None, None)` when the entry word doesn't match any
/// registered command — the router falls through to chumsky.
pub fn walk(
pub fn walk<'a>(
source: &str,
bound: WalkBound,
ctx: &mut WalkContext,
ctx: &mut WalkContext<'a>,
) -> (Option<WalkResult>, Option<Command>) {
// Phase A only consumes EndOfInput; Position would slice
// the source, which is the same operation.
@@ -1165,4 +1165,232 @@ mod tests {
// schema — schema-listable slot, not a HintMode case.
assert!(hint_mode_at_input("show data ").is_none());
}
// =========================================================
// Phase D full — schema-aware value typing.
// =========================================================
use crate::completion::{SchemaCache, TableColumn};
use crate::dsl::parser::parse_command_with_schema;
fn schema_with(table: &str, columns: &[(&str, Type)]) -> SchemaCache {
let cols: Vec<TableColumn> = columns
.iter()
.map(|(n, t)| TableColumn {
name: (*n).to_string(),
user_type: *t,
})
.collect();
let mut cache = SchemaCache::default();
cache.tables.push(table.to_string());
for c in &cols {
cache.columns.push(c.name.clone());
}
cache.table_columns.insert(table.to_string(), cols);
cache
}
#[test]
fn phase_d_insert_with_schema_accepts_typed_values_per_column() {
let schema = schema_with(
"Customers",
&[("id", Type::Serial), ("Name", Type::Text), ("Active", Type::Bool)],
);
// 3 columns: serial, text, bool. Each value matches its slot.
let cmd = parse_command_with_schema(
"insert into Customers values (1, 'Alice', true)",
&schema,
)
.expect("parse");
match cmd {
Command::Insert { table, values, .. } => {
assert_eq!(table, "Customers");
assert_eq!(values.len(), 3);
}
other => panic!("expected Insert, got {other:?}"),
}
}
#[test]
fn phase_d_insert_rejects_decimal_in_int_column() {
// The schema has `id` as Int. `3.14` is a Number with a
// decimal — the typed `int_slot` validator rejects.
let schema = schema_with("T", &[("id", Type::Int)]);
let err = parse_command_with_schema("insert into T values (3.14)", &schema)
.expect_err("should reject");
match err {
crate::dsl::ParseError::Invalid { message, .. } => {
assert!(
message.contains("integer") || message.contains("3.14"),
"got: {message}"
);
}
other => panic!("expected Invalid, got {other:?}"),
}
}
#[test]
fn phase_d_insert_accepts_null_at_any_column() {
// null is the absence sentinel; every typed slot
// accepts it.
let schema = schema_with(
"T",
&[("a", Type::Int), ("b", Type::Text), ("c", Type::Bool)],
);
let cmd = parse_command_with_schema(
"insert into T values (null, null, null)",
&schema,
)
.expect("parse");
match cmd {
Command::Insert { values, .. } => {
assert!(values.iter().all(|v| matches!(v, Value::Null)));
}
other => panic!("expected Insert, got {other:?}"),
}
}
#[test]
fn phase_d_insert_falls_back_when_table_not_in_schema() {
// The schema is empty; the walker can't resolve column
// info for `Customers`. The DynamicSubgrammar falls
// back to the schemaless generic value-literal list and
// accepts mixed-shape values as it did pre-Phase-D.
let schema = SchemaCache::default();
let cmd = parse_command_with_schema(
"insert into Customers values (1, 'Alice')",
&schema,
)
.expect("parse — fallback path");
match cmd {
Command::Insert { values, .. } => assert_eq!(values.len(), 2),
other => panic!("expected Insert, got {other:?}"),
}
}
#[test]
fn phase_d_schemaless_parse_command_still_works() {
// The pre-Phase-D `parse_command(input)` signature
// passes no schema; the DynamicSubgrammar falls back to
// the schemaless value-literal list.
let cmd = parse("insert into T values (1, 'Alice', null)").expect("parse");
match cmd {
Command::Insert { values, .. } => assert_eq!(values.len(), 3),
other => panic!("expected Insert, got {other:?}"),
}
}
#[test]
fn phase_d_insert_accepts_bool_value_for_bool_column() {
let schema = schema_with("T", &[("flag", Type::Bool)]);
let cmd = parse_command_with_schema("insert into T values (false)", &schema)
.expect("parse");
match cmd {
Command::Insert { values, .. } => {
assert_eq!(values, vec![Value::Bool(false)]);
}
other => panic!("expected Insert, got {other:?}"),
}
}
#[test]
fn phase_d_update_accepts_text_value_for_text_column() {
let schema = schema_with(
"Customers",
&[("id", Type::Int), ("Email", Type::Text)],
);
let cmd = parse_command_with_schema(
"update Customers set Email='new@b.c' where id=1",
&schema,
)
.expect("parse");
match cmd {
Command::Update { assignments, .. } => {
assert_eq!(assignments.len(), 1);
assert_eq!(assignments[0].0, "Email");
}
other => panic!("expected Update, got {other:?}"),
}
}
#[test]
fn phase_d_update_rejects_decimal_in_int_set_column() {
// `Score` is int. Assigning `3.14` to Score hits the
// int_slot validator.
let schema = schema_with(
"T",
&[("id", Type::Int), ("Score", Type::Int)],
);
let err = parse_command_with_schema(
"update T set Score=3.14 where id=1",
&schema,
)
.expect_err("should reject");
match err {
crate::dsl::ParseError::Invalid { message, .. } => {
assert!(
message.contains("integer") || message.contains("3.14"),
"got: {message}"
);
}
other => panic!("expected Invalid, got {other:?}"),
}
}
#[test]
fn phase_d_delete_where_uses_typed_column_value() {
// `where id=1` — id is Int; `1` matches the int_slot.
let schema = schema_with("T", &[("id", Type::Int), ("Name", Type::Text)]);
let cmd = parse_command_with_schema("delete from T where id=1", &schema)
.expect("parse");
match cmd {
Command::Delete { .. } => {}
other => panic!("expected Delete, got {other:?}"),
}
}
#[test]
fn phase_d_delete_where_rejects_decimal_at_int_column() {
// `where id=3.14` — id is Int; the typed slot rejects.
let schema = schema_with("T", &[("id", Type::Int)]);
let err = parse_command_with_schema("delete from T where id=3.14", &schema)
.expect_err("should reject");
match err {
crate::dsl::ParseError::Invalid { message, .. } => {
assert!(
message.contains("integer") || message.contains("3.14"),
"got: {message}"
);
}
other => panic!("expected Invalid, got {other:?}"),
}
}
#[test]
fn phase_d_update_multi_assignment_uses_per_column_types() {
let schema = schema_with(
"Customers",
&[
("id", Type::Int),
("Name", Type::Text),
("Score", Type::Int),
],
);
// `Score=42` (int slot) and `Name='Alice'` (text slot)
// — each value slot dispatches on the column whose
// ident matched immediately before.
let cmd = parse_command_with_schema(
"update Customers set Score=42, Name='Alice' where id=1",
&schema,
)
.expect("parse");
match cmd {
Command::Update { assignments, .. } => {
assert_eq!(assignments.len(), 2);
assert_eq!(assignments[0].0, "Score");
assert_eq!(assignments[1].0, "Name");
}
other => panic!("expected Update, got {other:?}"),
}
}
}