docs: ADR-0032 Amendment 2 + §10.6 regression tests

Amendment 2 records the §10.6 fixup-pass mechanism choice. §10.6
prescribes "rewriting the highlight class" on projection-list
idents at end-of-walk; the actual implementation uses a different
mechanism that achieves the identical user-visible behavior:

1. 2d's two-pass schema-existence diagnostic collects every FROM
   binding from the matched path first, then resolves projection
   idents against the complete scope. The post-walk re-resolve
   §10.6 calls for, just embedded in the diagnostic emitter.

2. input_render.rs's diagnostic-overlay path colors each
   diagnostic span Error/Warning, achieving the visual change
   §10.6 describes without needing a new HighlightClass variant.

The completion-mid-typing piece is improved by the §10.5
look-ahead probe (sub-phase 2e earlier).

Four new regression tests in `projection_before_from_tests` pin
the behavior so a future refactor can't silently regress it:
correct ident resolves silently, unknown ident flags via
diagnostic on its span, multi-projection only flags unknowns,
projection-without-FROM is silent.

ADR index entry updated to reference Amendment 2.

Test totals: 1424 → 1428 passing (+4). Clippy clean.
This commit is contained in:
claude@clouddev1
2026-05-20 21:19:57 +00:00
parent 0fc7b082b2
commit ee0dafd86b
3 changed files with 216 additions and 1 deletions
+90
View File
@@ -1390,6 +1390,96 @@ This amendment narrows the honest limitation in §12 from
and recursive CTE result columns" — a tighter, factually and recursive CTE result columns" — a tighter, factually
verified carve-out. verified carve-out.
## Amendment 2 — §10.6 fixup-pass mechanism (2026-05-20)
§10.6's prescription for the post-walk fixup is written in
terms of "rewriting the highlight class" on projection-list
`Ident` terminals — downgrading "column" → "unknown identifier"
when an ident doesn't belong to the eventual `from_scope`, or
upgrading the reverse direction once a `FROM` is typed. The
implementation chose a different mechanism that achieves the
identical user-visible effect; this Amendment records the
choice so a reader of §10.6 doesn't go looking for a literal
`per_byte_class` rewrite step that does not exist.
### Mechanism actually used
Two pieces, both already in the codebase by the end of
sub-phase 2d:
1. **Two-pass schema-existence diagnostic.** The 2d rewrite of
`schema_existence_diagnostics` (`src/dsl/walker/mod.rs`)
runs a pre-pass over the matched path that collects every
`IdentSource::Tables` / `cte_name` / `table_alias` ident
into a single binding vec, regardless of where in the path
it sits. The main pass then resolves each `sql_expr_ident`
against the **complete** binding set. A projection ident
that resolves under the eventual FROM scope produces no
diagnostic; one that doesn't produces an
`unknown_column` diagnostic on its own span.
2. **Diagnostic-overlay renderer.** `src/input_render.rs`
reads the walker's diagnostic list at every keystroke and
overlays each diagnostic's span with the appropriate
colour (Error red for unknown-column, Warning for
type-mismatch / `LIKE`-on-numeric / etc.). The overlay
sits on top of the walker's `per_byte_class` (which keeps
all idents at `HighlightClass::Identifier`).
Combined, the two yield the §10.6 user-visible behaviour:
typing `select bogus_col`, the diagnostic emits and the
overlay paints the ident red as soon as a FROM appears that
shows the column doesn't exist; typing `select real_col`, no
diagnostic emits and the ident stays Identifier-coloured.
Within one debounce cycle.
### Why this is equivalent
§10.6's stated goal is correctness of the end-of-walk
classification — "rewriting the highlight class" is one
implementation strategy for that goal. The HighlightClass
enum in the codebase has only one identifier slot
(`Identifier`); the Error tint comes from diagnostic overlay,
not from a separate `Column` vs `UnknownIdentifier` class.
The two-pass diagnostic pass is the "post-walk fixup" that
§10.6 calls for — it just runs inside the diagnostic emitter
rather than as a separate rewrite step. The integration
point (§10.6's "final stage of the walk itself") still
holds: `schema_existence_diagnostics` runs after the walk's
main work, mutating the walker's accumulated diagnostic
vector in place. Consumers see a single coherent snapshot.
### Completion mid-typing
§10.6's second user-visible promise — "during-typing
completion of projection-list column names uses the global
fallback" — is preserved as a posture, but improved at the
edges in sub-phase 2e by a look-ahead probe in
`src/completion.rs`. When the leading walk produces no
`from_scope` (the projection-before-FROM state) **and** the
full input does have a FROM after the cursor, a second walk
on the full input populates the binding set, and column
candidates narrow to that scope. The fallback to global
`SchemaCache.columns` remains the path when the full input
doesn't parse cleanly (e.g., the user deleted `*` and is
mid-edit). This is a strict improvement: the realistic
"edit an existing query" workflow now narrows correctly.
### What §10.6's prescription becomes
The "rewrite the highlight class" wording is superseded by:
**the post-walk diagnostic pass re-resolves projection
idents against the complete scope and emits / withholds the
unknown-column diagnostic accordingly; the renderer's
diagnostic-overlay path achieves the visual change**. No
new `HighlightClass` variant is required.
§10.6's other prescriptions stand verbatim — the integration
point (final walk stage, in-place mutation of walker
accumulators), the per-keystroke re-walk (ADR-0027's
debounced cadence), and the ORDER BY no-fixup-needed
clarification.
## See also ## See also
- ADR-0005 — the ten-type vocabulary §10 resolves back to. - ADR-0005 — the ten-type vocabulary §10 resolves back to.
+1 -1
View File
@@ -37,4 +37,4 @@ This directory contains the project's ADRs, recorded per
- [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`) - [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`)
- [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`) - [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`)
- [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2 - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless - [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses
+125
View File
@@ -4178,3 +4178,128 @@ mod tests {
); );
} }
} }
#[cfg(test)]
mod projection_before_from_tests {
//! ADR-0032 §10.6 — projection-before-FROM correctness
//! after the full walk. The 2d schema-existence pass's
//! two-pass binding collection (gather all FROM bindings
//! first, then resolve column refs) means the diagnostic
//! verdict is already correct at end-of-walk:
//!
//! - A projection ident that resolves under the eventual
//! FROM scope produces no diagnostic.
//! - A projection ident that does NOT resolve produces an
//! `unknown_column` diagnostic on its span — the renderer
//! then overlays this as an Error visual via the
//! `input_render.rs` diagnostic-overlay path, achieving
//! the user-visible effect §10.6 prescribes ("the
//! highlight snaps to the column class … or to the
//! unknown-identifier diagnostic").
//!
//! These tests pin the behavior so a future refactor can't
//! silently regress it.
use super::*;
use crate::completion::{SchemaCache, TableColumn};
use crate::dsl::types::Type;
fn schema_with_table_and_columns() -> SchemaCache {
let mut s = SchemaCache::default();
s.tables.push("mytable".to_string());
s.columns.push("real_col".to_string());
s.columns.push("another_col".to_string());
s.table_columns.insert(
"mytable".to_string(),
vec![
TableColumn {
name: "real_col".to_string(),
user_type: Type::Text,
},
TableColumn {
name: "another_col".to_string(),
user_type: Type::Int,
},
],
);
s
}
fn diagnostics_advanced(
source: &str,
schema: &SchemaCache,
) -> Vec<outcome::Diagnostic> {
let mut ctx = context::WalkContext::with_schema(schema);
ctx.mode = crate::mode::Mode::Advanced;
let (result, _) =
walk(source, outcome::WalkBound::EndOfInput, &mut ctx);
result.map_or_else(Vec::new, |r| r.diagnostics)
}
#[test]
fn projection_before_from_resolves_via_eventual_from() {
// `select real_col from mytable` — the projection
// ident appears in the path BEFORE the FROM binding,
// but the two-pass diagnostic resolves correctly
// against the eventual scope. No diagnostic.
let schema = schema_with_table_and_columns();
let diags =
diagnostics_advanced("select real_col from mytable", &schema);
assert!(
diags.is_empty(),
"projection-before-FROM legit column must not be flagged; got {diags:?}",
);
}
#[test]
fn projection_before_from_flags_unknown_column() {
// `select bogus_col from mytable` — bogus_col doesn't
// belong to mytable. The diagnostic fires on the
// projection ident's span; the renderer overlays this
// as Error in `input_render.rs`.
let schema = schema_with_table_and_columns();
let diags =
diagnostics_advanced("select bogus_col from mytable", &schema);
assert_eq!(diags.len(), 1, "{diags:?}");
assert_eq!(diags[0].severity, outcome::Severity::Error);
// Span should cover `bogus_col` (offset 7..16).
assert_eq!(diags[0].span, (7, 16));
assert!(
diags[0].message.contains("no such column"),
"expected unknown_column wording; got {:?}",
diags[0].message,
);
}
#[test]
fn multi_projection_before_from_flags_only_unknowns() {
// `select real_col, bogus_col, another_col from mytable`
// — only bogus_col flags; the two real ones resolve.
let schema = schema_with_table_and_columns();
let diags = diagnostics_advanced(
"select real_col, bogus_col, another_col from mytable",
&schema,
);
assert_eq!(
diags.len(),
1,
"expected exactly one diagnostic; got {diags:?}",
);
assert!(diags[0].message.contains("bogus_col"));
}
#[test]
fn projection_without_from_is_silent() {
// `select c1, c2` — no FROM in scope at all. The
// current behavior is to skip the bare-column check
// entirely (avoid noise on `SELECT 1` style
// expressions). This is documented in the
// schema_existence pass.
let schema = schema_with_table_and_columns();
let diags = diagnostics_advanced("select c1, c2", &schema);
assert!(
diags.is_empty(),
"no FROM → silent; got {diags:?}",
);
}
}