diff --git a/docs/adr/0032-sql-select-grammar.md b/docs/adr/0032-sql-select-grammar.md
index fd511eb..35930a7 100644
--- a/docs/adr/0032-sql-select-grammar.md
+++ b/docs/adr/0032-sql-select-grammar.md
@@ -1390,6 +1390,96 @@ This amendment narrows the honest limitation in §12 from
 and recursive CTE result columns" — a tighter, factually
 verified carve-out.
 
+## Amendment 2 — §10.6 fixup-pass mechanism (2026-05-20)
+
+§10.6's prescription for the post-walk fixup is written in
+terms of "rewriting the highlight class" on projection-list
+`Ident` terminals — downgrading "column" → "unknown identifier"
+when an ident doesn't belong to the eventual `from_scope`, or
+upgrading the reverse direction once a `FROM` is typed. The
+implementation chose a different mechanism that achieves the
+identical user-visible effect; this Amendment records the
+choice so a reader of §10.6 doesn't go looking for a literal
+`per_byte_class` rewrite step that does not exist.
+
+### Mechanism actually used
+
+Two pieces, both already in the codebase by the end of
+sub-phase 2d:
+
+1. **Two-pass schema-existence diagnostic.** The 2d rewrite of
+   `schema_existence_diagnostics` (`src/dsl/walker/mod.rs`)
+   runs a pre-pass over the matched path that collects every
+   `IdentSource::Tables` / `cte_name` / `table_alias` ident
+   into a single binding vec, regardless of where in the path
+   it sits. The main pass then resolves each `sql_expr_ident`
+   against the **complete** binding set. A projection ident
+   that resolves under the eventual FROM scope produces no
+   diagnostic; one that doesn't produces an
+   `unknown_column` diagnostic on its own span.
+
+2. **Diagnostic-overlay renderer.** `src/input_render.rs`
+   reads the walker's diagnostic list at every keystroke and
+   overlays each diagnostic's span with the appropriate
+   colour (Error red for unknown-column, Warning for
+   type-mismatch / `LIKE`-on-numeric / etc.). The overlay
+   sits on top of the walker's `per_byte_class` (which keeps
+   all idents at `HighlightClass::Identifier`).
+
+Combined, the two yield the §10.6 user-visible behaviour:
+for `select bogus_col`, the diagnostic emits and the
+overlay paints the ident red as soon as a FROM appears that
+shows the column doesn't exist; for `select real_col`, no
+diagnostic emits and the ident stays Identifier-coloured.
+Both outcomes land within one debounce cycle.
+
+### Why this is equivalent
+
+§10.6's stated goal is correctness of the end-of-walk
+classification — "rewriting the highlight class" is one
+implementation strategy for that goal. The `HighlightClass`
+enum in the codebase has only one identifier slot
+(`Identifier`); the Error tint comes from diagnostic overlay,
+not from a separate `Column` vs `UnknownIdentifier` class.
+The two-pass diagnostic pass is the "post-walk fixup" that
+§10.6 calls for — it just runs inside the diagnostic emitter
+rather than as a separate rewrite step. The integration
+point (§10.6's "final stage of the walk itself") still
+holds: `schema_existence_diagnostics` runs after the walk's
+main work, mutating the walker's accumulated diagnostic
+vector in place. Consumers see a single coherent snapshot.
+
+### Completion mid-typing
+
+§10.6's second user-visible promise — "during-typing
+completion of projection-list column names uses the global
+fallback" — is preserved as a posture, but improved at the
+edges in sub-phase 2e by a look-ahead probe in
+`src/completion.rs`. When the leading walk produces no
+`from_scope` (the projection-before-FROM state) **and** the
+full input does have a FROM after the cursor, a second walk
+on the full input populates the binding set, and column
+candidates narrow to that scope. The fallback to global
+`SchemaCache.columns` remains the path when the full input
+doesn't parse cleanly (e.g., the user deleted `*` and is
+mid-edit). This is a strict improvement: the realistic
+"edit an existing query" workflow now narrows correctly.
+
+### What §10.6's prescription becomes
+
+The "rewrite the highlight class" wording is superseded by:
+**the post-walk diagnostic pass re-resolves projection
+idents against the complete scope and emits / withholds the
+unknown-column diagnostic accordingly; the renderer's
+diagnostic-overlay path achieves the visual change**. No
+new `HighlightClass` variant is required.
+
+§10.6's other prescriptions stand verbatim — the integration
+point (final walk stage, in-place mutation of walker
+accumulators), the per-keystroke re-walk (ADR-0027's
+debounced cadence), and the ORDER BY no-fixup-needed
+clarification.
+
 ## See also
 
 - ADR-0005 — the ten-type vocabulary §10 resolves back to.
diff --git a/docs/adr/README.md b/docs/adr/README.md
index d4fb2d2..049da96 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -37,4 +37,4 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`)
 - [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`)
 - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
-- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless
+- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses
diff --git a/src/dsl/walker/mod.rs b/src/dsl/walker/mod.rs
index 3b87d83..44222b3 100644
--- a/src/dsl/walker/mod.rs
+++ b/src/dsl/walker/mod.rs
@@ -4178,3 +4178,128 @@ mod tests {
         );
     }
 }
+
+#[cfg(test)]
+mod projection_before_from_tests {
+    //! ADR-0032 §10.6 — projection-before-FROM correctness
+    //! after the full walk. The 2d schema-existence pass's
+    //! two-pass binding collection (gather all FROM bindings
+    //! first, then resolve column refs) means the diagnostic
+    //! verdict is already correct at end-of-walk:
+    //!
+    //! - A projection ident that resolves under the eventual
+    //!   FROM scope produces no diagnostic.
+    //! - A projection ident that does NOT resolve produces an
+    //!   `unknown_column` diagnostic on its span — the renderer
+    //!   then overlays this as an Error visual via the
+    //!   `input_render.rs` diagnostic-overlay path, achieving
+    //!   the user-visible effect §10.6 prescribes ("the
+    //!   highlight snaps to the column class … or to the
+    //!   unknown-identifier diagnostic").
+    //!
+    //! These tests pin the behavior so a future refactor can't
+    //! silently regress it.
+
+    use super::*;
+    use crate::completion::{SchemaCache, TableColumn};
+    use crate::dsl::types::Type;
+
+    fn schema_with_table_and_columns() -> SchemaCache {
+        let mut s = SchemaCache::default();
+        s.tables.push("mytable".to_string());
+        s.columns.push("real_col".to_string());
+        s.columns.push("another_col".to_string());
+        s.table_columns.insert(
+            "mytable".to_string(),
+            vec![
+                TableColumn {
+                    name: "real_col".to_string(),
+                    user_type: Type::Text,
+                },
+                TableColumn {
+                    name: "another_col".to_string(),
+                    user_type: Type::Int,
+                },
+            ],
+        );
+        s
+    }
+
+    fn diagnostics_advanced(
+        source: &str,
+        schema: &SchemaCache,
+    ) -> Vec {
+        let mut ctx = context::WalkContext::with_schema(schema);
+        ctx.mode = crate::mode::Mode::Advanced;
+        let (result, _) =
+            walk(source, outcome::WalkBound::EndOfInput, &mut ctx);
+        result.map_or_else(Vec::new, |r| r.diagnostics)
+    }
+
+    #[test]
+    fn projection_before_from_resolves_via_eventual_from() {
+        // `select real_col from mytable` — the projection
+        // ident appears in the path BEFORE the FROM binding,
+        // but the two-pass diagnostic resolves correctly
+        // against the eventual scope. No diagnostic.
+        let schema = schema_with_table_and_columns();
+        let diags =
+            diagnostics_advanced("select real_col from mytable", &schema);
+        assert!(
+            diags.is_empty(),
+            "projection-before-FROM legit column must not be flagged; got {diags:?}",
+        );
+    }
+
+    #[test]
+    fn projection_before_from_flags_unknown_column() {
+        // `select bogus_col from mytable` — bogus_col doesn't
+        // belong to mytable. The diagnostic fires on the
+        // projection ident's span; the renderer overlays this
+        // as Error in `input_render.rs`.
+        let schema = schema_with_table_and_columns();
+        let diags =
+            diagnostics_advanced("select bogus_col from mytable", &schema);
+        assert_eq!(diags.len(), 1, "{diags:?}");
+        assert_eq!(diags[0].severity, outcome::Severity::Error);
+        // Span should cover `bogus_col` (offset 7..16).
+        assert_eq!(diags[0].span, (7, 16));
+        assert!(
+            diags[0].message.contains("no such column"),
+            "expected unknown_column wording; got {:?}",
+            diags[0].message,
+        );
+    }
+
+    #[test]
+    fn multi_projection_before_from_flags_only_unknowns() {
+        // `select real_col, bogus_col, another_col from mytable`
+        // — only bogus_col flags; the two real ones resolve.
+        let schema = schema_with_table_and_columns();
+        let diags = diagnostics_advanced(
+            "select real_col, bogus_col, another_col from mytable",
+            &schema,
+        );
+        assert_eq!(
+            diags.len(),
+            1,
+            "expected exactly one diagnostic; got {diags:?}",
+        );
+        assert!(diags[0].message.contains("bogus_col"));
+    }
+
+    #[test]
+    fn projection_without_from_is_silent() {
+        // `select c1, c2` — no FROM in scope at all. The
+        // current behavior is to skip the bare-column check
+        // entirely (avoid noise on `SELECT 1` style
+        // expressions). This is documented in the
+        // schema_existence pass.
+        let schema = schema_with_table_and_columns();
+        let diags = diagnostics_advanced("select c1, c2", &schema);
+        assert!(
+            diags.is_empty(),
+            "no FROM → silent; got {diags:?}",
+        );
+    }
+}