docs: ADR-0032 Amendment 2 + §10.6 regression tests

Amendment 2 records the §10.6 fixup-pass mechanism choice. §10.6 prescribes "rewriting the highlight class" on projection-list idents at end-of-walk; the actual implementation uses a different mechanism that achieves the identical user-visible behavior: 1. 2d's two-pass schema-existence diagnostic collects every FROM binding from the matched path first, then resolves projection idents against the complete scope. The post-walk re-resolve §10.6 calls for, just embedded in the diagnostic emitter. 2. input_render.rs's diagnostic-overlay path colors each diagnostic span Error/Warning, achieving the visual change §10.6 describes without needing a new HighlightClass variant. The completion-mid-typing piece is improved by the §10.5 look-ahead probe (sub-phase 2e earlier). Four new regression tests in `projection_before_from_tests` pin the behavior so a future refactor can't silently regress it: correct ident resolves silently, unknown ident flags via diagnostic on its span, multi-projection only flags unknowns, projection-without-FROM is silent. ADR index entry updated to reference Amendment 2. Test totals: 1424 → 1428 passing (+4). Clippy clean.
2026-05-20 21:19:57 +00:00
parent 0fc7b082b2
commit ee0dafd86b
3 changed files with 216 additions and 1 deletions
@@ -1390,6 +1390,96 @@ This amendment narrows the honest limitation in §12 from
 and recursive CTE result columns" — a tighter, factually
 verified carve-out.
 ## Amendment 2 — §10.6 fixup-pass mechanism (2026-05-20)
 §10.6's prescription for the post-walk fixup is written in
 terms of "rewriting the highlight class" on projection-list
 `Ident` terminals — downgrading "column" → "unknown identifier"
 when an ident doesn't belong to the eventual `from_scope`, or
 upgrading the reverse direction once a `FROM` is typed. The
 implementation chose a different mechanism that achieves the
 identical user-visible effect; this Amendment records the
 choice so a reader of §10.6 doesn't go looking for a literal
 `per_byte_class` rewrite step that does not exist.
 ### Mechanism actually used
 Two pieces, both already in the codebase by the end of
 sub-phase 2d:
 1. **Two-pass schema-existence diagnostic.** The 2d rewrite of
   `schema_existence_diagnostics` (`src/dsl/walker/mod.rs`)
   runs a pre-pass over the matched path that collects every
   `IdentSource::Tables` / `cte_name` / `table_alias` ident
   into a single binding vec, regardless of where in the path
   it sits. The main pass then resolves each `sql_expr_ident`
   against the **complete** binding set. A projection ident
   that resolves under the eventual FROM scope produces no
   diagnostic; one that doesn't produces an
   `unknown_column` diagnostic on its own span.
 2. **Diagnostic-overlay renderer.** `src/input_render.rs`
   reads the walker's diagnostic list at every keystroke and
   overlays each diagnostic's span with the appropriate
   colour (Error red for unknown-column, Warning for
   type-mismatch / `LIKE`-on-numeric / etc.). The overlay
   sits on top of the walker's `per_byte_class` (which keeps
   all idents at `HighlightClass::Identifier`).
 Combined, the two yield the §10.6 user-visible behaviour:
 typing `select bogus_col`, the diagnostic emits and the
 overlay paints the ident red as soon as a FROM appears that
 shows the column doesn't exist; typing `select real_col`, no
 diagnostic emits and the ident stays Identifier-coloured.
 Within one debounce cycle.
 ### Why this is equivalent
 §10.6's stated goal is correctness of the end-of-walk
 classification — "rewriting the highlight class" is one
 implementation strategy for that goal. The HighlightClass
 enum in the codebase has only one identifier slot
 (`Identifier`); the Error tint comes from diagnostic overlay,
 not from a separate `Column` vs `UnknownIdentifier` class.
 The two-pass diagnostic pass is the "post-walk fixup" that
 §10.6 calls for — it just runs inside the diagnostic emitter
 rather than as a separate rewrite step. The integration
 point (§10.6's "final stage of the walk itself") still
 holds: `schema_existence_diagnostics` runs after the walk's
 main work, mutating the walker's accumulated diagnostic
 vector in place. Consumers see a single coherent snapshot.
 ### Completion mid-typing
 §10.6's second user-visible promise — "during-typing
 completion of projection-list column names uses the global
 fallback" — is preserved as a posture, but improved at the
 edges in sub-phase 2e by a look-ahead probe in
 `src/completion.rs`. When the leading walk produces no
 `from_scope` (the projection-before-FROM state) **and** the
 full input does have a FROM after the cursor, a second walk
 on the full input populates the binding set, and column
 candidates narrow to that scope. The fallback to global
 `SchemaCache.columns` remains the path when the full input
 doesn't parse cleanly (e.g., the user deleted `*` and is
 mid-edit). This is a strict improvement: the realistic
 "edit an existing query" workflow now narrows correctly.
 ### What §10.6's prescription becomes
 The "rewrite the highlight class" wording is superseded by:
 **the post-walk diagnostic pass re-resolves projection
 idents against the complete scope and emits / withholds the
 unknown-column diagnostic accordingly; the renderer's
 diagnostic-overlay path achieves the visual change**. No
 new `HighlightClass` variant is required.
 §10.6's other prescriptions stand verbatim — the integration
 point (final walk stage, in-place mutation of walker
 accumulators), the per-keystroke re-walk (ADR-0027's
 debounced cadence), and the ORDER BY no-fixup-needed
 clarification.
 ## See also
 - ADR-0005 — the ten-type vocabulary §10 resolves back to.
@@ -37,4 +37,4 @@ This directory contains the project's ADRs, recorded per
 - [ADR-0029 — Column constraints (NOT NULL / UNIQUE / CHECK / DEFAULT)](0029-column-constraints.md) — **Accepted**, the four column-level constraints declared in the column-spec suffix (`create table` / `add column`) and modified on existing columns via `add constraint …` / `drop constraint …`; a pre-flight dry-run guards populated columns; `CHECK` reuses the ADR-0026 expression grammar via `Subgrammar` (`C3`)
 - [ADR-0030 — Advanced mode: the standard-SQL surface](0030-advanced-mode-sql-surface.md) — **Accepted**, SQL added as grammar *within the unified grammar tree* (ADR-0024), not a separate batch parser — so SQL gets the same completion / highlighting / hints / parse-errors as the DSL; mode gates the SQL forms; DDL routes through the typed `Command` executor (metadata + type vocabulary preserved), DML and `SELECT` execute as validated SQL; engine-neutral posture, the DSL→SQL teaching echo; supersedes ADR-0001's `sqlparser-rs` reservation; phased plan (`Q1` / `Q2` / `Q4`)
 - [ADR-0031 — The SQL expression grammar](0031-sql-expression-grammar.md) — **Accepted**, the stratified SQL expression grammar fragment commissioned by ADR-0030 §3: a single precedence ladder (`OR`/`AND`/`NOT`, the comparison/`LIKE`/`IN`/`BETWEEN`/`IS NULL` predicate set, arithmetic incl. `||`, function calls, `CASE`) — the superset of ADR-0026's DSL `WHERE` grammar, authored as a parallel fragment so simple mode is untouched; pure validation, builds **no** AST (consumers run/store SQL as text per ADR-0030 §4/§6); reuses ADR-0026's `Subgrammar` recursion + depth cap unchanged; subquery expressions and qualified column refs deferred to ADR-0030 Phase 2
- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless
+- [ADR-0032 — The full SQL `SELECT` grammar](0032-sql-select-grammar.md) — **Accepted**, the Phase-2 grammar commissioned by ADR-0030 §3: full `SELECT` with `INNER`/`LEFT`/`RIGHT`/`FULL OUTER`/`CROSS` joins, `GROUP BY`/`HAVING`, all four set ops (`UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT`), `WITH` and `WITH RECURSIVE` CTEs, `LIMIT … OFFSET`, `DISTINCT`, `t.*`, and bare-alias projection (lifting Phase-1 §4.2); additive extensions to ADR-0031's `sql_expr` for scalar subqueries, `IN (SELECT …)`, `[NOT] EXISTS`, and qualified column refs (redeeming ADR-0031 §7 OOS-1/OOS-2); grammar-recursion via `Subgrammar(&SQL_SELECT_COMPOUND)` reuses ADR-0026's `MAX_SUBGRAMMAR_DEPTH = 64` cap unchanged; **softens ADR-0030 §8's "ambient assistance comes for free" claim**: completion scope needs new `WalkContext` accumulators (a `from_scope_stack` of `ScopeFrame`s holding `from_scope` / `cte_bindings` / `projection_aliases`), a **new walker node variant `Node::ScopedSubgrammar(&Node)`** as the push/pop trigger (existing `Node::Subgrammar` unchanged so DSL `Expr` and `sql_expr` recursion are unaffected), qualified-prefix completion narrowing, body-projection-derived CTE column resolution (so `SELECT *` and explicit-projection CTE bodies both yield real column completion past `cte_alias.|`), and a **post-walk fixup pass** that re-resolves projection-list identifier highlighting/validity once `FROM` is parsed (the projection-before-FROM problem); classifies every Phase-2 validation case against ADR-0027's ERROR/WARNING guideline (§11): five new `diagnostic.*` keys for parse-time-detectable cases (unknown qualifier, ambiguous column, projection-alias misplaced, CTE/compound arity mismatch) plus eight `engine.*` translation keys; a MatchedPath-walking predicate-warnings variant that closes the Phase-1 gap where SQL `WHERE` expressions emitted no `LIKE`-on-numeric / `= NULL` / type-mismatch warnings (ADR-0027 Amendment 1 finally extends to the SQL surface); adds a worker-side post-prepare type-resolution pass via engine column-origin metadata so bare column refs recover their playground type (partially lifting Phase-1 §4.5, the bool→0/1 case) — `Cargo.toml` gains `column_metadata` to rusqlite features (verified against pinned 0.39.0); `__rdbms_*` rejection extended to every new table-source slot; Amendment 1 narrows §12's resolution rule from a grammar-side structural classification to "trust the engine's column-origin metadata verbatim" after an empirical probe showed origin metadata follows through non-recursive CTEs, scalar subqueries, derived tables, set ops, and joins — the one structural exception is recursive CTE result columns, which return None and stay typeless; Amendment 2 records that §10.6's "rewrite the highlight class" prescription is realised via the two-pass schema-existence diagnostic + the renderer's diagnostic-overlay path (no separate per-byte rewrite step needed; no new HighlightClass variant), and that the projection-before-FROM completion narrowing has been improved by an `src/completion.rs` look-ahead probe when the leading walk's `from_scope` is empty but the full input parses
@@ -4178,3 +4178,128 @@ mod tests {
        );
    }
 }
 #[cfg(test)]
 mod projection_before_from_tests {
    //! ADR-0032 §10.6 — projection-before-FROM correctness
    //! after the full walk. The 2d schema-existence pass's
    //! two-pass binding collection (gather all FROM bindings
    //! first, then resolve column refs) means the diagnostic
    //! verdict is already correct at end-of-walk:
    //!
    //! - A projection ident that resolves under the eventual
    //!   FROM scope produces no diagnostic.
    //! - A projection ident that does NOT resolve produces an
    //!   `unknown_column` diagnostic on its span — the renderer
    //!   then overlays this as an Error visual via the
    //!   `input_render.rs` diagnostic-overlay path, achieving
    //!   the user-visible effect §10.6 prescribes ("the
    //!   highlight snaps to the column class … or to the
    //!   unknown-identifier diagnostic").
    //!
    //! These tests pin the behavior so a future refactor can't
    //! silently regress it.
    use super::*;
    use crate::completion::{SchemaCache, TableColumn};
    use crate::dsl::types::Type;
    fn schema_with_table_and_columns() -> SchemaCache {
        let mut s = SchemaCache::default();
        s.tables.push("mytable".to_string());
        s.columns.push("real_col".to_string());
        s.columns.push("another_col".to_string());
        s.table_columns.insert(
            "mytable".to_string(),
            vec![
                TableColumn {
                    name: "real_col".to_string(),
                    user_type: Type::Text,
                },
                TableColumn {
                    name: "another_col".to_string(),
                    user_type: Type::Int,
                },
            ],
        );
        s
    }
    fn diagnostics_advanced(
        source: &str,
        schema: &SchemaCache,
    ) -> Vec<outcome::Diagnostic> {
        let mut ctx = context::WalkContext::with_schema(schema);
        ctx.mode = crate::mode::Mode::Advanced;
        let (result, _) =
            walk(source, outcome::WalkBound::EndOfInput, &mut ctx);
        result.map_or_else(Vec::new, |r| r.diagnostics)
    }
    #[test]
    fn projection_before_from_resolves_via_eventual_from() {
        // `select real_col from mytable` — the projection
        // ident appears in the path BEFORE the FROM binding,
        // but the two-pass diagnostic resolves correctly
        // against the eventual scope. No diagnostic.
        let schema = schema_with_table_and_columns();
        let diags =
            diagnostics_advanced("select real_col from mytable", &schema);
        assert!(
            diags.is_empty(),
            "projection-before-FROM legit column must not be flagged; got {diags:?}",
        );
    }
    #[test]
    fn projection_before_from_flags_unknown_column() {
        // `select bogus_col from mytable` — bogus_col doesn't
        // belong to mytable. The diagnostic fires on the
        // projection ident's span; the renderer overlays this
        // as Error in `input_render.rs`.
        let schema = schema_with_table_and_columns();
        let diags =
            diagnostics_advanced("select bogus_col from mytable", &schema);
        assert_eq!(diags.len(), 1, "{diags:?}");
        assert_eq!(diags[0].severity, outcome::Severity::Error);
        // Span should cover `bogus_col` (offset 7..16).
        assert_eq!(diags[0].span, (7, 16));
        assert!(
            diags[0].message.contains("no such column"),
            "expected unknown_column wording; got {:?}",
            diags[0].message,
        );
    }
    #[test]
    fn multi_projection_before_from_flags_only_unknowns() {
        // `select real_col, bogus_col, another_col from mytable`
        // — only bogus_col flags; the two real ones resolve.
        let schema = schema_with_table_and_columns();
        let diags = diagnostics_advanced(
            "select real_col, bogus_col, another_col from mytable",
            &schema,
        );
        assert_eq!(
            diags.len(),
            1,
            "expected exactly one diagnostic; got {diags:?}",
        );
        assert!(diags[0].message.contains("bogus_col"));
    }
    #[test]
    fn projection_without_from_is_silent() {
        // `select c1, c2` — no FROM in scope at all. The
        // current behavior is to skip the bare-column check
        // entirely (avoid noise on `SELECT 1` style
        // expressions). This is documented in the
        // schema_existence pass.
        let schema = schema_with_table_and_columns();
        let diags = diagnostics_advanced("select c1, c2", &schema);
        assert!(
            diags.is_empty(),
            "no FROM → silent; got {diags:?}",
        );
    }
 }