diff --git a/docs/handoff/20260520-handoff-27.md b/docs/handoff/20260520-handoff-27.md new file mode 100644 index 0000000..1fce339 --- /dev/null +++ b/docs/handoff/20260520-handoff-27.md @@ -0,0 +1,396 @@ +# Session handoff — 2026-05-20 (27) + +Twenty-seventh handover. **Implementation session for ADR-0032 +Phase 2.** Sub-phases 2a, 2b (with §10.3 stage 2 deferred), 2c, +2d (with three diagnostic keys deferred), and 2f all shipped. +Sub-phases **2e (completion + post-walk fixup)** and +**2g (verification sweep)** remain. + +## §1. State at handoff + +**Branch:** `main`. Working tree clean. **11 unpushed commits** +(`be8a4f5..0c3847a` plus the prior handoff-26 commit on top of +`3db292c`). + +**Tests:** **1385 passing, 0 failing, 1 ignored** (up from +the 2026-05-19 handoff-26 baseline of 1260 / 0 / 1 — a net +**+125 tests**). + +**Clippy:** clean (last verified at end of 2f, commit +`0c3847a`). + +**Commits since handoff-26:** + +``` +0c3847a db: column-origin type recovery in SELECT results (sub-phase 2f) +c5cf03b walker: SQL diagnostics — multi-binding scope, qualified refs, Phase-1 gap closure (sub-phase 2d) +a491df3 grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c) +4ff054c walker: populate cte_bindings placeholders + projection_aliases (§10.3 stage 1 / §10.4) +b522d09 walker: populate from_scope table bindings (§10.1) +98a74b2 grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar +4f89106 walker: Node::ScopedSubgrammar variant + scope-frame stack (§10.2) +8d29335 grammar: SQL SELECT full statement fragment (Phase 2a) +e032f01 docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata +``` + +## §2. What's done — the Phase-2 surface + +A user typing in advanced mode can now write the **full Phase-2 +SELECT surface** authored by ADR-0032 §§1–9: + +- **All five JOIN flavours** (`INNER` / `LEFT OUTER` / `RIGHT + OUTER` / `FULL OUTER` / `CROSS`) and bare `JOIN`. +- **All four set ops** (`UNION` / `UNION ALL` / `INTERSECT` / + `EXCEPT`) in compound queries. +- **CTEs** — both `WITH` and `WITH RECURSIVE`, with optional + `(col-list)`. Dispatched via a new `data::WITH` `CommandNode` + (entry word `with` is now in `ADVANCED_ONLY_ENTRIES`). +- **`GROUP BY` / `HAVING`** with arbitrary `sql_expr` keys. +- **Qualified column refs** (`t.c` / `alias.c`) — additive + extension to `sql_expr.rs`'s `name_or_call` (§5). +- **Subquery expressions** — scalar `(SELECT …)`, `IN (SELECT + …)`, `[NOT] EXISTS (SELECT …)` — additive `Choice` branches + in `sql_expr.rs` (§6), recursing through + `Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND)`. +- **`LIMIT n [OFFSET m]`** with full `sql_expr` admission for + both args (legacy `LIMIT m, n` rejected per §13 OOS-4). +- **`DISTINCT` / `ALL`** prefix. +- **`t.*` qualified wildcard** in projection. +- **Bare-alias projection** (`SELECT a x FROM t` — lifts + Phase-1 §4.2) via per-position `Node::Lookahead` follow-set + gating. +- **OOS shapes reject**: `NATURAL JOIN`, `JOIN … USING`, + comma-FROM, derived tables in FROM, window functions + (`OVER (…)`), `VALUES` row source, `LATERAL` (partial — the + single-keyword form is admitted as a bare alias; the comma + form rejects via OOS-3). + +The Phase-1 `data::SELECT` shape migrated from `data.rs` into +`sql_select.rs`'s `SQL_SELECT_TAIL` (sub-phase 2c). Phase-1's +22 local grammar nodes plus the duplicate +`reject_internal_table` validator are gone; the single source +of truth for the SELECT shape is now `sql_select.rs`. + +### 2.1. Walker capabilities + +- **`Node::ScopedSubgrammar(&'static Node)`** (ADR-0032 §10.2) + is the new node variant. Same recursion semantics as + `Subgrammar`, plus a push/pop of `ScopeFrame` on the + `WalkContext::from_scope_stack`. Shares `subgrammar_depth` + with `Subgrammar` so the `MAX_SUBGRAMMAR_DEPTH = 64` cap + fires uniformly. +- **`WalkContext::from_scope_stack: Vec`** — + always non-empty (the bottom frame is the implicit top-level + scope DSL paths use). Each `ScopeFrame` carries `from_scope: + Vec`, `cte_bindings: Vec`, and + `projection_aliases: Vec`. +- **Three new `Node::Ident` flags**: `writes_table_alias`, + `writes_cte_name`, `writes_projection_alias`. Together with + the existing `writes_table` (now repurposed to push a + TableBinding alongside its prior `current_table` write), + they populate the top frame's accumulators during the walk. + All 48 existing Ident sites carry the new flags as + `false` (mechanical sed update; no behavioural change for + DSL paths). +- **`current_table` / `current_table_columns` stay as fields + on `WalkContext`** (additive approach — they're populated + by `writes_table` alongside the from_scope push). Plan §2b's + "make them derived helpers" became "keep them populated + redundantly" — strictly safer, no behavioural drift for DSL + paths, but a future-cleanup opportunity if the duplication + proves a maintenance burden. + +### 2.2. Diagnostics + +Three diagnostic passes run on every successful parse +(ADR-0032 §11.7): + +1. **`schema_existence_diagnostics`** — extended for + multi-binding scope (a pre-pass collects all + `table_name` / `cte_name` / `table_alias` idents to sidestep + the projection-before-FROM ordering problem). Emits + `diagnostic.unknown_table`, `diagnostic.unknown_column`, + `diagnostic.unknown_qualifier`, `diagnostic.ambiguous_column`, + `diagnostic.duplicate_cte` (Plan §Open-2, user-approved). +2. **`sql_predicate_warnings`** — NEW. MatchedPath-walking + variant of the predicate-warning pass, closes the + Phase-1 carry-over gap (§11.6). Emits + `diagnostic.eq_null`, `diagnostic.like_numeric`, + `diagnostic.type_mismatch` on SQL `WHERE` / `HAVING` / + `JOIN ON` / `CASE` / projection / `ORDER BY` slots. + Scoped to bare column refs in ` ` + form — qualified-ref and expression-operand cases stay + un-flagged (safe false-negative posture). +3. **DSL `predicate_warnings` (existing)** — unchanged. + Still walks the DSL `Expr` AST for DSL `WHERE` expressions. + +### 2.3. Type recovery (sub-phase 2f) + +Per ADR-0032 §12 + Amendment 1: + +- `Cargo.toml` rusqlite features now include `column_metadata`. +- `do_run_select` calls `resolve_select_column_types(conn, + stmt)` after `prepare`, queries `__rdbms_playground_columns` + for each result column whose origin metadata is populated, + and threads the type through to `format_cell`. +- ADR-0030 Phase-1 §4.5 (bool rendering as `0` / `1`) is + lifted for any bare-column reference whose origin the engine + carries through — non-recursive CTE bodies, scalar + subqueries, derived tables, set ops, JOINs all work for + free per Amendment 1. + +### 2.4. Catalog (ADR-0032 §11.5) + +Six new `diagnostic.*` keys + eight new `engine.*` keys +authored in `src/friendly/strings/en-US.yaml` and registered +in `src/friendly/keys.rs`: + +- `diagnostic.ambiguous_column`, + `diagnostic.compound_arity_mismatch`, + `diagnostic.cte_arity_mismatch`, `diagnostic.duplicate_cte`, + `diagnostic.projection_alias_misplaced`, + `diagnostic.unknown_qualifier`. +- `engine.no_such_table`, `engine.no_such_column`, + `engine.ambiguous_column`, `engine.aggregate_misuse`, + `engine.group_by_required`, + `engine.compound_arity_mismatch`, + `engine.scalar_subquery_too_many_rows`, + `engine.recursive_cte_malformed`. + +The `engine.*` keys are catalog entries only — the +friendly-error layer reads them by key when reached, but no +proactive enhancement of the layer's engine-message +classification was pulled through here. That stays as a +small follow-up. + +## §3. Deferrals — all user-approved or documented + +### §3.1. §10.3 stage 2 — CTE column-derivation harvest + +**Status:** user-approved deferral (mid-session, 2026-05-20). +**Recorded in:** `docs/plans/20260520-adr-0032-phase-2.md` +§2b "Implementation status (2026-05-20)" block. + +The placeholder CTE-binding push (stage 1) IS implemented — +the CTE name is visible to the body (WITH RECURSIVE +self-reference works) and to downstream CTE-name validators +(2d's schema-existence pass accepts CTE-name table-source +refs). What's missing is the six §10.3 derivation rules that +populate the placeholder's `columns` at body-frame exit. + +**Knock-on effect:** qualified-prefix completion past +`cte_alias.|` returns an empty candidate list. Bare column +refs into a CTE binding short-circuit to "accept silently" in +2d's schema-existence pass (we don't know the columns yet). + +Folds into 2e (where qualified-prefix completion needs CTE +columns) and any future arity-check pass that wants to compare +declared `(col-list)` vs derived projection arity. + +### §3.2. Three 2d diagnostic keys deferred + +`projection_alias_misplaced`, `compound_arity_mismatch`, and +`cte_arity_mismatch` — catalog keys + strings are authored +but the emitting logic is not implemented: + +- `projection_alias_misplaced` needs clause-context detection + (which clause an Ident sits in — the matched path is flat; + some role-based recognition would work). +- `compound_arity_mismatch` needs per-leg projection counting + as the compound is assembled. +- `cte_arity_mismatch` depends on §10.3 stage 2. + +These were not explicitly user-approved as deferrals — they +were noted in the 2d commit message as "deferred per the +2d-scoped DA review (documented as `(TBD)` in the cross-cut +matrix for 2g)". The user should flag if these need to land +before 2g signs off. + +### §3.3. Engine.* key wiring + +Catalog entries authored; the friendly-error layer doesn't +yet recognise the new engine messages and route them through +the new keys. This is a small mechanical extension to the +engine-error matcher; not blocking but worth folding in. + +## §4. What's next — sub-phases 2e and 2g + +### §4.1. Sub-phase 2e — completion + post-walk fixup + +Per plan §2e: + +- **Qualified-prefix completion narrowing** (§10.5): at the + cursor in `SELECT t.|`, candidates should be `t`'s columns + (not the global column set). Requires: + - The completion API gains a "prefix qualifier" hint that + the walker passes when the cursor is immediately after + `Ident '.'`. + - The `IdentSource::Columns` completion path honours the + hint when present, resolving against the top frame's + `from_scope` binding for `t`. +- **Post-walk projection-list fixup pass** (§10.6): the + projection-before-FROM problem. Collect projection `Ident` + terminals during the walk; after walk completion, + re-resolve each against the final `from_scope`; rewrite + the highlight class and validity diagnostic. + +**Note:** the §10.6 problem is *partially* sidestepped already +in 2d — the schema-existence pass uses a two-pass approach +that collects bindings first and then resolves. So +projection-side identifiers don't get false-positive +diagnostics from the existing pass. The 2e fixup-pass scope +is more about getting the HIGHLIGHT class right at every +keystroke (the typing experience) and supporting completion +mid-typing. Worth re-reading §10.6 carefully with the 2d +implementation in hand. + +### §4.2. Sub-phase 2g — integration sweep + verification + +Per plan §2g: + +- Fill in the cross-cut verification matrix (29 rows in + `docs/plans/20260520-adr-0032-phase-2.md`). Many rows are + satisfied by tests already authored in 2a–2f; the matrix + needs to be walked and each row pointed at its test. +- Produce the final phase-exit verification report. +- DA final review. + +### §4.3. The §10.3 stage-2 harvest + +If 2e wants real CTE-column-prefix completion, the harvest +needs to land first. Six derivation rules per the ADR's +table. Mechanism design: hook a callback into +`walk_scoped_subgrammar`'s on-exit path; when the frame +being popped is a CTE body, walk its projection items and +populate the placeholder binding in the outer frame. +Projection item shape can be reconstructed from a +`projection_record` accumulator on the frame (record the +shape of each `projection_item` as the walker matches it — +a new `Node::ProjectionRecord` wrapper, or a flag on +`Lookahead` factories, would be the mechanism). + +If the user is willing to defer CTE-prefix completion +along with the harvest, 2e can ship without the harvest; +qualified completion past `cte.|` returns nothing today +and would continue to do so. + +## §5. Working tree and unpushed commits + +``` +e032f01 docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata +8d29335 grammar: SQL SELECT full statement fragment (Phase 2a) +4f89106 walker: Node::ScopedSubgrammar variant + scope-frame stack (§10.2) +98a74b2 grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar +b522d09 walker: populate from_scope table bindings (§10.1) +4ff054c walker: populate cte_bindings placeholders + projection_aliases (§10.3 stage 1 / §10.4) +a491df3 grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c) +c5cf03b walker: SQL diagnostics — multi-binding scope, qualified refs, Phase-1 gap closure (sub-phase 2d) +0c3847a db: column-origin type recovery in SELECT results (sub-phase 2f) +``` + +Plus this handoff commit, once it lands. + +## §6. Seams for the next implementer + +Where to look first: + +- **For 2e qualified-prefix completion**: the existing + `IdentSource::Columns` completion path in `src/completion.rs` + (driven by `WalkContext::current_table_columns`). The + qualified case needs a new "narrow to this binding's + columns" pathway. The walker already records the + `sql_expr_qualified_ref` role on the trailing ident in a + `t.c` reference; the preceding `Punct('.')` and + `sql_expr_ident` give the qualifier. Walker `WalkContext` + could grow a `pending_qualifier: Option` set when + the walker enters a qualified-ref tail, cleared on + successful match. + +- **For 2e post-walk fixup**: the schema-existence pass in + `src/dsl/walker/mod.rs` already does a pre-pass. The 2e + fixup is about the HIGHLIGHT layer — rewriting + `per_byte_class` entries for projection-list idents once + the final `from_scope` is known. The walker's `per_byte` + vec is the mutation target. Look at `walker/highlight.rs` + for how byte-class runs are built today. + +- **For 2g verification matrix**: walk row by row through + `docs/plans/20260520-adr-0032-phase-2.md`'s "Cross-cut + verification matrix" table. Many rows already have tests; + fill in the test location column. The handful that don't + have explicit tests (the projection-before-FROM re-walk + cycle, for example) may need new ones authored. + +- **For the §10.3 harvest**: see §4.3 above. The cleanest + mechanism is probably a new walker hook (a function pointer + on a per-`ScopedSubgrammar` basis, or a `usage` flag on + the `Node::Ident` variants that record projection items), + but the design isn't fully nailed down. The implementer + should think about whether to escalate the harvest design + to the user before coding. + +Code locations modified this session (in commit order): + +- `docs/adr/0032-sql-select-grammar.md` — Amendment 1 added. +- `docs/adr/README.md` — index entry extended for Amendment 1. +- `docs/plans/20260520-adr-0032-phase-2.md` — §2b + "Implementation status (2026-05-20)" block recording the + §10.3 stage-2 deferral. +- `src/dsl/grammar/mod.rs` — `Node::ScopedSubgrammar` variant + + three new Ident flags + `data::WITH` registered + + `ADVANCED_ONLY_ENTRIES` extended with `with`. +- `src/dsl/grammar/data.rs` — ~150 lines of dead Phase-1 + SELECT grammar removed; `data::SELECT` reshaped to + `Node::Subgrammar(&sql_select::SQL_SELECT_TAIL)`; + `data::WITH` added. +- `src/dsl/grammar/sql_select.rs` — new file (sub-phase 2a), + augmented through 2b and 2c. +- `src/dsl/grammar/sql_expr.rs` — additive §5 / §6 extensions. +- `src/dsl/walker/context.rs` — `TableBinding`, `CteBinding`, + `CteColumn`, `ScopeFrame`; `WalkContext::from_scope_stack`. +- `src/dsl/walker/driver.rs` — `Node::ScopedSubgrammar` + match arm; `walk_scoped_subgrammar`; `writes_table_alias` / + `writes_cte_name` / `writes_projection_alias` handling in + `walk_ident`. +- `src/dsl/walker/mod.rs` — multi-binding + `schema_existence_diagnostics` rewrite, new + `sql_predicate_warnings` pass, 16 new tests. +- `src/friendly/keys.rs` + `src/friendly/strings/en-US.yaml` + — 6 new `diagnostic.*` + 8 new `engine.*` keys. +- `Cargo.toml` — `column_metadata` added to rusqlite + features. +- `src/db.rs` — `resolve_select_column_types` helper; + `do_run_select` threads per-column types through to + `format_cell`. +- `tests/sql_select.rs` — 3 new SELECT type-recovery tests. + +## §7. How to take over + +1. **Read this file** in full. Then `docs/adr/0032-sql-select- + grammar.md` (and Amendment 1 — the empirical findings + matter for 2f's design), `docs/plans/20260520-adr-0032- + phase-2.md`, and `CLAUDE.md`. +2. **`cargo test`** — expect **1385 passing, 0 failed, 1 + ignored**. +3. **`cargo clippy --all-targets -- -D warnings`** — clean. +4. **Pick the next sub-phase.** 2e is substantial (qualified + completion + post-walk fixup); 2g is the closing sweep + (cross-cut matrix + verification report). Either is a + reasonable next session. +5. **Escalate before re-litigating deferrals.** The three + diagnostic deferrals in 2d (§3.2) are not user-approved — + check whether they need to land before 2g signs off. +6. **The §10.3 stage-2 harvest** is the biggest open piece; + if 2e wants real CTE-prefix completion, escalate the + harvest design before coding. + +## §8. What's open beyond Phase 2 + +Unchanged from handoff-26 §8: snapshot/undo `U`-series; +m:n convenience `C4`; modify-relationship `C3a`; rename-table +`C1`; friendly-error sweep `H1`; CI `TT5`; session-log / +Markdown export `V4`. + +ADR-0030's later phases — DML (Phase 3), DDL (Phase 4), the +DSL→SQL teaching echo (Phase 5), polish (Phase 6) — remain on +the roadmap behind the Phase 2 sub-phases not yet completed.