Files
rdbms-playground/docs/handoff/20260520-handoff-27.md
claude@clouddev1 5d716e64ab docs: handoff 27 — ADR-0032 Phase 2 sub-phases 2a / 2b / 2c / 2d / 2f shipped
Session summary covering nine implementation commits since
handoff-26 (e032f01..0c3847a). Records the user-approved
deferral of §10.3 stage 2 (CTE column-derivation harvest),
flags three further 2d diagnostic-key deferrals that need
user confirmation, and points the next implementer at the
seams for sub-phases 2e (qualified-prefix completion +
post-walk fixup) and 2g (verification sweep + final
report).

Test totals at handoff: 1385 passing, 0 failed, 1 ignored
(+125 from the 1260 baseline at handoff-26).
2026-05-20 16:29:18 +00:00

17 KiB
Raw Permalink Blame History

Session handoff — 2026-05-20 (27)

Twenty-seventh handover. Implementation session for ADR-0032 Phase 2. Sub-phases 2a, 2b (with §10.3 stage 2 deferred), 2c, 2d (with three diagnostic keys deferred), and 2f all shipped. Sub-phases 2e (completion + post-walk fixup) and 2g (verification sweep) remain.

§1. State at handoff

Branch: main. Working tree clean. 11 unpushed commits (be8a4f5..0c3847a plus the prior handoff-26 commit on top of 3db292c).

Tests: 1385 passing, 0 failing, 1 ignored (up from the 2026-05-19 handoff-26 baseline of 1260 / 0 / 1 — a net +125 tests).

Clippy: clean (last verified at end of 2f, commit 0c3847a).

Commits since handoff-26:

0c3847a db: column-origin type recovery in SELECT results (sub-phase 2f)
c5cf03b walker: SQL diagnostics — multi-binding scope, qualified refs, Phase-1 gap closure (sub-phase 2d)
a491df3 grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)
4ff054c walker: populate cte_bindings placeholders + projection_aliases (§10.3 stage 1 / §10.4)
b522d09 walker: populate from_scope table bindings (§10.1)
98a74b2 grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar
4f89106 walker: Node::ScopedSubgrammar variant + scope-frame stack (§10.2)
8d29335 grammar: SQL SELECT full statement fragment (Phase 2a)
e032f01 docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata

§2. What's done — the Phase-2 surface

A user typing in advanced mode can now write the full Phase-2 SELECT surface authored by ADR-0032 §§19:

  • All five JOIN flavours (INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER / CROSS) and bare JOIN.
  • All four set ops (UNION / UNION ALL / INTERSECT / EXCEPT) in compound queries.
  • CTEs — both WITH and WITH RECURSIVE, with optional (col-list). Dispatched via a new data::WITH CommandNode (entry word with is now in ADVANCED_ONLY_ENTRIES).
  • GROUP BY / HAVING with arbitrary sql_expr keys.
  • Qualified column refs (t.c / alias.c) — additive extension to sql_expr.rs's name_or_call (§5).
  • Subquery expressions — scalar (SELECT …), IN (SELECT …), [NOT] EXISTS (SELECT …) — additive Choice branches in sql_expr.rs (§6), recursing through Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND).
  • LIMIT n [OFFSET m] with full sql_expr admission for both args (legacy LIMIT m, n rejected per §13 OOS-4).
  • DISTINCT / ALL prefix.
  • t.* qualified wildcard in projection.
  • Bare-alias projection (SELECT a x FROM t — lifts Phase-1 §4.2) via per-position Node::Lookahead follow-set gating.
  • OOS shapes reject: NATURAL JOIN, JOIN … USING, comma-FROM, derived tables in FROM, window functions (OVER (…)), VALUES row source, LATERAL (partial — the single-keyword form is admitted as a bare alias; the comma form rejects via OOS-3).

The Phase-1 data::SELECT shape migrated from data.rs into sql_select.rs's SQL_SELECT_TAIL (sub-phase 2c). Phase-1's 22 local grammar nodes plus the duplicate reject_internal_table validator are gone; the single source of truth for the SELECT shape is now sql_select.rs.

2.1. Walker capabilities

  • Node::ScopedSubgrammar(&'static Node) (ADR-0032 §10.2) is the new node variant. Same recursion semantics as Subgrammar, plus a push/pop of ScopeFrame on the WalkContext::from_scope_stack. Shares subgrammar_depth with Subgrammar so the MAX_SUBGRAMMAR_DEPTH = 64 cap fires uniformly.
  • WalkContext::from_scope_stack: Vec<ScopeFrame> — always non-empty (the bottom frame is the implicit top-level scope DSL paths use). Each ScopeFrame carries from_scope: Vec<TableBinding>, cte_bindings: Vec<CteBinding>, and projection_aliases: Vec<String>.
  • Three new Node::Ident flags: writes_table_alias, writes_cte_name, writes_projection_alias. Together with the existing writes_table (now repurposed to push a TableBinding alongside its prior current_table write), they populate the top frame's accumulators during the walk. All 48 existing Ident sites carry the new flags as false (mechanical sed update; no behavioural change for DSL paths).
  • current_table / current_table_columns stay as fields on WalkContext (additive approach — they're populated by writes_table alongside the from_scope push). Plan §2b's "make them derived helpers" became "keep them populated redundantly" — strictly safer, no behavioural drift for DSL paths, but a future-cleanup opportunity if the duplication proves a maintenance burden.

2.2. Diagnostics

Three diagnostic passes run on every successful parse (ADR-0032 §11.7):

  1. schema_existence_diagnostics — extended for multi-binding scope (a pre-pass collects all table_name / cte_name / table_alias idents to sidestep the projection-before-FROM ordering problem). Emits diagnostic.unknown_table, diagnostic.unknown_column, diagnostic.unknown_qualifier, diagnostic.ambiguous_column, diagnostic.duplicate_cte (Plan §Open-2, user-approved).
  2. sql_predicate_warnings — NEW. MatchedPath-walking variant of the predicate-warning pass, closes the Phase-1 carry-over gap (§11.6). Emits diagnostic.eq_null, diagnostic.like_numeric, diagnostic.type_mismatch on SQL WHERE / HAVING / JOIN ON / CASE / projection / ORDER BY slots. Scoped to bare column refs in <column> <op> <literal> form — qualified-ref and expression-operand cases stay un-flagged (safe false-negative posture).
  3. DSL predicate_warnings (existing) — unchanged. Still walks the DSL Expr AST for DSL WHERE expressions.

2.3. Type recovery (sub-phase 2f)

Per ADR-0032 §12 + Amendment 1:

  • Cargo.toml rusqlite features now include column_metadata.
  • do_run_select calls resolve_select_column_types(conn, stmt) after prepare, queries __rdbms_playground_columns for each result column whose origin metadata is populated, and threads the type through to format_cell.
  • ADR-0030 Phase-1 §4.5 (bool rendering as 0 / 1) is lifted for any bare-column reference whose origin the engine carries through — non-recursive CTE bodies, scalar subqueries, derived tables, set ops, JOINs all work for free per Amendment 1.

2.4. Catalog (ADR-0032 §11.5)

Six new diagnostic.* keys + eight new engine.* keys authored in src/friendly/strings/en-US.yaml and registered in src/friendly/keys.rs:

  • diagnostic.ambiguous_column, diagnostic.compound_arity_mismatch, diagnostic.cte_arity_mismatch, diagnostic.duplicate_cte, diagnostic.projection_alias_misplaced, diagnostic.unknown_qualifier.
  • engine.no_such_table, engine.no_such_column, engine.ambiguous_column, engine.aggregate_misuse, engine.group_by_required, engine.compound_arity_mismatch, engine.scalar_subquery_too_many_rows, engine.recursive_cte_malformed.

The engine.* keys are catalog entries only — the friendly-error layer reads them by key when reached, but no proactive enhancement of the layer's engine-message classification was pulled through here. That stays as a small follow-up.

§3. Deferrals — all user-approved or documented

§3.1. §10.3 stage 2 — CTE column-derivation harvest

Status: user-approved deferral (mid-session, 2026-05-20). Recorded in: docs/plans/20260520-adr-0032-phase-2.md §2b "Implementation status (2026-05-20)" block.

The placeholder CTE-binding push (stage 1) IS implemented — the CTE name is visible to the body (WITH RECURSIVE self-reference works) and to downstream CTE-name validators (2d's schema-existence pass accepts CTE-name table-source refs). What's missing is the six §10.3 derivation rules that populate the placeholder's columns at body-frame exit.

Knock-on effect: qualified-prefix completion past cte_alias.| returns an empty candidate list. Bare column refs into a CTE binding short-circuit to "accept silently" in 2d's schema-existence pass (we don't know the columns yet).

Folds into 2e (where qualified-prefix completion needs CTE columns) and any future arity-check pass that wants to compare declared (col-list) vs derived projection arity.

§3.2. Three 2d diagnostic keys deferred

projection_alias_misplaced, compound_arity_mismatch, and cte_arity_mismatch — catalog keys + strings are authored but the emitting logic is not implemented:

  • projection_alias_misplaced needs clause-context detection (which clause an Ident sits in — the matched path is flat; some role-based recognition would work).
  • compound_arity_mismatch needs per-leg projection counting as the compound is assembled.
  • cte_arity_mismatch depends on §10.3 stage 2.

These were not explicitly user-approved as deferrals — they were noted in the 2d commit message as "deferred per the 2d-scoped DA review (documented as (TBD) in the cross-cut matrix for 2g)". The user should flag if these need to land before 2g signs off.

§3.3. Engine.* key wiring

Catalog entries authored; the friendly-error layer doesn't yet recognise the new engine messages and route them through the new keys. This is a small mechanical extension to the engine-error matcher; not blocking but worth folding in.

§4. What's next — sub-phases 2e and 2g

§4.1. Sub-phase 2e — completion + post-walk fixup

Per plan §2e:

  • Qualified-prefix completion narrowing (§10.5): at the cursor in SELECT t.|, candidates should be t's columns (not the global column set). Requires:
    • The completion API gains a "prefix qualifier" hint that the walker passes when the cursor is immediately after Ident '.'.
    • The IdentSource::Columns completion path honours the hint when present, resolving against the top frame's from_scope binding for t.
  • Post-walk projection-list fixup pass (§10.6): the projection-before-FROM problem. Collect projection Ident terminals during the walk; after walk completion, re-resolve each against the final from_scope; rewrite the highlight class and validity diagnostic.

Note: the §10.6 problem is partially sidestepped already in 2d — the schema-existence pass uses a two-pass approach that collects bindings first and then resolves. So projection-side identifiers don't get false-positive diagnostics from the existing pass. The 2e fixup-pass scope is more about getting the HIGHLIGHT class right at every keystroke (the typing experience) and supporting completion mid-typing. Worth re-reading §10.6 carefully with the 2d implementation in hand.

§4.2. Sub-phase 2g — integration sweep + verification

Per plan §2g:

  • Fill in the cross-cut verification matrix (29 rows in docs/plans/20260520-adr-0032-phase-2.md). Many rows are satisfied by tests already authored in 2a2f; the matrix needs to be walked and each row pointed at its test.
  • Produce the final phase-exit verification report.
  • DA final review.

§4.3. The §10.3 stage-2 harvest

If 2e wants real CTE-column-prefix completion, the harvest needs to land first. Six derivation rules per the ADR's table. Mechanism design: hook a callback into walk_scoped_subgrammar's on-exit path; when the frame being popped is a CTE body, walk its projection items and populate the placeholder binding in the outer frame. Projection item shape can be reconstructed from a projection_record accumulator on the frame (record the shape of each projection_item as the walker matches it — a new Node::ProjectionRecord wrapper, or a flag on Lookahead factories, would be the mechanism).

If the user is willing to defer CTE-prefix completion along with the harvest, 2e can ship without the harvest; qualified completion past cte.| returns nothing today and would continue to do so.

§5. Working tree and unpushed commits

e032f01 docs: ADR-0032 Amendment 1 — empirical scope of column-origin metadata
8d29335 grammar: SQL SELECT full statement fragment (Phase 2a)
4f89106 walker: Node::ScopedSubgrammar variant + scope-frame stack (§10.2)
98a74b2 grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar
b522d09 walker: populate from_scope table bindings (§10.1)
4ff054c walker: populate cte_bindings placeholders + projection_aliases (§10.3 stage 1 / §10.4)
a491df3 grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)
c5cf03b walker: SQL diagnostics — multi-binding scope, qualified refs, Phase-1 gap closure (sub-phase 2d)
0c3847a db: column-origin type recovery in SELECT results (sub-phase 2f)

Plus this handoff commit, once it lands.

§6. Seams for the next implementer

Where to look first:

  • For 2e qualified-prefix completion: the existing IdentSource::Columns completion path in src/completion.rs (driven by WalkContext::current_table_columns). The qualified case needs a new "narrow to this binding's columns" pathway. The walker already records the sql_expr_qualified_ref role on the trailing ident in a t.c reference; the preceding Punct('.') and sql_expr_ident give the qualifier. Walker WalkContext could grow a pending_qualifier: Option<String> set when the walker enters a qualified-ref tail, cleared on successful match.

  • For 2e post-walk fixup: the schema-existence pass in src/dsl/walker/mod.rs already does a pre-pass. The 2e fixup is about the HIGHLIGHT layer — rewriting per_byte_class entries for projection-list idents once the final from_scope is known. The walker's per_byte vec is the mutation target. Look at walker/highlight.rs for how byte-class runs are built today.

  • For 2g verification matrix: walk row by row through docs/plans/20260520-adr-0032-phase-2.md's "Cross-cut verification matrix" table. Many rows already have tests; fill in the test location column. The handful that don't have explicit tests (the projection-before-FROM re-walk cycle, for example) may need new ones authored.

  • For the §10.3 harvest: see §4.3 above. The cleanest mechanism is probably a new walker hook (a function pointer on a per-ScopedSubgrammar basis, or a usage flag on the Node::Ident variants that record projection items), but the design isn't fully nailed down. The implementer should think about whether to escalate the harvest design to the user before coding.

Code locations modified this session (in commit order):

  • docs/adr/0032-sql-select-grammar.md — Amendment 1 added.
  • docs/adr/README.md — index entry extended for Amendment 1.
  • docs/plans/20260520-adr-0032-phase-2.md — §2b "Implementation status (2026-05-20)" block recording the §10.3 stage-2 deferral.
  • src/dsl/grammar/mod.rsNode::ScopedSubgrammar variant
    • three new Ident flags + data::WITH registered + ADVANCED_ONLY_ENTRIES extended with with.
  • src/dsl/grammar/data.rs — ~150 lines of dead Phase-1 SELECT grammar removed; data::SELECT reshaped to Node::Subgrammar(&sql_select::SQL_SELECT_TAIL); data::WITH added.
  • src/dsl/grammar/sql_select.rs — new file (sub-phase 2a), augmented through 2b and 2c.
  • src/dsl/grammar/sql_expr.rs — additive §5 / §6 extensions.
  • src/dsl/walker/context.rsTableBinding, CteBinding, CteColumn, ScopeFrame; WalkContext::from_scope_stack.
  • src/dsl/walker/driver.rsNode::ScopedSubgrammar match arm; walk_scoped_subgrammar; writes_table_alias / writes_cte_name / writes_projection_alias handling in walk_ident.
  • src/dsl/walker/mod.rs — multi-binding schema_existence_diagnostics rewrite, new sql_predicate_warnings pass, 16 new tests.
  • src/friendly/keys.rs + src/friendly/strings/en-US.yaml — 6 new diagnostic.* + 8 new engine.* keys.
  • Cargo.tomlcolumn_metadata added to rusqlite features.
  • src/db.rsresolve_select_column_types helper; do_run_select threads per-column types through to format_cell.
  • tests/sql_select.rs — 3 new SELECT type-recovery tests.

§7. How to take over

  1. Read this file in full. Then docs/adr/0032-sql-select- grammar.md (and Amendment 1 — the empirical findings matter for 2f's design), docs/plans/20260520-adr-0032- phase-2.md, and CLAUDE.md.
  2. cargo test — expect 1385 passing, 0 failed, 1 ignored.
  3. cargo clippy --all-targets -- -D warnings — clean.
  4. Pick the next sub-phase. 2e is substantial (qualified completion + post-walk fixup); 2g is the closing sweep (cross-cut matrix + verification report). Either is a reasonable next session.
  5. Escalate before re-litigating deferrals. The three diagnostic deferrals in 2d (§3.2) are not user-approved — check whether they need to land before 2g signs off.
  6. The §10.3 stage-2 harvest is the biggest open piece; if 2e wants real CTE-prefix completion, escalate the harvest design before coding.

§8. What's open beyond Phase 2

Unchanged from handoff-26 §8: snapshot/undo U-series; m:n convenience C4; modify-relationship C3a; rename-table C1; friendly-error sweep H1; CI TT5; session-log / Markdown export V4.

ADR-0030's later phases — DML (Phase 3), DDL (Phase 4), the DSL→SQL teaching echo (Phase 5), polish (Phase 6) — remain on the roadmap behind the Phase 2 sub-phases not yet completed.