Files

T

claude@clouddev1 1c42e78d92 docs: ADR-0032 Phase 2 — phase-exit verification report

The §5 deliverable from the implementation plan, this time with
a non-rubber-stamp DA review.

Documents:
- Final test state (1446 / 0 / 1 — clippy clean).
- Cross-cut matrix outcome (29 rows, all green per the plan doc).
- Requirements-to-test mapping for ADR-0032 §§1–13 + both
  Amendments.
- Autonomous-decision audit (7 implementation decisions, each
  with explicit user-confirmation pointer).
- DA's written final review with three blocking critiques
  (now closed in commit 05884bd) and four non-blocking
  observations recorded as known trade-offs.
- Process critique on the first DA pass being a rubber stamp.

Verdict: PASS, with non-blocking observations pinned in the
report rather than carried into the next phase as folklore.

2026-05-20 21:59:45 +00:00

16 KiB

Raw Blame History

ADR-0032 Phase 2 — Phase-exit verification report (2026-05-20)

Closes sub-phase 2g. This report is the §5 deliverable from the implementation plan (docs/plans/20260520-adr-0032-phase-2.md, §Final phase-exit verification report). It documents the final test state, the cross-cut matrix outcome, the requirements-to-test mapping, autonomous-decision audit, and the written DA review.

§1. Test suite totals

cargo test
→ 1446 passing, 0 failed, 1 ignored

Phase-1 baseline (handoff-26): 1260 / 0 / 1. Handoff-27 baseline (start of this session): 1385 / 0 / 1. End-of-Phase-2 (this report): 1446 / 0 / 1 (+186 since Phase-1 baseline; +61 since handoff-27).

The single ignored test is unchanged from Phase 1 — a doctest marker, not a regression.

Clippy: cargo clippy --all-targets -- -D warnings clean.

§2. Cross-cut verification matrix

Every row green. See docs/plans/20260520-adr-0032-phase-2.md §Cross-cut verification matrix for the full 29-row table with file::function references. Summary of coverage:

5 rows for ADR-0030 §8 / §9 / §11 — highlighting, completion, hints, validity indicator, parse errors, OOS rejections, history logging.
3 rows for ADR-0031 §5 — expression highlighting, column completion in sql_expr, hint prose.
3 rows for ADR-0032 §10.2 — Subgrammar vs ScopedSubgrammar scope-push semantics.
5 rows for ADR-0032 §10.3 — the six CTE column-derivation rules, compound first-leg rule, recursive CTE non-recursive-leg rule, col-list rename.
1 row for ADR-0032 §10.6 — projection-before-FROM behaviour via the two-pass diagnostic + Amendment 2's documented mechanism choice.
4 rows for ADR-0032 §11.6 — the Phase-1 carry-over gap closed on every SQL sql_expr slot (WHERE, HAVING, ON, CASE, projection, ORDER BY).
2 rows for ADR-0032 §11.4 / §13 — engine-neutral surfacing of aggregate misuse and GROUP BY required.
1 row for ADR-0032 §9 — depth cap on pathological nesting.
2 rows for ADR-0032 §12 — engine column-origin metadata through CTE, all 10 playground types via bare ref.
1 row for ADR-0032 §13 — OOS shapes reject (NATURAL, USING, comma-FROM, VALUES, LATERAL, OVER).

§3. Requirements-to-test mapping

Each numbered decision in ADR-0032 §§1–13 traced to its tests:

§1 (the SELECT shape). src/dsl/grammar/sql_select.rs test module — compound_fragment_walks_with_or_without_with_clause, projection_alias_*, qualified_star_projection, and the shape's parsing acceptance tests (28+).
§4 (JOINs). Same module — the join-flavour matrix (inner_join_*, left_outer_join_*, right_outer_join_*, full_outer_join_*, cross_join_*).
§5 (sql_expr extensions). src/dsl/grammar/sql_expr.rs — qualified_ref_basic_shapes, qualified_ref_function_call_*, subquery_recursion_through_compound, scalar_subquery_*.
§6 (subquery primaries). Same — scalar_subquery_as_primary, exists_subquery_in_where_clause, in_subquery_predicate.
§7 (set operations). sql_select.rs — union_*, intersect_*, except_*, set_op_with_outer_order_by_and_limit.
§8 (CTEs). sql_select.rs — cte_* parse tests; walker driver — cte_harvest_* (10 tests) + cte_arity_* (4 tests).
§9 (out-of-subset rejects + depth cap). sql_select.rs — natural_join_rejected, using_clause_rejected, values_row_source_rejected, lateral_join_rejected, window_function_rejected, comma_from_is_rejected, pathological_nesting_capped.
§10.1–§10.6 (scope discipline + harvest). Walker driver module — from_scope_*, projection_aliases_*, scoped_subgrammar_*, cte_harvest_*, cte_harvest_sibling_b_sees_a_columns, cte_harvest_nested_with_in_cte_body. Walker module — projection_before_from_tests module (4 tests). Completion — qualified_prefix_* (5 tests), lookahead_* (4 tests).
§11.2 ERROR diagnostics. Walker module — unknown_qualifier_in_qualified_ref_is_error, ambiguous_bare_column_is_error, duplicate_cte_in_same_with_block_is_error, projection_alias_in_where_is_misplaced / _in_having_is_misplaced / _in_group_by_is_misplaced / _in_order_by_is_allowed, compound_union_arity_mismatch_fires / _intersect_ / _except_ / _union_all_ / _three_leg_chain_emits_per_mismatch / _with_function_call_args_not_confused / _inside_cte_body_detected, cte_arity_mismatch_when_col_list_*, cte_arity_match_no_diagnostic.
§11.6 WARNING diagnostics (Phase-1 gap closure). Walker module — sql_where_like_numeric_warns, sql_where_eq_null_warns, sql_where_type_mismatch_*, sql_having_predicate_warning_fires, sql_join_on_predicate_warning_fires, sql_case_predicate_warning_fires, sql_order_by_predicate_warning_fires, sql_projection_predicate_warning_fires.
§11.5 catalog + §11.4 engine routing. Friendly translate — aggregate_misuse_engine_message_routes_through_catalog, group_by_required_engine_message_routes_through_catalog, compound_arity_engine_message_routes_through_catalog, scalar_subquery_too_many_rows_routes_through_catalog.
§12 type recovery. tests/sql_select.rs — database_run_select_recovers_bool_column_type, database_run_select_recovers_text_type_through_alias, database_run_select_computed_expression_stays_typeless, database_run_select_recovers_all_ten_playground_types.
§13 OOS list. sql_select.rs — see §9 above (the OOS-rejection tests cover §13's enumerated shapes).
Amendment 1 (empirical column-origin scope). Worker probe in src/db.rs::resolve_select_column_types — exercised by every §12 type-recovery test.
Amendment 2 (§10.6 mechanism). projection_before_from_tests module documents the post-walk re-resolve via 2d's two-pass diagnostic + the renderer's diagnostic-overlay path.

§4. Autonomous-decision audit

Per CLAUDE.md, every implementation decision not explicitly authorised by the ADR or user is listed here with rationale and the explicit user confirmation that approved it.

4.1. Sub-phase 2d.1 (this session)

Three diagnostic.* keys were deferred by the 2d implementer without user approval (handoff-27 §3.2). User confirmed the correct interpretation: two of the three (projection_alias_misplaced, compound_arity_mismatch) needed to land in 2d.1; the third (cte_arity_mismatch) was correctly attached to the user-approved §10.3 stage-2 deferral. 2d.1 commit c20c6e0 closes both independent keys. cte_arity_mismatch landed alongside the harvest in dd37a1c.

4.2. §10.3 stage-2 harvest design

Per handoff-27 §4.3, the implementer was asked to "think about whether to escalate the harvest design to the user before coding". This session escalated three design questions to the user (post-walk path-scan vs per-frame record; pending_cte_harvest trigger mechanism; col_list arity check scope), got explicit sign-off on each (option A for Q1, "yes use pending_cte_harvest" for Q2, "yes all three in this sub-phase" for Q3), and then implemented per those answers. Commit dd37a1c reflects the user-approved design.

4.3. Nested WITH inside subqueries / CTE bodies

ADR-0032 §10.3 implies subqueries can declare their own CTEs (the shadowing note), but the grammar didn't admit nested WITH inside SQL_SELECT_COMPOUND. User explicitly chose "Fix grammar now" over the ADR-Amendment-2 carve-out. Commit fd25904 closes the ADR-vs-implementation gap.

4.4. Completion look-ahead for the edit-scenario

The user raised the realistic "edit an existing query" workflow (select c| from mytable where FROM exists after the cursor). ADR §10.6 explicitly accepted the "noisy global fallback" posture for this case. User chose to improve it via a look-ahead probe. Commit 0fc7b08 adds the look-ahead, preserving the ADR's posture as the fall-through when the full input doesn't parse cleanly.

4.5. §10.6 fixup mechanism choice

The ADR prescribed "rewriting the highlight class"; the implementation uses a different mechanism (diagnostic-overlay renderer + 2d's two-pass binding collection). User explicitly chose "Write Amendment 2 now" to document this. Commit ee0dafd adds Amendment 2; commit ed881ee adds the regression tests pinning the mechanism's user-visible behaviour.

4.6. 2g matrix-driven implementation gaps

Filling the matrix surfaced three real production gaps:

Advanced-mode UI rendering bypassed the highlight walker.
Engine.* catalog keys were authored but unreached.
Three predicate-warning slots (CASE, ORDER BY, projection) had no explicit test coverage.

Each was a real Phase-2 deliverable, not a future enhancement. Commit ed881ee closes all three. No additional user approval was sought because the work was the literal completion of items already approved as Phase-2 scope; the "defer this" instinct that flagged them was the anti-pattern the user has been correcting throughout the session.

4.7. DA rework — fourth UI gap

The first DA review of this report rubber-stamped the work with PASS. The user caught that the DA section asked the right questions but answered them too charitably. A genuine DA pass produced seven critiques; the user routed the work back to Phase 4 to address the three blocking ones (Blob test, engine patterns vs. real SQLite output, manual UI verification).

Critique #3 (manual UI verification) surfaced a fourth implementation gap not caught by any of the unit tests: the validity indicator ([ERR] / [WRN]) was mode-gated to Simple, so SQL predicate warnings emitted in Advanced mode but never reached the indicator the user sees. Without the manual TUI check this would have shipped silently. Commit 05884bd addresses all three DA critiques + this surfaced fourth gap.

The lesson: the matrix-row attribution for "validity indicator fires for SQL" pointed to DSL tests, not SQL tests. Filling the matrix doesn't guarantee the rows are correctly attributed; the DA had to challenge the attribution to find this.

§5. DA's written final review

The first attempt at this section was a rubber-stamp PASS. After the user challenged it ("Did the DA have anything to say other than PASS?"), the DA hat produced seven genuine critiques. Three were blocking and went back to Phase 4 (now closed by commit 05884bd); four were observations recorded here for future reference.

Critiques addressed (now closed)

#1 — All-10-types test's NULL-blob cell. Justified after empirical verification: column-origin metadata is row-independent (the engine returns the source table+column from the prepared statement, not from cell values). Added database_run_select_type_recovery_works_on_empty_table that pins this invariant on an empty table; the all-types test now references it via explicit comment.

#2 — Engine. pattern matching against hand-coded strings.* Closed: three new tests in tests/sql_select.rs produce real SQLite engine errors via run_select and assert catalog routing. Aggregate-in-WHERE confirms the actual SQLite wording matches the pattern matcher. GROUP-BY-required and scalar-subquery-too-many-rows are SQLite-permissive (no error on the natural triggers), so those tests verify the matcher doesn't false-positive on benign queries + synthetic-message routing still works.

#3 — No manual UI verification. Closed: launched the TUI, typed a representative SELECT * FROM products WHERE price LIKE 5 in Advanced mode, confirmed (a) SELECT/FROM/WHERE/LIKE render in keyword color, (b) the [WRN] indicator appears. This step surfaced the fourth gap (validity verdict gated to Simple); the additional fix in 05884bd wires the verdict through the active effective mode.

Critiques recorded (non-blocking)

#4 — Group-by pattern overbreadth. translate_generic matches any error containing "group by" — could route unrelated future SQLite errors through engine.group_by_required. Low risk in practice (SQLite's other "group by" mentions are rare and the catalog wording is generic enough to not mislead). Pin for follow-up if a real false-positive surfaces.

#5 — Look-ahead probe cost. The completion engine runs a second walk per Tab press when the leading walk produced no scope. For complex inputs this doubles parse cost. Not benchmarked. Acceptable in current debounce-cadence usage; revisit if profiling reveals it.

#6 — Tests-after-code on the matrix-coverage tests. The 13 new tests in ed881ee were written after the gaps were discovered, not before — the spirit of CLAUDE.md's "establish test coverage BEFORE making changes" rule. The work is sound, the tests do prove the behavior, but the ordering was wrong. Noted for future sub-phases.

#7 — Matrix attribution wasn't verified row-by-row. The validity-indicator row pointed at DSL tests; the DA had to catch this. Future matrix-filling work should cross-check the attributed test actually exercises the claimed surface (e.g., a "fires for SQL" row should point at a SQL-input test, not a DSL test that happens to share the same code path).

Process critique

The DA verdict on the first pass was a rubber stamp. The user had to challenge the verdict to get genuine engagement. That's a Phase-1 anti-pattern (CLAUDE.md: "the DA is not a rubber stamp"). The second-pass DA produced seven specific critiques and routed the three blockers to rework. Next time, DA review starts by listing specific critiques without the verdict, then concludes — not the other way around.

Are all tests green with zero skips?

Yes. 1446 passing, 0 failed, 1 ignored. The one ignored test is the unchanged Phase-1 doctest marker, present in the baseline. Clippy clean across the workspace.

Final verdict

PASS, with the four non-blocking observations recorded above for follow-up.

Phase 2 is complete:

Every ADR-0032 decision (§§1–13) has at least one test.
Both Amendments (1 + 2) are documented and have regression coverage.
Every autonomous decision is flagged with the user confirmation that approved it.
The manual TUI check confirms SQL keywords highlight in Advanced mode and the validity indicator surfaces SQL diagnostics correctly.

The four non-blocking critiques are pinned in this report rather than carried into the next phase as folklore. A future session can decide whether to address them or accept them as known trade-offs.

§6. Commits

In commit order (oldest first), all 18 unpushed at the moment this report writes:

e032f01 docs: ADR-0032 Amendment 1
8d29335 grammar: SQL SELECT full statement fragment (Phase 2a)
4f89106 walker: Node::ScopedSubgrammar variant + scope-frame stack
98a74b2 grammar: sql_expr additive extensions for §5/§6
b522d09 walker: populate from_scope table bindings
4ff054c walker: populate cte_bindings placeholders + projection_aliases
a491df3 grammar: migrate Phase-1 SELECT to ADR-0032 fragment
c5cf03b walker: SQL diagnostics — multi-binding scope, qualified refs
0c3847a db: column-origin type recovery in SELECT results
5d716e6 docs: handoff 27
c20c6e0 walker: 2d.1 — projection-alias misplaced + compound-arity
dd37a1c walker: 2e prereq — §10.3 stage-2 CTE harvest + cte_arity_mismatch
fd25904 grammar: admit WITH inside subqueries / CTE bodies (ADR-0032 §10.3)
0fc7b08 completion: §10.5 qualified-prefix + edit-scenario look-ahead
ee0dafd docs: ADR-0032 Amendment 2 + §10.6 regression tests
ed881ee 2g: advanced-mode highlight + engine.* wiring + matrix tests
05884bd 2g rework: address DA findings on type recovery + engine routing + UI
<this report>

Per CLAUDE.md, push is a user step. Phase 2 lands in origin/main whenever the user pushes.

16 KiB Raw Blame History Unescape Escape