grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)

The Phase-1 SQL `SELECT` grammar nodes that used to live in
`src/dsl/grammar/data.rs` are retired: 22 statics / consts and
the `reject_internal_table` validator copy are removed, about
150 lines of grammar machinery in total. `data::SELECT.shape`
now references the post-`SELECT` portion of the ADR-0032
fragment via a thin
`Node::Subgrammar(&sql_select::SQL_SELECT_TAIL)`.

`SQL_SELECT_TAIL` is a new export from `sql_select.rs`,
parallel to `SQL_SELECT_STATEMENT`. It represents what a
top-level `SELECT` statement looks like AFTER the registry's
entry-word dispatch has already consumed the leading `SELECT`
keyword: the DISTINCT/ALL prefix, projection list, optional
FROM / WHERE / GROUP BY / HAVING, the compound set-op chain
(each subsequent leg's `SELECT` is part of `SET_OP_TAIL`),
outer ORDER BY / LIMIT, and a tolerated trailing `;`.
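
A minimal sketch of the delegation described above, under
heavy assumptions: the `Node` variants other than `Subgrammar`,
the `CommandNode` fields, the `walk` function and the toy tail
shape (`[DISTINCT] * [;]`) are hypothetical stand-ins, not the
real definitions in `sql_select.rs`. It only illustrates the
idea that the entry word is consumed first and the command's
shape then defers to a shared tail fragment.

```rust
// Hypothetical, heavily simplified stand-ins for the real grammar types.
#[derive(Debug)]
enum Node {
    Keyword(&'static str),
    Optional(&'static Node),
    Seq(&'static [Node]),
    // Defers matching to a fragment defined elsewhere.
    Subgrammar(&'static Node),
}

struct CommandNode {
    entry_word: &'static str,
    shape: Node,
}

// Toy tail: `[DISTINCT] * [;]`. The real tail is far richer.
static SQL_SELECT_TAIL: Node = Node::Seq(&[
    Node::Optional(&Node::Keyword("DISTINCT")),
    Node::Keyword("*"),
    Node::Optional(&Node::Keyword(";")),
]);

// The command's shape is just a thin reference to the shared tail.
static SELECT: CommandNode = CommandNode {
    entry_word: "SELECT",
    shape: Node::Subgrammar(&SQL_SELECT_TAIL),
};

// Returns the position after a successful match, or None on failure.
fn walk(node: &Node, toks: &[&str], pos: usize) -> Option<usize> {
    match node {
        Node::Keyword(k) => match toks.get(pos) {
            Some(t) if t.eq_ignore_ascii_case(k) => Some(pos + 1),
            _ => None,
        },
        Node::Optional(inner) => Some(walk(inner, toks, pos).unwrap_or(pos)),
        Node::Seq(items) => {
            let mut p = pos;
            for item in items.iter() {
                p = walk(item, toks, p)?;
            }
            Some(p)
        }
        Node::Subgrammar(target) => walk(target, toks, pos),
    }
}

// Consumes the entry word first, then matches the rest against the shape.
fn parse_statement(cmd: &CommandNode, input: &str) -> bool {
    let toks: Vec<&str> = input.split_whitespace().collect();
    match toks.split_first() {
        Some((first, rest)) if first.eq_ignore_ascii_case(cmd.entry_word) => {
            walk(&cmd.shape, rest, 0) == Some(rest.len())
        }
        _ => false,
    }
}
```

In this sketch the tail never sees the leading `SELECT`, which
is why a separate tail export is needed alongside a
full-statement shape.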

WITH-prefixed statements (`WITH x AS (…) SELECT * FROM x`)
are NOT in 2c's scope — they need a separate `data::WITH`
`CommandNode` so the entry-word dispatch routes correctly.
For now, top-level WITH continues to fall through to the
chumsky parser route (the same as in Phase 1). The
`SQL_SELECT_STATEMENT` static (which includes the optional
WITH prefix) stays available for use by that future
CommandNode or by any other consumer that needs the full
statement shape.
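
The routing consequence can be sketched as follows. The
`Route` enum, the `REGISTRY` list and the `route` function are
hypothetical names invented for illustration; the real registry
and the chumsky fallback are not shown in this commit message.
The point is only that, with no `WITH` entry word registered, a
WITH-prefixed statement never reaches the new fragment.

```rust
// Hypothetical illustration of entry-word dispatch; names are assumptions.
#[derive(Debug, PartialEq)]
enum Route {
    // Handled by the grammar walker under the given entry word.
    Grammar(&'static str),
    // Falls through to the older parser route.
    Fallback,
}

// Only `SELECT` is registered here; there is no `WITH` entry yet.
const REGISTRY: &[&str] = &["SELECT"];

fn route(input: &str) -> Route {
    let first = input.split_whitespace().next().unwrap_or("");
    for &word in REGISTRY {
        if first.eq_ignore_ascii_case(word) {
            return Route::Grammar(word);
        }
    }
    Route::Fallback
}
```

Adding a `WITH` entry to such a registry is what a future
`data::WITH` `CommandNode` would amount to in this simplified
picture.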

All seven Phase-1 SQL `SELECT` integration tests
(`tests/sql_select.rs`) pass without modification, satisfying
the 2c exit gate's "behaviour preserved" requirement. The
70 fragment unit tests and the 26 driver-level scope tests
also pass. The migration is a refactor, so no new tests are
required.

Behaviour change explicitly sanctioned by ADR-0032 §8:
Phase-1's `LIMIT_VALIDATOR` (positive-int-only, parse-time)
is superseded by the full `sql_expr` admission.
`LIMIT max(10, x)` and similar now parse; the engine
constrains the value at execution time per the ADR's
"grammar admits, engine rejects" posture.
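
A rough sketch of the "grammar admits, engine rejects" split,
assuming a toy expression type. `Expr`, `eval` and
`resolve_limit` are hypothetical names, and the rule that the
engine rejects negative values is an assumption made for
illustration; the ADR's actual execution-time constraint is not
spelled out here.

```rust
// Hypothetical toy expression tree: the grammar would admit any of these
// after LIMIT, without checking the resulting value at parse time.
#[derive(Debug)]
enum Expr {
    Int(i64),
    Neg(Box<Expr>),
    Max(Box<Expr>, Box<Expr>),
}

// Execution-time evaluation of the admitted expression.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Int(n) => *n,
        Expr::Neg(inner) => eval(inner).saturating_neg(),
        Expr::Max(a, b) => eval(a).max(eval(b)),
    }
}

// Engine-side check (assumed rule: the limit must not be negative).
fn resolve_limit(e: &Expr) -> Result<u64, String> {
    let v = eval(e);
    if v < 0 {
        Err(format!("LIMIT must be non-negative, got {}", v))
    } else {
        Ok(v as u64)
    }
}
```

Under these assumptions `LIMIT max(10, 3)` resolves to 10 at
execution time, while `LIMIT -5` parses but is rejected by the
engine rather than by the grammar.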

Plan §2b status note: the user-approved 2026-05-20 deferral of
§10.3 stage 2 (CTE output-column harvest derivation) is
recorded in `docs/plans/20260520-adr-0032-phase-2.md`.

Test totals: 1366 passing (unchanged), 0 failed, 1 ignored.
Clippy clean. `data.rs` loses ~150 lines of dead grammar; the
single source of truth for the SQL `SELECT` shape is now
`sql_select.rs`.
claude@clouddev1
2026-05-20 15:42:44 +00:00
parent 4ff054ca75
commit a491df32a0
3 changed files with 93 additions and 144 deletions
@@ -270,6 +270,34 @@ coverage).
## Sub-phase 2b — `ScopedSubgrammar` + scope accumulators

### Implementation status (2026-05-20)

Sub-phase 2b shipped in five commits (`4f89106`, `98a74b2`,
`b522d09`, `4ff054c`, and earlier 2a foundations). The
scope-accumulator infrastructure — `Node::ScopedSubgrammar`,
`ScopeFrame`, `from_scope_stack`, the new Ident flags
(`writes_table_alias`, `writes_cte_name`,
`writes_projection_alias`), and the sql_expr §5 / §6 additive
extensions — is complete and exercised by 26 driver-level
tests.

**Deferral, explicitly user-approved:** §10.3 stage 2 (the six
CTE output-column derivation rules) is NOT implemented in 2b.
Stage 1 (placeholder CTE binding push) IS implemented; stage 2
will fold into 2d (where the new arity-check pass needs
declared-vs-derived column counts) and 2e (where qualified-
prefix completion needs CTE columns). Until then, a CTE
binding's `columns` stays empty after the body exits, and
qualified-prefix completion past `cte_alias.|` returns an
empty candidate list. The CTE-name is still visible as a
table source from inside the body (WITH RECURSIVE
self-reference works) and from outside (downstream CTE-name
validators see it).

The 2g cross-cut matrix rows for §10.3 derivation cases are
deferred along with the harvest; they will land when 2d/2e
implements the rules.

### Scope (in)

- Add the new walker node variant `Node::ScopedSubgrammar(&Node)`.