grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)

The Phase-1 SQL `SELECT` grammar nodes that used to live in
`src/dsl/grammar/data.rs` retire — 22 statics / consts and the
`reject_internal_table` validator copy are removed, ~150 lines
of grammar machinery gone. `data::SELECT.shape` now references
the post-`SELECT` portion of the ADR-0032 fragment via a thin
`Node::Subgrammar(&sql_select::SQL_SELECT_TAIL)`.

`SQL_SELECT_TAIL` is a new export from `sql_select.rs`,
parallel to `SQL_SELECT_STATEMENT`. It represents what a
top-level `SELECT` statement looks like AFTER the registry's
entry-word dispatch has already consumed the leading `SELECT`
keyword: the DISTINCT/ALL prefix, projection list, optional
FROM / WHERE / GROUP BY / HAVING, the compound set-op chain
(each subsequent leg's `SELECT` is part of `SET_OP_TAIL`),
outer ORDER BY / LIMIT, and a tolerated trailing `;`.

WITH-prefixed statements (`WITH x AS (…) SELECT * FROM x`)
are NOT in 2c's scope — they need a separate `data::WITH`
`CommandNode` so the entry-word dispatch routes correctly.
For now, top-level WITH continues to fall through to the
chumsky parser route (the same as in Phase 1). The
`SQL_SELECT_STATEMENT` static (which includes the optional
WITH prefix) stays available for use by that future
CommandNode or by any other consumer that needs the full
statement shape.

All seven Phase-1 SQL `SELECT` integration tests
(`tests/sql_select.rs`) pass without modification, satisfying
the 2c exit gate's "behaviour preserved" requirement. The
70 fragment unit tests and the 26 driver-level scope tests
also pass — the migration is a refactor, no new tests
required.

Behaviour change explicitly sanctioned by ADR-0032 §8:
Phase-1's `LIMIT_VALIDATOR` (positive-int-only, parse-time)
is superseded by the full `sql_expr` admission. `LIMIT max(10,
x)` and similar now parse; the engine constrains the value at
execution time per the ADR's "grammar admits, engine
rejects" posture.

Plan §2b status note: the 2026-05-20 deferral of §10.3 stage 2
(CTE output-column harvest derivation) is recorded in
`docs/plans/20260520-adr-0032-phase-2.md` per the
user-approved deferral.

Test totals: 1366 passing (unchanged), 0 failed, 1 ignored.
Clippy clean. data.rs loses ~150 lines of dead grammar; the
single source of truth for the SQL `SELECT` shape is now
`sql_select.rs`.
This commit is contained in:
claude@clouddev1
2026-05-20 15:42:44 +00:00
parent 4ff054ca75
commit a491df32a0
3 changed files with 93 additions and 144 deletions
+28
View File
@@ -270,6 +270,34 @@ coverage).
## Sub-phase 2b — `ScopedSubgrammar` + scope accumulators
### Implementation status (2026-05-20)
Sub-phase 2b shipped in five commits (`4f89106`, `98a74b2`,
`b522d09`, `4ff054c`, and earlier 2a foundations). The
scope-accumulator infrastructure — `Node::ScopedSubgrammar`,
`ScopeFrame`, `from_scope_stack`, the new Ident flags
(`writes_table_alias`, `writes_cte_name`,
`writes_projection_alias`), and the sql_expr §5 / §6 additive
extensions — is complete and exercised by 26 driver-level
tests.
**Deferral, explicitly user-approved:** §10.3 stage 2 (the six
CTE output-column derivation rules) is NOT implemented in 2b.
Stage 1 (placeholder CTE binding push) IS implemented; stage 2
will fold into 2d (where the new arity-check pass needs
declared-vs-derived column counts) and 2e (where qualified-
prefix completion needs CTE columns). Until then, a CTE
binding's `columns` stays empty after the body exits, and
qualified-prefix completion past `cte_alias.|` returns an
empty candidate list. The CTE-name is still visible as a
table source from inside the body (WITH RECURSIVE
self-reference works) and from outside (downstream CTE-name
validators see it).
The 2g cross-cut matrix rows for §10.3 derivation cases are
deferred along with the harvest; they will land when 2d/2e
implements the rules.
### Scope (in)
- Add the new walker node variant `Node::ScopedSubgrammar(&Node)`.
+19 -140
View File
@@ -20,7 +20,7 @@ use crate::dsl::command::{Command, Expr, RowFilter};
use crate::dsl::grammar::{
CommandNode, IdentSource, Node, NumberValidator, ValidationError, Word, expr,
shared::{column_value_list, current_column_value},
sql_expr,
sql_select,
};
use crate::dsl::walker::context::WalkContext;
use crate::dsl::value::Value;
@@ -393,141 +393,15 @@ const EXPLAIN_SHAPE: Node = Node::Choice(EXPLAIN_CHOICES);
// cycle through the sql_expr ⇄ sql_select recursion (see the
// matching note in sql_select.rs).
/// `as <alias>` — the explicit projection alias. Implicit
/// aliasing (`select a x`) is not supported: a bare alias is
/// ambiguous with the `from` keyword, whereas `as` is not.
const SELECT_ALIAS_IDENT: Node = Node::Ident {
source: IdentSource::NewName,
role: "select_alias",
validator: None,
highlight_override: None,
writes_table: false,
writes_column: false,
writes_user_listed_column: false,
writes_table_alias: false,
writes_cte_name: false,
writes_projection_alias: false,
};
static SELECT_AS_ALIAS_NODES: &[Node] = &[
Node::Word(Word::keyword("as")),
SELECT_ALIAS_IDENT,
];
static SELECT_AS_ALIAS: Node = Node::Seq(SELECT_AS_ALIAS_NODES);
/// A projection item: a SQL expression with an optional alias.
static SELECT_PROJ_ITEM_NODES: &[Node] = &[
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
Node::Optional(&SELECT_AS_ALIAS),
];
static SELECT_PROJ_ITEM: Node = Node::Seq(SELECT_PROJ_ITEM_NODES);
/// `proj_item ( , proj_item )*`.
/// `static` (not `const`) to avoid a Rust const-evaluation
/// cycle through the `sql_expr` ⇄ `sql_select` recursion. The
/// cycle is valid at link-time (statics resolve lazily) but
/// not at const-eval — see notes in sql_select.rs.
static SELECT_PROJ_LIST: Node = Node::Repeated {
inner: &SELECT_PROJ_ITEM,
separator: Some(&Node::Punct(',')),
min: 1,
};
/// `projection := '*' | proj_item ( , proj_item )*`. (`t.*`
/// qualified star is Phase 2 — it needs join scope.)
static SELECT_PROJECTION_CHOICES: &[Node] =
&[Node::Punct('*'), Node::Subgrammar(&SELECT_PROJ_LIST)];
static SELECT_PROJECTION: Node = Node::Choice(SELECT_PROJECTION_CHOICES);
/// The `FROM` table. `writes_table` so the `WHERE` / `ORDER BY`
/// expression column slots complete against this table; the
/// validator rejects the internal `__rdbms_*` metadata tables
/// (ADR-0030 §6).
const SELECT_FROM_TABLE: Node = Node::Ident {
source: IdentSource::Tables,
role: "table_name",
validator: Some(reject_internal_table),
highlight_override: None,
writes_table: true,
writes_column: false,
writes_user_listed_column: false,
writes_table_alias: false,
writes_cte_name: false,
writes_projection_alias: false,
};
/// `where <sql_expr>`.
static SELECT_WHERE_NODES: &[Node] = &[Node::Word(Word::keyword("where")), Node::Subgrammar(&sql_expr::SQL_OR_EXPR)];
static SELECT_WHERE: Node = Node::Seq(SELECT_WHERE_NODES);
/// `order by <item> ( , <item> )*`, each item a SQL expression
/// with an optional `asc` / `desc`.
const SELECT_SORT_DIR_CHOICES: &[Node] = &[
Node::Word(Word::keyword("asc")),
Node::Word(Word::keyword("desc")),
];
static SELECT_ORDER_ITEM_NODES: &[Node] = &[
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
Node::Optional(&Node::Choice(SELECT_SORT_DIR_CHOICES)),
];
static SELECT_ORDER_ITEM: Node = Node::Seq(SELECT_ORDER_ITEM_NODES);
static SELECT_ORDER_BY_NODES: &[Node] = &[
Node::Word(Word::keyword("order")),
Node::Word(Word::keyword("by")),
Node::Repeated {
inner: &SELECT_ORDER_ITEM,
separator: Some(&Node::Punct(',')),
min: 1,
},
];
static SELECT_ORDER_BY: Node = Node::Seq(SELECT_ORDER_BY_NODES);
/// `limit <n>` — reuses the `show data` non-negative-integer
/// validator (`OFFSET` is Phase 2).
static SELECT_LIMIT_NODES: &[Node] = &[
Node::Word(Word::keyword("limit")),
Node::NumberLit {
validator: Some(LIMIT_VALIDATOR),
},
];
static SELECT_LIMIT: Node = Node::Seq(SELECT_LIMIT_NODES);
/// `from <Table>` — the FROM clause is optional so that the
/// zero-table form `select 1`, `select upper('x')` (a constant
/// or function-call expression) is admitted alongside the
/// single-table form. The expression slots resolve column
/// completion against the FROM table when present; without it,
/// the engine will reject any column references.
static SELECT_FROM_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("from")),
SELECT_FROM_TABLE,
];
static SELECT_FROM_CLAUSE: Node = Node::Seq(SELECT_FROM_CLAUSE_NODES);
const SELECT_NODES: &[Node] = &[
Node::Subgrammar(&SELECT_PROJECTION),
Node::Optional(&SELECT_FROM_CLAUSE),
Node::Optional(&SELECT_WHERE),
Node::Optional(&SELECT_ORDER_BY),
Node::Optional(&SELECT_LIMIT),
// ADR-0030 §3: one statement per submission; a trailing `;`
// is tolerated.
Node::Optional(&Node::Punct(';')),
];
const SELECT_SHAPE: Node = Node::Seq(SELECT_NODES);
/// Reject a reference to an internal `__rdbms_*` metadata table
/// in a `SELECT`'s `FROM` (ADR-0030 §6 — those tables are not in
/// the query surface).
fn reject_internal_table(name: &str) -> Result<(), ValidationError> {
if name.to_ascii_lowercase().starts_with("__rdbms_") {
Err(ValidationError {
message_key: "select.internal_table",
args: vec![("table", name.to_string())],
})
} else {
Ok(())
}
}
// Phase 1's local `SELECT_*` grammar nodes have been retired in
// favour of `sql_select::SQL_SELECT_TAIL` (ADR-0032 sub-phase
// 2c). The shape definition that `data::SELECT` references now
// lives in the dedicated `sql_select` module — including the
// `reject_internal_table` validator, the `LIMIT` count
// validator, and the projection / FROM / WHERE / ORDER BY
// machinery. The full §1 grammar (JOIN, GROUP BY, HAVING,
// set-ops, qualified refs, subqueries, CTEs) is admitted as a
// natural superset.
// =================================================================
// AST builders
@@ -1027,12 +901,17 @@ pub static EXPLAIN: CommandNode = CommandNode {
help_id: Some("data.explain"),
usage_ids: &["parse.usage.explain"],};
/// SQL `SELECT` (ADR-0030 §6, ADR-0031). Advanced mode only —
/// gated by `grammar::is_advanced_only`. `help_id` is `None`
/// until the `help sql` page lands (ADR-0030 Phase 6).
/// SQL `SELECT` (ADR-0030 §6, ADR-0031, ADR-0032).
///
/// Advanced mode only — gated by `grammar::is_advanced_only`.
/// The shape is the post-`SELECT` portion of a top-level
/// statement; the registry's entry-word dispatch consumes the
/// leading `SELECT` keyword before the shape walks (sub-phase
/// 2c migration). `help_id` is `None` until the `help sql`
/// page lands (ADR-0030 Phase 6).
pub static SELECT: CommandNode = CommandNode {
entry: Word::keyword("select"),
shape: SELECT_SHAPE,
shape: Node::Subgrammar(&sql_select::SQL_SELECT_TAIL),
ast_builder: build_select,
help_id: None,
usage_ids: &["parse.usage.select"],};
+46 -4
View File
@@ -660,12 +660,54 @@ static SELECT_STATEMENT_NODES: &[Node] = &[
Node::Subgrammar(&SQL_SELECT_COMPOUND),
Node::Optional(&SEMI),
];
/// The full statement, including the optional `WITH` prefix and
/// a tolerated trailing `;`. This is what `data::SELECT`'s
/// `CommandNode` will reference once sub-phase 2c migrates the
/// Phase-1 grammar.
/// The full statement, including the optional `WITH` prefix
/// and a tolerated trailing `;`.
///
/// Used by the fragment's own tests and by any future
/// `data::WITH` `CommandNode` that dispatches `WITH …`
/// statements. Top-level `SELECT` statements (entry word
/// `select`) reference `SQL_SELECT_TAIL` instead, which omits
/// the leading `SELECT` keyword that the registry's
/// entry-word dispatch already consumed.
pub static SQL_SELECT_STATEMENT: Node = Node::Seq(SELECT_STATEMENT_NODES);
// =================================================================
// select_statement — entry-consumed form (ADR-0030 §6, 2c)
// =================================================================
/// The post-`SELECT` portion of a top-level statement.
/// `data::SELECT`'s `CommandNode` has `entry: Word::keyword
/// ("select")`, so the registry's dispatch consumes the leading
/// `SELECT` keyword before the shape walks. The shape is then
/// the rest of `select_core` (`DISTINCT/ALL`, projection,
/// FROM, WHERE, GROUP BY, HAVING), followed by the compound
/// set-op chain (each subsequent leg's `SELECT` keyword is
/// part of `SET_OP_TAIL`), the outer `ORDER BY` / `LIMIT`, and
/// a tolerated trailing `;`.
///
/// WITH-prefixed statements (`WITH x AS (…) SELECT …`) are
/// dispatched separately by entry word `with`. Adding a
/// `data::WITH` `CommandNode` is a future sub-phase; for now
/// top-level WITH falls back to the chumsky parser route, the
/// same as in Phase 1.
static SQL_SELECT_TAIL_NODES: &[Node] = &[
Node::Subgrammar(&DISTINCT_OR_ALL_OPTIONAL),
Node::Subgrammar(&PROJECTION_LIST),
Node::Optional(&FROM_CLAUSE),
Node::Optional(&WHERE_CLAUSE),
Node::Optional(&GROUP_BY_CLAUSE),
Node::Optional(&HAVING_CLAUSE),
Node::Repeated {
inner: &SET_OP_TAIL,
separator: None,
min: 0,
},
Node::Optional(&ORDER_BY_CLAUSE),
Node::Optional(&LIMIT_CLAUSE),
Node::Optional(&SEMI),
];
pub static SQL_SELECT_TAIL: Node = Node::Seq(SQL_SELECT_TAIL_NODES);
// =================================================================
// Tests
// =================================================================