grammar: migrate Phase-1 SELECT to the ADR-0032 fragment (sub-phase 2c)
The Phase-1 SQL `SELECT` grammar nodes that used to live in `src/dsl/grammar/data.rs` retire — 22 statics / consts and the `reject_internal_table` validator copy are removed, ~150 lines of grammar machinery gone. `data::SELECT.shape` now references the post-`SELECT` portion of the ADR-0032 fragment via a thin `Node::Subgrammar(&sql_select::SQL_SELECT_TAIL)`. `SQL_SELECT_TAIL` is a new export from `sql_select.rs`, parallel to `SQL_SELECT_STATEMENT`. It represents what a top-level `SELECT` statement looks like AFTER the registry's entry-word dispatch has already consumed the leading `SELECT` keyword: the DISTINCT/ALL prefix, projection list, optional FROM / WHERE / GROUP BY / HAVING, the compound set-op chain (each subsequent leg's `SELECT` is part of `SET_OP_TAIL`), outer ORDER BY / LIMIT, and a tolerated trailing `;`. WITH-prefixed statements (`WITH x AS (…) SELECT * FROM x`) are NOT in 2c's scope — they need a separate `data::WITH` `CommandNode` so the entry-word dispatch routes correctly. For now, top-level WITH continues to fall through to the chumsky parser route (the same as in Phase 1). The `SQL_SELECT_STATEMENT` static (which includes the optional WITH prefix) stays available for use by that future CommandNode or by any other consumer that needs the full statement shape. All seven Phase-1 SQL `SELECT` integration tests (`tests/sql_select.rs`) pass without modification, satisfying the 2c exit gate's "behaviour preserved" requirement. The 70 fragment unit tests and the 26 driver-level scope tests also pass — the migration is a refactor, no new tests required. Behaviour change explicitly sanctioned by ADR-0032 §8: Phase-1's `LIMIT_VALIDATOR` (positive-int-only, parse-time) is superseded by the full `sql_expr` admission. `LIMIT max(10, x)` and similar now parse; the engine constrains the value at execution time per the ADR's "grammar admits, engine rejects" posture. Plan §2b status note: the 2026-05-20 deferral of §10.3 stage 2 (CTE output-column harvest derivation) is recorded in `docs/plans/20260520-adr-0032-phase-2.md` per the user-approved deferral. Test totals: 1366 passing (unchanged), 0 failed, 1 ignored. Clippy clean. data.rs loses ~150 lines of dead grammar; the single source of truth for the SQL `SELECT` shape is now `sql_select.rs`.
This commit is contained in:
@@ -270,6 +270,34 @@ coverage).
|
|||||||
|
|
||||||
## Sub-phase 2b — `ScopedSubgrammar` + scope accumulators
|
## Sub-phase 2b — `ScopedSubgrammar` + scope accumulators
|
||||||
|
|
||||||
|
### Implementation status (2026-05-20)
|
||||||
|
|
||||||
|
Sub-phase 2b shipped in five commits (`4f89106`, `98a74b2`,
|
||||||
|
`b522d09`, `4ff054c`, and earlier 2a foundations). The
|
||||||
|
scope-accumulator infrastructure — `Node::ScopedSubgrammar`,
|
||||||
|
`ScopeFrame`, `from_scope_stack`, the new Ident flags
|
||||||
|
(`writes_table_alias`, `writes_cte_name`,
|
||||||
|
`writes_projection_alias`), and the sql_expr §5 / §6 additive
|
||||||
|
extensions — is complete and exercised by 26 driver-level
|
||||||
|
tests.
|
||||||
|
|
||||||
|
**Deferral, explicitly user-approved:** §10.3 stage 2 (the six
|
||||||
|
CTE output-column derivation rules) is NOT implemented in 2b.
|
||||||
|
Stage 1 (placeholder CTE binding push) IS implemented; stage 2
|
||||||
|
will fold into 2d (where the new arity-check pass needs
|
||||||
|
declared-vs-derived column counts) and 2e (where qualified-
|
||||||
|
prefix completion needs CTE columns). Until then, a CTE
|
||||||
|
binding's `columns` stays empty after the body exits, and
|
||||||
|
qualified-prefix completion past `cte_alias.|` returns an
|
||||||
|
empty candidate list. The CTE-name is still visible as a
|
||||||
|
table source from inside the body (WITH RECURSIVE
|
||||||
|
self-reference works) and from outside (downstream CTE-name
|
||||||
|
validators see it).
|
||||||
|
|
||||||
|
The 2g cross-cut matrix rows for §10.3 derivation cases are
|
||||||
|
deferred along with the harvest; they will land when 2d/2e
|
||||||
|
implements the rules.
|
||||||
|
|
||||||
### Scope (in)
|
### Scope (in)
|
||||||
|
|
||||||
- Add the new walker node variant `Node::ScopedSubgrammar(&Node)`.
|
- Add the new walker node variant `Node::ScopedSubgrammar(&Node)`.
|
||||||
|
|||||||
+19
-140
@@ -20,7 +20,7 @@ use crate::dsl::command::{Command, Expr, RowFilter};
|
|||||||
use crate::dsl::grammar::{
|
use crate::dsl::grammar::{
|
||||||
CommandNode, IdentSource, Node, NumberValidator, ValidationError, Word, expr,
|
CommandNode, IdentSource, Node, NumberValidator, ValidationError, Word, expr,
|
||||||
shared::{column_value_list, current_column_value},
|
shared::{column_value_list, current_column_value},
|
||||||
sql_expr,
|
sql_select,
|
||||||
};
|
};
|
||||||
use crate::dsl::walker::context::WalkContext;
|
use crate::dsl::walker::context::WalkContext;
|
||||||
use crate::dsl::value::Value;
|
use crate::dsl::value::Value;
|
||||||
@@ -393,141 +393,15 @@ const EXPLAIN_SHAPE: Node = Node::Choice(EXPLAIN_CHOICES);
|
|||||||
// cycle through the sql_expr ⇄ sql_select recursion (see the
|
// cycle through the sql_expr ⇄ sql_select recursion (see the
|
||||||
// matching note in sql_select.rs).
|
// matching note in sql_select.rs).
|
||||||
|
|
||||||
/// `as <alias>` — the explicit projection alias. Implicit
|
// Phase 1's local `SELECT_*` grammar nodes have been retired in
|
||||||
/// aliasing (`select a x`) is not supported: a bare alias is
|
// favour of `sql_select::SQL_SELECT_TAIL` (ADR-0032 sub-phase
|
||||||
/// ambiguous with the `from` keyword, whereas `as` is not.
|
// 2c). The shape definition that `data::SELECT` references now
|
||||||
const SELECT_ALIAS_IDENT: Node = Node::Ident {
|
// lives in the dedicated `sql_select` module — including the
|
||||||
source: IdentSource::NewName,
|
// `reject_internal_table` validator, the `LIMIT` count
|
||||||
role: "select_alias",
|
// validator, and the projection / FROM / WHERE / ORDER BY
|
||||||
validator: None,
|
// machinery. The full §1 grammar (JOIN, GROUP BY, HAVING,
|
||||||
highlight_override: None,
|
// set-ops, qualified refs, subqueries, CTEs) is admitted as a
|
||||||
writes_table: false,
|
// natural superset.
|
||||||
writes_column: false,
|
|
||||||
writes_user_listed_column: false,
|
|
||||||
writes_table_alias: false,
|
|
||||||
writes_cte_name: false,
|
|
||||||
writes_projection_alias: false,
|
|
||||||
};
|
|
||||||
static SELECT_AS_ALIAS_NODES: &[Node] = &[
|
|
||||||
Node::Word(Word::keyword("as")),
|
|
||||||
SELECT_ALIAS_IDENT,
|
|
||||||
];
|
|
||||||
static SELECT_AS_ALIAS: Node = Node::Seq(SELECT_AS_ALIAS_NODES);
|
|
||||||
|
|
||||||
/// A projection item: a SQL expression with an optional alias.
|
|
||||||
static SELECT_PROJ_ITEM_NODES: &[Node] = &[
|
|
||||||
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
|
|
||||||
Node::Optional(&SELECT_AS_ALIAS),
|
|
||||||
];
|
|
||||||
static SELECT_PROJ_ITEM: Node = Node::Seq(SELECT_PROJ_ITEM_NODES);
|
|
||||||
|
|
||||||
/// `proj_item ( , proj_item )*`.
|
|
||||||
/// `static` (not `const`) to avoid a Rust const-evaluation
|
|
||||||
/// cycle through the `sql_expr` ⇄ `sql_select` recursion. The
|
|
||||||
/// cycle is valid at link-time (statics resolve lazily) but
|
|
||||||
/// not at const-eval — see notes in sql_select.rs.
|
|
||||||
static SELECT_PROJ_LIST: Node = Node::Repeated {
|
|
||||||
inner: &SELECT_PROJ_ITEM,
|
|
||||||
separator: Some(&Node::Punct(',')),
|
|
||||||
min: 1,
|
|
||||||
};
|
|
||||||
|
|
||||||
/// `projection := '*' | proj_item ( , proj_item )*`. (`t.*`
|
|
||||||
/// qualified star is Phase 2 — it needs join scope.)
|
|
||||||
static SELECT_PROJECTION_CHOICES: &[Node] =
|
|
||||||
&[Node::Punct('*'), Node::Subgrammar(&SELECT_PROJ_LIST)];
|
|
||||||
static SELECT_PROJECTION: Node = Node::Choice(SELECT_PROJECTION_CHOICES);
|
|
||||||
|
|
||||||
/// The `FROM` table. `writes_table` so the `WHERE` / `ORDER BY`
|
|
||||||
/// expression column slots complete against this table; the
|
|
||||||
/// validator rejects the internal `__rdbms_*` metadata tables
|
|
||||||
/// (ADR-0030 §6).
|
|
||||||
const SELECT_FROM_TABLE: Node = Node::Ident {
|
|
||||||
source: IdentSource::Tables,
|
|
||||||
role: "table_name",
|
|
||||||
validator: Some(reject_internal_table),
|
|
||||||
highlight_override: None,
|
|
||||||
writes_table: true,
|
|
||||||
writes_column: false,
|
|
||||||
writes_user_listed_column: false,
|
|
||||||
writes_table_alias: false,
|
|
||||||
writes_cte_name: false,
|
|
||||||
writes_projection_alias: false,
|
|
||||||
};
|
|
||||||
|
|
||||||
/// `where <sql_expr>`.
|
|
||||||
static SELECT_WHERE_NODES: &[Node] = &[Node::Word(Word::keyword("where")), Node::Subgrammar(&sql_expr::SQL_OR_EXPR)];
|
|
||||||
static SELECT_WHERE: Node = Node::Seq(SELECT_WHERE_NODES);
|
|
||||||
|
|
||||||
/// `order by <item> ( , <item> )*`, each item a SQL expression
|
|
||||||
/// with an optional `asc` / `desc`.
|
|
||||||
const SELECT_SORT_DIR_CHOICES: &[Node] = &[
|
|
||||||
Node::Word(Word::keyword("asc")),
|
|
||||||
Node::Word(Word::keyword("desc")),
|
|
||||||
];
|
|
||||||
static SELECT_ORDER_ITEM_NODES: &[Node] = &[
|
|
||||||
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
|
|
||||||
Node::Optional(&Node::Choice(SELECT_SORT_DIR_CHOICES)),
|
|
||||||
];
|
|
||||||
static SELECT_ORDER_ITEM: Node = Node::Seq(SELECT_ORDER_ITEM_NODES);
|
|
||||||
static SELECT_ORDER_BY_NODES: &[Node] = &[
|
|
||||||
Node::Word(Word::keyword("order")),
|
|
||||||
Node::Word(Word::keyword("by")),
|
|
||||||
Node::Repeated {
|
|
||||||
inner: &SELECT_ORDER_ITEM,
|
|
||||||
separator: Some(&Node::Punct(',')),
|
|
||||||
min: 1,
|
|
||||||
},
|
|
||||||
];
|
|
||||||
static SELECT_ORDER_BY: Node = Node::Seq(SELECT_ORDER_BY_NODES);
|
|
||||||
|
|
||||||
/// `limit <n>` — reuses the `show data` non-negative-integer
|
|
||||||
/// validator (`OFFSET` is Phase 2).
|
|
||||||
static SELECT_LIMIT_NODES: &[Node] = &[
|
|
||||||
Node::Word(Word::keyword("limit")),
|
|
||||||
Node::NumberLit {
|
|
||||||
validator: Some(LIMIT_VALIDATOR),
|
|
||||||
},
|
|
||||||
];
|
|
||||||
static SELECT_LIMIT: Node = Node::Seq(SELECT_LIMIT_NODES);
|
|
||||||
|
|
||||||
/// `from <Table>` — the FROM clause is optional so that the
|
|
||||||
/// zero-table form `select 1`, `select upper('x')` (a constant
|
|
||||||
/// or function-call expression) is admitted alongside the
|
|
||||||
/// single-table form. The expression slots resolve column
|
|
||||||
/// completion against the FROM table when present; without it,
|
|
||||||
/// the engine will reject any column references.
|
|
||||||
static SELECT_FROM_CLAUSE_NODES: &[Node] = &[
|
|
||||||
Node::Word(Word::keyword("from")),
|
|
||||||
SELECT_FROM_TABLE,
|
|
||||||
];
|
|
||||||
static SELECT_FROM_CLAUSE: Node = Node::Seq(SELECT_FROM_CLAUSE_NODES);
|
|
||||||
|
|
||||||
const SELECT_NODES: &[Node] = &[
|
|
||||||
Node::Subgrammar(&SELECT_PROJECTION),
|
|
||||||
Node::Optional(&SELECT_FROM_CLAUSE),
|
|
||||||
Node::Optional(&SELECT_WHERE),
|
|
||||||
Node::Optional(&SELECT_ORDER_BY),
|
|
||||||
Node::Optional(&SELECT_LIMIT),
|
|
||||||
// ADR-0030 §3: one statement per submission; a trailing `;`
|
|
||||||
// is tolerated.
|
|
||||||
Node::Optional(&Node::Punct(';')),
|
|
||||||
];
|
|
||||||
const SELECT_SHAPE: Node = Node::Seq(SELECT_NODES);
|
|
||||||
|
|
||||||
/// Reject a reference to an internal `__rdbms_*` metadata table
|
|
||||||
/// in a `SELECT`'s `FROM` (ADR-0030 §6 — those tables are not in
|
|
||||||
/// the query surface).
|
|
||||||
fn reject_internal_table(name: &str) -> Result<(), ValidationError> {
|
|
||||||
if name.to_ascii_lowercase().starts_with("__rdbms_") {
|
|
||||||
Err(ValidationError {
|
|
||||||
message_key: "select.internal_table",
|
|
||||||
args: vec![("table", name.to_string())],
|
|
||||||
})
|
|
||||||
} else {
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// =================================================================
|
// =================================================================
|
||||||
// AST builders
|
// AST builders
|
||||||
@@ -1027,12 +901,17 @@ pub static EXPLAIN: CommandNode = CommandNode {
|
|||||||
help_id: Some("data.explain"),
|
help_id: Some("data.explain"),
|
||||||
usage_ids: &["parse.usage.explain"],};
|
usage_ids: &["parse.usage.explain"],};
|
||||||
|
|
||||||
/// SQL `SELECT` (ADR-0030 §6, ADR-0031). Advanced mode only —
|
/// SQL `SELECT` (ADR-0030 §6, ADR-0031, ADR-0032).
|
||||||
/// gated by `grammar::is_advanced_only`. `help_id` is `None`
|
///
|
||||||
/// until the `help sql` page lands (ADR-0030 Phase 6).
|
/// Advanced mode only — gated by `grammar::is_advanced_only`.
|
||||||
|
/// The shape is the post-`SELECT` portion of a top-level
|
||||||
|
/// statement; the registry's entry-word dispatch consumes the
|
||||||
|
/// leading `SELECT` keyword before the shape walks (sub-phase
|
||||||
|
/// 2c migration). `help_id` is `None` until the `help sql`
|
||||||
|
/// page lands (ADR-0030 Phase 6).
|
||||||
pub static SELECT: CommandNode = CommandNode {
|
pub static SELECT: CommandNode = CommandNode {
|
||||||
entry: Word::keyword("select"),
|
entry: Word::keyword("select"),
|
||||||
shape: SELECT_SHAPE,
|
shape: Node::Subgrammar(&sql_select::SQL_SELECT_TAIL),
|
||||||
ast_builder: build_select,
|
ast_builder: build_select,
|
||||||
help_id: None,
|
help_id: None,
|
||||||
usage_ids: &["parse.usage.select"],};
|
usage_ids: &["parse.usage.select"],};
|
||||||
|
|||||||
@@ -660,12 +660,54 @@ static SELECT_STATEMENT_NODES: &[Node] = &[
|
|||||||
Node::Subgrammar(&SQL_SELECT_COMPOUND),
|
Node::Subgrammar(&SQL_SELECT_COMPOUND),
|
||||||
Node::Optional(&SEMI),
|
Node::Optional(&SEMI),
|
||||||
];
|
];
|
||||||
/// The full statement, including the optional `WITH` prefix and
|
/// The full statement, including the optional `WITH` prefix
|
||||||
/// a tolerated trailing `;`. This is what `data::SELECT`'s
|
/// and a tolerated trailing `;`.
|
||||||
/// `CommandNode` will reference once sub-phase 2c migrates the
|
///
|
||||||
/// Phase-1 grammar.
|
/// Used by the fragment's own tests and by any future
|
||||||
|
/// `data::WITH` `CommandNode` that dispatches `WITH …`
|
||||||
|
/// statements. Top-level `SELECT` statements (entry word
|
||||||
|
/// `select`) reference `SQL_SELECT_TAIL` instead, which omits
|
||||||
|
/// the leading `SELECT` keyword that the registry's
|
||||||
|
/// entry-word dispatch already consumed.
|
||||||
pub static SQL_SELECT_STATEMENT: Node = Node::Seq(SELECT_STATEMENT_NODES);
|
pub static SQL_SELECT_STATEMENT: Node = Node::Seq(SELECT_STATEMENT_NODES);
|
||||||
|
|
||||||
|
// =================================================================
|
||||||
|
// select_statement — entry-consumed form (ADR-0030 §6, 2c)
|
||||||
|
// =================================================================
|
||||||
|
|
||||||
|
/// The post-`SELECT` portion of a top-level statement.
|
||||||
|
/// `data::SELECT`'s `CommandNode` has `entry: Word::keyword
|
||||||
|
/// ("select")`, so the registry's dispatch consumes the leading
|
||||||
|
/// `SELECT` keyword before the shape walks. The shape is then
|
||||||
|
/// the rest of `select_core` (`DISTINCT/ALL`, projection,
|
||||||
|
/// FROM, WHERE, GROUP BY, HAVING), followed by the compound
|
||||||
|
/// set-op chain (each subsequent leg's `SELECT` keyword is
|
||||||
|
/// part of `SET_OP_TAIL`), the outer `ORDER BY` / `LIMIT`, and
|
||||||
|
/// a tolerated trailing `;`.
|
||||||
|
///
|
||||||
|
/// WITH-prefixed statements (`WITH x AS (…) SELECT …`) are
|
||||||
|
/// dispatched separately by entry word `with`. Adding a
|
||||||
|
/// `data::WITH` `CommandNode` is a future sub-phase; for now
|
||||||
|
/// top-level WITH falls back to the chumsky parser route, the
|
||||||
|
/// same as in Phase 1.
|
||||||
|
static SQL_SELECT_TAIL_NODES: &[Node] = &[
|
||||||
|
Node::Subgrammar(&DISTINCT_OR_ALL_OPTIONAL),
|
||||||
|
Node::Subgrammar(&PROJECTION_LIST),
|
||||||
|
Node::Optional(&FROM_CLAUSE),
|
||||||
|
Node::Optional(&WHERE_CLAUSE),
|
||||||
|
Node::Optional(&GROUP_BY_CLAUSE),
|
||||||
|
Node::Optional(&HAVING_CLAUSE),
|
||||||
|
Node::Repeated {
|
||||||
|
inner: &SET_OP_TAIL,
|
||||||
|
separator: None,
|
||||||
|
min: 0,
|
||||||
|
},
|
||||||
|
Node::Optional(&ORDER_BY_CLAUSE),
|
||||||
|
Node::Optional(&LIMIT_CLAUSE),
|
||||||
|
Node::Optional(&SEMI),
|
||||||
|
];
|
||||||
|
pub static SQL_SELECT_TAIL: Node = Node::Seq(SQL_SELECT_TAIL_NODES);
|
||||||
|
|
||||||
// =================================================================
|
// =================================================================
|
||||||
// Tests
|
// Tests
|
||||||
// =================================================================
|
// =================================================================
|
||||||
|
|||||||
Reference in New Issue
Block a user