grammar: sql_expr additive extensions for §5/§6, CTE body rewires to ScopedSubgrammar

Sub-phase 2b checkpoint 2 — closes the recursion loop between
sql_expr.rs and sql_select.rs so subquery expressions and
qualified column refs become structurally valid in every SQL
context where they belong.

sql_expr.rs:

- §5 qualified-ref tail. `name_or_call` gains a `.identifier`
  suffix as a Choice sibling of the function-call `(args)`
  tail. The leading identifier is still matched once (per
  ADR-0031 §1's factoring); the optional tail dispatches
  between the two suffixes by their first character (`.` vs
  `(`).
- §6.1 scalar subquery as primary. The `(or_expr)` and
  `(SELECT …)` branches share the leading `(`; the first
  token inside the parenthesis discriminates (`SELECT` →
  subquery, anything else → expression). The subquery
  recurses through
  `Node::ScopedSubgrammar(&sql_select::SQL_SELECT_COMPOUND)`.
- §6.2 IN (subquery) predicate. Sibling of the existing
  IN-value-list; same `(` factoring, same dispatch.
- §6.3 [NOT] EXISTS primary. Bare `EXISTS (compound_select)`
  lives in `primary`; `NOT EXISTS` falls out via the existing
  `not_expr := NOT not_expr` tier above `primary`.
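
The two first-token dispatches described in this list can be sketched in isolation. This is a minimal illustration, not the grammar's actual `Node` machinery: `Tail`, `ParenBody`, `classify_tail`, and `classify_paren_body` are hypothetical names, and both the whitespace skipping and the case-insensitive `SELECT` match are assumptions.

```rust
/// Suffix that may follow the leading identifier of `name_or_call`.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum Tail {
    Qualified, // `.identifier`
    Call,      // `(args)`
    Bare,      // no tail at all
}

/// Pick the tail by its first character. Skipping leading
/// whitespace is an assumption of this sketch.
fn classify_tail(rest: &str) -> Tail {
    match rest.trim_start().chars().next() {
        Some('.') => Tail::Qualified,
        Some('(') => Tail::Call,
        _ => Tail::Bare,
    }
}

/// What sits behind a shared leading `(` in `primary`.
#[derive(Debug, PartialEq)]
enum ParenBody {
    Subquery,
    Expression,
}

/// `inside` is the text that follows the shared `(`.
/// The first whole word decides the branch; matching `SELECT`
/// case-insensitively is an assumption of this sketch.
fn classify_paren_body(inside: &str) -> ParenBody {
    let word: String = inside
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect();
    if word.eq_ignore_ascii_case("select") {
        ParenBody::Subquery
    } else {
        ParenBody::Expression
    }
}
```

Comparing the whole leading word, rather than a prefix, keeps an identifier such as `selection` on the expression branch.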

sql_select.rs:

- CTE body recursion rewires `Node::Subgrammar` →
  `Node::ScopedSubgrammar`, matching §10.2. The top-level
  statement's COMPOUND embedding stays plain Subgrammar — the
  implicit bottom frame is the right scope for a statement-
  level SELECT.
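
As a rough illustration of the difference between the two embeddings, the toy walker below tracks how many scope frames are live. It assumes only what the code comment in sql_select.rs states (a scoped embedding pushes a fresh lexical scope frame, a plain one does not); the `Node` shape, the statics, and `max_scope_depth` are hypothetical stand-ins, not the project's real types.

```rust
/// Hypothetical, cut-down node type for this sketch.
enum Node {
    Leaf,
    Seq(&'static [Node]),
    /// Recurse in the caller's scope frame.
    Subgrammar(&'static Node),
    /// Recurse in a fresh scope frame.
    ScopedSubgrammar(&'static Node),
}

/// Deepest number of live scope frames seen while walking `node`,
/// starting with `depth` frames already on the stack.
fn max_scope_depth(node: &Node, depth: usize) -> usize {
    match node {
        Node::Leaf => depth,
        Node::Seq(items) => items
            .iter()
            .map(|n| max_scope_depth(n, depth))
            .max()
            .unwrap_or(depth),
        Node::Subgrammar(inner) => max_scope_depth(inner, depth),
        Node::ScopedSubgrammar(inner) => max_scope_depth(inner, depth + 1),
    }
}

// Stand-in for the compound SELECT body.
static SELECT_COMPOUND_NODES: &[Node] = &[Node::Leaf];
static SELECT_COMPOUND: Node = Node::Seq(SELECT_COMPOUND_NODES);

// Statement level: plain embedding, stays in the bottom frame.
static STATEMENT: Node = Node::Subgrammar(&SELECT_COMPOUND);

// CTE body: scoped embedding, pushes one more frame.
static CTE_BODY: Node = Node::ScopedSubgrammar(&SELECT_COMPOUND);
```

Starting from one implicit bottom frame, walking `STATEMENT` never leaves that frame, while walking `CTE_BODY` sees a second frame for the duration of the body.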

Structural side-effect — const-eval cycle workaround:

Closing the sql_expr ⇄ sql_select reference loop made Rust's
const-evaluator follow the cycle through every `const Node`
that transitively reaches it. Mirroring sql_expr.rs's existing
pattern, composition Nodes in sql_select.rs (Seq / Choice /
Optional / Repeated / Lookahead) are now `static Node` and
appear in slice positions through `Node::Subgrammar(&NAME)`
wraps; only leaf items (Punct, Word, Ident) remain `const`.
Same workaround applies to data.rs's SELECT_PROJ_LIST /
SELECT_PROJECTION chain and the inlined `SQL_EXPR` reference.
Statics resolve lazily at link time, so the cycle is valid
there; under const-eval it is not, so the named
`const SQL_EXPR` alias is gone in both files (replaced with the
inline `Node::Subgrammar(&sql_expr::SQL_OR_EXPR)` expression at
every use site).
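
The shape of the workaround can be reproduced in a few lines. The sketch below uses a hypothetical three-variant `Node` and made-up item names (`EXPR`, `SELECT`, `reaches`); it only demonstrates that two `static` items may reach each other through references, because taking `&SELECT` needs the item's address rather than its evaluated value.

```rust
/// Hypothetical stand-in for the grammar's node type.
enum Node {
    Punct(char),
    Seq(&'static [Node]),
    Subgrammar(&'static Node),
}

// EXPR reaches SELECT and SELECT reaches EXPR. Each slice only
// stores the *address* of the other static, so neither
// initializer has to evaluate the other one.
static EXPR_NODES: &[Node] = &[
    Node::Punct('('),
    Node::Subgrammar(&SELECT),
    Node::Punct(')'),
];
static EXPR: Node = Node::Seq(EXPR_NODES);

static SELECT_NODES: &[Node] = &[Node::Subgrammar(&EXPR)];
static SELECT: Node = Node::Seq(SELECT_NODES);

/// True if `target` is reachable from `from` within `fuel` steps.
/// The fuel bound keeps the walk finite on a cyclic graph.
fn reaches(from: &'static Node, target: &'static Node, fuel: usize) -> bool {
    if fuel == 0 {
        return false;
    }
    match from {
        Node::Punct(_) => false,
        Node::Seq(items) => items.iter().any(|n| reaches(n, target, fuel - 1)),
        Node::Subgrammar(inner) => {
            std::ptr::eq(*inner, target) || reaches(*inner, target, fuel - 1)
        }
    }
}
```

Declaring `EXPR` and `SELECT` as `const` instead would make each item's value depend on the other's, which the compiler is expected to reject with a cycle error; that is the failure mode this commit works around.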

Test coverage:

- sql_expr.rs gains 11 new tests for qualified refs, scalar
  subquery, IN-subquery, EXISTS / NOT EXISTS, nested
  subqueries, and the existing IN-value-list form (regression).
- sql_select.rs gains 7 new tests for qualified refs in WHERE,
  scalar subqueries in WHERE / projection, IN / EXISTS / NOT
  EXISTS in WHERE, nested subqueries, and qualified refs
  inside CTE bodies.
- All 70 prior sql_select tests still pass; the 2a baseline
  is preserved.

`(WITH x AS (…) SELECT * FROM x)` is explicitly NOT admitted
as a scalar subquery — ADR-0032 §1 / §9 wire subqueries to
SQL_SELECT_COMPOUND, which omits the outer with_clause. WITH
remains a statement-level-only construct. Documented in the
relevant test.

Test totals: 1333 → 1351 passing, 0 failed, 1 ignored
(unchanged). Clippy clean.
claude@clouddev1
2026-05-20 11:47:27 +00:00
parent 4f89106a63
commit 98a74b23d3
3 changed files with 314 additions and 98 deletions
+142 -78
@@ -106,17 +106,20 @@ const LPAREN: Node = Node::Punct('(');
const RPAREN: Node = Node::Punct(')');
const SEMI: Node = Node::Punct(';');
/// SQL expression slot — recursion into ADR-0031's fragment
/// through `Node::Subgrammar`. Stays `Subgrammar` (not
/// `ScopedSubgrammar`) — `sql_expr` recursion is part of the
/// precedence ladder, not a new lexical scope (ADR-0032 §10.2).
const SQL_EXPR: Node = Node::Subgrammar(&sql_expr::SQL_OR_EXPR);
// SQL expression slot — `Node::Subgrammar(&sql_expr::SQL_OR_EXPR)`
// is inlined at each use site rather than aliased through a
// named `const`. The `const SQL_EXPR: Node = …` form triggered
// a Rust const-evaluation cycle through the sql_expr ⇄
// sql_select recursion (valid at link time, where statics
// resolve lazily, but not at const-eval). Stays as a plain
// `Subgrammar` — sql_expr recursion is part of the precedence
// ladder, not a new lexical scope (ADR-0032 §10.2).
/// A node that never matches. Used as the "no" branch of
/// lookahead-driven disambiguation: an empty `Choice` walks to
/// `NoMatch`, which `Optional` / `Choice` gracefully treat as
/// "skip" or "fall through to the next branch".
const EMPTY_NOMATCH: Node = Node::Choice(&[]);
static EMPTY_NOMATCH: Node = Node::Choice(&[]);
// =================================================================
// Bare-alias dispatch (ADR-0032 §1)
@@ -168,10 +171,10 @@ fn projection_bare_alias_factory(
Some(word)
if PROJECTION_FOLLOW_SET.iter().any(|k| *k == word) =>
{
EMPTY_NOMATCH
Node::Subgrammar(&EMPTY_NOMATCH)
}
Some(_) => BARE_ALIAS_IDENT,
None => EMPTY_NOMATCH,
None => Node::Subgrammar(&EMPTY_NOMATCH),
}
}
@@ -184,10 +187,10 @@ fn table_source_bare_alias_factory(
Some(word)
if TABLE_SOURCE_FOLLOW_SET.iter().any(|k| *k == word) =>
{
EMPTY_NOMATCH
Node::Subgrammar(&EMPTY_NOMATCH)
}
Some(_) => BARE_ALIAS_IDENT,
None => EMPTY_NOMATCH,
None => Node::Subgrammar(&EMPTY_NOMATCH),
}
}
@@ -209,23 +212,23 @@ static AS_ALIAS_NODES: &[Node] = &[
Node::Word(Word::keyword("as")),
BARE_ALIAS_IDENT,
];
const AS_ALIAS_EXPLICIT: Node = Node::Seq(AS_ALIAS_NODES);
static AS_ALIAS_EXPLICIT: Node = Node::Seq(AS_ALIAS_NODES);
static PROJECTION_ALIAS_CHOICES: &[Node] = &[
AS_ALIAS_EXPLICIT,
Node::Subgrammar(&AS_ALIAS_EXPLICIT),
Node::Lookahead(projection_bare_alias_factory),
];
const PROJECTION_ALIAS_CHOICE: Node = Node::Choice(PROJECTION_ALIAS_CHOICES);
const PROJECTION_ALIAS_OPTIONAL: Node =
static PROJECTION_ALIAS_CHOICE: Node = Node::Choice(PROJECTION_ALIAS_CHOICES);
static PROJECTION_ALIAS_OPTIONAL: Node =
Node::Optional(&PROJECTION_ALIAS_CHOICE);
static TABLE_SOURCE_ALIAS_CHOICES: &[Node] = &[
AS_ALIAS_EXPLICIT,
Node::Subgrammar(&AS_ALIAS_EXPLICIT),
Node::Lookahead(table_source_bare_alias_factory),
];
const TABLE_SOURCE_ALIAS_CHOICE: Node =
static TABLE_SOURCE_ALIAS_CHOICE: Node =
Node::Choice(TABLE_SOURCE_ALIAS_CHOICES);
const TABLE_SOURCE_ALIAS_OPTIONAL: Node =
static TABLE_SOURCE_ALIAS_OPTIONAL: Node =
Node::Optional(&TABLE_SOURCE_ALIAS_CHOICE);
// =================================================================
@@ -247,13 +250,13 @@ static QUALIFIED_STAR_NODES: &[Node] = &[
Node::Punct('.'),
Node::Punct('*'),
];
const QUALIFIED_STAR: Node = Node::Seq(QUALIFIED_STAR_NODES);
static QUALIFIED_STAR: Node = Node::Seq(QUALIFIED_STAR_NODES);
static PROJECTION_EXPR_ITEM_NODES: &[Node] = &[
SQL_EXPR,
PROJECTION_ALIAS_OPTIONAL,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
Node::Subgrammar(&PROJECTION_ALIAS_OPTIONAL),
];
const PROJECTION_EXPR_ITEM: Node = Node::Seq(PROJECTION_EXPR_ITEM_NODES);
static PROJECTION_EXPR_ITEM: Node = Node::Seq(PROJECTION_EXPR_ITEM_NODES);
/// Dispatch one projection item via a 3-token lookahead.
///
@@ -280,16 +283,16 @@ fn projection_item_factory(
if bytes.get(after_ident) == Some(&b'.') {
let after_dot = skip_whitespace(source, after_ident + 1);
if bytes.get(after_dot) == Some(&b'*') {
return QUALIFIED_STAR;
return Node::Subgrammar(&QUALIFIED_STAR);
}
}
}
PROJECTION_EXPR_ITEM
Node::Subgrammar(&PROJECTION_EXPR_ITEM)
}
const PROJECTION_ITEM: Node = Node::Lookahead(projection_item_factory);
static PROJECTION_ITEM: Node = Node::Lookahead(projection_item_factory);
const PROJECTION_LIST: Node = Node::Repeated {
static PROJECTION_LIST: Node = Node::Repeated {
inner: &PROJECTION_ITEM,
separator: Some(&COMMA),
min: 1,
@@ -303,8 +306,8 @@ static DISTINCT_OR_ALL_CHOICES: &[Node] = &[
Node::Word(Word::keyword("distinct")),
Node::Word(Word::keyword("all")),
];
const DISTINCT_OR_ALL_CHOICE: Node = Node::Choice(DISTINCT_OR_ALL_CHOICES);
const DISTINCT_OR_ALL_OPTIONAL: Node =
static DISTINCT_OR_ALL_CHOICE: Node = Node::Choice(DISTINCT_OR_ALL_CHOICES);
static DISTINCT_OR_ALL_OPTIONAL: Node =
Node::Optional(&DISTINCT_OR_ALL_CHOICE);
// =================================================================
@@ -323,9 +326,9 @@ const TABLE_NAME_IDENT: Node = Node::Ident {
static TABLE_SOURCE_NODES: &[Node] = &[
TABLE_NAME_IDENT,
TABLE_SOURCE_ALIAS_OPTIONAL,
Node::Subgrammar(&TABLE_SOURCE_ALIAS_OPTIONAL),
];
const TABLE_SOURCE: Node = Node::Seq(TABLE_SOURCE_NODES);
static TABLE_SOURCE: Node = Node::Seq(TABLE_SOURCE_NODES);
// =================================================================
// JOIN flavours
@@ -333,7 +336,7 @@ const TABLE_SOURCE: Node = Node::Seq(TABLE_SOURCE_NODES);
const JOIN_WORD: Node = Node::Word(Word::keyword("join"));
const ON_WORD: Node = Node::Word(Word::keyword("on"));
const OUTER_OPTIONAL: Node =
static OUTER_OPTIONAL: Node =
Node::Optional(&Node::Word(Word::keyword("outer")));
// `INNER JOIN` and bare `JOIN` are split into two Choice
@@ -344,49 +347,49 @@ const OUTER_OPTIONAL: Node =
static INNER_JOIN_NODES: &[Node] = &[
Node::Word(Word::keyword("inner")),
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
ON_WORD,
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static BARE_JOIN_NODES: &[Node] = &[
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
ON_WORD,
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static LEFT_JOIN_NODES: &[Node] = &[
Node::Word(Word::keyword("left")),
OUTER_OPTIONAL,
Node::Subgrammar(&OUTER_OPTIONAL),
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
ON_WORD,
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static RIGHT_JOIN_NODES: &[Node] = &[
Node::Word(Word::keyword("right")),
OUTER_OPTIONAL,
Node::Subgrammar(&OUTER_OPTIONAL),
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
ON_WORD,
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static FULL_JOIN_NODES: &[Node] = &[
Node::Word(Word::keyword("full")),
OUTER_OPTIONAL,
Node::Subgrammar(&OUTER_OPTIONAL),
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
ON_WORD,
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static CROSS_JOIN_NODES: &[Node] = &[
Node::Word(Word::keyword("cross")),
JOIN_WORD,
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
];
/// JOIN flavour dispatch. Each branch has a distinct leading
@@ -401,7 +404,7 @@ static JOIN_CLAUSE_CHOICES: &[Node] = &[
Node::Seq(INNER_JOIN_NODES),
Node::Seq(BARE_JOIN_NODES),
];
const JOIN_CLAUSE: Node = Node::Choice(JOIN_CLAUSE_CHOICES);
static JOIN_CLAUSE: Node = Node::Choice(JOIN_CLAUSE_CHOICES);
// =================================================================
// FROM / WHERE / GROUP BY / HAVING
@@ -409,37 +412,37 @@ const JOIN_CLAUSE: Node = Node::Choice(JOIN_CLAUSE_CHOICES);
static FROM_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("from")),
TABLE_SOURCE,
Node::Subgrammar(&TABLE_SOURCE),
Node::Repeated {
inner: &JOIN_CLAUSE,
separator: None,
min: 0,
},
];
const FROM_CLAUSE: Node = Node::Seq(FROM_CLAUSE_NODES);
static FROM_CLAUSE: Node = Node::Seq(FROM_CLAUSE_NODES);
static WHERE_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("where")),
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
const WHERE_CLAUSE: Node = Node::Seq(WHERE_CLAUSE_NODES);
static WHERE_CLAUSE: Node = Node::Seq(WHERE_CLAUSE_NODES);
static GROUP_BY_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("group")),
Node::Word(Word::keyword("by")),
Node::Repeated {
inner: &SQL_EXPR,
inner: &Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
separator: Some(&COMMA),
min: 1,
},
];
const GROUP_BY_CLAUSE: Node = Node::Seq(GROUP_BY_CLAUSE_NODES);
static GROUP_BY_CLAUSE: Node = Node::Seq(GROUP_BY_CLAUSE_NODES);
static HAVING_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("having")),
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
const HAVING_CLAUSE: Node = Node::Seq(HAVING_CLAUSE_NODES);
static HAVING_CLAUSE: Node = Node::Seq(HAVING_CLAUSE_NODES);
// =================================================================
// ORDER BY / LIMIT / OFFSET
@@ -449,12 +452,12 @@ static ASC_DESC_CHOICES: &[Node] = &[
Node::Word(Word::keyword("asc")),
Node::Word(Word::keyword("desc")),
];
const ASC_DESC_CHOICE: Node = Node::Choice(ASC_DESC_CHOICES);
static ASC_DESC_CHOICE: Node = Node::Choice(ASC_DESC_CHOICES);
static ORDER_ITEM_NODES: &[Node] = &[
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
Node::Optional(&ASC_DESC_CHOICE),
];
const ORDER_ITEM: Node = Node::Seq(ORDER_ITEM_NODES);
static ORDER_ITEM: Node = Node::Seq(ORDER_ITEM_NODES);
static ORDER_BY_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("order")),
@@ -465,21 +468,21 @@ static ORDER_BY_CLAUSE_NODES: &[Node] = &[
min: 1,
},
];
const ORDER_BY_CLAUSE: Node = Node::Seq(ORDER_BY_CLAUSE_NODES);
static ORDER_BY_CLAUSE: Node = Node::Seq(ORDER_BY_CLAUSE_NODES);
static OFFSET_NODES: &[Node] = &[
Node::Word(Word::keyword("offset")),
SQL_EXPR,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
const OFFSET_SEQ: Node = Node::Seq(OFFSET_NODES);
const OFFSET_OPTIONAL: Node = Node::Optional(&OFFSET_SEQ);
static OFFSET_SEQ: Node = Node::Seq(OFFSET_NODES);
static OFFSET_OPTIONAL: Node = Node::Optional(&OFFSET_SEQ);
static LIMIT_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("limit")),
SQL_EXPR,
OFFSET_OPTIONAL,
Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
Node::Subgrammar(&OFFSET_OPTIONAL),
];
const LIMIT_CLAUSE: Node = Node::Seq(LIMIT_CLAUSE_NODES);
static LIMIT_CLAUSE: Node = Node::Seq(LIMIT_CLAUSE_NODES);
// =================================================================
// select_core (per-leg of a compound)
@@ -487,14 +490,14 @@ const LIMIT_CLAUSE: Node = Node::Seq(LIMIT_CLAUSE_NODES);
static SELECT_CORE_NODES: &[Node] = &[
Node::Word(Word::keyword("select")),
DISTINCT_OR_ALL_OPTIONAL,
PROJECTION_LIST,
Node::Subgrammar(&DISTINCT_OR_ALL_OPTIONAL),
Node::Subgrammar(&PROJECTION_LIST),
Node::Optional(&FROM_CLAUSE),
Node::Optional(&WHERE_CLAUSE),
Node::Optional(&GROUP_BY_CLAUSE),
Node::Optional(&HAVING_CLAUSE),
];
const SELECT_CORE: Node = Node::Seq(SELECT_CORE_NODES);
static SELECT_CORE: Node = Node::Seq(SELECT_CORE_NODES);
// =================================================================
// compound_select
@@ -518,13 +521,14 @@ static SET_OP_CHOICES: &[Node] = &[
Node::Word(Word::keyword("intersect")),
Node::Word(Word::keyword("except")),
];
const SET_OP: Node = Node::Choice(SET_OP_CHOICES);
static SET_OP: Node = Node::Choice(SET_OP_CHOICES);
static SET_OP_TAIL_NODES: &[Node] = &[SET_OP, SELECT_CORE];
const SET_OP_TAIL: Node = Node::Seq(SET_OP_TAIL_NODES);
static SET_OP_TAIL_NODES: &[Node] =
&[Node::Subgrammar(&SET_OP), Node::Subgrammar(&SELECT_CORE)];
static SET_OP_TAIL: Node = Node::Seq(SET_OP_TAIL_NODES);
static COMPOUND_SELECT_NODES: &[Node] = &[
SELECT_CORE,
Node::Subgrammar(&SELECT_CORE),
Node::Repeated {
inner: &SET_OP_TAIL,
separator: None,
@@ -572,24 +576,28 @@ static CTE_COLUMN_LIST_NODES: &[Node] = &[
},
RPAREN,
];
const CTE_COLUMN_LIST_SEQ: Node = Node::Seq(CTE_COLUMN_LIST_NODES);
const CTE_COLUMN_LIST_OPTIONAL: Node =
static CTE_COLUMN_LIST_SEQ: Node = Node::Seq(CTE_COLUMN_LIST_NODES);
static CTE_COLUMN_LIST_OPTIONAL: Node =
Node::Optional(&CTE_COLUMN_LIST_SEQ);
// CTE body recursion pushes a fresh lexical scope frame (ADR-
// 0032 §4 / §10.2). Subqueries in `sql_expr.rs` do the same;
// the top-level statement's own COMPOUND embedding does not
// (it shares the implicit bottom frame).
static CTE_BODY_NODES: &[Node] = &[
LPAREN,
Node::Subgrammar(&SQL_SELECT_COMPOUND),
Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND),
RPAREN,
];
const CTE_BODY: Node = Node::Seq(CTE_BODY_NODES);
static CTE_BODY: Node = Node::Seq(CTE_BODY_NODES);
static CTE_DEF_NODES: &[Node] = &[
CTE_NAME_IDENT,
CTE_COLUMN_LIST_OPTIONAL,
Node::Subgrammar(&CTE_COLUMN_LIST_OPTIONAL),
Node::Word(Word::keyword("as")),
CTE_BODY,
Node::Subgrammar(&CTE_BODY),
];
const CTE_DEF: Node = Node::Seq(CTE_DEF_NODES);
static CTE_DEF: Node = Node::Seq(CTE_DEF_NODES);
static WITH_CLAUSE_NODES: &[Node] = &[
Node::Word(Word::keyword("with")),
@@ -600,7 +608,7 @@ static WITH_CLAUSE_NODES: &[Node] = &[
min: 1,
},
];
const WITH_CLAUSE: Node = Node::Seq(WITH_CLAUSE_NODES);
static WITH_CLAUSE: Node = Node::Seq(WITH_CLAUSE_NODES);
// =================================================================
// select_statement — the fragment entry point
@@ -1168,4 +1176,60 @@ mod tests {
"with x as (select 1) select * from x",
));
}
// ---- ADR-0032 §5/§6 — subqueries and qualified refs in
// ---- statement-level positions (sql_expr extensions
// ---- recurse through SQL_SELECT_COMPOUND via
// ---- ScopedSubgrammar).
#[test]
fn qualified_ref_in_where_clause() {
good("select * from t where t.id = 1");
good("select * from a join b on a.id = b.id");
good("select t.name from t where t.age > 18");
}
#[test]
fn scalar_subquery_in_where_clause() {
good("select * from t where x = (select y from u)");
good("select * from t where x > (select count(*) from u)");
}
#[test]
fn in_subquery_in_where_clause() {
good("select * from t where id in (select user_id from orders)");
good(
"select * from customers where id not in (select customer_id from blocklist)",
);
}
#[test]
fn exists_subquery_in_where_clause() {
good(
"select * from customers c where exists (select 1 from orders o where o.customer_id = c.id)",
);
good("select * from t where not exists (select 1 from u)");
}
#[test]
fn nested_subqueries() {
good(
"select * from t where x in (select y from u where y in (select z from v))",
);
}
#[test]
fn subquery_in_projection() {
good("select (select max(price) from products) from t");
good(
"select name, (select count(*) from orders where customer_id = c.id) from customers c",
);
}
#[test]
fn cte_body_references_qualified_columns() {
good(
"with x as (select t.name, t.age from t) select x.name from x",
);
}
}