b522d09f5a
Sub-phase 2b checkpoint 3 — the `writes_table` / `writes_table_alias` flags now drive the multi-binding `from_scope` accumulator on the top `ScopeFrame`.

Node::Ident gains `writes_table_alias: bool`. When set on an ident-name slot, the matched name lands on the most-recently-pushed `TableBinding`'s `alias`. All 46 existing Ident sites across the codebase are updated to `writes_table_alias: false` (mechanical — no behavioral change for DSL paths).

walk_ident's `writes_table` semantics extend:

- `IdentSource::Tables` matches with `writes_table: true` still populate `current_table` / `current_table_columns` as before (preserved for DSL paths that read those fields directly via the dynamic-subgrammar / column-writes machinery), AND now also push a fresh `TableBinding` onto the top ScopeFrame's `from_scope`.

  The two mechanisms coexist additively — current_table reflects the most-recent `writes_table` write (single-binding view, as before); from_scope is the authoritative multi-binding accumulator that SQL JOINs, subqueries, and CTE bodies use.

sql_select.rs splits the alias slot into two ident variants:

- `PROJECTION_BARE_ALIAS_IDENT` (role `projection_alias`) — no scope writes; capture into `projection_aliases` is 2b-5.
- `TABLE_SOURCE_BARE_ALIAS_IDENT` (role `table_alias`, `writes_table_alias: true`) — sets the top binding's alias.

The `AS alias` form likewise splits into PROJECTION_AS_ALIAS and TABLE_SOURCE_AS_ALIAS so each path threads through the correct ident. The bare-alias lookahead factories return the projection or table-source ident accordingly.

`TABLE_NAME_IDENT` in sql_select.rs gets `writes_table: true` so each FROM / JOIN table source pushes a binding. The schema-resolved columns are stored on the TableBinding for later use by qualified-prefix completion (2e) and the schema-existence diagnostic (2d).
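The accumulator behaviour described above can be modelled in a few lines. This is a minimal sketch, not the crate's code. `Scopes`, the simplified `ScopeFrame` / `TableBinding` structs, and the `write_table` / `write_table_alias` / `scoped` methods are hypothetical stand-ins for the real `WalkContext` machinery. The model works as follows:

- A `writes_table` ident updates the single-binding `current_table` view and pushes a binding onto the top frame.
- A `writes_table_alias` ident sets the alias on the most recently pushed binding.
- A `ScopedSubgrammar` walk runs in a fresh frame that is popped afterwards, so its bindings never reach the outer `from_scope`.

```rust
// Hypothetical model of the `from_scope` accumulator. These types are
// illustrative stand-ins, not the crate's real `WalkContext` API.

#[derive(Debug)]
struct TableBinding {
    name: String,
    alias: Option<String>,
    // Schema-resolved columns, kept for later qualified-prefix completion.
    columns: Vec<String>,
}

#[derive(Debug, Default)]
struct ScopeFrame {
    from_scope: Vec<TableBinding>,
}

struct Scopes {
    // frames[0] is the implicit bottom frame; `scoped` never pops it.
    frames: Vec<ScopeFrame>,
    // Legacy single-binding view (last write wins).
    current_table: Option<String>,
}

impl Scopes {
    fn new() -> Self {
        Scopes { frames: vec![ScopeFrame::default()], current_table: None }
    }

    /// A `Tables` ident with `writes_table: true`: update the legacy view
    /// and push a fresh binding onto the top frame.
    fn write_table(&mut self, name: &str, columns: &[&str]) {
        self.current_table = Some(name.to_string());
        if let Some(top) = self.frames.last_mut() {
            top.from_scope.push(TableBinding {
                name: name.to_string(),
                alias: None,
                columns: columns.iter().map(|c| c.to_string()).collect(),
            });
        }
    }

    /// An ident with `writes_table_alias: true`: the alias lands on the
    /// most recently pushed binding of the top frame (no-op if none).
    fn write_table_alias(&mut self, alias: &str) {
        if let Some(top) = self.frames.last_mut() {
            if let Some(binding) = top.from_scope.last_mut() {
                binding.alias = Some(alias.to_string());
            }
        }
    }

    /// A `ScopedSubgrammar` walk: run `body` in a fresh frame, then pop it.
    fn scoped<R>(&mut self, body: impl FnOnce(&mut Self) -> R) -> R {
        self.frames.push(ScopeFrame::default());
        let result = body(self);
        self.frames.pop();
        result
    }

    fn top_bindings(&self) -> &[TableBinding] {
        &self.frames.last().expect("bottom frame is never popped").from_scope
    }
}

fn main() {
    // Roughly: SELECT ... FROM users u JOIN orders AS o ON ...
    //          WHERE id IN (SELECT id FROM audit)
    let mut s = Scopes::new();
    s.write_table("users", &["id", "name"]);
    s.write_table_alias("u");
    s.write_table("orders", &["id", "user_id"]);
    s.write_table_alias("o");
    assert_eq!(s.current_table.as_deref(), Some("orders"));

    // The subquery sees only its own binding.
    let inner_len = s.scoped(|s| {
        s.write_table("audit", &["id"]);
        s.top_bindings().len()
    });
    assert_eq!(inner_len, 1);

    // The outer frame still holds exactly the two FROM / JOIN bindings, in order.
    let outer: Vec<_> = s
        .top_bindings()
        .iter()
        .map(|b| (b.name.as_str(), b.alias.as_deref()))
        .collect();
    assert_eq!(outer, vec![("users", Some("u")), ("orders", Some("o"))]);
    assert_eq!(s.top_bindings()[0].columns, vec!["id", "name"]);
    println!("{outer:?}");
}
```

The model keeps `current_table` as a separate last-write-wins field rather than deriving it from `from_scope`. That mirrors the commit's choice to let the two mechanisms coexist additively instead of rewriting the DSL paths that read `current_table`.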
Tests (9 new, all green):

- single from-table → one binding
- AS alias / bare alias on from-table → alias captured
- two-way JOIN → two bindings, correct order
- two-way JOIN with both aliased → two bindings with aliases
- three-way JOIN (left + bare) → three bindings in order
- subquery from_scope does not leak to outer scope (the ScopedSubgrammar push/pop discipline at work)
- CTE body from_scope does not leak to outer scope (the outer scope sees only the CTE-name reference, not the body's internals)
- SELECT without FROM → empty from_scope

All 1351 previous tests still pass — DSL paths untouched. Test totals: 1358 passing, 0 failed, 1 ignored. Clippy clean.

Frame is_cte_body marker, body-projection harvest, and projection_aliases population are the remaining 2b work (2b-4 and 2b-5).
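Several of the alias tests above depend on the bare-alias slot declining when the next word is a continuation keyword for its position. Below is a minimal sketch of that decision. It assumes a simplified ASCII identifier lexer in place of the crate's `skip_whitespace` / `consume_ident`. `next_ident` and `bare_alias` are hypothetical helpers; the two follow sets are copied from `sql_select.rs`. Returning `None` plays the role of `EMPTY_NOMATCH` (the enclosing `Optional` skips), and `Some(alias)` corresponds to the alias ident firing.

```rust
// Follow sets copied from sql_select.rs.
const PROJECTION_FOLLOW_SET: &[&str] = &[
    "from", "where", "group", "order", "having", "limit",
    "union", "intersect", "except",
];
const TABLE_SOURCE_FOLLOW_SET: &[&str] = &[
    "where", "group", "order", "having", "limit",
    "union", "intersect", "except",
    "inner", "left", "right", "full", "cross", "join", "on",
];

/// Byte range of the next identifier at or after `pos`, skipping
/// whitespace. A simplified ASCII-only lexer assumed for this sketch.
fn next_ident(source: &str, pos: usize) -> Option<(usize, usize)> {
    let bytes = source.as_bytes();
    let mut start = pos;
    while start < bytes.len() && bytes[start].is_ascii_whitespace() {
        start += 1;
    }
    let first = *bytes.get(start)?;
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return None;
    }
    let mut end = start + 1;
    while end < bytes.len() && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_') {
        end += 1;
    }
    Some((start, end))
}

/// `None` = decline (the enclosing `Optional` skips); `Some(alias)` =
/// the bare-alias ident fires. The alias keeps its original case.
fn bare_alias<'a>(source: &'a str, pos: usize, follow: &[&str]) -> Option<&'a str> {
    let (start, end) = next_ident(source, pos)?;
    let word = &source[start..end];
    let lower = word.to_ascii_lowercase();
    if follow.iter().any(|k| *k == lower) {
        None
    } else {
        Some(word)
    }
}

fn main() {
    // Positions are byte offsets just after the projection item / table name.
    assert_eq!(bare_alias("select name from users", 11, PROJECTION_FOLLOW_SET), None);
    assert_eq!(bare_alias("select name n from users", 11, PROJECTION_FOLLOW_SET), Some("n"));
    assert_eq!(bare_alias("from a JOIN b on x = y", 6, TABLE_SOURCE_FOLLOW_SET), None);
    assert_eq!(bare_alias("from users u join b on x = y", 10, TABLE_SOURCE_FOLLOW_SET), Some("u"));
    // The same word is judged differently per position.
    assert_eq!(bare_alias("from a join b", 6, PROJECTION_FOLLOW_SET), Some("join"));
    println!("all lookahead checks passed");
}
```

The last assertion shows why the follow sets are per-position. `join` ends a table source, but the projection slot does not decline it.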
1264 lines
40 KiB
Rust
//! The full SQL `SELECT` grammar fragment (ADR-0032).
//!
//! ADR-0030 Phase 2. This fragment is the standalone walkable
//! shape for the full standard-SQL `SELECT`: `INNER` / `LEFT` /
//! `RIGHT` / `FULL OUTER` / `CROSS` joins, `GROUP BY` / `HAVING`,
//! the four set operators (`UNION` / `UNION ALL` / `INTERSECT`
//! / `EXCEPT`), `WITH` and `WITH RECURSIVE` common table
//! expressions, `LIMIT … OFFSET`, `DISTINCT`, `t.*` projection,
//! and bare-alias projection (lifting ADR-0030 Phase 1 §4.2).
//!
//! Recursion into `SQL_SELECT_COMPOUND` is via `Node::Subgrammar`
//! at sub-phase 2a; sub-phase 2b replaces those references with
//! `Node::ScopedSubgrammar` for completion-scope discipline
//! (ADR-0032 §10.2). The Phase-1 `data::SELECT` `CommandNode`
//! continues to use its own grammar until sub-phase 2c's
//! migration — this fragment is reachable only from its own
//! tests in 2a.
//!
//! # BNF (ADR-0032 §1)
//!
//! ```text
//! select_statement := [ with_clause ] compound_select [ ';' ]
//! compound_select  := select_core ( set_op select_core )*
//!                     [ order_by_clause ]
//!                     [ limit_clause ]
//! set_op           := UNION [ ALL ] | INTERSECT | EXCEPT
//! select_core      := SELECT [ DISTINCT | ALL ]
//!                     projection_list
//!                     [ from_clause ]
//!                     [ where_clause ]
//!                     [ group_by_clause ]
//!                     [ having_clause ]
//! with_clause      := WITH [ RECURSIVE ] cte_def
//!                     ( ',' cte_def )*
//! cte_def          := identifier [ '(' column_name_list ')' ]
//!                     AS '(' compound_select ')'
//! projection_list  := projection_item ( ',' projection_item )*
//! projection_item  := '*'
//!                   | identifier '.' '*'
//!                   | sql_expr [ [ AS ] identifier ]
//! from_clause      := FROM table_source ( join_clause )*
//! table_source     := identifier [ [ AS ] identifier ]
//! join_clause      := [ INNER ] JOIN table_source ON sql_expr
//!                   | LEFT [ OUTER ] JOIN table_source ON sql_expr
//!                   | RIGHT [ OUTER ] JOIN table_source ON sql_expr
//!                   | FULL [ OUTER ] JOIN table_source ON sql_expr
//!                   | CROSS JOIN table_source
//! where_clause     := WHERE sql_expr
//! group_by_clause  := GROUP BY sql_expr ( ',' sql_expr )*
//! having_clause    := HAVING sql_expr
//! order_by_clause  := ORDER BY order_item ( ',' order_item )*
//! order_item       := sql_expr [ ASC | DESC ]
//! limit_clause     := LIMIT sql_expr [ OFFSET sql_expr ]
//! ```
//!
//! # Disambiguation via `Node::Lookahead`
//!
//! Two places need lookahead to dispatch cleanly:
//!
//! - **Projection item** (ADR-0032 §1 `projection_item`). Two
//!   of the three alternatives share a leading identifier: the
//!   `ident . *` qualified wildcard, and `sql_expr`, which also
//!   begins on an ident in the column-ref case. Bare `*` is told
//!   apart by its first byte. A factory peeks up to 3 tokens to
//!   pick `*`, `ident . *`, or `sql_expr [ alias ]`.
//!
//! - **Bare alias** (ADR-0032 §1 — lifts Phase-1 §4.2). The
//!   walker's `walk_ident` happily matches keyword-shaped tokens
//!   as identifiers, and `Choice`/`Optional` are
//!   first-match-wins (no backtracking on a successful match). To
//!   prevent bare-alias slots from swallowing continuation
//!   keywords, the alias slot is a `Lookahead` that returns an
//!   empty `Choice` (NoMatch) when the next ident-shaped token is
//!   a continuation keyword for that position.

use crate::dsl::grammar::{IdentSource, Node, ValidationError, Word, sql_expr};
use crate::dsl::walker::context::WalkContext;
use crate::dsl::walker::lex_helpers::{consume_ident, skip_whitespace};

// =================================================================
// Validators
// =================================================================

/// Reject internal `__rdbms_*` metadata tables in any
/// table-source slot (ADR-0030 §6 reused by ADR-0032 §4 — extends
/// to every Phase-2 table-source slot: `FROM`, `JOIN` targets,
/// CTE name, and the `FROM` inside any CTE body).
fn reject_internal_table(name: &str) -> Result<(), ValidationError> {
    if name.to_ascii_lowercase().starts_with("__rdbms_") {
        Err(ValidationError {
            message_key: "select.internal_table",
            args: vec![("table", name.to_string())],
        })
    } else {
        Ok(())
    }
}

// =================================================================
// Shared leaf nodes
// =================================================================

const COMMA: Node = Node::Punct(',');
const STAR: Node = Node::Punct('*');
const LPAREN: Node = Node::Punct('(');
const RPAREN: Node = Node::Punct(')');
const SEMI: Node = Node::Punct(';');

// SQL expression slot — `Node::Subgrammar(&sql_expr::SQL_OR_EXPR)`
// is inlined at each use site rather than aliased through a
// named `const`. The `const SQL_EXPR: Node = …` form triggered
// a Rust const-evaluation cycle through the sql_expr ⇄
// sql_select recursion (valid at link time, where statics
// resolve lazily, but not at const-eval). Stays as a plain
// `Subgrammar` — sql_expr recursion is part of the precedence
// ladder, not a new lexical scope (ADR-0032 §10.2).

/// A node that never matches. Used as the "no" branch of
/// lookahead-driven disambiguation: an empty `Choice` walks to
/// `NoMatch`, which `Optional` / `Choice` gracefully treat as
/// "skip" or "fall through to the next branch".
static EMPTY_NOMATCH: Node = Node::Choice(&[]);

// =================================================================
// Bare-alias dispatch (ADR-0032 §1)
// =================================================================
//
// The walker's `walk_ident` accepts any identifier-shape token,
// including keyword-shape ones. With `Optional` / `Choice`
// being first-match-wins, an unrestricted bare-alias slot would
// greedily consume `FROM` / `WHERE` / `JOIN` / etc. as if they
// were aliases. `Node::Lookahead` peeks the next token; when it
// matches a continuation keyword for this position, the factory
// returns `EMPTY_NOMATCH` so `Optional` skips and the keyword
// reaches the next clause.

/// Continuation keywords that may legitimately follow a
/// projection item's bare alias (or its absence). Includes the
/// `select_core` follow keywords and the compound-query / outer
/// suffix keywords. `as` is not listed — the AS-form alias is a
/// separate `Choice` branch that fires before the lookahead.
const PROJECTION_FOLLOW_SET: &[&str] = &[
    "from", "where", "group", "order", "having", "limit",
    "union", "intersect", "except",
];

/// Continuation keywords that may legitimately follow a table
/// source's bare alias (or its absence). Includes the join
/// keywords (so `FROM a JOIN b` doesn't read `JOIN` as `a`'s
/// alias) and the `select_core` / compound suffix keywords.
/// `on` is included because in `FROM a JOIN b ON …` the token
/// after `b` is `on` whenever `b` has no alias; `on` is not a
/// word a learner would plausibly type as an alias.
const TABLE_SOURCE_FOLLOW_SET: &[&str] = &[
    "where", "group", "order", "having", "limit",
    "union", "intersect", "except",
    "inner", "left", "right", "full", "cross", "join", "on",
];

/// Lowercased text of the next identifier-shaped token at or
/// after `pos` (leading whitespace skipped), or `None` if the
/// next token is not identifier-shaped.
fn peek_next_ident_lower(source: &str, pos: usize) -> Option<String> {
    let p = skip_whitespace(source, pos);
    consume_ident(source, p).map(|(s, e)| source[s..e].to_ascii_lowercase())
}

/// `Lookahead` factory for the projection bare-alias slot.
/// Declines (walks to `NoMatch`) on a `PROJECTION_FOLLOW_SET`
/// keyword or when no identifier follows.
fn projection_bare_alias_factory(
    _: &WalkContext,
    source: &str,
    pos: usize,
) -> Node {
    match peek_next_ident_lower(source, pos) {
        Some(word)
            if PROJECTION_FOLLOW_SET.iter().any(|k| *k == word) =>
        {
            Node::Subgrammar(&EMPTY_NOMATCH)
        }
        Some(_) => PROJECTION_BARE_ALIAS_IDENT,
        None => Node::Subgrammar(&EMPTY_NOMATCH),
    }
}

/// `Lookahead` factory for the table-source bare-alias slot; same
/// shape as the projection factory, keyed on
/// `TABLE_SOURCE_FOLLOW_SET`.
fn table_source_bare_alias_factory(
    _: &WalkContext,
    source: &str,
    pos: usize,
) -> Node {
    match peek_next_ident_lower(source, pos) {
        Some(word)
            if TABLE_SOURCE_FOLLOW_SET.iter().any(|k| *k == word) =>
        {
            Node::Subgrammar(&EMPTY_NOMATCH)
        }
        Some(_) => TABLE_SOURCE_BARE_ALIAS_IDENT,
        None => Node::Subgrammar(&EMPTY_NOMATCH),
    }
}

// =================================================================
// Alias slot
// =================================================================

/// Projection-list alias slot. `writes_table_alias` stays
/// `false` — the projection alias is not a table binding's
/// alias. (Capture into `projection_aliases` lands in 2b-5.)
const PROJECTION_BARE_ALIAS_IDENT: Node = Node::Ident {
    source: IdentSource::NewName,
    role: "projection_alias",
    validator: None,
    highlight_override: None,
    writes_table: false,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: false,
};

/// Table-source alias slot — `writes_table_alias: true` so the
/// matched name lands on the most-recently-pushed
/// `TableBinding`'s `alias` (ADR-0032 §10.1).
const TABLE_SOURCE_BARE_ALIAS_IDENT: Node = Node::Ident {
    source: IdentSource::NewName,
    role: "table_alias",
    validator: None,
    highlight_override: None,
    writes_table: false,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: true,
};

static PROJECTION_AS_ALIAS_NODES: &[Node] = &[
    Node::Word(Word::keyword("as")),
    PROJECTION_BARE_ALIAS_IDENT,
];
static PROJECTION_AS_ALIAS: Node = Node::Seq(PROJECTION_AS_ALIAS_NODES);

static TABLE_SOURCE_AS_ALIAS_NODES: &[Node] = &[
    Node::Word(Word::keyword("as")),
    TABLE_SOURCE_BARE_ALIAS_IDENT,
];
static TABLE_SOURCE_AS_ALIAS: Node = Node::Seq(TABLE_SOURCE_AS_ALIAS_NODES);

static PROJECTION_ALIAS_CHOICES: &[Node] = &[
    Node::Subgrammar(&PROJECTION_AS_ALIAS),
    Node::Lookahead(projection_bare_alias_factory),
];
static PROJECTION_ALIAS_CHOICE: Node = Node::Choice(PROJECTION_ALIAS_CHOICES);
static PROJECTION_ALIAS_OPTIONAL: Node =
    Node::Optional(&PROJECTION_ALIAS_CHOICE);

static TABLE_SOURCE_ALIAS_CHOICES: &[Node] = &[
    Node::Subgrammar(&TABLE_SOURCE_AS_ALIAS),
    Node::Lookahead(table_source_bare_alias_factory),
];
static TABLE_SOURCE_ALIAS_CHOICE: Node =
    Node::Choice(TABLE_SOURCE_ALIAS_CHOICES);
static TABLE_SOURCE_ALIAS_OPTIONAL: Node =
    Node::Optional(&TABLE_SOURCE_ALIAS_CHOICE);

// =================================================================
// Projection item
// =================================================================

const QUALIFIED_STAR_QUALIFIER: Node = Node::Ident {
    source: IdentSource::Tables,
    role: "qualified_star_qualifier",
    validator: None,
    highlight_override: None,
    writes_table: false,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: false,
};

static QUALIFIED_STAR_NODES: &[Node] = &[
    QUALIFIED_STAR_QUALIFIER,
    Node::Punct('.'),
    Node::Punct('*'),
];
static QUALIFIED_STAR: Node = Node::Seq(QUALIFIED_STAR_NODES);

static PROJECTION_EXPR_ITEM_NODES: &[Node] = &[
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
    Node::Subgrammar(&PROJECTION_ALIAS_OPTIONAL),
];
static PROJECTION_EXPR_ITEM: Node = Node::Seq(PROJECTION_EXPR_ITEM_NODES);

/// Dispatch one projection item via a 3-token lookahead.
///
/// - `*` (and only `*`) → bare wildcard.
/// - `ident . *` → qualified wildcard.
/// - anything else → `sql_expr [ alias ]`.
///
/// The factory is the cleanest way to handle the shared-prefix
/// ambiguity between `t.*` and `sql_expr` (which can match a
/// bare `t`), since the walker's `Choice` doesn't backtrack on
/// a committed match.
fn projection_item_factory(
    _: &WalkContext,
    source: &str,
    pos: usize,
) -> Node {
    let p = skip_whitespace(source, pos);
    let bytes = source.as_bytes();
    if bytes.get(p) == Some(&b'*') {
        return STAR;
    }
    if let Some((_, end1)) = consume_ident(source, p) {
        let after_ident = skip_whitespace(source, end1);
        if bytes.get(after_ident) == Some(&b'.') {
            let after_dot = skip_whitespace(source, after_ident + 1);
            if bytes.get(after_dot) == Some(&b'*') {
                return Node::Subgrammar(&QUALIFIED_STAR);
            }
        }
    }
    Node::Subgrammar(&PROJECTION_EXPR_ITEM)
}

static PROJECTION_ITEM: Node = Node::Lookahead(projection_item_factory);

static PROJECTION_LIST: Node = Node::Repeated {
    inner: &PROJECTION_ITEM,
    separator: Some(&COMMA),
    min: 1,
};

// =================================================================
// DISTINCT / ALL prefix
// =================================================================

static DISTINCT_OR_ALL_CHOICES: &[Node] = &[
    Node::Word(Word::keyword("distinct")),
    Node::Word(Word::keyword("all")),
];
static DISTINCT_OR_ALL_CHOICE: Node = Node::Choice(DISTINCT_OR_ALL_CHOICES);
static DISTINCT_OR_ALL_OPTIONAL: Node =
    Node::Optional(&DISTINCT_OR_ALL_CHOICE);

// =================================================================
// Table source (FROM / JOIN target)
// =================================================================

const TABLE_NAME_IDENT: Node = Node::Ident {
    source: IdentSource::Tables,
    role: "table_name",
    validator: Some(reject_internal_table),
    highlight_override: None,
    writes_table: true,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: false,
};

static TABLE_SOURCE_NODES: &[Node] = &[
    TABLE_NAME_IDENT,
    Node::Subgrammar(&TABLE_SOURCE_ALIAS_OPTIONAL),
];
static TABLE_SOURCE: Node = Node::Seq(TABLE_SOURCE_NODES);

// =================================================================
// JOIN flavours
// =================================================================

const JOIN_WORD: Node = Node::Word(Word::keyword("join"));
const ON_WORD: Node = Node::Word(Word::keyword("on"));
static OUTER_OPTIONAL: Node =
    Node::Optional(&Node::Word(Word::keyword("outer")));

// `INNER JOIN` and bare `JOIN` are split into two Choice
// branches so each branch has a distinct leading keyword
// (`inner` vs `join`). Avoids the "optional leading child →
// idx > 0 → EOF becomes Incomplete" hazard in walk_seq that a
// shared `Optional(Word("inner"))` would otherwise create.
static INNER_JOIN_NODES: &[Node] = &[
    Node::Word(Word::keyword("inner")),
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
    ON_WORD,
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];

static BARE_JOIN_NODES: &[Node] = &[
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
    ON_WORD,
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];

static LEFT_JOIN_NODES: &[Node] = &[
    Node::Word(Word::keyword("left")),
    Node::Subgrammar(&OUTER_OPTIONAL),
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
    ON_WORD,
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];

static RIGHT_JOIN_NODES: &[Node] = &[
    Node::Word(Word::keyword("right")),
    Node::Subgrammar(&OUTER_OPTIONAL),
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
    ON_WORD,
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];

static FULL_JOIN_NODES: &[Node] = &[
    Node::Word(Word::keyword("full")),
    Node::Subgrammar(&OUTER_OPTIONAL),
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
    ON_WORD,
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];

static CROSS_JOIN_NODES: &[Node] = &[
    Node::Word(Word::keyword("cross")),
    JOIN_WORD,
    Node::Subgrammar(&TABLE_SOURCE),
];

/// JOIN flavour dispatch. Each branch has a distinct leading
/// keyword so `Choice` first-match-wins discriminates cleanly
/// without invoking the walker's `Optional`-leading-child
/// hazard.
static JOIN_CLAUSE_CHOICES: &[Node] = &[
    Node::Seq(LEFT_JOIN_NODES),
    Node::Seq(RIGHT_JOIN_NODES),
    Node::Seq(FULL_JOIN_NODES),
    Node::Seq(CROSS_JOIN_NODES),
    Node::Seq(INNER_JOIN_NODES),
    Node::Seq(BARE_JOIN_NODES),
];
static JOIN_CLAUSE: Node = Node::Choice(JOIN_CLAUSE_CHOICES);

// =================================================================
// FROM / WHERE / GROUP BY / HAVING
// =================================================================

static FROM_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("from")),
    Node::Subgrammar(&TABLE_SOURCE),
    Node::Repeated {
        inner: &JOIN_CLAUSE,
        separator: None,
        min: 0,
    },
];
static FROM_CLAUSE: Node = Node::Seq(FROM_CLAUSE_NODES);

static WHERE_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("where")),
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static WHERE_CLAUSE: Node = Node::Seq(WHERE_CLAUSE_NODES);

static GROUP_BY_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("group")),
    Node::Word(Word::keyword("by")),
    Node::Repeated {
        inner: &Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
        separator: Some(&COMMA),
        min: 1,
    },
];
static GROUP_BY_CLAUSE: Node = Node::Seq(GROUP_BY_CLAUSE_NODES);

static HAVING_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("having")),
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static HAVING_CLAUSE: Node = Node::Seq(HAVING_CLAUSE_NODES);

// =================================================================
// ORDER BY / LIMIT / OFFSET
// =================================================================

static ASC_DESC_CHOICES: &[Node] = &[
    Node::Word(Word::keyword("asc")),
    Node::Word(Word::keyword("desc")),
];
static ASC_DESC_CHOICE: Node = Node::Choice(ASC_DESC_CHOICES);
static ORDER_ITEM_NODES: &[Node] = &[
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
    Node::Optional(&ASC_DESC_CHOICE),
];
static ORDER_ITEM: Node = Node::Seq(ORDER_ITEM_NODES);

static ORDER_BY_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("order")),
    Node::Word(Word::keyword("by")),
    Node::Repeated {
        inner: &ORDER_ITEM,
        separator: Some(&COMMA),
        min: 1,
    },
];
static ORDER_BY_CLAUSE: Node = Node::Seq(ORDER_BY_CLAUSE_NODES);

static OFFSET_NODES: &[Node] = &[
    Node::Word(Word::keyword("offset")),
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
];
static OFFSET_SEQ: Node = Node::Seq(OFFSET_NODES);
static OFFSET_OPTIONAL: Node = Node::Optional(&OFFSET_SEQ);

static LIMIT_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("limit")),
    Node::Subgrammar(&sql_expr::SQL_OR_EXPR),
    Node::Subgrammar(&OFFSET_OPTIONAL),
];
static LIMIT_CLAUSE: Node = Node::Seq(LIMIT_CLAUSE_NODES);

// =================================================================
// select_core (per-leg of a compound)
// =================================================================

static SELECT_CORE_NODES: &[Node] = &[
    Node::Word(Word::keyword("select")),
    Node::Subgrammar(&DISTINCT_OR_ALL_OPTIONAL),
    Node::Subgrammar(&PROJECTION_LIST),
    Node::Optional(&FROM_CLAUSE),
    Node::Optional(&WHERE_CLAUSE),
    Node::Optional(&GROUP_BY_CLAUSE),
    Node::Optional(&HAVING_CLAUSE),
];
static SELECT_CORE: Node = Node::Seq(SELECT_CORE_NODES);

// =================================================================
// compound_select
// =================================================================

// `UNION` and `UNION ALL` are factored as one `Seq[union,
// Optional(all)]` branch so the Choice doesn't commit on `union`
// inside a multi-token branch and then fail when `all` is
// missing. The trailing `Optional(all)` is the last child of
// the Seq, so a skip there doesn't trigger the
// optional-leading-then-EOF-becomes-Incomplete hazard.
static UNION_OR_UNION_ALL_NODES: &[Node] = &[
    Node::Word(Word::keyword("union")),
    Node::Optional(&Node::Word(Word::keyword("all"))),
];
static SET_OP_CHOICES: &[Node] = &[
    Node::Seq(UNION_OR_UNION_ALL_NODES),
    Node::Word(Word::keyword("intersect")),
    Node::Word(Word::keyword("except")),
];
static SET_OP: Node = Node::Choice(SET_OP_CHOICES);

static SET_OP_TAIL_NODES: &[Node] =
    &[Node::Subgrammar(&SET_OP), Node::Subgrammar(&SELECT_CORE)];
static SET_OP_TAIL: Node = Node::Seq(SET_OP_TAIL_NODES);

static COMPOUND_SELECT_NODES: &[Node] = &[
    Node::Subgrammar(&SELECT_CORE),
    Node::Repeated {
        inner: &SET_OP_TAIL,
        separator: None,
        min: 0,
    },
    Node::Optional(&ORDER_BY_CLAUSE),
    Node::Optional(&LIMIT_CLAUSE),
];
/// The compound-select fragment that subqueries / CTE bodies
/// recurse into via `Subgrammar` (2a) / `ScopedSubgrammar` (2b).
/// Omits the outer `with_clause`; that lives on
/// `SQL_SELECT_STATEMENT`.
pub static SQL_SELECT_COMPOUND: Node = Node::Seq(COMPOUND_SELECT_NODES);

// =================================================================
// CTE definitions
// =================================================================

const CTE_NAME_IDENT: Node = Node::Ident {
    source: IdentSource::NewName,
    role: "cte_name",
    validator: Some(reject_internal_table),
    highlight_override: None,
    writes_table: false,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: false,
};

const CTE_COLUMN_IDENT: Node = Node::Ident {
    source: IdentSource::NewName,
    role: "cte_column",
    validator: None,
    highlight_override: None,
    writes_table: false,
    writes_column: false,
    writes_user_listed_column: false,
    writes_table_alias: false,
};

static CTE_COLUMN_LIST_NODES: &[Node] = &[
    LPAREN,
    Node::Repeated {
        inner: &CTE_COLUMN_IDENT,
        separator: Some(&COMMA),
        min: 1,
    },
    RPAREN,
];
static CTE_COLUMN_LIST_SEQ: Node = Node::Seq(CTE_COLUMN_LIST_NODES);
static CTE_COLUMN_LIST_OPTIONAL: Node =
    Node::Optional(&CTE_COLUMN_LIST_SEQ);

// CTE body recursion pushes a fresh lexical scope frame
// (ADR-0032 §4 / §10.2). Subqueries in `sql_expr.rs` do the same;
// the top-level statement's own COMPOUND embedding does not
// (it shares the implicit bottom frame).
static CTE_BODY_NODES: &[Node] = &[
    LPAREN,
    Node::ScopedSubgrammar(&SQL_SELECT_COMPOUND),
    RPAREN,
];
static CTE_BODY: Node = Node::Seq(CTE_BODY_NODES);

static CTE_DEF_NODES: &[Node] = &[
    CTE_NAME_IDENT,
    Node::Subgrammar(&CTE_COLUMN_LIST_OPTIONAL),
    Node::Word(Word::keyword("as")),
    Node::Subgrammar(&CTE_BODY),
];
static CTE_DEF: Node = Node::Seq(CTE_DEF_NODES);

static WITH_CLAUSE_NODES: &[Node] = &[
    Node::Word(Word::keyword("with")),
    Node::Optional(&Node::Word(Word::keyword("recursive"))),
    Node::Repeated {
        inner: &CTE_DEF,
        separator: Some(&COMMA),
        min: 1,
    },
];
static WITH_CLAUSE: Node = Node::Seq(WITH_CLAUSE_NODES);

// =================================================================
// select_statement — the fragment entry point
// =================================================================

static SELECT_STATEMENT_NODES: &[Node] = &[
    Node::Optional(&WITH_CLAUSE),
    Node::Subgrammar(&SQL_SELECT_COMPOUND),
    Node::Optional(&SEMI),
];
/// The full statement, including the optional `WITH` prefix and
/// a tolerated trailing `;`. This is what `data::SELECT`'s
/// `CommandNode` will reference once sub-phase 2c migrates the
/// Phase-1 grammar.
pub static SQL_SELECT_STATEMENT: Node = Node::Seq(SELECT_STATEMENT_NODES);

// =================================================================
// Tests
// =================================================================

#[cfg(test)]
mod tests {
    use super::{SQL_SELECT_COMPOUND, SQL_SELECT_STATEMENT};
    use crate::dsl::grammar::Node;
    use crate::dsl::walker::context::WalkContext;
    use crate::dsl::walker::driver::{NodeWalkResult, walk_node};
    use crate::dsl::walker::outcome::MatchedPath;

    /// Walk `input` against `fragment`. Returns `true` only when
    /// the walk matches *and* consumes all of `input` (trailing
    /// whitespace allowed).
    fn walks_via(fragment: &'static Node, input: &str) -> bool {
        let mut ctx = WalkContext::new();
        let mut path = MatchedPath::new();
        let mut per_byte = Vec::new();
        match walk_node(input, 0, fragment, &mut ctx, &mut path, &mut per_byte) {
            NodeWalkResult::Matched { end, .. } => {
                input[end..].trim().is_empty()
            }
            _ => false,
        }
    }

    fn walks(input: &str) -> bool {
        walks_via(&SQL_SELECT_STATEMENT, input)
    }

    fn good(input: &str) {
        assert!(
            walks(input),
            "{input:?} should be a valid SELECT statement"
        );
    }

    fn bad(input: &str) {
        assert!(
            !walks(input),
            "{input:?} should NOT walk as a complete SELECT statement"
        );
    }

    // ----- minimal forms -----

    #[test]
    fn bare_constant_select_with_no_from() {
        good("select 1");
        good("select 'hello'");
        good("select null");
        good("select true");
        good("select false");
    }

    #[test]
    fn single_table_select_star() {
        good("select * from users");
        good("select * from users;");
    }

    #[test]
    fn single_column_projection() {
        good("select name from users");
        good("select name, age from users");
        good("select name, age, email from users");
    }

    // ----- DISTINCT / ALL -----

    #[test]
    fn distinct_modifier() {
        good("select distinct name from users");
        good("select distinct a, b from t");
    }

    #[test]
    fn all_modifier() {
        good("select all name from users");
    }

    // Note: `select distinct all name from users` and the like
    // are admitted structurally — the second keyword parses as
    // a column reference (the walker doesn't reject keyword-shape
    // idents as columns). Engine semantics deals with it. This
    // matches ADR-0030's "grammar admits, engine rejects" posture.

    // ----- projection wildcard / qualified-star / alias -----

    #[test]
    fn qualified_star_projection() {
        good("select users.* from users");
        good("select u.* from users u");
        good("select a.*, b.* from a join b on x = y");
    }

    #[test]
    fn mixed_projection_with_qualified_star() {
        good("select users.*, age from users");
    }

    #[test]
    fn projection_with_as_alias() {
        good("select name as n from users");
        good("select name as n, age as a from users");
    }

    #[test]
    fn projection_with_bare_alias() {
        good("select name n from users");
        good("select name n, age a from users");
    }

    #[test]
    fn projection_alias_mixed_forms() {
        good("select name as n, age a, email from users");
    }

    #[test]
    fn projection_bare_alias_does_not_swallow_from() {
        // The bare-alias lookahead must skip when the next ident
        // is `from`; otherwise `from` would be consumed as the
        // alias of `name`, leaving `users` unconsumed and failing
        // the walk.
        good("select name from users");
    }

    #[test]
fn projection_bare_alias_does_not_swallow_where_or_group_etc() {
|
|
good("select name from users where id > 0");
|
|
good("select name from users group by name");
|
|
good("select name from users order by name");
|
|
good("select name from users limit 5");
|
|
good("select name from users group by name having count(*) > 1");
|
|
}

    #[test]
    fn projection_expression_with_arithmetic() {
        good("select a + b from t");
        good("select a + b as total from t");
        good("select a * 2 from t");
    }

    #[test]
    fn projection_function_calls() {
        good("select upper(name) from users");
        good("select count(*) from users");
        good("select count(distinct customer_id) from orders");
    }

    // ----- FROM / JOIN flavours -----

    #[test]
    fn from_with_table_alias() {
        good("select * from users u");
        good("select * from users as u");
    }

    #[test]
    fn inner_join_explicit() {
        good("select * from a inner join b on x = y");
    }

    #[test]
    fn inner_join_bare() {
        good("select * from a join b on x = y");
    }

    #[test]
    fn left_outer_join() {
        good("select * from a left join b on x = y");
        good("select * from a left outer join b on x = y");
    }

    #[test]
    fn right_outer_join() {
        good("select * from a right join b on x = y");
        good("select * from a right outer join b on x = y");
    }

    #[test]
    fn full_outer_join() {
        good("select * from a full join b on x = y");
        good("select * from a full outer join b on x = y");
    }

    #[test]
    fn cross_join() {
        good("select * from a cross join b");
    }

    #[test]
    fn cross_join_with_no_on() {
        // CROSS JOIN takes no ON; an ON clause is a parse error.
        bad("select * from a cross join b on x = y");
    }
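
    // Sketch with hypothetical types: the ON rule that
    // `cross_join_with_no_on` and `missing_join_target` pin down. CROSS JOIN
    // must not carry an ON clause; every other supported join kind must.
    // `JoinKind` and `constraint_ok` are illustrative names, not items from
    // sql_select.rs, where the rule is expressed structurally in the grammar.
    mod sketch_join_constraint {
        #[derive(Clone, Copy, Debug)]
        pub enum JoinKind {
            Inner,
            Left,
            Right,
            Full,
            Cross,
        }

        /// Whether the presence (or absence) of ON is valid for `kind`.
        pub fn constraint_ok(kind: JoinKind, has_on: bool) -> bool {
            match kind {
                JoinKind::Cross => !has_on,
                JoinKind::Inner | JoinKind::Left | JoinKind::Right | JoinKind::Full => has_on,
            }
        }

        #[test]
        fn on_is_required_except_for_cross_join() {
            assert!(constraint_ok(JoinKind::Cross, false));
            assert!(!constraint_ok(JoinKind::Cross, true));
            for kind in [JoinKind::Inner, JoinKind::Left, JoinKind::Right, JoinKind::Full] {
                assert!(constraint_ok(kind, true));
                assert!(!constraint_ok(kind, false));
            }
        }
    }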

    #[test]
    fn chained_joins() {
        good("select * from a join b on x = y join c on y = z");
        good("select * from a left join b on x = y inner join c on y = z");
    }

    #[test]
    fn join_with_table_aliases() {
        good("select * from a u join b v on x = y");
        good("select * from a as u join b as v on x = y");
    }
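
    // Simplified sketch of how FROM / JOIN table sources and their aliases
    // accumulate under the `writes_table` / `writes_table_alias` flags: each
    // table-name ident pushes a binding, and an alias ident (bare or `AS`)
    // sets the alias on the most recently pushed binding. The types below
    // only borrow the names `TableBinding` and `from_scope`; they are not the
    // real definitions, and returning `false` for an alias with no binding to
    // attach to is an illustrative choice.
    mod sketch_from_scope {
        #[derive(Debug, PartialEq)]
        pub struct TableBinding {
            pub table: String,
            pub alias: Option<String>,
        }

        #[derive(Default)]
        pub struct FromScope {
            pub bindings: Vec<TableBinding>,
        }

        impl FromScope {
            /// Models a `writes_table: true` match: push a fresh binding.
            pub fn push_table(&mut self, table: &str) {
                self.bindings.push(TableBinding { table: table.to_string(), alias: None });
            }

            /// Models a `writes_table_alias: true` match: alias the top binding.
            pub fn set_alias(&mut self, alias: &str) -> bool {
                match self.bindings.last_mut() {
                    Some(top) => {
                        top.alias = Some(alias.to_string());
                        true
                    }
                    None => false,
                }
            }
        }

        #[test]
        fn join_with_aliases_yields_ordered_bindings() {
            // Mirrors: select * from a as u join b v on x = y
            let mut scope = FromScope::default();
            scope.push_table("a");
            assert!(scope.set_alias("u"));
            scope.push_table("b");
            assert!(scope.set_alias("v"));
            let aliases: Vec<_> = scope.bindings.iter().map(|b| b.alias.as_deref()).collect();
            assert_eq!(aliases, vec![Some("u"), Some("v")]);
            assert_eq!(scope.bindings[0].table, "a");
        }
    }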

    // ----- WHERE / GROUP BY / HAVING -----

    #[test]
    fn where_clause() {
        good("select * from t where id = 1");
        good("select * from t where a > 0 and b < 10");
    }

    #[test]
    fn group_by_single_column() {
        good("select name from t group by name");
    }

    #[test]
    fn group_by_multiple_columns() {
        good("select a, b from t group by a, b");
    }

    #[test]
    fn group_by_expression() {
        good("select count(*) from t group by upper(name)");
    }

    #[test]
    fn having_clause() {
        good("select name from t group by name having count(*) > 1");
        // HAVING without GROUP BY is admitted structurally;
        // the engine may still reject it.
        good("select count(*) from t having count(*) > 0");
    }

    // ----- set operators -----

    #[test]
    fn union_two_selects() {
        good("select a from t union select b from u");
    }

    #[test]
    fn union_all_two_selects() {
        good("select a from t union all select b from u");
    }

    #[test]
    fn intersect_two_selects() {
        good("select a from t intersect select b from u");
    }

    #[test]
    fn except_two_selects() {
        good("select a from t except select b from u");
    }

    #[test]
    fn set_op_chain() {
        good(
            "select a from t union select b from u intersect select c from v",
        );
    }

    #[test]
    fn set_op_with_outer_order_by_and_limit() {
        good(
            "select a from t union select b from u order by a limit 10",
        );
    }

    // ----- ORDER BY / LIMIT / OFFSET -----

    #[test]
    fn order_by_single_column() {
        good("select * from t order by name");
    }

    #[test]
    fn order_by_with_direction() {
        good("select * from t order by name asc");
        good("select * from t order by name desc");
    }

    #[test]
    fn order_by_multiple_items() {
        good("select * from t order by name asc, age desc");
    }

    #[test]
    fn order_by_column_position() {
        // A column-position reference falls out of `sql_expr`
        // (an integer literal is a valid expression).
        good("select a, b from t order by 1");
        good("select a, b from t order by 1, 2 desc");
    }

    #[test]
    fn limit_only() {
        good("select * from t limit 10");
    }

    #[test]
    fn limit_with_offset() {
        good("select * from t limit 10 offset 5");
    }

    #[test]
    fn legacy_limit_comma_form_rejected() {
        // `LIMIT m, n` (offset-first MySQL/SQLite legacy) is
        // OOS per ADR-0032 §13 OOS-4.
        bad("select * from t limit 5, 10");
    }
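
    // Sketch, assuming pre-split tokens (tokenisation is not shown): the
    // LIMIT / OFFSET shapes the tests above admit. `limit N` and
    // `limit N offset M` are accepted; the legacy `limit M, N` comma form
    // (OOS-4) and a dangling `limit` or `offset` are rejected. `parse_limit`
    // is a hypothetical helper, not part of the grammar code.
    mod sketch_limit_clause {
        /// Returns `(limit, offset)` for an accepted clause, `None` otherwise.
        pub fn parse_limit(tokens: &[&str]) -> Option<(u64, Option<u64>)> {
            match tokens {
                [kw, n] if kw.eq_ignore_ascii_case("limit") => {
                    Some((n.parse::<u64>().ok()?, None))
                }
                [kw, n, off, m]
                    if kw.eq_ignore_ascii_case("limit") && off.eq_ignore_ascii_case("offset") =>
                {
                    Some((n.parse::<u64>().ok()?, Some(m.parse::<u64>().ok()?)))
                }
                _ => None,
            }
        }

        #[test]
        fn comma_form_and_dangling_keywords_are_rejected() {
            assert_eq!(parse_limit(&["limit", "10"]), Some((10, None)));
            assert_eq!(parse_limit(&["LIMIT", "10", "OFFSET", "5"]), Some((10, Some(5))));
            assert_eq!(parse_limit(&["limit", "5", ",", "10"]), None);
            assert_eq!(parse_limit(&["limit"]), None);
            assert_eq!(parse_limit(&["limit", "5", "offset"]), None);
        }
    }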

    // ----- CTEs -----

    #[test]
    fn non_recursive_cte() {
        good("with x as (select 1) select * from x");
    }

    #[test]
    fn non_recursive_cte_select_star() {
        good("with x as (select * from users) select * from x");
    }

    #[test]
    fn cte_with_column_list_rename() {
        good("with x(n) as (select name from users) select n from x");
        good("with x(a, b) as (select a, b from t) select * from x");
    }

    #[test]
    fn recursive_cte() {
        good(
            "with recursive r as (select 1 union all select 2) select * from r",
        );
    }

    #[test]
    fn multiple_ctes() {
        good(
            "with a as (select 1), b as (select 2) select * from a union select * from b",
        );
    }

    // ----- subquery shapes (recursion through SQL_SELECT_COMPOUND) -----
    //
    // Subquery expressions inside `sql_expr` came in 2b as additive
    // `Choice` branches in `sql_expr.rs`; the ADR-0032 §5/§6 tests at
    // the end of this module cover them. This 2a-era section verifies
    // that the compound fragment recurses cleanly from CTE bodies; the
    // depth cap itself is exercised by `pathological_nesting_capped`.

    #[test]
    fn nested_cte_body_with_union() {
        good(
            "with x as (select 1 union select 2) select * from x",
        );
    }

    // ----- case insensitivity / spacing -----

    #[test]
    fn keywords_are_case_insensitive() {
        good("SELECT * FROM users");
        good("Select Distinct A From T Where Id = 1 Order By A Desc Limit 5 Offset 2");
        good("WITH RECURSIVE r AS (SELECT 1 UNION ALL SELECT 2) SELECT * FROM r");
    }

    #[test]
    fn trailing_semicolon_tolerated() {
        good("select 1;");
        good("select * from users;");
        good("with x as (select 1) select * from x;");
    }

    // ----- malformed input -----

    #[test]
    fn empty_projection_rejected() {
        // Note: `select from t` is structurally admitted as
        // `<col "from"> AS <alias "t">`; the walker does not
        // reject keyword-shape idents as column refs. This
        // matches ADR-0030's posture (grammar admits, engine
        // rejects). The genuinely malformed `select` alone is
        // still rejected because there is no expression to
        // match.
        bad("select");
    }

    #[test]
    fn missing_join_target() {
        bad("select * from a join");
        bad("select * from a join b");
        bad("select * from a join b on");
    }

    #[test]
    fn dangling_set_op() {
        bad("select a from t union");
        bad("select a from t union select");
    }

    #[test]
    fn dangling_clauses() {
        bad("select a from t where");
        bad("select a from t order by");
        bad("select a from t group by");
        bad("select a from t having");
        bad("select a from t limit");
        bad("select a from t limit 5 offset");
    }

    #[test]
    fn cte_missing_body() {
        bad("with x as select 1");
        bad("with x as (");
        bad("with x as ()");
    }

    #[test]
    fn cte_missing_as() {
        bad("with x (select 1) select * from x");
    }

    #[test]
    fn bare_recursive_without_with_is_invalid() {
        bad("recursive r as (select 1) select * from r");
    }

    // ----- OOS shapes (ADR-0032 §13) -----

    #[test]
    fn comma_from_is_rejected() {
        // OOS-3: implicit cross join via comma list.
        bad("select * from a, b");
    }

    #[test]
    fn natural_join_rejected() {
        // OOS-2.
        bad("select * from a natural join b");
    }

    #[test]
    fn using_clause_rejected() {
        // OOS-2.
        bad("select * from a join b using (id)");
    }

    #[test]
    fn values_row_source_rejected() {
        // OOS-7.
        bad("select * from (values (1), (2))");
    }

    #[test]
    fn lateral_join_rejected() {
        // OOS-6. The comma-FROM form is rejected because
        // comma-separated FROM lists are not admitted (OOS-3),
        // so `from a, lateral …` cannot parse as a join. The
        // form where `lateral` directly precedes JOIN is, by
        // contrast, admitted structurally: `lateral` parses as
        // a bare table-source alias for `a`, and the JOIN that
        // follows is an ordinary join. This matches the rest of
        // the grammar's posture: keyword-shape identifiers are
        // admitted as names, and only non-admitted syntactic
        // forms (such as comma-FROM) make a query reject.
        bad("select * from a, lateral (select 1)");
    }

    #[test]
    fn window_function_rejected() {
        // OOS-5: `OVER (…)` window clauses are not part of the
        // Phase-2 grammar.
        bad("select row_number() over () from t");
        bad("select sum(x) over (partition by y) from t");
    }

    #[test]
    fn derived_table_in_from_rejected() {
        // OOS-1: `FROM (SELECT …) alias` is OOS.
        // CTEs cover the same use case.
        bad("select * from (select * from users) sub");
        bad("select * from (select * from users) as sub");
    }

    // ----- internal-table rejection (ADR-0030 §6) -----

    #[test]
    fn internal_table_in_from_rejected() {
        bad("select * from __rdbms_columns");
        bad("select * from __rdbms_playground_columns");
    }

    #[test]
    fn internal_table_as_cte_name_rejected() {
        bad("with __rdbms_x as (select 1) select * from __rdbms_x");
    }

    #[test]
    fn internal_table_in_cte_body_rejected() {
        bad("with x as (select * from __rdbms_columns) select * from x");
    }

    #[test]
    fn internal_table_in_join_rejected() {
        bad("select * from users join __rdbms_columns on x = y");
    }
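
    // Sketch of the name check these four tests exercise in every table-name
    // slot (FROM, CTE name, CTE body, JOIN). It assumes the reserved prefix is
    // `__rdbms_`, inferred from the rejected names above rather than confirmed
    // against ADR-0030 §6; `is_internal_table` is a hypothetical helper.
    mod sketch_internal_tables {
        /// Assumed reserved prefix for internal tables.
        const INTERNAL_PREFIX: &str = "__rdbms_";

        pub fn is_internal_table(name: &str) -> bool {
            name.starts_with(INTERNAL_PREFIX)
        }

        #[test]
        fn reserved_prefix_is_detected() {
            assert!(is_internal_table("__rdbms_columns"));
            assert!(is_internal_table("__rdbms_playground_columns"));
            assert!(!is_internal_table("users"));
        }
    }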

    // ----- depth cap (ADR-0026 §1 / ADR-0032 §9) -----

    #[test]
    fn pathological_nesting_capped() {
        // A deeply parenthesised CTE-body chain is rejected by the
        // shared `MAX_SUBGRAMMAR_DEPTH = 64` cap, not by stack
        // overflow.
        let depth = 200;
        let mut input = String::new();
        for _ in 0..depth {
            input.push_str("with x as (");
        }
        input.push_str("select 1");
        for _ in 0..depth {
            input.push_str(") select * from x");
        }
        assert!(!walks(&input));
    }
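
    // Sketch of the depth-cap technique the test above relies on: recursion
    // carries a depth counter and fails with an error once it passes the cap,
    // so pathological input is rejected before the stack overflows. This
    // model counts parentheses rather than subgrammar pushes, and
    // `SKETCH_MAX_DEPTH` only mirrors the quoted value; neither is the
    // walker's real code.
    mod sketch_depth_cap {
        /// Mirrors the `MAX_SUBGRAMMAR_DEPTH = 64` value quoted above.
        pub const SKETCH_MAX_DEPTH: usize = 64;

        /// Checks that parentheses balance, giving up with an error once
        /// nesting exceeds the cap instead of recursing without bound.
        pub fn check_nesting(input: &str) -> Result<(), String> {
            group(input.as_bytes(), 0, 0).map(|_| ())
        }

        /// Scans from `pos` at nesting level `depth`; returns the position
        /// just past the `)` closing this level (or the input end at level 0).
        fn group(bytes: &[u8], mut pos: usize, depth: usize) -> Result<usize, String> {
            if depth > SKETCH_MAX_DEPTH {
                return Err(format!("nesting exceeds {SKETCH_MAX_DEPTH}"));
            }
            while pos < bytes.len() {
                match bytes[pos] {
                    b'(' => pos = group(bytes, pos + 1, depth + 1)?,
                    b')' if depth > 0 => return Ok(pos + 1),
                    b')' => return Err("unmatched ')'".to_string()),
                    _ => pos += 1,
                }
            }
            if depth == 0 {
                Ok(pos)
            } else {
                Err("unclosed '('".to_string())
            }
        }

        #[test]
        fn cap_rejects_deep_nesting_without_overflow() {
            let at_cap = format!("{}{}", "(".repeat(64), ")".repeat(64));
            assert!(check_nesting(&at_cap).is_ok());
            let deep = format!("{}{}", "(".repeat(200), ")".repeat(200));
            assert!(check_nesting(&deep).is_err());
        }
    }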

    // ----- compound-select fragment entry point -----

    #[test]
    fn compound_fragment_walks_without_with_clause() {
        // SQL_SELECT_COMPOUND is what subqueries and CTE bodies
        // recurse into. It admits a select_core, an optional
        // set-op chain, and an outer ORDER BY / LIMIT, but not a
        // leading WITH clause.
        assert!(walks_via(&SQL_SELECT_COMPOUND, "select 1"));
        assert!(walks_via(
            &SQL_SELECT_COMPOUND,
            "select a from t union select b from u",
        ));
        assert!(!walks_via(
            &SQL_SELECT_COMPOUND,
            "with x as (select 1) select * from x",
        ));
    }

    // ----- ADR-0032 §5/§6: subqueries and qualified refs -----
    //
    // Statement-level positions. The sql_expr extensions
    // recurse through SQL_SELECT_COMPOUND via ScopedSubgrammar.

    #[test]
    fn qualified_ref_in_where_clause() {
        good("select * from t where t.id = 1");
        good("select * from a join b on a.id = b.id");
        good("select t.name from t where t.age > 18");
    }
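
    // Sketch, assuming a lookup rule the grammar tests do not pin down: the
    // qualifier in `t.id` or `a.id` resolves against the current FROM
    // bindings, addressing a binding by its alias when it has one and by its
    // table name otherwise, compared ASCII case-insensitively. The real
    // lookup behind qualified-prefix completion is later (2e) work and may
    // differ; `Binding` and `resolve` are hypothetical.
    mod sketch_qualified_ref {
        pub struct Binding<'a> {
            pub table: &'a str,
            pub alias: Option<&'a str>,
        }

        /// Returns the table a qualifier refers to, if any binding matches.
        pub fn resolve<'a>(bindings: &[Binding<'a>], qualifier: &str) -> Option<&'a str> {
            bindings
                .iter()
                .find(|b| b.alias.unwrap_or(b.table).eq_ignore_ascii_case(qualifier))
                .map(|b| b.table)
        }

        #[test]
        fn alias_is_the_addressable_name_when_present() {
            let from = [
                Binding { table: "customers", alias: Some("c") },
                Binding { table: "orders", alias: None },
            ];
            assert_eq!(resolve(&from, "c"), Some("customers"));
            assert_eq!(resolve(&from, "customers"), None);
            assert_eq!(resolve(&from, "orders"), Some("orders"));
        }
    }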

    #[test]
    fn scalar_subquery_in_where_clause() {
        good("select * from t where x = (select y from u)");
        good("select * from t where x > (select count(*) from u)");
    }

    #[test]
    fn in_subquery_in_where_clause() {
        good("select * from t where id in (select user_id from orders)");
        good(
            "select * from customers where id not in (select customer_id from blocklist)",
        );
    }

    #[test]
    fn exists_subquery_in_where_clause() {
        good(
            "select * from customers c where exists (select 1 from orders o where o.customer_id = c.id)",
        );
        good("select * from t where not exists (select 1 from u)");
    }

    #[test]
    fn nested_subqueries() {
        good(
            "select * from t where x in (select y from u where y in (select z from v))",
        );
    }
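
    // Simplified sketch of the push/pop discipline that keeps a subquery's
    // FROM bindings out of the enclosing query: entering a subquery (or CTE
    // body) pushes a fresh frame, and leaving it pops that frame, so only the
    // outer frame's bindings stay visible. `Frame` and `ScopeStack` are
    // stand-ins for illustration, not the real `ScopeFrame` machinery.
    mod sketch_scope_stack {
        #[derive(Default)]
        pub struct Frame {
            pub from_scope: Vec<String>,
        }

        pub struct ScopeStack {
            frames: Vec<Frame>,
        }

        impl ScopeStack {
            /// Starts with the outermost query's frame already present.
            pub fn with_outer_frame() -> Self {
                Self { frames: vec![Frame::default()] }
            }

            /// Entering a subquery or CTE body.
            pub fn push_frame(&mut self) {
                self.frames.push(Frame::default());
            }

            /// Leaving it; the outermost frame is never popped.
            pub fn pop_frame(&mut self) -> Option<Frame> {
                if self.frames.len() > 1 { self.frames.pop() } else { None }
            }

            /// Records a table source in the innermost frame.
            pub fn bind(&mut self, table: &str) {
                if let Some(top) = self.frames.last_mut() {
                    top.from_scope.push(table.to_string());
                }
            }

            /// Bindings visible at the current position.
            pub fn visible(&self) -> &[String] {
                &self.frames.last().expect("outer frame always present").from_scope
            }
        }

        #[test]
        fn subquery_bindings_do_not_leak() {
            // Mirrors: select * from t where x in (select y from u)
            let mut scopes = ScopeStack::with_outer_frame();
            scopes.bind("t");
            scopes.push_frame();
            scopes.bind("u");
            assert_eq!(scopes.visible(), ["u"]);
            let inner = scopes.pop_frame().expect("inner frame");
            assert_eq!(inner.from_scope, vec!["u"]);
            assert_eq!(scopes.visible(), ["t"]);
            assert!(scopes.pop_frame().is_none());
        }
    }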

    #[test]
    fn subquery_in_projection() {
        good("select (select max(price) from products) from t");
        good(
            "select name, (select count(*) from orders where customer_id = c.id) from customers c",
        );
    }

    #[test]
    fn cte_body_references_qualified_columns() {
        good(
            "with x as (select t.name, t.age from t) select x.name from x",
        );
    }
}