Commit 8a02af3

Add lateral support to CrossRel and integrate id-based outer references
- Add bool lateral field to CrossRel for lateral cross product semantics
- Add RelCommon.id and OuterReference.id_reference from PR substrait-io#1031
- Update JoinRel and CrossRel lateral comments to use id_reference
- Update logical_relations.md with lateral docs for both JoinRel and CrossRel

Depends-on: substrait-io#1031
1 parent e52c611 commit 8a02af3

2 files changed

Lines changed: 31 additions & 11 deletions

proto/substrait/algebra.proto

Lines changed: 18 additions & 1 deletion
@@ -280,7 +280,11 @@ message JoinRel {
   // When true, the right input is evaluated once per row of the left input
   // (lateral join / correlated subquery). The right input may reference fields
   // from the current left row using FieldReference with OuterReference as the
-  // root_type and steps_out = 1.
+  // root_type. This JoinRel must have RelCommon.id set so the right input can
+  // use OuterReference with id_reference pointing to that id (preferred), or
+  // steps_out = 1 for simple tree-shaped plans.
+  //
+  // The id_reference resolves against the left input's output schema.
   //
   // When false (default), both inputs are independent.
   //
@@ -316,6 +320,19 @@ message CrossRel {
   Rel left = 2;
   Rel right = 3;

+  // When true, the right input is evaluated once per row of the left input
+  // (lateral semantics). The right input may reference fields from the
+  // current left row using FieldReference with OuterReference as the
+  // root_type. This CrossRel must have RelCommon.id set so the right input
+  // can use OuterReference with id_reference pointing to that id (preferred),
+  // or steps_out = 1 for simple tree-shaped plans.
+  //
+  // The id_reference resolves against the left input's output schema.
+  //
+  // When false (default), both inputs are independent and the result is the
+  // standard Cartesian product.
+  bool lateral = 4;
+
   substrait.extensions.AdvancedExtension advanced_extension = 10;
 }

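Reviewer note: the difference between the two modes of `CrossRel.lateral` can be illustrated with a small Python sketch. This is not Substrait code; `cross_rel` and its arguments are hypothetical names chosen for illustration. Rows are modeled as tuples, and the right input is assumed to be a plain list when `lateral` is false and a function of the current left row when `lateral` is true.

```python
def cross_rel(left_rows, right, lateral=False):
    """Toy model of CrossRel (hypothetical helper, not part of Substrait).

    lateral=False: `right` is an iterable of rows, independent of the left input.
    lateral=True:  `right` is a callable evaluated once per left row.
    """
    if not lateral:
        right_rows = list(right)  # evaluated once; standard Cartesian product
        return [l + r for l in left_rows for r in right_rows]
    # Lateral: the right side may depend on the current left row.
    return [l + r for l in left_rows for r in right(l)]


left = [(1,), (2,)]

# Independent inputs: every left row pairs with every right row.
plain = cross_rel(left, [(10,), (20,)])

# Lateral: the right side is recomputed for each left row.
lat = cross_rel(left, lambda l: [(l[0] * 10,)], lateral=True)
```

In the lateral case the callable stands in for a right input that contains an `OuterReference` to the current left row.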
site/docs/relations/logical_relations.md

Lines changed: 13 additions & 10 deletions
@@ -218,7 +218,8 @@ The cross product operation will combine two separate inputs into a single outpu
 | Property | Description | Required |
 | --------------- | ------------------------------------------------------------ | ---------------------------------- |
 | Left Input | A relational input. | Required |
-| Right Input | A relational input. | Required |
+| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `id_reference` pointing to this CrossRel's `RelCommon.id`. | Required |
+| Lateral | When true, the right input is evaluated once per row of the left input (lateral semantics). This CrossRel must have `RelCommon.id` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `id_reference` pointing to this CrossRel's id (preferred) or `steps_out = 1` for simple tree-shaped plans. The `id_reference` resolves against the left input's output schema. When false (default), both inputs are independent and the result is the standard Cartesian product. | Optional (default: false) |


 === "CrossRel Message"
@@ -245,11 +246,11 @@ The join operation will combine two separate inputs into a single output, based
 | Property | Description | Required |
 | ---------------- | ------------------------------------------------------------ | ---------------------------------- |
 | Left Input | A relational input. | Required |
-| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference`. | Required |
+| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `id_reference` pointing to this JoinRel's `RelCommon.id`. | Required |
 | Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the input order of the data. | Required. Can be the literal True. |
 | Post-Join Filter | An optional boolean condition applied to the output of the join. Semantically equivalent to placing a [Filter](#filter-operation) directly above the join. Does not influence which rows are considered matches. Field references correspond to the direct output order of the join operation. | Optional, defaults to True. |
 | Join Type | One of the join types defined below. | Required |
-| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type` and `steps_out = 1`. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) |
+| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). This JoinRel must have `RelCommon.id` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `id_reference` pointing to this JoinRel's id (preferred) or `steps_out = 1` for simple tree-shaped plans. The `id_reference` resolves against the left input's output schema. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) |

 ### Join Types

@@ -277,34 +278,36 @@ The join operation will combine two separate inputs into a single output, based

 ### Lateral Joins

-When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type` and `steps_out = 1`.
+When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type`.
+
+This JoinRel must have `RelCommon.id` set. The right input should use `OuterReference` with `id_reference` pointing to this JoinRel's id. The `id_reference` resolves against the left input's output schema. For simple tree-shaped plans, `steps_out = 1` may also be used. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details on `steps_out` vs `id_reference`.

 For example, the SQL query:

 ```sql
 SELECT a, (SELECT MAX(b) FROM T2 WHERE T2.x = T1.a) FROM T1
 ```

-can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. Inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { steps_out = 1 }` as the root.
+can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. The JoinRel has `RelCommon.id` set, and inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { id_reference = <JoinRel's id> }` as the root.

 #### Permitted Join Types for Lateral

 Because the right input only exists in the context of a specific left row, only `INNER` and left-oriented join types (`LEFT`, `LEFT_SEMI`, `LEFT_ANTI`, `LEFT_SINGLE`, `LEFT_MARK`) are valid when `lateral` is true. Right-oriented types (`RIGHT`, `RIGHT_SEMI`, `RIGHT_ANTI`, `RIGHT_SINGLE`, `RIGHT_MARK`) and `OUTER` are invalid since the right input has no independent existence outside a left row context.

 #### Nested Lateral Joins

-Lateral joins can introduce multiple levels of correlated subqueries. All outer input fields can be referenced using appropriate `steps_out` values with `OuterReference`:
+Lateral joins can introduce multiple levels of correlated subqueries. Each JoinRel with `lateral=true` must have `RelCommon.id` set so outer references can name the binding relation via `id_reference`. The `id_reference` resolves against the left input's output schema. For simple tree-shaped plans, `steps_out` may also be used:

 ```
-Join (left, lateral=true)
+JoinRel (left, lateral=true) [id=1]
 / \
-Input1(a) Join (inner, lateral=true)
+Input1(a) JoinRel (inner, lateral=true) [id=2]
 / \
 Input2(b) Subquery

 OuterReference access within each scope:
-Input2 : a [steps_out=1]
-Subquery : a [steps_out=2], b [steps_out=1]
+Input2 : a [id_reference=1 or steps_out=1]
+Subquery : a [id_reference=1 or steps_out=2], b [id_reference=2 or steps_out=1]
 ```


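Reviewer note: the two addressing modes in the nested example (`id_reference` versus `steps_out`) can be compared with a small Python sketch. It is only a model of the lookup rule as described in the updated docs, not an implementation from any Substrait library. `Scope` and `resolve_outer` are hypothetical names, and each lateral relation is assumed to expose its current left row as a dict keyed by field name.

```python
from dataclasses import dataclass


@dataclass
class Scope:
    """One enclosing lateral relation (hypothetical model)."""
    rel_id: int      # stands in for RelCommon.id
    left_row: dict   # current row of that relation's left input


def resolve_outer(scopes, field, id_reference=None, steps_out=None):
    """Look up `field` in an enclosing scope.

    `scopes` is ordered outermost first. With id_reference, the binding
    relation is named directly; with steps_out, it is counted outward
    from the innermost enclosing scope (steps_out=1).
    """
    if id_reference is not None:
        for scope in reversed(scopes):
            if scope.rel_id == id_reference:
                return scope.left_row[field]
        raise LookupError(f"no enclosing relation with id {id_reference}")
    if steps_out is None or not 1 <= steps_out <= len(scopes):
        raise LookupError(f"invalid steps_out: {steps_out}")
    return scopes[-steps_out].left_row[field]


# Scopes visible from Subquery in the nested example above.
subquery_scopes = [Scope(1, {"a": 10}), Scope(2, {"b": 20})]
# Scopes visible from Input2: only the outer JoinRel.
input2_scopes = [Scope(1, {"a": 10})]

a_by_id = resolve_outer(subquery_scopes, "a", id_reference=1)
a_by_steps = resolve_outer(subquery_scopes, "a", steps_out=2)
b_by_steps = resolve_outer(subquery_scopes, "b", steps_out=1)
a_from_input2 = resolve_outer(input2_scopes, "a", steps_out=1)
```

In this model the `id_reference` lookup depends only on the id, while the `steps_out` value for `a` changes with nesting depth (1 from Input2, 2 from Subquery), which may be one reason the docs mark `id_reference` as preferred.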