Add lateral support to CrossRel and integrate id-based outer references
- Add bool lateral field to CrossRel for lateral cross product semantics
- Add RelCommon.id and OuterReference.id_reference from PR substrait-io#1031
- Update JoinRel and CrossRel lateral comments to use id_reference
- Update logical_relations.md with lateral docs for both JoinRel and CrossRel
Depends-on: substrait-io#1031
| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `id_reference` pointing to this CrossRel's `RelCommon.id`. | Required |
+| Lateral | When true, the right input is evaluated once per row of the left input (lateral semantics). This CrossRel must have `RelCommon.id` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `id_reference` pointing to this CrossRel's id (preferred) or `steps_out = 1` for simple tree-shaped plans. The `id_reference` resolves against the left input's output schema. When false (default), both inputs are independent and the result is the standard Cartesian product. | Optional (default: false) |
=== "CrossRel Message"
@@ -245,11 +246,11 @@ The join operation will combine two separate inputs into a single output, based
-| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference`. | Required |
+| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `id_reference` pointing to this JoinRel's `RelCommon.id`. | Required |
| Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the input order of the data. | Required. Can be the literal True. |
| Post-Join Filter | An optional boolean condition applied to the output of the join. Semantically equivalent to placing a [Filter](#filter-operation) directly above the join. Does not influence which rows are considered matches. Field references correspond to the direct output order of the join operation. | Optional, defaults to True. |
| Join Type | One of the join types defined below. | Required |
-| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type` and `steps_out = 1`. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) |
+| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). This JoinRel must have `RelCommon.id` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `id_reference` pointing to this JoinRel's id (preferred) or `steps_out = 1` for simple tree-shaped plans. The `id_reference` resolves against the left input's output schema. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) |
### Join Types
@@ -277,34 +278,36 @@ The join operation will combine two separate inputs into a single output, based
### Lateral Joins
-When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type` and `steps_out = 1`.
+When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type`.
+
+This JoinRel must have `RelCommon.id` set. The right input should use `OuterReference` with `id_reference` pointing to this JoinRel's id. The `id_reference` resolves against the left input's output schema. For simple tree-shaped plans, `steps_out = 1` may also be used. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details on `steps_out` vs `id_reference`.
For example, the SQL query:
```sql
SELECT a, (SELECT MAX(b) FROM T2 WHERE T2.x = T1.a) FROM T1
```
-can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. Inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { steps_out = 1 }` as the root.
+can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. The JoinRel has `RelCommon.id` set, and inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { id_reference = <JoinRel's id> }` as the root.
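The resolution rule described here can be sketched in Python. This is an illustration only, not part of the change: `LateralScope`, `OuterRef`, and `resolve_outer` are hypothetical names rather than Substrait protobuf types, the id value `1` is arbitrary, and the sketch assumes each enclosing lateral relation counts as one `steps_out` step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LateralScope:
    """Hypothetical stand-in for a lateral JoinRel or CrossRel."""
    rel_id: Optional[int]        # RelCommon.id, if set
    left_schema: Sequence[str]   # field names of the left input's output


@dataclass(frozen=True)
class OuterRef:
    """Hypothetical stand-in for a FieldReference rooted at OuterReference."""
    field_index: int
    id_reference: Optional[int] = None
    steps_out: Optional[int] = None


def resolve_outer(ref: OuterRef, scopes: Sequence[LateralScope]) -> str:
    """Resolve ref against the enclosing lateral scopes, listed outermost first."""
    if ref.id_reference is not None:
        # id_reference names the binding relation directly and resolves
        # against that relation's left input schema.
        for scope in scopes:
            if scope.rel_id == ref.id_reference:
                return scope.left_schema[ref.field_index]
        raise LookupError(f"no enclosing relation with id {ref.id_reference}")
    if ref.steps_out is None or not 1 <= ref.steps_out <= len(scopes):
        raise LookupError(f"steps_out {ref.steps_out} is out of range")
    # Assumption: steps_out = 1 names the innermost enclosing lateral relation.
    return scopes[-ref.steps_out].left_schema[ref.field_index]


# The example query: T1 is the left input of a lateral JoinRel with id 1.
scopes = [LateralScope(rel_id=1, left_schema=["a"])]
by_id = resolve_outer(OuterRef(field_index=0, id_reference=1), scopes)
by_steps = resolve_outer(OuterRef(field_index=0, steps_out=1), scopes)
```

Both lookups land on `T1.a`. The id-based form does not depend on how deeply the reference is nested, which is presumably why the text marks it as preferred.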
#### Permitted Join Types for Lateral
Because the right input only exists in the context of a specific left row, only `INNER` and left-oriented join types (`LEFT`, `LEFT_SEMI`, `LEFT_ANTI`, `LEFT_SINGLE`, `LEFT_MARK`) are valid when `lateral` is true. Right-oriented types (`RIGHT`, `RIGHT_SEMI`, `RIGHT_ANTI`, `RIGHT_SINGLE`, `RIGHT_MARK`) and `OUTER` are invalid since the right input has no independent existence outside a left row context.
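A plan validator could enforce this rule roughly as follows. This is a minimal sketch under stated assumptions: `check_lateral_join` is a hypothetical helper, and the join types are written with the short names used in this text rather than the enum identifiers from the protobuf definition. It also checks the `RelCommon.id` requirement stated for lateral joins.

```python
from typing import List, Optional

# Join types the text lists as valid when lateral is true.
LATERAL_JOIN_TYPES = frozenset(
    {"INNER", "LEFT", "LEFT_SEMI", "LEFT_ANTI", "LEFT_SINGLE", "LEFT_MARK"}
)


def check_lateral_join(
    join_type: str, lateral: bool, rel_id: Optional[int]
) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    problems: List[str] = []
    if not lateral:
        return problems  # non-lateral joins are not restricted by this rule
    if join_type not in LATERAL_JOIN_TYPES:
        problems.append(f"join type {join_type} is invalid when lateral is true")
    if rel_id is None:
        problems.append("a lateral JoinRel must have RelCommon.id set")
    return problems
```

For example, `check_lateral_join("RIGHT", True, None)` reports both problems, while any join type passes when `lateral` is false.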
#### Nested Lateral Joins
-Lateral joins can introduce multiple levels of correlated subqueries. All outer input fields can be referenced using appropriate `steps_out` values with `OuterReference`:
+Lateral joins can introduce multiple levels of correlated subqueries. Each JoinRel with `lateral=true` must have `RelCommon.id` set so outer references can name the binding relation via `id_reference`. The `id_reference` resolves against the left input's output schema. For simple tree-shaped plans, `steps_out` may also be used:
```
-    Join (left, lateral=true)
+    JoinRel (left, lateral=true) [id=1]
        /         \
-  Input1(a)   Join (inner, lateral=true)
+  Input1(a)   JoinRel (inner, lateral=true) [id=2]
                 /         \
           Input2(b)    Subquery

 OuterReference access within each scope:
-  Input2   : a [steps_out=1]
-  Subquery : a [steps_out=2], b [steps_out=1]
+  Input2   : a [id_reference=1 or steps_out=1]
+  Subquery : a [id_reference=1 or steps_out=2], b [id_reference=2 or steps_out=1]