From 65a49a879712fb995520bd49f2b81070ac210320 Mon Sep 17 00:00:00 2001
From: YongChul Kwon
Date: Thu, 2 Apr 2026 16:32:00 -0700
Subject: [PATCH 1/8] Add id-based outer reference resolution for DAG plans

Add optional RelCommon.id field and OuterReference.id_reference to support unambiguous outer reference resolution in plans with common subexpressions (ReferenceRel). The existing steps_out offset-based mechanism remains for backward compatibility with tree-shaped plans.

Closes #1024
---
 proto/substrait/algebra.proto             | 24 ++++++++++++++---
 site/docs/expressions/field_references.md | 33 ++++++++++++++++++++++-
 site/docs/expressions/subqueries.md       | 20 ++++++++++++++
 3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index 64bc59315..d954e3d9f 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -24,6 +24,12 @@ message RelCommon {
   Hint hint = 3;
   substrait.extensions.AdvancedExtension advanced_extension = 4;
+  // Optional plan-wide unique identifier for this relation. Required when
+  // this relation is the binding point for an OuterReference using
+  // id_reference. Must be unique across all rels within a Plan.
+  // Must be > 0 when set.
+  optional uint32 id = 5;
+
   // Direct indicates no change on presence and ordering of fields in the output
   message Direct {}
@@ -1590,12 +1596,22 @@ message Expression {
     // incoming record type
     message RootReference {}
-    // A root reference for the outer relation's subquery
+    // A root reference for the outer relation's subquery.
+    //
+    // At least one of steps_out or id_reference must be set. When both are
+    // present, id_reference is authoritative and steps_out is advisory.
     message OuterReference {
-      // number of subquery boundaries to traverse up for this field's reference
-      //
-      // This value must be >= 1
+      // Number of subquery boundaries to traverse up for this field's
+      // reference. This value must be >= 1 when set. A value of 0 indicates
+      // that steps_out is not provided and id_reference must be used instead.
       uint32 steps_out = 1;
+
+      // Plan-wide unique id of the relation whose output this reference is
+      // rooted on. Must match an id defined on a RelCommon in the plan.
+      // Must be > 0 when set. Recommended when the plan contains shared
+      // subexpressions (ReferenceRel) where offset-based resolution via
+      // steps_out alone would be ambiguous.
+      optional uint32 id_reference = 2;
     }
     // A reference to a lambda parameter within a lambda body expression.
diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md
index 0dd052196..720068c8a 100644
--- a/site/docs/expressions/field_references.md
+++ b/site/docs/expressions/field_references.md
@@ -5,7 +5,7 @@ In Substrait, all fields are dealt with on a positional basis. Field names are o
 Field references can originate from different root types:
 - **RootReference**: References the incoming record from the relation
-- **OuterReference**: References outer query records in correlated subqueries
+- **OuterReference**: References outer query records in correlated subqueries, supporting both offset-based (`steps_out`) and id-based (`id_reference`) resolution (see [Outer References](#outer-references))
 - **Expression**: References the result of evaluating an expression
 - **LambdaParameterReference**: References lambda parameters within lambda body expressions (see [Lambda Expressions](lambda_expressions.md))
@@ -155,3 +155,34 @@ By default, when only a single field is selected from a struct, that struct is r
 * Should we support column reordering/positioning using a masked complex expression? (Right now, you can only mask things out.)
+### Outer References
+
+Outer references allow expressions inside a subquery to access records from an enclosing relation. The `OuterReference` root type supports two resolution strategies that can be used independently or together:
+
+#### `steps_out` (offset-based)
+
+`steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree** — each relation has exactly one parent, so the path to the binding relation is unique.
+
+#### `id_reference` (id-based)
+
+When a plan contains **common subexpressions** via `ReferenceRel`, the same relation can be reached through multiple paths with different depths. In that case, offset-based resolution is ambiguous because `steps_out` depends on *which* path is followed.
+
+`id_reference` resolves this by naming the binding relation directly via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (> 0) and unique across all relations in the plan.
+
+At least one of `steps_out` or `id_reference` must be set. When both are present, `id_reference` is authoritative and `steps_out` is advisory (useful for backward compatibility with consumers that only understand `steps_out`).
+
+For example, consider a plan with two nested scalar subqueries that share a common subexpression `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth:
+
+```
+ProjectRel
+├── ReadRel(tableA) [id=1]
+└── Subquery.Scalar ── (1)
+    └── SetRel(MINUS_PRIMARY)
+        ├── Subquery.Scalar ── (2)
+        │   └── x: FilterRel(a > outer_ref(tableA.a)) [id=2]
+        │       └── ReadRel(tableB)
+        └── x (shared via ReferenceRel)
+```
+
+The common subexpression `x` contains an outer reference to `tableA.a`. It is reached through two paths — one via subquery (2), one directly — so `steps_out` would need a different value depending on which path is followed. With `id_reference = 1`, the reference inside `x` unambiguously names `ReadRel(tableA)` regardless of path.
+
diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md
index c71fee0dc..768d2b39c 100644
--- a/site/docs/expressions/subqueries.md
+++ b/site/docs/expressions/subqueries.md
@@ -68,6 +68,26 @@ WHERE x < ANY(SELECT y from t2)
+## Outer References in Subqueries
+
+Subqueries may contain *outer references* — field references that reach
+outside the subquery boundary to access records from an enclosing relation.
+The `OuterReference` root type provides two resolution fields:
+
+* **`steps_out`**: Resolves the reference by counting subquery boundaries
+  upward. This works correctly when the plan is a tree (each relation has a
+  single parent).
+
+* **`id_reference`**: Resolves the reference by naming the binding relation
+  via its plan-wide unique `RelCommon.id`. This is recommended when the plan
+  contains shared subexpressions (`ReferenceRel`) because offset-based
+  resolution is ambiguous when multiple paths exist to the same relation.
+
+At least one must be set. When both are present, `id_reference` is
+authoritative. See
+[Field References — Outer References](field_references.md#outer-references)
+for details.
+
 === "Protobuf Representation"
     ```proto

From 4c944347450829324d618c1ecb0dd0d22c1a5080 Mon Sep 17 00:00:00 2001
From: YongChul Kwon <1606237+yongchul@users.noreply.github.com>
Date: Wed, 15 Apr 2026 13:50:34 -0700
Subject: [PATCH 2/8] Apply suggestions from code review

Co-authored-by: Ben Bellick <36523439+benbellick@users.noreply.github.com>
---
 site/docs/expressions/field_references.md | 4 ++--
 site/docs/expressions/subqueries.md       | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md
index 720068c8a..ab69e86ef 100644
--- a/site/docs/expressions/field_references.md
+++ b/site/docs/expressions/field_references.md
@@ -161,11 +161,11 @@ Outer references allow expressions inside a subquery to access records from an e
 #### `steps_out` (offset-based)
-`steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree** — each relation has exactly one parent, so the path to the binding relation is unique.
+`steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree**, i.e when each relation has exactly one parent, the path to the binding relation can be uniquely determined via `steps_out`.
 #### `id_reference` (id-based)
-When a plan contains **common subexpressions** via `ReferenceRel`, the same relation can be reached through multiple paths with different depths. In that case, offset-based resolution is ambiguous because `steps_out` depends on *which* path is followed.
+When a plan contains **common subrelation** via `ReferenceRel`, the same relation can be reached through multiple paths with different depths. In that case, offset-based resolution is ambiguous because `steps_out` depends on *which* path is followed.
 `id_reference` resolves this by naming the binding relation directly via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (> 0) and unique across all relations in the plan.
diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md
index 768d2b39c..ad5eca40c 100644
--- a/site/docs/expressions/subqueries.md
+++ b/site/docs/expressions/subqueries.md
@@ -70,7 +70,7 @@ WHERE x < ANY(SELECT y from t2)
 ## Outer References in Subqueries
-Subqueries may contain *outer references* — field references that reach
+Subqueries may contain *outer references*, which are field references that reach
 outside the subquery boundary to access records from an enclosing relation.
 The `OuterReference` root type provides two resolution fields:

From cc1f7cd00a6d08cfc867044b679a474a42ebcd28 Mon Sep 17 00:00:00 2001
From: YongChul Kwon
Date: Wed, 22 Apr 2026 09:58:26 -0700
Subject: [PATCH 3/8] Changes to oneof. update example. resolved PR feedback.

---
 proto/substrait/algebra.proto             | 26 +++++++++-----------
 site/docs/expressions/field_references.md | 30 +++++++++++++----------
 site/docs/expressions/subqueries.md       | 12 ++++-----
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index d954e3d9f..e7ffe57ca 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -1597,21 +1597,19 @@ message Expression {
     message RootReference {}
     // A root reference for the outer relation's subquery.
-    //
-    // At least one of steps_out or id_reference must be set. When both are
-    // present, id_reference is authoritative and steps_out is advisory.
     message OuterReference {
-      // Number of subquery boundaries to traverse up for this field's
-      // reference. This value must be >= 1 when set. A value of 0 indicates
-      // that steps_out is not provided and id_reference must be used instead.
-      uint32 steps_out = 1;
-
-      // Plan-wide unique id of the relation whose output this reference is
-      // rooted on. Must match an id defined on a RelCommon in the plan.
-      // Must be > 0 when set. Recommended when the plan contains shared
-      // subexpressions (ReferenceRel) where offset-based resolution via
-      // steps_out alone would be ambiguous.
-      optional uint32 id_reference = 2;
+      oneof outer_reference_type {
+        // Number of subquery boundaries to traverse up for this field's
+        // reference. Must be >= 1.
+        uint32 steps_out = 1;
+
+        // Plan-wide unique id of the relation whose input order this reference is
+        // rooted on. Must match an id defined on a RelCommon in the plan.
+        // Must be > 0. Must be used instead of steps_out when the plan
+        // contains shared relations (ReferenceRel) and offset-based
+        // resolution via steps_out would be ambiguous.
+        uint32 id_reference = 2;
+      }
     }
     // A reference to a lambda parameter within a lambda body expression.
diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md
index ab69e86ef..ce38c6e1f 100644
--- a/site/docs/expressions/field_references.md
+++ b/site/docs/expressions/field_references.md
@@ -5,7 +5,7 @@ In Substrait, all fields are dealt with on a positional basis. Field names are o
 Field references can originate from different root types:
 - **RootReference**: References the incoming record from the relation
-- **OuterReference**: References outer query records in correlated subqueries, supporting both offset-based (`steps_out`) and id-based (`id_reference`) resolution (see [Outer References](#outer-references))
+- **OuterReference**: References outer query records in correlated subqueries, supporting either offset-based (`steps_out`) or id-based (`id_reference`) resolution (see [Outer References](#outer-references))
 - **Expression**: References the result of evaluating an expression
 - **LambdaParameterReference**: References lambda parameters within lambda body expressions (see [Lambda Expressions](lambda_expressions.md))
@@ -157,32 +157,36 @@ By default, when only a single field is selected from a struct, that struct is r
 ### Outer References
-Outer references allow expressions inside a subquery to access records from an enclosing relation. The `OuterReference` root type supports two resolution strategies that can be used independently or together:
+Outer references allow expressions inside a subquery to access records from an enclosing relation. The `OuterReference` root type supports two mutually exclusive resolution strategies:
 #### `steps_out` (offset-based)
-`steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree**, i.e when each relation has exactly one parent, the path to the binding relation can be uniquely determined via `steps_out`.
+`steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree**, i.e., when each relation has exactly one parent, the path to the binding relation can be uniquely determined via `steps_out`.
 #### `id_reference` (id-based)
-When a plan contains **common subrelation** via `ReferenceRel`, the same relation can be reached through multiple paths with different depths. In that case, offset-based resolution is ambiguous because `steps_out` depends on *which* path is followed.
+`id_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (> 0) and unique across all relations in the plan.
-`id_reference` resolves this by naming the binding relation directly via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (> 0) and unique across all relations in the plan.
+#### When to use `id_reference`
-At least one of `steps_out` or `id_reference` must be set. When both are present, `id_reference` is authoritative and `steps_out` is advisory (useful for backward compatibility with consumers that only understand `steps_out`).
+`id_reference` must be used instead of `steps_out` when a plan contains **shared relations** via `ReferenceRel` with unresolved outer references. In such plans, the binding relation (i.e., the relation providing the actual value of the outer reference) can be reached through multiple paths with different depths, making offset-based resolution ambiguous because `steps_out` depends on *which* path is followed.
-For example, consider a plan with two nested scalar subqueries that share a common subexpression `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth:
+For example, consider a plan with two nested scalar subqueries that share a common relation `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth:
 ```
-ProjectRel
-├── ReadRel(tableA) [id=1]
+PlanRel.relations[0].rel: # let's call it 'x'
+FilterRel(a > outer_ref(tableA.a))
+ └── ReadRel(tableB)
+
+PlanRel.relations[1].root:
+ProjectRel [id=1]
+├── ReadRel(tableA)
 └── Subquery.Scalar ── (1)
     └── SetRel(MINUS_PRIMARY)
         ├── Subquery.Scalar ── (2)
-        │   └── x: FilterRel(a > outer_ref(tableA.a)) [id=2]
-        │       └── ReadRel(tableB)
-        └── x (shared via ReferenceRel)
+        │   └── ReferenceRel(0)
+        └── ReferenceRel(0)
 ```
-The common subexpression `x` contains an outer reference to `tableA.a`. It is reached through two paths — one via subquery (2), one directly — so `steps_out` would need a different value depending on which path is followed. With `id_reference = 1`, the reference inside `x` unambiguously names `ReadRel(tableA)` regardless of path.
+The shared relation `x` contains an outer reference to `tableA.a`. It is reached through two paths — one via subquery (2), one directly — so `steps_out` would need a different value depending on which path is followed. With `id_reference = 1`, the reference inside `x` unambiguously names `ProjectRel` to resolve the outer reference regardless of path.
diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md
index ad5eca40c..d1d7fdaab 100644
--- a/site/docs/expressions/subqueries.md
+++ b/site/docs/expressions/subqueries.md
@@ -70,7 +70,7 @@ WHERE x < ANY(SELECT y from t2)
 ## Outer References in Subqueries
-Subqueries may contain *outer references*, which are field references that reach
+Subqueries may contain *outer references*, which are field references that reach
 outside the subquery boundary to access records from an enclosing relation.
 The `OuterReference` root type provides two resolution fields:
@@ -79,12 +79,12 @@ The `OuterReference` root type provides two resolution fields:
   single parent).
 * **`id_reference`**: Resolves the reference by naming the binding relation
-  via its plan-wide unique `RelCommon.id`. This is recommended when the plan
-  contains shared subexpressions (`ReferenceRel`) because offset-based
-  resolution is ambiguous when multiple paths exist to the same relation.
+  via its plan-wide unique `RelCommon.id`. Must be used instead of
+  `steps_out` when the plan contains shared relations (`ReferenceRel`)
+  because offset-based resolution is ambiguous when multiple paths exist to
+  the same relation.
-At least one must be set. When both are present, `id_reference` is
-authoritative. See
+Exactly one of these fields must be set. See
 [Field References — Outer References](field_references.md#outer-references)
 for details.

From 649c5ef90f6173958a3b908f7bbcf7ee53deab11 Mon Sep 17 00:00:00 2001
From: YongChul Kwon
Date: Wed, 22 Apr 2026 11:13:40 -0700
Subject: [PATCH 4/8] Consistently use gte 1

---
 proto/substrait/algebra.proto             | 4 ++--
 site/docs/expressions/field_references.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index e7ffe57ca..2bd7c25a4 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -27,7 +27,7 @@ message RelCommon {
   // Optional plan-wide unique identifier for this relation. Required when
   // this relation is the binding point for an OuterReference using
   // id_reference. Must be unique across all rels within a Plan.
-  // Must be > 0 when set.
+  // Must be >= 1 when set.
   optional uint32 id = 5;
   // Direct indicates no change on presence and ordering of fields in the output
@@ -1605,7 +1605,7 @@ message Expression {
         // Plan-wide unique id of the relation whose input order this reference is
         // rooted on. Must match an id defined on a RelCommon in the plan.
-        // Must be > 0. Must be used instead of steps_out when the plan
+        // Must be >= 1. Must be used instead of steps_out when the plan
         // contains shared relations (ReferenceRel) and offset-based
         // resolution via steps_out would be ambiguous.
         uint32 id_reference = 2;
diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md
index ce38c6e1f..e203a21bc 100644
--- a/site/docs/expressions/field_references.md
+++ b/site/docs/expressions/field_references.md
@@ -165,7 +165,7 @@ Outer references allow expressions inside a subquery to access records from an e
 #### `id_reference` (id-based)
-`id_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (> 0) and unique across all relations in the plan.
+`id_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (>= 1) and unique across all relations in the plan.
 #### When to use `id_reference`

From bf65fdcc6de64d64b80d522cf1008913861bef04 Mon Sep 17 00:00:00 2001
From: YongChul Kwon <1606237+yongchul@users.noreply.github.com>
Date: Thu, 23 Apr 2026 14:45:30 -0700
Subject: [PATCH 5/8] Apply suggestions from code review

Co-authored-by: Ben Bellick <36523439+benbellick@users.noreply.github.com>
---
 site/docs/expressions/subqueries.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md
index d1d7fdaab..ad1282865 100644
--- a/site/docs/expressions/subqueries.md
+++ b/site/docs/expressions/subqueries.md
@@ -74,11 +74,11 @@ Subqueries may contain *outer references*, which are field references that reach
 outside the subquery boundary to access records from an enclosing relation.
 The `OuterReference` root type provides two resolution fields:
-* **`steps_out`**: Resolves the reference by counting subquery boundaries
+* `steps_out`: Resolves the reference by counting subquery boundaries
   upward. This works correctly when the plan is a tree (each relation has a
   single parent).
-* **`id_reference`**: Resolves the reference by naming the binding relation
+* `id_reference`: Resolves the reference by naming the binding relation
   via its plan-wide unique `RelCommon.id`. Must be used instead of
   `steps_out` when the plan contains shared relations (`ReferenceRel`)
   because offset-based resolution is ambiguous when multiple paths exist to

From 0471b8c76f60748992dd40cf31d16b0a26e2a260 Mon Sep 17 00:00:00 2001
From: YongChul Kwon
Date: Thu, 30 Apr 2026 11:58:05 -0700
Subject: [PATCH 6/8] Address PR feedback

---
 site/docs/expressions/field_references.md     | 38 ++++++++++++++----
 site/docs/relations/common_fields.md          |  4 ++
 site/docs/relations/logical_relations.md      |  4 ++
 .../outer_reference_id_reference.textproto    | 39 +++++++++++++++++++
 .../outer_reference_steps_out.textproto       | 31 +++++++++++++++
 5 files changed, 108 insertions(+), 8 deletions(-)
 create mode 100644 site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto
 create mode 100644 site/examples/proto-textformat/field_reference/outer_reference_steps_out.textproto

diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md
index e203a21bc..73d4cc1c3 100644
--- a/site/docs/expressions/field_references.md
+++ b/site/docs/expressions/field_references.md
@@ -167,26 +167,48 @@ Outer references allow expressions inside a subquery to access records from an e
 `id_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (>= 1) and unique across all relations in the plan.
+#### Coexistence rules
+
+Exactly one of `steps_out` or `id_reference` must be set on each `OuterReference`. A single plan may contain outer references using different strategies (e.g., some using `steps_out` and others using `id_reference`), as long as every individual reference is unambiguous. However, if any shared relation (via `ReferenceRel`) contains an unresolved outer reference, that reference **must** use `id_reference`.
+
 #### When to use `id_reference`
-`id_reference` must be used instead of `steps_out` when a plan contains **shared relations** via `ReferenceRel` with unresolved outer references. In such plans, the binding relation (i.e., the relation providing the actual value of the outer reference) can be reached through multiple paths with different depths, making offset-based resolution ambiguous because `steps_out` depends on *which* path is followed.
+`id_reference` must be used instead of `steps_out` when a plan contains **shared relations** via `ReferenceRel` with unresolved outer references in the shared relations. In such plans, the binding relation (i.e., the relation providing the actual value of the outer reference) can be reached through multiple paths with different depths, making offset-based resolution ambiguous because `steps_out` depends on *which* path is followed.
 For example, consider a plan with two nested scalar subqueries that share a common relation `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth:
 ```
 PlanRel.relations[0].rel: # let's call it 'x'
-FilterRel(a > outer_ref(tableA.a))
+FilterRel(a > outer_ref(steps_out=1, tableA.a)) # steps_out 1 or 2?
  └── ReadRel(tableB)
 PlanRel.relations[1].root:
-ProjectRel [id=1]
+ProjectRel # Correct binding for tableA.a for the outer reference tableA.a in x.
 ├── ReadRel(tableA)
-└── Subquery.Scalar ── (1)
+└── Subquery.Scalar # Subquery (1)
     └── SetRel(MINUS_PRIMARY)
-        ├── Subquery.Scalar ── (2)
-        │   └── ReferenceRel(0)
-        └── ReferenceRel(0)
+        ├── ProjectRel
+        |   └── Subquery.Scalar # Subquery (2)
+        │       └── ReferenceRel(0) # Reference 1: correct steps_out = 2
+        └── ReferenceRel(0) # Reference 2: correct steps_out = 1
 ```
-The shared relation `x` contains an outer reference to `tableA.a`. It is reached through two paths — one via subquery (2), one directly — so `steps_out` would need a different value depending on which path is followed. With `id_reference = 1`, the reference inside `x` unambiguously names `ProjectRel` to resolve the outer reference regardless of path.
+From the reference 1, the correct `steps_out` is 2 because it needs to go through 2 subqueries to reach the ProjectRel. From the reference 2, the correct `steps_out` is 1 because it only needs to go over 1 subquery. Thus, the outer reference is malformed.
+
+With `id_reference`, both reference rels can unambiguously refer to the correct binding.
+```
+PlanRel.relations[0].rel: # let's call it 'x'
+FilterRel(a > outer_ref(id_reference=7, tableA.a))
+ └── ReadRel(tableB)
+
+PlanRel.relations[1].root:
+ProjectRel [id=7] # Correct binding for tableA.a for the outer reference tableA.a in x.
+├── ReadRel(tableA)
+└── Subquery.Scalar # Subquery (1)
+    └── SetRel(MINUS_PRIMARY)
+        ├── ProjectRel
+        |   └── Subquery.Scalar # Subquery (2)
+        │       └── ReferenceRel(0) # Reference 1: id_reference = 7
+        └── ReferenceRel(0) # Reference 2: id_reference = 7
+```
diff --git a/site/docs/relations/common_fields.md b/site/docs/relations/common_fields.md
index 37f0d4cf4..99ee26543 100644
--- a/site/docs/relations/common_fields.md
+++ b/site/docs/relations/common_fields.md
@@ -12,6 +12,10 @@ A relation which has a direct emit kind outputs the relation's output without re
 * Many relations (such as Project) by default provide as their output the list of all their input columns plus any generated columns as its output columns. Review each relation to understand its specific output default.
+## Relation ID
+
+A relation may carry an optional plan-wide unique identifier (`id`). When set, the value must be >= 1 and unique across all relations in the plan. This identifier is required when the relation is the binding point for an `OuterReference` that uses `id_reference` resolution. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details.
+
 ## Hints
 Hints provide information that can improve performance but cannot be used to control the behavior. Table statistics, runtime constraints, name hints, and saved computations all fall into this category.
diff --git a/site/docs/relations/logical_relations.md b/site/docs/relations/logical_relations.md
index a6b0990aa..d4ae3fe11 100644
--- a/site/docs/relations/logical_relations.md
+++ b/site/docs/relations/logical_relations.md
@@ -425,6 +425,10 @@ We could use the `ReferenceRel` to highlight the shared `A JOIN B` between the t
 One expressing `A JOIN B` (in position 0 in the plan), one using reference as follows: `ReferenceRel(0) JOIN C` and a third one doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B`.
+!!! note "Outer references in shared relations"
+
+    When a shared relation contains an unresolved outer reference, the reference must use `id_reference` instead of `steps_out`, because a `ReferenceRel` can be reached through multiple paths of different depths, making offset-based resolution ambiguous. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details.
+
 | Signature | Value |
 | -------------------- |---------------------------------------|
 | Inputs | 1 |
diff --git a/site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto b/site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto
new file mode 100644
index 000000000..53738bea7
--- /dev/null
+++ b/site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto
@@ -0,0 +1,39 @@
+# Outer reference using id_reference (id-based resolution)
+#
+# Scenario: A shared relation (via ReferenceRel) contains a correlated
+# filter that references a column from an enclosing relation. Because the
+# shared relation can be reached through multiple paths of different
+# depths, offset-based resolution (steps_out) would be ambiguous.
+# id_reference resolves this by naming the binding relation directly.
+#
+# Plan structure:
+#
+#   PlanRel.relations[0].rel: (shared relation "x")
+#   FilterRel(col > outer_ref(id_reference=7, position 0))
+#     └── ReadRel(tableB)
+#
+#   PlanRel.relations[1].root:
+#   ProjectRel [id=7] <-- binding relation
+#   ├── ReadRel(tableA)
+#   └── Subquery.Scalar
+#       └── SetRel(MINUS_PRIMARY)
+#           ├── ProjectRel
+#           │   └── Subquery.Scalar
+#           │       └── ReferenceRel(0) (depth 2 from binding)
+#           └── ReferenceRel(0) (depth 1 from binding)
+#
+# Both ReferenceRel nodes point to the same shared relation, but they
+# sit at different depths. id_reference = 7 unambiguously resolves
+# to the ProjectRel whose RelCommon.id = 7, regardless of which path
+# is taken.
+#
+# message Expression.FieldReference
+
+outer_reference: {
+  id_reference: 7 # Refers to the relation with RelCommon.id = 7
+}
+direct_reference: {
+  struct_field: {
+    field: 0 # First column of the binding relation (tableA.a)
+  }
+}
diff --git a/site/examples/proto-textformat/field_reference/outer_reference_steps_out.textproto b/site/examples/proto-textformat/field_reference/outer_reference_steps_out.textproto
new file mode 100644
index 000000000..a04d89f10
--- /dev/null
+++ b/site/examples/proto-textformat/field_reference/outer_reference_steps_out.textproto
@@ -0,0 +1,31 @@
+# Outer reference using steps_out (offset-based resolution)
+#
+# Scenario: A correlated scalar subquery where the inner filter references
+# a column from the outer query.
+#
+# SQL equivalent:
+#   SELECT *
+#   FROM orders -- outer relation
+#   WHERE amount > (
+#     SELECT AVG(amount) -- scalar subquery
+#     FROM orders AS o2
+#     WHERE o2.customer_id = orders.customer_id -- outer reference
+#   )
+#
+# The outer reference `orders.customer_id` is one subquery boundary up,
+# so steps_out = 1. The referenced field is at position 0 (customer_id)
+# in the outer relation's output.
+#
+# steps_out works here because the plan is a tree (each relation has
+# exactly one parent), so the path to the binding relation is unambiguous.
+#
+# message Expression.FieldReference
+
+outer_reference: {
+  steps_out: 1 # One subquery boundary up to the enclosing relation
+}
+direct_reference: {
+  struct_field: {
+    field: 0 # First column of the outer relation (customer_id)
+  }
+}

From 11560b9a90942ddc76478de6fbec55e8653e6cc9 Mon Sep 17 00:00:00 2001
From: YongChul Kwon <1606237+yongchul@users.noreply.github.com>
Date: Thu, 30 Apr 2026 14:43:08 -0700
Subject: [PATCH 7/8] Apply suggestions from code review

Co-authored-by: Ben Bellick <36523439+benbellick@users.noreply.github.com>
---
 proto/substrait/algebra.proto             |  10 +-
 site/docs/expressions/field_references.md |   8 +-
 site/docs/expressions/subqueries.md       | 186 +++++++++++-----------
 3 files changed, 102 insertions(+), 102 deletions(-)

diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index 521b9b396..b2e78dc79 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -1629,11 +1629,11 @@ message Expression {
         // reference. Must be >= 1.
         uint32 steps_out = 1;
-        // Plan-wide unique id of the relation whose input order this reference is
-        // rooted on. Must match an id defined on a RelCommon in the plan.
-        // Must be >= 1. Must be used instead of steps_out when the plan
-        // contains shared relations (ReferenceRel) and offset-based
-        // resolution via steps_out would be ambiguous.
+        // References the plan-wide unique id of the relation that this field reference
+        // is rooted on. Must match an id defined on a RelCommon in the plan.
+        // Must be >= 1. Must be used instead of steps_out when this outer reference
+        // appears inside a relation shared via ReferenceRel and offset-based resolution
+        // via steps_out would be ambiguous.
uint32 id_reference = 2; } } diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md index 73d4cc1c3..abf6adca4 100644 --- a/site/docs/expressions/field_references.md +++ b/site/docs/expressions/field_references.md @@ -173,7 +173,7 @@ Exactly one of `steps_out` or `id_reference` must be set on each `OuterReference #### When to use `id_reference` -`id_reference` must be used instead of `steps_out` when a plan contains **shared relations** via `ReferenceRel` with unresolved outer references in the shared relations. In such plans, the binding relation (i.e., the relation providing the actual value of the outer reference) can be reached through multiple paths with different depths, making offset-based resolution ambiguous because `steps_out` depends on *which* path is followed. +`id_reference` must be used instead of `steps_out` when an outer reference appears inside a relation shared via `ReferenceRel` and the shared relation can be reached through multiple paths with different subquery depths, making `steps_out` ambiguous. In this case, the same outer reference could require different `steps_out` values depending on which path is followed. For example, consider a plan with two nested scalar subqueries that share a common relation `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth: @@ -189,11 +189,11 @@ ProjectRel # Correct binding for tableA.a for the outer reference tableA.a in x. └── SetRel(MINUS_PRIMARY) ├── ProjectRel | └── Subquery.Scalar # Subquery (2) - │ └── ReferenceRel(0) # Reference 1: correct steps_out = 2 - └── ReferenceRel(0) # Reference 2: correct steps_out = 1 + │ └── ReferenceRel(0) # Here steps_out=1 binds incorrectly, because tableA.a is actually two subquery boundaries out. + └── ReferenceRel(0) # Here steps_out=1 binds correctly, because tableA.a is one subquery boundary out. 
``` -From the reference 1, the correct `steps_out` is 2 because it needs to go through 2 subqueries to reach the ProjectRel. From the reference 2, the correct `steps_out` is 1 because it only needs to go over 1 subquery. Thus, the outer reference is malformed. +The same shared relation `x` contains a single stored `steps_out=1` outer reference, but that value is only correct for one of its uses. The other use would need `steps_out=2`, so offset-based resolution is ambiguous. With `id_reference`, both reference rels can unambiguously refer to the correct binding. diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md index ad1282865..66dbd2731 100644 --- a/site/docs/expressions/subqueries.md +++ b/site/docs/expressions/subqueries.md @@ -1,95 +1,95 @@ -# Subqueries - -Subqueries are scalar expressions comprised of another query. - -## Forms - -### Scalar - -Scalar subqueries are subqueries that return one row and one column. - -| Property | Description | Required | -| -------- | -------------- | -------- | -| Input | Input relation | Yes | - -### `IN` predicate - -An `IN` subquery predicate checks that the left expression is contained in the -right subquery. - -#### Examples - -```sql -SELECT * -FROM t1 -WHERE x IN (SELECT * FROM t2) -``` - -```sql -SELECT * -FROM t1 -WHERE (x, y) IN (SELECT a, b FROM t2) -``` - -| Property | Description | Required | -| -------- | ------------------------------------------- | -------- | -| Needles | Expressions whose existence will be checked | Yes | -| Haystack | Subquery to check | Yes | - -### Set predicates - -A set predicate is a predicate over a set of rows in the form of a subquery. - -`EXISTS` and `UNIQUE` are common SQL spellings of these kinds of predicates. 
- -| Property | Description | Required | -| --------- | ------------------------------------------ | -------- | -| Operation | The operation to perform over the set | Yes | -| Tuples | Set of tuples to check using the operation | Yes | - -### Set comparisons - -A set comparison subquery is a subquery comparison using `ANY` or `ALL` operations. - -#### Examples - -```sql -SELECT * -FROM t1 -WHERE x < ANY(SELECT y from t2) -``` - -| Property | Description | Required | -| --------------------- | ---------------------------------------------- | -------- | -| Reduction operation | The kind of reduction to use over the subquery | Yes | -| Comparison operation | The kind of comparison operation to use | Yes | -| Expression | Left-hand side expression to check | Yes | -| Subquery | Subquery to check | Yes | - - - -## Outer References in Subqueries - -Subqueries may contain *outer references*, which are field references that reach -outside the subquery boundary to access records from an enclosing relation. -The `OuterReference` root type provides two resolution fields: - +# Subqueries + +Subqueries are scalar expressions comprised of another query. + +## Forms + +### Scalar + +Scalar subqueries are subqueries that return one row and one column. + +| Property | Description | Required | +| -------- | -------------- | -------- | +| Input | Input relation | Yes | + +### `IN` predicate + +An `IN` subquery predicate checks that the left expression is contained in the +right subquery. + +#### Examples + +```sql +SELECT * +FROM t1 +WHERE x IN (SELECT * FROM t2) +``` + +```sql +SELECT * +FROM t1 +WHERE (x, y) IN (SELECT a, b FROM t2) +``` + +| Property | Description | Required | +| -------- | ------------------------------------------- | -------- | +| Needles | Expressions whose existence will be checked | Yes | +| Haystack | Subquery to check | Yes | + +### Set predicates + +A set predicate is a predicate over a set of rows in the form of a subquery. 
+ +`EXISTS` and `UNIQUE` are common SQL spellings of these kinds of predicates. + +| Property | Description | Required | +| --------- | ------------------------------------------ | -------- | +| Operation | The operation to perform over the set | Yes | +| Tuples | Set of tuples to check using the operation | Yes | + +### Set comparisons + +A set comparison subquery is a subquery comparison using `ANY` or `ALL` operations. + +#### Examples + +```sql +SELECT * +FROM t1 +WHERE x < ANY(SELECT y from t2) +``` + +| Property | Description | Required | +| --------------------- | ---------------------------------------------- | -------- | +| Reduction operation | The kind of reduction to use over the subquery | Yes | +| Comparison operation | The kind of comparison operation to use | Yes | +| Expression | Left-hand side expression to check | Yes | +| Subquery | Subquery to check | Yes | + + + +## Outer References in Subqueries + +Subqueries may contain *outer references*, which are field references that reach +outside the subquery boundary to access records from an enclosing relation. +The `OuterReference` root type provides two resolution fields: + * `steps_out`: Resolves the reference by counting subquery boundaries - upward. This works correctly when the plan is a tree (each relation has a - single parent). - + upward. This works correctly when the plan is a tree (each relation has a + single parent). + * `id_reference`: Resolves the reference by naming the binding relation - via its plan-wide unique `RelCommon.id`. Must be used instead of - `steps_out` when the plan contains shared relations (`ReferenceRel`) - because offset-based resolution is ambiguous when multiple paths exist to - the same relation. - -Exactly one of these fields must be set. See -[Field References — Outer References](field_references.md#outer-references) -for details. 
- -=== "Protobuf Representation" - - ```proto -%%% proto.message.Expression.Subquery %%% - ``` + via its plan-wide unique `RelCommon.id`. Must be used instead of + `steps_out` when an outer reference appears inside a relation shared via + `ReferenceRel` and that shared relation can be reached through multiple + paths with different subquery depths, making `steps_out` ambiguous. + +Exactly one of these fields must be set. See +[Field References — Outer References](field_references.md#outer-references) +for details. + +=== "Protobuf Representation" + + ```proto +%%% proto.message.Expression.Subquery %%% + ``` From 0981644fc3724ec7b0deb008afd35a834a779bfe Mon Sep 17 00:00:00 2001 From: YongChul Kwon Date: Thu, 30 Apr 2026 15:03:44 -0700 Subject: [PATCH 8/8] rename id to rel_anchor. id_reference to rel_reference. --- proto/substrait/algebra.proto | 16 +++++------ site/docs/expressions/field_references.md | 28 +++++++++---------- site/docs/expressions/subqueries.md | 13 +++++---- site/docs/relations/common_fields.md | 2 +- site/docs/relations/logical_relations.md | 2 +- ...> outer_reference_rel_reference.textproto} | 18 ++++++------ 6 files changed, 41 insertions(+), 38 deletions(-) rename site/examples/proto-textformat/field_reference/{outer_reference_id_reference.textproto => outer_reference_rel_reference.textproto} (63%) diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto index b2e78dc79..6fa504dc8 100644 --- a/proto/substrait/algebra.proto +++ b/proto/substrait/algebra.proto @@ -26,9 +26,9 @@ message RelCommon { // Optional plan-wide unique identifier for this relation. Required when // this relation is the binding point for an OuterReference using - // id_reference. Must be unique across all rels within a Plan. + // rel_reference. Must be unique across all rels within a Plan. // Must be >= 1 when set. 
- optional uint32 id = 5; + optional uint32 rel_anchor = 5; // Direct indicates no change on presence and ordering of fields in the output message Direct {} @@ -1629,12 +1629,12 @@ message Expression { // reference. Must be >= 1. uint32 steps_out = 1; - // References the plan-wide unique id of the relation that this field reference - // is rooted on. Must match an id defined on a RelCommon in the plan. - // Must be >= 1. Must be used instead of steps_out when this outer reference - // appears inside a relation shared via ReferenceRel and offset-based resolution - // via steps_out would be ambiguous. - uint32 id_reference = 2; + // References the plan-wide unique rel_anchor of the relation that this + // field reference is rooted on. Must match a rel_anchor defined on a + // RelCommon in the plan. Must be >= 1. Must be used instead of steps_out + // when this outer reference appears inside a relation shared via + // ReferenceRel and offset-based resolution via steps_out would be ambiguous. + uint32 rel_reference = 2; } } diff --git a/site/docs/expressions/field_references.md b/site/docs/expressions/field_references.md index abf6adca4..0f9123f5f 100644 --- a/site/docs/expressions/field_references.md +++ b/site/docs/expressions/field_references.md @@ -5,7 +5,7 @@ In Substrait, all fields are dealt with on a positional basis. 
Field names are o Field references can originate from different root types: - **RootReference**: References the incoming record from the relation -- **OuterReference**: References outer query records in correlated subqueries, supporting either offset-based (`steps_out`) or id-based (`id_reference`) resolution (see [Outer References](#outer-references)) +- **OuterReference**: References outer query records in correlated subqueries, supporting either offset-based (`steps_out`) or id-based (`rel_reference`) resolution (see [Outer References](#outer-references)) - **Expression**: References the result of evaluating an expression - **LambdaParameterReference**: References lambda parameters within lambda body expressions (see [Lambda Expressions](lambda_expressions.md)) @@ -163,17 +163,17 @@ Outer references allow expressions inside a subquery to access records from an e `steps_out` resolves the reference by counting subquery boundaries upward (`steps_out >= 1`). This works correctly whenever the plan is a **tree**, i.e., when each relation has exactly one parent, the path to the binding relation can be uniquely determined via `steps_out`. -#### `id_reference` (id-based) +#### `rel_reference` (id-based) -`id_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.id`. The `id` on the referenced relation must be set (>= 1) and unique across all relations in the plan. +`rel_reference` resolves the reference by referring to the binding relation via its plan-wide unique `RelCommon.rel_anchor`. The `rel_anchor` on the referenced relation must be set (>= 1) and unique across all relations in the plan. #### Coexistence rules -Exactly one of `steps_out` or `id_reference` must be set on each `OuterReference`. A single plan may contain outer references using different strategies (e.g., some using `steps_out` and others using `id_reference`), as long as every individual reference is unambiguous. 
However, if any shared relation (via `ReferenceRel`) contains an unresolved outer reference, that reference **must** use `id_reference`. +Exactly one of `steps_out` or `rel_reference` must be set on each `OuterReference`. A single plan may contain outer references using different strategies (e.g., some using `steps_out` and others using `rel_reference`), as long as every individual reference is unambiguous. However, if any shared relation (via `ReferenceRel`) contains an unresolved outer reference, that reference **must** use `rel_reference`. -#### When to use `id_reference` +#### When to use `rel_reference` -`id_reference` must be used instead of `steps_out` when an outer reference appears inside a relation shared via `ReferenceRel` and the shared relation can be reached through multiple paths with different subquery depths, making `steps_out` ambiguous. In this case, the same outer reference could require different `steps_out` values depending on which path is followed. +`rel_reference` must be used instead of `steps_out` when an outer reference appears inside a relation shared via `ReferenceRel` and the shared relation can be reached through multiple paths with different subquery depths, making `steps_out` ambiguous. In this case, the same outer reference could require different `steps_out` values depending on which path is followed. For example, consider a plan with two nested scalar subqueries that share a common relation `x`. The outer reference to `tableA.a` lives inside `x`, which is reached via paths of different depth: @@ -189,26 +189,26 @@ ProjectRel # Correct binding for tableA.a for the outer reference tableA.a in x. └── SetRel(MINUS_PRIMARY) ├── ProjectRel | └── Subquery.Scalar # Subquery (2) - │ └── ReferenceRel(0) # Here steps_out=1 binds incorrectly, because tableA.a is actually two subquery boundaries out. - └── ReferenceRel(0) # Here steps_out=1 binds correctly, because tableA.a is one subquery boundary out. 
+ │ └── ReferenceRel(0) # Here steps_out=1 binds incorrectly, because tableA.a is actually two subquery boundaries out. + └── ReferenceRel(0) # Here steps_out=1 binds correctly, because tableA.a is one subquery boundary out. ``` -The same shared relation `x` contains a single stored `steps_out=1` outer reference, but that value is only correct for one of its uses. The other use would need `steps_out=2`, so offset-based resolution is ambiguous. +The same shared relation `x` contains a single stored `steps_out=1` outer reference, but that value is only correct for one of its uses. The other use would need `steps_out=2`, so offset-based resolution is ambiguous. -With `id_reference`, both reference rels can unambiguously refer to the correct binding. +With `rel_reference`, both reference rels can unambiguously refer to the correct binding. ``` PlanRel.relations[0].rel: # let's call it 'x' -FilterRel(a > outer_ref(id_reference=7, tableA.a)) +FilterRel(a > outer_ref(rel_reference=7, tableA.a)) └── ReadRel(tableB) PlanRel.relations[1].root: -ProjectRel [id=7] # Correct binding for tableA.a for the outer reference tableA.a in x. +ProjectRel [rel_anchor=7] # Correct binding for tableA.a for the outer reference tableA.a in x. ├── ReadRel(tableA) └── Subquery.Scalar # Subquery (1) └── SetRel(MINUS_PRIMARY) ├── ProjectRel | └── Subquery.Scalar # Subquery (2) - │ └── ReferenceRel(0) # Reference 1: id_reference = 7 - └── ReferenceRel(0) # Reference 2: id_reference = 7 + │ └── ReferenceRel(0) # Reference 1: rel_reference = 7 + └── ReferenceRel(0) # Reference 2: rel_reference = 7 ``` diff --git a/site/docs/expressions/subqueries.md b/site/docs/expressions/subqueries.md index 66dbd2731..655bb024d 100644 --- a/site/docs/expressions/subqueries.md +++ b/site/docs/expressions/subqueries.md @@ -78,11 +78,14 @@ The `OuterReference` root type provides two resolution fields: upward. This works correctly when the plan is a tree (each relation has a single parent). 
-* `id_reference`: Resolves the reference by naming the binding relation - via its plan-wide unique `RelCommon.id`. Must be used instead of - `steps_out` when an outer reference appears inside a relation shared via - `ReferenceRel` and that shared relation can be reached through multiple - paths with different subquery depths, making `steps_out` ambiguous. +* `rel_reference`: Resolves the reference by naming the binding relation + via its plan-wide unique `RelCommon.rel_anchor`. Must be used instead of + `steps_out` when an outer reference appears inside a relation shared via + + `ReferenceRel` and that shared relation can be reached through multiple + + paths with different subquery depths, making `steps_out` ambiguous. + Exactly one of these fields must be set. See [Field References — Outer References](field_references.md#outer-references) diff --git a/site/docs/relations/common_fields.md b/site/docs/relations/common_fields.md index 99ee26543..fa3316992 100644 --- a/site/docs/relations/common_fields.md +++ b/site/docs/relations/common_fields.md @@ -14,7 +14,7 @@ A relation which has a direct emit kind outputs the relation's output without re ## Relation ID -A relation may carry an optional plan-wide unique identifier (`id`). When set, the value must be >= 1 and unique across all relations in the plan. This identifier is required when the relation is the binding point for an `OuterReference` that uses `id_reference` resolution. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details. +A relation may carry an optional plan-wide unique identifier (`rel_anchor`). When set, the value must be >= 1 and unique across all relations in the plan. This identifier is required when the relation is the binding point for an `OuterReference` that uses `rel_reference` resolution. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details. 
## Hints diff --git a/site/docs/relations/logical_relations.md b/site/docs/relations/logical_relations.md index d4ae3fe11..9d08e37f3 100644 --- a/site/docs/relations/logical_relations.md +++ b/site/docs/relations/logical_relations.md @@ -427,7 +427,7 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B !!! note "Outer references in shared relations" - When a shared relation contains an unresolved outer reference, the reference must use `id_reference` instead of `steps_out`, because a `ReferenceRel` can be reached through multiple paths of different depths, making offset-based resolution ambiguous. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details. + When a shared relation contains an unresolved outer reference, the reference must use `rel_reference` instead of `steps_out`, because a `ReferenceRel` can be reached through multiple paths of different depths, making offset-based resolution ambiguous. See [Field References — Outer References](../expressions/field_references.md#outer-references) for details. 
| Signature | Value | | -------------------- |---------------------------------------| diff --git a/site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto b/site/examples/proto-textformat/field_reference/outer_reference_rel_reference.textproto similarity index 63% rename from site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto rename to site/examples/proto-textformat/field_reference/outer_reference_rel_reference.textproto index 53738bea7..30dbc8eb0 100644 --- a/site/examples/proto-textformat/field_reference/outer_reference_id_reference.textproto +++ b/site/examples/proto-textformat/field_reference/outer_reference_rel_reference.textproto @@ -1,19 +1,19 @@ -# Outer reference using id_reference (id-based resolution) +# Outer reference using rel_reference (id-based resolution) # # Scenario: A shared relation (via ReferenceRel) contains a correlated # filter that references a column from an enclosing relation. Because the # shared relation can be reached through multiple paths of different # depths, offset-based resolution (steps_out) would be ambiguous. -# id_reference resolves this by naming the binding relation directly. +# rel_reference resolves this by naming the binding relation directly. # # Plan structure: # # PlanRel.relations[0].rel: (shared relation "x") -# FilterRel(col > outer_ref(id_reference=7, position 0)) +# FilterRel(col > outer_ref(rel_reference=7, position 0)) # └── ReadRel(tableB) # # PlanRel.relations[1].root: -# ProjectRel [id=7] <-- binding relation +# ProjectRel [rel_anchor=7] <-- binding relation # ├── ReadRel(tableA) # └── Subquery.Scalar # └── SetRel(MINUS_PRIMARY) @@ -23,17 +23,17 @@ # └── ReferenceRel(0) (depth 1 from binding) # # Both ReferenceRel nodes point to the same shared relation, but they -# sit at different depths. id_reference = 7 unambiguously resolves -# to the ProjectRel whose RelCommon.id = 7, regardless of which path -# is taken. +# sit at different depths. 
rel_reference = 7 unambiguously resolves +# to the ProjectRel whose RelCommon.rel_anchor = 7, regardless of which +# path is taken. # # message Expression.FieldReference outer_reference: { - id_reference: 7 # Refers to the relation with RelCommon.id = 7 + rel_reference: 7 # Refers to the relation with RelCommon.rel_anchor = 7 } direct_reference: { struct_field: { - field: 0 # First column of the binding relation (tableA.a) + field: 0 # First column of the binding relation (tableA.a) } }
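
Reviewer note: the ambiguity this series addresses can be illustrated with a small sketch. The Python below is not part of the patch and does not use any Substrait library. It is a toy model under stated assumptions, and every name in it (`Rel`, `build_anchor_index`, `resolve_steps_out`, `resolve_rel_reference`) is hypothetical. It models each use of the shared relation `x` as the list of relations that own the enclosing subquery boundaries, then resolves the same stored outer reference along both paths.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rel:
    """Toy stand-in for a relation; rel_anchor mirrors RelCommon.rel_anchor."""
    name: str
    rel_anchor: Optional[int] = None


def build_anchor_index(rels: List[Rel]) -> Dict[int, Rel]:
    """Index relations by rel_anchor, enforcing '>= 1' and plan-wide uniqueness."""
    index: Dict[int, Rel] = {}
    for rel in rels:
        if rel.rel_anchor is None:
            continue
        if rel.rel_anchor < 1:
            raise ValueError(f"rel_anchor must be >= 1: {rel}")
        if rel.rel_anchor in index:
            raise ValueError(f"duplicate rel_anchor {rel.rel_anchor}")
        index[rel.rel_anchor] = rel
    return index


def resolve_steps_out(enclosing: List[Rel], steps_out: int) -> Rel:
    """Walk up `steps_out` subquery boundaries.

    `enclosing` lists the relation owning each boundary, outermost first,
    for ONE particular path to the reference (an assumption of this model).
    """
    if not 1 <= steps_out <= len(enclosing):
        raise ValueError(f"steps_out={steps_out} is out of range for this path")
    return enclosing[-steps_out]


def resolve_rel_reference(index: Dict[int, Rel], rel_reference: int) -> Rel:
    """Look the binding relation up directly; the path is irrelevant."""
    return index[rel_reference]


# The plan from the docs: the outer ProjectRel carries rel_anchor=7.
outer_project = Rel("ProjectRel(outer)", rel_anchor=7)
inner_project = Rel("ProjectRel(inner)")

# Two paths to the shared relation x (both via ReferenceRel(0)).
path_via_subquery_2 = [outer_project, inner_project]  # two boundaries deep
path_direct = [outer_project]                         # one boundary deep

index = build_anchor_index([outer_project, inner_project])

# A single stored steps_out=1 binds differently depending on the path.
steps_results = [resolve_steps_out(p, 1) for p in (path_via_subquery_2, path_direct)]

# A stored rel_reference=7 binds to the same relation on both paths.
anchor_results = [resolve_rel_reference(index, 7) for _ in (path_via_subquery_2, path_direct)]

print([r.name for r in steps_results])
print([r.name for r in anchor_results])
```

In this model the `steps_out` lookup depends on which path reaches `x`, while the `rel_reference` lookup takes no path argument at all, which is the property the documentation relies on. A real consumer would presumably build the anchor index once per plan and reject duplicates up front, as `build_anchor_index` does here.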