[fix](variant) fix array subscript on pruned variant subpath#63891
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR fixes a planner bug in variant subpath pruning where array subscripts / element_at could remain attached to the original (unpruned) variant access chain after pruning, causing valid 1-based indexing to incorrectly return NULL. It updates the Nereids pruning rewrite to ensure outer array access uses the pruned subpath slot, and adds both FE unit coverage and regression coverage (nested-group and non-nested-group).
Changes:
- Fix
VariantSubPathPruningproject rewrite to replace nestedelement_atchains inside a top-levelElementAtprojection using the pruned-slot map. - Add FE planner unit test asserting the pruned subpath is used for array subscript projections.
- Add two regression tests covering both nested-group and plain
VARIANTarray-leaf cases for CIR-20316.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/VariantSubPathPruning.java |
Ensures LogicalProject rewrites replace nested variant element_at chains inside top-level ElementAt projections so subscripts apply to the pruned subpath slot. |
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java |
Adds a unit test to validate explain/slot-subpath behavior for cast(v['items']['type'] as array<string>)[1]. |
regression-test/suites/variant_p0/test_variant_array_subscript_cir_20316.groovy |
Adds non-nested-group regression verifying 1-based array subscripts and element_at return correct non-NULL results after pruning. |
regression-test/suites/variant_p0/nested_group/test_variant_nested_array_subscript_cir_20316.groovy |
Adds nested-group regression covering array-of-objects behavior for the same pruning/subscript scenario. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Assertions.assertTrue(explain.contains("final projections: element_at(CAST(v AS array<text>), 1)"), | ||
| explain); | ||
| Assertions.assertTrue(explain.contains("subColPath=[items, type]"), | ||
| explain); | ||
| Assertions.assertFalse(explain.contains("element_at(CAST(element_at(element_at("), | ||
| explain); |
223b553 to
473f703
Compare
|
/review |
|
run buildall |
There was a problem hiding this comment.
Review result: one regression-test standards issue needs to be fixed before approval.
Critical checkpoint conclusions:
- Goal/test: the code change appears to address array subscript on a pruned variant subpath, and the added tests cover the target shape. The regression test format needs correction.
- Scope: the production code change is small and focused.
- Concurrency/lifecycle/config/compatibility/persistence: not materially involved by this PR.
- Parallel paths: project expression replacement paths were checked; no additional production-path issue found.
- Test coverage/results: coverage was added, but the new deterministic regression result is inline instead of generated via the standard qt/.out mechanism.
- Observability/performance: no new observability need or obvious performance issue found.
User focus: no additional user-provided review focus was specified.
473f703 to
aee1c89
Compare
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 31309 ms |
TPC-DS: Total hot run time: 171951 ms |
FE UT Coverage ReportIncrement line coverage |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
aee1c89 to
4984623
Compare
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
TPC-H: Total hot run time: 31316 ms |
TPC-DS: Total hot run time: 171397 ms |
FE Regression Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Review completed for the GitHub PR diff (VariantSubPathPruning plus added FE/regression coverage). I did not find any new non-duplicate blocking issues.
Critical checkpoint conclusions:
- Goal/test proof: The change targets array subscript/element_at over a pruned variant subpath. The rule now rewrites nested element_at occurrences inside the outer ElementAt, and the added FE unit test plus regression explain test cover the intended plan shape.
- Scope/focus: The implementation is small and localized to the existing project rewrite branch; tests are limited to the affected variant pruning behavior.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, configuration, persistence, storage-format, or FE-BE protocol compatibility concerns were introduced by this PR diff.
- Parallel paths: Existing replacement paths for filters, joins, sort, aggregate, generate, and other expressions already use ExpressionUtils.replace; this patch addresses the distinct LogicalProject ElementAt branch that previously skipped recursive replacement.
- Conditional checks: The new behavior stays within the existing ElementAt handling structure and preserves the existing pushDownToProject condition.
- Test coverage/results: Regression test follows the updated explain-based approach from the existing review thread and avoids duplicating the already-correct result assertion; FE unit coverage checks the pruned subpath slot. I did not rerun tests in this review session.
- Observability/performance/data correctness: No additional observability appears necessary; the new recursive expression replacement is planner-only and proportional to the projection expression size; no transaction or data visibility path is touched.
User focus points: No additional user-provided review focus was specified.
### What problem does this PR solve? Fix variant subpath pruning for projections where the top-level expression is an array subscript or `element_at` over a variant subpath. The planner could leave the outer subscript on the original variant access chain after pruning, which made valid 1-based array subscripts return `NULL`. The original array-of-objects repro depends on nested-group variant semantics, so the regression in this PR uses a plain `VARIANT` array leaf without nested group. Since that query result is already correct on current master, the regression asserts the verbose plan instead: the scan uses `subColPath=[items, type]` and the final array subscript is applied to the pruned variant slot. ### Check List - [x] Added regression test - [x] Added FE planner unit test ### Tests - `./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d variant_p0 -s test_variant_array_subscript` - `./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest` passed earlier on the same FE code; rerun after this regression-only amend was blocked by system pid/thread exhaustion before test execution.
What problem does this PR solve?
Fix variant subpath pruning for projections where the top-level expression is an array subscript or
element_atover a variant subpath. The planner could leave the outer subscript on the original variant access chain after pruning, which made valid 1-based array subscripts returnNULL.The original array-of-objects repro depends on nested-group variant semantics, so the regression in this PR uses a plain
VARIANTarray leaf without nested group. Since that query result is already correct on current master, the regression asserts the verbose plan instead: the scan usessubColPath=[items, type]and the final array subscript is applied to the pruned variant slot.Check List
Tests
./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d variant_p0 -s test_variant_array_subscript./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTestpassed earlier on the same FE code; rerun after this regression-only amend was blocked by system pid/thread exhaustion before test execution.