Skip to content

[fix](variant) fix array subscript on pruned variant subpath#63891

Merged
eldenmoon merged 1 commit into
masterfrom
branch-cir-20316-variant-array-subscript
Jun 1, 2026
Merged

[fix](variant) fix array subscript on pruned variant subpath#63891
eldenmoon merged 1 commit into
masterfrom
branch-cir-20316-variant-array-subscript

Conversation

@eldenmoon
Copy link
Copy Markdown
Member

@eldenmoon eldenmoon commented May 29, 2026

What problem does this PR solve?

Fix variant subpath pruning for projections where the top-level expression is an array subscript or element_at over a variant subpath. The planner could leave the outer subscript on the original variant access chain after pruning, which made valid 1-based array subscripts return NULL.

The original array-of-objects repro depends on nested-group variant semantics, so the regression in this PR uses a plain VARIANT array leaf without nested group. Since that query result is already correct on current master, the regression asserts the verbose plan instead: the scan uses subColPath=[items, type] and the final array subscript is applied to the pruned variant slot.

Check List

  • Added regression test
  • Added FE planner unit test

Tests

  • ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d variant_p0 -s test_variant_array_subscript
  • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest passed earlier on the same FE code; rerun after this regression-only amend was blocked by system pid/thread exhaustion before test execution.

Copilot AI review requested due to automatic review settings May 29, 2026 05:16
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a planner bug in variant subpath pruning where array subscripts / element_at could remain attached to the original (unpruned) variant access chain after pruning, causing valid 1-based indexing to incorrectly return NULL. It updates the Nereids pruning rewrite to ensure outer array access uses the pruned subpath slot, and adds both FE unit coverage and regression coverage (nested-group and non-nested-group).

Changes:

  • Fix VariantSubPathPruning project rewrite to replace nested element_at chains inside a top-level ElementAt projection using the pruned-slot map.
  • Add FE planner unit test asserting the pruned subpath is used for array subscript projections.
  • Add two regression tests covering both nested-group and plain VARIANT array-leaf cases for CIR-20316.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/VariantSubPathPruning.java Ensures LogicalProject rewrites replace nested variant element_at chains inside top-level ElementAt projections so subscripts apply to the pruned subpath slot.
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java Adds a unit test to validate explain/slot-subpath behavior for cast(v['items']['type'] as array<string>)[1].
regression-test/suites/variant_p0/test_variant_array_subscript_cir_20316.groovy Adds non-nested-group regression verifying 1-based array subscripts and element_at return correct non-NULL results after pruning.
regression-test/suites/variant_p0/nested_group/test_variant_nested_array_subscript_cir_20316.groovy Adds nested-group regression covering array-of-objects behavior for the same pruning/subscript scenario.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +84
Assertions.assertTrue(explain.contains("final projections: element_at(CAST(v AS array<text>), 1)"),
explain);
Assertions.assertTrue(explain.contains("subColPath=[items, type]"),
explain);
Assertions.assertFalse(explain.contains("element_at(CAST(element_at(element_at("),
explain);
@eldenmoon eldenmoon force-pushed the branch-cir-20316-variant-array-subscript branch from 223b553 to 473f703 Compare May 29, 2026 05:22
@morrySnow
Copy link
Copy Markdown
Contributor

/review

@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: one regression-test standards issue needs to be fixed before approval.

Critical checkpoint conclusions:

  • Goal/test: the code change appears to address array subscript on a pruned variant subpath, and the added tests cover the target shape. The regression test format needs correction.
  • Scope: the production code change is small and focused.
  • Concurrency/lifecycle/config/compatibility/persistence: not materially involved by this PR.
  • Parallel paths: project expression replacement paths were checked; no additional production-path issue found.
  • Test coverage/results: coverage was added, but the new deterministic regression result is inline instead of generated via the standard qt/.out mechanism.
  • Observability/performance: no new observability need or obvious performance issue found.

User focus: no additional user-provided review focus was specified.

@eldenmoon eldenmoon force-pushed the branch-cir-20316-variant-array-subscript branch from 473f703 to aee1c89 Compare May 29, 2026 05:38
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

1 similar comment
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

Comment thread regression-test/suites/variant_p0/test_variant_array_subscript.groovy Outdated
@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31309 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aee1c89892d6f5e207240291074034105dc1ec75, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17768	4085	3960	3960
q2	q3	10787	1319	809	809
q4	4687	485	341	341
q5	7688	2262	2110	2110
q6	369	173	140	140
q7	906	795	623	623
q8	9366	1738	1528	1528
q9	7032	4987	4934	4934
q10	6447	2246	1848	1848
q11	433	268	241	241
q12	686	415	287	287
q13	18247	3337	2722	2722
q14	260	253	239	239
q15	q16	825	775	713	713
q17	994	967	947	947
q18	6853	5664	5595	5595
q19	1211	1373	1092	1092
q20	579	404	271	271
q21	6112	2815	2593	2593
q22	452	376	316	316
Total cold run time: 101702 ms
Total hot run time: 31309 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4851	4669	4964	4669
q2	q3	4832	5385	4645	4645
q4	2098	2199	1394	1394
q5	4910	4692	4631	4631
q6	230	176	127	127
q7	1869	1711	1576	1576
q8	2220	1914	1893	1893
q9	7364	7387	7354	7354
q10	4740	4661	4209	4209
q11	527	390	349	349
q12	732	733	523	523
q13	3020	3374	2803	2803
q14	271	276	267	267
q15	q16	687	700	618	618
q17	1262	1242	1240	1240
q18	7330	6893	6814	6814
q19	1096	1108	1147	1108
q20	2215	2221	1954	1954
q21	5220	4629	4442	4442
q22	504	442	405	405
Total cold run time: 55978 ms
Total hot run time: 51021 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 171951 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aee1c89892d6f5e207240291074034105dc1ec75, data reload: false

query5	4316	657	510	510
query6	332	219	203	203
query7	4299	565	309	309
query8	321	228	217	217
query9	8798	4083	4070	4070
query10	449	353	308	308
query11	5806	2452	2245	2245
query12	181	128	128	128
query13	1288	609	417	417
query14	6080	5456	5188	5188
query14_1	4492	4501	4478	4478
query15	213	205	191	191
query16	1022	468	434	434
query17	1144	736	616	616
query18	2771	519	366	366
query19	229	207	178	178
query20	138	132	130	130
query21	217	147	122	122
query22	13696	13574	13377	13377
query23	17348	16613	16170	16170
query23_1	16430	16268	16288	16268
query24	7394	1793	1318	1318
query24_1	1311	1345	1355	1345
query25	583	503	460	460
query26	1320	322	186	186
query27	2681	559	364	364
query28	4448	2041	2081	2041
query29	1143	666	512	512
query30	302	244	205	205
query31	1122	1089	954	954
query32	91	77	77	77
query33	573	378	314	314
query34	1222	1164	676	676
query35	792	816	748	748
query36	1424	1413	1353	1353
query37	155	105	91	91
query38	3278	3228	3060	3060
query39	929	928	890	890
query39_1	882	899	867	867
query40	227	150	123	123
query41	64	61	62	61
query42	110	109	107	107
query43	343	324	290	290
query44	
query45	213	207	199	199
query46	1082	1188	753	753
query47	2365	2405	2242	2242
query48	407	405	295	295
query49	630	506	392	392
query50	1026	362	255	255
query51	4343	4430	4246	4246
query52	106	103	92	92
query53	252	279	198	198
query54	303	272	249	249
query55	93	92	85	85
query56	306	309	303	303
query57	1416	1431	1341	1341
query58	291	278	260	260
query59	1577	1684	1480	1480
query60	344	310	296	296
query61	158	155	158	155
query62	696	645	594	594
query63	244	201	208	201
query64	2312	784	644	644
query65	
query66	1715	483	355	355
query67	29855	29727	29605	29605
query68	
query69	466	379	300	300
query70	991	990	1023	990
query71	302	275	268	268
query72	3023	2819	2461	2461
query73	828	782	420	420
query74	5097	4956	4804	4804
query75	2685	2637	2251	2251
query76	2339	1138	762	762
query77	404	419	328	328
query78	12359	12507	11914	11914
query79	1503	1002	760	760
query80	1319	535	458	458
query81	529	280	240	240
query82	1113	161	123	123
query83	325	281	249	249
query84	264	140	109	109
query85	940	537	462	462
query86	455	346	306	306
query87	3450	3349	3220	3220
query88	3646	2771	2754	2754
query89	452	391	352	352
query90	1871	182	181	181
query91	181	165	144	144
query92	89	75	78	75
query93	1604	1530	904	904
query94	717	347	326	326
query95	677	389	376	376
query96	1002	761	355	355
query97	2751	2752	2631	2631
query98	237	235	231	231
query99	1211	1148	1036	1036
Total cold run time: 255606 ms
Total hot run time: 171951 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon force-pushed the branch-cir-20316-variant-array-subscript branch from aee1c89 to 4984623 Compare June 1, 2026 02:40
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

PR approved by at least one committer and no changes requested.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31316 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 498462336d9550082c308f1e1c57b20ff0742dff, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17769	3981	3974	3974
q2	q3	10833	1363	831	831
q4	4686	476	356	356
q5	7593	2242	2127	2127
q6	247	178	142	142
q7	958	797	645	645
q8	9363	1783	1548	1548
q9	5249	5011	5028	5011
q10	6426	2190	1864	1864
q11	434	290	251	251
q12	638	433	301	301
q13	18109	3352	2785	2785
q14	273	260	237	237
q15	q16	846	772	709	709
q17	965	867	935	867
q18	6945	5940	5583	5583
q19	1341	1255	1027	1027
q20	546	405	270	270
q21	6030	2523	2479	2479
q22	448	357	309	309
Total cold run time: 99699 ms
Total hot run time: 31316 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4349	4310	4287	4287
q2	q3	4569	4912	4419	4419
q4	2115	2193	1377	1377
q5	4464	4336	4338	4336
q6	224	175	146	146
q7	2446	2044	1713	1713
q8	2515	2240	2231	2231
q9	8228	7891	8160	7891
q10	4835	4720	4277	4277
q11	592	441	412	412
q12	794	756	538	538
q13	3427	3580	2952	2952
q14	292	297	278	278
q15	q16	715	736	643	643
q17	1381	1465	1385	1385
q18	8003	7352	7343	7343
q19	1172	1105	1060	1060
q20	2220	2240	1932	1932
q21	5270	4594	4439	4439
q22	537	471	411	411
Total cold run time: 58148 ms
Total hot run time: 52070 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 171397 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 498462336d9550082c308f1e1c57b20ff0742dff, data reload: false

query5	4308	660	515	515
query6	329	229	199	199
query7	4281	568	315	315
query8	328	239	216	216
query9	8833	4014	3979	3979
query10	466	342	299	299
query11	5774	2422	2194	2194
query12	181	132	130	130
query13	1290	599	443	443
query14	6698	5453	5125	5125
query14_1	4488	4479	4457	4457
query15	218	203	184	184
query16	1008	456	463	456
query17	979	731	609	609
query18	2463	499	356	356
query19	229	212	170	170
query20	135	135	131	131
query21	212	140	118	118
query22	13700	13641	13389	13389
query23	17306	16514	16152	16152
query23_1	16392	16252	16347	16252
query24	7451	1754	1313	1313
query24_1	1306	1318	1298	1298
query25	583	514	422	422
query26	1309	321	176	176
query27	2688	559	348	348
query28	4436	2005	1977	1977
query29	1017	622	499	499
query30	317	229	198	198
query31	1155	1091	959	959
query32	90	75	73	73
query33	543	344	298	298
query34	1192	1124	653	653
query35	778	803	709	709
query36	1412	1412	1273	1273
query37	155	105	88	88
query38	3220	3152	3071	3071
query39	928	920	897	897
query39_1	867	879	884	879
query40	224	149	128	128
query41	71	65	63	63
query42	112	116	109	109
query43	327	328	297	297
query44	
query45	217	200	193	193
query46	1117	1254	745	745
query47	2356	2407	2258	2258
query48	402	411	289	289
query49	653	501	387	387
query50	963	347	253	253
query51	4303	4350	4227	4227
query52	107	104	93	93
query53	262	289	206	206
query54	319	264	251	251
query55	92	90	87	87
query56	299	302	302	302
query57	1458	1430	1320	1320
query58	299	266	271	266
query59	1638	1698	1484	1484
query60	326	323	306	306
query61	161	155	161	155
query62	694	651	590	590
query63	245	195	205	195
query64	2443	787	642	642
query65	
query66	1757	477	362	362
query67	29804	29760	29547	29547
query68	
query69	454	339	299	299
query70	1002	999	960	960
query71	306	284	269	269
query72	3055	2803	2438	2438
query73	876	759	425	425
query74	5128	4942	4812	4812
query75	2734	2617	2297	2297
query76	2284	1167	803	803
query77	424	421	351	351
query78	12613	12551	11899	11899
query79	1352	1038	784	784
query80	684	548	468	468
query81	452	282	240	240
query82	452	163	119	119
query83	369	279	248	248
query84	318	142	115	115
query85	891	540	466	466
query86	402	349	322	322
query87	3418	3355	3221	3221
query88	3599	2762	2738	2738
query89	443	386	343	343
query90	1942	187	185	185
query91	178	166	139	139
query92	81	82	73	73
query93	1434	1499	864	864
query94	539	363	312	312
query95	693	476	367	367
query96	1076	798	348	348
query97	2759	2743	2627	2627
query98	249	240	235	235
query99	1174	1149	1023	1023
Total cold run time: 254063 ms
Total hot run time: 171397 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.29% (2/690) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.22% (2/903) 🎉
Increment coverage report
Complete coverage report

@eldenmoon
Copy link
Copy Markdown
Member Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

PR approved by anyone and no changes requested.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed for the GitHub PR diff (VariantSubPathPruning plus added FE/regression coverage). I did not find any new non-duplicate blocking issues.

Critical checkpoint conclusions:

  • Goal/test proof: The change targets array subscript/element_at over a pruned variant subpath. The rule now rewrites nested element_at occurrences inside the outer ElementAt, and the added FE unit test plus regression explain test cover the intended plan shape.
  • Scope/focus: The implementation is small and localized to the existing project rewrite branch; tests are limited to the affected variant pruning behavior.
  • Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, configuration, persistence, storage-format, or FE-BE protocol compatibility concerns were introduced by this PR diff.
  • Parallel paths: Existing replacement paths for filters, joins, sort, aggregate, generate, and other expressions already use ExpressionUtils.replace; this patch addresses the distinct LogicalProject ElementAt branch that previously skipped recursive replacement.
  • Conditional checks: The new behavior stays within the existing ElementAt handling structure and preserves the existing pushDownToProject condition.
  • Test coverage/results: Regression test follows the updated explain-based approach from the existing review thread and avoids duplicating the already-correct result assertion; FE unit coverage checks the pruned subpath slot. I did not rerun tests in this review session.
  • Observability/performance/data correctness: No additional observability appears necessary; the new recursive expression replacement is planner-only and proportional to the projection expression size; no transaction or data visibility path is touched.

User focus points: No additional user-provided review focus was specified.

@eldenmoon eldenmoon merged commit f4b06fd into master Jun 1, 2026
36 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 1, 2026
### What problem does this PR solve?

Fix variant subpath pruning for projections where the top-level
expression is an array subscript or `element_at` over a variant subpath.
The planner could leave the outer subscript on the original variant
access chain after pruning, which made valid 1-based array subscripts
return `NULL`.

The original array-of-objects repro depends on nested-group variant
semantics, so the regression in this PR uses a plain `VARIANT` array
leaf without nested group. Since that query result is already correct on
current master, the regression asserts the verbose plan instead: the
scan uses `subColPath=[items, type]` and the final array subscript is
applied to the pruned variant slot.

### Check List

- [x] Added regression test
- [x] Added FE planner unit test

### Tests

- `./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy
-d variant_p0 -s test_variant_array_subscript`
- `./run-fe-ut.sh --run
org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest` passed
earlier on the same FE code; rerun after this regression-only amend was
blocked by system pid/thread exhaustion before test execution.
@eldenmoon eldenmoon deleted the branch-cir-20316-variant-array-subscript branch June 1, 2026 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants