[refactor](be) Use ColumnArrayView in array functions #64215

Open
Mryange wants to merge 1 commit into apache:master from Mryange:refine-column-array-view-const-array

@Mryange
Contributor

@Mryange Mryange commented Jun 8, 2026

What problem does this PR solve?

Problem Summary: Refactor array_index, arrays_overlap, and array_remove to read const array inputs through ColumnArrayView instead of expanding const columns with convert_to_full_column_if_const. The dispatch path now uses dispatch_switch_all as the single primitive-type dispatch, including string types, and preserves nullable array semantics through the view helpers.
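
A minimal sketch of the idea behind reading a const array through a view rather than expanding it. This is not the Doris implementation: `ToyArrayColumn`, `ToyArrayView`, and `toy_array_position` are hypothetical stand-ins, and the 1-based "0 means not found" result convention is an assumption of the sketch. The point it illustrates is that a const input stores one physical row, and the view maps every logical row index to that row instead of copying it N times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array column: elements are stored flat and
// offsets[i] marks the end of row i, so row i spans [offsets[i-1], offsets[i]).
struct ToyArrayColumn {
    std::vector<int32_t> data;
    std::vector<size_t> offsets;
    size_t rows() const { return offsets.size(); }
};

// Hypothetical view: reads rows without copying. For a const column every
// logical row index is mapped to physical row 0.
class ToyArrayView {
public:
    struct Span {
        const int32_t* begin;
        const int32_t* end;
    };

    ToyArrayView(const ToyArrayColumn& col, bool is_const, size_t logical_rows)
            : _col(col), _is_const(is_const), _rows(logical_rows) {
        // A const column holds exactly one physical row.
        assert(!is_const || col.rows() == 1);
        assert(is_const || col.rows() == logical_rows);
    }

    size_t size() const { return _rows; }

    // Returns the [begin, end) element range of logical row i.
    Span row(size_t i) const {
        assert(i < _rows);
        const size_t phys = _is_const ? 0 : i;
        const size_t start = phys == 0 ? 0 : _col.offsets[phys - 1];
        const int32_t* base = _col.data.data();
        return {base + start, base + _col.offsets[phys]};
    }

private:
    const ToyArrayColumn& _col;
    bool _is_const;
    size_t _rows;
};

// array_position-like lookup: 1-based index of the first match, 0 if absent.
std::vector<int64_t> toy_array_position(const ToyArrayView& arrays,
                                        const std::vector<int32_t>& needles) {
    assert(arrays.size() == needles.size());
    std::vector<int64_t> result(arrays.size(), 0);
    for (size_t i = 0; i < arrays.size(); ++i) {
        const ToyArrayView::Span span = arrays.row(i);
        for (const int32_t* p = span.begin; p != span.end; ++p) {
            if (*p == needles[i]) {
                result[i] = (p - span.begin) + 1;
                break;
            }
        }
    }
    return result;
}
```

With a one-row column holding `[10, 20, 30]` wrapped as a const view over three logical rows, the lookup runs against the same physical row three times and no expanded copy of the column is ever built.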

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### Check List (For Author)

- Test: Unit Test
    - sh build.sh --be -j48
    - sh run-be-ut.sh --run --filter="function_array_index_test.*:function_arrays_overlap_test.*:function_array_remove_test.*"
- Behavior changed: No
- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Jun 8, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

Review summary

No blocking issues found in the PR. The refactor keeps the affected functions on the same scalar/string-family paths while replacing const materialization with ColumnArrayView/ColumnView access.

Critical checkpoints:

  • Goal and tests: The change addresses const array/scalar handling for array_contains/array_position/countequal, array_remove, and arrays_overlap. Unit tests were added for const arguments. I attempted the targeted BE UT command with bash, but the first-time dependency setup timed out before tests ran in this runner.
  • Scope and clarity: The implementation is focused on the three array function headers and related unit tests.
  • Concurrency and lifecycle: No new shared state, threads, locks, static initialization, or special ownership lifecycle were introduced.
  • Configuration and compatibility: No configs, protocol/storage formats, or persisted metadata are changed.
  • Parallel paths: The three modified array-function paths consistently use the new view helper. I checked string-family dispatch and scalar dispatch coverage.
  • Error handling and data correctness: Existing Status/exception behavior is preserved for unsupported types; nullable outer-array semantics remain represented in the result null maps. Data visibility/transaction/MoW concerns are not applicable.
  • Memory and performance: No untracked long-lived memory ownership is introduced. The main tradeoff is per-element view access for nullable checks, but the change removes const-column materialization and is acceptable for this refactor.
  • Observability: No new observability is required for these local expression execution changes.
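
The "single primitive-type dispatch" mentioned in the checkpoints above can be pictured with a small sketch. It does not reproduce the signature of `dispatch_switch_all`; `ToyType`, `TypeTag`, `toy_dispatch_all`, and `toy_describe` are hypothetical names, and the three listed types are only an assumed subset. It shows one switch that turns a runtime type id into a compile-time type, with the string type handled by the same switch as the scalar types.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical runtime type ids; a real engine has many more.
enum class ToyType { Int32, Int64, String };

// Carries a C++ type as a value so it can be passed to a generic lambda.
template <typename T>
struct TypeTag {
    using type = T;
};

// Single dispatch point: maps the runtime id to a compile-time type and
// invokes the callable once. Returns false for an unsupported id so the
// caller can report an error.
template <typename F>
bool toy_dispatch_all(ToyType t, F&& f) {
    switch (t) {
    case ToyType::Int32:
        f(TypeTag<int32_t> {});
        return true;
    case ToyType::Int64:
        f(TypeTag<int64_t> {});
        return true;
    case ToyType::String:
        f(TypeTag<std::string> {});
        return true;
    }
    return false;
}

// Example caller: picks the scalar or string code path from one dispatch.
std::string toy_describe(ToyType t) {
    std::string out = "unsupported";
    toy_dispatch_all(t, [&](auto tag) {
        using T = typename decltype(tag)::type;
        if constexpr (std::is_same_v<T, std::string>) {
            out = "string path";
        } else {
            out = "scalar path, " + std::to_string(sizeof(T)) + " bytes";
        }
    });
    return out;
}
```

Keeping every type in one switch means a function body is written once as a generic callable, and string handling is a compile-time branch inside it rather than a second dispatch routine.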

User focus: No additional user-provided review focus was present.
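
For the nullable outer-array point in the checkpoints above, a rough sketch of how a NULL array row can be carried into the result null map while a const input is still read in place. All names (`ToyNullableArrays`, `ToyResult`, `toy_array_contains`) are hypothetical, and the rule "NULL array in, NULL result out" is an assumption of the sketch rather than something this PR states for every function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable array input: physical row i is NULL when
// null_map[i] != 0. When is_const is true, exactly one physical row exists.
struct ToyNullableArrays {
    std::vector<std::vector<int32_t>> rows;
    std::vector<uint8_t> null_map;
    bool is_const = false;
};

struct ToyResult {
    std::vector<uint8_t> values;   // 1 if the array contains the needle
    std::vector<uint8_t> null_map; // 1 if the result row is NULL
};

// array_contains-like check that never expands a const input.
ToyResult toy_array_contains(const ToyNullableArrays& in, int32_t needle,
                             size_t logical_rows) {
    assert(in.rows.size() == in.null_map.size());
    assert(in.is_const ? in.rows.size() == 1 : in.rows.size() == logical_rows);

    ToyResult out {std::vector<uint8_t>(logical_rows, 0),
                   std::vector<uint8_t>(logical_rows, 0)};
    for (size_t i = 0; i < logical_rows; ++i) {
        // Const input: every logical row reads physical row 0.
        const size_t phys = in.is_const ? 0 : i;
        if (in.null_map[phys] != 0) {
            out.null_map[i] = 1; // assumed rule: NULL array gives NULL result
            continue;
        }
        for (int32_t v : in.rows[phys]) {
            if (v == needle) {
                out.values[i] = 1;
                break;
            }
        }
    }
    return out;
}
```

The null check happens per logical row through the same index mapping as the element access, which is the per-element view cost the review describes as the tradeoff for dropping const materialization.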

@Mryange
Contributor Author

Mryange commented Jun 8, 2026

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (100/100) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.81% (28264/38293)
Line Coverage 57.82% (307362/531544)
Region Coverage 54.59% (257227/471180)
Branch Coverage 56.05% (111700/199303)

@hello-stephen
Contributor

TPC-H: Total hot run time: 29328 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 65ebd75f1d828385e51cbca4e69d3313177da900, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17725	4001	4016	4001
q2	q3	10824	1497	818	818
q4	4692	480	349	349
q5	7544	885	597	597
q6	182	171	138	138
q7	774	864	642	642
q8	9363	1702	1573	1573
q9	6005	4563	4552	4552
q10	6801	1842	1511	1511
q11	445	273	253	253
q12	634	430	294	294
q13	18216	3328	2775	2775
q14	270	265	245	245
q15	q16	800	780	721	721
q17	1007	1015	919	919
q18	6874	5814	5564	5564
q19	1301	1168	1107	1107
q20	504	408	272	272
q21	6642	2891	2687	2687
q22	464	373	310	310
Total cold run time: 101067 ms
Total hot run time: 29328 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5006	4784	4785	4784
q2	q3	4948	5353	4596	4596
q4	2138	2181	1411	1411
q5	4833	4921	4670	4670
q6	233	182	128	128
q7	1873	1898	1598	1598
q8	2443	2146	2185	2146
q9	7902	7452	7399	7399
q10	4785	4698	4250	4250
q11	541	387	355	355
q12	729	752	535	535
q13	2992	3386	2808	2808
q14	274	282	251	251
q15	q16	691	702	607	607
q17	1287	1247	1241	1241
q18	7748	6661	6915	6661
q19	1120	1116	1119	1116
q20	2221	2209	1937	1937
q21	5258	4526	4427	4427
q22	535	444	396	396
Total cold run time: 57557 ms
Total hot run time: 51316 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169381 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 65ebd75f1d828385e51cbca4e69d3313177da900, data reload: false

query5	4336	635	477	477
query6	468	200	183	183
query7	4898	567	308	308
query8	371	209	213	209
query9	8784	4026	4023	4023
query10	446	323	268	268
query11	5978	2358	2150	2150
query12	165	107	100	100
query13	1355	640	443	443
query14	6407	5417	5133	5133
query14_1	4375	4365	4337	4337
query15	207	196	172	172
query16	1007	443	421	421
query17	927	685	551	551
query18	2430	481	340	340
query19	199	181	140	140
query20	108	107	105	105
query21	216	137	116	116
query22	13643	13568	13487	13487
query23	17450	16562	16242	16242
query23_1	16345	16241	16416	16241
query24	7590	1785	1302	1302
query24_1	1321	1333	1330	1330
query25	589	466	381	381
query26	1302	325	168	168
query27	2773	567	333	333
query28	4501	2064	2037	2037
query29	1051	598	480	480
query30	315	244	205	205
query31	1148	1072	957	957
query32	113	61	59	59
query33	511	322	243	243
query34	1214	1141	658	658
query35	743	773	678	678
query36	1425	1449	1209	1209
query37	157	106	89	89
query38	3252	3157	3077	3077
query39	944	915	902	902
query39_1	877	877	877	877
query40	216	124	100	100
query41	65	65	63	63
query42	96	93	98	93
query43	316	322	278	278
query44	
query45	195	186	178	178
query46	1081	1207	782	782
query47	2364	2381	2269	2269
query48	394	398	292	292
query49	636	481	361	361
query50	993	363	251	251
query51	4322	4235	4277	4235
query52	87	95	77	77
query53	251	263	194	194
query54	275	216	202	202
query55	77	79	72	72
query56	309	222	234	222
query57	1435	1409	1333	1333
query58	270	217	194	194
query59	1600	1653	1437	1437
query60	310	238	230	230
query61	161	154	154	154
query62	713	654	589	589
query63	228	190	188	188
query64	2545	789	626	626
query65	
query66	1786	467	338	338
query67	29858	29747	29700	29700
query68	
query69	425	293	254	254
query70	979	969	965	965
query71	311	219	210	210
query72	3085	2857	2616	2616
query73	846	808	431	431
query74	5182	5004	4775	4775
query75	2695	2631	2255	2255
query76	2335	1169	793	793
query77	367	396	296	296
query78	12339	12350	11855	11855
query79	1405	1085	775	775
query80	627	493	423	423
query81	463	284	243	243
query82	806	160	131	131
query83	374	282	263	263
query84	
query85	960	527	443	443
query86	381	302	279	279
query87	3401	3386	3206	3206
query88	3640	2751	2738	2738
query89	430	389	334	334
query90	1857	184	179	179
query91	175	162	139	139
query92	62	60	57	57
query93	1537	1388	915	915
query94	557	352	308	308
query95	693	481	350	350
query96	1085	798	346	346
query97	2724	2717	2600	2600
query98	215	204	205	204
query99	1139	1185	1027	1027
Total cold run time: 252054 ms
Total hot run time: 169381 ms
