bench: add benchmark for map_extract function by lyne7-sc · Pull Request #21251 · apache/datafusion

lyne7-sc · 2026-03-30T14:05:03Z

Which issue does this PR close?

Closes #.

Rationale for this change

This PR separates the map_extract benchmark changes from #21237 for easier review and performance comparison.

What changes are included in this PR?

Add a dedicated benchmark for map_extract.

Are these changes tested?

This PR contains benchmark-only changes.

Are there any user-facing changes?

No.

rluvaton · 2026-04-05T09:21:11Z

datafusion/functions-nested/benches/map_extract.rs

+                    .invoke_with_args(ScalarFunctionArgs {
+                        args: vec![map_arg.clone(), key_arg.clone()],
+                        arg_fields: arg_fields.clone(),
+                        number_rows,
+                        return_field: Arc::clone(&return_field),
+                        config_options: Arc::clone(&config_options),
+                    })


I think it would be cleaner to extract the ScalarFunctionArgs creation and just clone it here.
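A minimal sketch of the refactor suggested above, using only the standard library. `FnArgs`, `invoke`, and `template_args` are hypothetical stand-ins and not DataFusion types or functions; the sketch assumes, as the comment implies, that the real argument struct can be cloned.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the argument bundle passed to `invoke_with_args`.
// It only mimics the shape of such a struct; it is not the DataFusion type.
#[derive(Clone, Debug)]
struct FnArgs {
    args: Vec<Arc<Vec<i64>>>,
    number_rows: usize,
}

// Hypothetical function under benchmark: takes the bundle by value.
fn invoke(call: FnArgs) -> usize {
    call.args.len() * call.number_rows
}

// Build the bundle once, outside the measured loop.
fn template_args() -> FnArgs {
    FnArgs {
        args: vec![Arc::new(vec![1, 2, 3]), Arc::new(vec![2, 2, 2])],
        number_rows: 3,
    }
}

fn main() {
    let template = template_args();
    // Inside the loop only a clone is needed. The payload sits behind `Arc`,
    // so the clone copies pointers and bumps reference counts.
    for _ in 0..3 {
        let produced = invoke(template.clone());
        assert_eq!(produced, 6);
    }
}
```

Keeping the construction outside the loop makes the measured closure shorter and keeps the setup in one place.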

rluvaton · 2026-04-05T09:25:07Z

datafusion/functions-nested/benches/map_extract.rs

+    });
+}
+
+fn criterion_benchmark(c: &mut Criterion) {


Can you also add a benchmark for the case where the key is not found?
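One way to set up such a case is to pick a lookup key that is guaranteed to be absent from the generated keys. The sketch below is an illustration only; `missing_key` is a hypothetical helper, and the `i64` key type is an assumption.

```rust
use std::collections::HashSet;

/// Return a non-negative key that does not occur in `keys`.
/// Among the `keys.len() + 1` candidates `0..=keys.len()` at least one must
/// be free, so the search always succeeds.
fn missing_key(keys: &[i64]) -> i64 {
    let present: HashSet<i64> = keys.iter().copied().collect();
    (0..=keys.len() as i64)
        .find(|candidate| !present.contains(candidate))
        .expect("pigeonhole: one candidate is always free")
}

fn main() {
    let keys = vec![0, 1, 2, 5];
    let miss = missing_key(&keys);
    assert!(!keys.contains(&miss));
    println!("lookup key for the not-found case: {miss}");
}
```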

rluvaton · 2026-04-05T09:25:50Z

datafusion/functions-nested/benches/map_extract.rs

+    gen_unique_values(rng, |value| value)
+}
+
+fn list_array(values: ArrayRef, row_count: usize, values_per_row: usize) -> ArrayRef {


The number of values per row should not be the same for each list/map, to simulate real-world data.

rluvaton · 2026-04-05T09:26:54Z

datafusion/functions-nested/benches/map_extract.rs

+    let offsets = (0..=row_count)
+        .map(|index| (index * values_per_row) as i32)
+        .collect::<Vec<_>>();


This is incorrect, as the offsets length must be row_count + 1.
To make it simpler and fix the issue I just outlined, you can use OffsetBuffer::from_lengths(<iter>).
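To illustrate what building offsets from per-row lengths means, here is a standard-library sketch of the computation. It does not call `OffsetBuffer::from_lengths`; `offsets_from_lengths` is a hypothetical helper that only mirrors the idea of a running sum starting at 0.

```rust
/// Turn per-row lengths into list offsets: a running sum that starts at 0,
/// so the result always has `lengths.len() + 1` entries.
fn offsets_from_lengths(lengths: &[usize]) -> Vec<i32> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut total: usize = 0;
    offsets.push(0);
    for &len in lengths {
        total += len;
        // List offsets are i32 here, so fail loudly instead of wrapping.
        offsets.push(i32::try_from(total).expect("offset does not fit in i32"));
    }
    offsets
}

fn main() {
    // Rows of different sizes, including an empty one.
    let lengths = [3, 0, 2, 5];
    let offsets = offsets_from_lengths(&lengths);
    assert_eq!(offsets.len(), lengths.len() + 1);
    println!("{offsets:?}");
}
```

Row `i` then spans `offsets[i]..offsets[i + 1]`, which is why rows of different sizes need no special handling.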

rluvaton · 2026-04-05T09:31:20Z

datafusion/functions-nested/benches/map_extract.rs

+    ))) as ArrayRef;
+    let values = list_array(values, MAP_ROWS, MAP_KEYS_PER_ROW);
+
+    let map_extract_cases = [


I find it hard to understand the data for the benchmark. Can you make it cleaner, please?

rluvaton · 2026-04-05T09:37:06Z

datafusion/functions-nested/benches/map_extract.rs

+    ))) as ArrayRef;
+    let values = list_array(values, MAP_ROWS, MAP_KEYS_PER_ROW);
+
+    let map_extract_cases = [


I see a couple of problems with the data here:

In every list/map, the value to find is always in the same position.

The key to extract is always in the same position.

The key to extract is the same for every row. That is fine, but I would mention it in the benchmark description, and instead of passing an array in which all values are the same I would pass a Scalar, which is what you get when the argument is a literal.

All the lists/maps are identical (i.e. list[i] == list[j]), which is not a real-world case.
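A rough sketch of data that avoids the first, second, and last points: each row gets its own keys, and a single shared lookup key lands at a different position in each non-empty row. `build_rows` is a hypothetical helper, the `i64` key type is an assumption, and rows are assumed to hold fewer than 1,000 entries. The single `lookup_key` value stands in for a scalar literal argument.

```rust
/// Build per-row key lists. Rows differ from each other, and `lookup_key`
/// sits at position `row % len` in every non-empty row.
fn build_rows(row_lengths: &[usize], lookup_key: i64) -> Vec<Vec<i64>> {
    row_lengths
        .iter()
        .enumerate()
        .map(|(row, &len)| {
            // Filler keys start above `lookup_key` and are shifted per row,
            // so they never collide with it and rows are not identical.
            let base = lookup_key + 1 + (row as i64) * 1_000;
            let mut keys: Vec<i64> = (0..len as i64).map(|j| base + j).collect();
            if len > 0 {
                keys[row % len] = lookup_key;
            }
            keys
        })
        .collect()
}

fn main() {
    let rows = build_rows(&[3, 3, 0, 4], 7);
    for (index, keys) in rows.iter().enumerate() {
        println!("row {index}: {keys:?}");
    }
}
```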

rluvaton · 2026-04-05T09:38:07Z

datafusion/functions-nested/benches/map_extract.rs

+fn gen_utf8_values(rng: &mut ThreadRng) -> Vec<String> {
+    gen_unique_values(rng, |value| value.to_string())
+}
+
+fn gen_binary_values(rng: &mut ThreadRng) -> Vec<Vec<u8>> {
+    gen_unique_values(rng, |value| value.to_le_bytes().to_vec())
+}


This makes every binary item the same length, which is not that common.
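One possible way to get binary values of differing lengths while keeping them unique is to drop the trailing zero bytes of the little-endian encoding. This is only a sketch; `to_trimmed_le_bytes` is a hypothetical helper, and the `u64` input type is an assumption about what the generator produces.

```rust
/// Encode `value` as little-endian bytes with trailing zero bytes removed.
/// Small values give short byte strings, large values give long ones, and
/// distinct inputs still map to distinct outputs. Note that 0 becomes an
/// empty byte string.
fn to_trimmed_le_bytes(value: u64) -> Vec<u8> {
    let mut bytes = value.to_le_bytes().to_vec();
    while bytes.last() == Some(&0) {
        bytes.pop();
    }
    bytes
}

fn main() {
    for value in [1u64, 256, 65_536, u64::MAX] {
        let bytes = to_trimmed_le_bytes(value);
        println!("{value} -> {} byte(s)", bytes.len());
    }
}
```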

rluvaton · 2026-04-05T09:40:47Z

datafusion/functions-nested/benches/map_extract.rs

+use std::sync::Arc;
+
+const MAP_ROWS: usize = 1000;
+const MAP_KEYS_PER_ROW: usize = 1000;


This is possible but not very common. I would use a much smaller number of entries, ranging from 0 to 10.
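As a sketch of the suggested sizing, the snippet below draws a per-row entry count in the range 0 to 10. `MAX_ENTRIES_PER_ROW`, `next_u64`, and `row_lengths` are hypothetical names, and the small xorshift generator is used only to keep the example free of external crates; the benchmark itself may well use a different random source.

```rust
const MAP_ROWS: usize = 1000;
// Hypothetical replacement for a fixed per-row size: an upper bound instead.
const MAX_ENTRIES_PER_ROW: usize = 10;

/// One step of a xorshift64 generator (deterministic, std-only).
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Draw one entry count per row, each in `0..=max_entries`.
fn row_lengths(rows: usize, max_entries: usize, seed: u64) -> Vec<usize> {
    // xorshift must not start from 0, otherwise it stays at 0 forever.
    let mut state = seed.max(1);
    (0..rows)
        .map(|_| (next_u64(&mut state) % (max_entries as u64 + 1)) as usize)
        .collect()
}

fn main() {
    let lengths = row_lengths(MAP_ROWS, MAX_ENTRIES_PER_ROW, 42);
    let total: usize = lengths.iter().sum();
    println!("{} rows, {} entries in total", lengths.len(), total);
}
```

A fixed seed keeps the generated shape identical between runs, which helps when comparing results before and after a change.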

lyne7-sc · 2026-04-07T15:20:52Z

@rluvaton Thanks for the detailed review. I’ve reworked the benchmark data generation to better reflect common map_extract usage. Could you please take another look?

map_extract benchmark

6154aba

github-actions bot added the functions (Changes to functions implementation) label Mar 30, 2026

lyne7-sc mentioned this pull request Mar 30, 2026

perf: optimize map_extract function lookup for common key types #21237

Open

rluvaton reviewed Apr 5, 2026

lyne7-sc added 2 commits April 7, 2026 22:50

add bench case

4de4b38

lint

5b1cf5d

lyne7-sc wants to merge 3 commits into apache:main from lyne7-sc:tests/map_extract

2 participants