fix(parquet): bound data page byte size for large variable-width values #9972

Open
adriangb wants to merge 13 commits into apache:main from pydantic:parquet-page-size-mid-batch

@adriangb (Contributor) commented May 14, 2026

We write large values into our Parquet files (e.g. a 5 MB LLM prompt). With default writer settings, a naive write produces massive data pages (we've seen pages up to 2 GB). The main knob to control this is write_batch_size, which defaults to 1024 rows, but if each row is 5 MB, a single batch is ~5 GB. On the other hand, setting it to something small like 32 kills write performance and is completely unnecessary for the other, fixed-width columns.
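A quick sketch of the arithmetic above, in plain Rust with no parquet dependency. The helper name and the decimal MB constant are assumptions for illustration only; the point is that the possible overshoot scales with batch size times row size.

```rust
/// One decimal megabyte (assumed unit, matching the "5 MB" figures in the description).
const MB: u64 = 1_000_000;

/// Hypothetical helper: bytes buffered by one write batch of `batch_rows`
/// rows when every row carries `bytes_per_row` bytes. Because the page
/// limit is only checked after a batch is written, a data page can
/// overshoot the limit by up to this amount.
fn batch_bytes(batch_rows: u64, bytes_per_row: u64) -> u64 {
    batch_rows * bytes_per_row
}
```

With 5 MB rows, the default 1024-row batch is 5,120 MB, and even a 32-row batch is still ~160 MB.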

The writer even documents this (parquet/src/column/writer/mod.rs):

We check for DataPage limits only after we have inserted the values. If a user writes a large number of values, the DataPage size can be well above the limit.

This PR makes the mini-batch size byte-budget aware:

  • For each chunk, compute bytes_per_value from the values about to be written and pick sub_batch_size = page_byte_limit / bytes_per_value (clamped ≥ 1).
  • For typical small values (numeric columns, short strings), sub_batch_size ≥ chunk size, so we stay on the existing batched fast path with zero behavior change.
  • Only when individual values are large enough that a full chunk would exceed the page byte limit does the sub-batch shrink, down to one row per mini-batch in the limit, which matches the format minimum of one record per page.
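The sizing rule in the bullets can be sketched roughly as follows. This is a minimal illustration, not the arrow-rs code: mini_batches is a hypothetical function, and it assumes bytes_per_value is the average PLAIN-encoded size of the chunk (a 4-byte length prefix plus the payload, rounded up), which may differ from how the real estimate is computed.

```rust
use std::ops::Range;

/// Hypothetical sketch of byte-budget-aware mini-batching for one chunk of
/// BYTE_ARRAY values. Returns the index ranges that would be written as
/// separate mini-batches.
fn mini_batches(values: &[Vec<u8>], page_byte_limit: usize) -> Vec<Range<usize>> {
    let n = values.len();
    if n == 0 {
        return Vec::new();
    }
    // Assumed estimate: average PLAIN size (4-byte length prefix + payload),
    // rounded up. Always >= 4, so the division below cannot divide by zero.
    let total: usize = values.iter().map(|v| 4 + v.len()).sum();
    let bytes_per_value = (total + n - 1) / n;
    // sub_batch_size = page_byte_limit / bytes_per_value, clamped to >= 1.
    let sub_batch_size = (page_byte_limit / bytes_per_value).max(1);
    (0..n)
        .step_by(sub_batch_size)
        .map(|start| start..(start + sub_batch_size).min(n))
        .collect()
}
```

For 1024 short 8-byte strings under a 1 MiB limit, sub_batch_size exceeds the chunk, so the whole chunk stays one mini-batch; for 64 KiB values under a 16 KiB limit, it collapses to one value per mini-batch.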

Implementation notes

The byte-size check is skipped while Parquet dictionary encoding is active: estimated_value_bytes returns the PLAIN-encoded size, but a dictionary-encoded data page only stores small RLE/bit-packed indices, so the estimate would spuriously shrink pages. Dictionary fallback bounds dictionary-encoded pages independently.
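A rough sketch of the dictionary skip and of why the PLAIN estimate misleads there. Both functions are hypothetical; clamping to chunk_len is an assumption added for illustration, and approx_index_bytes assumes pure bit-packing of indices (ignoring RLE runs and the bit-width byte), so it only gives an order of magnitude. For 1024 values of 1 KiB with 16 distinct entries, the PLAIN estimate is over 1 MiB while the indices are about 512 bytes.

```rust
/// Hypothetical: choose the mini-batch size for one chunk.
fn choose_sub_batch(
    dictionary_active: bool,
    chunk_len: usize,
    plain_bytes_per_value: usize,
    page_byte_limit: usize,
) -> usize {
    if dictionary_active || plain_bytes_per_value == 0 {
        // The data page would only hold small dictionary indices (or there is
        // no meaningful estimate), so keep the whole chunk as one mini-batch.
        return chunk_len;
    }
    // Byte-budget rule, clamped to at least 1 and at most the chunk length.
    (page_byte_limit / plain_bytes_per_value).clamp(1, chunk_len.max(1))
}

/// Rough size of bit-packed dictionary indices for `n` values drawn from
/// `num_distinct` entries (ignores RLE runs and the bit-width header byte).
fn approx_index_bytes(n: usize, num_distinct: usize) -> usize {
    let bit_width = if num_distinct <= 1 {
        0
    } else {
        (usize::BITS - (num_distinct - 1).leading_zeros()) as usize
    };
    (n * bit_width + 7) / 8
}
```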

For repeated/nested columns, the sub-batch advances record by record (at rep == 0 boundaries) so that a record never spans data pages, matching the Parquet format rule.
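A sketch of record-aligned sub-batching over repetition levels. record_aligned_batches is hypothetical and assumes the chunk starts at a record boundary (rep_levels[0] == 0); a new record begins wherever the repetition level is 0, and batches are only ever cut at those positions.

```rust
use std::ops::Range;

/// Hypothetical: split a chunk's level indices into ranges holding at most
/// `records_per_batch` whole records, cutting only where rep level == 0.
fn record_aligned_batches(rep_levels: &[i16], records_per_batch: usize) -> Vec<Range<usize>> {
    let records_per_batch = records_per_batch.max(1);
    let mut batches = Vec::new();
    let mut start = 0;
    let mut records_in_batch = 0;
    for (i, &rep) in rep_levels.iter().enumerate() {
        if rep == 0 {
            // A new record starts at `i`; close the current batch if it is full.
            if records_in_batch == records_per_batch {
                batches.push(start..i);
                start = i;
                records_in_batch = 0;
            }
            records_in_batch += 1;
        }
    }
    // Flush the trailing partial batch.
    if start < rep_levels.len() {
        batches.push(start..rep_levels.len());
    }
    batches
}
```

With levels [0, 1, 1, 0, 0, 1] and one record per batch, the cuts land at indices 3 and 4, so the three-element first record is never split.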

Regression test

test_column_writer_caps_page_size_for_large_byte_array_values writes 64 × 64 KiB BYTE_ARRAY values with a 16 KiB page byte limit. Before this fix, that produced a single ~4 MiB page; after it, there is one page per value (~64 pages, each within ~2× the value size).
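A toy simulation of why the regression test goes from one page to ~64, modeling the documented behavior that the page limit is checked only after a whole mini-batch has been written. simulate_pages is hypothetical and counts raw value bytes only, ignoring length prefixes, encoding and page headers.

```rust
/// Hypothetical: return the size of each flushed page when values are
/// written `mini_batch` at a time and the limit is checked after each batch.
fn simulate_pages(value_sizes: &[usize], mini_batch: usize, page_byte_limit: usize) -> Vec<usize> {
    let mut pages = Vec::new();
    let mut buffered = 0;
    for batch in value_sizes.chunks(mini_batch.max(1)) {
        buffered += batch.iter().sum::<usize>();
        // Like the writer, only check the limit once the whole mini-batch is in.
        if buffered >= page_byte_limit {
            pages.push(buffered);
            buffered = 0;
        }
    }
    if buffered > 0 {
        pages.push(buffered);
    }
    pages
}
```

Writing all 64 values in one mini-batch yields a single 4 MiB page; one value per mini-batch yields 64 pages of 64 KiB each.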

Bench results

5-run medians, criterion arrow_writer bench, default writer properties, on a noisy laptop (run-to-run variance ~±1.6%):

bench                                                Δ vs main
primitive/default (i32, 25% null)                    −1.0%
primitive_non_null/default                           −0.0%
bool_non_null/default                                −1.2%
string/default                                       +0.6%
short_string_non_null/default (new, 1M × 8 B)        +0.2%
large_string_non_null/default (new, 1024 × 256 KiB)  +1.2%
string_non_null/default                              −2.1%
string_dictionary/default                            +0.4%
list_primitive/default                               +0.5%
list_primitive_non_null/default                      +0.1%

🤖 Generated with Claude Code

github-actions (bot) added the parquet label (Changes to the parquet crate) on May 14, 2026
@adriangb (Contributor, Author)

run benchmark arrow_writer

adriangb force-pushed the parquet-page-size-mid-batch branch from 393ead0 to 4823429 on May 14, 2026 04:21
@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4447473325-58-pzct6 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (4823429) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.1±0.03ms    19.2 MB/sec    1.01     13.2±0.03ms    18.9 MB/sec
bool/cdc                                           1.00     15.7±0.05ms    16.0 MB/sec    1.03     16.1±0.07ms    15.5 MB/sec
bool/default                                       1.00     11.0±0.02ms    22.8 MB/sec    1.01     11.1±0.03ms    22.5 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.01     14.8±0.03ms    16.9 MB/sec
bool/zstd                                          1.00     11.5±0.03ms    21.8 MB/sec    1.01     11.6±0.03ms    21.5 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.6 MB/sec    1.01     15.2±0.05ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.4 MB/sec    1.01      6.9±0.03ms    18.2 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.4 MB/sec    1.02      4.3±0.02ms    28.8 MB/sec
bool_non_null/parquet_2                            1.01      9.1±0.04ms    13.8 MB/sec    1.00      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.02      4.7±0.02ms    26.6 MB/sec
bool_non_null/zstd_parquet_2                       1.01      9.5±0.03ms    13.2 MB/sec    1.00      9.4±0.03ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.00     91.9±0.45ms   152.3 MB/sec    1.03     95.0±0.49ms   147.4 MB/sec
float_with_nans/cdc                                1.00     81.2±0.33ms   172.4 MB/sec    1.02     82.7±0.17ms   169.4 MB/sec
float_with_nans/default                            1.00     74.0±0.32ms   189.3 MB/sec    1.03     76.3±0.28ms   183.4 MB/sec
float_with_nans/parquet_2                          1.00     93.7±0.44ms   149.4 MB/sec    1.01     94.8±0.26ms   147.7 MB/sec
float_with_nans/zstd                               1.00    111.5±0.25ms   125.5 MB/sec    1.02    114.2±0.26ms   122.6 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.7±0.83ms   106.3 MB/sec    1.00    131.8±0.19ms   106.2 MB/sec
large_string_non_null/bloom_filter                                                        1.00     78.3±0.17ms     3.2 GB/sec
large_string_non_null/cdc                                                                 1.00    241.5±1.40ms  1059.9 MB/sec
large_string_non_null/default                                                             1.00     59.9±0.14ms     4.2 GB/sec
large_string_non_null/parquet_2                                                           1.00     59.9±0.17ms     4.2 GB/sec
large_string_non_null/zstd                                                                1.00     60.2±0.60ms     4.2 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     60.0±0.29ms     4.2 GB/sec
list_primitive/bloom_filter                        1.00    321.4±1.04ms  1696.6 MB/sec    1.01    325.1±0.77ms  1677.4 MB/sec
list_primitive/cdc                                 1.01    362.7±4.79ms  1503.8 MB/sec    1.00    360.4±0.58ms  1513.3 MB/sec
list_primitive/default                             1.00    245.4±0.60ms     2.2 GB/sec    1.01    248.7±0.79ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    267.1±0.44ms  2041.6 MB/sec    1.01    270.4±1.01ms  2016.9 MB/sec
list_primitive/zstd                                1.00    495.4±0.86ms  1100.9 MB/sec    1.00    496.4±2.54ms  1098.7 MB/sec
list_primitive/zstd_parquet_2                      1.00    490.1±0.48ms  1112.9 MB/sec    1.01    494.1±0.92ms  1103.8 MB/sec
list_primitive_non_null/bloom_filter               1.00    426.6±3.62ms  1275.7 MB/sec    1.00    427.6±3.63ms  1272.8 MB/sec
list_primitive_non_null/cdc                        1.01    440.0±7.70ms  1236.8 MB/sec    1.00    434.8±8.76ms  1251.6 MB/sec
list_primitive_non_null/default                    1.00    287.9±2.90ms  1890.4 MB/sec    1.01    291.1±3.72ms  1869.5 MB/sec
list_primitive_non_null/parquet_2                  1.00   308.6±12.82ms  1763.4 MB/sec    1.05    323.0±9.12ms  1684.9 MB/sec
list_primitive_non_null/zstd                       1.00    714.5±3.78ms   761.7 MB/sec    1.00    712.8±5.58ms   763.6 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    683.0±0.52ms   796.8 MB/sec    1.00    686.0±0.81ms   793.4 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.1±0.22ms     3.3 GB/sec    1.02     11.3±0.02ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.6±0.18ms  1651.1 MB/sec    1.00     22.7±0.06ms  1648.7 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.8±0.06ms     3.4 GB/sec    1.02     11.0±0.09ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.8±0.07ms     3.4 GB/sec    1.02     11.0±0.02ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.6±0.09ms     2.9 GB/sec    1.01     12.8±0.02ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.03ms     3.4 GB/sec    1.02     11.1±0.07ms     3.3 GB/sec
primitive/bloom_filter                             1.00    147.4±0.75ms   304.6 MB/sec    1.03    151.3±0.39ms   296.6 MB/sec
primitive/cdc                                      1.00    158.2±0.59ms   283.7 MB/sec    1.02    160.7±0.64ms   279.2 MB/sec
primitive/default                                  1.00    117.5±0.77ms   382.0 MB/sec    1.02    119.7±0.47ms   375.0 MB/sec
primitive/parquet_2                                1.00    131.9±0.39ms   340.1 MB/sec    1.02    134.4±0.21ms   334.0 MB/sec
primitive/zstd                                     1.00    146.2±0.26ms   306.9 MB/sec    1.02    149.2±0.34ms   300.8 MB/sec
primitive/zstd_parquet_2                           1.00    165.0±0.33ms   271.9 MB/sec    1.02    167.9±0.38ms   267.2 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.15ms     3.8 GB/sec    1.00     11.5±0.17ms     3.8 GB/sec
primitive_all_null/cdc                             1.05     30.5±0.34ms  1469.9 MB/sec    1.00     29.2±0.33ms  1537.6 MB/sec
primitive_all_null/default                         1.00     10.9±0.10ms     4.0 GB/sec    1.01     10.9±0.11ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.14ms     4.0 GB/sec    1.00     10.9±0.11ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.15ms     4.0 GB/sec    1.00     11.0±0.12ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.23ms     4.0 GB/sec    1.00     11.1±0.24ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.04    110.6±1.27ms   397.8 MB/sec    1.00    106.2±0.53ms   414.5 MB/sec
primitive_non_null/cdc                             1.00     89.3±0.47ms   492.9 MB/sec    1.02     91.3±0.47ms   482.0 MB/sec
primitive_non_null/default                         1.00     67.1±0.31ms   655.8 MB/sec    1.02     68.6±0.50ms   641.6 MB/sec
primitive_non_null/parquet_2                       1.00     88.7±0.30ms   495.9 MB/sec    1.01     89.8±0.33ms   489.9 MB/sec
primitive_non_null/zstd                            1.04    103.9±0.21ms   423.7 MB/sec    1.00     99.6±0.49ms   441.6 MB/sec
primitive_non_null/zstd_parquet_2                  1.04    128.8±1.61ms   341.5 MB/sec    1.00    123.6±0.32ms   356.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     18.1±0.21ms     2.4 GB/sec    1.00     18.0±0.06ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.03     37.0±0.31ms  1214.3 MB/sec    1.00     35.8±0.35ms  1253.1 MB/sec
primitive_sparse_99pct_null/default                1.00     16.7±0.06ms     2.6 GB/sec    1.00     16.7±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.8±0.07ms     2.6 GB/sec    1.00     16.7±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.0±0.11ms     2.2 GB/sec    1.00     20.0±0.10ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     18.7±0.13ms     2.3 GB/sec    1.00     18.6±0.04ms     2.4 GB/sec
short_string_non_null/bloom_filter                                                        1.00     27.9±0.10ms   429.7 MB/sec
short_string_non_null/cdc                                                                 1.00     19.9±0.09ms   602.3 MB/sec
short_string_non_null/default                                                             1.00     15.7±0.09ms   764.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.4±0.06ms   472.0 MB/sec
short_string_non_null/zstd                                                                1.00     35.3±0.09ms   339.9 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.3±0.07ms   424.6 MB/sec
string/bloom_filter                                1.06   226.9±24.81ms     2.3 GB/sec    1.00   214.3±22.17ms     2.4 GB/sec
string/cdc                                         1.00    220.1±5.61ms     2.3 GB/sec    1.00    219.4±7.14ms     2.3 GB/sec
string/default                                     1.20   140.7±24.25ms     3.6 GB/sec    1.00   116.9±11.73ms     4.4 GB/sec
string/parquet_2                                   1.00    124.5±0.65ms     4.1 GB/sec    1.01    125.6±0.79ms     4.1 GB/sec
string/zstd                                        1.00    423.4±2.87ms  1238.1 MB/sec    1.04   440.9±19.31ms  1189.0 MB/sec
string/zstd_parquet_2                              1.00    394.3±0.42ms  1329.7 MB/sec    1.03   406.0±10.72ms  1291.4 MB/sec
string_and_binary_view/bloom_filter                1.00     62.8±0.33ms   513.4 MB/sec    1.05     65.7±0.35ms   491.1 MB/sec
string_and_binary_view/cdc                         1.00     58.2±0.13ms   553.9 MB/sec    1.05     61.0±0.41ms   528.6 MB/sec
string_and_binary_view/default                     1.00     47.7±0.18ms   675.9 MB/sec    1.05     50.0±0.31ms   645.3 MB/sec
string_and_binary_view/parquet_2                   1.00     58.5±0.18ms   551.2 MB/sec    1.04     61.1±0.35ms   527.8 MB/sec
string_and_binary_view/zstd                        1.00     84.1±0.18ms   383.3 MB/sec    1.03     86.6±0.31ms   372.4 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.4±0.35ms   445.3 MB/sec    1.04     75.1±0.30ms   429.6 MB/sec
string_dictionary/bloom_filter                     1.00     88.7±0.91ms     2.9 GB/sec    1.02     90.6±0.58ms     2.8 GB/sec
string_dictionary/cdc                              1.61     83.8±0.82ms     3.1 GB/sec    1.00     52.2±1.18ms     4.9 GB/sec
string_dictionary/default                          1.00     48.0±0.33ms     5.4 GB/sec    1.03     49.2±0.94ms     5.2 GB/sec
string_dictionary/parquet_2                        1.00     53.7±0.14ms     4.8 GB/sec    1.02     55.0±0.21ms     4.7 GB/sec
string_dictionary/zstd                             1.00    208.4±1.00ms  1267.2 MB/sec    1.01    209.5±0.64ms  1260.8 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.5±0.49ms  1330.4 MB/sec    1.00    199.4±0.17ms  1324.8 MB/sec
string_non_null/bloom_filter                       1.05   250.1±14.82ms     2.0 GB/sec    1.00    238.4±4.31ms     2.1 GB/sec
string_non_null/cdc                                1.01    266.1±8.91ms  1969.6 MB/sec    1.00    263.5±3.32ms  1988.9 MB/sec
string_non_null/default                            1.00   125.4±12.46ms     4.1 GB/sec    1.02    128.4±9.65ms     4.0 GB/sec
string_non_null/parquet_2                          1.05   139.7±11.29ms     3.7 GB/sec    1.00    132.7±0.46ms     3.9 GB/sec
string_non_null/zstd                               1.00    528.6±1.85ms   991.3 MB/sec    1.01    533.2±1.33ms   982.8 MB/sec
string_non_null/zstd_parquet_2                     1.00    504.8±2.15ms  1038.0 MB/sec    1.00    503.0±0.48ms  1041.7 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.01ms     6.2 GB/sec    1.00      2.5±0.00ms     6.3 GB/sec
struct_all_null/cdc                                1.00      9.9±0.12ms  1633.5 MB/sec    1.01     10.0±0.11ms  1614.5 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.01ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     45.9±0.18ms   348.8 MB/sec    1.05     48.3±0.23ms   331.4 MB/sec
struct_non_null/cdc                                1.00     45.1±0.17ms   354.6 MB/sec    1.02     45.9±0.27ms   348.3 MB/sec
struct_non_null/default                            1.00     31.8±0.17ms   503.2 MB/sec    1.03     32.7±0.14ms   488.8 MB/sec
struct_non_null/parquet_2                          1.00     40.6±0.49ms   394.5 MB/sec    1.03     41.6±0.11ms   384.7 MB/sec
struct_non_null/zstd                               1.00     40.6±0.15ms   394.2 MB/sec    1.02     41.5±0.15ms   385.6 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.5±0.13ms   293.7 MB/sec    1.02     55.3±0.17ms   289.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.01      7.4±0.02ms     2.1 GB/sec    1.00      7.4±0.05ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.07     15.3±0.08ms  1053.1 MB/sec    1.00     14.4±0.07ms  1121.7 MB/sec
struct_sparse_99pct_null/default                   1.01      6.9±0.05ms     2.3 GB/sec    1.00      6.9±0.06ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.03ms     2.3 GB/sec    1.00      6.9±0.04ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.01ms  1954.5 MB/sec    1.00      8.2±0.02ms  1963.5 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.01      7.7±0.03ms     2.0 GB/sec    1.00      7.6±0.02ms     2.1 GB/sec

Resource Usage

Metric        base (merge-base)   branch
Wall time     1935.4s             2075.5s
Peak memory   6.6 GiB             6.8 GiB
Avg memory    6.4 GiB             6.7 GiB
CPU user      1876.9s             2028.4s
CPU sys       54.7s               44.8s
Peak spill    0 B                 0 B

@adriangb (Contributor, Author)

run benchmark arrow_writer

adriangb force-pushed the parquet-page-size-mid-batch branch 2 times, most recently from 0fd6dcb to 24b83c7, on May 14, 2026 06:15
@adriangb (Contributor, Author)

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4448145440-71-5z4xr 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (24b83c7) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.02     13.2±0.11ms    18.9 MB/sec    1.00     13.0±0.08ms    19.2 MB/sec
bool/cdc                                           1.01     16.0±0.10ms    15.6 MB/sec    1.00     15.8±0.15ms    15.8 MB/sec
bool/default                                       1.02     11.2±0.09ms    22.4 MB/sec    1.00     10.9±0.04ms    22.9 MB/sec
bool/parquet_2                                     1.01     14.9±0.11ms    16.8 MB/sec    1.00     14.7±0.05ms    17.0 MB/sec
bool/zstd                                          1.02     11.6±0.07ms    21.5 MB/sec    1.00     11.5±0.06ms    21.8 MB/sec
bool/zstd_parquet_2                                1.02     15.3±0.10ms    16.4 MB/sec    1.00     15.1±0.07ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.04ms    17.7 MB/sec    1.00      7.0±0.03ms    17.8 MB/sec
bool_non_null/cdc                                  1.01      7.0±0.08ms    18.0 MB/sec    1.00      6.9±0.10ms    18.1 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.1 MB/sec    1.00      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.7 MB/sec    1.00      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    26.9 MB/sec    1.00      4.7±0.03ms    26.8 MB/sec
bool_non_null/zstd_parquet_2                       1.01      9.5±0.04ms    13.1 MB/sec    1.00      9.5±0.05ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     94.5±1.08ms   148.1 MB/sec    1.01     95.7±2.49ms   146.3 MB/sec
float_with_nans/cdc                                1.00     83.0±0.82ms   168.8 MB/sec    1.01     83.7±1.57ms   167.2 MB/sec
float_with_nans/default                            1.00     75.5±1.07ms   185.5 MB/sec    1.00     75.3±1.01ms   185.9 MB/sec
float_with_nans/parquet_2                          1.00     97.2±0.72ms   144.1 MB/sec    1.00     97.4±1.82ms   143.7 MB/sec
float_with_nans/zstd                               1.00    113.2±1.37ms   123.7 MB/sec    1.01    114.0±1.22ms   122.8 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.3±1.99ms   105.0 MB/sec    1.02    135.3±1.48ms   103.4 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.8±1.96ms     2.9 GB/sec
large_string_non_null/cdc                                                                 1.00    243.8±1.68ms  1050.1 MB/sec
large_string_non_null/default                                                             1.00     64.6±1.09ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     65.2±2.66ms     3.8 GB/sec
large_string_non_null/zstd                                                                1.00     63.7±2.72ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     63.6±2.71ms     3.9 GB/sec
list_primitive/bloom_filter                        1.22   417.0±14.54ms  1308.0 MB/sec    1.00   340.4±11.36ms  1602.2 MB/sec
list_primitive/cdc                                 1.03   376.9±13.32ms  1447.0 MB/sec    1.00    364.3±4.07ms  1496.8 MB/sec
list_primitive/default                             1.28    324.3±7.20ms  1681.8 MB/sec    1.00    252.9±4.29ms     2.1 GB/sec
list_primitive/parquet_2                           1.24    336.6±2.67ms  1620.3 MB/sec    1.00    271.6±1.48ms  2008.1 MB/sec
list_primitive/zstd                                1.11    566.9±7.39ms   962.0 MB/sec    1.00    508.8±5.43ms  1072.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    494.3±2.36ms  1103.3 MB/sec    1.01    497.0±2.60ms  1097.3 MB/sec
list_primitive_non_null/bloom_filter               1.11   496.8±24.03ms  1095.5 MB/sec    1.00   447.3±18.74ms  1216.7 MB/sec
list_primitive_non_null/cdc                        1.00    444.9±8.81ms  1223.3 MB/sec    1.00   445.4±13.12ms  1221.9 MB/sec
list_primitive_non_null/default                    1.15   344.7±21.96ms  1579.1 MB/sec    1.00    300.1±6.76ms  1813.4 MB/sec
list_primitive_non_null/parquet_2                  1.09    352.8±3.69ms  1542.5 MB/sec    1.00   322.6±22.73ms  1687.1 MB/sec
list_primitive_non_null/zstd                       1.05   765.1±24.38ms   711.3 MB/sec    1.00   730.8±20.34ms   744.7 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    695.9±3.50ms   782.1 MB/sec    1.01    703.1±7.88ms   774.1 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.6±0.13ms     3.2 GB/sec    1.05     12.2±0.27ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.8±0.34ms  1637.4 MB/sec    1.03     23.6±0.18ms  1584.4 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.8±0.21ms     3.4 GB/sec    1.09     11.9±0.06ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.0±0.19ms     3.3 GB/sec    1.06     11.6±0.21ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.1±0.05ms     2.8 GB/sec    1.02     13.3±0.25ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.3±0.26ms     3.2 GB/sec    1.05     11.9±0.16ms     3.1 GB/sec
primitive/bloom_filter                             1.00    153.9±2.09ms   291.5 MB/sec    1.00    154.2±1.49ms   291.1 MB/sec
primitive/cdc                                      1.00    161.3±1.88ms   278.2 MB/sec    1.00    161.2±1.30ms   278.4 MB/sec
primitive/default                                  1.01    120.2±0.79ms   373.4 MB/sec    1.00    118.9±1.69ms   377.3 MB/sec
primitive/parquet_2                                1.00    134.5±1.65ms   333.7 MB/sec    1.01    135.2±1.74ms   332.0 MB/sec
primitive/zstd                                     1.01    150.0±0.77ms   299.2 MB/sec    1.00    149.1±1.93ms   301.0 MB/sec
primitive/zstd_parquet_2                           1.01    169.4±1.51ms   264.9 MB/sec    1.00    167.5±1.20ms   267.9 MB/sec
primitive_all_null/bloom_filter                    1.01     11.8±0.11ms     3.7 GB/sec    1.00     11.7±0.22ms     3.7 GB/sec
primitive_all_null/cdc                             1.01     30.8±0.48ms  1458.4 MB/sec    1.00     30.6±0.35ms  1468.6 MB/sec
primitive_all_null/default                         1.00     11.0±0.18ms     4.0 GB/sec    1.00     11.0±0.14ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.01     11.0±0.18ms     4.0 GB/sec    1.00     10.9±0.10ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.16ms     3.9 GB/sec    1.00     11.1±0.15ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.20ms     4.0 GB/sec    1.01     11.1±0.19ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.06    115.9±2.51ms   379.8 MB/sec    1.00    109.3±2.72ms   402.4 MB/sec
primitive_non_null/cdc                             1.02     92.5±1.13ms   475.8 MB/sec    1.00     91.0±1.65ms   483.7 MB/sec
primitive_non_null/default                         1.00     69.2±0.52ms   635.6 MB/sec    1.00     69.2±1.37ms   636.1 MB/sec
primitive_non_null/parquet_2                       1.00     91.0±1.37ms   483.3 MB/sec    1.00     90.7±0.56ms   484.9 MB/sec
primitive_non_null/zstd                            1.07    106.1±1.85ms   414.7 MB/sec    1.00     99.5±1.06ms   442.1 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    132.2±2.10ms   332.8 MB/sec    1.00    124.4±1.51ms   353.8 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.6±0.64ms     2.4 GB/sec    1.05     19.6±0.16ms     2.2 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.9±0.58ms  1185.4 MB/sec    1.00     37.8±0.30ms  1187.6 MB/sec
primitive_sparse_99pct_null/default                1.00     17.1±0.27ms     2.6 GB/sec    1.00     17.2±0.25ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.3±0.07ms     2.5 GB/sec    1.00     17.3±0.31ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.26ms     2.2 GB/sec    1.00     20.4±0.28ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     19.3±0.10ms     2.3 GB/sec    1.00     19.2±0.32ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.27ms   425.8 MB/sec
short_string_non_null/cdc                                                                 1.00     20.3±0.24ms   590.7 MB/sec
short_string_non_null/default                                                             1.00     16.1±0.06ms   743.5 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.05ms   466.2 MB/sec
short_string_non_null/zstd                                                                1.00     35.6±0.13ms   336.7 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.09ms   421.1 MB/sec
string/bloom_filter                                1.05   241.4±29.36ms     2.1 GB/sec    1.00   229.5±20.33ms     2.2 GB/sec
string/cdc                                         1.01    226.3±8.62ms     2.3 GB/sec    1.00    224.0±7.48ms     2.3 GB/sec
string/default                                     1.22   149.0±25.96ms     3.4 GB/sec    1.00   121.8±12.81ms     4.2 GB/sec
string/parquet_2                                   1.05    128.5±2.18ms     4.0 GB/sec    1.00    121.8±2.90ms     4.2 GB/sec
string/zstd                                        1.01    430.1±4.75ms  1219.0 MB/sec    1.00    425.7±4.35ms  1231.6 MB/sec
string/zstd_parquet_2                              1.00    395.1±1.99ms  1326.7 MB/sec    1.01    399.0±1.66ms  1313.8 MB/sec
string_and_binary_view/bloom_filter                1.00     66.1±2.70ms   488.2 MB/sec    1.04     68.8±2.99ms   469.0 MB/sec
string_and_binary_view/cdc                         1.00     60.1±1.44ms   537.0 MB/sec    1.05     62.9±1.53ms   513.0 MB/sec
string_and_binary_view/default                     1.00     49.8±1.52ms   646.9 MB/sec    1.04     51.8±1.41ms   622.2 MB/sec
string_and_binary_view/parquet_2                   1.00     59.7±1.11ms   540.2 MB/sec    1.04     62.1±1.55ms   519.5 MB/sec
string_and_binary_view/zstd                        1.00     85.2±0.56ms   378.7 MB/sec    1.03     88.0±0.54ms   366.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     74.0±0.88ms   436.0 MB/sec    1.04     76.9±1.08ms   419.5 MB/sec
string_dictionary/bloom_filter                     1.00    110.0±7.61ms     2.3 GB/sec    1.28    141.3±7.91ms  1868.8 MB/sec
string_dictionary/cdc                              1.00     78.8±2.29ms     3.3 GB/sec    1.30    102.3±3.80ms     2.5 GB/sec
string_dictionary/default                          1.00     65.6±3.25ms     3.9 GB/sec    1.39     90.9±3.71ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     55.2±0.59ms     4.7 GB/sec    1.85    102.1±1.87ms     2.5 GB/sec
string_dictionary/zstd                             1.00    222.4±3.69ms  1187.8 MB/sec    1.01   224.7±15.20ms  1175.5 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.5±1.33ms  1330.5 MB/sec    1.19    236.0±1.85ms  1119.4 MB/sec
string_non_null/bloom_filter                       1.00   254.5±21.18ms     2.0 GB/sec    1.00   255.3±15.69ms     2.0 GB/sec
string_non_null/cdc                                1.00   277.0±13.67ms  1891.6 MB/sec    1.02   281.6±12.38ms  1861.0 MB/sec
string_non_null/default                            1.09   149.0±12.15ms     3.4 GB/sec    1.00   136.7±14.33ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    124.0±2.41ms     4.1 GB/sec    1.25    154.5±2.23ms     3.3 GB/sec
string_non_null/zstd                               1.00    565.8±9.78ms   926.1 MB/sec    1.04   591.1±33.18ms   886.5 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.3±2.87ms  1037.0 MB/sec    1.04   523.1±10.81ms  1001.6 MB/sec
struct_all_null/bloom_filter                       1.01      2.6±0.02ms     6.2 GB/sec    1.00      2.5±0.03ms     6.2 GB/sec
struct_all_null/cdc                                1.01      9.8±0.13ms  1640.3 MB/sec    1.00      9.8±0.10ms  1648.9 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.6±1.08ms   336.1 MB/sec    1.02     48.7±1.46ms   328.6 MB/sec
struct_non_null/cdc                                1.00     46.0±0.28ms   348.1 MB/sec    1.01     46.3±0.59ms   345.9 MB/sec
struct_non_null/default                            1.00     32.5±0.27ms   491.8 MB/sec    1.01     33.0±0.36ms   485.5 MB/sec
struct_non_null/parquet_2                          1.02     41.7±0.54ms   384.1 MB/sec    1.00     41.0±0.61ms   390.3 MB/sec
struct_non_null/zstd                               1.00     41.2±0.74ms   387.9 MB/sec    1.00     41.2±0.24ms   388.6 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.5±0.60ms   288.5 MB/sec    1.00     55.3±0.96ms   289.4 MB/sec
struct_sparse_99pct_null/bloom_filter              1.04      7.9±0.28ms  2047.0 MB/sec    1.00      7.6±0.29ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.00     15.9±0.12ms  1014.2 MB/sec    1.00     15.8±0.27ms  1018.4 MB/sec
struct_sparse_99pct_null/default                   1.03      7.3±0.09ms     2.2 GB/sec    1.00      7.1±0.17ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.2±0.20ms     2.2 GB/sec    1.01      7.2±0.16ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.4±0.18ms  1924.8 MB/sec    1.03      8.6±0.06ms  1868.8 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      8.0±0.14ms  2015.1 MB/sec    1.00      8.0±0.18ms  2013.8 MB/sec
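The ratio column can be recovered from the mean times. Each side's time is divided by the faster of the two, so the faster side always reads 1.00. This convention is inferred from the rows above; for example, string_dictionary/parquet_2 gives 102.1 / 55.2 ≈ 1.85. The Rust sketch below shows that arithmetic and flags branch regressions above a threshold. `Row`, `ratios` and `regressions` are hypothetical helpers written for illustration and are not part of arrow-rs or the benchmark runner.

```rust
/// One row of a base-vs-branch comparison: benchmark name plus the mean
/// time in milliseconds for each side (hypothetical helper type).
struct Row {
    name: &'static str,
    base_ms: f64,
    branch_ms: f64,
}

/// Ratio of each side's mean time to the faster of the two, so the faster
/// side reads 1.0 and the slower side reads >= 1.0.
fn ratios(base_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = base_ms.min(branch_ms);
    (base_ms / fastest, branch_ms / fastest)
}

/// Rows where the branch is slower than base by more than `threshold`
/// (e.g. 1.10 means "more than 10% slower"), paired with the branch ratio.
fn regressions(rows: &[Row], threshold: f64) -> Vec<(&'static str, f64)> {
    rows.iter()
        .filter_map(|r| {
            let (_, branch_ratio) = ratios(r.base_ms, r.branch_ms);
            (branch_ratio > threshold).then(|| (r.name, branch_ratio))
        })
        .collect()
}

fn main() {
    // Mean times (ms) copied from three rows of the table above.
    let rows = [
        Row { name: "string_dictionary/parquet_2", base_ms: 55.2, branch_ms: 102.1 },
        Row { name: "string/default", base_ms: 149.0, branch_ms: 121.8 },
        Row { name: "primitive/default", base_ms: 120.2, branch_ms: 118.9 },
    ];
    // Only string_dictionary/parquet_2 exceeds the 10% threshold here.
    for (name, ratio) in regressions(&rows, 1.10) {
        println!("{name}: {ratio:.2}x slower on branch");
    }
}
```

The table prints rounded times, so a ratio recomputed this way can be off by about 0.01 from the printed one. For list_primitive/bloom_filter, 417.0 / 340.4 rounds to 1.23, while the table shows 1.22.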

Resource Usage

base (merge-base)

Metric          Value
Wall time       2015.4s
Peak memory     6.2 GiB
Avg memory      6.0 GiB
CPU user        1897.3s
CPU sys         113.7s
Peak spill      0 B

branch

Metric          Value
Wall time       2115.5s
Peak memory     6.5 GiB
Avg memory      6.3 GiB
CPU user        2032.5s
CPU sys         80.9s
Peak spill      0 B

File an issue against this benchmark runner

@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4451856049-90-g2tzh 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (24b83c7) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete

@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4451897215-91-nvnwl 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (70dc497) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.02     13.3±0.09ms    18.8 MB/sec    1.00     13.1±0.05ms    19.1 MB/sec
bool/cdc                                           1.01     15.9±0.08ms    15.7 MB/sec    1.00     15.9±0.05ms    15.8 MB/sec
bool/default                                       1.02     11.1±0.07ms    22.4 MB/sec    1.00     10.9±0.04ms    22.9 MB/sec
bool/parquet_2                                     1.01     14.9±0.10ms    16.8 MB/sec    1.00     14.7±0.05ms    17.0 MB/sec
bool/zstd                                          1.02     11.7±0.07ms    21.5 MB/sec    1.00     11.4±0.04ms    21.9 MB/sec
bool/zstd_parquet_2                                1.01     15.3±0.07ms    16.4 MB/sec    1.00     15.1±0.05ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.02ms    17.7 MB/sec    1.00      7.1±0.02ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.03ms    18.1 MB/sec    1.01      7.0±0.02ms    18.0 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.01      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.01      9.1±0.04ms    13.7 MB/sec    1.00      9.0±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.0 MB/sec    1.02      4.7±0.08ms    26.5 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.02ms    13.1 MB/sec    1.00      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     95.0±0.41ms   147.3 MB/sec    1.01     95.8±0.62ms   146.2 MB/sec
float_with_nans/cdc                                1.02     84.7±2.06ms   165.4 MB/sec    1.00     83.3±0.31ms   168.2 MB/sec
float_with_nans/default                            1.00     74.8±0.24ms   187.2 MB/sec    1.01     75.3±0.28ms   185.9 MB/sec
float_with_nans/parquet_2                          1.00     96.5±0.41ms   145.1 MB/sec    1.00     96.9±0.54ms   144.5 MB/sec
float_with_nans/zstd                               1.00    113.0±0.29ms   123.9 MB/sec    1.00    113.3±0.32ms   123.5 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.4±0.51ms   105.0 MB/sec    1.01    134.2±0.49ms   104.3 MB/sec
large_string_non_null/bloom_filter                                                        1.00     80.9±0.27ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    244.3±2.51ms  1048.1 MB/sec
large_string_non_null/default                                                             1.00     62.0±0.20ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     61.8±0.21ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     61.9±0.20ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     61.8±0.22ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    331.7±2.11ms  1644.3 MB/sec    1.03    341.7±2.31ms  1596.0 MB/sec
list_primitive/cdc                                 1.00    360.2±1.78ms  1514.3 MB/sec    1.00    361.7±1.29ms  1507.9 MB/sec
list_primitive/default                             1.00    251.7±1.44ms     2.1 GB/sec    1.01    255.2±1.27ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    269.9±0.95ms  2020.5 MB/sec    1.01    271.9±0.47ms  2005.8 MB/sec
list_primitive/zstd                                1.00    503.3±2.86ms  1083.6 MB/sec    1.01    509.6±6.03ms  1070.3 MB/sec
list_primitive/zstd_parquet_2                      1.00    493.2±0.54ms  1105.8 MB/sec    1.00    495.2±0.60ms  1101.4 MB/sec
list_primitive_non_null/bloom_filter               1.00   435.7±11.23ms  1249.1 MB/sec    1.01    439.1±9.47ms  1239.6 MB/sec
list_primitive_non_null/cdc                        1.01   443.8±10.63ms  1226.3 MB/sec    1.00   441.2±10.64ms  1233.5 MB/sec
list_primitive_non_null/default                    1.00    287.6±3.39ms  1892.4 MB/sec    1.05    300.7±4.21ms  1810.1 MB/sec
list_primitive_non_null/parquet_2                  1.00    318.2±0.89ms  1710.6 MB/sec    1.00   318.5±21.42ms  1708.6 MB/sec
list_primitive_non_null/zstd                       1.00    713.5±7.79ms   762.8 MB/sec    1.02   725.9±18.34ms   749.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    679.9±0.90ms   800.4 MB/sec    1.03    700.5±9.97ms   777.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.6±0.05ms     3.2 GB/sec    1.05     12.1±0.07ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.1±0.05ms  1616.8 MB/sec    1.02     23.7±0.07ms  1578.7 MB/sec
list_primitive_sparse_99pct_null/default           1.00     11.3±0.29ms     3.2 GB/sec    1.04     11.8±0.03ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.1±0.04ms     3.3 GB/sec    1.06     11.8±0.04ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.0±0.03ms     2.8 GB/sec    1.05     13.7±0.04ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.3±0.04ms     3.2 GB/sec    1.06     12.0±0.04ms     3.0 GB/sec
primitive/bloom_filter                             1.01    155.5±5.71ms   288.5 MB/sec    1.00    153.7±0.79ms   292.0 MB/sec
primitive/cdc                                      1.00    161.6±0.56ms   277.6 MB/sec    1.00    161.2±0.64ms   278.4 MB/sec
primitive/default                                  1.01    121.2±1.14ms   370.3 MB/sec    1.00    119.8±0.50ms   374.7 MB/sec
primitive/parquet_2                                1.00    135.0±0.50ms   332.5 MB/sec    1.00    135.2±0.48ms   331.8 MB/sec
primitive/zstd                                     1.01    149.6±0.53ms   299.9 MB/sec    1.00    148.7±0.41ms   301.7 MB/sec
primitive/zstd_parquet_2                           1.01    168.6±0.51ms   266.2 MB/sec    1.00    167.7±0.39ms   267.5 MB/sec
primitive_all_null/bloom_filter                    1.00     11.8±0.07ms     3.7 GB/sec    1.01     11.9±0.26ms     3.7 GB/sec
primitive_all_null/cdc                             1.00     30.7±0.42ms  1461.3 MB/sec    1.00     30.8±0.41ms  1458.2 MB/sec
primitive_all_null/default                         1.00     10.9±0.10ms     4.0 GB/sec    1.00     11.0±0.07ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.17ms     4.0 GB/sec    1.00     11.0±0.17ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.19ms     3.9 GB/sec    1.00     11.1±0.14ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.10ms     4.0 GB/sec    1.01     11.1±0.20ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.04    114.2±1.46ms   385.2 MB/sec    1.00    109.4±0.48ms   402.4 MB/sec
primitive_non_null/cdc                             1.00     91.1±0.54ms   482.9 MB/sec    1.01     91.6±0.24ms   480.3 MB/sec
primitive_non_null/default                         1.00     68.5±0.26ms   642.5 MB/sec    1.01     69.3±0.33ms   634.8 MB/sec
primitive_non_null/parquet_2                       1.00     91.3±0.51ms   482.0 MB/sec    1.00     91.0±0.33ms   483.3 MB/sec
primitive_non_null/zstd                            1.07    106.7±0.26ms   412.4 MB/sec    1.00     99.9±0.28ms   440.5 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    131.5±1.88ms   334.5 MB/sec    1.00    124.3±0.38ms   353.9 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     19.1±0.08ms     2.3 GB/sec    1.01     19.3±0.07ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.7±0.28ms  1189.3 MB/sec    1.01     37.9±0.31ms  1183.2 MB/sec
primitive_sparse_99pct_null/default                1.00     17.2±0.04ms     2.5 GB/sec    1.00     17.3±0.04ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.2±0.04ms     2.5 GB/sec    1.00     17.3±0.03ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.5±0.06ms     2.1 GB/sec    1.01     20.6±0.04ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.1±0.04ms     2.3 GB/sec    1.01     19.2±0.05ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.09ms   425.5 MB/sec
short_string_non_null/cdc                                                                 1.00     20.4±0.05ms   587.2 MB/sec
short_string_non_null/default                                                             1.00     16.1±0.05ms   744.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.8±0.09ms   464.8 MB/sec
short_string_non_null/zstd                                                                1.00     35.8±0.10ms   335.6 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.7±0.05ms   418.3 MB/sec
string/bloom_filter                                1.05   238.5±27.20ms     2.1 GB/sec    1.00   226.3±17.74ms     2.3 GB/sec
string/cdc                                         1.00    223.2±8.00ms     2.3 GB/sec    1.00    223.1±5.65ms     2.3 GB/sec
string/default                                     1.20   147.1±29.69ms     3.5 GB/sec    1.00   122.2±12.93ms     4.2 GB/sec
string/parquet_2                                   1.05    127.3±0.36ms     4.0 GB/sec    1.00    121.8±0.70ms     4.2 GB/sec
string/zstd                                        1.01    428.5±3.10ms  1223.6 MB/sec    1.00    423.7±1.44ms  1237.3 MB/sec
string/zstd_parquet_2                              1.00    396.2±1.24ms  1323.1 MB/sec    1.01    399.1±0.90ms  1313.7 MB/sec
string_and_binary_view/bloom_filter                1.00     65.9±0.55ms   489.5 MB/sec    1.07     70.4±0.76ms   458.4 MB/sec
string_and_binary_view/cdc                         1.00     59.2±0.33ms   544.4 MB/sec    1.05     62.4±0.26ms   516.8 MB/sec
string_and_binary_view/default                     1.00     48.6±0.22ms   664.2 MB/sec    1.07     52.0±0.23ms   620.7 MB/sec
string_and_binary_view/parquet_2                   1.00     59.3±0.27ms   543.4 MB/sec    1.06     62.8±0.34ms   513.3 MB/sec
string_and_binary_view/zstd                        1.00     85.0±0.28ms   379.2 MB/sec    1.04     88.3±0.22ms   365.4 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.3±0.29ms   440.0 MB/sec    1.05     76.8±0.23ms   419.8 MB/sec
string_dictionary/bloom_filter                     1.00    109.2±1.30ms     2.4 GB/sec    1.33    145.2±2.49ms  1818.7 MB/sec
string_dictionary/cdc                              1.00     81.6±5.02ms     3.2 GB/sec    1.27    103.3±2.71ms     2.5 GB/sec
string_dictionary/default                          1.00     64.1±1.61ms     4.0 GB/sec    1.48     94.7±1.24ms     2.7 GB/sec
string_dictionary/parquet_2                        1.00     67.1±0.25ms     3.8 GB/sec    1.56    104.9±0.45ms     2.5 GB/sec
string_dictionary/zstd                             1.00    217.7±2.62ms  1213.5 MB/sec    1.04   225.3±15.26ms  1172.3 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.2±0.26ms  1326.0 MB/sec    1.20    238.4±0.39ms  1107.9 MB/sec
string_non_null/bloom_filter                       1.00   256.9±15.35ms  2039.7 MB/sec    1.05   269.0±13.29ms  1948.2 MB/sec
string_non_null/cdc                                1.00    258.5±1.05ms  2026.8 MB/sec    1.10   284.5±12.16ms  1841.6 MB/sec
string_non_null/default                            1.01   138.3±15.37ms     3.7 GB/sec    1.00   137.4±12.80ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    151.5±0.40ms     3.4 GB/sec    1.04    157.7±0.75ms     3.2 GB/sec
string_non_null/zstd                               1.01   602.1±23.07ms   870.2 MB/sec    1.00   598.9±35.12ms   874.9 MB/sec
string_non_null/zstd_parquet_2                     1.01   528.8±12.56ms   990.9 MB/sec    1.00   525.6±11.06ms   996.9 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.00      2.5±0.01ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.13ms  1639.3 MB/sec    1.00      9.8±0.12ms  1641.0 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     46.3±0.19ms   345.3 MB/sec    1.07     49.5±1.67ms   323.1 MB/sec
struct_non_null/cdc                                1.00     45.7±0.18ms   349.9 MB/sec    1.01     46.2±0.18ms   346.5 MB/sec
struct_non_null/default                            1.00     32.2±0.11ms   496.3 MB/sec    1.02     32.8±0.12ms   487.7 MB/sec
struct_non_null/parquet_2                          1.00     41.1±0.18ms   389.3 MB/sec    1.01     41.4±0.11ms   386.1 MB/sec
struct_non_null/zstd                               1.00     41.2±0.14ms   388.4 MB/sec    1.01     41.4±0.09ms   386.0 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.1±0.23ms   290.5 MB/sec    1.00     55.3±0.15ms   289.6 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.7±0.07ms     2.0 GB/sec    1.04      8.0±0.08ms  2019.0 MB/sec
struct_sparse_99pct_null/cdc                       1.00     15.9±0.13ms  1014.7 MB/sec    1.00     15.9±0.10ms  1014.9 MB/sec
struct_sparse_99pct_null/default                   1.00      7.2±0.03ms     2.2 GB/sec    1.00      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.3±0.04ms     2.2 GB/sec    1.01      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.6±0.05ms  1869.8 MB/sec    1.00      8.6±0.05ms  1877.6 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.01      8.1±0.68ms  1988.4 MB/sec    1.00      8.0±0.03ms  2004.7 MB/sec

Resource Usage

Metric        base (merge-base)   branch
Wall time     1965.4s             2115.5s
Peak memory   6.6 GiB             6.8 GiB
Avg memory    6.4 GiB             6.6 GiB
CPU user      1892.3s             2029.6s
CPU sys       70.4s               83.4s
Peak spill    0 B                 0 B
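To put the Resource Usage figures in proportion, here is a small Rust sketch that computes each metric's percent change relative to the merge-base. The numbers are copied from this run's report. `pct_change` is only an illustrative helper, not part of the benchmark runner.

```rust
/// Percent change from `base` to `branch`; positive means the branch is larger.
fn pct_change(base: f64, branch: f64) -> f64 {
    (branch - base) / base * 100.0
}

fn main() {
    // (metric, base (merge-base), branch), copied from the Resource Usage report.
    let rows = [
        ("Wall time (s)", 1965.4, 2115.5),
        ("CPU user (s)", 1892.3, 2029.6),
        ("CPU sys (s)", 70.4, 83.4),
        ("Peak memory (GiB)", 6.6, 6.8),
    ];
    for (metric, base, branch) in rows {
        // Signed, one decimal place, e.g. "+7.6%".
        println!("{metric:<18} {:+.1}%", pct_change(base, branch));
    }
}
```

This prints roughly +7.6% wall time, +7.3% CPU user, +18.5% CPU sys and +3.0% peak memory. The later run in this thread shows a similar wall-time increase of about 8% (1945.4s to 2105.5s). These totals cover the whole suite, and the branch also runs the new `large_string_non_null` and `short_string_non_null` groups, which have no `main` counterpart. Part of the difference may therefore come from those groups rather than from slower writes.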

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.2±0.10ms    19.0 MB/sec    1.00     13.1±0.12ms    19.1 MB/sec
bool/cdc                                           1.00     15.9±0.11ms    15.8 MB/sec    1.00     15.8±0.11ms    15.8 MB/sec
bool/default                                       1.00     11.0±0.08ms    22.7 MB/sec    1.00     11.0±0.11ms    22.7 MB/sec
bool/parquet_2                                     1.00     14.9±0.08ms    16.8 MB/sec    1.00     14.9±0.11ms    16.8 MB/sec
bool/zstd                                          1.00     11.5±0.11ms    21.7 MB/sec    1.00     11.5±0.11ms    21.7 MB/sec
bool/zstd_parquet_2                                1.00     15.2±0.10ms    16.4 MB/sec    1.00     15.2±0.12ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.01      7.1±0.05ms    17.7 MB/sec    1.00      7.0±0.03ms    17.8 MB/sec
bool_non_null/cdc                                  1.01      6.9±0.05ms    18.2 MB/sec    1.00      6.8±0.04ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.2 MB/sec    1.00      4.3±0.02ms    29.4 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.8 MB/sec    1.01      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.01      4.6±0.04ms    27.0 MB/sec    1.00      4.6±0.02ms    27.1 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.05ms    13.2 MB/sec    1.00      9.5±0.04ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     93.6±0.60ms   149.5 MB/sec    1.00     93.7±0.37ms   149.4 MB/sec
float_with_nans/cdc                                1.00     81.9±0.47ms   171.0 MB/sec    1.00     81.8±0.18ms   171.1 MB/sec
float_with_nans/default                            1.00     74.6±0.31ms   187.8 MB/sec    1.00     74.4±0.19ms   188.1 MB/sec
float_with_nans/parquet_2                          1.00     95.5±0.77ms   146.6 MB/sec    1.00     95.2±0.27ms   147.1 MB/sec
float_with_nans/zstd                               1.00    112.2±0.36ms   124.7 MB/sec    1.00    112.2±0.20ms   124.8 MB/sec
float_with_nans/zstd_parquet_2                     1.00    132.2±0.73ms   105.9 MB/sec    1.00    132.6±0.42ms   105.6 MB/sec
large_string_non_null/bloom_filter                                                        1.00     81.4±0.25ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    242.9±1.01ms  1053.9 MB/sec
large_string_non_null/default                                                             1.00     62.7±0.16ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     62.6±0.20ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     62.6±0.25ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.7±0.23ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    326.1±2.39ms  1672.5 MB/sec    1.01    330.5±0.96ms  1650.2 MB/sec
list_primitive/cdc                                 1.00    358.5±1.37ms  1521.0 MB/sec    1.00    358.3±0.87ms  1522.2 MB/sec
list_primitive/default                             1.00    247.5±1.16ms     2.2 GB/sec    1.01    250.2±2.22ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    267.9±0.57ms  2036.0 MB/sec    1.01    270.1±0.72ms  2019.5 MB/sec
list_primitive/zstd                                1.01    499.4±1.63ms  1092.1 MB/sec    1.00    494.7±1.32ms  1102.5 MB/sec
list_primitive/zstd_parquet_2                      1.00    491.4±0.65ms  1109.9 MB/sec    1.00    493.1±0.33ms  1106.0 MB/sec
list_primitive_non_null/bloom_filter               1.00    433.6±5.40ms  1255.1 MB/sec    1.00    434.0±8.63ms  1254.0 MB/sec
list_primitive_non_null/cdc                        1.00    441.9±8.75ms  1231.5 MB/sec    1.00   442.5±19.30ms  1229.8 MB/sec
list_primitive_non_null/default                    1.00    293.6±4.39ms  1853.7 MB/sec    1.04    304.1±7.62ms  1789.9 MB/sec
list_primitive_non_null/parquet_2                  1.01   311.0±13.65ms  1750.2 MB/sec    1.00   307.5±25.04ms  1769.6 MB/sec
list_primitive_non_null/zstd                       1.00    717.4±8.92ms   758.6 MB/sec    1.00   719.1±27.93ms   756.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    671.4±0.87ms   810.6 MB/sec    1.03    690.7±1.03ms   787.9 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.2±0.15ms     3.3 GB/sec    1.09     12.3±0.10ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.8±0.15ms  1638.9 MB/sec    1.03     23.5±0.10ms  1592.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.9±0.12ms     3.3 GB/sec    1.08     11.8±0.12ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.9±0.12ms     3.3 GB/sec    1.09     11.9±0.07ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.9±0.10ms     2.8 GB/sec    1.05     13.6±0.07ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.0±0.13ms     3.3 GB/sec    1.08     11.9±0.09ms     3.1 GB/sec
primitive/bloom_filter                             1.01    152.5±0.81ms   294.2 MB/sec    1.00    151.5±1.06ms   296.3 MB/sec
primitive/cdc                                      1.01    161.3±0.76ms   278.2 MB/sec    1.00    160.4±0.80ms   279.8 MB/sec
primitive/default                                  1.00    119.6±0.69ms   375.3 MB/sec    1.00    119.0±0.83ms   377.2 MB/sec
primitive/parquet_2                                1.00    134.3±0.66ms   334.0 MB/sec    1.00    134.8±0.72ms   333.0 MB/sec
primitive/zstd                                     1.00    149.1±0.93ms   300.9 MB/sec    1.00    148.6±0.60ms   301.9 MB/sec
primitive/zstd_parquet_2                           1.00    167.7±0.76ms   267.6 MB/sec    1.00    167.5±0.65ms   267.9 MB/sec
primitive_all_null/bloom_filter                    1.00     11.6±0.18ms     3.8 GB/sec    1.00     11.6±0.14ms     3.8 GB/sec
primitive_all_null/cdc                             1.01     30.8±0.42ms  1458.6 MB/sec    1.00     30.6±0.40ms  1466.5 MB/sec
primitive_all_null/default                         1.00     10.9±0.13ms     4.0 GB/sec    1.01     11.0±0.17ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.20ms     4.0 GB/sec    1.00     11.0±0.21ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.16ms     4.0 GB/sec    1.01     11.1±0.19ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.1±0.23ms     4.0 GB/sec    1.00     11.1±0.23ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.08    116.5±1.55ms   377.6 MB/sec    1.00    107.9±0.66ms   408.0 MB/sec
primitive_non_null/cdc                             1.00     90.9±0.67ms   484.0 MB/sec    1.00     90.7±0.51ms   485.0 MB/sec
primitive_non_null/default                         1.00     68.2±0.23ms   645.1 MB/sec    1.00     68.3±0.29ms   644.2 MB/sec
primitive_non_null/parquet_2                       1.00     90.0±0.38ms   488.7 MB/sec    1.00     89.6±0.36ms   491.1 MB/sec
primitive_non_null/zstd                            1.07    105.8±0.53ms   415.7 MB/sec    1.00     98.9±0.33ms   445.1 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    130.8±1.86ms   336.3 MB/sec    1.00    123.3±0.36ms   357.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.9±0.22ms     2.3 GB/sec    1.00     18.9±0.17ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.6±0.32ms  1194.9 MB/sec    1.00     37.6±0.23ms  1194.8 MB/sec
primitive_sparse_99pct_null/default                1.00     16.9±0.07ms     2.6 GB/sec    1.01     17.1±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.0±0.07ms     2.6 GB/sec    1.00     17.0±0.09ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.2±0.07ms     2.2 GB/sec    1.01     20.3±0.10ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.9±0.07ms     2.3 GB/sec    1.01     19.0±0.08ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.08ms   425.5 MB/sec
short_string_non_null/cdc                                                                 1.00     20.2±0.07ms   594.8 MB/sec
short_string_non_null/default                                                             1.00     15.9±0.10ms   752.7 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.6±0.06ms   469.3 MB/sec
short_string_non_null/zstd                                                                1.00     36.0±0.10ms   333.5 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.07ms   421.1 MB/sec
string/bloom_filter                                1.08   231.7±25.73ms     2.2 GB/sec    1.00   213.8±15.26ms     2.4 GB/sec
string/cdc                                         1.00    221.6±6.09ms     2.3 GB/sec    1.00    221.2±5.30ms     2.3 GB/sec
string/default                                     1.20   143.8±25.23ms     3.6 GB/sec    1.00   119.3±12.07ms     4.3 GB/sec
string/parquet_2                                   1.05    125.4±1.02ms     4.1 GB/sec    1.00    119.2±1.11ms     4.3 GB/sec
string/zstd                                        1.01    426.4±2.98ms  1229.6 MB/sec    1.00    420.4±1.72ms  1247.1 MB/sec
string/zstd_parquet_2                              1.00    394.0±0.73ms  1330.7 MB/sec    1.01    397.8±0.72ms  1317.9 MB/sec
string_and_binary_view/bloom_filter                1.00     66.2±0.68ms   486.9 MB/sec    1.02     67.8±0.21ms   475.8 MB/sec
string_and_binary_view/cdc                         1.00     59.0±0.28ms   546.6 MB/sec    1.05     61.7±0.13ms   522.6 MB/sec
string_and_binary_view/default                     1.00     48.6±0.34ms   664.1 MB/sec    1.05     50.9±0.14ms   633.0 MB/sec
string_and_binary_view/parquet_2                   1.00     59.3±0.43ms   544.3 MB/sec    1.04     61.7±0.13ms   522.3 MB/sec
string_and_binary_view/zstd                        1.00     85.1±0.44ms   378.8 MB/sec    1.03     87.6±0.11ms   368.1 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.2±0.27ms   440.9 MB/sec    1.03     75.6±0.14ms   426.3 MB/sec
string_dictionary/bloom_filter                     1.00     91.1±1.88ms     2.8 GB/sec    1.51    137.4±1.18ms  1922.0 MB/sec
string_dictionary/cdc                              1.00     86.0±1.23ms     3.0 GB/sec    1.16     99.8±2.45ms     2.6 GB/sec
string_dictionary/default                          1.00     49.3±0.58ms     5.2 GB/sec    1.86     91.8±1.06ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     54.3±0.47ms     4.7 GB/sec    1.89    102.6±0.37ms     2.5 GB/sec
string_dictionary/zstd                             1.00    211.1±1.33ms  1251.3 MB/sec    1.06   222.9±14.78ms  1185.0 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.4±0.36ms  1331.0 MB/sec    1.19    236.2±0.36ms  1118.3 MB/sec
string_non_null/bloom_filter                       1.01   256.4±16.13ms  2043.6 MB/sec    1.00   254.1±12.29ms     2.0 GB/sec
string_non_null/cdc                                1.00    269.9±9.26ms  1941.5 MB/sec    1.03   279.0±11.17ms  1878.1 MB/sec
string_non_null/default                            1.00   128.9±13.26ms     4.0 GB/sec    1.03   133.0±12.02ms     3.8 GB/sec
string_non_null/parquet_2                          1.00   141.2±11.53ms     3.6 GB/sec    1.09    154.6±0.46ms     3.3 GB/sec
string_non_null/zstd                               1.00    533.5±2.20ms   982.2 MB/sec    1.10   586.3±34.00ms   893.7 MB/sec
string_non_null/zstd_parquet_2                     1.00    506.4±2.28ms  1034.7 MB/sec    1.03   520.9±10.85ms  1005.9 MB/sec
struct_all_null/bloom_filter                       1.01      2.5±0.01ms     6.2 GB/sec    1.00      2.5±0.00ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.13ms  1640.5 MB/sec    1.00      9.8±0.16ms  1638.0 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.02     48.6±0.29ms   329.1 MB/sec    1.00     47.6±0.16ms   336.2 MB/sec
struct_non_null/cdc                                1.00     46.1±0.21ms   347.3 MB/sec    1.00     46.0±0.16ms   347.6 MB/sec
struct_non_null/default                            1.01     32.6±0.19ms   491.5 MB/sec    1.00     32.4±0.12ms   494.2 MB/sec
struct_non_null/parquet_2                          1.01     41.4±0.18ms   386.5 MB/sec    1.00     41.1±0.11ms   389.2 MB/sec
struct_non_null/zstd                               1.01     41.3±0.18ms   387.5 MB/sec    1.00     41.1±0.10ms   389.7 MB/sec
struct_non_null/zstd_parquet_2                     1.01     55.4±0.14ms   288.6 MB/sec    1.00     55.1±0.14ms   290.3 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.6±0.15ms     2.1 GB/sec    1.00      7.6±0.10ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.00     15.7±0.17ms  1028.6 MB/sec    1.00     15.7±0.18ms  1026.7 MB/sec
struct_sparse_99pct_null/default                   1.00      7.0±0.10ms     2.2 GB/sec    1.00      7.0±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.0±0.10ms     2.2 GB/sec    1.00      7.0±0.04ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.4±0.10ms  1909.6 MB/sec    1.00      8.4±0.04ms  1918.0 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.8±0.12ms     2.0 GB/sec    1.00      7.8±0.04ms     2.0 GB/sec

Resource Usage

Metric        base (merge-base)   branch
Wall time     1945.4s             2105.5s
Peak memory   6.6 GiB             6.8 GiB
Avg memory    6.4 GiB             6.6 GiB
CPU user      1889.0s             2019.6s
CPU sys       55.7s               81.0s
Peak spill    0 B                 0 B

@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4452557769-94-2nbcr 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (bbe2b7e) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4453035593-96-5mrpz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (bbe2b7e) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


@etseidl
Contributor

etseidl commented May 14, 2026

Have you considered making the batch size configurable per column?
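As background to this question, the rule in the PR description derives the mini-batch size from the bytes in each chunk rather than from a fixed setting. Columns with wide values then get small mini-batches, while narrow columns keep the full chunk. Below is a minimal sketch of that rule. `sub_batch_size` and its parameters are hypothetical names, not parquet-rs API, and the 1 MiB page byte limit is an assumed value.

```rust
/// Hypothetical helper (not a parquet-rs API): choose how many values to write
/// per mini-batch so that one mini-batch stays within the page byte budget.
fn sub_batch_size(page_byte_limit: usize, bytes_per_value: usize, chunk_len: usize) -> usize {
    if chunk_len == 0 || bytes_per_value == 0 {
        // Nothing to bound: keep the chunk as-is.
        return chunk_len;
    }
    // At least one value (the format minimum of one record per page) and
    // never more than the chunk itself (the existing batched fast path).
    (page_byte_limit / bytes_per_value).clamp(1, chunk_len)
}

fn main() {
    let page_limit = 1024 * 1024; // assumed 1 MiB data page byte limit
    let chunk = 1024; // write_batch_size default cited in the PR description

    // 8-byte values (e.g. i64): the whole chunk fits, so behavior is unchanged.
    println!("{}", sub_batch_size(page_limit, 8, chunk));
    // ~5 MB strings: one value per mini-batch.
    println!("{}", sub_batch_size(page_limit, 5 * 1024 * 1024, chunk));
    // 4 KiB strings: 256 values per mini-batch.
    println!("{}", sub_batch_size(page_limit, 4 * 1024, chunk));
}
```

This prints 1024, 1 and 256. Unlike a per-column batch size, the rule needs no tuning when value widths vary from chunk to chunk. The cost is computing a per-chunk byte estimate.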

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.0±0.03ms    19.2 MB/sec    1.00     13.0±0.09ms    19.2 MB/sec
bool/cdc                                           1.00     15.6±0.04ms    16.0 MB/sec    1.00     15.7±0.12ms    16.0 MB/sec
bool/default                                       1.00     10.9±0.03ms    22.9 MB/sec    1.01     11.0±0.12ms    22.8 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.00     14.8±0.12ms    16.9 MB/sec
bool/zstd                                          1.00     11.4±0.03ms    21.9 MB/sec    1.00     11.5±0.10ms    21.8 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.6 MB/sec    1.00     15.1±0.12ms    16.5 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.02ms    17.8 MB/sec    1.00      7.0±0.02ms    17.8 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.04ms    18.4 MB/sec    1.00      6.8±0.03ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.00      4.3±0.02ms    29.2 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.8 MB/sec    1.00      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.00      4.6±0.02ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.04ms    13.2 MB/sec    1.00      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     92.4±0.36ms   151.5 MB/sec    1.04     96.2±0.72ms   145.5 MB/sec
float_with_nans/cdc                                1.00     81.2±0.20ms   172.4 MB/sec    1.03     83.7±1.01ms   167.3 MB/sec
float_with_nans/default                            1.00     74.1±0.24ms   188.9 MB/sec    1.02     75.9±0.43ms   184.4 MB/sec
float_with_nans/parquet_2                          1.00     94.1±0.39ms   148.8 MB/sec    1.03     97.3±0.55ms   143.8 MB/sec
float_with_nans/zstd                               1.00    111.6±0.21ms   125.4 MB/sec    1.02    114.2±1.07ms   122.6 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.1±0.39ms   106.8 MB/sec    1.03    134.9±0.57ms   103.8 MB/sec
large_string_non_null/bloom_filter                                                        1.00     82.2±0.16ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    242.6±1.01ms  1055.1 MB/sec
large_string_non_null/default                                                             1.00     63.1±0.22ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.3±0.20ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     63.4±0.22ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     63.4±0.23ms     3.9 GB/sec
list_primitive/bloom_filter                        1.00    323.2±0.77ms  1687.2 MB/sec    1.05    340.5±3.10ms  1601.6 MB/sec
list_primitive/cdc                                 1.00    357.4±1.04ms  1526.1 MB/sec    1.02    364.5±2.63ms  1496.3 MB/sec
list_primitive/default                             1.00   255.4±46.59ms     2.1 GB/sec    1.00    255.8±1.89ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    268.4±0.72ms  2031.7 MB/sec    1.02    272.7±1.05ms  1999.7 MB/sec
list_primitive/zstd                                1.00    500.1±1.93ms  1090.5 MB/sec    1.02    511.1±1.79ms  1067.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    490.5±0.49ms  1111.9 MB/sec    1.02    500.3±2.01ms  1090.1 MB/sec
list_primitive_non_null/bloom_filter               1.00    418.7±6.33ms  1300.0 MB/sec    1.26   526.0±24.06ms  1034.8 MB/sec
list_primitive_non_null/cdc                        1.00    440.4±7.41ms  1235.7 MB/sec    1.01    442.8±6.00ms  1229.0 MB/sec
list_primitive_non_null/default                    1.00    285.9±4.54ms  1903.7 MB/sec    1.32   376.1±18.32ms  1446.9 MB/sec
list_primitive_non_null/parquet_2                  1.00   306.8±12.97ms  1773.9 MB/sec    1.30    397.9±5.13ms  1367.7 MB/sec
list_primitive_non_null/zstd                       1.00    719.5±6.21ms   756.4 MB/sec    1.09   782.7±16.12ms   695.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    686.4±1.31ms   792.9 MB/sec    1.09    749.3±3.12ms   726.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.1±0.22ms     3.3 GB/sec    1.05     11.7±0.11ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.5±0.09ms  1663.4 MB/sec    1.03     23.1±0.11ms  1615.5 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.7±0.04ms     3.4 GB/sec    1.05     11.2±0.11ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.7±0.04ms     3.4 GB/sec    1.04     11.2±0.05ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.6±0.05ms     2.9 GB/sec    1.04     13.1±0.05ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.04ms     3.4 GB/sec    1.04     11.4±0.08ms     3.2 GB/sec
primitive/bloom_filter                             1.00    148.3±0.47ms   302.7 MB/sec    1.00    148.8±0.66ms   301.6 MB/sec
primitive/cdc                                      1.00    158.7±0.53ms   282.8 MB/sec    1.00    158.8±0.86ms   282.6 MB/sec
primitive/default                                  1.00    117.3±0.24ms   382.6 MB/sec    1.01    117.9±0.68ms   380.5 MB/sec
primitive/parquet_2                                1.00    132.2±0.27ms   339.4 MB/sec    1.00    132.7±0.66ms   338.1 MB/sec
primitive/zstd                                     1.00    146.8±0.28ms   305.8 MB/sec    1.01    147.7±0.71ms   303.7 MB/sec
primitive/zstd_parquet_2                           1.00    165.5±0.33ms   271.1 MB/sec    1.00    166.1±0.59ms   270.1 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.11ms     3.8 GB/sec    1.02     11.8±0.17ms     3.7 GB/sec
primitive_all_null/cdc                             1.03     30.6±0.39ms  1465.9 MB/sec    1.00     29.8±0.46ms  1503.4 MB/sec
primitive_all_null/default                         1.00     10.9±0.21ms     4.0 GB/sec    1.00     10.9±0.11ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.01     11.0±0.20ms     4.0 GB/sec    1.00     10.9±0.15ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.14ms     4.0 GB/sec    1.02     11.2±0.21ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.16ms     4.0 GB/sec    1.01     11.1±0.22ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.08    113.3±1.61ms   388.4 MB/sec    1.00    105.2±0.20ms   418.4 MB/sec
primitive_non_null/cdc                             1.01     90.0±0.50ms   489.1 MB/sec    1.00     89.3±0.29ms   492.9 MB/sec
primitive_non_null/default                         1.01     67.4±0.22ms   653.2 MB/sec    1.00     67.0±0.18ms   656.5 MB/sec
primitive_non_null/parquet_2                       1.00     89.1±0.20ms   493.9 MB/sec    1.00     88.8±0.14ms   495.7 MB/sec
primitive_non_null/zstd                            1.08    105.6±0.45ms   416.8 MB/sec    1.00     97.6±0.12ms   450.8 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    130.1±1.86ms   338.1 MB/sec    1.00    122.2±0.18ms   360.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.3±0.16ms     2.4 GB/sec    1.06     19.3±0.26ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.2±0.38ms  1207.0 MB/sec    1.00     37.2±0.37ms  1205.8 MB/sec
primitive_sparse_99pct_null/default                1.00     16.8±0.06ms     2.6 GB/sec    1.03     17.3±0.08ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.7±0.06ms     2.6 GB/sec    1.03     17.3±0.10ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.0±0.10ms     2.2 GB/sec    1.03     20.6±0.12ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.7±0.09ms     2.3 GB/sec    1.03     19.3±0.12ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.7±0.13ms   403.9 MB/sec
short_string_non_null/cdc                                                                 1.00     20.2±0.06ms   594.2 MB/sec
short_string_non_null/default                                                             1.00     16.4±0.13ms   733.4 MB/sec
short_string_non_null/parquet_2                                                           1.00     26.0±0.14ms   461.6 MB/sec
short_string_non_null/zstd                                                                1.00     38.2±6.26ms   314.3 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     29.1±0.18ms   412.8 MB/sec
string/bloom_filter                                1.05   228.5±26.93ms     2.2 GB/sec    1.00   217.9±17.20ms     2.3 GB/sec
string/cdc                                         1.00    220.9±5.71ms     2.3 GB/sec    1.00    220.6±5.33ms     2.3 GB/sec
string/default                                     1.15   140.0±24.70ms     3.7 GB/sec    1.00   122.1±13.31ms     4.2 GB/sec
string/parquet_2                                   1.01    126.4±1.85ms     4.1 GB/sec    1.00    124.8±1.03ms     4.1 GB/sec
string/zstd                                        1.01    424.6±2.74ms  1234.6 MB/sec    1.00    421.6±1.32ms  1243.6 MB/sec
string/zstd_parquet_2                              1.00    394.7±1.28ms  1328.2 MB/sec    1.02    401.3±0.35ms  1306.2 MB/sec
string_and_binary_view/bloom_filter                1.00     63.4±0.24ms   508.7 MB/sec    1.09     69.2±0.17ms   465.9 MB/sec
string_and_binary_view/cdc                         1.00     58.3±0.12ms   553.1 MB/sec    1.05     61.4±0.15ms   525.6 MB/sec
string_and_binary_view/default                     1.00     47.7±0.10ms   676.3 MB/sec    1.10     52.5±0.18ms   614.2 MB/sec
string_and_binary_view/parquet_2                   1.00     58.5±0.12ms   551.0 MB/sec    1.08     63.2±0.32ms   510.4 MB/sec
string_and_binary_view/zstd                        1.00     84.2±0.15ms   382.9 MB/sec    1.06     89.3±0.37ms   360.9 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.4±0.11ms   445.6 MB/sec    1.07     77.1±0.16ms   418.0 MB/sec
string_dictionary/bloom_filter                     1.00     88.8±0.66ms     2.9 GB/sec    1.54    136.8±2.03ms  1930.6 MB/sec
string_dictionary/cdc                              1.00     86.6±2.67ms     3.0 GB/sec    1.14     98.8±3.49ms     2.6 GB/sec
string_dictionary/default                          1.00     48.2±0.34ms     5.3 GB/sec    1.91     91.9±2.19ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     53.8±0.16ms     4.8 GB/sec    1.94    104.2±2.36ms     2.5 GB/sec
string_dictionary/zstd                             1.00    208.7±0.68ms  1265.8 MB/sec    1.07   223.1±14.95ms  1184.1 MB/sec
string_dictionary/zstd_parquet_2                   1.00    197.8±0.12ms  1335.0 MB/sec    1.20    236.6±1.77ms  1116.4 MB/sec
string_non_null/bloom_filter                       1.00   252.4±15.95ms     2.0 GB/sec    1.01   254.2±12.25ms     2.0 GB/sec
string_non_null/cdc                                1.00    267.6±9.15ms  1958.5 MB/sec    1.06   283.6±12.31ms  1847.7 MB/sec
string_non_null/default                            1.00   126.2±12.63ms     4.1 GB/sec    1.07   135.2±12.42ms     3.8 GB/sec
string_non_null/parquet_2                          1.00   141.3±12.34ms     3.6 GB/sec    1.11    157.1±1.99ms     3.3 GB/sec
string_non_null/zstd                               1.00    531.4±2.22ms   986.0 MB/sec    1.11   589.4±34.73ms   889.0 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.7±2.52ms  1036.1 MB/sec    1.04   527.6±11.72ms   993.2 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.01      2.6±0.02ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.9±0.12ms  1634.9 MB/sec    1.01     10.0±0.12ms  1611.5 MB/sec
struct_all_null/default                            1.00      2.2±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.1±0.20ms   339.8 MB/sec    1.02     47.9±0.83ms   334.2 MB/sec
struct_non_null/cdc                                1.00     45.7±0.22ms   350.3 MB/sec    1.01     46.1±0.46ms   346.9 MB/sec
struct_non_null/default                            1.00     32.1±0.15ms   499.0 MB/sec    1.01     32.4±0.44ms   493.3 MB/sec
struct_non_null/parquet_2                          1.00     40.8±0.51ms   392.0 MB/sec    1.01     41.2±0.49ms   388.4 MB/sec
struct_non_null/zstd                               1.00     40.8±0.11ms   392.1 MB/sec    1.01     41.2±0.54ms   387.9 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.9±0.17ms   291.5 MB/sec    1.03     56.3±2.02ms   284.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.5±0.05ms     2.1 GB/sec    1.07      8.0±0.11ms  2019.0 MB/sec
struct_sparse_99pct_null/cdc                       1.00     15.3±0.08ms  1051.6 MB/sec    1.01     15.5±0.09ms  1040.0 MB/sec
struct_sparse_99pct_null/default                   1.00      7.0±0.04ms     2.3 GB/sec    1.04      7.3±0.07ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.03ms     2.3 GB/sec    1.05      7.3±0.07ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.02ms  1949.3 MB/sec    1.05      8.7±0.07ms  1864.2 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.7±0.02ms     2.0 GB/sec    1.05      8.1±0.04ms  1998.4 MB/sec

Resource Usage

base (merge-base)

Metric          Value
Wall time       1940.4s
Peak memory     6.6 GiB
Avg memory      6.4 GiB
CPU user        1879.7s
CPU sys         57.5s
Peak spill      0 B

branch

Metric          Value
Wall time       2155.5s
Peak memory     6.8 GiB
Avg memory      6.7 GiB
CPU user        2078.6s
CPU sys         76.3s
Peak spill      0 B

File an issue against this benchmark runner

@adriangb
Contributor Author

Have you considered making the batch size configurable per column?

Yes, that may be a simpler approach. But I'm hoping we can get to a place where users don't have to think about or configure this. Given that they gave us a page size limit, it'd be nice if we could always adhere to it...
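A minimal sketch of the approach argued for here, following the PR description: derive the mini-batch size from the configured page byte limit instead of from a per-column batch size. The function name sub_batch_size and its parameters are hypothetical (this is not the parquet crate's API), and bytes_per_value is assumed to be a plain-encoding size estimate that the caller has already computed for the chunk.

```rust
/// Hypothetical helper, not part of the parquet crate: how many values of the
/// current chunk to hand to the encoder per mini-batch so that one mini-batch
/// stays within the configured data page byte limit.
fn sub_batch_size(
    chunk_len: usize,
    bytes_per_value: usize,
    page_byte_limit: usize,
    dict_encoding_active: bool,
) -> usize {
    // Dictionary-encoded pages only hold small RLE indices, so a plain-encoding
    // estimate would shrink batches for no reason: keep the chunk as-is.
    if dict_encoding_active || chunk_len == 0 || bytes_per_value == 0 {
        return chunk_len;
    }
    // Number of values that fit in one page, but never fewer than one,
    // because a page must hold at least one record.
    let fits = (page_byte_limit / bytes_per_value).max(1);
    // Small values: `fits` >= chunk_len, so the whole chunk is one batch.
    fits.min(chunk_len)
}

fn main() {
    let page_limit = 1024 * 1024; // example limit: 1 MiB
    // 8-byte numeric values: the full 1024-value chunk fits in one page.
    println!("{}", sub_batch_size(1024, 8, page_limit, false));
    // ~5 MB strings: one value per mini-batch.
    println!("{}", sub_batch_size(1024, 5 * 1024 * 1024, page_limit, false));
    // 100 KiB strings: ten values per mini-batch.
    println!("{}", sub_batch_size(1024, 100 * 1024, page_limit, false));
}
```

With these example inputs, fixed-width values keep the whole chunk in a single batch, while very large strings degrade to one value per mini-batch. That is the "users don't have to configure anything" behaviour this reply is aiming for.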

Comment thread on parquet/src/data_type.rs (outdated)
/// push a page far past the configured page byte limit before the
/// post-write size check fires.
#[inline]
fn byte_size(&self) -> usize {
Contributor


This seems to duplicate dict_encoding_size. Also, #9700 wants to rename dict_encoding_size and instead implement it pretty much the same way as here.
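For reference, the size estimate under discussion is small. The following is a minimal sketch that assumes the values are Parquet BYTE_ARRAYs and that the quantity being approximated is their PLAIN-encoded size: a 4-byte little-endian length prefix followed by the bytes. The helper names are hypothetical and do not match the signatures of byte_size or dict_encoding_size. Taking the maximum per batch is one conservative choice of bytes_per_value; an average would be another.

```rust
/// Hypothetical helper: PLAIN-encoded size of one BYTE_ARRAY value,
/// i.e. a 4-byte length prefix plus the payload.
fn plain_byte_array_size(value: &[u8]) -> usize {
    std::mem::size_of::<u32>() + value.len()
}

/// Largest per-value estimate in a batch. Nulls are recorded only in the
/// definition levels, not in the values section, so they are skipped.
fn max_bytes_per_value(values: &[Option<&[u8]>]) -> usize {
    values
        .iter()
        .flatten()
        .map(|v| plain_byte_array_size(v))
        .max()
        .unwrap_or(0)
}

fn main() {
    let batch: [Option<&[u8]>; 3] = [Some(&b"ab"[..]), None, Some(&b"abcdef"[..])];
    println!("{}", max_bytes_per_value(&batch)); // 4 + 6 = 10
}
```

If the same per-value helper backed both the dictionary size accounting and the page budget, the duplication pointed out here would go away.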

@etseidl
Contributor

etseidl commented May 14, 2026

Another thought: maybe add another chunker like the one the CDC work added (fn write_with_chunker( ). If we compute batches up front, when we know the shape of the data, that might be faster 🤷
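Here is a rough sketch of the up-front idea. Given estimated encoded sizes for the values in a chunk, it greedily splits them into consecutive batches that stay within a byte budget, plus a row cap that stands in for write_batch_size, and it never produces an empty batch. This is a plain greedy split, not content-defined chunking. The function and its parameters are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical up-front chunker: greedily split values, given their estimated
/// encoded sizes, into consecutive batches whose byte total stays within
/// `byte_limit` and whose length is at most `max_batch_len`. A single value
/// larger than the limit still gets a batch of its own.
fn byte_budget_batches(sizes: &[usize], byte_limit: usize, max_batch_len: usize) -> Vec<Range<usize>> {
    let max_batch_len = max_batch_len.max(1);
    let mut batches = Vec::new();
    let mut start = 0;
    let mut acc = 0usize;
    for (i, &size) in sizes.iter().enumerate() {
        let len = i - start;
        // Close the current batch before this value if adding it would exceed
        // the byte budget or the row cap; never emit an empty batch.
        if len > 0 && (acc.saturating_add(size) > byte_limit || len == max_batch_len) {
            batches.push(start..i);
            start = i;
            acc = 0;
        }
        acc = acc.saturating_add(size);
    }
    if start < sizes.len() {
        batches.push(start..sizes.len());
    }
    batches
}

fn main() {
    // Two small values, one huge value, then two small values again.
    let sizes = [10, 10, 5_000_000, 10, 10];
    println!("{:?}", byte_budget_batches(&sizes, 1024 * 1024, 1024)); // [0..2, 2..3, 3..5]
}
```

Because the boundaries are computed once per chunk, the per-value size check moves out of the inner write loop, which is where the speedup suggested above would come from. Small fixed-width values simply collapse into batches of max_batch_len.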

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.2±0.08ms    18.9 MB/sec    1.00     13.1±0.06ms    19.1 MB/sec
bool/cdc                                           1.01     16.0±0.08ms    15.6 MB/sec    1.00     15.8±0.26ms    15.8 MB/sec
bool/default                                       1.01     11.0±0.07ms    22.6 MB/sec    1.00     10.9±0.09ms    22.9 MB/sec
bool/parquet_2                                     1.00     14.7±0.05ms    17.0 MB/sec    1.00     14.7±0.07ms    17.0 MB/sec
bool/zstd                                          1.02     11.6±0.06ms    21.6 MB/sec    1.00     11.4±0.06ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.2±0.08ms    16.5 MB/sec    1.00     15.1±0.27ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.04ms    17.6 MB/sec    1.00      7.1±0.15ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      7.0±0.05ms    17.9 MB/sec    1.00      7.0±0.13ms    17.8 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.1 MB/sec    1.00      4.3±0.10ms    29.0 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.05ms    13.7 MB/sec    1.00      9.1±0.24ms    13.7 MB/sec
bool_non_null/zstd                                 1.00      4.7±0.02ms    26.9 MB/sec    1.00      4.6±0.03ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.05ms    13.1 MB/sec    1.00      9.5±0.24ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     95.7±2.15ms   146.3 MB/sec    1.01     97.0±2.24ms   144.3 MB/sec
float_with_nans/cdc                                1.00     82.9±1.46ms   168.8 MB/sec    1.03     85.0±1.07ms   164.7 MB/sec
float_with_nans/default                            1.00     76.1±2.46ms   184.0 MB/sec    1.01     76.9±1.74ms   182.0 MB/sec
float_with_nans/parquet_2                          1.00     97.5±2.31ms   143.7 MB/sec    1.00     97.7±2.52ms   143.3 MB/sec
float_with_nans/zstd                               1.01    114.5±2.02ms   122.2 MB/sec    1.00    113.0±1.76ms   123.9 MB/sec
float_with_nans/zstd_parquet_2                     1.00    134.1±2.59ms   104.4 MB/sec    1.01    135.2±2.50ms   103.6 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.5±3.68ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    244.0±2.19ms  1049.1 MB/sec
large_string_non_null/default                                                             1.00     64.3±2.55ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.7±3.92ms     3.9 GB/sec
large_string_non_null/zstd                                                                1.00     60.7±0.26ms     4.1 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.2±1.88ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    340.8±6.92ms  1600.0 MB/sec    1.08   369.1±15.47ms  1477.7 MB/sec
list_primitive/cdc                                 1.00    364.0±3.08ms  1498.2 MB/sec    1.02    371.0±8.13ms  1469.8 MB/sec
list_primitive/default                             1.00    255.5±6.60ms     2.1 GB/sec    1.07    274.6±8.33ms  1986.2 MB/sec
list_primitive/parquet_2                           1.00    271.3±3.24ms  2010.0 MB/sec    1.06    288.7±2.44ms  1888.7 MB/sec
list_primitive/zstd                                1.00    506.5±6.71ms  1076.6 MB/sec    1.03    520.8±7.25ms  1047.1 MB/sec
list_primitive/zstd_parquet_2                      1.00    496.0±3.27ms  1099.5 MB/sec    1.00    496.6±4.31ms  1098.2 MB/sec
list_primitive_non_null/bloom_filter               1.00   447.4±21.93ms  1216.4 MB/sec    1.16   517.1±33.75ms  1052.5 MB/sec
list_primitive_non_null/cdc                        1.01   450.7±12.72ms  1207.4 MB/sec    1.00   447.8±11.71ms  1215.3 MB/sec
list_primitive_non_null/default                    1.00   303.8±11.10ms  1791.3 MB/sec    1.25   379.8±19.28ms  1432.9 MB/sec
list_primitive_non_null/parquet_2                  1.00   318.8±16.26ms  1707.3 MB/sec    1.33   422.5±16.94ms  1288.2 MB/sec
list_primitive_non_null/zstd                       1.00   735.5±14.23ms   740.0 MB/sec    1.07   783.7±19.04ms   694.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    702.1±6.10ms   775.1 MB/sec    1.06    747.6±4.61ms   728.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.3±0.24ms     3.2 GB/sec    1.04     11.7±0.06ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.02     23.0±0.23ms  1625.1 MB/sec    1.00     22.6±0.10ms  1652.1 MB/sec
list_primitive_sparse_99pct_null/default           1.03     11.2±0.05ms     3.3 GB/sec    1.00     10.9±0.04ms     3.4 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.2±0.14ms     3.3 GB/sec    1.02     11.3±0.07ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.1±0.29ms     2.8 GB/sec    1.01     13.2±0.29ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.05ms     3.3 GB/sec    1.03     11.3±0.31ms     3.2 GB/sec
primitive/bloom_filter                             1.02    156.1±3.23ms   287.4 MB/sec    1.00    153.2±2.93ms   293.0 MB/sec
primitive/cdc                                      1.00    161.7±3.06ms   277.5 MB/sec    1.01    162.6±3.18ms   276.1 MB/sec
primitive/default                                  1.02    120.6±0.82ms   372.0 MB/sec    1.00    118.3±1.10ms   379.3 MB/sec
primitive/parquet_2                                1.00    134.6±2.25ms   333.5 MB/sec    1.00    134.9±1.97ms   332.7 MB/sec
primitive/zstd                                     1.01    149.1±1.65ms   300.9 MB/sec    1.00    147.7±0.76ms   303.8 MB/sec
primitive/zstd_parquet_2                           1.00    168.8±1.53ms   265.9 MB/sec    1.00    168.8±1.52ms   265.8 MB/sec
primitive_all_null/bloom_filter                    1.00     11.9±0.18ms     3.7 GB/sec    1.00     11.8±0.23ms     3.7 GB/sec
primitive_all_null/cdc                             1.03     30.9±0.49ms  1450.1 MB/sec    1.00     30.1±0.51ms  1492.4 MB/sec
primitive_all_null/default                         1.00     10.9±0.11ms     4.0 GB/sec    1.00     11.0±0.16ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.18ms     4.0 GB/sec    1.00     11.0±0.23ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.13ms     4.0 GB/sec    1.00     11.0±0.17ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.01     11.0±0.15ms     4.0 GB/sec    1.00     10.9±0.10ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.09    117.1±2.94ms   375.6 MB/sec    1.00    107.7±1.65ms   408.5 MB/sec
primitive_non_null/cdc                             1.00     92.0±1.42ms   478.3 MB/sec    1.00     91.8±1.31ms   479.5 MB/sec
primitive_non_null/default                         1.03     69.7±1.09ms   631.7 MB/sec    1.00     67.7±0.30ms   649.9 MB/sec
primitive_non_null/parquet_2                       1.01     91.7±1.91ms   479.9 MB/sec    1.00     91.2±1.32ms   482.7 MB/sec
primitive_non_null/zstd                            1.07    107.1±2.41ms   410.8 MB/sec    1.00     99.8±1.92ms   441.0 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    131.4±1.99ms   334.8 MB/sec    1.00    124.6±1.30ms   353.2 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     19.0±0.38ms     2.3 GB/sec    1.00     18.9±0.49ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.01     37.5±0.40ms  1198.2 MB/sec    1.00     37.1±0.54ms  1208.8 MB/sec
primitive_sparse_99pct_null/default                1.02     17.3±0.16ms     2.5 GB/sec    1.00     16.9±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.4±0.09ms     2.5 GB/sec    1.00     17.5±0.08ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.24ms     2.1 GB/sec    1.02     20.7±0.28ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.2±0.38ms     2.3 GB/sec    1.01     19.4±0.40ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.7±0.21ms   404.3 MB/sec
short_string_non_null/cdc                                                                 1.00     20.6±0.17ms   581.8 MB/sec
short_string_non_null/default                                                             1.00     16.9±0.20ms   710.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     26.5±0.24ms   452.8 MB/sec
short_string_non_null/zstd                                                                1.00     36.5±0.12ms   328.8 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     29.2±0.17ms   410.8 MB/sec
string/bloom_filter                                1.09   231.5±23.89ms     2.2 GB/sec    1.00   211.7±14.75ms     2.4 GB/sec
string/cdc                                         1.01    226.1±9.11ms     2.3 GB/sec    1.00   224.7±11.94ms     2.3 GB/sec
string/default                                     1.12   146.7±24.93ms     3.5 GB/sec    1.00   131.6±11.00ms     3.9 GB/sec
string/parquet_2                                   1.07    129.2±2.88ms     4.0 GB/sec    1.00    120.6±6.92ms     4.2 GB/sec
string/zstd                                        1.00    430.5±8.49ms  1217.7 MB/sec    1.03    441.5±9.87ms  1187.4 MB/sec
string/zstd_parquet_2                              1.00    397.8±4.26ms  1318.0 MB/sec    1.02    406.3±4.56ms  1290.2 MB/sec
string_and_binary_view/bloom_filter                1.00     67.2±3.49ms   479.6 MB/sec    1.01     68.2±0.39ms   472.6 MB/sec
string_and_binary_view/cdc                         1.00     59.4±1.21ms   543.2 MB/sec    1.06     62.8±2.45ms   513.8 MB/sec
string_and_binary_view/default                     1.00     48.7±0.95ms   662.0 MB/sec    1.11     54.0±2.59ms   597.0 MB/sec
string_and_binary_view/parquet_2                   1.00     58.9±0.58ms   547.7 MB/sec    1.11     65.2±0.86ms   494.4 MB/sec
string_and_binary_view/zstd                        1.00     85.5±0.95ms   377.2 MB/sec    1.06     90.8±1.48ms   355.0 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     74.2±0.66ms   434.4 MB/sec    1.07     79.3±2.37ms   406.8 MB/sec
string_dictionary/bloom_filter                     1.03    104.5±0.90ms     2.5 GB/sec    1.00    101.1±5.73ms     2.6 GB/sec
string_dictionary/cdc                              1.38     78.4±2.15ms     3.3 GB/sec    1.00     56.7±3.11ms     4.6 GB/sec
string_dictionary/default                          1.29     65.3±3.55ms     3.9 GB/sec    1.00     50.6±0.56ms     5.1 GB/sec
string_dictionary/parquet_2                        1.19     67.8±0.44ms     3.8 GB/sec    1.00     56.8±1.57ms     4.5 GB/sec
string_dictionary/zstd                             1.02    218.7±3.32ms  1207.8 MB/sec    1.00    215.4±4.94ms  1226.0 MB/sec
string_dictionary/zstd_parquet_2                   1.00    200.1±2.51ms  1319.8 MB/sec    1.01    201.9±1.54ms  1308.5 MB/sec
string_non_null/bloom_filter                       1.03   270.4±28.27ms  1937.7 MB/sec    1.00   263.2±19.14ms  1990.8 MB/sec
string_non_null/cdc                                1.01   274.5±13.60ms  1909.1 MB/sec    1.00   272.9±11.94ms  1919.9 MB/sec
string_non_null/default                            1.00   148.1±14.93ms     3.5 GB/sec    1.03   152.1±16.41ms     3.4 GB/sec
string_non_null/parquet_2                          1.00    145.3±9.80ms     3.5 GB/sec    1.08    156.8±3.28ms     3.3 GB/sec
string_non_null/zstd                               1.00   573.6±17.33ms   913.5 MB/sec    1.02   584.3±20.28ms   896.8 MB/sec
string_non_null/zstd_parquet_2                     1.00   527.8±11.88ms   992.8 MB/sec    1.00   529.4±13.85ms   989.8 MB/sec
struct_all_null/bloom_filter                       1.00      2.6±0.04ms     6.1 GB/sec    1.01      2.6±0.04ms     6.1 GB/sec
struct_all_null/cdc                                1.00      9.9±0.17ms  1637.0 MB/sec    1.02     10.0±0.13ms  1612.6 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.1±0.89ms   339.4 MB/sec    1.03     48.4±1.17ms   330.9 MB/sec
struct_non_null/cdc                                1.00     46.4±0.36ms   344.8 MB/sec    1.00     46.3±0.38ms   345.9 MB/sec
struct_non_null/default                            1.01     32.8±0.62ms   487.6 MB/sec    1.00     32.6±0.23ms   490.3 MB/sec
struct_non_null/parquet_2                          1.00     40.9±0.33ms   391.4 MB/sec    1.01     41.4±0.79ms   386.5 MB/sec
struct_non_null/zstd                               1.02     41.3±0.53ms   387.0 MB/sec    1.00     40.7±0.23ms   392.8 MB/sec
struct_non_null/zstd_parquet_2                     1.01     55.3±0.45ms   289.4 MB/sec    1.00     54.9±0.14ms   291.6 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.9±0.36ms  2033.3 MB/sec    1.02      8.1±0.33ms  1993.7 MB/sec
struct_sparse_99pct_null/cdc                       1.04     16.0±0.15ms  1005.3 MB/sec    1.00     15.5±0.39ms  1041.4 MB/sec
struct_sparse_99pct_null/default                   1.00      7.2±0.18ms     2.2 GB/sec    1.03      7.4±0.08ms     2.1 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.0±0.12ms     2.2 GB/sec    1.03      7.3±0.22ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.05      8.9±0.27ms  1820.8 MB/sec    1.00      8.5±0.09ms  1903.6 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.8±0.10ms     2.0 GB/sec    1.03      8.1±0.22ms  1992.9 MB/sec

Resource Usage

base (merge-base)

Metric          Value
Wall time       1970.4s
Peak memory     6.6 GiB
Avg memory      6.4 GiB
CPU user        1897.3s
CPU sys         71.4s
Peak spill      0 B

branch

Metric          Value
Wall time       2145.5s
Peak memory     6.8 GiB
Avg memory      6.6 GiB
CPU user        2083.9s
CPU sys         60.2s
Peak spill      0 B


@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4453799534-97-tqn4k 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (145ea5d) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4457066926-120-2shzp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (0b13cb9) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



adriangb and others added 9 commits May 15, 2026 05:28
`short_string_non_null` writes 1M 8-byte strings — exercises the
BYTE_ARRAY write path where per-value bookkeeping cost is largest.
`large_string_non_null` writes 1024 rows of 256 KiB strings, the case
where individual values are a large fraction of the default data-page
byte limit, so a default `write_batch_size`-row chunk (1024 × 256 KiB =
256 MiB) would otherwise be buffered in full before any page-size check
fires.

Both fill gaps in the existing arrow_writer benches, which only cover
random-length strings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The parquet column writer only checks the data page byte limit AFTER
each mini-batch finishes writing, and mini-batches are sized by row
count (`write_batch_size`, default 1024). For BYTE_ARRAY columns with
large values — e.g. a 5 MiB image blob per row — a single mini-batch
can buffer multiple GiB into one data page before the configured byte
limit is even consulted. Pages can exceed the limit by orders of
magnitude.

Make the mini-batch size byte-budget aware:

- For each chunk, ask the encoder how many of the next values fit in
  one page byte budget. If everything fits, stay on the existing
  batched fast path (zero behavior change for small values).
- If not, sub-batch — for flat columns, one mini-batch per `k` values
  where `k` is the fit count; for repeated columns, one mini-batch
  per record (since a record cannot span data pages).

Skip the check while dictionary encoding is active: the byte estimate
is plain-encoded size, but a dict-encoded data page only stores small
RLE indices, so the estimate would spuriously shrink pages. Dictionary
fallback bounds dict-encoded pages independently.

The encoder hook is `count_values_within_byte_budget(values, offset,
len, byte_budget) -> Option<usize>` plus a `_gather` variant for the
arrow path, mirroring the existing `write`/`write_gather` split.
Returning `None` means "no cheap estimate available; stay batched."
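A minimal sketch of how a writer could act on that `Option<usize>` answer for a flat (non-repeated) column, assuming the hook has already returned its fit count. `plan_flat_mini_batches` is a hypothetical helper, not the crate's API; repeated columns would additionally have to cut on record boundaries, which this ignores.

```rust
use std::ops::Range;

/// Hypothetical helper: turn the encoder's "how many of the next `n` values
/// fit in one page byte budget?" answer into mini-batch ranges for a flat column.
fn plan_flat_mini_batches(n: usize, fit: Option<usize>) -> Vec<Range<usize>> {
    match fit {
        // No cheap estimate available: stay on the batched path.
        None => vec![0..n],
        // Everything fits: one batch, same as before the change.
        Some(k) if k >= n => vec![0..n],
        // Otherwise split into `k`-value mini-batches; a page holds at least one value.
        Some(k) => {
            let k = k.max(1);
            (0..n)
                .step_by(k)
                .map(|start| start..(start + k).min(n))
                .collect()
        }
    }
}
```

For example, `plan_flat_mini_batches(10, Some(4))` yields `[0..4, 4..8, 8..10]`, while `None` keeps the single `0..10` batch.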

Implementation details:

- `ParquetValueType::byte_size(&self)` returns the per-value plain-
  encoded byte size. Defaults to `size_of::<Self>()`; overridden for
  `ByteArray` (`len + 4`) and `FixedLenByteArray` (`len`).
- Standard `ColumnValueEncoderImpl<T>::count_values_within_byte_budget`
  short-circuits to `(byte_budget / size_of::<T::T>()).max(1).min(n)`
  for fixed-size physical types — one division, no walk. For BYTE_ARRAY
  and FIXED_LEN_BYTE_ARRAY it scans values cumulatively and exits at
  the first one to push the sum past the budget, which also catches
  skewed distributions (a single oversized value among many small ones
  is detected wherever it lands).
- Arrow `ByteArrayEncoder::count_values_within_byte_budget_gather`
  uses a two-stage walk on `GenericByteArray<O>` types: stage 1
  computes the total in O(1) via one subtraction on the offsets buffer
  when indices are contiguous (the case for every non-null column),
  returning immediately if the chunk fits. Stage 2 walks per-index
  lengths from the offsets buffer (still no slice/UTF-8 construction)
  when stage 1 doesn't conclude. View/dict/fixed-size-binary arrays
  fall through to a per-value walk via `ArrayAccessor::value`.
- `LevelDataRef::value_count(total, max_def)` reports how many levels
  in the chunk correspond to actual non-null values. Used to bridge
  the encoder's value-count answer back into level-count subdivision
  for nullable columns.
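The size estimates in the list above can be sketched as free functions. These are illustrative stand-ins (`fixed_size_fit`, `byte_array_fit` and `offsets_fit` are hypothetical names) that assume PLAIN encoding, i.e. a 4-byte length prefix per BYTE_ARRAY value, and, for `offsets_fit`, indices sorted ascending without duplicates, as gathered non-null indices are.

```rust
/// PLAIN-encoded BYTE_ARRAY values carry a 4-byte length prefix.
const LEN_PREFIX: usize = 4;

/// Fixed-size physical types: one division, no walk. Always admits at least
/// one value and never more than the `n` values available.
fn fixed_size_fit(byte_budget: usize, value_size: usize, n: usize) -> usize {
    (byte_budget / value_size).max(1).min(n)
}

/// Variable-width values: accumulate plain-encoded sizes and stop at the
/// first value that pushes the running total past the budget.
fn byte_array_fit(lens: &[usize], byte_budget: usize) -> usize {
    let mut total = 0usize;
    for (i, len) in lens.iter().enumerate() {
        total += len + LEN_PREFIX;
        if total > byte_budget {
            return i.max(1); // values 0..i fit; a page needs at least one
        }
    }
    lens.len()
}

/// Arrow-style offsets buffer: value `i` occupies `offsets[i]..offsets[i + 1]`.
/// Assumes `indices` is sorted ascending with no duplicates.
fn offsets_fit(offsets: &[i32], indices: &[usize], byte_budget: usize) -> usize {
    let n = indices.len();
    if n == 0 {
        return 0;
    }
    // Stage 1: a contiguous run's total is a single subtraction.
    let (first, last) = (indices[0], indices[n - 1]);
    if last - first + 1 == n {
        let bytes = (offsets[last + 1] - offsets[first]) as usize;
        if bytes + n * LEN_PREFIX <= byte_budget {
            return n;
        }
    }
    // Stage 2: walk per-index lengths read straight from the offsets.
    let mut total = 0usize;
    for (i, &idx) in indices.iter().enumerate() {
        total += (offsets[idx + 1] - offsets[idx]) as usize + LEN_PREFIX;
        if total > byte_budget {
            return i.max(1);
        }
    }
    n
}
```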

Tests in `column::writer::tests`:

- `test_column_writer_caps_page_size_for_large_byte_array_values` —
  flat regression: 64 × 64 KiB BYTE_ARRAY values vs a 16 KiB page
  limit produces one page per value rather than a single ~4 MiB page.
- `test_column_writer_caps_page_size_for_large_values_in_list` —
  Materialized-rep branch of `write_granular_chunk`: list of 3 large
  blobs × 3 records, asserts one page per record (no record splits).
- `test_column_writer_caps_page_size_with_nullable_large_values` —
  `LevelDataRef::value_count` on Materialized def levels with mixed
  nulls.
- `test_column_writer_dict_enabled_large_values_post_spill` —
  `has_dictionary()` short-circuit while dict is active, then byte-
  budget sub-batching after dict spill.
- `test_column_writer_caps_page_size_for_fixed_len_byte_array` —
  `FixedLenByteArray::byte_size` override.

Tests in `arrow::arrow_writer::tests`:

- `test_arrow_writer_caps_page_size_for_large_strings` — end-to-end
  through `ArrowWriter` exercising the offsets-buffer fast path.
- `test_arrow_writer_caps_page_size_for_large_string_view` —
  view-array fallback (Utf8View has no contiguous offsets buffer).
- `test_arrow_writer_all_null_string_column` — `value_count` Uniform
  branch under arrow's level optimization; asserts null_count and
  page coverage rather than just non-empty output.
- `test_arrow_writer_granular_mode_roundtrip` — value-fidelity round-
  trip: mix small + large strings so the byte-budget cutoff lands
  mid-chunk, write through `ArrowWriter`, read back with
  `ParquetRecordBatchReader`, assert each string matches.

Bench results vs `main` (5-run medians on a noisy laptop, run-to-run
variance ~±2%):

- `primitive/default` (i32 25% null): −0.4% to +1.3%
- `primitive_non_null/default`: −2.3% to +0.4%
- `bool_non_null/default`: +1.8% to +15.9% (highly noisy on this
  machine)
- `string/default`: +3.3% to +4.7%
- `short_string_non_null/default` (new, 1M × 8 B): +1.0% to +6.4%
- `large_string_non_null/default` (new, 1024 × 256 KiB): +0.5% to
  +2.7% — the case the fix targets
- `string_dictionary/default`: +3.3% to +6.4%
- `string_non_null/default`: −1.6% to +2.3%

All within laptop variance for the fast-path (small-value) cases.
The fix's intended case — large variable-width values — now correctly
bounds page sizes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…w arrays

The byte-budget check on `Utf8View` / `BinaryView` columns previously
fell through to a per-value walk via `ArrayAccessor::value`, which
constructs a `&str`/`&[u8]` slice for each index — chasing the buffer
pointer through the view's u128 word, then slicing `data_buffers[i]`.
At ~1 µs per chunk over ~1000 chunks on the 1 M-row `string_and_binary_view`
bench, that was a consistent ~+3–5 % regression vs `main` in both
GKE benchmark runs.

View arrays store each value's length in the low 32 bits of its u128
view, so we can scan lengths with no data-buffer dereferences:

```rust
// The low 32 bits of each u128 view hold the value's byte length.
let len = (views[idx] as u32) as usize;
```

Add a dedicated fast path for `Utf8View` and `BinaryView` that walks
the views buffer directly. Falls through to the per-value walk only
for `FixedSizeBinary` and `Dictionary` — the latter still needs the
dictionary-keys indirection.
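A sketch of that length-only walk, relying on the layout stated above (length in the low 32 bits of each u128 view). `view_len` and `views_fit` are hypothetical names, and the `+ 4` models the PLAIN length prefix; no data buffer is ever read.

```rust
/// Byte length of a view-array value, taken from the low 32 bits of its view.
fn view_len(view: u128) -> usize {
    (view as u32) as usize
}

/// How many of the gathered `indices` fit in `byte_budget` when PLAIN-encoded,
/// scanning only the views buffer.
fn views_fit(views: &[u128], indices: &[usize], byte_budget: usize) -> usize {
    let mut total = 0usize;
    for (i, &idx) in indices.iter().enumerate() {
        total += view_len(views[idx]) + 4; // 4-byte PLAIN length prefix
        if total > byte_budget {
            return i.max(1); // at least one value per page
        }
    }
    indices.len()
}
```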

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two targeted regressions surfaced in the GKE benchmark sweep:

1. `string_dictionary/*` regressed +30-89 % vs `main` after writer-dict
   spill. The arrow Dictionary input falls through to the per-value
   walk via `ArrayAccessor::value`, which dereferences the dict
   (keys[idx] → values[key] → slice construction) for every index in
   every chunk. The whole point of the byte-budget check is to bound
   pages of large BYTE_ARRAY values, but an arrow column that's
   already Dictionary-encoded at the arrow layer implies its values
   are small enough that dedup is worthwhile — the opposite shape.
   Treat Dictionary input as "everything fits" and skip the check.

2. `list_primitive_sparse_99pct_null` regressed ~+8 % across props.
   The cost was `LevelDataRef::value_count`'s O(N) def-level scan on
   the 20 000-row compact-levels chunks the list path uses. The
   arrow path already has the answer cheaper: `value_indices` is the
   sorted list of non-null positions in the batch, so the count of
   indices falling in the current chunk's level range is a binary
   search (one `partition_point`). Use that when `value_indices` is
   `Some` and fall back to the def-level scan only on the non-arrow
   path.
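One way to keep the lookup to a single `partition_point` per chunk is to visit chunks left to right and remember where the previous chunk stopped. `NonNullCursor` is a hypothetical helper illustrating that idea; it assumes `value_indices` is sorted ascending and that each call passes a larger `level_end` than the last.

```rust
/// Hypothetical cursor over the sorted non-null positions of a batch.
struct NonNullCursor<'a> {
    value_indices: &'a [usize], // sorted non-null level positions
    pos: usize,                 // first index not yet assigned to a chunk
}

impl<'a> NonNullCursor<'a> {
    fn new(value_indices: &'a [usize]) -> Self {
        Self { value_indices, pos: 0 }
    }

    /// Non-null values with level position < `level_end` that earlier chunks
    /// have not already counted: one binary search on the remaining slice.
    fn take_until(&mut self, level_end: usize) -> usize {
        let rest = &self.value_indices[self.pos..];
        let n = rest.partition_point(|&i| i < level_end);
        self.pos += n;
        n
    }
}
```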

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…dgetChunker

Two small structural cleanups in response to PR review:

- Remove `ParquetValueType::byte_size`. It overlapped with
  `dict_encoding_size`, which @etseidl pointed out is being renamed
  and generalized in apache#9700. Instead, compute the per-value plain-
  encoded byte cost inline in
  `ColumnValueEncoderImpl::count_values_within_byte_budget` from
  `dict_encoding_size`'s components, dispatched on the physical type
  (same dispatch shape as `DictEncoder::push` in
  `encodings/encoding/dict_encoder.rs:52`). No new trait method.

- Lift the byte-budget mini-batch sizing decision out of
  `write_batch_internal` into a new `ByteBudgetChunker` struct
  (`column/writer/byte_budget_chunker.rs`). The chunker captures the
  column-open-time facts (page byte limit, static-fits flag,
  max_def_level) once and exposes one `pick_sub_batch_size` method.
  `write_batch_internal`'s inner loop is now ~25 lines shorter and
  reads as: compute chunk boundary → ask chunker for sub_batch_size
  → write_mini_batch or write_granular_chunk.

  This is the lightweight version of the "make it a chunker like CDC"
  suggestion. A full CDC-style pre-compute would emit all chunk
  boundaries upfront, but the byte budget decision depends on the
  encoder's live `has_dictionary()` state, which changes mid-batch
  when the writer's dictionary spills. Querying that per chunk (as
  this refactor does) preserves the existing dict-active short-
  circuit; a precomputed plan would force a choice between losing
  that short-circuit or losing correctness when dict spills mid-batch
  on large-value columns.
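A minimal sketch of the chunker shape described above, under stated assumptions: the `static_always_fits` rule (a fixed-width column whose full `write_batch_size` chunk already fits in one page) and the average-based `page_byte_limit / bytes_per_value` sizing are illustrative choices, not necessarily what `byte_budget_chunker.rs` does. `has_dictionary` is passed per chunk because the dictionary can spill mid-batch.

```rust
/// Hypothetical per-column chunker; column-open-time facts are captured once.
struct ByteBudgetChunker {
    page_byte_limit: usize,
    /// True when no chunk can ever exceed the page limit (assumed rule).
    static_always_fits: bool,
}

impl ByteBudgetChunker {
    /// `fixed_width` is `Some(bytes)` for fixed-size physical types and
    /// `None` for variable-width ones.
    fn new(page_byte_limit: usize, write_batch_size: usize, fixed_width: Option<usize>) -> Self {
        let static_always_fits = matches!(
            fixed_width,
            Some(w) if w.saturating_mul(write_batch_size) <= page_byte_limit
        );
        Self { page_byte_limit, static_always_fits }
    }

    /// `chunk_bytes` is the estimated plain-encoded size of the chunk's values.
    fn pick_sub_batch_size(&self, chunk_size: usize, has_dictionary: bool, chunk_bytes: usize) -> usize {
        // Bypasses: statically small values, active dictionary, empty chunk.
        if self.static_always_fits || has_dictionary || chunk_size == 0 {
            return chunk_size;
        }
        let bytes_per_value = chunk_bytes.div_ceil(chunk_size).max(1);
        // At least one row per mini-batch, never more than the chunk.
        (self.page_byte_limit / bytes_per_value).clamp(1, chunk_size)
    }
}
```

With a 1 MiB page limit and 256 KiB values, `pick_sub_batch_size(1024, false, 1024 * (256 << 10))` yields 4 rows per mini-batch; with 5 MiB values it yields 1.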

No behavior change. Tests still pass and `cargo bench` shows the same
deltas as before the refactor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GKE bench shows string_dictionary regresses ~+90% on the branch even
though `pick_sub_batch_size` should short-circuit instantly when the
encoder's dictionary is still active (single struct-field load + virtual
call into `has_dictionary()`). Local laptop benches don't reproduce the
regression, suggesting it's an architecture-specific
inlining/code-layout effect on the GKE aarch64 runner.

Marking `new` and `pick_sub_batch_size` `#[inline]` to give the compiler
a clear hint that these should fold into `write_batch_internal`'s hot
loop. Local laptop bench is unchanged (~+3% on string_dictionary, ~+5%
on string_and_binary_view, both within noise); pushing to see whether
GKE moves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es are non-null

The chunker's per-chunk `partition_point` (arrow path) or
`LevelDataRef::value_count` (non-arrow path) returns `chunk_size` by
construction whenever the column has no nulls. The GKE bench showed
~+12–27% regressions on `list_primitive_non_null/*` and
`string_non_null/*` consistent with that walk dominating: ~50 K chunks
× a binary search through a 50 M-entry `non_null_indices` buffer means
cold cache reads on every chunk.

Compute a `ValueCountStrategy` once at `write_batch_internal` entry:

- `AllPresent` — set when the arrow caller passed
  `non_null_indices.len() == num_levels`, or when the column has
  `max_def_level == 0`. The chunker uses `chunk_size` directly with no
  per-chunk work.
- `Sorted(&[usize])` — arrow nullable path; binary-search the indices.
- `DefLevelScan(max_def)` — non-arrow nullable path; def-level scan.

For the bench's `list_primitive_non_null` (all-non-null lists with a
50 M-entry leaf), this drops the per-chunk binary search entirely;
expected to bring those rows back near noise.
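A sketch of the strategy choice and the per-chunk count it enables. The enum and variant names come from the commit message; `choose_strategy`, `count`, and the assumption that `Sorted` indices are directly comparable with level positions (as the earlier commit describes for `value_indices`) are illustrative.

```rust
/// Chosen once at batch entry so the per-chunk work is a single match.
enum ValueCountStrategy<'a> {
    /// No nulls (or `max_def_level == 0`): the count is the chunk size.
    AllPresent,
    /// Arrow nullable path: sorted non-null positions, binary-searched.
    Sorted(&'a [usize]),
    /// Non-arrow nullable path: count def levels equal to `max_def`.
    DefLevelScan(i16),
}

fn choose_strategy<'a>(
    max_def_level: i16,
    num_levels: usize,
    non_null_indices: Option<&'a [usize]>,
) -> ValueCountStrategy<'a> {
    if max_def_level == 0 {
        return ValueCountStrategy::AllPresent;
    }
    match non_null_indices {
        Some(idx) if idx.len() == num_levels => ValueCountStrategy::AllPresent,
        Some(idx) => ValueCountStrategy::Sorted(idx),
        None => ValueCountStrategy::DefLevelScan(max_def_level),
    }
}

impl ValueCountStrategy<'_> {
    /// Non-null values among levels `level_start..level_start + chunk_size`.
    fn count(&self, def_levels: &[i16], level_start: usize, chunk_size: usize) -> usize {
        let level_end = level_start + chunk_size;
        match self {
            ValueCountStrategy::AllPresent => chunk_size,
            ValueCountStrategy::Sorted(idx) => {
                idx.partition_point(|&i| i < level_end) - idx.partition_point(|&i| i < level_start)
            }
            ValueCountStrategy::DefLevelScan(max_def) => def_levels[level_start..level_end]
                .iter()
                .filter(|&&d| d == *max_def)
                .count(),
        }
    }
}
```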

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…th out

The previous `#[inline]` hint was no longer enough once
`pick_sub_batch_size` grew the `ValueCountStrategy` match — LLVM
silently stopped inlining and the most recent GKE bench bounced
`string_dictionary/*` back to +46–81% (`default` +81%, `parquet_2`
+86%, `bloom_filter` +46%).

Fix:

1. Mark `pick_sub_batch_size` `#[inline(always)]`. The hot path is
   just `if static_always_fits || has_dictionary || chunk_size == 0 {
   return chunk_size; }` — one struct-field load + one virtual call —
   so unconditional inlining is the right call, not a heuristic
   suggestion.

2. Pull the byte-budget computation out into a separate
   `byte_budget_sub_batch_size` method marked `#[inline(never)]`. This
   keeps the inlined fast path small even as the slow path grows; the
   slow path is paid for explicitly when bypasses don't fire, not
   smuggled into every chunk's inline body.

Same behavior, just compiler-friendlier code layout.
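A sketch of the hot/cold split using Rust's `#[inline(always)]` and `#[inline(never)]` attributes. `ValueEncoder` and `Toy` are hypothetical stand-ins for the real encoder, and the slow path here simply clamps an encoder-provided fit count, which is one plausible body rather than the PR's.

```rust
/// Hypothetical encoder interface; called through `&dyn`, so each call is virtual.
trait ValueEncoder {
    fn has_dictionary(&self) -> bool;
    /// How many of the next `n` values fit in `byte_budget`; `None` = no cheap estimate.
    fn values_within_budget(&self, n: usize, byte_budget: usize) -> Option<usize>;
}

struct Chunker {
    page_byte_limit: usize,
    static_always_fits: bool,
}

impl Chunker {
    /// Hot path, forced inline: one field load and one virtual call in the common case.
    #[inline(always)]
    fn pick_sub_batch_size(&self, enc: &dyn ValueEncoder, chunk_size: usize) -> usize {
        if self.static_always_fits || enc.has_dictionary() || chunk_size == 0 {
            return chunk_size;
        }
        self.byte_budget_sub_batch_size(enc, chunk_size)
    }

    /// Slow path, kept out of line so its body never grows the inlined fast path.
    #[inline(never)]
    fn byte_budget_sub_batch_size(&self, enc: &dyn ValueEncoder, chunk_size: usize) -> usize {
        match enc.values_within_budget(chunk_size, self.page_byte_limit) {
            None => chunk_size,
            Some(k) => k.clamp(1, chunk_size),
        }
    }
}

/// Toy encoder whose values are all `value_len` bytes.
struct Toy {
    dict: bool,
    value_len: usize,
}

impl ValueEncoder for Toy {
    fn has_dictionary(&self) -> bool {
        self.dict
    }
    fn values_within_budget(&self, n: usize, byte_budget: usize) -> Option<usize> {
        Some((byte_budget / self.value_len).min(n))
    }
}
```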

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The GKE bench shows `string_dictionary/*` consistently ~+80% across
every branch commit, even though the chunker's fast path returns
`chunk_size` with a single struct-field load while `has_dictionary()`
is true (which it is for the entire `string_dictionary` bench since
`create_random_batch` produces a low-cardinality dict that doesn't
spill the writer's encoder).

Working hypothesis: the regression is icache pressure from the new
code's mere presence. The cold path (`byte_budget_sub_batch_size`,
`write_granular_chunk`) is never executed for `string_dictionary` but
sits inline near the encoder's hot path and pushes hot bytes out of
L1i.

Mark both cold paths `#[cold]` so LLVM places them in a separate text
section. The hot encoder loop should stay tighter in icache.

This is a hypothesis-driven attempt; if GKE doesn't move it tells us
the regression source is somewhere else and we keep digging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb force-pushed the parquet-page-size-mid-batch branch from 0b13cb9 to 77ebc07 on May 15, 2026 05:28
@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.2±0.03ms    18.9 MB/sec    1.00     13.0±0.05ms    19.2 MB/sec
bool/cdc                                           1.01     15.7±0.05ms    15.9 MB/sec    1.00     15.5±0.04ms    16.1 MB/sec
bool/default                                       1.02     11.0±0.04ms    22.7 MB/sec    1.00     10.8±0.03ms    23.1 MB/sec
bool/parquet_2                                     1.01     14.8±0.06ms    16.9 MB/sec    1.00     14.7±0.04ms    17.0 MB/sec
bool/zstd                                          1.02     11.6±0.09ms    21.6 MB/sec    1.00     11.4±0.09ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.2±0.08ms    16.5 MB/sec    1.00     15.0±0.05ms    16.7 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.03ms    17.7 MB/sec    1.00      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.4 MB/sec    1.01      6.8±0.02ms    18.3 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.00      4.3±0.02ms    29.2 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.8 MB/sec    1.00      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.03ms    26.9 MB/sec    1.00      4.7±0.03ms    26.9 MB/sec
bool_non_null/zstd_parquet_2                       1.01      9.6±0.05ms    13.0 MB/sec    1.00      9.5±0.03ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     93.9±2.62ms   149.0 MB/sec    1.02     95.9±2.27ms   145.9 MB/sec
float_with_nans/cdc                                1.00     82.4±1.86ms   169.9 MB/sec    1.02     84.1±1.68ms   166.5 MB/sec
float_with_nans/default                            1.00     75.5±1.65ms   185.4 MB/sec    1.00     75.8±0.67ms   184.8 MB/sec
float_with_nans/parquet_2                          1.00     94.3±0.59ms   148.5 MB/sec    1.02     96.0±1.37ms   145.9 MB/sec
float_with_nans/zstd                               1.01    113.3±1.12ms   123.6 MB/sec    1.00    112.2±0.35ms   124.8 MB/sec
float_with_nans/zstd_parquet_2                     1.00    132.6±2.57ms   105.6 MB/sec    1.01    134.1±1.83ms   104.4 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.0±2.09ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    243.1±1.91ms  1053.2 MB/sec
large_string_non_null/default                                                             1.00     61.1±0.16ms     4.1 GB/sec
large_string_non_null/parquet_2                                                           1.00     61.2±0.31ms     4.1 GB/sec
large_string_non_null/zstd                                                                1.00     62.1±1.30ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     64.0±3.49ms     3.9 GB/sec
list_primitive/bloom_filter                        1.00   334.6±11.58ms  1630.1 MB/sec    1.01   339.4±11.86ms  1606.7 MB/sec
list_primitive/cdc                                 1.00    359.1±4.88ms  1518.6 MB/sec    1.01    362.3±4.61ms  1505.3 MB/sec
list_primitive/default                             1.02    255.9±3.99ms     2.1 GB/sec    1.00    251.0±4.40ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    270.0±1.88ms  2020.2 MB/sec    1.01    271.6±3.80ms  2008.3 MB/sec
list_primitive/zstd                                1.00    501.1±5.91ms  1088.4 MB/sec    1.00    499.0±4.97ms  1092.8 MB/sec
list_primitive/zstd_parquet_2                      1.00    491.5±3.14ms  1109.5 MB/sec    1.01    498.0±3.18ms  1095.2 MB/sec
list_primitive_non_null/bloom_filter               1.00   498.6±35.80ms  1091.5 MB/sec    1.03   512.4±30.08ms  1062.2 MB/sec
list_primitive_non_null/cdc                        1.00    443.0±9.94ms  1228.5 MB/sec    1.00   442.2±12.13ms  1230.9 MB/sec
list_primitive_non_null/default                    1.00   373.7±28.75ms  1456.5 MB/sec    1.01   375.8±15.77ms  1448.1 MB/sec
list_primitive_non_null/parquet_2                  1.00    365.6±6.79ms  1488.8 MB/sec    1.12    409.9±4.09ms  1327.6 MB/sec
list_primitive_non_null/zstd                       1.00   749.8±32.54ms   725.8 MB/sec    1.04   777.1±16.65ms   700.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    701.1±6.23ms   776.3 MB/sec    1.10    774.4±4.41ms   702.8 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.1±0.20ms     3.3 GB/sec    1.00     11.1±0.08ms     3.3 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.7±0.32ms  1643.3 MB/sec    1.01     22.9±0.42ms  1629.5 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.7±0.04ms     3.4 GB/sec    1.01     10.8±0.05ms     3.4 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.2±0.09ms     3.3 GB/sec    1.00     11.3±0.07ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.9±0.27ms     2.8 GB/sec    1.01     13.1±0.14ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.0±0.22ms     3.3 GB/sec    1.02     11.2±0.28ms     3.3 GB/sec
primitive/bloom_filter                             1.00    147.9±0.50ms   303.5 MB/sec    1.00    148.4±0.39ms   302.4 MB/sec
primitive/cdc                                      1.00    161.5±1.87ms   278.0 MB/sec    1.00    160.9±1.55ms   279.0 MB/sec
primitive/default                                  1.02    119.6±2.65ms   375.2 MB/sec    1.00    117.6±1.60ms   381.7 MB/sec
primitive/parquet_2                                1.00    134.9±1.11ms   332.7 MB/sec    1.00    134.7±1.05ms   333.3 MB/sec
primitive/zstd                                     1.00    148.1±2.32ms   303.0 MB/sec    1.00    147.7±2.22ms   303.8 MB/sec
primitive/zstd_parquet_2                           1.00    165.4±0.29ms   271.3 MB/sec    1.00    165.2±0.31ms   271.6 MB/sec
primitive_all_null/bloom_filter                    1.00     11.9±0.25ms     3.7 GB/sec    1.00     11.9±0.21ms     3.7 GB/sec
primitive_all_null/cdc                             1.02     30.7±0.41ms  1461.3 MB/sec    1.00     30.0±0.60ms  1497.0 MB/sec
primitive_all_null/default                         1.00     10.9±0.13ms     4.0 GB/sec    1.00     10.9±0.13ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.23ms     4.0 GB/sec    1.00     11.0±0.17ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.18ms     4.0 GB/sec    1.01     11.2±0.19ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.1±0.26ms     3.9 GB/sec    1.00     11.1±0.17ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.05    113.1±3.73ms   388.9 MB/sec    1.00    108.1±3.09ms   407.1 MB/sec
primitive_non_null/cdc                             1.00     90.7±1.81ms   485.3 MB/sec    1.01     91.6±1.93ms   480.5 MB/sec
primitive_non_null/default                         1.00     69.0±1.91ms   637.8 MB/sec    1.00     68.8±1.80ms   639.9 MB/sec
primitive_non_null/parquet_2                       1.00     88.6±0.18ms   496.4 MB/sec    1.00     88.9±0.65ms   495.1 MB/sec
primitive_non_null/zstd                            1.07    107.1±1.15ms   410.8 MB/sec    1.00    100.2±1.17ms   438.9 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    131.2±2.66ms   335.3 MB/sec    1.00    123.6±1.93ms   356.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.1±0.12ms     2.4 GB/sec    1.00     18.2±0.10ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.02     37.5±0.27ms  1196.1 MB/sec    1.00     36.8±0.32ms  1220.5 MB/sec
primitive_sparse_99pct_null/default                1.00     17.0±0.32ms     2.6 GB/sec    1.02     17.3±0.29ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.0±0.31ms     2.6 GB/sec    1.00     17.0±0.31ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.6±0.08ms     2.1 GB/sec    1.00     20.7±0.29ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.02     19.0±0.33ms     2.3 GB/sec    1.00     18.7±0.05ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.3±0.16ms   423.3 MB/sec
short_string_non_null/cdc                                                                 1.00     20.0±0.09ms   599.6 MB/sec
short_string_non_null/default                                                             1.00     15.8±0.08ms   757.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.15ms   467.0 MB/sec
short_string_non_null/zstd                                                                1.00     35.5±0.18ms   337.9 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.07ms   421.7 MB/sec
string/bloom_filter                                1.05   240.0±35.08ms     2.1 GB/sec    1.00   228.8±28.17ms     2.2 GB/sec
string/cdc                                         1.00    221.8±6.97ms     2.3 GB/sec    1.00    222.7±7.05ms     2.3 GB/sec
string/default                                     1.20   147.0±29.18ms     3.5 GB/sec    1.00   123.0±12.32ms     4.2 GB/sec
string/parquet_2                                   1.03    125.7±0.25ms     4.1 GB/sec    1.00    122.4±3.16ms     4.2 GB/sec
string/zstd                                        1.01    430.7±6.63ms  1217.2 MB/sec    1.00    425.9±6.28ms  1230.8 MB/sec
string/zstd_parquet_2                              1.00    396.0±3.21ms  1324.0 MB/sec    1.01    401.9±3.00ms  1304.5 MB/sec
string_and_binary_view/bloom_filter                1.00     66.8±2.23ms   482.6 MB/sec    1.04     69.2±3.12ms   465.8 MB/sec
string_and_binary_view/cdc                         1.00     59.7±0.60ms   540.5 MB/sec    1.04     62.3±1.44ms   517.4 MB/sec
string_and_binary_view/default                     1.00     48.6±0.55ms   664.2 MB/sec    1.08     52.7±1.44ms   612.4 MB/sec
string_and_binary_view/parquet_2                   1.00     59.5±1.38ms   542.4 MB/sec    1.07     63.4±2.06ms   508.4 MB/sec
string_and_binary_view/zstd                        1.00     85.0±1.64ms   379.3 MB/sec    1.03     87.5±0.31ms   368.5 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.2±0.12ms   446.9 MB/sec    1.08     77.6±0.56ms   415.5 MB/sec
string_dictionary/bloom_filter                     1.00     95.1±9.63ms     2.7 GB/sec    1.43    136.3±9.04ms  1938.4 MB/sec
string_dictionary/cdc                              1.00     54.9±3.67ms     4.7 GB/sec    1.87    102.7±3.68ms     2.5 GB/sec
string_dictionary/default                          1.00     52.4±2.82ms     4.9 GB/sec    1.78     93.4±4.26ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     54.9±1.96ms     4.7 GB/sec    1.90    104.1±1.84ms     2.5 GB/sec
string_dictionary/zstd                             1.00    210.4±1.98ms  1255.2 MB/sec    1.07   224.7±16.39ms  1175.4 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.5±1.96ms  1323.7 MB/sec    1.19    236.5±2.70ms  1116.9 MB/sec
string_non_null/bloom_filter                       1.10   270.2±18.73ms  1939.5 MB/sec    1.00   246.7±16.78ms     2.1 GB/sec
string_non_null/cdc                                1.00   272.8±10.23ms  1921.0 MB/sec    1.03   280.5±12.99ms  1868.0 MB/sec
string_non_null/default                            1.04   138.1±12.03ms     3.7 GB/sec    1.00   133.4±13.29ms     3.8 GB/sec
string_non_null/parquet_2                          1.00    145.7±9.33ms     3.5 GB/sec    1.08    157.2±3.33ms     3.3 GB/sec
string_non_null/zstd                               1.00   563.0±11.65ms   930.8 MB/sec    1.04   586.9±33.99ms   892.8 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.2±4.62ms  1037.2 MB/sec    1.03   520.5±11.61ms  1006.7 MB/sec
struct_all_null/bloom_filter                       1.00      2.6±0.05ms     6.2 GB/sec    1.00      2.6±0.05ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.16ms  1638.5 MB/sec    1.01     10.0±0.14ms  1614.9 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.01ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     48.6±1.17ms   329.1 MB/sec    1.00     48.7±0.99ms   328.8 MB/sec
struct_non_null/cdc                                1.00     46.3±0.32ms   345.5 MB/sec    1.02     47.0±0.41ms   340.1 MB/sec
struct_non_null/default                            1.00     32.5±0.48ms   493.0 MB/sec    1.01     32.8±0.46ms   487.7 MB/sec
struct_non_null/parquet_2                          1.00     41.6±0.90ms   385.0 MB/sec    1.01     42.1±0.86ms   380.1 MB/sec
struct_non_null/zstd                               1.00     41.2±0.85ms   388.4 MB/sec    1.02     41.8±0.87ms   382.5 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.7±0.33ms   292.6 MB/sec    1.01     55.4±0.10ms   288.8 MB/sec
struct_sparse_99pct_null/bloom_filter              1.02      7.9±0.44ms  2036.7 MB/sec    1.00      7.8±0.42ms     2.0 GB/sec
struct_sparse_99pct_null/cdc                       1.03     15.9±0.21ms  1011.4 MB/sec    1.00     15.6±0.19ms  1037.0 MB/sec
struct_sparse_99pct_null/default                   1.00      7.1±0.22ms     2.2 GB/sec    1.03      7.3±0.16ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.0±0.03ms     2.2 GB/sec    1.00      7.0±0.02ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.02ms  1939.5 MB/sec    1.02      8.5±0.17ms  1898.9 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.7±0.02ms     2.1 GB/sec    1.00      7.7±0.02ms     2.0 GB/sec

Resource Usage

base (merge-base)

Metric          Value
Wall time       1980.5s
Peak memory     6.6 GiB
Avg memory      6.4 GiB
CPU user        1892.3s
CPU sys         85.8s
Peak spill      0 B

branch

Metric          Value
Wall time       2155.5s
Peak memory     6.8 GiB
Avg memory      6.6 GiB
CPU user        2071.7s
CPU sys         83.4s
Peak spill      0 B


@adriangb
Contributor Author

run benchmark arrow_writer

`value_count_strategy` picked the `Sorted` strategy for any arrow column
with `value_indices`. `Sorted` runs `partition_point(|&i| i < end_offset)`,
which compares leaf-value indices against a level offset: two coordinate
spaces that only coincide for flat columns. For repeated/nested columns the
leaf values array is decoupled from the rep/def level stream, so
`vals_in_chunk` drifts upward without bound as empty-list / sub-`max_def`
levels accumulate, spuriously triggering granular sub-batching on columns
whose values are small. This caused the consistent
`list_primitive_non_null` regression (+18-32% across CI runs).

Repeated columns (`max_rep_level > 0`) now count values via `DefLevelScan`.
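A minimal standalone sketch of the coordinate-space mismatch described above, assuming a non-null `list<i32>` column (max definition level 1). The helper names `count_via_def_levels` and `count_via_partition_point` are hypothetical, not the arrow-rs functions; they only mirror the two counting approaches the commit message names.

```rust
/// Count the levels in `start..end` that carry a leaf value: a level holds a
/// value only when its definition level reaches `max_def_level`.
fn count_via_def_levels(def_levels: &[i16], max_def_level: i16, start: usize, end: usize) -> usize {
    def_levels[start..end].iter().filter(|&&d| d == max_def_level).count()
}

/// The old `Sorted` idea: count sorted leaf-value indices that fall below level
/// offsets. Only valid when leaf indices and level positions share a coordinate
/// space, i.e. for flat columns.
fn count_via_partition_point(value_indices: &[usize], start: usize, end: usize) -> usize {
    value_indices.partition_point(|&i| i < end) - value_indices.partition_point(|&i| i < start)
}

fn main() {
    // Flat nullable column (max_def_level = 1): def levels [1, 0, 1, 1], so in
    // this model the non-null leaf indices are the level positions 0, 2, 3.
    let flat_defs: [i16; 4] = [1, 0, 1, 1];
    let flat_indices: [usize; 3] = [0, 2, 3];
    println!(
        "flat:   scan = {}, partition_point = {}",
        count_via_def_levels(&flat_defs, 1, 0, 2),
        count_via_partition_point(&flat_indices, 0, 2)
    ); // flat:   scan = 1, partition_point = 1

    // Non-null list<i32> column (max_def_level = 1), records [], [], [], [10, 20]:
    // three empty-list levels (def 0), then two value levels (def 1).
    // The leaf array holds only 2 values, so its indices are 0 and 1.
    let list_defs: [i16; 5] = [0, 0, 0, 1, 1];
    let leaf_indices: [usize; 2] = [0, 1];
    // A chunk covering the first three levels contains only empty lists.
    println!(
        "nested: scan = {}, partition_point = {}",
        count_via_def_levels(&list_defs, 1, 0, 3),
        count_via_partition_point(&leaf_indices, 0, 3)
    ); // nested: scan = 0, partition_point = 2 (over-count)
}
```

The flat case agrees because leaf indices and level positions coincide; in the nested case the empty-list levels push level offsets ahead of leaf indices, so `partition_point` credits the chunk with values that only appear later.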

Also in this commit:
- `count_within_budget_offsets`: drop the Stage-1 contiguity gate. The
  offset span is a valid O(1) upper bound for any sorted index set, so
  nullable offset columns skip the O(n) per-chunk walk too.
- `write_granular_chunk`: pack whole records up to `sub_batch_size` per
  mini-batch instead of one record per mini-batch (~25x fewer
  `write_mini_batch` calls when granular mode fires on lists of large
  values).
- Move `plain_encoded_byte_size` to the end of `encoder.rs`: defining it
  above the `ColumnValueEncoder` trait shifted the layout of downstream
  compiled code and perturbed unrelated string-writer benchmarks.
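The O(1) bound behind the `count_within_budget_offsets` change can be illustrated with a small sketch. Everything here (`exact_bytes`, `offset_span_upper_bound`, `within_budget`, and the sample offsets) is hypothetical; it assumes Arrow-style `i32` offsets and a sorted index set, and is not the code from this PR.

```rust
/// Exact byte total of the selected values: an O(n) walk over the indices.
fn exact_bytes(offsets: &[i32], indices: &[usize]) -> usize {
    indices.iter().map(|&i| (offsets[i + 1] - offsets[i]) as usize).sum()
}

/// O(1) upper bound for a sorted index set: every selected value lies inside
/// [offsets[first], offsets[last + 1]) and value lengths are non-negative, so
/// the span can only over-estimate (skipped slots inside it add slack).
fn offset_span_upper_bound(offsets: &[i32], indices: &[usize]) -> usize {
    match (indices.first(), indices.last()) {
        (Some(&first), Some(&last)) => (offsets[last + 1] - offsets[first]) as usize,
        _ => 0,
    }
}

/// Hypothetical budget check: the cheap bound settles the common case, and the
/// exact walk only runs when the bound alone exceeds the budget.
fn within_budget(offsets: &[i32], indices: &[usize], budget: usize) -> bool {
    offset_span_upper_bound(offsets, indices) <= budget || exact_bytes(offsets, indices) <= budget
}

fn main() {
    // Hypothetical string column "ab", "", "cdef", "g", "hij" (lengths 2, 0, 4, 1, 3).
    let offsets: [i32; 6] = [0, 2, 2, 6, 7, 10];
    // Selected (e.g. non-null) values at sorted indices 0, 2 and 4; slots 1 and 3 are skipped.
    let indices = [0usize, 2, 4];
    let exact = exact_bytes(&offsets, &indices);
    let bound = offset_span_upper_bound(&offsets, &indices);
    println!("exact = {exact}, bound = {bound}"); // exact = 9, bound = 10
    assert!(within_budget(&offsets, &indices, 9)); // bound too big, exact walk passes
    assert!(!within_budget(&offsets, &indices, 8));
}
```

Because the bound never under-estimates, using it to skip the walk can only make the writer more conservative, never let an oversized chunk through.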

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4460974503-142-ljjq9 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (4b92635) to 2e8e0c7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



`ValueCountStrategy` was a 3-way precomputed enum (`AllPresent` / `Sorted`
/ `DefLevelScan`) for answering "how many of this chunk's levels carry a
value?". `LevelDataRef::value_count` already answers that correctly for
every column shape: `Absent`/`Uniform` def levels resolve in O(1), and
the O(n) scan only runs for genuinely materialized (nullable/nested) def
levels, on the variable-width slow path the chunker is already on.
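A rough sketch of the counting logic described for `LevelDataRef::value_count`. The enum here is a hypothetical stand-in named `LevelDataSketch`; its `Absent` and `Uniform` variants follow the cases named above, while the `Materialized` variant and the method signature are assumptions, not the real arrow-rs definitions.

```rust
/// Hypothetical view of one chunk's definition levels (not the arrow-rs type).
enum LevelDataSketch<'a> {
    /// No definition levels are stored (max_def_level == 0): every level is a value.
    Absent,
    /// Every level in the chunk has this same definition level.
    Uniform(i16),
    /// One definition level per level in the chunk.
    Materialized(&'a [i16]),
}

impl LevelDataSketch<'_> {
    /// Number of the chunk's `len` levels that carry a leaf value,
    /// i.e. whose definition level reaches `max_def_level`.
    fn value_count(&self, len: usize, max_def_level: i16) -> usize {
        match self {
            // O(1): nothing in the chunk can be null.
            LevelDataSketch::Absent => len,
            // O(1): either every level carries a value or none does.
            LevelDataSketch::Uniform(def) => {
                if *def == max_def_level { len } else { 0 }
            }
            // O(n): scan only when levels are genuinely materialized.
            LevelDataSketch::Materialized(defs) => {
                defs.iter().filter(|&&d| d == max_def_level).count()
            }
        }
    }
}

fn main() {
    let defs: [i16; 5] = [2, 1, 2, 0, 2];
    println!("{}", LevelDataSketch::Absent.value_count(4, 0)); // 4
    println!("{}", LevelDataSketch::Uniform(2).value_count(4, 2)); // 4
    println!("{}", LevelDataSketch::Uniform(1).value_count(4, 2)); // 0
    println!("{}", LevelDataSketch::Materialized(&defs).value_count(defs.len(), 2)); // 3
}
```

Counting from def levels works for every column shape because it never mixes level positions with leaf-value indices, which is the property the removed `Sorted` path lacked.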

The `Sorted` variant (a `partition_point` of leaf-value indices against a
level offset) was only ever valid for flat columns; for nested columns
those indices live in a different coordinate space, which is what made
`vals_in_chunk` drift and spuriously trigger granular sub-batching
(the `list_primitive_non_null` regression). Deleting the enum removes that
bug class structurally rather than guarding against it.

Net effect: the chunker module drops from ~320 to ~173 lines, the
`'a` lifetime and two parameters disappear from the chunker API, and
`ByteBudgetChunker` just stores `max_def_level`. `pick_sub_batch_size`
goes back to a plain `#[inline]` (the `#[inline(always)]` had been added
while chasing a `string_dictionary` swing that was later confirmed to be
code-layout noise rather than an inlining effect). The change is
perf-neutral: the cost difference between `value_count` and the old
`partition_point` is negligible and only incurred on the post-dict-spill path.
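For context, a hypothetical sketch of byte-budget-aware mini-batch sizing along the lines of the PR description (`sub_batch_size = page_byte_limit / bytes_per_value`, clamped to at least 1, with the check skipped while dictionary encoding is active). The signature and the averaging over the chunk are assumptions; the real `pick_sub_batch_size` may take different inputs.

```rust
/// Hypothetical byte-budget-aware mini-batch sizing (not the arrow-rs signature).
#[inline]
fn pick_sub_batch_size(
    chunk_len: usize,         // number of values in the chunk about to be written
    chunk_value_bytes: usize, // estimated plain-encoded bytes of those values
    page_byte_limit: usize,   // data page size limit
    dictionary_active: bool,  // dict pages store small indices, so skip the check
) -> usize {
    if dictionary_active || chunk_len == 0 {
        return chunk_len;
    }
    // Round the per-value estimate up so fractional averages stay conservative.
    let bytes_per_value = chunk_value_bytes.div_ceil(chunk_len).max(1);
    // At least one value per mini-batch; never more than the chunk itself.
    (page_byte_limit / bytes_per_value).clamp(1, chunk_len)
}

fn main() {
    let limit: usize = 1 << 20; // example 1 MiB page limit
    println!("{}", pick_sub_batch_size(1024, 1024 * 4, limit, false)); // 1024
    println!("{}", pick_sub_batch_size(1024, 1024 * 4096, limit, false)); // 256
    println!("{}", pick_sub_batch_size(64, 64 * 5 * (1 << 20), limit, false)); // 1
    println!("{}", pick_sub_batch_size(1024, 1024 * 4096, limit, true)); // 1024
}
```

With a 1 MiB limit, a chunk of 1024 `i32` values keeps its full size (the batched fast path), 1024 strings of about 4 KiB each drop to 256 per mini-batch, and 5 MiB values drop to one per mini-batch.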

`LevelDataRef::value_count` gains a unit test as the now load-bearing
value-counting primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4461247885-149-vmwpj 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (beb5fc2) to 2e8e0c7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.1±0.03ms    19.1 MB/sec    1.00     13.0±0.05ms    19.3 MB/sec
bool/cdc                                           1.00     15.9±0.08ms    15.8 MB/sec    1.01     16.0±0.05ms    15.6 MB/sec
bool/default                                       1.01     10.9±0.03ms    22.9 MB/sec    1.00     10.9±0.04ms    23.0 MB/sec
bool/parquet_2                                     1.01     14.8±0.03ms    17.0 MB/sec    1.00     14.7±0.05ms    17.1 MB/sec
bool/zstd                                          1.00     11.4±0.04ms    21.9 MB/sec    1.00     11.4±0.06ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.1±0.04ms    16.5 MB/sec    1.00     15.0±0.05ms    16.7 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.02ms    17.8 MB/sec    1.01      7.1±0.03ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.04ms    18.1 MB/sec    1.00      6.9±0.06ms    18.2 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.01      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.02ms    13.9 MB/sec    1.01      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.01      4.7±0.02ms    26.9 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.03ms    13.3 MB/sec    1.01      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     93.9±0.46ms   149.1 MB/sec    1.18    111.1±7.18ms   126.0 MB/sec
float_with_nans/cdc                                1.00     82.7±0.29ms   169.3 MB/sec    1.03     85.3±1.69ms   164.2 MB/sec
float_with_nans/default                            1.00     75.0±0.69ms   186.6 MB/sec    1.24     93.2±7.30ms   150.2 MB/sec
float_with_nans/parquet_2                          1.00     96.0±0.41ms   145.8 MB/sec    1.18    113.1±7.30ms   123.8 MB/sec
float_with_nans/zstd                               1.00    113.0±0.28ms   123.9 MB/sec    1.15    130.4±7.30ms   107.3 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.3±0.45ms   105.0 MB/sec    1.13    150.6±7.35ms    92.9 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.0±0.18ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    244.2±1.12ms  1048.5 MB/sec
large_string_non_null/default                                                             1.00     64.1±0.21ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     64.3±0.15ms     3.9 GB/sec
large_string_non_null/zstd                                                                1.00     64.4±0.18ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     64.3±0.19ms     3.9 GB/sec
list_primitive/bloom_filter                        1.00    336.0±2.13ms  1623.2 MB/sec    1.01    339.6±1.11ms  1606.0 MB/sec
list_primitive/cdc                                 1.00    366.3±1.72ms  1488.8 MB/sec    1.00    367.2±1.83ms  1485.1 MB/sec
list_primitive/default                             1.00    255.4±2.16ms     2.1 GB/sec    1.01    258.9±1.60ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    276.3±0.71ms  1973.8 MB/sec    1.00    276.5±0.59ms  1972.7 MB/sec
list_primitive/zstd                                1.00    508.6±3.68ms  1072.2 MB/sec    1.00    508.9±1.97ms  1071.7 MB/sec
list_primitive/zstd_parquet_2                      1.00    498.4±1.21ms  1094.1 MB/sec    1.00    498.9±0.73ms  1093.2 MB/sec
list_primitive_non_null/bloom_filter               1.00    397.7±4.61ms  1368.4 MB/sec    1.02    407.5±4.37ms  1335.6 MB/sec
list_primitive_non_null/cdc                        1.01    444.6±7.19ms  1224.0 MB/sec    1.00    441.4±7.22ms  1232.9 MB/sec
list_primitive_non_null/default                    1.00    267.2±3.87ms  2037.2 MB/sec    1.03    275.0±3.65ms  1978.9 MB/sec
list_primitive_non_null/parquet_2                  1.00    294.9±0.39ms  1845.3 MB/sec    1.04    305.7±0.36ms  1780.3 MB/sec
list_primitive_non_null/zstd                       1.00    690.3±4.78ms   788.4 MB/sec    1.01    698.9±6.52ms   778.7 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    670.5±1.17ms   811.7 MB/sec    1.00    671.2±0.75ms   810.9 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.04     11.8±0.03ms     3.1 GB/sec    1.00     11.4±0.06ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.3±0.03ms  1606.6 MB/sec    1.00     23.2±0.04ms  1607.5 MB/sec
list_primitive_sparse_99pct_null/default           1.02     11.4±0.03ms     3.2 GB/sec    1.00     11.3±0.06ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.03     11.4±0.02ms     3.2 GB/sec    1.00     11.1±0.08ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.02     13.3±0.04ms     2.7 GB/sec    1.00     13.0±0.07ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.02     11.6±0.03ms     3.1 GB/sec    1.00     11.4±0.03ms     3.2 GB/sec
primitive/bloom_filter                             1.00    151.8±0.59ms   295.6 MB/sec    1.00    151.1±0.53ms   296.9 MB/sec
primitive/cdc                                      1.00    159.3±0.57ms   281.6 MB/sec    1.01    160.3±0.65ms   279.9 MB/sec
primitive/default                                  1.00    119.0±0.24ms   377.0 MB/sec    1.00    118.6±0.39ms   378.3 MB/sec
primitive/parquet_2                                1.00    133.9±0.34ms   335.1 MB/sec    1.00    133.4±0.25ms   336.3 MB/sec
primitive/zstd                                     1.00    148.6±0.29ms   302.0 MB/sec    1.00    148.1±0.25ms   302.9 MB/sec
primitive/zstd_parquet_2                           1.00    166.6±0.33ms   269.4 MB/sec    1.00    166.8±0.30ms   269.0 MB/sec
primitive_all_null/bloom_filter                    1.00    895.2±2.85µs    49.0 GB/sec    1.01    905.4±4.07µs    48.4 GB/sec
primitive_all_null/cdc                             1.00     18.0±0.31ms     2.4 GB/sec    1.02     18.4±0.28ms     2.4 GB/sec
primitive_all_null/default                         1.00    263.1±0.48µs   166.6 GB/sec    1.05    276.2±1.88µs   158.7 GB/sec
primitive_all_null/parquet_2                       1.00    260.7±0.78µs   168.1 GB/sec    1.08    280.3±2.28µs   156.3 GB/sec
primitive_all_null/zstd                            1.00    376.1±0.57µs   116.5 GB/sec    1.03    388.7±2.12µs   112.7 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    337.2±1.02µs   130.0 GB/sec    1.06    358.1±2.46µs   122.4 GB/sec
primitive_non_null/bloom_filter                    1.00    108.1±0.20ms   406.9 MB/sec    1.00    108.5±0.45ms   405.4 MB/sec
primitive_non_null/cdc                             1.00     91.3±0.24ms   482.1 MB/sec    1.00     91.5±0.57ms   481.1 MB/sec
primitive_non_null/default                         1.00     67.8±0.17ms   648.9 MB/sec    1.01     68.7±0.15ms   640.8 MB/sec
primitive_non_null/parquet_2                       1.00     89.6±0.30ms   491.1 MB/sec    1.00     89.6±0.20ms   490.8 MB/sec
primitive_non_null/zstd                            1.08    106.4±0.98ms   413.5 MB/sec    1.00     99.0±0.21ms   444.6 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    129.9±2.60ms   338.8 MB/sec    1.00    123.9±0.18ms   355.2 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.8±0.17ms     2.3 GB/sec    1.01     19.0±0.06ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     35.7±0.25ms  1256.6 MB/sec    1.01     35.9±0.22ms  1249.2 MB/sec
primitive_sparse_99pct_null/default                1.00     16.9±0.08ms     2.6 GB/sec    1.01     17.2±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.1±0.05ms     2.6 GB/sec    1.00     17.2±0.03ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.06ms     2.1 GB/sec    1.00     20.5±0.03ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.1±0.07ms     2.3 GB/sec    1.00     19.1±0.03ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     27.6±0.06ms   434.8 MB/sec
short_string_non_null/cdc                                                                 1.00     20.3±0.06ms   591.1 MB/sec
short_string_non_null/default                                                             1.00     16.0±0.05ms   748.3 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.06ms   466.1 MB/sec
short_string_non_null/zstd                                                                1.00     35.6±0.10ms   337.0 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.6±0.10ms   419.8 MB/sec
string/bloom_filter                                1.00   221.3±21.32ms     2.3 GB/sec    1.00   220.3±15.07ms     2.3 GB/sec
string/cdc                                         1.01    224.0±5.03ms     2.3 GB/sec    1.00    222.5±9.27ms     2.3 GB/sec
string/default                                     1.03   129.8±21.08ms     3.9 GB/sec    1.00    126.3±8.01ms     4.1 GB/sec
string/parquet_2                                   1.00    111.9±6.45ms     4.6 GB/sec    1.15    128.7±0.54ms     4.0 GB/sec
string/zstd                                        1.00    418.2±2.09ms  1253.7 MB/sec    1.08   451.7±19.33ms  1160.6 MB/sec
string/zstd_parquet_2                              1.00    403.5±6.36ms  1299.3 MB/sec    1.01    408.1±5.30ms  1284.6 MB/sec
string_and_binary_view/bloom_filter                1.00     65.4±0.23ms   493.5 MB/sec    1.08     70.3±0.33ms   458.9 MB/sec
string_and_binary_view/cdc                         1.00     58.5±0.13ms   551.3 MB/sec    1.07     62.3±0.40ms   517.3 MB/sec
string_and_binary_view/default                     1.00     48.3±0.10ms   667.7 MB/sec    1.10     53.1±0.23ms   607.8 MB/sec
string_and_binary_view/parquet_2                   1.00     59.0±0.17ms   546.2 MB/sec    1.08     63.7±0.46ms   506.3 MB/sec
string_and_binary_view/zstd                        1.00     84.7±0.14ms   380.9 MB/sec    1.06     89.8±0.59ms   359.1 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.8±0.16ms   443.3 MB/sec    1.07     77.9±0.45ms   413.8 MB/sec
string_dictionary/bloom_filter                     1.00     92.9±1.45ms     2.8 GB/sec    1.06     98.3±0.90ms     2.6 GB/sec
string_dictionary/cdc                              1.00     53.5±1.15ms     4.8 GB/sec    1.06     56.9±1.09ms     4.5 GB/sec
string_dictionary/default                          1.00     47.3±1.04ms     5.5 GB/sec    1.15     54.3±0.58ms     4.7 GB/sec
string_dictionary/parquet_2                        1.00     54.6±0.25ms     4.7 GB/sec    1.04     56.8±0.53ms     4.5 GB/sec
string_dictionary/zstd                             1.00    210.5±2.07ms  1255.0 MB/sec    1.02    215.4±0.41ms  1226.4 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.0±0.24ms  1327.0 MB/sec    1.01    202.0±0.28ms  1307.9 MB/sec
string_non_null/bloom_filter                       1.00   257.1±14.34ms  2038.0 MB/sec    1.03   266.0±12.46ms  1969.9 MB/sec
string_non_null/cdc                                1.00    267.7±2.97ms  1957.2 MB/sec    1.01    271.2±6.61ms  1931.9 MB/sec
string_non_null/default                            1.00   142.1±12.48ms     3.6 GB/sec    1.01   142.9±11.92ms     3.6 GB/sec
string_non_null/parquet_2                          1.00    132.2±3.04ms     3.9 GB/sec    1.03    136.4±7.91ms     3.8 GB/sec
string_non_null/zstd                               1.00    542.5±3.23ms   965.9 MB/sec    1.01    546.5±4.91ms   958.8 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.4±0.85ms  1036.8 MB/sec    1.00    507.1±0.58ms  1033.3 MB/sec
struct_all_null/bloom_filter                       1.00    367.3±2.23µs    42.9 GB/sec    1.02    376.4±1.25µs    41.8 GB/sec
struct_all_null/cdc                                1.00      7.1±0.08ms     2.2 GB/sec    1.03      7.3±0.09ms     2.1 GB/sec
struct_all_null/default                            1.00    113.5±0.60µs   138.7 GB/sec    1.06    119.9±0.74µs   131.3 GB/sec
struct_all_null/parquet_2                          1.00    112.2±0.64µs   140.4 GB/sec    1.08    121.0±0.89µs   130.1 GB/sec
struct_all_null/zstd                               1.00    160.4±0.74µs    98.2 GB/sec    1.05    168.6±0.90µs    93.4 GB/sec
struct_all_null/zstd_parquet_2                     1.00    144.9±0.47µs   108.7 GB/sec    1.07    154.4±0.99µs   102.0 GB/sec
struct_non_null/bloom_filter                       1.03     47.1±0.15ms   339.4 MB/sec    1.00     45.9±0.18ms   348.4 MB/sec
struct_non_null/cdc                                1.00     45.7±0.15ms   350.4 MB/sec    1.00     45.5±0.16ms   351.9 MB/sec
struct_non_null/default                            1.00     32.2±0.15ms   497.0 MB/sec    1.00     32.1±0.12ms   497.8 MB/sec
struct_non_null/parquet_2                          1.00     41.1±0.09ms   389.6 MB/sec    1.00     41.1±0.13ms   389.3 MB/sec
struct_non_null/zstd                               1.01     41.3±0.13ms   387.4 MB/sec    1.00     40.9±0.12ms   390.8 MB/sec
struct_non_null/zstd_parquet_2                     1.01     55.3±0.12ms   289.3 MB/sec    1.00     54.9±0.13ms   291.5 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      6.8±0.04ms     2.3 GB/sec    1.01      6.9±0.05ms     2.3 GB/sec
struct_sparse_99pct_null/cdc                       1.00     13.5±0.07ms  1191.6 MB/sec    1.00     13.5±0.07ms  1192.9 MB/sec
struct_sparse_99pct_null/default                   1.00      6.1±0.03ms     2.6 GB/sec    1.01      6.1±0.05ms     2.6 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.1±0.03ms     2.6 GB/sec    1.00      6.1±0.04ms     2.6 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.5±0.03ms     2.1 GB/sec    1.00      7.5±0.05ms     2.1 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.9±0.03ms     2.3 GB/sec    1.00      6.9±0.09ms     2.3 GB/sec

Resource Usage

base (merge-base)

Metric          Value
Wall time       1925.4s
Peak memory     6.6 GiB
Avg memory      6.4 GiB
CPU user        1890.7s
CPU sys         34.1s
Peak spill      0 B

branch

Metric          Value
Wall time       2085.5s
Peak memory     6.8 GiB
Avg memory      6.6 GiB
CPU user        2043.7s
CPU sys         38.0s
Peak spill      0 B


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.02     13.1±0.08ms    19.0 MB/sec    1.00     12.9±0.07ms    19.3 MB/sec
bool/cdc                                           1.01     16.0±0.11ms    15.6 MB/sec    1.00     15.8±0.10ms    15.8 MB/sec
bool/default                                       1.02     11.0±0.06ms    22.6 MB/sec    1.00     10.8±0.05ms    23.1 MB/sec
bool/parquet_2                                     1.01     14.9±0.06ms    16.8 MB/sec    1.00     14.7±0.06ms    17.0 MB/sec
bool/zstd                                          1.01     11.5±0.08ms    21.7 MB/sec    1.00     11.4±0.05ms    22.0 MB/sec
bool/zstd_parquet_2                                1.00     15.2±0.08ms    16.5 MB/sec    1.00     15.1±0.05ms    16.5 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.03ms    17.7 MB/sec    1.00      7.1±0.04ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.08ms    18.1 MB/sec    1.00      6.9±0.07ms    18.0 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.2 MB/sec    1.00      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.9 MB/sec    1.01      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.00      4.6±0.02ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.03ms    13.3 MB/sec    1.01      9.5±0.04ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     93.7±2.24ms   149.5 MB/sec    1.01     94.8±0.84ms   147.6 MB/sec
float_with_nans/cdc                                1.00     82.3±1.53ms   170.1 MB/sec    1.01     83.3±0.64ms   168.1 MB/sec
float_with_nans/default                            1.01     76.1±1.09ms   184.0 MB/sec    1.00     75.6±0.49ms   185.2 MB/sec
float_with_nans/parquet_2                          1.00     96.2±0.88ms   145.5 MB/sec    1.00     96.5±2.03ms   145.1 MB/sec
float_with_nans/zstd                               1.00    113.0±0.89ms   123.8 MB/sec    1.00    112.5±1.30ms   124.4 MB/sec
float_with_nans/zstd_parquet_2                     1.00    134.1±1.84ms   104.4 MB/sec    1.00    133.6±1.05ms   104.8 MB/sec
large_string_non_null/bloom_filter                                                        1.00     82.4±2.71ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    243.8±2.51ms  1049.9 MB/sec
large_string_non_null/default                                                             1.00     64.3±2.33ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.6±1.81ms     3.9 GB/sec
large_string_non_null/zstd                                                                1.00     63.5±0.94ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     63.3±1.81ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    337.7±9.69ms  1614.8 MB/sec    1.01    341.5±9.51ms  1596.9 MB/sec
list_primitive/cdc                                 1.01    373.0±7.60ms  1462.0 MB/sec    1.00    369.2±3.64ms  1477.1 MB/sec
list_primitive/default                             1.00    257.2±3.41ms     2.1 GB/sec    1.00    257.6±4.71ms     2.1 GB/sec
list_primitive/parquet_2                           1.01    279.3±2.15ms  1952.5 MB/sec    1.00    277.4±2.07ms  1966.2 MB/sec
list_primitive/zstd                                1.00    511.9±4.47ms  1065.5 MB/sec    1.00    510.9±5.07ms  1067.4 MB/sec
list_primitive/zstd_parquet_2                      1.00    502.3±2.32ms  1085.8 MB/sec    1.00    502.5±3.18ms  1085.2 MB/sec
list_primitive_non_null/bloom_filter               1.00   414.2±11.72ms  1313.8 MB/sec    1.02   421.2±15.76ms  1292.2 MB/sec
list_primitive_non_null/cdc                        1.00    445.7±9.47ms  1221.0 MB/sec    1.01   448.9±11.81ms  1212.3 MB/sec
list_primitive_non_null/default                    1.00    275.0±7.27ms  1978.8 MB/sec    1.03    283.4±7.83ms  1920.6 MB/sec
list_primitive_non_null/parquet_2                  1.00    298.9±3.93ms  1820.7 MB/sec    1.02   306.3±10.15ms  1777.0 MB/sec
list_primitive_non_null/zstd                       1.00    698.9±8.09ms   778.7 MB/sec    1.00    697.9±9.19ms   779.9 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    675.1±3.80ms   806.2 MB/sec    1.00    675.0±4.27ms   806.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.8±0.33ms     3.1 GB/sec    1.01     11.8±0.07ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.01     23.5±0.24ms  1587.6 MB/sec    1.00     23.4±0.18ms  1597.7 MB/sec
list_primitive_sparse_99pct_null/default           1.02     11.5±0.15ms     3.2 GB/sec    1.00     11.3±0.13ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.2±0.22ms     3.3 GB/sec    1.00     11.2±0.25ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.03     13.3±0.13ms     2.7 GB/sec    1.00     12.9±0.26ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.04     11.7±0.13ms     3.1 GB/sec    1.00     11.2±0.23ms     3.3 GB/sec
primitive/bloom_filter                             1.00    152.0±1.60ms   295.2 MB/sec    1.00    152.8±2.07ms   293.8 MB/sec
primitive/cdc                                      1.02    161.5±1.31ms   277.9 MB/sec    1.00    158.9±1.80ms   282.5 MB/sec
primitive/default                                  1.00    118.9±1.67ms   377.3 MB/sec    1.00    119.4±0.76ms   376.0 MB/sec
primitive/parquet_2                                1.01    135.2±1.02ms   331.8 MB/sec    1.00    133.7±1.85ms   335.6 MB/sec
primitive/zstd                                     1.00    148.7±1.75ms   301.8 MB/sec    1.00    148.9±0.80ms   301.3 MB/sec
primitive/zstd_parquet_2                           1.00    168.4±0.85ms   266.4 MB/sec    1.00    167.7±1.16ms   267.5 MB/sec
primitive_all_null/bloom_filter                    1.00   884.3±23.62µs    49.6 GB/sec    1.06   936.8±11.16µs    46.8 GB/sec
primitive_all_null/cdc                             1.00     18.0±0.34ms     2.4 GB/sec    1.07     19.3±0.43ms     2.3 GB/sec
primitive_all_null/default                         1.00    263.9±0.92µs   166.0 GB/sec    1.04    274.7±0.90µs   159.5 GB/sec
primitive_all_null/parquet_2                       1.00    260.0±1.63µs   168.5 GB/sec    1.07    278.9±1.05µs   157.1 GB/sec
primitive_all_null/zstd                            1.00    374.3±0.67µs   117.1 GB/sec    1.04    387.7±0.94µs   113.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    336.0±1.32µs   130.4 GB/sec    1.06    356.2±1.14µs   123.0 GB/sec
primitive_non_null/bloom_filter                    1.00    107.6±2.39ms   408.7 MB/sec    1.02    109.6±1.83ms   401.5 MB/sec
primitive_non_null/cdc                             1.00     91.4±1.28ms   481.3 MB/sec    1.01     92.1±1.08ms   477.7 MB/sec
primitive_non_null/default                         1.00     68.3±1.66ms   644.6 MB/sec    1.01     69.2±0.81ms   636.2 MB/sec
primitive_non_null/parquet_2                       1.00     90.2±0.73ms   488.0 MB/sec    1.01     91.4±1.18ms   481.4 MB/sec
primitive_non_null/zstd                            1.08    106.8±1.14ms   412.1 MB/sec    1.00     99.0±1.66ms   444.5 MB/sec
primitive_non_null/zstd_parquet_2                  1.04    129.9±2.91ms   338.7 MB/sec    1.00    124.4±0.99ms   353.6 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     19.2±0.49ms     2.3 GB/sec    1.00     19.3±0.25ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     35.6±0.46ms  1258.9 MB/sec    1.02     36.2±0.41ms  1239.7 MB/sec
primitive_sparse_99pct_null/default                1.00     17.2±0.06ms     2.5 GB/sec    1.00     17.3±0.22ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.0±0.33ms     2.6 GB/sec    1.02     17.4±0.09ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.28ms     2.2 GB/sec    1.02     20.7±0.29ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     19.2±0.11ms     2.3 GB/sec    1.00     19.0±0.34ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     27.3±0.37ms   439.2 MB/sec
short_string_non_null/cdc                                                                 1.00     20.4±0.08ms   589.1 MB/sec
short_string_non_null/default                                                             1.00     16.0±0.22ms   751.1 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.5±0.12ms   470.3 MB/sec
short_string_non_null/zstd                                                                1.00     35.6±0.12ms   337.4 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.12ms   421.1 MB/sec
string/bloom_filter                                1.02   229.5±26.84ms     2.2 GB/sec    1.00   225.8±17.82ms     2.3 GB/sec
string/cdc                                         1.00    223.7±6.15ms     2.3 GB/sec    1.01   225.3±10.03ms     2.3 GB/sec
string/default                                     1.00   130.6±22.42ms     3.9 GB/sec    1.00    130.1±9.25ms     3.9 GB/sec
string/parquet_2                                   1.00    113.4±7.32ms     4.5 GB/sec    1.14    129.1±2.05ms     4.0 GB/sec
string/zstd                                        1.00    422.6±5.34ms  1240.5 MB/sec    1.07   450.2±20.31ms  1164.5 MB/sec
string/zstd_parquet_2                              1.00    405.6±7.28ms  1292.6 MB/sec    1.00    406.9±6.17ms  1288.3 MB/sec
string_and_binary_view/bloom_filter                1.00     66.6±1.31ms   484.4 MB/sec    1.04     69.3±1.44ms   465.3 MB/sec
string_and_binary_view/cdc                         1.00     58.9±0.77ms   547.3 MB/sec    1.06     62.3±1.50ms   518.0 MB/sec
string_and_binary_view/default                     1.00     48.3±0.65ms   667.3 MB/sec    1.07     51.7±1.34ms   623.8 MB/sec
string_and_binary_view/parquet_2                   1.00     60.3±1.24ms   534.9 MB/sec    1.03     62.2±1.13ms   518.3 MB/sec
string_and_binary_view/zstd                        1.00     84.8±1.19ms   380.1 MB/sec    1.05     88.9±1.23ms   362.8 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.4±0.67ms   439.6 MB/sec    1.04     76.2±1.62ms   423.1 MB/sec
string_dictionary/bloom_filter                     1.00     96.0±5.76ms     2.7 GB/sec    1.01     97.3±3.97ms     2.6 GB/sec
string_dictionary/cdc                              1.00     52.9±2.49ms     4.9 GB/sec    1.08     57.3±1.59ms     4.5 GB/sec
string_dictionary/default                          1.00     48.7±1.44ms     5.3 GB/sec    1.09     52.9±1.45ms     4.9 GB/sec
string_dictionary/parquet_2                        1.00     54.2±1.29ms     4.8 GB/sec    1.04     56.2±0.71ms     4.6 GB/sec
string_dictionary/zstd                             1.00    212.1±2.79ms  1245.1 MB/sec    1.00    213.1±3.66ms  1239.3 MB/sec
string_dictionary/zstd_parquet_2                   1.00    200.3±1.42ms  1318.8 MB/sec    1.00    200.1±0.96ms  1320.0 MB/sec
string_non_null/bloom_filter                       1.00   262.8±20.48ms  1993.6 MB/sec    1.03   272.0±14.10ms  1926.3 MB/sec
string_non_null/cdc                                1.00    270.5±5.10ms  1937.3 MB/sec    1.01    272.0±8.15ms  1926.3 MB/sec
string_non_null/default                            1.00   143.1±12.20ms     3.6 GB/sec    1.01   143.9±13.51ms     3.6 GB/sec
string_non_null/parquet_2                          1.00    132.6±4.41ms     3.9 GB/sec    1.05    139.6±7.76ms     3.7 GB/sec
string_non_null/zstd                               1.00    543.2±6.68ms   964.7 MB/sec    1.01    548.3±7.64ms   955.7 MB/sec
string_non_null/zstd_parquet_2                     1.00    507.0±3.05ms  1033.5 MB/sec    1.00    508.5±2.96ms  1030.5 MB/sec
struct_all_null/bloom_filter                       1.00    367.0±4.83µs    42.9 GB/sec    1.04    382.7±4.73µs    41.2 GB/sec
struct_all_null/cdc                                1.00      7.1±0.12ms     2.2 GB/sec    1.10      7.9±0.23ms  2047.9 MB/sec
struct_all_null/default                            1.00    113.1±0.45µs   139.2 GB/sec    1.06    119.5±0.37µs   131.8 GB/sec
struct_all_null/parquet_2                          1.00    112.0±0.71µs   140.6 GB/sec    1.08    120.8±0.43µs   130.3 GB/sec
struct_all_null/zstd                               1.00    160.7±0.70µs    98.0 GB/sec    1.04    166.3±0.93µs    94.7 GB/sec
struct_all_null/zstd_parquet_2                     1.00    145.0±0.62µs   108.6 GB/sec    1.06    153.5±0.62µs   102.6 GB/sec
struct_non_null/bloom_filter                       1.00     46.6±1.26ms   343.3 MB/sec    1.03     48.0±1.40ms   333.1 MB/sec
struct_non_null/cdc                                1.00     45.4±0.75ms   352.1 MB/sec    1.02     46.5±0.35ms   344.2 MB/sec
struct_non_null/default                            1.00     32.7±0.64ms   489.0 MB/sec    1.00     32.8±0.26ms   488.3 MB/sec
struct_non_null/parquet_2                          1.00     41.1±0.28ms   389.2 MB/sec    1.01     41.4±0.65ms   386.4 MB/sec
struct_non_null/zstd                               1.00     41.3±0.49ms   387.0 MB/sec    1.00     41.3±0.49ms   387.6 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.5±0.46ms   288.4 MB/sec    1.00     55.3±0.28ms   289.4 MB/sec
struct_sparse_99pct_null/bloom_filter              1.02      6.8±0.18ms     2.3 GB/sec    1.00      6.7±0.33ms     2.4 GB/sec
struct_sparse_99pct_null/cdc                       1.00     13.3±0.31ms  1213.5 MB/sec    1.04     13.8±0.27ms  1165.2 MB/sec
struct_sparse_99pct_null/default                   1.00      6.0±0.19ms     2.6 GB/sec    1.04      6.2±0.21ms     2.5 GB/sec
struct_sparse_99pct_null/parquet_2                 1.01      6.1±0.15ms     2.6 GB/sec    1.00      6.0±0.17ms     2.6 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.6±0.05ms     2.1 GB/sec    1.00      7.5±0.19ms     2.1 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.9±0.21ms     2.3 GB/sec    1.02      7.0±0.07ms     2.2 GB/sec

Resource Usage

base (merge-base)

Metric          Value
Wall time       1940.4s
Peak memory     6.6 GiB
Avg memory      6.4 GiB
CPU user        1905.0s
CPU sys         33.7s
Peak spill      0 B

branch

Metric          Value
Wall time       2080.5s
Peak memory     6.8 GiB
Avg memory      6.7 GiB
CPU user        2038.4s
CPU sys         38.9s
Peak spill      0 B


A 4-round alternating A/B comparison (main vs. the branch with `#[cold]`
vs. the branch without it, 8 benches) showed that `#[cold]` is not
load-bearing: removing it moved every bench by ≤1% with no consistent
direction, i.e. pure noise. The comment justifying `#[cold]` on
`byte_budget_sub_batch_size` was also wrong: it claimed the path "fires
only on columns whose values are individually larger than
data_page_size_limit / write_batch_size", when in fact it runs once per
chunk for *every* variable-width column once dictionary encoding is
abandoned (e.g. the small-string benchmarks). Dropped both `#[cold]`
hints and the benchmark-archaeology comments.
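The per-chunk rule that `byte_budget_sub_batch_size` applies can be sketched as follows. This is a minimal sketch under stated assumptions, not the parquet crate's code: the signature, the `chunk_bytes` argument (an estimate of the chunk's plain-encoded size), and the `dictionary_active` flag are hypothetical stand-ins for whatever the writer actually tracks. It follows the rule from the PR description: divide the page byte limit by the average bytes per value, clamp to at least one row, and skip the check while dictionary encoding is active. It also shows why the helper runs for every variable-width chunk: the computation is unconditional, and for small values it simply returns the full chunk length.

```rust
/// Hypothetical sketch of a byte-budget-aware sub-batch size. The real
/// `byte_budget_sub_batch_size` may take different arguments.
///
/// * `chunk_len`       - number of values in the chunk about to be written
/// * `chunk_bytes`     - assumed estimate of their plain-encoded byte size
/// * `page_byte_limit` - target data page size in bytes
/// * `dictionary_active` - assumed flag: true while dictionary encoding is on
fn byte_budget_sub_batch_size(
    chunk_len: usize,
    chunk_bytes: usize,
    page_byte_limit: usize,
    dictionary_active: bool,
) -> usize {
    // Dictionary-encoded pages hold small RLE indices, so a plain-size
    // estimate would shrink pages spuriously: keep the whole chunk.
    if dictionary_active || chunk_len == 0 {
        return chunk_len;
    }
    // Average bytes per value, rounded up; never zero (e.g. empty strings).
    let bytes_per_value = chunk_bytes.div_ceil(chunk_len).max(1);
    // Values that fit in one page, but at least one row and at most the chunk.
    (page_byte_limit / bytes_per_value).clamp(1, chunk_len)
}
```

With an assumed 1 MiB page limit and 1024-row chunks, 8-byte values keep the full 1024-row chunk (the existing fast path), 4 KiB values give 256-row sub-batches, and 5 MB values fall to one row per sub-batch.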

`#[inline(never)]` is kept on both slow-path helpers. The symbol table
confirms it is doing real work: without it, `byte_budget_sub_batch_size`
and `write_granular_chunk` are inlined wholesale into the hot
`write_batch_internal` loop (0-1 out-of-line copies without the
attribute vs. 7 with it). Keeping a ~40-line, rarely taken helper out of
the hot loop is standard slow-path outlining, and the A/B shows it costs
nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
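The slow-path outlining described in the commit message above can be illustrated with a toy writer. Everything here is hypothetical: `PageSink`, its fields, and the bodies of `write_batch` and `write_granular_chunk` are invented for illustration and only borrow names from the commit message. The point is the shape: the per-chunk hot loop stays small, and the branch that splits an oversized chunk into byte-bounded sub-batches lives in a separate function marked `#[inline(never)]`, which asks the compiler not to inline it into the loop.

```rust
/// Toy page writer (hypothetical; not the parquet crate's types). Values are
/// appended to `current`, which is cut into a page once it reaches the limit.
struct PageSink {
    pages: Vec<Vec<u8>>,
    current: Vec<u8>,
    page_byte_limit: usize,
}

impl PageSink {
    fn new(page_byte_limit: usize) -> Self {
        PageSink { pages: Vec::new(), current: Vec::new(), page_byte_limit }
    }

    /// Close the in-progress page if it holds any bytes.
    fn flush(&mut self) {
        if !self.current.is_empty() {
            self.pages.push(std::mem::take(&mut self.current));
        }
    }

    /// Append values, then check the limit. The check happens only after
    /// appending, so a page can overshoot by one call's worth of bytes.
    fn write_values(&mut self, values: &[&[u8]]) {
        for v in values {
            self.current.extend_from_slice(v);
        }
        if self.current.len() >= self.page_byte_limit {
            self.flush();
        }
    }

    /// Hot loop: one iteration per chunk. `batch_size` must be non-zero.
    fn write_batch(&mut self, values: &[&[u8]], batch_size: usize) {
        for chunk in values.chunks(batch_size) {
            let bytes: usize = chunk.iter().map(|v| v.len()).sum();
            if bytes <= self.page_byte_limit {
                // Fast path: the whole chunk fits within one page budget.
                self.write_values(chunk);
            } else {
                // Slow path: values too large to write the chunk in one go.
                self.write_granular_chunk(chunk);
            }
        }
        self.flush();
    }

    /// Slow-path helper kept out of line so its body does not bloat the hot
    /// loop. Splits the chunk into sub-batches of roughly one page each.
    #[inline(never)]
    fn write_granular_chunk(&mut self, chunk: &[&[u8]]) {
        // `chunks()` never yields an empty slice, so `chunk.len() > 0`.
        let total: usize = chunk.iter().map(|v| v.len()).sum();
        let bytes_per_value = (total / chunk.len()).max(1);
        let sub_batch = (self.page_byte_limit / bytes_per_value).clamp(1, chunk.len());
        for sub in chunk.chunks(sub_batch) {
            self.write_values(sub);
        }
    }
}
```

Written as a single 1024-row chunk, four 5 MB values would otherwise land in one 20 MB page; here each becomes its own page, while 1000 eight-byte values still take the fast path in a single call.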
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4462063096-150-nqvp4 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (97e8a45) to 2e8e0c7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.0±0.03ms    19.2 MB/sec    1.02     13.3±0.09ms    18.8 MB/sec
bool/cdc                                           1.00     15.9±0.04ms    15.7 MB/sec    1.02     16.2±0.12ms    15.4 MB/sec
bool/default                                       1.00     10.9±0.04ms    22.9 MB/sec    1.02     11.1±0.08ms    22.5 MB/sec
bool/parquet_2                                     1.00     14.7±0.03ms    17.0 MB/sec    1.01     14.9±0.09ms    16.8 MB/sec
bool/zstd                                          1.00     11.4±0.04ms    21.9 MB/sec    1.02     11.6±0.08ms    21.5 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.5 MB/sec    1.01     15.3±0.09ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.03ms    18.3 MB/sec    1.02      7.0±0.03ms    17.9 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.02      4.4±0.02ms    28.7 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.9 MB/sec    1.02      9.2±0.04ms    13.7 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    26.9 MB/sec    1.01      4.7±0.02ms    26.6 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.02ms    13.3 MB/sec    1.02      9.6±0.03ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.01     94.3±0.27ms   148.4 MB/sec    1.00     93.2±0.37ms   150.2 MB/sec
float_with_nans/cdc                                1.00     82.7±0.28ms   169.4 MB/sec    1.00     82.6±0.42ms   169.4 MB/sec
float_with_nans/default                            1.00     75.3±0.31ms   185.8 MB/sec    1.00     75.1±0.59ms   186.3 MB/sec
float_with_nans/parquet_2                          1.02     96.9±0.38ms   144.5 MB/sec    1.00     95.4±0.35ms   146.7 MB/sec
float_with_nans/zstd                               1.00    113.0±0.39ms   123.9 MB/sec    1.00    112.8±0.49ms   124.2 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.5±0.47ms   104.9 MB/sec    1.00    132.9±0.45ms   105.3 MB/sec
large_string_non_null/bloom_filter                                                        1.00     81.1±0.28ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    242.3±1.18ms  1056.4 MB/sec
large_string_non_null/default                                                             1.00     62.2±0.55ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     61.7±0.28ms     4.1 GB/sec
large_string_non_null/zstd                                                                1.00     61.8±0.73ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     61.9±0.28ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    335.9±1.28ms  1623.8 MB/sec    1.10    368.3±7.59ms  1480.9 MB/sec
list_primitive/cdc                                 1.00    363.5±2.32ms  1500.5 MB/sec    1.03    374.9±5.77ms  1454.6 MB/sec
list_primitive/default                             1.00    256.5±2.35ms     2.1 GB/sec    1.08    277.5±1.41ms  1965.1 MB/sec
list_primitive/parquet_2                           1.00    275.8±0.65ms  1977.1 MB/sec    1.06    293.3±1.40ms  1859.4 MB/sec
list_primitive/zstd                                1.00    505.5±4.20ms  1078.9 MB/sec    1.01    510.3±1.79ms  1068.8 MB/sec
list_primitive/zstd_parquet_2                      1.00    498.5±0.66ms  1093.9 MB/sec    1.00    498.6±0.48ms  1093.8 MB/sec
list_primitive_non_null/bloom_filter               1.00    423.6±4.93ms  1284.8 MB/sec    1.07   455.0±13.51ms  1196.2 MB/sec
list_primitive_non_null/cdc                        1.00    438.0±9.13ms  1242.4 MB/sec    1.01    440.9±6.79ms  1234.3 MB/sec
list_primitive_non_null/default                    1.00    291.0±3.93ms  1870.0 MB/sec    1.07   311.8±12.63ms  1745.2 MB/sec
list_primitive_non_null/parquet_2                  1.00    319.9±2.17ms  1701.4 MB/sec    1.08    346.6±3.29ms  1570.1 MB/sec
list_primitive_non_null/zstd                       1.00    713.7±9.28ms   762.6 MB/sec    1.01   719.0±19.80ms   757.0 MB/sec
list_primitive_non_null/zstd_parquet_2             1.02    685.3±0.92ms   794.2 MB/sec    1.00    670.7±0.81ms   811.4 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.8±0.09ms     3.1 GB/sec    1.00     11.8±0.05ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.2±0.07ms  1611.4 MB/sec    1.01     23.5±0.06ms  1591.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     11.3±0.10ms     3.2 GB/sec    1.02     11.5±0.04ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.4±0.05ms     3.2 GB/sec    1.00     11.4±0.05ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.3±0.07ms     2.8 GB/sec    1.00     13.3±0.04ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.6±0.08ms     3.2 GB/sec    1.00     11.6±0.06ms     3.1 GB/sec
primitive/bloom_filter                             1.01    152.8±0.44ms   293.6 MB/sec    1.00    151.3±0.79ms   296.5 MB/sec
primitive/cdc                                      1.00    160.3±0.51ms   280.0 MB/sec    1.01    161.6±0.78ms   277.7 MB/sec
primitive/default                                  1.00    119.3±0.27ms   376.2 MB/sec    1.00    119.8±0.45ms   374.6 MB/sec
primitive/parquet_2                                1.00    134.3±0.29ms   334.2 MB/sec    1.01    135.2±0.50ms   331.9 MB/sec
primitive/zstd                                     1.00    149.1±0.28ms   301.0 MB/sec    1.00    149.7±0.44ms   299.8 MB/sec
primitive/zstd_parquet_2                           1.00    168.0±0.64ms   267.1 MB/sec    1.00    168.6±0.53ms   266.2 MB/sec
primitive_all_null/bloom_filter                    1.00    878.7±3.48µs    49.9 GB/sec    1.03    907.5±5.57µs    48.3 GB/sec
primitive_all_null/cdc                             1.00     17.8±0.24ms     2.5 GB/sec    1.04     18.5±0.28ms     2.4 GB/sec
primitive_all_null/default                         1.00    258.8±0.66µs   169.3 GB/sec    1.06    274.5±2.12µs   159.6 GB/sec
primitive_all_null/parquet_2                       1.00    256.8±1.05µs   170.7 GB/sec    1.09    278.9±1.73µs   157.1 GB/sec
primitive_all_null/zstd                            1.00    371.8±0.47µs   117.9 GB/sec    1.04    388.1±1.86µs   112.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    333.6±1.05µs   131.4 GB/sec    1.07    357.1±1.71µs   122.7 GB/sec
primitive_non_null/bloom_filter                    1.01    111.3±0.39ms   395.3 MB/sec    1.00    109.8±0.33ms   400.6 MB/sec
primitive_non_null/cdc                             1.00     91.8±0.33ms   479.1 MB/sec    1.00     91.7±0.49ms   479.6 MB/sec
primitive_non_null/default                         1.00     69.2±0.34ms   636.3 MB/sec    1.00     69.1±0.25ms   636.5 MB/sec
primitive_non_null/parquet_2                       1.00     91.0±0.24ms   483.7 MB/sec    1.00     90.7±0.19ms   485.4 MB/sec
primitive_non_null/zstd                            1.07    106.9±1.04ms   411.7 MB/sec    1.00     99.9±0.34ms   440.7 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    130.7±2.68ms   336.6 MB/sec    1.00    124.3±0.20ms   354.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.7±0.18ms     2.3 GB/sec    1.03     19.3±0.11ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     35.5±0.21ms  1265.0 MB/sec    1.02     36.3±0.29ms  1237.5 MB/sec
primitive_sparse_99pct_null/default                1.00     17.1±0.09ms     2.6 GB/sec    1.01     17.3±0.03ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.9±0.08ms     2.6 GB/sec    1.02     17.3±0.04ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.3±0.07ms     2.2 GB/sec    1.02     20.6±0.04ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.0±0.09ms     2.3 GB/sec    1.01     19.2±0.03ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     27.0±0.08ms   444.7 MB/sec
short_string_non_null/cdc                                                                 1.00     20.2±0.08ms   594.7 MB/sec
short_string_non_null/default                                                             1.00     16.0±0.05ms   750.9 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.06ms   467.2 MB/sec
short_string_non_null/zstd                                                                1.00     35.5±0.11ms   338.4 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.4±0.06ms   422.6 MB/sec
string/bloom_filter                                1.01   222.2±19.41ms     2.3 GB/sec    1.00   220.1±15.51ms     2.3 GB/sec
string/cdc                                         1.02    225.3±4.59ms     2.3 GB/sec    1.00    221.8±9.16ms     2.3 GB/sec
string/default                                     1.02   126.6±19.36ms     4.0 GB/sec    1.00    124.5±7.73ms     4.1 GB/sec
string/parquet_2                                   1.00    112.7±6.60ms     4.5 GB/sec    1.13    126.9±1.67ms     4.0 GB/sec
string/zstd                                        1.00    420.8±2.11ms  1245.9 MB/sec    1.06   447.6±19.90ms  1171.2 MB/sec
string/zstd_parquet_2                              1.00    404.0±6.73ms  1297.6 MB/sec    1.00    405.4±5.28ms  1293.0 MB/sec
string_and_binary_view/bloom_filter                1.00     66.5±0.31ms   485.0 MB/sec    1.00     66.7±0.35ms   483.7 MB/sec
string_and_binary_view/cdc                         1.00     59.3±0.20ms   543.9 MB/sec    1.04     61.8±0.33ms   521.9 MB/sec
string_and_binary_view/default                     1.00     48.9±0.22ms   659.8 MB/sec    1.04     50.9±0.29ms   633.9 MB/sec
string_and_binary_view/parquet_2                   1.00     59.9±0.35ms   538.0 MB/sec    1.03     61.7±0.31ms   522.9 MB/sec
string_and_binary_view/zstd                        1.00     85.4±0.20ms   377.5 MB/sec    1.02     87.5±0.29ms   368.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.7±0.27ms   437.7 MB/sec    1.02     75.5±0.31ms   427.3 MB/sec
string_dictionary/bloom_filter                     1.01     95.4±0.94ms     2.7 GB/sec    1.00     94.3±0.65ms     2.7 GB/sec
string_dictionary/cdc                              1.00     51.7±0.29ms     5.0 GB/sec    1.07     55.3±0.75ms     4.7 GB/sec
string_dictionary/default                          1.00     48.8±1.60ms     5.3 GB/sec    1.05     51.4±0.64ms     5.0 GB/sec
string_dictionary/parquet_2                        1.00     55.0±0.28ms     4.7 GB/sec    1.00     54.9±0.24ms     4.7 GB/sec
string_dictionary/zstd                             1.00    209.7±2.02ms  1259.5 MB/sec    1.01    212.3±1.02ms  1244.1 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.8±0.29ms  1322.0 MB/sec    1.00    199.7±0.25ms  1322.5 MB/sec
string_non_null/bloom_filter                       1.02   263.5±12.50ms  1988.3 MB/sec    1.00   257.1±13.04ms  2037.9 MB/sec
string_non_null/cdc                                1.00   279.7±10.50ms  1873.6 MB/sec    1.00   279.6±12.81ms  1873.9 MB/sec
string_non_null/default                            1.00   144.8±12.98ms     3.5 GB/sec    1.01    146.0±9.40ms     3.5 GB/sec
string_non_null/parquet_2                          1.17    152.9±3.06ms     3.3 GB/sec    1.00    130.3±3.72ms     3.9 GB/sec
string_non_null/zstd                               1.06    570.9±5.62ms   917.9 MB/sec    1.00    539.6±3.16ms   971.1 MB/sec
string_non_null/zstd_parquet_2                     1.01    518.4±4.55ms  1010.8 MB/sec    1.00   512.6±10.97ms  1022.2 MB/sec
struct_all_null/bloom_filter                       1.00    366.6±1.33µs    43.0 GB/sec    1.04    380.0±4.19µs    41.4 GB/sec
struct_all_null/cdc                                1.00      7.1±0.07ms     2.2 GB/sec    1.04      7.4±0.12ms     2.1 GB/sec
struct_all_null/default                            1.00    113.4±0.21µs   138.9 GB/sec    1.06    119.7±0.97µs   131.6 GB/sec
struct_all_null/parquet_2                          1.00    111.8±0.45µs   140.8 GB/sec    1.08    120.8±0.94µs   130.4 GB/sec
struct_all_null/zstd                               1.00    158.7±0.94µs    99.2 GB/sec    1.06    168.0±0.87µs    93.7 GB/sec
struct_all_null/zstd_parquet_2                     1.00    144.3±0.43µs   109.1 GB/sec    1.08    156.3±0.92µs   100.8 GB/sec
struct_non_null/bloom_filter                       1.00     46.2±0.24ms   346.2 MB/sec    1.01     46.9±1.06ms   341.4 MB/sec
struct_non_null/cdc                                1.00     45.4±0.14ms   352.6 MB/sec    1.01     45.7±0.17ms   350.2 MB/sec
struct_non_null/default                            1.00     32.2±0.16ms   497.3 MB/sec    1.00     32.3±0.15ms   494.8 MB/sec
struct_non_null/parquet_2                          1.00     40.8±0.15ms   392.0 MB/sec    1.01     41.0±0.17ms   389.9 MB/sec
struct_non_null/zstd                               1.00     40.8±0.09ms   392.6 MB/sec    1.00     40.9±0.14ms   391.4 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.8±0.13ms   291.9 MB/sec    1.00     55.0±0.15ms   290.9 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      6.7±0.10ms     2.4 GB/sec    1.04      6.9±0.06ms     2.3 GB/sec
struct_sparse_99pct_null/cdc                       1.00     13.5±0.11ms  1195.6 MB/sec    1.02     13.7±0.10ms  1172.8 MB/sec
struct_sparse_99pct_null/default                   1.00      6.0±0.05ms     2.6 GB/sec    1.02      6.1±0.05ms     2.6 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.0±0.06ms     2.6 GB/sec    1.03      6.2±0.05ms     2.5 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.4±0.05ms     2.1 GB/sec    1.02      7.5±0.04ms     2.1 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.9±0.05ms     2.3 GB/sec    1.03      7.0±0.03ms     2.2 GB/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     1945.4s
Peak memory   6.6 GiB
Avg memory    6.4 GiB
CPU user      1891.6s
CPU sys       53.5s
Peak spill    0 B

branch

Metric        Value
Wall time     2095.5s
Peak memory   6.8 GiB
Avg memory    6.6 GiB
CPU user      2037.1s
CPU sys       58.3s
Peak spill    0 B

File an issue against this benchmark runner

@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4462912234-152-vrbkj 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (97e8a45) to 2e8e0c7 (merge-base)
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.2±0.08ms    19.0 MB/sec    1.00     13.1±0.06ms    19.1 MB/sec
bool/cdc                                           1.00     16.2±0.10ms    15.5 MB/sec    1.00     16.2±0.07ms    15.4 MB/sec
bool/default                                       1.01     11.0±0.06ms    22.6 MB/sec    1.00     10.9±0.05ms    22.8 MB/sec
bool/parquet_2                                     1.01     14.9±0.08ms    16.8 MB/sec    1.00     14.8±0.05ms    16.9 MB/sec
bool/zstd                                          1.01     11.6±0.07ms    21.6 MB/sec    1.00     11.5±0.06ms    21.8 MB/sec
bool/zstd_parquet_2                                1.01     15.3±0.08ms    16.4 MB/sec    1.00     15.2±0.06ms    16.5 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.04ms    17.8 MB/sec    1.01      7.1±0.04ms    17.5 MB/sec
bool_non_null/cdc                                  1.00      7.0±0.07ms    18.0 MB/sec    1.02      7.1±0.04ms    17.7 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.3 MB/sec    1.02      4.4±0.02ms    28.7 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.9 MB/sec    1.02      9.2±0.05ms    13.6 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.03ms    27.0 MB/sec    1.03      4.8±0.03ms    26.3 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.04ms    13.3 MB/sec    1.02      9.6±0.04ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.03     99.5±3.55ms   140.7 MB/sec    1.00     96.2±2.54ms   145.5 MB/sec
float_with_nans/cdc                                1.01     84.9±1.61ms   164.9 MB/sec    1.00     84.4±1.58ms   166.0 MB/sec
float_with_nans/default                            1.03     78.2±1.51ms   179.0 MB/sec    1.00     76.3±0.82ms   183.6 MB/sec
float_with_nans/parquet_2                          1.01     98.8±2.38ms   141.7 MB/sec    1.00     97.9±2.09ms   143.1 MB/sec
float_with_nans/zstd                               1.01    115.4±1.89ms   121.4 MB/sec    1.00    113.9±1.30ms   122.9 MB/sec
float_with_nans/zstd_parquet_2                     1.01    137.6±2.42ms   101.7 MB/sec    1.00    136.0±1.19ms   102.9 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.3±3.26ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    245.7±2.72ms  1041.8 MB/sec
large_string_non_null/default                                                             1.00     65.1±3.20ms     3.8 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.6±2.96ms     3.9 GB/sec
large_string_non_null/zstd                                                                1.00     66.6±1.72ms     3.8 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     65.2±1.71ms     3.8 GB/sec
list_primitive/bloom_filter                        1.01   360.9±14.46ms  1511.2 MB/sec    1.00   355.9±13.37ms  1532.3 MB/sec
list_primitive/cdc                                 1.00    370.0±4.53ms  1474.2 MB/sec    1.00    368.7±3.89ms  1479.0 MB/sec
list_primitive/default                             1.00    262.1±5.03ms     2.0 GB/sec    1.00    262.1±3.87ms     2.0 GB/sec
list_primitive/parquet_2                           1.01    281.0±2.48ms  1940.8 MB/sec    1.00    278.7±2.15ms  1956.8 MB/sec
list_primitive/zstd                                1.01    517.4±5.04ms  1054.0 MB/sec    1.00    512.7±4.14ms  1063.8 MB/sec
list_primitive/zstd_parquet_2                      1.00    504.1±3.10ms  1081.9 MB/sec    1.00    502.0±2.27ms  1086.5 MB/sec
list_primitive_non_null/bloom_filter               1.08   461.4±18.82ms  1179.5 MB/sec    1.00   428.1±14.10ms  1271.2 MB/sec
list_primitive_non_null/cdc                        1.00    443.5±6.74ms  1227.0 MB/sec    1.00   444.4±11.77ms  1224.6 MB/sec
list_primitive_non_null/default                    1.04    302.3±7.41ms  1800.2 MB/sec    1.00    291.1±6.44ms  1869.7 MB/sec
list_primitive_non_null/parquet_2                  1.07    329.4±4.81ms  1652.1 MB/sec    1.00    307.9±9.41ms  1767.8 MB/sec
list_primitive_non_null/zstd                       1.03   733.2±13.41ms   742.3 MB/sec    1.00    712.0±7.69ms   764.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.04    702.2±6.13ms   775.0 MB/sec    1.00    675.2±3.23ms   806.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.03     12.3±0.12ms     3.0 GB/sec    1.00     11.9±0.20ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.7±0.17ms  1574.9 MB/sec    1.00     23.7±0.11ms  1574.6 MB/sec
list_primitive_sparse_99pct_null/default           1.01     11.8±0.09ms     3.1 GB/sec    1.00     11.6±0.19ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.04     11.9±0.06ms     3.1 GB/sec    1.00     11.5±0.21ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.03     13.7±0.16ms     2.7 GB/sec    1.00     13.3±0.20ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.02     12.0±0.09ms     3.0 GB/sec    1.00     11.7±0.18ms     3.1 GB/sec
primitive/bloom_filter                             1.01    158.2±4.33ms   283.7 MB/sec    1.00    156.6±3.31ms   286.6 MB/sec
primitive/cdc                                      1.01    164.0±1.79ms   273.6 MB/sec    1.00    162.0±2.09ms   277.0 MB/sec
primitive/default                                  1.01    122.7±1.40ms   365.9 MB/sec    1.00    121.5±1.48ms   369.5 MB/sec
primitive/parquet_2                                1.00    136.3±1.72ms   329.2 MB/sec    1.00    135.7±1.73ms   330.6 MB/sec
primitive/zstd                                     1.01    152.3±1.34ms   294.6 MB/sec    1.00    150.8±1.85ms   297.7 MB/sec
primitive/zstd_parquet_2                           1.01    171.7±2.28ms   261.3 MB/sec    1.00    169.7±1.78ms   264.5 MB/sec
primitive_all_null/bloom_filter                    1.00   929.5±24.01µs    47.1 GB/sec    1.00   927.8±26.67µs    47.2 GB/sec
primitive_all_null/cdc                             1.00     17.8±0.24ms     2.5 GB/sec    1.04     18.5±0.31ms     2.4 GB/sec
primitive_all_null/default                         1.00    262.5±1.21µs   167.0 GB/sec    1.05    275.0±1.05µs   159.4 GB/sec
primitive_all_null/parquet_2                       1.00    260.2±1.76µs   168.4 GB/sec    1.08    281.8±1.76µs   155.5 GB/sec
primitive_all_null/zstd                            1.00    375.4±1.90µs   116.7 GB/sec    1.05    392.8±1.94µs   111.6 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    335.6±1.38µs   130.6 GB/sec    1.07    358.9±0.99µs   122.1 GB/sec
primitive_non_null/bloom_filter                    1.05    119.1±3.66ms   369.3 MB/sec    1.00    113.6±1.42ms   387.3 MB/sec
primitive_non_null/cdc                             1.01     94.7±2.01ms   464.5 MB/sec    1.00     94.1±0.85ms   467.6 MB/sec
primitive_non_null/default                         1.02     71.7±0.92ms   613.5 MB/sec    1.00     70.4±1.57ms   625.1 MB/sec
primitive_non_null/parquet_2                       1.01     92.6±1.78ms   475.0 MB/sec    1.00     92.1±1.71ms   477.9 MB/sec
primitive_non_null/zstd                            1.08    109.6±2.35ms   401.5 MB/sec    1.00    101.3±1.95ms   434.2 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    133.9±3.12ms   328.6 MB/sec    1.00    125.7±1.67ms   349.9 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     19.7±0.31ms     2.2 GB/sec    1.00     19.6±0.40ms     2.2 GB/sec
primitive_sparse_99pct_null/cdc                    1.01     36.6±0.40ms  1224.4 MB/sec    1.00     36.4±0.34ms  1232.4 MB/sec
primitive_sparse_99pct_null/default                1.00     17.4±0.13ms     2.5 GB/sec    1.00     17.5±0.08ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.5±0.20ms     2.5 GB/sec    1.00     17.4±0.21ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.8±0.22ms     2.1 GB/sec    1.00     20.7±0.22ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.02     19.5±0.23ms     2.2 GB/sec    1.00     19.2±0.23ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.6±0.57ms   419.8 MB/sec
short_string_non_null/cdc                                                                 1.00     20.6±0.19ms   583.7 MB/sec
short_string_non_null/default                                                             1.00     16.3±0.15ms   736.0 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.12ms   466.4 MB/sec
short_string_non_null/zstd                                                                1.00     35.7±0.17ms   335.9 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.6±0.09ms   419.8 MB/sec
string/bloom_filter                                1.10   251.2±24.68ms     2.0 GB/sec    1.00   228.0±11.12ms     2.2 GB/sec
string/cdc                                         1.03    229.0±6.26ms     2.2 GB/sec    1.00    222.9±3.10ms     2.3 GB/sec
string/default                                     1.00   129.3±18.07ms     4.0 GB/sec    1.08   139.5±18.20ms     3.7 GB/sec
string/parquet_2                                   1.00    115.7±6.41ms     4.4 GB/sec    1.01    117.2±6.37ms     4.4 GB/sec
string/zstd                                        1.00    430.3±4.32ms  1218.4 MB/sec    1.01    435.1±4.57ms  1204.9 MB/sec
string/zstd_parquet_2                              1.01    408.8±6.57ms  1282.4 MB/sec    1.00    404.0±4.27ms  1297.5 MB/sec
string_and_binary_view/bloom_filter                1.00     69.6±2.56ms   463.1 MB/sec    1.07     74.5±2.01ms   432.8 MB/sec
string_and_binary_view/cdc                         1.00     60.4±1.92ms   533.9 MB/sec    1.06     63.9±2.00ms   504.9 MB/sec
string_and_binary_view/default                     1.00     49.7±1.73ms   649.2 MB/sec    1.06     52.7±1.80ms   612.5 MB/sec
string_and_binary_view/parquet_2                   1.00     61.8±0.74ms   521.9 MB/sec    1.04     64.4±1.27ms   500.7 MB/sec
string_and_binary_view/zstd                        1.00     87.7±1.72ms   367.9 MB/sec    1.03     90.4±2.13ms   356.9 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     75.2±1.98ms   429.0 MB/sec    1.03     77.8±1.82ms   414.7 MB/sec
string_dictionary/bloom_filter                     1.00    105.6±4.49ms     2.4 GB/sec    1.01    107.1±6.42ms     2.4 GB/sec
string_dictionary/cdc                              1.00     58.1±3.00ms     4.4 GB/sec    1.01     58.8±2.69ms     4.4 GB/sec
string_dictionary/default                          1.00     51.5±3.29ms     5.0 GB/sec    1.06     54.5±2.39ms     4.7 GB/sec
string_dictionary/parquet_2                        1.01     57.3±1.42ms     4.5 GB/sec    1.00     56.9±1.67ms     4.5 GB/sec
string_dictionary/zstd                             1.00    217.1±3.84ms  1216.5 MB/sec    1.00    216.6±3.80ms  1219.7 MB/sec
string_dictionary/zstd_parquet_2                   1.00    203.5±1.88ms  1298.0 MB/sec    1.00    203.5±1.58ms  1298.1 MB/sec
string_non_null/bloom_filter                       1.06   304.3±22.07ms  1722.1 MB/sec    1.00   287.2±22.06ms  1824.4 MB/sec
string_non_null/cdc                                1.04    287.2±8.24ms  1824.2 MB/sec    1.00    275.0±6.68ms  1905.5 MB/sec
string_non_null/default                            1.01   154.7±13.00ms     3.3 GB/sec    1.00   153.4±12.86ms     3.3 GB/sec
string_non_null/parquet_2                          1.02    149.7±9.28ms     3.4 GB/sec    1.00    146.6±5.13ms     3.5 GB/sec
string_non_null/zstd                               1.07   586.0±12.00ms   894.2 MB/sec    1.00    549.4±7.08ms   953.8 MB/sec
string_non_null/zstd_parquet_2                     1.03    529.7±7.07ms   989.2 MB/sec    1.00    512.8±4.68ms  1021.9 MB/sec
struct_all_null/bloom_filter                       1.00    378.0±5.82µs    41.7 GB/sec    1.00    379.4±5.58µs    41.5 GB/sec
struct_all_null/cdc                                1.00      7.1±0.09ms     2.2 GB/sec    1.04      7.4±0.10ms     2.1 GB/sec
struct_all_null/default                            1.00    113.3±0.78µs   139.0 GB/sec    1.06    119.8±0.55µs   131.5 GB/sec
struct_all_null/parquet_2                          1.00    112.3±0.65µs   140.2 GB/sec    1.08    121.4±0.49µs   129.7 GB/sec
struct_all_null/zstd                               1.00    159.9±0.81µs    98.5 GB/sec    1.05    168.0±0.41µs    93.7 GB/sec
struct_all_null/zstd_parquet_2                     1.00    145.2±0.77µs   108.5 GB/sec    1.07    154.8±0.33µs   101.7 GB/sec
struct_non_null/bloom_filter                       1.02     49.5±1.75ms   323.5 MB/sec    1.00     48.3±1.28ms   331.5 MB/sec
struct_non_null/cdc                                1.01     46.8±0.92ms   341.7 MB/sec    1.00     46.2±0.67ms   346.7 MB/sec
struct_non_null/default                            1.04     33.7±0.85ms   474.2 MB/sec    1.00     32.5±0.55ms   492.2 MB/sec
struct_non_null/parquet_2                          1.02     42.0±0.76ms   381.3 MB/sec    1.00     41.3±0.56ms   387.8 MB/sec
struct_non_null/zstd                               1.03     42.6±0.44ms   375.4 MB/sec    1.00     41.6±0.36ms   385.0 MB/sec
struct_non_null/zstd_parquet_2                     1.03     57.1±0.71ms   280.3 MB/sec    1.00     55.5±0.54ms   288.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.06      7.1±0.15ms     2.2 GB/sec    1.00      6.7±0.25ms     2.4 GB/sec
struct_sparse_99pct_null/cdc                       1.01     13.9±0.20ms  1161.0 MB/sec    1.00     13.7±0.21ms  1173.3 MB/sec
struct_sparse_99pct_null/default                   1.03      6.3±0.09ms     2.5 GB/sec    1.00      6.1±0.17ms     2.6 GB/sec
struct_sparse_99pct_null/parquet_2                 1.04      6.4±0.05ms     2.5 GB/sec    1.00      6.1±0.18ms     2.6 GB/sec
struct_sparse_99pct_null/zstd                      1.01      7.7±0.04ms     2.0 GB/sec    1.00      7.7±0.06ms     2.1 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.03      7.2±0.08ms     2.2 GB/sec    1.00      7.0±0.15ms     2.3 GB/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     1965.4s
Peak memory   6.6 GiB
Avg memory    6.4 GiB
CPU user      1910.0s
CPU sys       51.3s
Peak spill    0 B

branch

Metric        Value
Wall time     2095.5s
Peak memory   6.8 GiB
Avg memory    6.7 GiB
CPU user      2056.4s
CPU sys       34.1s
Peak spill    0 B


@adriangb
Contributor Author

@etseidl after a lot of profiling and debugging, I've gotten this to work with no performance impact beyond noise. I recognize this is a non-trivial change, but it introduces no public APIs, so if it proves problematic in any way we can back it out. The benefit of this approach is that it automatically fixes page-size blowouts for everyone, with no code changes on their end and no need to guess column sizes.
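For readers skimming the thread, here is a minimal sketch of the byte-budget sizing rule from the PR description. It is not the code in this branch: `sub_batch_size` and `mini_batches` are hypothetical helpers, `dictionary_active` stands in for the writer's "dictionary encoding is on" check, and using the chunk's *average* estimated value size as `bytes_per_value` is an assumption.

```rust
use std::ops::Range;

/// Hypothetical helper (not the parquet crate's API): how many values to hand
/// to the encoder per mini-batch so one batch cannot overshoot the page limit.
///
/// `value_sizes`: estimated plain-encoded byte size of each value in the chunk.
/// `dictionary_active`: stand-in for "dictionary encoding is currently on".
fn sub_batch_size(value_sizes: &[usize], page_byte_limit: usize, dictionary_active: bool) -> usize {
    let chunk_len = value_sizes.len();
    // Dictionary-encoded pages store small RLE indices, so plain-encoded sizes
    // would overestimate; skip the check and keep the whole chunk.
    if dictionary_active || chunk_len == 0 {
        return chunk_len;
    }
    let total: usize = value_sizes.iter().sum();
    // Assumption: average bytes per value over the chunk, rounded up and
    // clamped to at least 1 so empty values cannot cause a division by zero.
    let bytes_per_value = total.div_ceil(chunk_len).max(1);
    // Number of values of that size that fit in one page; never fewer than one row.
    let fits = (page_byte_limit / bytes_per_value).max(1);
    // Small values: fits >= chunk_len, so the chunk is written in one batch as before.
    fits.min(chunk_len)
}

/// Split a chunk into index ranges, one range per mini-batch.
fn mini_batches(value_sizes: &[usize], page_byte_limit: usize, dictionary_active: bool) -> Vec<Range<usize>> {
    let len = value_sizes.len();
    let step = sub_batch_size(value_sizes, page_byte_limit, dictionary_active);
    if step == 0 {
        return Vec::new(); // empty chunk: nothing to write
    }
    (0..len)
        .step_by(step)
        .map(|start| start..(start + step).min(len))
        .collect()
}
```

With 8-byte values and a 1 MiB limit, a 1024-value chunk stays a single batch; with 5 MB values every batch shrinks to one row, the format's one-record-per-page floor. Averaging is a simplification: a chunk with one huge value among many small ones could still produce a page somewhat above the limit.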

If you want to de-risk this further, we could add a config option to disable this behavior.

Thanks for reviewing this; I hope we can make it work 😄


Labels

parquet Changes to the parquet crate


3 participants