
Commit 6bfe0ef

LuciferYang authored and dongjoon-hyun committed
[SPARK-56633][SQL][TESTS] Add comprehensive Parquet vectorized-reader benchmark coverage
### What changes were proposed in this pull request?

Add benchmark coverage for the Parquet vectorized-read decode surface, which currently has none, and extend the existing `VectorizedRleValuesReaderBenchmark` to cover its full public API:

- **`ParquetVectorUpdaterBenchmark`** (new): every `ParquetVectorUpdater` family obtained through `ParquetVectorUpdaterFactory.getUpdater`. Six groups: identity, type-converting, rebase, unsigned, decimal, FixedLenByteArray.
- **`VectorizedDeltaReaderBenchmark`** (new): all three delta decoders (`VectorizedDeltaBinaryPackedReader`, `VectorizedDeltaByteArrayReader`, `VectorizedDeltaLengthByteArrayReader`). Five groups covering bulk read/skip across value distributions and prefix-overlap shapes, plus single-value reads and byte/short/unsigned variants.
- **`VectorizedPlainValuesReaderBenchmark`** (new): every public read/skip method on `VectorizedPlainValuesReader`. Five groups: fixed-size bulk, conversion bulk (unsigned, with-rebase), variable-length, single-value, skip.
- **`VectorizedRleValuesReaderBenchmark`** (extended): adds three groups covering row-index-filtered reads (the with-filter code path), single-value reads, and skip paths.

### Why are the changes needed?

`ParquetVectorUpdater` and the delta/plain decoders sit on the hot path of every Parquet column read, yet have no in-repo benchmark coverage. Coverage is intentionally broad: every public read/skip method is included, even those that are already memcpy-optimal, so that the result files track a long-term performance baseline and future optimization work does not need to add benchmark coverage as a precursor.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #55558 from LuciferYang/parquet-benchmark-coverage.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
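The "prefix-overlap shapes" exercised by `VectorizedDeltaReaderBenchmark` presumably refer to how much each value shares with its predecessor under Parquet's DELTA_BYTE_ARRAY encoding, which stores each string as a prefix length (bytes reused from the previous value) plus a suffix. The Java sketch below shows only that reconstruction step. The class and method names are hypothetical, it is not Spark's `VectorizedDeltaByteArrayReader`, and it assumes the prefix-length and suffix streams have already been decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the last step of DELTA_BYTE_ARRAY decoding:
 * rebuild each value from an already-decoded (prefix length, suffix) pair.
 * Decoding the underlying DELTA_BINARY_PACKED prefix lengths and
 * DELTA_LENGTH_BYTE_ARRAY suffixes is omitted.
 */
final class DeltaByteArraySketch {
  static List<byte[]> decode(int[] prefixLengths, byte[][] suffixes) {
    if (prefixLengths.length != suffixes.length) {
      throw new IllegalArgumentException("prefix/suffix count mismatch");
    }
    List<byte[]> out = new ArrayList<>(prefixLengths.length);
    byte[] previous = new byte[0];
    for (int i = 0; i < prefixLengths.length; i++) {
      int prefixLen = prefixLengths[i];
      if (prefixLen < 0 || prefixLen > previous.length) {
        throw new IllegalArgumentException("invalid prefix length at index " + i);
      }
      byte[] suffix = suffixes[i];
      // value = first prefixLen bytes of the previous value, followed by this suffix
      byte[] value = new byte[prefixLen + suffix.length];
      System.arraycopy(previous, 0, value, 0, prefixLen);
      System.arraycopy(suffix, 0, value, prefixLen, suffix.length);
      out.add(value);
      previous = value;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] prefixLengths = {0, 5, 4};
    byte[][] suffixes = {
      "apple".getBytes(StandardCharsets.UTF_8),
      "sauce".getBytes(StandardCharsets.UTF_8),
      "y".getBytes(StandardCharsets.UTF_8)
    };
    for (byte[] v : decode(prefixLengths, suffixes)) {
      System.out.println(new String(v, StandardCharsets.UTF_8)); // apple, applesauce, apply
    }
  }
}
```

With high overlap the loop is dominated by copying the shared prefix and appending short suffixes; with zero overlap every value is carried entirely in its suffix, so a benchmark that varies the overlap would exercise both ends of this loop.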
1 parent e6ff38e commit 6bfe0ef

18 files changed

Lines changed: 2630 additions & 161 deletions
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
================================================================================================
Identity Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Identity Updaters:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
BooleanUpdater                                        0              0           0      16946.4           0.1       1.0X
ByteUpdater (INT32 -> Byte)                           0              0           0       3743.2           0.3       0.2X
ShortUpdater (INT32 -> Short)                         1              1           0       1676.4           0.6       0.1X
IntegerUpdater                                        0              0           0      10258.9           0.1       0.6X
LongUpdater                                           0              0           0       5140.3           0.2       0.3X
FloatUpdater                                          0              0           0      10259.8           0.1       0.6X
DoubleUpdater                                         0              0           0       5130.4           0.2       0.3X
BinaryUpdater                                        15             15           0         70.4          14.2       0.0X


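The columns in these result tables are related. Rate(M/s) appears to be millions of rows per second for the best run, Per Row(ns) works out to 1000 / Rate(M/s), and Relative compares each row's rate with the first row of its group. Best/Avg/Stdev are rounded to whole milliseconds, so a `0` there means sub-millisecond, not free. The sketch below re-derives two cells of the JDK 21 Identity table from its Rate column; the class name is hypothetical and the formulas are inferred from the published numbers, not taken from Spark's benchmark harness.

```java
import java.util.Locale;

/**
 * Hypothetical helper: re-derives the "Per Row(ns)" and "Relative" columns
 * from "Rate(M/s)". The formulas are inferred from the published numbers.
 */
final class BenchmarkColumns {
  // Rate is millions of rows per second, so ns per row = 1e9 / (rate * 1e6) = 1000 / rate.
  static double perRowNs(double rateMPerSec) {
    return 1000.0 / rateMPerSec;
  }

  // Speed of a row relative to the first (baseline) row of the same group.
  static double relative(double rateMPerSec, double baselineRateMPerSec) {
    return rateMPerSec / baselineRateMPerSec;
  }

  public static void main(String[] args) {
    double baseline = 16946.4; // BooleanUpdater, JDK 21 run
    System.out.printf(Locale.ROOT, "BinaryUpdater per row: %.1f ns%n", perRowNs(70.4)); // 14.2 ns
    System.out.printf(Locale.ROOT, "ByteUpdater relative: %.1fX%n", relative(3743.2, baseline)); // 0.2X
  }
}
```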
================================================================================================
Type-converting Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Type-converting Updaters:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
IntegerToLongUpdater                                     2              2           0        530.9           1.9       1.0X
IntegerToDoubleUpdater                                   2              2           0        531.3           1.9       1.0X
FloatToDoubleUpdater                                     2              2           0        489.7           2.0       0.9X
DateToTimestampNTZUpdater                               29             29           0         36.2          27.6       0.1X
DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.7           2.2       0.9X


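Each row in the Type-converting table maps a Parquet physical value to a different Spark type. A minimal sketch of the per-value mappings the row names imply, assuming DATE is days since 1970-01-01 and the NTZ timestamp is microseconds at midnight of that day. The class and method names are hypothetical; Spark's updaters work on whole batches written into column vectors, which the sketch omits.

```java
/**
 * Hypothetical per-value conversions matching the row labels of the
 * "Type-converting Updaters" table. Value mapping only, not Spark's code.
 */
final class TypeConversionSketch {
  static final long MICROS_PER_DAY = 86_400_000_000L;

  // IntegerToLongUpdater / IntegerToDoubleUpdater / FloatToDoubleUpdater: lossless widening.
  static long intToLong(int v) { return v; }
  static double intToDouble(int v) { return v; }
  static double floatToDouble(float v) { return v; }

  // DateToTimestampNTZUpdater: days since 1970-01-01 -> microseconds at midnight, no time zone.
  static long daysToMicros(int days) {
    return Math.multiplyExact((long) days, MICROS_PER_DAY); // throws on overflow
  }

  // DowncastLongUpdater: a Decimal(9,2) unscaled value fits in an int; this sketch checks anyway.
  static int downcastUnscaled(long unscaled) {
    return Math.toIntExact(unscaled);
  }

  public static void main(String[] args) {
    System.out.println(daysToMicros(19723)); // 1704067200000000 (2024-01-01T00:00:00)
    System.out.println(downcastUnscaled(12345L)); // 12345, i.e. 123.45 at scale 2
  }
}
```

Only DateToTimestampNTZUpdater stands out in the table (27.6 ns per row on JDK 21, against about 2 ns for the widening rows); the table itself does not show where that time goes.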
================================================================================================
Rebase Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Rebase Updaters:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
IntegerWithRebaseUpdater (DATE legacy)                        0              0           0       3644.6           0.3       1.0X
LongWithRebaseUpdater (TIMESTAMP_MICROS legacy)               0              0           0       2663.3           0.4       0.7X
LongAsMicrosUpdater (TIMESTAMP_MILLIS)                        2              3           0        420.0           2.4       0.1X


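The `LongAsMicrosUpdater (TIMESTAMP_MILLIS)` row converts Parquet millisecond timestamps into Spark's microsecond representation. A minimal sketch of that scaling, assuming an overflow should raise an error rather than wrap. Names are hypothetical, and the legacy calendar rebase measured by the other two rows is not modeled.

```java
/** Hypothetical sketch of TIMESTAMP_MILLIS -> microseconds scaling. */
final class MillisToMicrosSketch {
  static final long MICROS_PER_MILLIS = 1_000L;

  // Scale one value; Math.multiplyExact throws ArithmeticException instead of wrapping.
  static long millisToMicros(long millis) {
    return Math.multiplyExact(millis, MICROS_PER_MILLIS);
  }

  // Batch form: convert the first `count` values of src into dst.
  static void millisToMicros(long[] src, long[] dst, int count) {
    for (int i = 0; i < count; i++) {
      dst[i] = millisToMicros(src[i]);
    }
  }

  public static void main(String[] args) {
    long millis = 1_704_067_200_123L; // 2024-01-01T00:00:00.123Z
    System.out.println(millisToMicros(millis)); // 1704067200123000
    try {
      millisToMicros(Long.MAX_VALUE / 10);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected");
    }
  }
}
```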
================================================================================================
Unsigned Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Unsigned Updaters:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------
UnsignedIntegerUpdater (UINT32 -> Long)                    0              0           0       5974.2           0.2       1.0X
UnsignedLongUpdater (UINT64 -> Decimal(20,0))             17             18           0         60.3          16.6       0.0X


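The two Unsigned rows differ by roughly 100x in rate. A UINT32 fits in a Java `long` after masking, whereas a UINT64 above `Long.MAX_VALUE` needs 20 decimal digits and therefore an arbitrary-precision value, hence the Decimal(20,0) target. The sketch below shows one way to build both with the JDK alone; it is hypothetical and may not match how Spark's updaters construct the values.

```java
import java.math.BigInteger;

/** Hypothetical sketch of unsigned Parquet integer widening. */
final class UnsignedSketch {
  private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

  // UINT32 stored in a Parquet INT32: reinterpret the 32 bits as unsigned.
  static long uint32ToLong(int raw) {
    return Integer.toUnsignedLong(raw);
  }

  // UINT64 stored in a Parquet INT64: negative signed values map to raw + 2^64.
  static BigInteger uint64ToBigInteger(long raw) {
    BigInteger signed = BigInteger.valueOf(raw);
    return raw >= 0 ? signed : signed.add(TWO_POW_64);
  }

  // Unscaled bytes for a Decimal(20,0): big-endian two's complement.
  static byte[] uint64ToDecimalBytes(long raw) {
    return uint64ToBigInteger(raw).toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(uint32ToLong(-1)); // 4294967295
    System.out.println(uint64ToBigInteger(-1L)); // 18446744073709551615
    System.out.println(uint64ToDecimalBytes(-1L).length); // 9 (leading 0x00 sign byte)
  }
}
```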
================================================================================================
Decimal Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Decimal Updaters:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
IntegerToDecimalUpdater                               0              0           0      10257.8           0.1       1.0X
LongToDecimalUpdater                                  0              0           0       5133.7           0.2       0.5X
FixedLenByteArrayToDecimalUpdater                    21             21           0         50.2          19.9       0.0X


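In the Decimal table, INT32- and INT64-backed decimals already hold the unscaled integer, while a FIXED_LEN_BYTE_ARRAY-backed decimal stores it as big-endian two's-complement bytes (per the Parquet logical-type specification). A sketch of both decodings using `java.math`; the class name is hypothetical and this is not Spark's code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical sketch of decoding Parquet DECIMAL physical values. */
final class DecimalSketch {
  // INT32/INT64-backed DECIMAL: the physical value is already the unscaled integer.
  static BigDecimal fromUnscaledLong(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  // FIXED_LEN_BYTE_ARRAY-backed DECIMAL: big-endian two's-complement unscaled bytes.
  static BigDecimal fromFixedLenBytes(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  public static void main(String[] args) {
    System.out.println(fromUnscaledLong(12345L, 2)); // 123.45
    System.out.println(fromFixedLenBytes(new byte[] {0x00, 0x00, 0x30, 0x39}, 2)); // 123.45
    System.out.println(fromFixedLenBytes(
        new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xCF, (byte) 0xC7}, 2)); // -123.45
  }
}
```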
================================================================================================
FixedLenByteArray Updaters
================================================================================================

OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
FixedLenByteArray Updaters:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
FixedLenByteArrayUpdater (len=16 -> Binary)                         20             21           1         51.5          19.4       1.0X
FixedLenByteArrayAsIntUpdater (len=4 -> Decimal(9,2))                7              7           0        160.1           6.2       3.1X
FixedLenByteArrayAsLongUpdater (len=8 -> Decimal(18,4))              8              8           0        133.2           7.5       2.6X


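The `len=4 -> Decimal(9,2)` and `len=8 -> Decimal(18,4)` rows cover fixed-length decimals narrow enough to fit in an `int` or `long` (9 digits need 4 bytes, 18 digits need 8). Such values can be assembled with shifts instead of `BigInteger`; the helper below is a hypothetical sketch of that fold with sign extension, not Spark's implementation.

```java
/** Hypothetical sketch: fold short big-endian two's-complement bytes into int/long. */
final class FixedLenUnscaledSketch {
  static long toUnscaledLong(byte[] bytes) {
    if (bytes.length == 0 || bytes.length > 8) {
      throw new IllegalArgumentException("expected 1..8 bytes, got " + bytes.length);
    }
    long unscaled = 0L;
    for (byte b : bytes) {
      unscaled = (unscaled << 8) | (b & 0xFF); // append the next byte, unsigned
    }
    int bits = 8 * bytes.length;
    // Move the encoded sign bit to bit 63, then arithmetic-shift back to sign-extend.
    return (unscaled << (64 - bits)) >> (64 - bits);
  }

  static int toUnscaledInt(byte[] bytes) {
    if (bytes.length > 4) {
      throw new IllegalArgumentException("expected at most 4 bytes, got " + bytes.length);
    }
    return (int) toUnscaledLong(bytes);
  }

  public static void main(String[] args) {
    System.out.println(toUnscaledInt(new byte[] {0x00, 0x00, 0x30, 0x39})); // 12345
    System.out.println(toUnscaledInt(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE})); // -2
  }
}
```

Without the final shift pair, `FF FF FF FE` would come out as 4294967294; shifting the top encoded bit up to bit 63 and back with `>>` is what restores -2.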
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
================================================================================================
Identity Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Identity Updaters:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
BooleanUpdater                                        0              0           0      17177.7           0.1       1.0X
ByteUpdater (INT32 -> Byte)                           0              0           0       3680.4           0.3       0.2X
ShortUpdater (INT32 -> Short)                         1              1           0       1664.2           0.6       0.1X
IntegerUpdater                                        0              0           0      10311.6           0.1       0.6X
LongUpdater                                           0              0           0       5153.5           0.2       0.3X
FloatUpdater                                          0              0           0      10313.6           0.1       0.6X
DoubleUpdater                                         0              0           0       5157.8           0.2       0.3X
BinaryUpdater                                        16             16           0         67.6          14.8       0.0X


================================================================================================
Type-converting Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Type-converting Updaters:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
IntegerToLongUpdater                                     2              2           0        454.8           2.2       1.0X
IntegerToDoubleUpdater                                   2              2           0        454.5           2.2       1.0X
FloatToDoubleUpdater                                     2              2           0        483.4           2.1       1.1X
DateToTimestampNTZUpdater                               29             29           0         36.6          27.3       0.1X
DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.5           2.2       1.0X


================================================================================================
Rebase Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Rebase Updaters:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
IntegerWithRebaseUpdater (DATE legacy)                        0              0           0       3668.8           0.3       1.0X
LongWithRebaseUpdater (TIMESTAMP_MICROS legacy)               0              0           0       2671.2           0.4       0.7X
LongAsMicrosUpdater (TIMESTAMP_MILLIS)                        3              3           0        371.3           2.7       0.1X


================================================================================================
Unsigned Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Unsigned Updaters:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------
UnsignedIntegerUpdater (UINT32 -> Long)                    0              0           0       6344.0           0.2       1.0X
UnsignedLongUpdater (UINT64 -> Decimal(20,0))             18             18           0         59.3          16.9       0.0X


================================================================================================
Decimal Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Decimal Updaters:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
IntegerToDecimalUpdater                               0              0           0      10280.2           0.1       1.0X
LongToDecimalUpdater                                  0              0           0       5153.3           0.2       0.5X
FixedLenByteArrayToDecimalUpdater                    21             21           0         50.6          19.8       0.0X


================================================================================================
FixedLenByteArray Updaters
================================================================================================

OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
FixedLenByteArray Updaters:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
FixedLenByteArrayUpdater (len=16 -> Binary)                         21             21           1         50.5          19.8       1.0X
FixedLenByteArrayAsIntUpdater (len=4 -> Decimal(9,2))                7              7           0        152.6           6.6       3.0X
FixedLenByteArrayAsLongUpdater (len=8 -> Decimal(18,4))              8              8           0        127.7           7.8       2.5X


Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
================================================================================================
Identity Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Identity Updaters:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
BooleanUpdater                                        0              0           0      14625.7           0.1       1.0X
ByteUpdater (INT32 -> Byte)                           0              0           0       3672.0           0.3       0.3X
ShortUpdater (INT32 -> Short)                         1              1           0       2053.4           0.5       0.1X
IntegerUpdater                                        0              0           0      10284.1           0.1       0.7X
LongUpdater                                           0              0           0       5132.8           0.2       0.4X
FloatUpdater                                          0              0           0      10257.9           0.1       0.7X
DoubleUpdater                                         0              0           0       5097.0           0.2       0.3X
BinaryUpdater                                        15             15           1         70.3          14.2       0.0X


================================================================================================
Type-converting Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Type-converting Updaters:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
IntegerToLongUpdater                                     2              2           0        454.5           2.2       1.0X
IntegerToDoubleUpdater                                   2              2           0        478.3           2.1       1.1X
FloatToDoubleUpdater                                     2              2           0        480.2           2.1       1.1X
DateToTimestampNTZUpdater                               36             36           0         29.5          33.9       0.1X
DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.3           2.2       1.0X


================================================================================================
Rebase Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Rebase Updaters:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
IntegerWithRebaseUpdater (DATE legacy)                        0              0           0       2651.7           0.4       1.0X
LongWithRebaseUpdater (TIMESTAMP_MICROS legacy)               0              1           0       2101.9           0.5       0.8X
LongAsMicrosUpdater (TIMESTAMP_MILLIS)                        2              2           0        454.6           2.2       0.2X


================================================================================================
Unsigned Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Unsigned Updaters:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------
UnsignedIntegerUpdater (UINT32 -> Long)                    1              1           0       1093.3           0.9       1.0X
UnsignedLongUpdater (UINT64 -> Decimal(20,0))             18             18           0         59.1          16.9       0.1X


================================================================================================
Decimal Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
Decimal Updaters:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
IntegerToDecimalUpdater                               0              0           0      10263.1           0.1       1.0X
LongToDecimalUpdater                                  0              0           0       5133.0           0.2       0.5X
FixedLenByteArrayToDecimalUpdater                    21             21           0         51.0          19.6       0.0X


================================================================================================
FixedLenByteArray Updaters
================================================================================================

OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
AMD EPYC 7763 64-Core Processor
FixedLenByteArray Updaters:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
FixedLenByteArrayUpdater (len=16 -> Binary)                         19             19           0         54.8          18.3       1.0X
FixedLenByteArrayAsIntUpdater (len=4 -> Decimal(9,2))                7              7           0        160.2           6.2       2.9X
FixedLenByteArrayAsLongUpdater (len=8 -> Decimal(18,4))              9              9           0        123.3           8.1       2.3X


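The three result files (JDK 21, JDK 25, JDK 17) were produced on the same CPU model and kernel, so their Rate(M/s) columns can be compared row by row, with the caveat that each is a single CI run. A small hypothetical helper for that comparison, using rates copied from the tables above; for example, `UnsignedIntegerUpdater` runs about 5.5x faster on JDK 21 than on JDK 17 in these runs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for comparing the same row's Rate(M/s) across result files. */
final class CrossJdkRatioSketch {
  // > 1.0 means the candidate run processed more rows per second than the baseline run.
  static double speedup(double baselineRate, double candidateRate) {
    return candidateRate / baselineRate;
  }

  public static void main(String[] args) {
    // Rate(M/s) per row as {JDK 17, JDK 21, JDK 25}, copied from the result tables.
    Map<String, double[]> rates = new LinkedHashMap<>();
    rates.put("UnsignedIntegerUpdater", new double[] {1093.3, 5974.2, 6344.0});
    rates.put("DateToTimestampNTZUpdater", new double[] {29.5, 36.2, 36.6});
    rates.put("IntegerToLongUpdater", new double[] {454.5, 530.9, 454.8});
    for (Map.Entry<String, double[]> e : rates.entrySet()) {
      double[] r = e.getValue();
      System.out.printf(Locale.ROOT, "%-26s JDK21/JDK17 %.2fx  JDK25/JDK17 %.2fx%n",
          e.getKey(), speedup(r[0], r[1]), speedup(r[0], r[2]));
    }
  }
}
```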
