Commit 3873b95

feat: add standalone shuffle benchmark binary for profiling
Add a `shuffle_bench` binary that benchmarks shuffle write and read performance independently from Spark, making it easy to profile with tools like `cargo flamegraph`, `perf`, or `instruments`. Supports reading Parquet files (e.g. TPC-H/TPC-DS) or generating synthetic data with configurable schema. Covers different scenarios including compression codecs, partition counts, partitioning schemes, and memory-constrained spilling.
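As a rough illustration of the kind of measurement a shuffle-write benchmark takes, the std-only sketch below generates synthetic rows, hash-partitions them into a configurable number of partitions, and times that step. It is not the code in `shuffle_bench.rs`: the `Row` type, the helper names `synthetic_rows` and `hash_partition`, and the use of `DefaultHasher` are all assumptions made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Instant;

/// Hypothetical row type for illustration: a key column and a payload column.
type Row = (u64, String);

/// Generate `n` deterministic synthetic rows (illustrative only).
fn synthetic_rows(n: usize) -> Vec<Row> {
    (0..n as u64).map(|i| (i, format!("value-{i}"))).collect()
}

/// Assign each row to one of `num_partitions` buckets by hashing its key.
fn hash_partition(rows: &[Row], num_partitions: usize) -> Vec<Vec<Row>> {
    let mut partitions = vec![Vec::new(); num_partitions];
    for row in rows {
        let mut hasher = DefaultHasher::new();
        row.0.hash(&mut hasher);
        let idx = (hasher.finish() % num_partitions as u64) as usize;
        partitions[idx].push(row.clone());
    }
    partitions
}

fn main() {
    let rows = synthetic_rows(100_000);

    // Time only the partitioning step, not the data generation.
    let start = Instant::now();
    let partitions = hash_partition(&rows, 16);
    let elapsed = start.elapsed();

    let total: usize = partitions.iter().map(Vec::len).sum();
    println!(
        "partitioned {total} rows into {} partitions in {elapsed:?}",
        partitions.len()
    );
}
```

Keeping the timed region to a single function call is what makes a standalone binary like this convenient to profile: a flamegraph then attributes nearly all samples to the step of interest rather than to setup.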
1 parent 1afa8ea commit 3873b95

3 files changed

Lines changed: 816 additions & 2 deletions

native/Cargo.lock

Lines changed: 86 additions & 2 deletions

native/core/Cargo.toml

Lines changed: 5 additions & 0 deletions
@@ -72,6 +72,7 @@ url = { workspace = true }
 aws-config = { workspace = true }
 aws-credential-types = { workspace = true }
 parking_lot = "0.12.5"
+clap = { version = "4", features = ["derive"] }
 datafusion-comet-objectstore-hdfs = { path = "../hdfs", optional = true, default-features = false, features = ["hdfs"] }
 reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-native-roots", "http2"] }
 object_store_opendal = {version = "0.55.0", optional = true}
@@ -113,6 +114,10 @@ name = "comet"
 # "rlib" is for benchmarking with criterion.
 crate-type = ["cdylib", "rlib"]
 
+[[bin]]
+name = "shuffle_bench"
+path = "src/bin/shuffle_bench.rs"
+
 [[bench]]
 name = "parquet_read"
 harness = false
