feat: add shuffle benchmark variants with native write config support #3226
Closed
andygrove wants to merge 4 commits into
- Add a `get_spark_configs()` method to the base `Benchmark` class for benchmark-specific Spark configurations
- Define common Comet configs (enabled, logging) in Python for the jvm/native modes
- Add shuffle benchmark variants:
  - `shuffle-hash-native-write`: hash shuffle with Comet native Parquet writes
  - `shuffle-hash-spark-write`: hash shuffle with Spark Parquet writes
  - `shuffle-roundrobin-native-write`: round-robin shuffle with native writes
  - `shuffle-roundrobin-spark-write`: round-robin shuffle with Spark writes
- Add a `--print-configs` CLI option to output benchmark configs
- Refactor `run_all_benchmarks.sh` to use a helper function
- Exclude `benchmarks/pyspark/**` from CI test workflows

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
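A minimal sketch of the configuration pattern described above, assuming a simplified class layout: the base `Benchmark.get_spark_configs()` returns nothing extra and each shuffle variant overrides it. The `ShuffleBenchmark` class, the `all_configs()` merge helper, the `BENCHMARKS` registry, and every Spark/Comet config key are hypothetical stand-ins, not taken from the PR diff; in particular `spark.comet.native.write.enabled` is a placeholder, not Comet's real key.

```python
from typing import Dict

# Assumed common Comet settings per mode (hypothetical keys and values).
COMMON_COMET_CONFIGS: Dict[str, Dict[str, str]] = {
    "jvm": {
        "spark.comet.enabled": "true",
        "spark.comet.exec.shuffle.mode": "jvm",
    },
    "native": {
        "spark.comet.enabled": "true",
        "spark.comet.exec.shuffle.mode": "native",
    },
}

# Placeholder key for toggling Comet native Parquet writes (not Comet's real config name).
NATIVE_WRITE_KEY = "spark.comet.native.write.enabled"


class Benchmark:
    """Base class: a benchmark adds no extra Spark configs unless it overrides this."""

    name = "base"

    def get_spark_configs(self) -> Dict[str, str]:
        return {}

    def all_configs(self, mode: str) -> Dict[str, str]:
        # Hypothetical helper: common per-mode settings first,
        # then benchmark-specific settings override them.
        configs = dict(COMMON_COMET_CONFIGS.get(mode, {}))
        configs.update(self.get_spark_configs())
        return configs


class ShuffleBenchmark(Benchmark):
    def __init__(self, partitioning: str, native_write: bool) -> None:
        self.partitioning = partitioning  # "hash" or "roundrobin"
        self.native_write = native_write
        writer = "native" if native_write else "spark"
        self.name = f"shuffle-{partitioning}-{writer}-write"

    def get_spark_configs(self) -> Dict[str, str]:
        # The only per-variant difference: native Parquet writes on or off.
        return {NATIVE_WRITE_KEY: str(self.native_write).lower()}


# The four variants, keyed by benchmark name.
BENCHMARKS: Dict[str, Benchmark] = {
    b.name: b
    for b in (
        ShuffleBenchmark("hash", native_write=True),
        ShuffleBenchmark("hash", native_write=False),
        ShuffleBenchmark("roundrobin", native_write=True),
        ShuffleBenchmark("roundrobin", native_write=False),
    )
}
```

With this layout, adding another variant is one registry entry rather than another copy of the config flags in `run_all_benchmarks.sh`.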
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | main | #3226 | +/- |
|---|---:|---:|---:|
| Coverage | 56.12% | 60.02% | +3.89% |
| Complexity | 976 | 1429 | +453 |
| Files | 119 | 170 | +51 |
| Lines | 11743 | 15746 | +4003 |
| Branches | 2251 | 2602 | +351 |
| Hits | 6591 | 9451 | +2860 |
| Misses | 4012 | 4976 | +964 |
| Partials | 1140 | 1319 | +179 |
Rationale
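Run more benchmark variants, in particular to compare hash and round-robin shuffle with Comet native Parquet writes enabled versus disabled.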
What Changed?
- Add a `get_spark_configs()` method to the base `Benchmark` class for benchmark-specific Spark configurations
- Add shuffle benchmark variants:
  - `shuffle-hash-native-write`: hash shuffle with Comet native Parquet writes enabled
  - `shuffle-hash-spark-write`: hash shuffle with native writes disabled (uses the Spark writer)
  - `shuffle-roundrobin-native-write`: round-robin shuffle with native writes enabled
  - `shuffle-roundrobin-spark-write`: round-robin shuffle with native writes disabled
- Add a `--print-configs` CLI option to output benchmark-specific configs
- Refactor `run_all_benchmarks.sh` to use a helper function and remove duplicated configs
- Exclude `benchmarks/pyspark/**` from CI test workflows to avoid triggering tests for benchmark-only changes

Test plan
- `python run_benchmark.py --list-benchmarks` to verify the new benchmarks are registered
- `python run_benchmark.py --print-configs --benchmark shuffle-hash-native-write --mode native` to verify the config output
- `./run_all_benchmarks.sh` to verify the benchmarks execute correctly

🤖 Generated with Claude Code
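The `--print-configs` step in the test plan can be approximated with `argparse`. The flags `--list-benchmarks`, `--print-configs`, `--benchmark`, and `--mode` come from the test plan; the config tables, the key names (including the placeholder `spark.comet.native.write.enabled`), the `native` default for `--mode`, and the `--conf key=value` output format are assumptions for illustration, not the PR's actual code.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical benchmark-specific configs; the key name is a placeholder.
CONFIGS: Dict[str, Dict[str, str]] = {
    "shuffle-hash-native-write": {"spark.comet.native.write.enabled": "true"},
    "shuffle-hash-spark-write": {"spark.comet.native.write.enabled": "false"},
    "shuffle-roundrobin-native-write": {"spark.comet.native.write.enabled": "true"},
    "shuffle-roundrobin-spark-write": {"spark.comet.native.write.enabled": "false"},
}

# Assumed common Comet settings for the jvm/native modes.
MODE_CONFIGS: Dict[str, Dict[str, str]] = {
    "jvm": {"spark.comet.enabled": "true", "spark.comet.exec.shuffle.mode": "jvm"},
    "native": {"spark.comet.enabled": "true", "spark.comet.exec.shuffle.mode": "native"},
}


def format_configs(benchmark: str, mode: str) -> List[str]:
    """Return one spark-submit style '--conf key=value' line per setting, sorted by key."""
    merged = {**MODE_CONFIGS[mode], **CONFIGS[benchmark]}
    return [f"--conf {key}={value}" for key, value in sorted(merged.items())]


def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(description="PySpark benchmark runner (sketch)")
    parser.add_argument("--list-benchmarks", action="store_true")
    parser.add_argument("--print-configs", action="store_true")
    parser.add_argument("--benchmark", choices=sorted(CONFIGS))
    parser.add_argument("--mode", choices=sorted(MODE_CONFIGS), default="native")
    args = parser.parse_args(argv)

    if args.list_benchmarks:
        lines = sorted(CONFIGS)
    elif args.print_configs:
        if args.benchmark is None:
            parser.error("--print-configs requires --benchmark")
        lines = format_configs(args.benchmark, args.mode)
    else:
        lines = []  # actually running a benchmark is out of scope for this sketch
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    main()
```

Printing one `--conf` pair per line is a guess at a convenient format: a shell helper could capture this output and append it to a `spark-submit` command line. Whether the PR's helper in `run_all_benchmarks.sh` works this way is not shown here.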