perf(sumcheck): replace par_bridge with par_chunks/par_iter (-7.1% big) #2729

Open
tamirhemo wants to merge 2 commits into tamir/perf-mimalloc from tamir/perf-remove-par-bridge



@tamirhemo
Contributor

Summary

  • Replace par_bridge() with par_chunks_mut() / into_par_iter() in three hot-path files
  • par_bridge uses a mutex internally to pull from sequential iterators, causing lock contention
  • The replacements use rayon's native work-stealing without a mutex
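The difference between the two scheduling strategies can be sketched with the standard library alone. This is an illustration, not rayon's implementation: `sum_squares_shared_iter`, `sum_squares_split_range`, and `square_in_place_chunked` are hypothetical names, and plain scoped threads stand in for rayon's work-stealing pool. The first function hands out items from one mutex-guarded iterator (the pattern `par_bridge` relies on); the other two split the work up front, which is roughly what `into_par_iter()` on a range and `par_chunks_mut()` on a slice allow.

```rust
use std::sync::Mutex;
use std::thread;

/// Strategy A: all workers pull items from ONE shared sequential iterator.
/// Every item costs a lock acquisition, so workers contend on the mutex.
fn sum_squares_shared_iter(n: u64, workers: usize) -> u64 {
    let source = Mutex::new(0..n);
    let source_ref = &source;
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers.max(1))
            .map(|_| {
                s.spawn(move || {
                    let mut local = 0u64;
                    loop {
                        // The guard is dropped at the end of this statement.
                        let next = source_ref.lock().unwrap().next();
                        match next {
                            Some(x) => local += x * x,
                            None => break,
                        }
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

/// Strategy B (range): split the index range into contiguous sub-ranges.
/// No shared state is touched while the workers run.
fn sum_squares_split_range(n: u64, workers: usize) -> u64 {
    let w = workers.max(1) as u64;
    let step = ((n + w - 1) / w).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = (0..w)
            .map(|i| {
                let start = (i * step).min(n);
                let end = (start + step).min(n);
                s.spawn(move || (start..end).map(|x| x * x).sum::<u64>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

/// Strategy B (slice): give each worker a disjoint mutable chunk.
fn square_in_place_chunked(data: &mut [u64], workers: usize) {
    let w = workers.max(1);
    let chunk_len = ((data.len() + w - 1) / w).max(1);
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = *x * *x;
                }
            });
        }
    });
}
```

Both sum functions return the same value for the same `n`; only the way work is distributed differs. With cheap per-item work, the per-item lock in strategy A is where the overhead would be expected to show up.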

Changed files

  • slop/crates/multilinear/src/restrict.rs: mle_fix_last_variable
  • slop/crates/jagged/src/poly.rs: eval and partial_jagged_little_polynomial_evaluation
  • crates/hypercube/src/prover/zerocheck/sum_as_poly.rs: zerocheck sum accumulation
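To show why chunked mutable access suits a routine such as mle_fix_last_variable, here is a minimal sketch of fixing one variable of a multilinear extension. It is not the code in restrict.rs. It assumes `f64` values instead of field elements, an evaluation layout in which the fixed variable is the lowest index bit (so each output comes from the adjacent pair at `2i` and `2i + 1`), and scoped std threads in place of rayon. The function name `fix_last_variable` is hypothetical.

```rust
use std::thread;

/// Fix one variable of a multilinear polynomial to `r`.
/// Each adjacent pair (f0, f1) becomes (1 - r) * f0 + r * f1.
/// Every output depends only on its own pair, so the output buffer
/// can be split into disjoint chunks with no shared mutable state.
fn fix_last_variable(evals: &[f64], r: f64, workers: usize) -> Vec<f64> {
    assert!(evals.len() % 2 == 0, "need an even number of evaluations");
    let half = evals.len() / 2;
    let mut out = vec![0.0; half];
    let w = workers.max(1);
    let chunk_len = ((half + w - 1) / w).max(1);
    thread::scope(|s| {
        // Output chunk k covers outputs [k*chunk_len, ...), which read
        // inputs [2*k*chunk_len, ...), so the two chunk streams line up.
        let pairs = out.chunks_mut(chunk_len).zip(evals.chunks(2 * chunk_len));
        for (out_chunk, in_chunk) in pairs {
            s.spawn(move || {
                for (o, pair) in out_chunk.iter_mut().zip(in_chunk.chunks_exact(2)) {
                    *o = (1.0 - r) * pair[0] + r * pair[1];
                }
            });
        }
    });
    out
}
```

Setting `r` to 0 or 1 returns the even-indexed or odd-indexed evaluations respectively, which is a quick sanity check for the assumed layout.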

Benchmark results (on top of E1 + E2)

| Workload | Before | After | Delta |
|----------|--------|-------|-------|
| fib | 26,504ms | 24,694ms | -6.8% |
| keccak | 34,845ms | 32,207ms | -7.6% |
| big | 37,899ms | 35,215ms | -7.1% |

Cumulative improvement (E1+E2+E8 vs original baseline)

| Workload | Original | Current | Total Delta |
|----------|----------|---------|-------------|
| fib | 35,380ms | 24,694ms | -30.2% |
| keccak | 42,847ms | 32,207ms | -24.8% |
| big | 47,697ms | 35,215ms | -26.2% |
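The deltas in both tables are relative changes in wall-clock time, (after - before) / before. A small sketch for recomputing them from the raw timings follows; the helper names `delta_percent` and `round1` are illustrative and not part of the benchmark harness.

```rust
/// Relative change from `before_ms` to `after_ms`, in percent.
/// Negative values mean the run got faster.
fn delta_percent(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

/// Round to one decimal place, the precision used in the tables.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For example, `round1(delta_percent(26_504.0, 24_694.0))` reproduces the fib entry for this PR, and `round1(delta_percent(35_380.0, 24_694.0))` reproduces the cumulative fib entry.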

Stack

  1. feat(bench): add CPU prover benchmark harness and baseline profiling #2726 — bench infra
  2. perf(prover): default rayon to available_parallelism #2727 — E1: physical cores (-19.8%)
  3. perf(bench): add optional mimalloc allocator for sp1-perf (-7.2% fib) #2728 — E2: mimalloc (-7.2% fib)
  4. This PR — E8: remove par_bridge (-7.1%)

Test plan

  • cargo test --release -p slop-multilinear — 0 tests (no test coverage for restrict.rs)
  • cargo test --release -p slop-jagged — 13 passed
  • cargo test --release -p sp1-hypercube — 14 passed
  • cargo test --release -p sp1-prover test_e2e_node — full e2e (long)

🤖 Generated with Claude Code

@tamirhemo tamirhemo force-pushed the tamir/perf-remove-par-bridge branch from 469b262 to 6cd6235 Compare April 20, 2026 19:23
@tamirhemo tamirhemo force-pushed the tamir/perf-mimalloc branch from 5c7edc7 to 86c3f39 Compare April 20, 2026 19:23
tamirhemo and others added 2 commits April 21, 2026 16:32
par_bridge uses a mutex internally to pull items from a sequential
iterator, causing lock contention under high parallelism. Replace with
par_chunks_mut (for indexed array access) and into_par_iter (for range
iteration), which use rayon's native work-stealing without a mutex.

Changed files:
- slop/crates/multilinear/src/restrict.rs — mle_fix_last_variable
- slop/crates/jagged/src/poly.rs — eval and partial_jagged_polynomial
- crates/hypercube/src/prover/zerocheck/sum_as_poly.rs — zerocheck sum

Benchmark results (on top of E1 physical-cores + E2 mimalloc):
  fib:    26,504ms → 24,694ms  (-6.8%)
  keccak: 34,845ms → 32,207ms  (-7.6%)
  big:    37,899ms → 35,215ms  (-7.1%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tamirhemo tamirhemo force-pushed the tamir/perf-remove-par-bridge branch from 6cd6235 to 6942ec5 Compare April 21, 2026 16:34
@tamirhemo tamirhemo force-pushed the tamir/perf-mimalloc branch from 86c3f39 to 07c5d15 Compare April 21, 2026 16:34