fix(ci): trim benchmark full grid to fit daily run under 6h timeout #5905

Draft
Ma77Ball wants to merge 2 commits into apache:main from Ma77Ball:fix/benchmark-daily-timeout

@Ma77Ball

Contributor

What changes were proposed in this PR?

  • Drop batchSize=10000 from the full-mode benchmark grid in ArrowFlightActorBench.scala, taking the daily sweep from 36 configs to 27 and removing the 9 heaviest configs (30-70 min each) that pushed the run past GitHub's 6h job ceiling.
  • Update the now-stale "36-config / ~50-60 min" comments to "27-config / ~40 min" in the bench source and benchmarks.yml.
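
The config counts above follow from a Cartesian product over the grid dimensions. A minimal Python sketch of that arithmetic is shown here; it is not the Scala code in ArrowFlightActorBench.scala. The batch sizes 10, 100, 1000 and 10000 come from this PR and its benchmark report. The schema-width and string-length lists are hypothetical placeholders (only width 10 and length 64 appear in the report), and the 3 x 3 split of the other two dimensions is an assumption chosen to match the 36 and 27 totals.

```python
from itertools import product

# 10, 100 and 1000 appear in the benchmark report; 10000 is the value this PR drops.
BATCH_SIZES_BEFORE = [10, 100, 1000, 10000]
BATCH_SIZES_AFTER = [b for b in BATCH_SIZES_BEFORE if b != 10000]

# Hypothetical placeholders: only schema width 10 and string length 64 are
# visible in the report. The remaining values are invented for illustration.
SCHEMA_WIDTHS = [10, 50, 100]
STRING_LENS = [64, 256, 1024]


def build_grid(batch_sizes):
    """Return every (batch_size, schema_width, string_len) combination."""
    return list(product(batch_sizes, SCHEMA_WIDTHS, STRING_LENS))


before = build_grid(BATCH_SIZES_BEFORE)
after = build_grid(BATCH_SIZES_AFTER)

# Number of configs before, after, and removed by dropping one batch size.
print(len(before), len(after), len(before) - len(after))  # prints: 36 27 9
```

Dropping one of four batch sizes removes a quarter of the grid, which is where the 9 removed configs come from under this assumed layout.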

Any related issues, documentation, discussions?

Closes: #5904

How was this PR tested?

  • Non-functional change (benchmark harness grid + CI comments); no shipped behavior and no unit test covers the bench grid contents.
  • CI timing verification: trigger the Benchmarks workflow via workflow_dispatch on this branch (the only non-schedule trigger that runs full mode) and confirm the Bench job finishes well under 6h (expected ~40-50 min including compile/setup), reaching the publish steps.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

@Ma77Ball

Contributor Author

/request-review @Yicong-Huang

@github-actions (bot) added the labels fix and ci (changes related to CI) on Jun 23, 2026
@github-actions

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Yicong-Huang
    You can notify them by mentioning @Yicong-Huang in a comment.

@codecov-commenter

codecov-commenter commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.10%. Comparing base (8803d08) to head (25183bb).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5905      +/-   ##
============================================
- Coverage     54.11%   54.10%   -0.02%     
+ Complexity     2819     2816       -3     
============================================
  Files          1103     1103              
  Lines         42650    42650              
  Branches       4588     4588              
============================================
- Hits          23079    23074       -5     
- Misses        18226    18230       +4     
- Partials       1345     1346       +1     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 70.44% <ø> (ø) | |
| agent-service | 34.36% <ø> (ø) | |
| amber | 55.61% <ø> (-0.04%) ⬇️ | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 57.35% <ø> (ø) | |
| file-service | 58.59% <ø> (ø) | |
| frontend | 48.12% <ø> (ø) | |
| pyamber | 90.20% <ø> (ø) | |
| python | 90.76% <ø> (ø) | Carriedforward from f4efde6 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.

@github-actions

github-actions Bot commented Jun 23, 2026

Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 3 worse · ⚪ 10 noise (<±5%) · 0 without baseline

Compared against main 8803d08 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.
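
The better / worse / noise labels can be reproduced with a relative-delta check. The sketch below is an illustration, not the reporting bot's actual code: the function names are hypothetical, and it assumes that higher is better for throughput and MB/s while lower is better for latency percentiles. The 5% threshold is the one stated above.

```python
NOISE_PCT = 5.0  # threshold stated in the benchmark comment


def delta_pct(pr_value, base_value):
    """Relative change of the PR value against the baseline, in percent."""
    return (pr_value - base_value) / base_value * 100.0


def classify(metric, pr_value, base_value):
    """Label a metric change as 'noise', 'better' or 'worse'."""
    d = delta_pct(pr_value, base_value)
    if abs(d) < NOISE_PCT:
        return "noise"
    # Assumption: throughput-style metrics improve when they go up,
    # latency percentiles improve when they go down.
    higher_is_better = metric in ("throughput", "MB/s")
    improved = d > 0 if higher_is_better else d < 0
    return "better" if improved else "worse"


# Values for bs=10 sw=10 sl=64, PR vs latest main, taken from this report.
print(round(delta_pct(22231, 19254), 1), classify("p50", 22231, 19254))
print(round(delta_pct(26678, 30900), 1), classify("p95", 26678, 30900))
print(round(delta_pct(459, 473), 1), classify("throughput", 459, 473))
```

With the numbers from this run, p50 comes out as about +15.5% (worse), p95 as about -13.7% (better), and throughput as about -3.0% (noise), which matches the report.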

Dashboard · Run

| config | throughput | MB/s | latency | max Δ latest / 7d |
|---|---|---|---|---|
| 🔴 bs=10 sw=10 sl=64 | 459 | 0.28 | 22,231/26,678/26,678 us | 🔴 +15.5% / 🟢 -23.7% |
| 🔴 bs=100 sw=10 sl=64 | 954 | 0.583 | 103,243/137,028/137,028 us | 🔴 +11.5% / 🟢 -8.0% |
| bs=1000 sw=10 sl=64 | 1,098 | 0.67 | 911,702/983,266/983,266 us | ⚪ within ±5% / 🟢 -6.3% |

Baseline details

Latest main 8803d08 from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 459 tuples/sec | 473 tuples/sec | 410.82 tuples/sec | -3.0% | +11.7% |
| bs=10 sw=10 sl=64 | MB/s | 0.28 MB/s | 0.289 MB/s | 0.251 MB/s | -3.1% | +11.7% |
| bs=10 sw=10 sl=64 | p50 | 22,231 us | 19,254 us | 23,785 us | +15.5% | -6.5% |
| bs=10 sw=10 sl=64 | p95 | 26,678 us | 30,900 us | 34,980 us | -13.7% | -23.7% |
| bs=10 sw=10 sl=64 | p99 | 26,678 us | 30,900 us | 34,980 us | -13.7% | -23.7% |
| bs=100 sw=10 sl=64 | throughput | 954 tuples/sec | 970 tuples/sec | 891.94 tuples/sec | -1.6% | +7.0% |
| bs=100 sw=10 sl=64 | MB/s | 0.583 MB/s | 0.592 MB/s | 0.544 MB/s | -1.5% | +7.1% |
| bs=100 sw=10 sl=64 | p50 | 103,243 us | 104,302 us | 112,277 us | -1.0% | -8.0% |
| bs=100 sw=10 sl=64 | p95 | 137,028 us | 122,844 us | 139,802 us | +11.5% | -2.0% |
| bs=100 sw=10 sl=64 | p99 | 137,028 us | 122,844 us | 139,802 us | +11.5% | -2.0% |
| bs=1000 sw=10 sl=64 | throughput | 1,098 tuples/sec | 1,126 tuples/sec | 1,041 tuples/sec | -2.5% | +5.5% |
| bs=1000 sw=10 sl=64 | MB/s | 0.67 MB/s | 0.687 MB/s | 0.635 MB/s | -2.5% | +5.4% |
| bs=1000 sw=10 sl=64 | p50 | 911,702 us | 887,392 us | 972,714 us | +2.7% | -6.3% |
| bs=1000 sw=10 sl=64 | p95 | 983,266 us | 942,143 us | 1,023,057 us | +4.4% | -3.9% |
| bs=1000 sw=10 sl=64 | p99 | 983,266 us | 942,143 us | 1,023,057 us | +4.4% | -3.9% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,436.16,200,128000,459,0.280,22231.41,26677.54,26677.54
1,100,10,64,20,2095.47,2000,1280000,954,0.583,103242.50,137027.52,137027.52
2,1000,10,64,20,18210.89,20000,12800000,1098,0.670,911701.56,983265.63,983265.63
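
The derived columns in the raw CSV can be cross-checked from the totals. The sketch below parses the three rows above and recomputes throughput from `total_tuples`, `total_bytes` and `total_ms`. The reported `mb_per_sec` values are consistent with dividing by 2^20 bytes; that unit is inferred from the numbers, not confirmed from the harness source. The names `MIB` and `derived` are illustrative.

```python
import csv
import io

RAW_CSV = """config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,436.16,200,128000,459,0.280,22231.41,26677.54,26677.54
1,100,10,64,20,2095.47,2000,1280000,954,0.583,103242.50,137027.52,137027.52
2,1000,10,64,20,18210.89,20000,12800000,1098,0.670,911701.56,983265.63,983265.63
"""

MIB = 1024 * 1024  # assumption: "MB" in the report means 2**20 bytes

derived = []
for row in csv.DictReader(io.StringIO(RAW_CSV)):
    seconds = float(row["total_ms"]) / 1000.0
    tuples_per_sec = int(row["total_tuples"]) / seconds
    mib_per_sec = int(row["total_bytes"]) / seconds / MIB
    # Payload size implied by the config: tuples * columns * bytes per string.
    expected_bytes = (
        int(row["total_tuples"]) * int(row["schema_width"]) * int(row["string_len"])
    )
    derived.append((round(tuples_per_sec), round(mib_per_sec, 3), expected_bytes))

for idx, (tps, mbps, nbytes) in enumerate(derived):
    print(idx, tps, mbps, nbytes)
```

Each row uses only 20 batches, so p95 and p99 land on the same sample, which is why those two columns are identical in the report.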

Labels

ci changes related to CI fix

Development

Successfully merging this pull request may close these issues.

Daily benchmark run times out at 6h, dashboard stops updating

2 participants