Stabilize benchmarks #8651
Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8651) and master.

✅ No regressions detected; check the details below.

Full Metrics Comparison: FakeDbCommand, HttpMessageHandler
Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Each result below shows the mean value of the run together with its p99 interval, which is based on the mean and StdDev of the test run.

Duration results

FakeDbCommand (.NET Framework 4.8), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 74 [70–79] | 73 [70–77] |
| Bailout | 77 [75–79] | 77 [75–79] |
| CallTarget+Inlining+NGEN | 1,105 [1043–1167] | 1,099 [1055–1142] |
FakeDbCommand (.NET Core 3.1), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 114 [108–120] | 114 [111–118] |
| Bailout | 114 [110–118] | 114 [112–116] |
| CallTarget+Inlining+NGEN | 789 [760–819] | 784 [753–816] |
FakeDbCommand (.NET 6), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 104 [97–110] | 102 [97–106] |
| Bailout | 104 [99–108] | 102 [98–106] |
| CallTarget+Inlining+NGEN | 950 [903–998] | 940 [906–974] |
FakeDbCommand (.NET 8), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 99 [96–103] | 101 [97–106] |
| Bailout | 103 [98–108] | 102 [97–106] |
| CallTarget+Inlining+NGEN | 819 [779–859] | 819 [783–856] |
HttpMessageHandler (.NET Framework 4.8), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 201 [196–205] | 200 [195–204] |
| Bailout | 205 [201–209] | 203 [198–208] |
| CallTarget+Inlining+NGEN | 1,202 [1154–1251] | 1,202 [1155–1249] |
HttpMessageHandler (.NET Core 3.1), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 290 [283–297] | 290 [282–298] |
| Bailout | 291 [286–297] | 291 [284–297] |
| CallTarget+Inlining+NGEN | 967 [943–991] | 962 [942–982] |
HttpMessageHandler (.NET 6), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 280 [274–287] | 281 [275–288] |
| Bailout | 280 [275–286] | 280 [274–285] |
| CallTarget+Inlining+NGEN | 1,159 [1119–1200] | 1,155 [1118–1193] |
HttpMessageHandler (.NET 8), execution time (ms):

| Scenario | This PR (8651): mean [p99 interval] | master: mean [p99 interval] |
|---|---|---|
| Baseline | 281 [274–287] | 277 [270–284] |
| Bailout | 280 [273–287] | 278 [270–285] |
| CallTarget+Inlining+NGEN | 1,041 [993–1089] | 1,041 [999–1082] |
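The duration results above give each run as a mean plus a p99 interval derived from the mean and StdDev. The report does not publish its exact formula, so the Python sketch below assumes a normal approximation, mean ± z·StdDev with z ≈ 2.576 for a two-sided 99% band. The function name `p99_interval` and the sample timings are hypothetical and only illustrate the idea.

```python
# Sketch of a p99 interval computed from a run's mean and standard deviation.
# ASSUMPTION: normal approximation, mean ± z * stdev with z chosen for a
# two-sided 99% band. The report does not state its exact formula.
import statistics

# z such that 99% of a normal distribution lies within mean ± z * stdev.
Z_99 = statistics.NormalDist().inv_cdf(0.995)  # about 2.576


def p99_interval(samples_ms: list[float]) -> tuple[float, float, float]:
    """Return (mean, low, high) in ms for at least two execution-time samples."""
    mean = statistics.fmean(samples_ms)
    stdev = statistics.stdev(samples_ms)  # sample standard deviation
    return mean, mean - Z_99 * stdev, mean + Z_99 * stdev


if __name__ == "__main__":
    # Hypothetical timings in ms, not taken from the report.
    runs = [72.0, 74.5, 73.1, 75.8, 74.0, 76.2, 73.4]
    mean, low, high = p99_interval(runs)
    # Same shape as the original chart entries: "mean (Nms) : low, high".
    print(f"mean ({mean:.0f}ms) : {low:.0f}, {high:.0f}")
```

A wider StdDev widens the band proportionally, which is why noisy runs show long bars even when their means match.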
Benchmarks

Benchmark execution time: 2026-05-18 10:52:19

Comparing candidate commit 27e750a in PR branch

Some scenarios are present only in baseline or only in candidate runs. If you didn't create or remove some scenarios in your branch, this may be a sign of crashed benchmarks 💥💥💥

Scenarios present only in baseline:

Found 5 performance improvements and 2 performance regressions! Performance is the same for 54 metrics; there are also 11 unstable metrics, 89 known flaky benchmarks, and 37 flaky benchmarks without significant changes.
Summary of changes
Partially revert the methodology changes from #8559 that introduced .NET 6-specific instability in the microbenchmarks:
- `--launchCount` from `5` back to `10`.
- `cpus_per_item` from `1` back to `2` on all four batches (`trace`, `trace-unstable`, `otel-instr-api`, `otel-api`).
- OTel CPU mask from `0xE00000` (3 CPUs, bits 21–23) to `0x3F000000` (6 CPUs, bits 24–29), so that `cpus_per_item: 2` still fits the 3 OTel items in a single parallel wave instead of serializing them.

The new OpenTelemetry batches added in #8559 are retained.
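The arithmetic behind these settings can be checked with a short Python sketch. It decodes each mask into CPU indices and counts how many sequential waves the 3 OTel items need. The scheduling model (each item occupies `cpus_per_item` CPUs, and items that do not fit wait for the next wave) is an assumption, as is the textbook rule that a confidence interval's half-width shrinks as 1/√n when the number of independent launches grows; neither is necessarily how the benchmark harness works internally. The helper names `mask_cpus`, `waves`, and `ci_width_ratio` are hypothetical.

```python
# Sketch: decode CPU affinity masks and count the parallel "waves" needed to
# run the OTel benchmark items. Mask values and cpus_per_item come from the
# PR description; the scheduling model is an ASSUMPTION: each item occupies
# cpus_per_item CPUs, and items that do not fit wait for the next wave.
import math


def mask_cpus(mask: int) -> list[int]:
    """Return the CPU indices (set bit positions) in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def waves(mask: int, cpus_per_item: int, items: int) -> int:
    """Number of sequential waves needed to run `items` items under `mask`."""
    slots = len(mask_cpus(mask)) // cpus_per_item  # items that can run at once
    if slots == 0:
        raise ValueError("mask has fewer CPUs than cpus_per_item")
    return math.ceil(items / slots)


def ci_width_ratio(old_n: int, new_n: int) -> float:
    """Relative CI half-width after changing the launch count.

    ASSUMPTION: launches are independent samples, so width scales as 1/sqrt(n).
    """
    return math.sqrt(old_n / new_n)


if __name__ == "__main__":
    OTEL_ITEMS = 3  # the 3 OTel items mentioned in the PR description
    for mask in (0xE00000, 0x3F000000):
        cpus = mask_cpus(mask)
        print(f"{mask:#x}: CPUs {cpus[0]}-{cpus[-1]} ({len(cpus)} total)")
        for per_item in (1, 2):
            n = waves(mask, per_item, OTEL_ITEMS)
            print(f"  cpus_per_item={per_item}: {n} wave(s)")
    print(f"launchCount 5 -> 10: CI half-width x{ci_width_ratio(5, 10):.3f}")
```

Under this model, the old 3-CPU mask runs all three items at once only with `cpus_per_item: 1`; with `cpus_per_item: 2` it needs 3 waves, while the 6-CPU mask keeps a single wave. Doubling the launch count narrows the interval by a factor of about 0.707 under the 1/√n assumption.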
Reason for change
Since #8559 merged, the microbenchmarks dashboard showed two distinct .NET 6-only regressions:
The root cause is the combination of `cpus_per_item: 1` and `launchCount: 5`.

Implementation details
The two levers target different symptoms:
- `cpus_per_item: 2` addresses the systematic shift by giving the runtime a second core for background work.
- `launchCount: 10` addresses the variance by tightening confidence intervals.

Test coverage
Other details
Related: #8559 (the PR being partially reverted).