DynamicPPL’s current benchmarks are fairly noisy: ratios vary substantially across runs. The current setup also compares the same set of models on both main and a PR branch, which makes benchmarks hard to run locally.
It would be useful to improve the benchmark setup in two ways:
- Reduce run-to-run noise (i.e., run all experiments in a single CI job)
- Introduce an external baseline, such as Stan, and report benchmark ratios relative to Stan’s log density and gradient evaluations rather than relative to main.
Concrete suggestions
- Report the primal (log density evaluation) in absolute time
- Report the gradient evaluation as a ratio to the primal (gradient / primal)
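A minimal sketch of what reporting these two numbers could look like, using BenchmarkTools.jl against the LogDensityProblems.jl interface. The `Gauss` target is a toy stand-in for a compiled DynamicPPL model, and the choice of ForwardDiff as the AD backend is purely illustrative:

```julia
using BenchmarkTools
using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

# Toy log density standing in for a real model (illustrative only).
struct Gauss end
LogDensityProblems.logdensity(::Gauss, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::Gauss) = 10
LogDensityProblems.capabilities(::Type{Gauss}) =
    LogDensityProblems.LogDensityOrder{0}()

ldf = ADgradient(:ForwardDiff, Gauss())
x = randn(LogDensityProblems.dimension(Gauss()))

# Primal: reported in absolute time.
t_primal = @belapsed LogDensityProblems.logdensity($ldf, $x)
# Gradient: reported as a ratio to the primal.
t_grad = @belapsed LogDensityProblems.logdensity_and_gradient($ldf, $x)

println("primal: ", t_primal, " s;  gradient/primal: ", t_grad / t_primal)
```

Reporting the gradient as a ratio to the primal (rather than to main) gives a number that is stable across machines and reproducible locally without checking out two branches.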
Reference: chalk-lab/Mooncake.jl#1163 (comment)