ci: add facility for benchmarking as part of CI#4745

Merged
lgritz merged 1 commit into AcademySoftwareFoundation:main from lgritz:lg-cibench on May 19, 2025

Conversation

@lgritz lgritz commented May 10, 2025

Make it so that CI test cases that set the GHA variable "benchmark" to 1 add a benchmarking step to the workflow, which runs a new ci-benchmark.bash script.
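
The wiring is conceptually just a conditional invocation of the script. A rough sketch of the idea (the "benchmark" variable name is from this PR, but how it reaches the shell and where the script lives are assumptions for illustration):

```bash
# Hypothetical sketch of the gating; assumes the workflow exports the
# "benchmark" GHA variable into the step's environment, and that the
# script lives in src/build-scripts/ like the other ci-*.bash scripts.
if [[ "${benchmark:-0}" == "1" ]]; then
    bash src/build-scripts/ci-benchmark.bash
fi
```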

The script runs selected unit tests containing benchmarks (currently only image_span_test, but we can amend the list later as needed). Each designated test is run, and its output is both echoed to the log for that step and written to build/benchmarks/TESTNAME, which is saved as a build artifact for optional download.
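
To make that behavior concrete, here is a minimal sketch of the shape of the script (not the actual contents -- the test binary location and build layout are assumptions):

```bash
#!/usr/bin/env bash
# Minimal sketch of what ci-benchmark.bash does, per the description
# above; the binary location and exact layout are assumptions.
set -e

mkdir -p build/benchmarks

# The designated benchmark-containing unit tests -- currently just one,
# but more can be appended to this list later.
for test in image_span_test ; do
    echo "=== Benchmark: $test ==="
    # tee echoes the output to the step log while also capturing it to
    # a per-test file that the workflow uploads as a build artifact.
    ./build/bin/"$test" 2>&1 | tee "build/benchmarks/$test"
done
```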

Most test cases will not turn benchmarking on -- it will probably end up adding a few minutes, so do it very selectively (once per major platform or compiler version is plenty).

I would previously have guessed that any attempt at benchmarking on GHA runners was doomed, but in practice, I'm surprised to find almost as much run-to-run consistency as I get doing casual benchmarks on my own machine. As such, I think this can be a handy way to do some rough benchmarking using CI -- to compare platforms or compilers, or to verify that changes we want to make don't introduce performance regressions.

Caveats to remember in the future:

  • Take it all with a big grain of salt, and watch the benchmark numbers for the trial-to-trial range of times -- wide variation means that the numbers probably can't be trusted.
  • The GH runners themselves may change without warning, so be wary of benchmark stability over time, and of the possibility that they run pools of heterogeneous machine generations/configurations.
  • While my results indicate a decent amount of timing reliability for purely computational tests, I assume that there will be enormous run-to-run variation in anything involving I/O or networking. So this is unlikely to be a fruitful way of testing for performance regressions in image format I/O speed (but probably is useful for a variety of in-memory operations).
  • As we add more unit tests to what we benchmark in the future, keep an eye on how much time we're spending running these benchmarks. A few minutes on a small subset of the test jobs is probably fine, but I wouldn't want the overall wait for a full CI run to become substantially longer because of it.

lgritz commented May 15, 2025

Any objections or comments before I merge this?

@lgritz lgritz added the build / testing / port / CI label May 17, 2025

lgritz commented May 19, 2025

Over a week in review, no objections, CI-only change ==> merge

@lgritz lgritz merged commit cc794f2 into AcademySoftwareFoundation:main May 19, 2025
32 checks passed
@lgritz lgritz deleted the lg-cibench branch May 20, 2025 04:44
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request May 20, 2025
ci: add facility for benchmarking as part of CI (AcademySoftwareFoundation#4745)

zachlewis pushed a commit to zachlewis/OpenImageIO that referenced this pull request Aug 1, 2025
ci: add facility for benchmarking as part of CI (AcademySoftwareFoundation#4745)