ci: add facility for benchmarking as part of CI #4745
Merged
Conversation
Make it so that CI test cases that set the GHA variable "benchmark" to 1 will add a benchmarking step to the workflow that runs a new ci-benchmark.bash script. (A sketch of what such a script might look like follows at the end of this description.)

The script runs selected unit tests containing benchmarks (currently, only image_span_test, but we can amend later as needed). Those designated tests are run, and their output is both echoed to the log for that step and also put in build/benchmarks/TESTNAME and saved as a build artifact for optional download.

Most test cases will not turn benchmarking on -- it will probably end up adding a few minutes, so do it very selectively (once per major platform or compiler version is plenty).

I would have previously guessed that any attempt at benchmarking on GHA runners was doomed, but in practice, I'm surprised to find that there's almost as much run-to-run consistency as I find doing casual benchmarks on my own machine. As such, I think this can be a handy way to do some rough benchmarking using CI, to compare platforms or compilers, or to verify that changes we want to make don't introduce performance regressions.

Caveats to remember in the future:

* Take it all with a big grain of salt, and watch the benchmark numbers for the trial-to-trial range of times -- wide variation means that the numbers probably can't be trusted.
* The GH runners themselves may change without warning, so beware of benchmark stability over time, or if they ever have pools of heterogeneous machine generations/configurations.
* While my results indicate a decent amount of timing reliability for purely computational tests, I assume that there will be enormous run-to-run variation in anything involving I/O or networking. So this is unlikely to be a fruitful way of testing for performance regressions in image format I/O speed (but it probably is useful for a variety of in-memory operations).
* As we add more unit tests to what we benchmark in the future, keep an eye on how much time we're spending running these benchmarks. A few minutes on a small subset of the test jobs is probably fine, but I wouldn't want it to make the overall wait for a full CI run substantially longer.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
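For reference, here is a minimal sketch of the kind of thing ci-benchmark.bash could do, based on the description above. The binary location (build/bin), the BUILD_DIR environment variable, and the exact output naming are illustrative assumptions, not the script's actual contents:

```bash
#!/usr/bin/env bash
# ci-benchmark.bash -- sketch, not the real script.
# Run the designated benchmark-bearing unit tests, echoing each test's
# output to the CI step log while also saving it to build/benchmarks/TESTNAME
# so the workflow can upload it as a build artifact.
set -eo pipefail

# Tests to benchmark; currently only image_span_test, amend as needed.
# (Unquoted on purpose below so the list word-splits.)
BENCHMARK_TESTS="image_span_test"

# Assumed build layout (illustrative): test binaries under ${BUILD_DIR}/bin.
BUILD_DIR="${BUILD_DIR:-build}"
BENCH_DIR="${BUILD_DIR}/benchmarks"
mkdir -p "${BENCH_DIR}"

for t in ${BENCHMARK_TESTS} ; do
    echo "=== Benchmarking ${t} ==="
    # tee both echoes to the step log and writes build/benchmarks/TESTNAME.
    "${BUILD_DIR}/bin/${t}" 2>&1 | tee "${BENCH_DIR}/${t}"
done
```

In the workflow itself, the new step would be gated to run only when the "benchmark" variable is set to 1, with a subsequent artifact-upload step grabbing build/benchmarks/ for optional download.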
Collaborator
Author
Any objections or comments before I merge this?
Collaborator
Author
Over a week in review, no objections, CI-only change ==> merge
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request on May 20, 2025
zachlewis pushed a commit to zachlewis/OpenImageIO that referenced this pull request on Aug 1, 2025