Add Highway SIMD acceleration to ImageBufAlgo [add, sub, mul, div, mad, resample]#4994
Add Highway SIMD acceleration to ImageBufAlgo [add, sub, mul, div, mad, resample]#4994ssh4net wants to merge 2 commits intoAcademySoftwareFoundation:mainfrom
Conversation
|
I suspect you used LLM for some of this? Which is fine, but I think you should document in the PR description (commit comment) which tool you used and for what parts. |
| template<class Rtype, class Atype, class Btype> | ||
| static bool | ||
| add_impl_hwy(ImageBuf& R, const ImageBuf& A, const ImageBuf& B, ROI roi, | ||
| int nthreads) | ||
| { |
There was a problem hiding this comment.
I haven't done a line-by-line comparison, but it seems to me that the only difference between add_impl_hwy, sub_impl_hwy, and mul_impl_hwy is likely going to be
[](auto d, auto a, auto b) { return hn::Add(a, b); }
versus that one lambda changing for Sub and Mul.
I would love for even the initial commit to reduce this whole thing to a shared hwy_binary_perpixel_op() template that takes the lambda housing the op kernel as a templated parameter.
| // Process pixel by pixel (scalar fallback for strided channels) | ||
| for (int x = roi.xbegin; x < roi.xend; ++x) { | ||
| Rtype* r_ptr = ChannelPtr<Rtype>(Rv, x, y, roi.chbegin); | ||
| const Atype* a_ptr = ChannelPtr<Atype>(Av, x, y, | ||
| roi.chbegin); | ||
| const Btype* b_ptr = ChannelPtr<Btype>(Bv, x, y, | ||
| roi.chbegin); |
There was a problem hiding this comment.
I think we should benchmark the strided case and see how it compares to the contiguous case and the full scalar fallback that we've always had. If there is no big speed gain, I would be in favor of eliminating this whole clause and let non-contiguous strides use the old scalar path, then there is much less template expansion for hwy in the cases where there is not a large gain to be had. Note that this means that the "to hwy or not to hwy" test would need to test contiguity in addition to just localpixels().
|
@ssh4net It's been a while since this PR has been updated, but after your last push, it's failing to build. Can you please rebase on main, fix so it passes CI, and ensure that there is a DCO sign-off on each commit? I would like to proceed with this in some form. |
|
@lgritz sure! Give me a bit of time. I have added some fixes based on discussion above, but not verified fully yet, and switched to other projects 😅 |
|
@lgritz, please check. lateast code. It now should support SIMD path for RGB ROI in case of RGBA inputs. MAD-Specific Note test tool in the archive And I see CL errors in this repo when in my repo all of them run successfully. |
|
Small error in the maximum error 1/65536 instead of the 1/65535. This is really MAD. |
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library: • add/sub • mul/div • mad • resample Adds CMake and build system support, new implementation helpers, and developer documentation. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
|
@ssh4net I'm sorry, I think I didn't realize how big a change you made after my last comments. I actually came back to this today thinking that maybe I'd pitch in and do some of the refactoring that I suggested, but... you had already done it! I really like what you've done here, it's much simpler at the IBA call sites than last time I viewed your earlier drafts. There are just a handful of tests failing, and I'm looking into it. I will update you soon, but I wanted you to know that this is still a very active PR and I do expect us to merge this with only a few other minor changes. |
|
@lgritz sure, no problem. |
|
Would you mind if I squashed these, rebased on main, fix the conflicts, and force push the results so we have a clean single commit to build on? I know why the tests are failing. (At least some of them.) |
|
@lgritz sure, do what you think is needed. |
* Make cmake option OIIO_USE_HWY able to be overridden by end (by using `set_option()`) * Let env variable OPENIMAGEIO_ENABLE_HWY override runtime hwy enabling. (It was documented to do so, but not yet implemented.) * Clarify that hwy support will be added for 3.2, not 3.1. And that for right now, it defaults to OFF until we have worked out all the bugs and gain more confidence in it. The intention is for it to be enabled by default in time for the release of OIIO 3.2. * When hwy is disabled at build time, don't allow attribute() to appear to enable it. Signed-off-by: Larry Gritz <lg@larrygritz.com>
|
I squashed, rebased (fixing conflicts), and pushed, so now this is directly versus the current main. I made the following additions so far, amended to your work:
Note in particular that I've changed the default behavior to
I believe that makes it totally safe, even with any current flaws, because people need to take action to build it, and also to run it even if it was enabled at build time. The intent, of course, is to switch both of those defaults to "enable" by the time we ship 3.2, but we have some more work and fixes to do first. I think that with this, I would like to propose that we merge what we have here (assuming that CI passes), and then I will have some additional fixes coming shortly. But then with this merged, all the changes from here on out should be small and easy to review. |
|
Like I said, the default as I have it here now is that it's disabled (making bugs harmless until we fix them). I think that after merging, the remaining tasks are as follows:
At that point, I think we an return to looking at other IBA functions and trying to figure out which cases have benefits to hwy code paths and enhance them one by one. (With some attention to a priority that makes sense -- a combination of which look most promising for SIMD speedups and which are the most commonly used operations.) But I'm gating this all on your approving the small number of changes from me that I attached here; I don't want to merge them into your PR without you approving it. If we are both satisfied with what we have here for the moment, then I will merge and move on to the other items listed above. |
|
I will take another look at the code this week. |
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library: • add/sub
• mul/div
• mad
• resample
Adds CMake and build system support, new implementation helpers, and developer documentation.
Code mostly wrote using frontier Opus4.5 and Codex GPT5.2 High models with a strict rules.
Checklist:
behavior.
testsuite.
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
fixed any problems reported by the clang-format CI test.
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.