OpenEXRCore Deep pixel unpacking optimisation #2049
Merged
cary-ilm merged 4 commits on Jun 13, 2025
Break each case of the UNPACK_SAMPLES macro into separate, smaller macros that can be reused, to avoid code duplication.
Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
The Deep pixel unpacking functions had two branch checks (switch statements), one on the pixel data type and one on the requested data type, inside the innermost loop over the pixels of each line. These checks are unnecessary there, because the data types do not change during the loop. The compiler is nevertheless unable to optimise them away, and the generated assembly for the pixel loop is massive because it contains all of these branches. This commit fixes the issue by moving the two switch statements outside the pixel loop, which lets the compiler generate much more efficient assembly for the pixel unpacking operations.
Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
5f1056b to d513716
Contributor
@nikos-foundry - this looks largely fine - but did you experiment with splitting out entirely to separate C functions instead of the various switch statements? Might need a different inverted type of macro / boilerplate to make that easier, but wonder if the compiler does a bit better without the other switch statements in each functional unit...
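One possible reading of the "separate C functions" idea is sketched below: a macro stamps out one small function per (source type, destination type) pair, and the caller picks a function pointer once per line. This is only an illustration under stated assumptions. The enums, `DEFINE_UNPACK`, and `select_unpack` are hypothetical names, not OpenEXRCore identifiers, and the conversions are plain casts rather than the library's real sample conversions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for illustration only (not OpenEXRCore enums). */
typedef enum { IN_U16 = 0, IN_U32 = 1, IN_COUNT } in_type_t;
typedef enum { OUT_U32 = 0, OUT_F32 = 1, OUT_COUNT } out_type_t;

typedef void (*unpack_fn)(const void *src, void *dst, size_t n);

/* Stamp out one function per type pair; each body is a single
 * switch-free loop. */
#define DEFINE_UNPACK(NAME, SRC_T, DST_T)                        \
    static void NAME(const void *src, void *dst, size_t n)       \
    {                                                            \
        const SRC_T *s = (const SRC_T *)src;                     \
        DST_T *d = (DST_T *)dst;                                 \
        for (size_t i = 0; i < n; ++i)                           \
            d[i] = (DST_T)s[i];                                  \
    }

DEFINE_UNPACK(unpack_u16_to_u32, uint16_t, uint32_t)
DEFINE_UNPACK(unpack_u16_to_f32, uint16_t, float)
DEFINE_UNPACK(unpack_u32_to_u32, uint32_t, uint32_t)
DEFINE_UNPACK(unpack_u32_to_f32, uint32_t, float)

/* Choose the unpack routine once, outside any per-pixel loop.
 * Returns NULL for an unknown type combination. */
static unpack_fn select_unpack(in_type_t st, out_type_t dt)
{
    static const unpack_fn table[IN_COUNT][OUT_COUNT] = {
        [IN_U16] = { [OUT_U32] = unpack_u16_to_u32,
                     [OUT_F32] = unpack_u16_to_f32 },
        [IN_U32] = { [OUT_U32] = unpack_u32_to_u32,
                     [OUT_F32] = unpack_u32_to_f32 },
    };
    if ((unsigned)st >= IN_COUNT || (unsigned)dt >= OUT_COUNT)
        return NULL;
    return table[st][dt];
}
```

A caller would invoke `select_unpack` once per channel or scanline and then call the returned function on the whole run of samples. Whether this beats hoisted switch statements depends on the compiler and would need to be measured.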
cary-ilm (Member) approved these changes on Jun 13, 2025 and left a comment:
We discussed this in the OpenEXR TSC meeting yesterday, good to go as is. Thanks!
cary-ilm added a commit that referenced this pull request on Jul 23, 2025:
* Refactor UNPACK_SAMPLES macro to allow code reuse
  Breaking each case of the UNPACK_SAMPLES macro into separate smaller macros that can be reused to avoid code duplication.
  Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
* Refactor some common code out in a macro
  Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
* Optimise EXRCore Deep pixel unpacking
  The Deep pixel unpacking functions had two branch checks (switch statements), one on the pixel data type and one on the requested data type, inside the innermost loop over the pixels of each line. These checks are unnecessary there, because the data types do not change during the loop. The compiler is nevertheless unable to optimise them away, and the generated assembly for the pixel loop is massive because it contains all of these branches. This commit fixes the issue by moving the two switch statements outside the pixel loop, which lets the compiler generate much more efficient assembly for the pixel unpacking operations.
  Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
---------
Signed-off-by: Nikolaos Koutsikos <nikolaos.koutsikos@foundry.com>
Co-authored-by: Cary Phillips <cary@ilm.com>
The Deep pixel unpacking functions had two branch checks (switch statements), one on the pixel data type and one on the requested data type, inside the innermost loop over the pixels of each line. These checks are unnecessary there, because the data types do not change during the loop. The compiler is nevertheless unable to optimise them away, and the generated assembly for the pixel loop is massive because it contains all of these branches.
This PR fixes the issue by moving the two switch statements outside the pixel loop, which lets the compiler generate much more efficient assembly for the pixel unpacking operations.
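The transformation described above can be illustrated with a minimal sketch. It is not the OpenEXRCore code: the type tags, the `UNPACK_LOOP` macro, and both function names are hypothetical, and the conversions are plain casts chosen to keep the example short. The first function evaluates both switches for every sample; the second selects the type pair once and then runs a tight, branch-free loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for illustration only (not OpenEXRCore enums). */
typedef enum { T_U16, T_U32 } src_type_t;
typedef enum { T_OUT_U32, T_OUT_F32 } dst_type_t;

/* Before: both switches sit inside the per-sample loop. */
static void unpack_switch_inside(const void *src, src_type_t st,
                                 void *dst, dst_type_t dt, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t v = 0;
        switch (st) {
        case T_U16: v = ((const uint16_t *)src)[i]; break;
        case T_U32: v = ((const uint32_t *)src)[i]; break;
        }
        switch (dt) {
        case T_OUT_U32: ((uint32_t *)dst)[i] = v; break;
        case T_OUT_F32: ((float *)dst)[i] = (float)v; break;
        }
    }
}

/* One tight loop for a fixed (source, destination) type pair. */
#define UNPACK_LOOP(SRC_T, DST_T)                    \
    do {                                             \
        const SRC_T *s = (const SRC_T *)src;         \
        DST_T *d = (DST_T *)dst;                     \
        for (size_t i = 0; i < n; ++i)               \
            d[i] = (DST_T)s[i];                      \
    } while (0)

/* After: the switches run once, outside the loop. */
static void unpack_switch_hoisted(const void *src, src_type_t st,
                                  void *dst, dst_type_t dt, size_t n)
{
    switch (st) {
    case T_U16:
        switch (dt) {
        case T_OUT_U32: UNPACK_LOOP(uint16_t, uint32_t); break;
        case T_OUT_F32: UNPACK_LOOP(uint16_t, float); break;
        }
        break;
    case T_U32:
        switch (dt) {
        case T_OUT_U32: UNPACK_LOOP(uint32_t, uint32_t); break;
        case T_OUT_F32: UNPACK_LOOP(uint32_t, float); break;
        }
        break;
    }
}
```

Both functions produce the same output for the same inputs. The hoisted version trades some code size (one loop per type pair) for loop bodies that are simple enough for the compiler to unroll or vectorise.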