exrmetrics print disk size of part #2424
…ultipart EXR to be printed as one of the exrmetrics stats. The current method only works when no output file is specified because we use the write position of the dummy output stream to derive bytes written per part. Added flag to enable part size printout. Signed-off-by: Aries Moczar <arcmantis@protulae.com>
I have been trying to determine how accurate it is to use the memory output stream to determine part size. I decided to process the
output: I was expecting a rough similarity between For example, parts 0, 1, 2, 4, 5, 6, 7, 8 were all writing 10512 bytes more than 'total raw size', and parts 3 and 9 were writing 9228 bytes more than 'total raw size'. My only guess so far is that these extra bytes are the metadata of the scanline blocks, but I would appreciate any insight. As I see it, the disk size reports are inaccurate (in the sense that they don't report the raw disk size of a part) unless we can somehow compute the chunk metadata sizes to subtract from the final result. It does, however, accurately report the actual disk size of each part within the EXR file layout.
The 'raw size' is supposed to be just the data required for the pixels themselves (how much memory you'd require to store every channel of every pixel in the image in its native type), not the amount of space that takes on disk, even if no compression is used. Each chunk also contains a header. That's between 8 bytes per chunk (for a single-part scanline file) and 40 bytes per chunk (for a deeptile part in a multipart file). The file also contains a preamble, and one header and chunk offset table per part. The bytes taken up by the chunk headers will be factored into your "size on disk" but the chunk table won't be. That will cause a discrepancy between the total file size on disk and the total size of all parts. For scanline parts, the number of chunks is dependent on the compression scheme, so the discrepancy will be different. I think that's fine as it is, but perhaps a slightly longer description of what --part-disk-size does could help.
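To make the chunk-header arithmetic concrete, here is a minimal Python sketch that estimates the header overhead of a scanline part. The scanlines-per-chunk table follows the standard OpenEXR compression schemes. The 8-byte header is the single-part scanline figure given above. The 12-byte figure assumes a multipart scanline chunk adds a 4-byte part number, which is not stated in this thread. The function names and the image heights are hypothetical: 876 and 769 lines were chosen only because, at 12 bytes per chunk with one scanline per chunk, they reproduce the 10512 and 9228 byte differences reported earlier. The thread does not confirm the actual image dimensions or compression.

```python
import math

# Scanlines stored in each chunk, per OpenEXR compression scheme.
SCANLINES_PER_CHUNK = {
    "none": 1, "rle": 1, "zips": 1,
    "zip": 16, "pxr24": 16,
    "piz": 32, "b44": 32, "b44a": 32, "dwaa": 32,
    "dwab": 256,
}


def scanline_chunk_count(height, compression):
    """Number of chunks needed to store `height` scanlines."""
    return math.ceil(height / SCANLINES_PER_CHUNK[compression])


def chunk_header_overhead(height, compression, header_bytes):
    """Bytes spent on chunk headers alone (hypothetical helper).

    header_bytes: 8 for a single-part scanline file (from the comment above);
    12 is an assumed value for a multipart scanline part.
    """
    return scanline_chunk_count(height, compression) * header_bytes


# Hypothetical heights; one scanline per chunk, assumed 12-byte headers.
print(chunk_header_overhead(876, "none", 12))  # 10512
print(chunk_header_overhead(769, "none", 12))  # 9228
# The same hypothetical part with 16 scanlines per chunk has far fewer chunks.
print(chunk_header_overhead(876, "zip", 12))   # 660
```

Because the chunk count depends on the compression scheme, the gap between 'raw size' and the reported part size changes with compression even when the pixel data is identical.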
I guess this really is the best that can be done in this case, as long as the user understands that this is not a "pure" part size metric (either packed or unpacked) but rather a "part size plus chunk headers on disk" metric. Despite this, it still gives a mostly accurate look at the impact different compression algorithms have on individual part data. I'll clean it up then. Thanks for the feedback!
Signed-off-by: Aries Moczar <arcmantis@protulae.com>
peterhillman
left a comment
I think that's nearly there!
I suggested an update to the help (and to the webpage too) that might be less open to misinterpretation. (It could be understood that you run exrmetrics input.exr --part-disk-size output.exr instead of exrmetrics input.exr -o output.exr.)
It would also be useful to update the parse() function to error if -o and --part-disk-size are used together.
Also, src/test/bin/test_exrmetrics.py should be updated to verify that the output is valid JSON when --part-disk-size is used
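The two requests above (rejecting -o together with --part-disk-size, and checking that the output is valid JSON) can be sketched as follows. This is an illustration in Python only: the real parse() in exrmetrics is C++ and its structure is not shown in this thread. The parser below, its error message, and the is_valid_json helper are hypothetical.

```python
import argparse
import json


def parse(argv):
    """Hypothetical stand-in for the option check requested in review."""
    parser = argparse.ArgumentParser(prog="exrmetrics-sketch")
    parser.add_argument("input")
    parser.add_argument("-o", dest="output", default=None)
    parser.add_argument("--part-disk-size", action="store_true")
    args = parser.parse_args(argv)
    # Part sizes come from the in-memory stream, which is only used when
    # no output file is given, so the two options cannot be combined.
    if args.output is not None and args.part_disk_size:
        parser.error("--part-disk-size cannot be combined with -o")
    return args


def is_valid_json(text):
    """Return True if `text` parses as JSON (hypothetical test helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

In a test, the captured standard output of the tool run with --part-disk-size would be passed to a check like is_valid_json. parser.error exits with status 2, so a conflicting command line fails before any image is read.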
…ror when user attempts to use --part-disk-size and -o or vice versa in command. Added test in test_exrmetrics.py to verify JSON output for --part-disk-size. Signed-off-by: Aries Moczar <arcmantis@protulae.com>
peterhillman
left a comment
Thanks! That looks good
Merged commit 252feab into AcademySoftwareFoundation:main
* Adding feature which allows the disk sizes of individual parts in a multipart EXR to be printed as one of the exrmetrics stats. The current method only works when no output file is specified because we use the write position of the dummy output stream to derive bytes written per part. Added flag to enable part size printout.
Signed-off-by: Aries Moczar <arcmantis@protulae.com>
* Improved help description. Appended to website/bin/exrmetrics.rst
Signed-off-by: Aries Moczar <arcmantis@protulae.com>
* Improved help message to be more clear. Modified parse() to report error when user attempts to use --part-disk-size and -o or vice versa in command. Added test in test_exrmetrics.py to verify JSON output for --part-disk-size.
Signed-off-by: Aries Moczar <arcmantis@protulae.com>
---------
Signed-off-by: Aries Moczar <arcmantis@protulae.com>
Co-authored-by: Cary Phillips <cary@ilm.com>
WIP for #2031, which requested that exrmetrics be able to print the packed size of each part's data in a multipart EXR image. This was initially problematic because the information is not readily available by default; however, @peterhillman suggested I use the write position of the memory output stream to compute the bytes written per part. These stats can only be printed when no output file has been specified, as this causes a memory output stream to be used.
In my work so far, I have created a flag, --part-size-disk, which enables printing of the part sizes. The write position of the memory output stream is used to compute how many bytes were written to the stream per image part, and the results are stored in the fileMetrics struct for printing later.
Though the issue specified that exrmetrics should print the packed/compressed data, I have opted to print this data as the 'size on disk' because I believe that is a more accurate and useful metric.
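The write-position approach described above can be illustrated with a small Python sketch that uses io.BytesIO as a stand-in for the memory output stream. The actual implementation is C++ inside exrmetrics and is not reproduced here. The function name and the byte payloads are hypothetical.

```python
import io


def bytes_written_per_part(parts):
    """Record how far the stream position advances while each part is written.

    `parts` is a list of bytes objects standing in for the encoded data of
    each image part (hypothetical input).
    """
    stream = io.BytesIO()
    sizes = []
    for payload in parts:
        start = stream.tell()  # write position before this part
        stream.write(payload)
        sizes.append(stream.tell() - start)  # bytes this part added
    return sizes


print(bytes_written_per_part([b"abc", b"", b"12345"]))  # [3, 0, 5]
```

Measured this way, a part's size includes everything written while that part is being output, which is why it reads as a 'size on disk' figure and not as the size of the pixel data alone.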
The eventual name and description of the flag are, of course, open for discussion; I just needed to get the ball rolling. I will also make other syntax and style corrections later, after the main issue is worked out.