Summary
When OpenJPH is used on a platform without an AVX2/AVX512 codeblock encoder
(Apple Silicon arm64, arm64 Linux, embedded ARM, x86 without AVX2), running
the scalar ojph_encode_codeblock32 path can crash deterministically on
specific image content. The crash is EXC_BAD_ACCESS / SIGSEGV inside
mem_elastic_allocator::get_buffer, where cur_store->available is read from
a cur_store pointer whose bytes have been overwritten with raw J2K coded
data.
The actual encoder writes are correctly bounded by needed_bytes — we
verified this experimentally. The corruption is intra-chunk aliasing:
some code path writes past one coded_lists slot's buf + needed_bytes,
landing in the next slot's coded_lists header (since slots are packed
back-to-back inside one 1 MB store chunk). Once that header is corrupt, the
next get_buffer dereferences garbage and crashes.
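For illustration, here is a minimal sketch of the packed layout described above. It is not OpenJPH code: packed_chunk, slot_span, kHeaderBytes and overrun_hits_header are hypothetical names, and the header size is an assumed stand-in for sizeof(coded_lists). It only shows that, with header and payload packed back-to-back, the first byte past one slot's payload is the first byte of the next slot's header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header size standing in for sizeof(coded_lists).
constexpr std::size_t kHeaderBytes = 32;

// Offsets of one slot inside the chunk.
struct slot_span {
  std::size_t header_off;   // where this slot's header starts
  std::size_t payload_off;  // where this slot's buf starts
  std::size_t payload_len;  // needed_bytes for this slot
};

// Simplified bump allocator: header + payload, packed with no gap.
struct packed_chunk {
  std::vector<std::uint8_t> bytes;
  std::size_t used = 0;

  explicit packed_chunk(std::size_t chunk_size) : bytes(chunk_size) {}

  // Same shape as get_buffer: reserve header + needed_bytes, then advance
  // the data cursor by exactly that amount.
  slot_span get_buffer(std::size_t needed_bytes) {
    const std::size_t extended_bytes = needed_bytes + kHeaderBytes;
    assert(bytes.size() - used >= extended_bytes);
    slot_span s{used, used + kHeaderBytes, needed_bytes};
    used += extended_bytes;
    return s;
  }
};

// True if writing `overrun` bytes past the end of a's payload touches
// any byte of b's header.
inline bool overrun_hits_header(const slot_span& a, const slot_span& b,
                                std::size_t overrun) {
  const std::size_t first = a.payload_off + a.payload_len;
  const std::size_t last = first + overrun;  // one past the last byte written
  return overrun > 0 && first < b.header_off + kHeaderBytes &&
         last > b.header_off;
}
```

With two consecutive slots a and b from the same chunk, a.payload_off + a.payload_len equals b.header_off, so even a one-byte overrun corrupts b's header.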
Reproducer
OpenJPH 0.26.3 (also reproduces with 0.27.0). Running OpenEXR 3.4.11 with
-DOPENEXR_FORCE_INTERNAL_OPENJPH=ON on Apple Silicon (M-series Mac, native
arm64 binary, not Rosetta), encoding an HT/J2K-compressed EXR with certain
image content. Same content does not crash on x86_64 Macs / Windows where
the AVX2 codeblock encoder is selected.
Stack at crash:
* thread #99, stop reason = EXC_BAD_ACCESS (code=1, address=0x3cdacdca26dacdc9)
* frame #0: libOpenEXRCore...ojph::mem_elastic_allocator::get_buffer + 204
frame #1: libOpenEXRCore...ojph::local::ojph_encode_codeblock32 + 12180
frame #2: libOpenEXRCore...ojph::local::codeblock::encode + 156
frame #3: libOpenEXRCore...ojph::local::subband::push_line + 140
frame #4: libOpenEXRCore...ojph::local::resolution::push_line + 1408
frame #5: libOpenEXRCore...ojph::local::tile::push + 840
frame #6: libOpenEXRCore...ojph::local::codestream::exchange + 96
frame #7: libOpenEXRCore...internal_exr_apply_ht + 1256
The crash address 0x3cdacdca26dacdc9 is non-canonical and the byte pattern
..dacdc9.. repeats J2K-stream-looking content — i.e. raw codestream bytes
are sitting where a heap pointer should be.
Diagnostic experiments
We narrowed down the root cause with three experimental builds of OpenJPH:
1. Guard-page allocator (rules out encoder overrun)
Replaced mem_elastic_allocator::get_buffer with a version that mmaps two
pages per call and mprotects the trailing page to PROT_READ, positioning
coded->buf so that coded->buf[needed_bytes] lands exactly on the guard
page boundary.
If ojph_encode_codeblock32, vlc_encode, ms_encode, or any termination
function wrote a byte past index needed_bytes - 1, the guard page faults
immediately with the exact stack of the offending write.
Result: clean render, no fault. The encoder's own writes to coded->buf
are correctly bounded — the memcpys of ms.buf/mel.buf/vlc.buf into
coded->buf at the end of ojph_encode_codeblock32 total exactly
mel.pos + vlc.pos + ms.pos = needed_bytes bytes by construction.
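The guard-page technique can be sketched roughly as follows. This is a simplified standalone version, not the exact patch we ran: guarded_buffer, guarded_alloc and guarded_free are hypothetical names, it maps as many data pages as the request needs instead of a fixed two, and it ignores where the coded_lists header lives. Note that buf is deliberately unaligned whenever needed_bytes is not a multiple of the required alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper type: one mapping with a read-only guard page at the end.
struct guarded_buffer {
  void* mapping;        // base of the mmap'd region (nullptr on failure)
  std::size_t map_len;  // total mapped length, including the guard page
  std::uint8_t* buf;    // usable bytes are buf[0 .. needed_bytes - 1]
};

// Allocate needed_bytes so that buf + needed_bytes is exactly the start of
// a page that cannot be written. Any write at index >= needed_bytes faults.
inline guarded_buffer guarded_alloc(std::size_t needed_bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t data_pages = (needed_bytes + page - 1) / page;
  const std::size_t map_len = (data_pages + 1) * page;  // +1 guard page

  void* m = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED) return {nullptr, 0, nullptr};

  std::uint8_t* guard = static_cast<std::uint8_t*>(m) + data_pages * page;
  if (mprotect(guard, page, PROT_READ) != 0) {
    munmap(m, map_len);
    return {nullptr, 0, nullptr};
  }
  // Right-align the usable bytes against the guard page.
  return {m, map_len, guard - needed_bytes};
}

inline void guarded_free(guarded_buffer& g) {
  if (g.mapping != nullptr) munmap(g.mapping, g.map_len);
  g = {nullptr, 0, nullptr};
}
```

Writing buf[needed_bytes - 1] succeeds, while writing buf[needed_bytes] raises SIGSEGV / EXC_BAD_ACCESS at the offending instruction, which is what makes the resulting stack trace useful.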
2. Disable avail-list chunk reuse only (rules out cross-restart aliasing)
Kept the original packed chunk layout, but made mem_elastic_allocator::allocate
ignore the avail list — every chunk request is a fresh malloc, no reuse
across restart() boundaries.
Result: still crashes, same fingerprint. So the bug isn't cross-restart
stale pointers; it's intra-chunk aliasing during a single encode round.
3. Per-slot malloc (workaround that fixes it)
Changed get_buffer to malloc a dedicated stores_list per call, sized
to fit only that slot, with no packing inside 1 MB chunks. Slots remain
chained via next_store for the destructor's batch free.
Result: 4 successive full renders complete cleanly, decoded output matches
PIZ/ZIP renders of the same scene. This is what we're shipping locally
as a workaround.
Hypothesis on actual root cause
Given (1) confirms the encoder doesn't overrun and (3) confirms isolating
slots eliminates the symptom, the bug is a write past one slot's
needed_bytes that lands in the next packed slot's coded_lists header.
Candidates we considered but couldn't conclusively pin down:
- bit_write_buf::ccl (used for packet headers via bb_put_bit /
bb_expand_buf) retains a coded_lists* across multiple writes. Its
bounds check (buf[buf_size - avail_size] then --avail_size) is
consistent, but if some path manipulates bbp->ccl against a chained
next_list that's been allocated AFTER a codeblock data slot, the byte
layout puts them adjacent.
- coded_cb_header::next_coded retains pointers across precinct/resolution
state transitions; if any of these is dereferenced after the chunk's data
pointer has been advanced past it, a write through it could land in a
neighboring slot. (We haven't reproduced this directly.)
- An off-by-one or signed/unsigned arithmetic bug in a less-traveled code
path that targets the scalar encoder's output handling.
We have not been able to pin down the exact offending write — both because
the surface to audit is large and because the symptom only manifests on
specific image content, making instrumented diff-runs awkward.
What we'd find useful
- Maintainer eyes on the candidate code paths above.
- Maybe a more targeted assert (e.g., poisoning the byte at
cur_store->data - 1 after each get_buffer and re-checking it on the
next call) to identify the write site precisely. We tried sentinel bytes
with the original packed layout; they got clobbered (which is what
pointed us at intra-chunk aliasing in the first place), but the loop only
reported the first clobbered offset, so a single byte-difference sentinel
plus a breakpoint via __builtin_trap might be more surgical.
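A rough sketch of the sentinel idea, to make the suggestion concrete. Everything here is hypothetical (canary_registry, kCanary, and the assumption that each slot is over-allocated by one trailing byte to hold the canary); it is not existing OpenJPH code. __builtin_trap is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical poison value written into each slot's trailing byte.
constexpr std::uint8_t kCanary = 0xA5;

// Tracks one canary byte per allocated slot.
struct canary_registry {
  std::vector<const volatile std::uint8_t*> sites;

  // Write the canary at p and remember its address.
  void plant(std::uint8_t* p) {
    *p = kCanary;
    sites.push_back(p);
  }

  // Index of the first canary that no longer holds kCanary, or -1 if all
  // canaries are intact.
  std::ptrdiff_t first_clobbered() const {
    for (std::size_t i = 0; i < sites.size(); ++i)
      if (*sites[i] != kCanary) return static_cast<std::ptrdiff_t>(i);
    return -1;
  }

  // Call at the top of every get_buffer (or more often): stops in the
  // debugger as soon as a clobbered canary is noticed.
  void check_or_trap() const {
    if (first_clobbered() >= 0) __builtin_trap();
  }
};
```

This narrows the offending write to the window between two consecutive checks; it does not by itself identify the exact instruction, so the more call sites invoke check_or_trap, the tighter the window.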
Workaround patch (for users hitting this)
This is a workaround, not a root-cause fix — it just eliminates the
aliasing surface so the bug becomes silent. Memory cost: replaces the 1 MB
chunk pool with per-slot mallocs; typical peak working set per in-flight
codestream is a few MB.
--- a/src/core/others/ojph_mem.cpp
+++ b/src/core/others/ojph_mem.cpp
@@ -93,35 +93,60 @@
ui32 extended_bytes)
{
ui32 bytes = ojph_max(extended_bytes, chunk_size);
- if (avail != NULL && avail->orig_size >= bytes)
- {
- *list = avail;
- avail = avail->next_store;
- (*list)->restart();
- return *list;
- }
- else
- {
- ui32 store_bytes = stores_list::eval_store_bytes(bytes);
- *list = (stores_list*) malloc(store_bytes);
- total_allocated += store_bytes;
- return new (*list) stores_list(bytes);
- }
+ // avail-list reuse is disabled: external callers (precinct state,
+ // bit_write_buf::ccl, coded_cb_header::next_coded) retain coded_lists*
+ // pointers into a chunk across restart() boundaries. Reusing the chunk
+ // via the avail list then places fresh coded_lists over memory still
+ // aliased by those stale pointers; writes through them clobber the new
+ // tenant. Always allocate fresh.
+ ui32 store_bytes = stores_list::eval_store_bytes(bytes);
+ *list = (stores_list*) malloc(store_bytes);
+ total_allocated += store_bytes;
+ return new (*list) stores_list(bytes);
}
////////////////////////////////////////////////////////////////////////////
void mem_elastic_allocator::get_buffer(ui32 needed_bytes, coded_lists* &p)
{
+ // Each get_buffer gets its own malloc'd store sized to fit only this
+ // slot (no packing, no chunk_size rounding). Stops adjacent slots from
+ // aliasing each other.
ui32 extended_bytes = needed_bytes + (ui32)sizeof(coded_lists);
+ ui32 store_bytes = stores_list::eval_store_bytes(extended_bytes);
+ stores_list *fresh = (stores_list*) malloc(store_bytes);
+ total_allocated += store_bytes;
+ new (fresh) stores_list(extended_bytes);
if (store == NULL)
- cur_store = store = allocate(&store, extended_bytes);
- else if (cur_store->available < extended_bytes)
- cur_store = allocate(&cur_store->next_store, extended_bytes);
+ store = fresh;
+ else
+ cur_store->next_store = fresh;
+ cur_store = fresh;
p = new (cur_store->data) coded_lists(needed_bytes);
- assert(cur_store->available >= extended_bytes);
cur_store->available -= extended_bytes;
cur_store->data += extended_bytes;
}
Environment
- macOS 26 / Apple Silicon (M-series, native arm64)
- AppleClang 21 (Xcode 26)
- OpenJPH 0.26.3 (vendored in OpenEXR 3.4.11) — also confirmed with 0.27.0
as a standalone dylib
- -O3 -DNDEBUG, standard build flags via OpenEXR CMake with
-DOPENEXR_FORCE_INTERNAL_OPENJPH=ON