mem_elastic_allocator intra-chunk aliasing — SIGSEGV in get_buffer on non-SIMD encoder platforms (Apple Silicon arm64) #273

@gletonai-colorfront

Summary

When OpenJPH is used on a platform without an AVX2/AVX512 codeblock encoder
(Apple Silicon arm64, arm64 Linux, embedded ARM, x86 without AVX2), running
the scalar ojph_encode_codeblock32 path can crash deterministically on
specific image content. The crash is EXC_BAD_ACCESS / SIGSEGV inside
mem_elastic_allocator::get_buffer, where cur_store->available is read from
a cur_store pointer whose bytes have been overwritten with raw J2K coded
data.

The actual encoder writes are correctly bounded by needed_bytes — we
verified this experimentally. The corruption is intra-chunk aliasing:
some code path writes past one coded_lists slot's buf + needed_bytes,
landing in the next slot's coded_lists header (since slots are packed
back-to-back inside one 1 MB store chunk). Once that header is corrupt, the
next get_buffer dereferences garbage and crashes.
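
To make the failure mode concrete, here is a minimal toy model of a packed chunk. It is not OpenJPH code: toy_slot and toy_chunk are hypothetical stand-ins for coded_lists and a store chunk, and the alignment rounding is an assumption added to keep the toy well-defined. It only shows that, with headers and payloads packed back-to-back, a single byte written past one payload changes the next slot's header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical stand-in for coded_lists: a header immediately followed by
// its payload bytes.
struct toy_slot {
  toy_slot* next;      // chain pointer that a later reader would dereference
  std::uint8_t* buf;   // points just past this header
  std::size_t size;    // payload capacity (needed_bytes)
};

// Hypothetical stand-in for one store chunk: slots are bump-allocated
// back-to-back, so one slot's payload ends where the next header begins.
struct toy_chunk {
  std::vector<std::uint8_t> bytes;
  std::size_t used = 0;

  explicit toy_chunk(std::size_t n) : bytes(n) {}

  toy_slot* get_buffer(std::size_t needed_bytes) {
    // Round up so every header stays aligned (an assumption of this toy).
    const std::size_t a = alignof(toy_slot);
    const std::size_t total = (sizeof(toy_slot) + needed_bytes + a - 1) / a * a;
    assert(used + total <= bytes.size());
    std::uint8_t* base = bytes.data() + used;
    used += total;
    return new (base) toy_slot{nullptr, base + sizeof(toy_slot), needed_bytes};
  }
};

// Fills slot A's payload plus `extra` bytes beyond it, then reports whether
// slot B's header no longer holds the values it was constructed with.
bool overrun_clobbers_next(std::size_t extra) {
  toy_chunk chunk(1024);
  toy_slot* a = chunk.get_buffer(16);
  toy_slot* b = chunk.get_buffer(16);
  std::uint8_t* expected_buf = b->buf;
  std::memset(a->buf, 0xDA, a->size + extra);
  return b->next != nullptr || b->buf != expected_buf || b->size != 16;
}
```

In this toy, overrun_clobbers_next(0) leaves the neighbor intact while overrun_clobbers_next(1) already corrupts its next pointer, which is the same shape as the corrupted cur_store seen in the crash.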

Reproducer

OpenJPH 0.26.3 (also reproduces with 0.27.0). Running OpenEXR 3.4.11 with
-DOPENEXR_FORCE_INTERNAL_OPENJPH=ON on Apple Silicon (M-series Mac, native
arm64 binary, not Rosetta), encoding an HT/J2K-compressed EXR with certain
image content. Same content does not crash on x86_64 Macs / Windows where
the AVX2 codeblock encoder is selected.

Stack at crash:

* thread #99, stop reason = EXC_BAD_ACCESS (code=1, address=0x3cdacdca26dacdc9)
  * frame #0: libOpenEXRCore...ojph::mem_elastic_allocator::get_buffer + 204
    frame #1: libOpenEXRCore...ojph::local::ojph_encode_codeblock32 + 12180
    frame #2: libOpenEXRCore...ojph::local::codeblock::encode + 156
    frame #3: libOpenEXRCore...ojph::local::subband::push_line + 140
    frame #4: libOpenEXRCore...ojph::local::resolution::push_line + 1408
    frame #5: libOpenEXRCore...ojph::local::tile::push + 840
    frame #6: libOpenEXRCore...ojph::local::codestream::exchange + 96
    frame #7: libOpenEXRCore...internal_exr_apply_ht + 1256

The crash address 0x3cdacdca26dacdc9 is non-canonical, and its repeating
byte pattern (..dacdc9..) looks like J2K codestream content, i.e. raw
codestream bytes are sitting where a heap pointer should be.

Diagnostic experiments

We narrowed down the root cause direction via three builds of OpenJPH:

1. Guard-page allocator (rules out encoder overrun)

Replaced mem_elastic_allocator::get_buffer to mmap two pages per call and
mprotect the trailing page to PROT_READ, positioning coded->buf so that
coded->buf[needed_bytes] lands exactly on the guard page boundary.

If ojph_encode_codeblock32, vlc_encode, ms_encode, or any termination
function wrote a byte past index needed_bytes - 1, the guard page faults
immediately with the exact stack of the offending write.

Result: clean render, no fault. The encoder's own writes to coded->buf
are correctly bounded — the memcpys of ms.buf/mel.buf/vlc.buf into
coded->buf at the end of ojph_encode_codeblock32 total exactly
mel.pos + vlc.pos + ms.pos = needed_bytes bytes by construction.
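
For readers who want to repeat this experiment, the guard-page placement can be sketched roughly as follows. This is an illustrative reconstruction, not the code we ran: guarded_buf, guarded_alloc and guarded_free are hypothetical names, the sketch covers only the payload buffer (not the coded_lists header), and it rounds up to as many data pages as needed rather than assuming exactly two pages. Note that buf may end up unaligned, since it is placed flush against the guard page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical descriptor for one guarded allocation.
struct guarded_buf {
  void* mapping;        // base of the mmap'd region (needed for munmap)
  std::size_t map_len;  // total mapped length, including the guard page
  std::uint8_t* buf;    // writable buffer; buf[needed_bytes] is the guard page
};

// Maps enough writable pages for needed_bytes plus one trailing page that is
// downgraded to PROT_READ, and positions buf so its end touches that page.
// Any write at index >= needed_bytes faults at the offending instruction.
inline bool guarded_alloc(std::size_t needed_bytes, guarded_buf& out) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::size_t data_pages = (needed_bytes + page - 1) / page;
  if (data_pages == 0) data_pages = 1;
  const std::size_t len = (data_pages + 1) * page;

  void* base = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
  if (base == MAP_FAILED) return false;

  std::uint8_t* guard = static_cast<std::uint8_t*>(base) + data_pages * page;
  if (mprotect(guard, page, PROT_READ) != 0) {
    munmap(base, len);
    return false;
  }
  out.mapping = base;
  out.map_len = len;
  out.buf = guard - needed_bytes;
  return true;
}

inline void guarded_free(guarded_buf& g) {
  munmap(g.mapping, g.map_len);
}
```

Because the guard page is read-only rather than inaccessible, reads that run slightly past the buffer still succeed and only stray writes are caught.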

2. Disable avail-list chunk reuse only (rules out cross-restart aliasing)

Kept the original packed chunk layout, but made mem_elastic_allocator::allocate
ignore the avail list — every chunk request is a fresh malloc, no reuse
across restart() boundaries.

Result: still crashes, same fingerprint. So the bug isn't cross-restart
stale pointers; it's intra-chunk aliasing during a single encode round.

3. Per-slot malloc (workaround that fixes it)

Replaced get_buffer to malloc a dedicated stores_list per call sized
to fit only that slot — no packing inside 1 MB chunks. Slots remain chained
via next_store for the destructor's batch free.

Result: 4 successive full renders complete cleanly, decoded output matches
PIZ/ZIP renders of the same scene. This is what we're shipping locally as a
workaround.

Hypothesis on actual root cause

Given (1) confirms the encoder doesn't overrun and (3) confirms isolating
slots eliminates the symptom, the bug is a write past one slot's
needed_bytes that lands in the next packed slot's coded_lists header.
Candidates we considered but couldn't conclusively pin down:

  • bit_write_buf::ccl (used for packet headers via bb_put_bit /
    bb_expand_buf) retains a coded_lists* across multiple writes. Its
    bounds check (buf[buf_size - avail_size] then --avail_size) is
    consistent — but if some path manipulates bbp->ccl against a chained
    next_list that's been allocated AFTER a codeblock data slot, the byte
    layout puts them adjacent.
  • coded_cb_header::next_coded retains pointers across precinct/resolution
    state transitions; if any of these is dereferenced after the chunk's data
    pointer has been advanced past it, a write through it could land in a
    neighboring slot. (We haven't reproduced this directly.)
  • An off-by-one or signed/unsigned arithmetic bug in a less-traveled code
    path that targets the scalar encoder's output handling.

We have not been able to pin down the exact offending write — both because
the surface to audit is large and because the symptom only manifests on
specific image content, making instrumented diff-runs awkward.

What we'd find useful

  • Maintainer eyes on the candidate code paths above.
  • Maybe a more targeted assert (e.g., poisoning the byte at
    cur_store->data - 1 after each get_buffer and re-checking it on the
    next call) to identify the write site precisely. We tried sentinel bytes
    with the original packed layout; they got clobbered (which is what
    pointed us at intra-chunk aliasing in the first place), but the loop only
    reported the first clobbered offset, so a single byte-difference sentinel
    + breakpoint via __builtin_trap might be more surgical.
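
A rough sketch of the sentinel idea, under stated assumptions: canary_arena is a hypothetical toy allocator, not mem_elastic_allocator, and it appends one extra canary byte after each payload instead of poisoning a byte inside the existing layout. The point is only the shape of the check: verify the previous slot's canary at the start of the next get_buffer and stop in the debugger via __builtin_trap (a GCC/Clang builtin) the moment it differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical bump allocator that places one canary byte after every
// payload and re-checks the most recent canary on each new request.
class canary_arena {
public:
  explicit canary_arena(std::size_t n) : bytes_(n) {}

  // When set, a damaged canary stops the process at the detection point.
  bool trap_on_hit = true;

  // True if no slot has been handed out yet or the latest canary is intact.
  bool last_canary_intact() const {
    return last_canary_ == nullptr || *last_canary_ == kCanary;
  }

  // Returns nullptr when the arena is full.
  std::uint8_t* get_buffer(std::size_t needed_bytes) {
    if (trap_on_hit && !last_canary_intact())
      __builtin_trap();  // previous slot was overrun since the last call
    if (used_ + needed_bytes + 1 > bytes_.size())
      return nullptr;
    std::uint8_t* p = bytes_.data() + used_;
    used_ += needed_bytes + 1;
    last_canary_ = p + needed_bytes;
    *last_canary_ = kCanary;
    return p;
  }

private:
  static constexpr std::uint8_t kCanary = 0xA5;
  std::vector<std::uint8_t> bytes_;
  std::size_t used_ = 0;
  std::uint8_t* last_canary_ = nullptr;
};
```

This only detects the overrun at the next allocation, not at the write itself, and it misses an overrun that happens to store the canary value; a hardware watchpoint on the canary address would be the more precise follow-up once a trap identifies which slot is hit.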

Workaround patch (for users hitting this)

This is a workaround, not a root-cause fix — it just eliminates the
aliasing surface so the bug becomes silent. Memory cost: replaces the 1 MB
chunk pool with per-slot mallocs; typical peak working set per in-flight
codestream is a few MB.

--- a/src/core/others/ojph_mem.cpp
+++ b/src/core/others/ojph_mem.cpp
@@ -93,35 +93,60 @@
                                   ui32 extended_bytes)
   {
     ui32 bytes = ojph_max(extended_bytes, chunk_size);
-    if (avail != NULL && avail->orig_size >= bytes)
-    {
-      *list = avail;
-      avail = avail->next_store;
-      (*list)->restart();
-      return *list;
-    }
-    else
-    {
-      ui32 store_bytes = stores_list::eval_store_bytes(bytes);
-      *list = (stores_list*) malloc(store_bytes);
-      total_allocated += store_bytes;
-      return new (*list) stores_list(bytes);
-    }
+    // avail-list reuse is disabled as a precaution: external callers
+    // (precinct state, bit_write_buf::ccl, coded_cb_header::next_coded) may
+    // retain coded_lists* pointers into a chunk across restart() boundaries,
+    // and a reused chunk would place fresh coded_lists over memory those
+    // pointers still alias. Disabling reuse alone did not stop the crash
+    // (experiment 2). Always allocate fresh.
+    ui32 store_bytes = stores_list::eval_store_bytes(bytes);
+    *list = (stores_list*) malloc(store_bytes);
+    total_allocated += store_bytes;
+    return new (*list) stores_list(bytes);
   }

   ////////////////////////////////////////////////////////////////////////////
   void mem_elastic_allocator::get_buffer(ui32 needed_bytes, coded_lists* &p)
   {
+    // Each get_buffer gets its own malloc'd store sized to fit only this
+    // slot (no packing, no chunk_size rounding). Stops adjacent slots from
+    // aliasing each other.
     ui32 extended_bytes = needed_bytes + (ui32)sizeof(coded_lists);
+    ui32 store_bytes = stores_list::eval_store_bytes(extended_bytes);
+    stores_list *fresh = (stores_list*) malloc(store_bytes);
+    total_allocated += store_bytes;
+    new (fresh) stores_list(extended_bytes);

     if (store == NULL)
-      cur_store = store = allocate(&store, extended_bytes);
-    else if (cur_store->available < extended_bytes)
-      cur_store = allocate(&cur_store->next_store, extended_bytes);
+      store = fresh;
+    else
+      cur_store->next_store = fresh;
+    cur_store = fresh;

     p = new (cur_store->data) coded_lists(needed_bytes);

-    assert(cur_store->available >= extended_bytes);
     cur_store->available -= extended_bytes;
     cur_store->data += extended_bytes;
   }

Environment

  • macOS 26 / Apple Silicon (M-series, native arm64)
  • AppleClang 21 (Xcode 26)
  • OpenJPH 0.26.3 (vendored in OpenEXR 3.4.11) — also confirmed with 0.27.0
    as a standalone dylib
  • -O3 -DNDEBUG, standard build flags via OpenEXR CMake with
    -DOPENEXR_FORCE_INTERNAL_OPENJPH=ON
