`mem_elastic_allocator` intra-chunk aliasing — SIGSEGV in `get_buffer` on non-SIMD encoder platforms (Apple Silicon arm64) #273

## Summary

When OpenJPH is used on a platform without an AVX2/AVX512 codeblock encoder
(Apple Silicon arm64, arm64 Linux, embedded ARM, x86 without AVX2), running
the scalar `ojph_encode_codeblock32` path can crash deterministically on
specific image content. The crash is `EXC_BAD_ACCESS` / `SIGSEGV` inside
`mem_elastic_allocator::get_buffer`, where `cur_store->available` is read from
a `cur_store` pointer whose bytes have been overwritten with raw J2K coded
data.

The actual encoder writes are correctly bounded by `needed_bytes` — we
verified this experimentally. The corruption is **intra-chunk aliasing**:
some code path writes past one `coded_lists` slot's `buf + needed_bytes`,
landing in the next slot's `coded_lists` header (since slots are packed
back-to-back inside one 1 MB store chunk). Once that header is corrupt, the
next `get_buffer` dereferences garbage and crashes.
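
To make the packed layout concrete, here is a minimal sketch of a bump allocator that places `[header][payload]` pairs back-to-back. `slot_header`, `packed_chunk`, and `write_payload` are hypothetical stand-ins invented for illustration, not the real `coded_lists` / `stores_list` types, and the real allocator's field layout and alignment may differ. It only shows that byte `needed_bytes` of one slot's payload is the first byte of the next slot's header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a slot header (NOT the real ojph::coded_lists).
struct slot_header {
  std::uint64_t next;        // models the pointer to the next list
  std::uint32_t buf_size;    // payload capacity in bytes
  std::uint32_t avail_size;  // bytes still unused in the payload
};

// Hypothetical chunk that packs [header][payload] pairs back-to-back.
struct packed_chunk {
  std::vector<unsigned char> bytes;
  std::size_t used = 0;

  explicit packed_chunk(std::size_t n) : bytes(n, 0) {}

  // Reserve one slot; returns the byte offset of its header in the chunk.
  std::size_t get_slot(std::uint32_t needed_bytes) {
    const std::size_t extended = sizeof(slot_header) + needed_bytes;
    assert(used + extended <= bytes.size());
    const slot_header h{0, needed_bytes, needed_bytes};
    std::memcpy(bytes.data() + used, &h, sizeof h);
    const std::size_t off = used;
    used += extended;
    return off;
  }

  // Start of the payload that follows the header at `header_off`.
  unsigned char* payload(std::size_t header_off) {
    return bytes.data() + header_off + sizeof(slot_header);
  }

  // Copy the header out of the chunk (memcpy avoids alignment assumptions).
  slot_header header(std::size_t header_off) const {
    slot_header h;
    std::memcpy(&h, bytes.data() + header_off, sizeof h);
    return h;
  }
};

// Writes `count` bytes of 0xDA into a slot's payload with no bounds check,
// modelling a writer that trusts a wrong length.
inline void write_payload(packed_chunk& c, std::size_t off, std::size_t count) {
  std::memset(c.payload(off), 0xDA, count);
}
```

With two slots of 10 and 20 bytes, writing exactly 10 bytes into the first leaves the second header untouched, while writing 11 changes its `next` field. That is the same failure shape as the crash: the next allocator call reads a "pointer" that is really payload data.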

## Reproducer

OpenJPH 0.26.3 (also reproduces with 0.27.0). Build OpenEXR 3.4.11 with
`-DOPENEXR_FORCE_INTERNAL_OPENJPH=ON` on Apple Silicon (M-series Mac, native
arm64 binary, not Rosetta) and encode an HT/J2K-compressed EXR with certain
image content. The same content does not crash on x86_64 Macs or Windows,
where the AVX2 codeblock encoder is selected.

Stack at crash:

```
* thread #99, stop reason = EXC_BAD_ACCESS (code=1, address=0x3cdacdca26dacdc9)
  * frame #0: libOpenEXRCore...ojph::mem_elastic_allocator::get_buffer + 204
    frame #1: libOpenEXRCore...ojph::local::ojph_encode_codeblock32 + 12180
    frame #2: libOpenEXRCore...ojph::local::codeblock::encode + 156
    frame #3: libOpenEXRCore...ojph::local::subband::push_line + 140
    frame #4: libOpenEXRCore...ojph::local::resolution::push_line + 1408
    frame #5: libOpenEXRCore...ojph::local::tile::push + 840
    frame #6: libOpenEXRCore...ojph::local::codestream::exchange + 96
    frame #7: libOpenEXRCore...internal_exr_apply_ht + 1256
```

The crash address `0x3cdacdca26dacdc9` is non-canonical, and its repeating
byte pattern `..dacdc9..` looks like J2K codestream content, i.e. raw
codestream bytes are sitting where a heap pointer should be.
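
As a quick sanity check on that reading, the sketch below tests whether a 64-bit value could be a user-space pointer and lists its bytes in little-endian memory order. Both helpers are hypothetical, and the 48-bit address width is an assumption (the actual user address range depends on the OS and CPU configuration).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: user-space pointers fit in 48 bits, so the top 16 bits of a
// valid pointer are zero. Real limits vary by OS and CPU configuration.
inline bool plausible_user_pointer(std::uint64_t addr) {
  return (addr >> 48) == 0;
}

// Returns the 8 bytes of `addr` in little-endian memory order, i.e. the
// order in which they would sit in memory on arm64 or x86_64.
inline std::array<unsigned char, 8> le_bytes(std::uint64_t addr) {
  std::array<unsigned char, 8> out{};
  for (int i = 0; i < 8; ++i)
    out[i] = static_cast<unsigned char>(addr >> (8 * i));
  return out;
}
```

For `0x3cdacdca26dacdc9` the top 16 bits are `0x3cda`, so the check fails, and the in-memory byte order is `c9 cd da 26 ca cd da 3c`.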

## Diagnostic experiments

We confirmed the direction of the root cause with three diagnostic builds of OpenJPH:

### 1. Guard-page allocator (rules out encoder overrun)

Replaced `mem_elastic_allocator::get_buffer` with a version that `mmap`s two
pages per call and `mprotect`s the trailing page to `PROT_READ`, positioning
`coded->buf` so that `coded->buf[needed_bytes]` lands exactly on the guard
page boundary.

If `ojph_encode_codeblock32`, `vlc_encode`, `ms_encode`, or any termination
function wrote a byte past index `needed_bytes - 1`, the guard page would
fault immediately, with the exact stack of the offending write.

**Result: clean render, no fault.** The encoder's own writes to `coded->buf`
are correctly bounded — the `memcpy`s of `ms.buf`/`mel.buf`/`vlc.buf` into
`coded->buf` at the end of `ojph_encode_codeblock32` total exactly
`mel.pos + vlc.pos + ms.pos = needed_bytes` bytes by construction.
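
For readers who want to repeat this experiment, the guard-page arrangement can be sketched roughly as follows. This is not the code we ran: `guarded_buf`, `guarded_alloc`, and `guarded_free` are hypothetical names, the sketch sizes the mapping from `needed_bytes` rather than using a fixed two pages, and it ignores any alignment the real `coded_lists` payload may require. It is POSIX-only (`mmap`, `mprotect`, `sysconf`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical handle for one guarded allocation.
struct guarded_buf {
  void* base;           // start of the whole mapping (for munmap)
  std::size_t map_len;  // length of the whole mapping in bytes
  unsigned char* buf;   // buf[needed_bytes] is the first guard-page byte
};

// Maps enough writable pages for `needed_bytes` plus one trailing page,
// makes the trailing page read-only, and positions `buf` so that the first
// byte past the payload falls on the read-only page.
inline guarded_buf guarded_alloc(std::size_t needed_bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::size_t data_pages = (needed_bytes + page - 1) / page;
  if (data_pages == 0) data_pages = 1;
  const std::size_t map_len = (data_pages + 1) * page;

  void* base = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
  if (base == MAP_FAILED) return {nullptr, 0, nullptr};

  unsigned char* guard = static_cast<unsigned char*>(base) + data_pages * page;
  if (mprotect(guard, page, PROT_READ) != 0) {
    munmap(base, map_len);
    return {nullptr, 0, nullptr};
  }
  // Any write at or beyond buf[needed_bytes] now raises SIGSEGV/SIGBUS.
  return {base, map_len, guard - needed_bytes};
}

inline void guarded_free(const guarded_buf& g) {
  if (g.base != nullptr) munmap(g.base, g.map_len);
}
```

Because the payload ends flush against the protected page, an overrun of even one byte faults at the writing instruction, so the debugger stops in the offending function rather than at a later `get_buffer`.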

### 2. Disable `avail`-list chunk reuse only (rules out cross-restart aliasing)

Kept the original packed chunk layout, but made `mem_elastic_allocator::allocate`
ignore the `avail` list — every chunk request is a fresh `malloc`, no reuse
across `restart()` boundaries.

**Result: still crashes, same fingerprint.** So the bug isn't cross-restart
stale pointers; it's intra-chunk aliasing during a single encode round.

### 3. Per-slot malloc (workaround that fixes it)

Changed `get_buffer` to `malloc` a dedicated `stores_list` per call, sized
to fit only that slot, with no packing inside 1 MB chunks. Slots remain
chained via `next_store` for the destructor's batch free.

**Result: 4 successive full renders complete cleanly, decoded output matches
PIZ/ZIP renders of the same scene.** This is what we're shipping locally
as a workaround.

## Hypothesis on actual root cause

Given that (1) shows the encoder does not overrun its own buffer, (2) shows
that chunk reuse is not required for the crash, and (3) shows that isolating
slots eliminates the symptom, the bug is most likely a write past one slot's
`needed_bytes` that lands in the **next** packed slot's `coded_lists` header.
Candidates we considered but couldn't conclusively pin down:

- `bit_write_buf::ccl` (used for packet headers via `bb_put_bit` /
  `bb_expand_buf`) retains a `coded_lists*` across multiple writes. Its
  bounds check (`buf[buf_size - avail_size]` then `--avail_size`) looks
  consistent. However, if some path manipulates `bbp->ccl` against a chained
  next_list that was allocated AFTER a codeblock data slot, the two are
  adjacent in the chunk, so any overrun of one lands in the other.
- `coded_cb_header::next_coded` retains pointers across precinct/resolution
  state transitions; if any of these is dereferenced after the chunk's data
  pointer has been advanced past it, a write through it could land in a
  neighboring slot. (We haven't reproduced this directly.)
- An off-by-one or signed/unsigned arithmetic bug in a less-traveled code
  path that targets the scalar encoder's output handling.

We have not been able to pin down the exact offending write — both because
the surface to audit is large and because the symptom only manifests on
specific image content, making instrumented diff-runs awkward.

## What we'd find useful

- Maintainer eyes on the candidate code paths above.
- Maybe a more targeted assert (e.g., poisoning the byte at
  `cur_store->data - 1` after each `get_buffer` and re-checking it on the
  next call) to identify the write site precisely. We tried sentinel bytes
  with the original packed layout — they got clobbered (which is what
  pointed us at intra-chunk aliasing in the first place) but the loop only
  reported the first clobbered offset, so a single byte-difference sentinel
  + breakpoint via `__builtin_trap` might be more surgical.
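
One possible shape for such a check is sketched below. Everything in it is hypothetical (`canary_chunk` and its members do not exist in OpenJPH), and unlike the `cur_store->data - 1` idea it assumes a few extra bytes are reserved after each payload, so that legitimate writes never touch the canary. It verifies the previous slot's canary at the start of the next `get_buffer`, where a real debug build could call `__builtin_trap()` instead of returning `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical debugging allocator; none of these names exist in OpenJPH.
class canary_chunk {
public:
  static constexpr std::uint32_t kCanary = 0xA5C3F00Du;

  explicit canary_chunk(std::size_t n) : bytes_(n, 0) {}

  // Verifies the previous slot's canary, then carves a new slot laid out as
  // [payload][canary]. Returns nullptr if the previous canary was clobbered.
  unsigned char* get_buffer(std::size_t needed_bytes) {
    if (!last_canary_intact()) return nullptr;  // debug build: trap here
    assert(used_ + needed_bytes + sizeof kCanary <= bytes_.size());
    unsigned char* buf = bytes_.data() + used_;
    last_canary_off_ = used_ + needed_bytes;
    const std::uint32_t c = kCanary;
    std::memcpy(bytes_.data() + last_canary_off_, &c, sizeof c);
    used_ = last_canary_off_ + sizeof c;
    has_slot_ = true;
    return buf;
  }

  // True if no slot exists yet or the most recent slot's canary is unchanged.
  bool last_canary_intact() const {
    if (!has_slot_) return true;
    std::uint32_t c;
    std::memcpy(&c, bytes_.data() + last_canary_off_, sizeof c);
    return c == kCanary;
  }

private:
  std::vector<unsigned char> bytes_;
  std::size_t used_ = 0;
  std::size_t last_canary_off_ = 0;
  bool has_slot_ = false;
};
```

This only narrows the window to "between two consecutive `get_buffer` calls"; pairing it with a hardware watchpoint on the canary address would be needed to stop at the writing instruction itself.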

## Workaround patch (for users hitting this)

This is a **workaround, not a root-cause fix** — it just eliminates the
aliasing surface so the bug becomes silent. Memory cost: replaces the 1 MB
chunk pool with per-slot mallocs; typical peak working set per in-flight
codestream is a few MB.

```diff
--- a/src/core/others/ojph_mem.cpp
+++ b/src/core/others/ojph_mem.cpp
@@ -93,35 +93,60 @@
                                   ui32 extended_bytes)
   {
     ui32 bytes = ojph_max(extended_bytes, chunk_size);
-    if (avail != NULL && avail->orig_size >= bytes)
-    {
-      *list = avail;
-      avail = avail->next_store;
-      (*list)->restart();
-      return *list;
-    }
-    else
-    {
-      ui32 store_bytes = stores_list::eval_store_bytes(bytes);
-      *list = (stores_list*) malloc(store_bytes);
-      total_allocated += store_bytes;
-      return new (*list) stores_list(bytes);
-    }
+    // avail-list reuse is disabled: external callers (precinct state,
+    // bit_write_buf::ccl, coded_cb_header::next_coded) retain coded_lists*
+    // pointers into a chunk across restart() boundaries. Reusing the chunk
+    // via the avail list then places fresh coded_lists over memory still
+    // aliased by those stale pointers; writes through them clobber the new
+    // tenant. Always allocate fresh.
+    ui32 store_bytes = stores_list::eval_store_bytes(bytes);
+    *list = (stores_list*) malloc(store_bytes);
+    total_allocated += store_bytes;
+    return new (*list) stores_list(bytes);
   }

   ////////////////////////////////////////////////////////////////////////////
   void mem_elastic_allocator::get_buffer(ui32 needed_bytes, coded_lists* &p)
   {
+    // Each get_buffer gets its own malloc'd store sized to fit only this
+    // slot (no packing, no chunk_size rounding). Stops adjacent slots from
+    // aliasing each other.
     ui32 extended_bytes = needed_bytes + (ui32)sizeof(coded_lists);
+    ui32 store_bytes = stores_list::eval_store_bytes(extended_bytes);
+    stores_list *fresh = (stores_list*) malloc(store_bytes);
+    total_allocated += store_bytes;
+    new (fresh) stores_list(extended_bytes);

     if (store == NULL)
-      cur_store = store = allocate(&store, extended_bytes);
-    else if (cur_store->available < extended_bytes)
-      cur_store = allocate(&cur_store->next_store, extended_bytes);
+      store = fresh;
+    else
+      cur_store->next_store = fresh;
+    cur_store = fresh;

     p = new (cur_store->data) coded_lists(needed_bytes);

-    assert(cur_store->available >= extended_bytes);
     cur_store->available -= extended_bytes;
     cur_store->data += extended_bytes;
   }
```

## Environment

- macOS 26 / Apple Silicon (M-series, native arm64)
- AppleClang 21 (Xcode 26)
- OpenJPH 0.26.3 (vendored in OpenEXR 3.4.11) — also confirmed with 0.27.0
  as a standalone dylib
- `-O3 -DNDEBUG`, standard build flags via OpenEXR CMake with
  `-DOPENEXR_FORCE_INTERNAL_OPENJPH=ON`

