fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS #23469

Draft
AztecBot wants to merge 1 commit into next from cb/4bd36dc505c2

@AztecBot
Collaborator

Summary

aztec start --local-network reliably SIGBUSes a few blocks into a run on macOS arm64 (since v5.0.0-nightly.20260520, i.e. after #21625 shipped the shared_ptr use-after-free fix). This is a different fault from the one #21625 fixed: a stack-guard violation (stack overflow) on a nodejs_module.node worker thread running AVM-simulation code, not a use-after-free.

This pins an explicit, generous stack size on the ThreadedAsyncOperation worker thread.

Root cause

ThreadedAsyncOperation::Queue() (introduced in #21138) runs the AVM simulation (_fn) directly on a bare std::thread(...).detach(). A std::thread uses the OS default stack for non-main threads, which is 512 KB on macOS versus 8 MB on Linux. The AVM-simulation call chain is deep enough to overflow 512 KB, so on macOS arm64 the worker writes into its stack-guard page and the process aborts with:

EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
  #0 _platform_memmove
  #1.. nodejs_module.node  bb::nodejs (AVM simulation path)

Linux is unaffected because its 8 MB default is comfortably large. The previous AsyncOperation path never hit this either: it ran on the libuv threadpool, whose threads are sized from RLIMIT_STACK (8 MB soft on macOS), not the 512 KB raw-thread default.
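
To see the platform difference described above, one option is to ask pthreads what stack size a thread created with default attributes would get. This is a minimal sketch, not code from this PR; `default_thread_stack_size` and `print_default_thread_stack_size` are hypothetical helper names, and it assumes (as is the case for the common standard library implementations) that std::thread creates its thread with default pthread attributes.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of the PR): returns the stack size in bytes
// that a thread created with default pthread attributes would receive, or 0
// if the query fails.
std::size_t default_thread_stack_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
}

// Prints the default in KB so the value can be compared across platforms.
void print_default_thread_stack_size() {
    std::printf("default thread stack: %zu KB\n", default_thread_stack_size() / 1024);
}
```

Running `print_default_thread_stack_size()` on macOS and on Linux should make the gap between the two defaults visible without needing the full AVM build.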

Fix

std::thread has no portable way to set a stack size, so the worker is now launched via pthreads with pthread_attr_setstacksize pinned to a generous WORKER_STACK_SIZE (32 MB: 4× the 8 MB that the libuv path proved sufficient, with headroom for deeper future call chains). It falls back to a default-stack std::thread only if pthreads is unavailable (_WIN32) or pthread_create fails.

The shared_ptr lifetime model from #21625 is preserved exactly — both the worker lambda and the BlockingCall completion callback still capture self, so this does not reintroduce the use-after-free. Only the thread-launch mechanism changed.
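
The capture-self pattern referred to here can be illustrated in isolation. The sketch below is a simplified stand-in, not the real ThreadedAsyncOperation: `SketchOperation` and `run_and_drop` are hypothetical names, a std::promise replaces the N-API BlockingCall completion callback, and a plain std::thread is used because the launch mechanism is irrelevant to the lifetime argument.

```cpp
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for an async operation whose worker must keep the
// operation object alive until it has finished.
class SketchOperation : public std::enable_shared_from_this<SketchOperation> {
  public:
    explicit SketchOperation(int input) : input_(input) {}

    // Starts the work on a detached thread and returns a future for the result.
    std::future<int> queue() {
        std::future<int> result = promise_.get_future();
        auto self = shared_from_this();  // keeps *this alive until the worker finishes
        std::thread([self] { self->promise_.set_value(self->input_ * 2); }).detach();
        return result;
    }

  private:
    int input_;
    std::promise<int> promise_;
};

// Drops the caller's reference right after queuing, so the worker's captured
// `self` becomes the only owner of the operation.
int run_and_drop(int input) {
    std::future<int> result;
    {
        auto op = std::make_shared<SketchOperation>(input);
        result = op->queue();
    }  // `op` goes out of scope here while the worker may still be running
    return result.get();
}
```

Because the lambda holds a shared_ptr, the object outlives the caller's handle regardless of how the thread was created, which is why swapping the launch mechanism should leave the lifetime model untouched.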

Testing

  • The full bb build is too heavy to run in this session, so the crash has not yet been reproduced or the fix verified end to end locally. Verification relies on CI for compilation and on a macOS arm64 aztec start --local-network run to confirm the crash is gone.
  • The pthread/std::function trampoline was compiled and run standalone under -std=c++20 -Wall -Wextra -Werror: the worker thread receives a 32 MB stack (pthread_get_stacksize_np reports 33554432), and the work runs and completes.
  • Requested: verify against tonight's nightly on macOS arm64 (M3) — the reporter's exact repro.
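
A standalone check along the lines of the second bullet might look like the sketch below. It is an assumption about how such a check could be written, not the program that was actually run: all function names are hypothetical, the macOS branch uses pthread_get_stacksize_np as mentioned above, and the Linux branch uses the GNU extension pthread_getattr_np so the same check can run in CI.

```cpp
#include <pthread.h>
#include <cstddef>

// Reports the stack size of the calling thread, or 0 on platforms this sketch
// does not cover. pthread_getattr_np is a GNU extension; g++ and clang++ on
// Linux define _GNU_SOURCE for C++ by default.
static std::size_t current_thread_stack_size() {
#if defined(__APPLE__)
    return pthread_get_stacksize_np(pthread_self());
#elif defined(__linux__)
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return 0;
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) size = 0;
    pthread_attr_destroy(&attr);
    return size;
#else
    return 0;
#endif
}

// Thread entry point: writes its own stack size into the size_t passed via `arg`.
extern "C" void* report_stack_size(void* arg) {
    *static_cast<std::size_t*>(arg) = current_thread_stack_size();
    return nullptr;
}

// Creates a joinable thread with the requested stack size and returns the size
// that thread observed (0 if creation failed).
std::size_t measure_worker_stack_size(std::size_t requested) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    std::size_t observed = 0;
    pthread_t tid;
    if (pthread_attr_setstacksize(&attr, requested) == 0 &&
        pthread_create(&tid, &attr, report_stack_size, &observed) == 0) {
        pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return observed;
}
```

Calling `measure_worker_stack_size(32u * 1024u * 1024u)` and comparing the result with the requested value gives a quick confirmation that the attribute took effect, independent of the AVM code path.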

Notes for reviewers

Related: #21138 (introduced the threaded model), #21625 (use-after-free fix), #21629 (open alternative).


Created by claudebox · group: slackbot

@AztecBot added the ci-barretenberg-full (Run all barretenberg checks.) and claudebox (Owned by claudebox. it can push to this PR.) labels on May 21, 2026