fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS #23469
Draft
AztecBot wants to merge 1 commit into
Summary
`aztec start --local-network` reliably SIGBUSes a few blocks into a run on macOS arm64 (since `v5.0.0-nightly.20260520`, i.e. after #21625 shipped the `shared_ptr` use-after-free fix). This is a different fault from the one #21625 fixed: a stack-guard violation (stack overflow) on a `nodejs_module.node` worker thread running AVM-simulation code, not a use-after-free.

This pins an explicit, generous stack size on the `ThreadedAsyncOperation` worker thread.

Root cause
`ThreadedAsyncOperation::Queue()` (introduced in #21138) runs the AVM simulation (`_fn`) directly on a bare `std::thread(...).detach()`. A `std::thread` uses the OS default stack for non-main threads, which is 512 KB on macOS versus 8 MB on Linux. The AVM-simulation call chain is deep enough to overflow 512 KB, so on macOS arm64 the worker writes into its stack-guard page and the process aborts with SIGBUS.

Linux is unaffected because its 8 MB default is comfortably large. The previous `AsyncOperation` path never hit this either: it ran on the libuv threadpool, whose threads are sized from `RLIMIT_STACK` (8 MB soft on macOS), not the 512 KB raw-thread default.

Fix
`std::thread` offers no way to set a stack size, so the worker is launched via pthreads with `pthread_attr_setstacksize` pinned to a generous `WORKER_STACK_SIZE` (32 MB: 4× the 8 MB that the libuv path proved sufficient, with headroom for deeper future call chains). It falls back to a default-stack `std::thread` only if pthreads is unavailable (`_WIN32`) or `pthread_create` fails.

The `shared_ptr` lifetime model from #21625 is preserved exactly: both the worker lambda and the `BlockingCall` completion callback still capture `self`, so this does not reintroduce the use-after-free. Only the thread-launch mechanism changed.

Testing
- `aztec start --local-network` run to confirm the crash is gone.
- `std::function` trampoline was compiled and run standalone under `-std=c++20 -Wall -Wextra -Werror`: the worker thread receives a 32 MB stack (`pthread_get_stacksize_np` reports `33554432`), and the work runs and completes.

Notes for reviewers
- `next` (not `merge-train/barretenberg`) to match the base of #21625 (fix: use shared_ptr in ThreadedAsyncOperation to prevent SIGBUS on macOS) and to make the nightly, since this is an urgent release-affecting crash. Happy to retarget if you'd prefer it go through the merge train.
- `getrlimit(RLIMIT_STACK)`. The fixed constant is simpler and the virtual reservation only commits pages as touched.

Related: #21138 (introduced the threaded model), #21625 (use-after-free fix), #21629 (open alternative).
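
For readers who want to see the shape of the launch mechanism described under Fix, here is a minimal standalone sketch. It is not the code in this PR: `launch_detached_worker`, `worker_trampoline`, `current_thread_stack_size`, and `measure_worker_stack_size` are hypothetical names chosen for illustration, and only `WORKER_STACK_SIZE` and its 32 MB value come from the description. The sketch heap-allocates the `std::function`, hands it to a pthread created with `pthread_attr_setstacksize`, and falls back to a default-stack `std::thread` if that path is unavailable or fails.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

#ifndef _WIN32
#include <pthread.h>
#endif

// 32 MB, the value named in the description above.
constexpr std::size_t WORKER_STACK_SIZE = 32u * 1024u * 1024u;

#ifndef _WIN32
// pthread entry point (hypothetical name): takes ownership of the
// heap-allocated callable, runs it, and frees it on return.
extern "C" void* worker_trampoline(void* arg)
{
    std::unique_ptr<std::function<void()>> fn(static_cast<std::function<void()>*>(arg));
    (*fn)();
    return nullptr;
}
#endif

// Runs `work` on a detached thread. Returns true if the pinned-stack pthread
// path was used, false if it fell back to a default-stack std::thread.
bool launch_detached_worker(std::function<void()> work)
{
#ifndef _WIN32
    auto boxed = std::make_unique<std::function<void()>>(std::move(work));
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) == 0) {
        bool ok = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE) == 0 &&
                  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0;
        pthread_t tid;
        ok = ok && pthread_create(&tid, &attr, worker_trampoline, boxed.get()) == 0;
        pthread_attr_destroy(&attr);
        if (ok) {
            (void)boxed.release(); // the trampoline owns the callable now
            return true;
        }
    }
    work = std::move(*boxed); // pthread path failed: recover the callable
#endif
    std::thread(std::move(work)).detach();
    return false;
}

// Best-effort stack size of the calling thread; 0 where no query is known.
std::size_t current_thread_stack_size()
{
#if defined(__APPLE__)
    return pthread_get_stacksize_np(pthread_self());
#elif defined(__linux__)
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstacksize(&attr, &size);
        pthread_attr_destroy(&attr);
    }
    return size;
#else
    return 0;
#endif
}

// Launches a worker that reports its own stack size back to the caller.
// The promise is held by shared_ptr because std::function needs a copyable callable.
std::size_t measure_worker_stack_size()
{
    auto result = std::make_shared<std::promise<std::size_t>>();
    std::future<std::size_t> fut = result->get_future();
    launch_detached_worker([result] { result->set_value(current_thread_stack_size()); });
    return fut.get();
}
```

Calling `measure_worker_stack_size()` from a small `main` is one way to check that the worker really gets the larger stack, similar in spirit to the standalone check listed under Testing. Depending on the toolchain, the build may need `-pthread`.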
Created by claudebox · group: slackbot