feat: kv-store over IPC; aztec-kvdb binary; LMDBStore NAPI scaffold [PR 4] #23238

Closed
charlielye wants to merge 0 commits into `cl/ipc-4-avm-binary` from `cl/ipc-6-kvdb`

@charlielye

Summary

Stacked on PR 3b (#23196). Adds an out-of-process `aztec-kvdb` binary that owns LMDB and serves it over UDS / MPSC SHM with the same `TypedMessage` wire format that `yarn-project/native/MsgpackChannel` already speaks. `AztecLMDBStoreV2` will be migrated from `NativeLMDBStore` (NAPI) to `KvdbBackend` (IPC) in a follow-up commit on this same PR.

This first commit is inert: the binary builds and is shipped, but nothing in yarn-project uses it yet.

Why

After PR 3b lands, only one load-bearing NAPI consumer remains: the LMDB store used by archiver, p2p, pxe, slasher, and validator-ha-signer. Moving it out of NAPI removes the last embedding of a C++ subsystem in the Node.js process. The NAPI module is reduced to a thin SHM transport stub (`MsgpackClient`/`MsgpackClientAsync`, ~400 LOC) — pure IPC plumbing, no domain logic.

SHM is the production transport (~1–10 µs round-trip via futex doorbell); UDS is the dev/test fallback. This also activates the SHM code-path that has been dead in production since PR 1.

What's in this PR (so far)

C++:

  • `barretenberg/cpp/src/barretenberg/kvdb/` — `aztec-kvdb` binary. Wraps `lmdblib::LMDBStore`, runs a `bb::messaging::MessageDispatcher`, exposes the same op set as the NAPI LMDBStoreWrapper (open_database, get, has, start_cursor, advance_cursor, advance_cursor_count, close_cursor, batch, stats, copy_store).
  • `barretenberg/cpp/src/barretenberg/kvdb/kvdb_messages.hpp` — wire schema. Moved from `nodejs_module/lmdb_store/lmdb_store_message.hpp`; namespace renamed `bb::nodejs::lmdb_store` → `bb::kvdb`, enum `LMDBStoreMessageType` → `KvdbMessageType`. The NAPI wrapper temporarily includes the new location until it's deleted.
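
For readers unfamiliar with the dispatch pattern, here is a rough, self-contained sketch of routing a typed message to a per-op handler. Only the op names are taken from the list above. `SketchOp`, `Payload`, `Handler` and `ToyDispatcher` are hypothetical names invented for illustration, and the enum values are assumptions. This is not the `bb::messaging::MessageDispatcher` API or the real `KvdbMessageType` definition.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Op names mirror the list above; the numeric values are assumptions.
enum class SketchOp : uint32_t {
    OPEN_DATABASE, GET, HAS, START_CURSOR, ADVANCE_CURSOR,
    ADVANCE_CURSOR_COUNT, CLOSE_CURSOR, BATCH, STATS, COPY_STORE
};

using Payload = std::vector<uint8_t>;                    // opaque encoded body
using Handler = std::function<Payload(const Payload&)>;  // request -> response

// Hypothetical dispatcher: maps an op code to the handler registered for it.
class ToyDispatcher {
  public:
    void on(SketchOp op, Handler handler)
    {
        assert(handler && "handler must be callable");
        handlers_[static_cast<uint32_t>(op)] = std::move(handler);
    }

    // Returns false when no handler is registered for `op`.
    bool dispatch(SketchOp op, const Payload& request, Payload& response) const
    {
        auto it = handlers_.find(static_cast<uint32_t>(op));
        if (it == handlers_.end()) {
            return false;
        }
        response = it->second(request);
        return true;
    }

  private:
    std::unordered_map<uint32_t, Handler> handlers_;
};
```

In this sketch a server loop would decode the op code from the incoming message, call `dispatch`, and write the response back on the same transport. An unregistered op is reported to the caller rather than crashing the process.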

TypeScript (bb.js):

  • `barretenberg/ts/src/aztec-kvdb/index.ts` — `KvdbBackend`: spawns the binary, talks UDS or SHM, implements `IMsgpackBackendAsync`. Mirrors `WsdbBackend`.
  • `findKvdbBinary` in `bb_backends/node/platform.ts`.
  • `copy_native.sh` ships `aztec-kvdb` alongside the other binaries.

What's still ahead in this PR

  • Swap `new MsgpackChannel(new NativeLMDBStore(...))` → `new MsgpackChannel(new KvdbBackend(...))` inside `yarn-project/kv-store/src/lmdb-v2/store.ts`.
  • Update spawn sites (archiver, p2p, pxe, slasher, validator-ha-signer factories + `aztec-node/server.ts`) to construct/accept a `KvdbBackend`.
  • Default `useShm: true` in production callers; verify the SHM transport end-to-end.
  • Delete `barretenberg/cpp/src/barretenberg/nodejs_module/lmdb_store/`, the `LMDBStore` export in `init_module.cpp`, the `NativeLMDBStore` export in `@aztec/native`, and the thin `yarn-project/kv-store/src/lmdb-v2/native/` wrapper.

Stack

Test plan

  • `aztec-kvdb msgpack run --input /tmp/foo.sock --data-dir /tmp/lmdb-test` accepts UDS connections
  • `@aztec/kv-store` `lmdb-v2` tests pass against KvdbBackend over UDS and SHM
  • Bench hot `get`/cursor-page latency vs NAPI baseline (target ~5× over SHM)
  • e2e `e2e_block_building` / `e2e_archiver_` / `e2e_p2p/`
  • Sandbox boots and serves transactions

@socket-security

socket-security Bot commented May 13, 2026


Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Updated | npm/@aztec/noir-noir_js@1.0.0-beta.21 ⏵ 1.0.0-beta.20 | 100 | 100 | 100 | 100 | 100 |

@AztecBot

This issue was automatically closed because it was referenced in PR #23469 which has been merged to the default branch.

@AztecBot AztecBot closed this May 22, 2026
@charlielye charlielye reopened this May 29, 2026
@charlielye charlielye force-pushed the cl/ipc-5-avm-cutover branch from 1c8d4f9 to 6867e96 Compare May 29, 2026 13:58
Base automatically changed from cl/ipc-5-avm-cutover to cl/ipc-4-avm-binary May 29, 2026 13:58
@charlielye charlielye closed this May 29, 2026
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
…AztecProtocol#23469)

## Summary

`aztec start --local-network` reliably SIGBUSes a few blocks into a run
on macOS arm64 (since `v5.0.0-nightly.20260520`, i.e. after AztecProtocol#21625
shipped the `shared_ptr` use-after-free fix). This is a **different**
fault from the one AztecProtocol#21625 fixed: a stack-guard violation (stack
overflow) on a `nodejs_module.node` worker thread running AVM-simulation
code, not a use-after-free.

This pins an explicit, generous stack size on the
`ThreadedAsyncOperation` worker thread.

## Root cause

`ThreadedAsyncOperation::Queue()` (introduced in AztecProtocol#21138) runs the AVM
simulation (`_fn`) directly on a bare `std::thread(...).detach()`. A
`std::thread` uses the OS default stack for non-main threads, which is
**512 KB on macOS** versus **8 MB on Linux**. The AVM-simulation call
chain is deep enough to overflow 512 KB, so on macOS arm64 the worker
writes into its stack-guard page and the process aborts with:

```
EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
  #0 _platform_memmove
  #1.. nodejs_module.node  bb::nodejs (AVM simulation path)
```

Linux is unaffected because its 8 MB default is comfortably large. The
previous `AsyncOperation` path never hit this either: it ran on the
libuv threadpool, whose threads are sized from `RLIMIT_STACK` (8 MB soft
on macOS), not the 512 KB raw-thread default.
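
To check the platform default described above, a small probe like the following can help. It is a minimal sketch, assuming a POSIX system and assuming that `std::thread` creates its thread with default pthread attributes, as common standard library implementations do. `default_thread_stack_size` is a hypothetical helper name, not something in the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Returns the stack size (in bytes) that a thread created with default
// pthread attributes would receive, or 0 if the query fails.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    const int rc = pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return rc == 0 ? size : 0;
}
```

Printing this value on macOS and on Linux should show the difference that the root-cause analysis relies on. The exact numbers depend on the OS and the C library.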

## Fix

`std::thread` can't set a stack size, so launch the worker via
`pthreads` with `pthread_attr_setstacksize` pinned to a generous
`WORKER_STACK_SIZE` (32 MB — 4× the 8 MB that the libuv path proved
sufficient, with headroom for deeper future call chains). Falls back to
a default-stack `std::thread` only if pthreads is unavailable (`_WIN32`)
or `pthread_create` fails.
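
The launch mechanism described above can be sketched roughly as follows. This is an illustrative standalone version, not the code in this change. `worker_trampoline`, `launch_detached` and `demo_work_runs` are hypothetical names, and only `WORKER_STACK_SIZE` and its 32 MB value come from the description.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#ifndef _WIN32
#include <pthread.h>
#endif

constexpr std::size_t WORKER_STACK_SIZE = 32UL * 1024 * 1024; // 32 MB

#ifndef _WIN32
// pthread entry point: takes ownership of the heap-allocated callable and runs it.
extern "C" void* worker_trampoline(void* arg)
{
    std::unique_ptr<std::function<void()>> work(static_cast<std::function<void()>*>(arg));
    (*work)();
    return nullptr;
}
#endif

// Runs `work` on a detached thread. Returns true if the pinned-stack pthread
// path was used, false if it fell back to a default-stack std::thread.
bool launch_detached(std::function<void()> work, std::size_t stack_size = WORKER_STACK_SIZE)
{
    assert(work && "work must be callable");
#ifndef _WIN32
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) == 0) {
        // Copy the callable so `work` stays usable by the fallback below.
        auto* heap_work = new std::function<void()>(work);
        pthread_t tid;
        const bool ok = pthread_attr_setstacksize(&attr, stack_size) == 0 &&
                        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0 &&
                        pthread_create(&tid, &attr, worker_trampoline, heap_work) == 0;
        pthread_attr_destroy(&attr);
        if (ok) {
            return true;
        }
        delete heap_work; // the thread never started, so ownership stays here
    }
#endif
    std::thread(std::move(work)).detach();
    return false;
}

// Usage example: launch a job and wait (up to 5 s) for it to signal completion.
bool demo_work_runs()
{
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    launch_detached([done] { done->set_value(); });
    return finished.wait_for(std::chrono::seconds(5)) == std::future_status::ready;
}
```

The usage example captures the promise through a `shared_ptr` so the detached worker keeps it alive for as long as it needs it, which is the same idea as having the worker capture `self`.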

The shared_ptr lifetime model from AztecProtocol#21625 is preserved exactly — both
the worker lambda and the `BlockingCall` completion callback still
capture `self`, so this does not reintroduce the use-after-free. Only
the thread-launch mechanism changed.

## Testing

- The full bb build is too heavy to run in this session, so this is
**not yet a local end-to-end repro/fix verification** — it relies on CI
for compilation and on a macOS arm64 `aztec start --local-network` run
to confirm the crash is gone.
- The pthread/`std::function` trampoline was compiled and run standalone
under `-std=c++20 -Wall -Wextra -Werror`: the worker thread receives a
32 MB stack (`pthread_get_stacksize_np` reports `33554432`), and the
work runs and completes.
- **Requested:** verify against tonight's nightly on macOS arm64 (M3) —
the reporter's exact repro.

## Notes for reviewers

- Targets `next` (not `merge-train/barretenberg`) to match AztecProtocol#21625's base
and to make the nightly, since this is an urgent release-affecting
crash. Happy to retarget if you'd prefer it go through the merge train.
- 32 MB is a deliberate over-provision; if you'd rather mirror the libuv
path precisely we could instead size from `getrlimit(RLIMIT_STACK)`. The
fixed constant is simpler and the virtual reservation only commits pages
as touched.
- The longer-term fix is the NAPI→IPC migration (AztecProtocol#21331 / AztecProtocol#23196 /
AztecProtocol#23238), which removes this in-process worker entirely. This is a
targeted stop-gap for the shipping NAPI path.
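
For comparison, the `getrlimit(RLIMIT_STACK)` alternative mentioned in the notes above could look roughly like this. It is a sketch assuming a POSIX system. `stack_size_from_rlimit` and `FALLBACK_STACK_SIZE` are hypothetical names, and the 8 MB fallback is an assumption chosen to match the libuv figure quoted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

constexpr std::size_t FALLBACK_STACK_SIZE = 8UL * 1024 * 1024; // assumed fallback

// Picks a worker stack size from the soft RLIMIT_STACK limit. Falls back to
// `fallback` when the limit cannot be read, is unlimited, or is zero.
std::size_t stack_size_from_rlimit(std::size_t fallback = FALLBACK_STACK_SIZE)
{
    assert(fallback > 0 && "fallback must be a usable stack size");
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return fallback;
    }
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur == 0) {
        return fallback;
    }
    return static_cast<std::size_t>(lim.rlim_cur);
}
```

The trade-off is that this value follows the user's `ulimit -s` setting, so a small limit would shrink the worker stack, whereas a fixed constant behaves the same everywhere.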

Related: AztecProtocol#21138 (introduced the threaded model), AztecProtocol#21625 (use-after-free
fix), AztecProtocol#21629 (open alternative).

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/4bd36dc505c20254) ·
group: `slackbot`*