refactor: extract wsdb + world_state + persistent merkle into native-packages/ #24311

Open
charlielye wants to merge 16 commits into cl/ipc-6-memory-merkle-db from cl/native-packages-wsdb

@charlielye

Contributor

Summary

Spike that lifts the WSDB service and the storage code only it needs out of barretenberg into a new top-level native-packages/wsdb/. After this, barretenberg keeps only the crypto + merkle generic core (and the wsdb client, which prod bb-avm-sim uses); the world-state DB server, WorldState, and the persistent/content-addressed merkle trees live in native-packages/wsdb/cpp, built standalone against prebuilt barretenberg.

Builds on #24306 (which decoupled vm2 from world_state via the in-memory MemoryMerkleDB reference).

Boundary

Stays in barretenberg (bb consumes it, so it must):

  • crypto primitives + crypto/merkle_tree/ generic core (hash, hash_path, memory_tree, response, signal, types, merkle_tree_id, hoisted indexed_leaf.hpp + tree_meta.hpp).
  • lmdblib/ — bb's generic merkle_tree/types.hpp includes lmdblib/types.hpp for DBStats, so it stays; native-packages consumes it from bb (the allowed direction).
  • the wsdb client (wsdb_ipc_client.hpp + generated client) used by the prod bb-avm-sim binary and the vm2_wsdb adapter.

Moves to native-packages/wsdb/:

  • the TS package (from top-level /wsdb/) + the wsdb_schema.jsonc wire contract (bb's client codegen is repointed at the external schema — an accepted, contained dependency inversion).
  • the wsdb server (handlers/scheduler/ipc-server/cli/main), world_state/ (wholesale), and persistent merkle (content_addressed_*, node_store/, lmdb_store/, nullifier_tree/), plus the two tree benchmarks and the MemoryMerkleDB equality test.

Build

native-packages/wsdb/cpp/CMakeLists.txt builds the moved server + storage standalone, linking barretenberg's prebuilt libbarretenberg.a + libenv.a (the bb-external archive structurally excludes the merkle/lmdblib objects, so the full archive is used for now). native-packages/wsdb/bootstrap.sh and the root Makefile/bootstrap.sh are rewired to build the cpp here instead of copying aztec-wsdb out of bb's build.

Validation

  • bb default build compiles with the moved code removed — proves the boundary is clean; client codegen reads native-packages/wsdb/wsdb_schema.jsonc.
  • No bb-side references to the moved dirs remain.
  • Standalone aztec-wsdb builds (host arch) against prebuilt bb.
  • Relocated equality test 7/7 green (full suite 157/157).

Known follow-ups (out of scope for the spike)

  • Cross-arch release build (host-arch only here; needs per-arch prebuilt bb artifacts).
  • @aztec/wsdb TS package build not exercised locally (CI will exercise it).
  • Long-term: consume bb from published release artifacts (lib + a curated header subset) rather than the in-tree build.
  • merkle_tree_audit_scope.md still lists the moved files at their old bb paths.

@charlielye changed the title from "spike: extract wsdb + world_state + persistent merkle into native-packages/" to "refactor: extract wsdb + world_state + persistent merkle into native-packages/" on Jun 25, 2026
@charlielye force-pushed the cl/native-packages-wsdb branch 2 times, most recently from c5971c8 to 873f2cf, on Jun 25, 2026 23:02
@charlielye force-pushed the cl/ipc-6-memory-merkle-db branch from c06723b to 815f8ac on Jun 26, 2026 13:36
@charlielye force-pushed the cl/native-packages-wsdb branch from 873f2cf to ded0386 on Jun 26, 2026 14:18
@socket-security

socket-security Bot commented Jun 27, 2026

Same YN0028 frozen-lockfile issue as barretenberg/ts: the root workspace entry
(@aztec/aztec3-packages@workspace:.) was mis-ordered rather than at yarn 4.13's
canonical position after cacheKey. Pure relocation, no dependency changes.
…tive-packages

Lift the wsdb service, world_state engine, persistent content-addressed merkle
storage, and the lmdb tree store out of barretenberg into a top-level
native-packages/wsdb package. bb keeps only the crypto/merkle-tree generic core
(hoisting indexed_leaf.hpp and tree_meta.hpp up from their subdirs), lmdblib
(still consumed by nodejs_module and the generic types.hpp), and the wsdb IPC
client used by the AVM simulator.

The wsdb wire schema moves to native-packages/wsdb and bb's client codegen is
repointed at that external path (a deliberate, isolated build-order inversion).
The wire-conversion helpers are split: the client-safe converters (generic
merkle vocabulary only) stay in bb; the world_state-aggregate converters move
with the server.

bb default preset builds cleanly with the moved code removed.

Add native-packages/wsdb/cpp/CMakeLists.txt that codegens the server dispatch
from the local schema and links the moved world_state + persistent merkle +
server against bb's prebuilt static archives (libbarretenberg.a + libenv.a;
bb-external is unusable here since it omits the crypto_merkle_tree and lmdblib
objects). The relocated equality test (and the moved persistent-merkle /
world_state tests) build into wsdb_tests, linking MemoryMerkleDB from bb's
libvm2_sim.a against the now-local WorldState.

Rename bb's client-side wire-convert header to wsdb_wire_convert_client.hpp so
the native-packages server header (same logical path) can include it without a
self-include collision.

Repoint the bootstrap.sh and root Makefile/bootstrap wsdb targets at
native-packages/wsdb, building the cpp via the new CMakeLists instead of copying
aztec-wsdb out of the bb build.

The server pulled in bb's client wire-convert header, which dragged in bb's
generated wsdb_types.hpp alongside the server's own generated wsdb_types.hpp —
identical structs in bb::wsdb from two files, an ODR redefinition. Each side
codegens its own types, so the converters can't share one header; the server
now has a local copy bound to its own generated types.

The wsdb CMakeLists linked lmdb/msgpack/tracy headers and liblmdb.a from
barretenberg's build-internal _deps/ directories. Those FetchContent
internals are not part of bb's cached/published CI artifacts, so the build
failed in CI (only ${BB_BUILD}/lib is cached).

Fetch and build our own lmdb (liblmdb.a), msgpack-c, tracy, and gtest
headers, pinned to the exact commits/tags barretenberg uses so headers and
ABI line up with the prebuilt libbarretenberg.a. Depend on bb only for its
published static archives under ${BB_BUILD}/lib. Enable the C language so
CMAKE_C_COMPILER is populated for lmdb's ExternalProject build.

- native-packages/wsdb/cpp: force -stdlib=libc++ (compile+link) to match bb's
  archives; the system clang otherwise links libstdc++, leaving bb's std::__1::
  symbols undefined.
- yarn-project: repoint @aztec/wsdb portals from ../wsdb/ts to
  ../native-packages/wsdb/ts (the extraction moved the package but left the
  workspace reference stale) and regenerate yarn-project/yarn.lock. No third-party
  version changes; 'yarn install --immutable' verified passing.

Flatten the barretenberg/ source root and the redundant crypto/ level into
per-module dirs (src/{world_state,merkle_tree,wsdb,benchmark}). Move the wsdb
fidelity test out of vm2/simulation/lib into src/wsdb. Delete the obsolete
barretenberg_module CMakeLists. The package's own headers now use short include
paths; bb dependency headers (ecc, numeric, poseidon2, serialize, the generic
merkle headers, vm2/*) keep barretenberg/. Update CMakeLists source paths and
the wsdb server codegen --cpp-include-dir to match.

The wsdb move to native-packages/wsdb left old top-level wsdb/ts paths in
aztec-up/bootstrap.sh (hash dep, package dirs), release-image/bootstrap.sh, and
the two release-image dockerignore whitelists — breaking the release-image docker
build (portal manifest not in context) and aztec-up's npm-deploy loop. Repoint
all to native-packages/wsdb.

bb's archives are libc++, so the standalone build links -stdlib=libc++, but it
was linking libc++ *dynamically*, so aztec-wsdb needed libc++.so.1/libc++abi.so.1
at runtime. In a minimal environment (a scaffolded user project, the aztec-up
default_scaffold test) those aren't present, so the binary failed to launch
(exit 127: 'aztec-wsdb exited before IPC connection was ready'). bb's own
aztec-wsdb static-links libc++ and is self-contained; match that with
-static-libstdc++ -static-libgcc.

Completes the native-packages spike: barretenberg now holds zero
database/storage code.

- native-packages/lmdblib: the LMDB C++ wrapper, built standalone against
  prebuilt bb (liblmdblib.a + liblmdb.a), consumed by kvdb and wsdb.
- native-packages/kvdb: the Node NAPI addon (nodejs_module.node, LMDBStore +
  msgpack_client) + a thin @aztec/kvdb TS package that ships the addon per-arch
  and resolves it at runtime.
- Decouple bb's generic merkle core from lmdb: DBStats is now a plain POD in
  crypto/merkle_tree/db_stats.hpp (no lmdb.h), so the in-memory merkle path no
  longer compiles lmdb headers.
- Remove lmdblib + nodejs_module from barretenberg (CMake, bootstrap, archive).
- Rewire wsdb to link the lmdblib package instead of bb's archive.
- yarn-project/native loads the addon from @aztec/kvdb; bb.js keeps its own copy
  of the addon for its SHM transport until that migrates to ipc-runtime.
- Wire root Makefile/bootstrap, aztec-up and release-image for the new packages.
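
To illustrate the DBStats decoupling described in the list above, here is a minimal sketch, assuming the generic side only needs a plain struct and that conversion from LMDB's statistics can live entirely on the storage side. The field names of `DBStats`, the `MdbStatLike` stand-in (which mirrors the fields of LMDB's `MDB_stat` so the sketch compiles without `lmdb.h`), and the `to_db_stats` helper are all hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Generic side: what a header such as db_stats.hpp could contain (no lmdb.h).
// Field names are illustrative assumptions.
struct DBStats {
    std::string name;
    std::uint64_t numDataItems = 0;
    std::uint64_t totalUsedSize = 0;
};

// Storage side: only this code would need the LMDB header. This stand-in
// mirrors the fields of LMDB's MDB_stat so the example is self-contained.
struct MdbStatLike {
    unsigned int ms_psize;         // page size in bytes
    unsigned int ms_depth;         // B-tree depth
    std::size_t ms_branch_pages;   // internal pages
    std::size_t ms_leaf_pages;     // leaf pages
    std::size_t ms_overflow_pages; // overflow pages
    std::size_t ms_entries;        // number of data items
};

// Hypothetical converter: the used-size formula (pages * page size) is an
// assumption made for this sketch.
inline DBStats to_db_stats(std::string name, const MdbStatLike& stat)
{
    const std::uint64_t pages = stat.ms_branch_pages + stat.ms_leaf_pages + stat.ms_overflow_pages;
    return DBStats{ std::move(name), stat.ms_entries, pages * stat.ms_psize };
}
```

With this split, code that only includes the generic struct never sees an LMDB type, while the storage package keeps the single place where LMDB statistics are translated.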
Addresses review: native-packages should not depend on barretenberg.

- lmdblib: zero barretenberg includes/links. Uses msgpack-c directly
  (MSGPACK_DEFINE_MAP — byte-identical wire to bb's SERIALIZATION_FIELDS);
  serialise_key for field-like keys is now a template (preserves the on-disk
  byte layout) so no uint256 type is needed; owns DBStats; local format/THROW
  helpers. Builds against lmdb + msgpack-c only.
- kvdb: zero barretenberg includes/links. Split nodejs_module — only the
  LMDBStore NAPI lives here; the msgpack_client wrappers stay in barretenberg
  for bb.js's SHM transport. Vendors the small message header/dispatcher it
  needs; links only lmdblib + lmdb + node-addon-api.
- bb.js: reverted — it builds and ships its own nodejs_module.node (msgpack_client
  only) from barretenberg exactly as before. No native-packages dependency.
- DBStats is owned by lmdblib (lmdb stats). TreeDBStats and the stats-bearing
  merkle responses move to the wsdb package (only the persistent trees report
  stats); barretenberg's generic merkle core carries no stats vocabulary.
- Build wiring: lmdblib and kvdb no longer depend on bb-cpp-native; yarn-project
  depends on kvdb; bb-ts reverts to no kvdb dependency.

npm_install_deps does a clean 'yarn install --immutable' on a cache miss; the
@aztec/kvdb-<arch> optionalDependencies must resolve to the local ts/packages/*
workspaces (created by prepare_arch_packages) or the install 404s against npm.
Run prepare_arch_packages before npm_install_deps.

@charlielye force-pushed the cl/native-packages-wsdb branch from 41c615f to 58d32a8 on Jun 28, 2026 14:26

wsdb moved to native-packages, so aztec-wsdb is no longer a barretenberg target.
The amd64/arm64 linux/macos build presets still listed it, so the cross builds
(only run under the full CI profile) failed with 'ninja: error: unknown target
aztec-wsdb'.

The standalone wsdb/lmdblib/kvdb builds set no CMAKE_BUILD_TYPE, so they compiled
bb's headers (inline/template crypto + field code) without NDEBUG and without -O3,
while bb's prebuilt archive is Release (-O3 -DNDEBUG). bb headers are compiled into
both, so the NDEBUG/opt mismatch is an ODR violation that can silently diverge
inlined results (e.g. tree leaf hashing) -> the wsdb NullifierTree failing to find
a leaf pre-image by hash in e2e. Default these builds to Release to match bb.
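
The mechanism behind this mismatch can be sketched with a hypothetical header-inline helper. `reduce_once` and `compiled_with_ndebug` are invented for illustration and are not bb functions. The point is only that a header function containing `assert` has a different body in a translation unit built with `-DNDEBUG` than in one built without it, so linking the two together breaks the one-definition rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header-inline helper. Under -DNDEBUG the assert expands to
// nothing, so the body differs between debug and release translation units.
inline std::uint64_t reduce_once(std::uint64_t x, std::uint64_t modulus)
{
    assert(modulus != 0 && x < 2 * modulus); // compiled out when NDEBUG is defined
    return x >= modulus ? x - modulus : x;
}

// Reports which variant of the header this translation unit compiled.
constexpr bool compiled_with_ndebug()
{
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}
```

If a prebuilt archive and a consumer disagree on `compiled_with_ndebug()`, the linker may keep either copy of each inline function, which is why building both sides with the same configuration is the simplest remedy.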
decrement_node_reference_count passed the raw fr nodeHash to delete_value,
while write_node/read_node wrap it in FrKeyType (uint256). In bb the raw fr
went through serialise_key's explicit uint256_t overload (implicit fr->uint256
conversion), so all three agreed. The extracted lmdblib replaced that overload
with a generic serialise_key<T> template, which binds fr exactly and emits the
field's Montgomery-form bytes instead of canonical uint256 bytes. Node deletes
then targeted the wrong key, collaterally deleting retained nodes during unwind
and corrupting the tree (the l1_to_l2 cross_chain e2e regression). Wrap the key
in FrKeyType at the delete site so all node ops serialise the key identically.
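
The overload-versus-template behaviour described above can be reproduced in a small sketch. Everything here is a hypothetical stand-in: `U256` and `Fr` are toy one-limb types, the Montgomery transform uses toy parameters (p = 251, R = 5), and the function names `serialise_key_overload`, `serialise_key_generic` and `delete_key_bytes` are invented to contrast the two dispatch styles. None of this is the real lmdblib code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy parameters: p = 251, R = 256 mod p = 5, R^-1 mod p = 201.
constexpr std::uint64_t kP = 251, kR = 5, kRInv = 201;

// Stand-in for a canonical integer key (one limb for brevity).
struct U256 {
    std::uint64_t limb;
};

// Stand-in for a field element stored in Montgomery form.
struct Fr {
    std::uint64_t internal; // value * R mod p
    static Fr from_canonical(std::uint64_t v) { return Fr{ (v * kR) % kP }; }
    // Implicit conversion back to the canonical integer, like fr -> uint256.
    operator U256() const { return U256{ (internal * kRInv) % kP }; }
};

using FrKeyType = U256; // key type used by the write/read paths in this sketch

template <typename T> Bytes raw_bytes(const T& v)
{
    Bytes out(sizeof(T));
    std::memcpy(out.data(), &v, sizeof(T));
    return out;
}

// Before: a non-template overload. An Fr argument is converted to U256 first.
inline Bytes serialise_key_overload(const U256& key) { return raw_bytes(key); }

// After: a generic template. T deduces to Fr exactly, no conversion runs,
// and the internal (Montgomery-form) bytes are emitted.
template <typename T> Bytes serialise_key_generic(const T& key) { return raw_bytes(key); }

// The fix: convert at the call site so the delete path serialises a FrKeyType.
inline Bytes delete_key_bytes(const Fr& node_hash)
{
    const FrKeyType key = node_hash; // explicit choice of the canonical form
    return serialise_key_generic(key);
}
```

In this sketch, passing an `Fr` straight to the template yields different bytes from the overload, while `delete_key_bytes` matches it again, which mirrors why wrapping the key at the delete site makes all node operations agree.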

Also add native-packages/.clang-format (copied from barretenberg/cpp) so the
new C++ packages format to bb's style.
@AztecBot

Collaborator

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/3e57af2b590ce973): yarn-project/kv-store/scripts/run_test.sh src/sqlite-opfs/internal/ordered-binary-browser.test.ts (1s) (code: 0)
