Commonware + BLS#1143
Conversation
|
Update: removed |
Two plans in 2 waves for broadcast, service routing, catch-up, and P2pHandle API preservation. Plan 01 (wave 1) defines P2pMessage, ServiceRouter, and RetryQueue types. Plan 02 (wave 2) integrates the broadcast Engine into bridge loops and adds integration tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tryQueue - 12 failing test stubs define behavior contract (Wave 0 Nyquist compliance) - Added commonware-broadcast = "2026.3.0" dependency to wavs Cargo.toml - P2pMessage tests: from_submission, codec roundtrip, digest determinism, to_submission - ServiceRouter tests: empty rejects, subscribe/accept, unsubscribe, subscribed_topics - RetryQueue tests: empty, push/drain FIFO, overflow drops oldest, drain empty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r, and RetryQueue - P2pMessage struct with service_id_bytes ([u8; 32]) and payload (Vec<u8>) - P2pMessage::from_submission/to_submission for ServiceId+Submission conversion - Digestible impl produces deterministic SHA-256 digest for Engine dedup (BCAST-02) - Codec traits (Write+EncodeSize+Read) for binary encode/decode over broadcast - ServiceRouter with subscribe/unsubscribe and should_accept filtering (BCAST-05) - RetryQueue with bounded 64-item FIFO queue and oldest-drop eviction (BCAST-04) - All 12 unit tests passing (P2pMessage + ServiceRouter + RetryQueue) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Created 02-01-SUMMARY.md with execution results - Updated STATE.md: Phase 2 position, decisions, metrics - Updated ROADMAP.md: Plan 02-01 checked off, Phase 2 in progress - Updated REQUIREMENTS.md: BCAST-02, BCAST-03, BCAST-04, BCAST-05 complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… architecture - Register two channels per network mode: channel 0 (Engine) and channel 1 (direct forwarding) - Create broadcast Engine with deque_size=128 (CATCH-02) in both lookup and discovery modes - Spawn inbound bridge tasks bridging commonware Receiver to tokio mpsc for select! compatibility - Add seen_digests HashSet for application-level message deduplication (BCAST-02) - Add service_router.should_accept() filtering on inbound messages (BCAST-05) - Forward decoded P2pMessages to Aggregator as AggregatorCommand::Receive - Pass aggregator_tx through P2pHandle::new -> spawn_commonware_runtime -> bridge loops - Document CATCH-01 push-based recovery behavior in Engine setup comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… and subscribe logic - Publish handler: broadcast via Engine (CATCH-01) + encode P2pMessage to bytes via Encode::encode() for direct_sender.send() - Publish handler: check ack_rx for zero recipients -> queue for retry (BCAST-04) - Publish handler: flush retry queue when peers available, re-broadcast queued messages - Subscribe/Unsubscribe handlers: update ServiceRouter for service filtering - GetStatus handler: return subscribed_topics from ServiceRouter - BlockPeer handler: block via Oracle (carried from Phase 1) - Remove handle_p2p_command_stub function -- all handlers now implemented inline - Both lookup and discovery bridge loops have identical handler implementations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etry, catch-up, and API - test_broadcast_to_all_peers (BCAST-01): verify message delivery between connected peers - test_service_filtering (BCAST-05): verify only subscribed-service messages forwarded - test_deduplication_by_digest (BCAST-02): verify exactly-once delivery via seen_digests - test_retry_queue_on_no_peers (BCAST-04): verify publish succeeds with no peers (queued) - test_catchup_after_reconnect (CATCH-01): verify subsequent broadcast triggers delivery after reconnect - test_cache_bounded_deque_size (CATCH-02): verify Engine starts with bounded deque, handles 200+ publishes - test_p2p_handle_api_preserved (INT-01): verify all P2pHandle methods work without errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create 02-02-SUMMARY.md documenting two-channel architecture, all handlers, 7 tests - Update STATE.md: advance to plan 2/2 complete, add decisions, update metrics - Update ROADMAP.md: mark phase 02 as complete (2/2 plans done) - Update REQUIREMENTS.md: mark BCAST-01, CATCH-01, CATCH-02, INT-01 complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace libp2p terminology (mDNS, Kademlia, DHT, multiaddr) with commonware concepts - Document all P2pConfig serde fields: peer_addresses, bootstrappers, authorized_peers, max_message_size, deque_size - Add local dev preset showing 2-operator localhost setup with wavs-cli identity command - Reorder options to lead with Disabled (default) for clarity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add 03-02-SUMMARY.md with execution results - Update STATE.md with phase 3 position and decisions - Mark CFG-01 and CFG-03 requirements complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove libp2p-era fields (external_addresses, topic_peer_counts) from P2pStatus - Rename subscribed_topics to subscribed_services in P2pStatus and ServiceRouter - Add optional max_message_size and deque_size tuning fields to P2pConfig Local/Remote - Add accessor methods max_message_size() and deque_size() with defaults - Replace hardcoded 65536/128 values with configurable parameters in run_lookup_network and run_discovery_network - Update both GetStatus handlers to use new field names Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create p2p_config_tests.rs with p2p_config_serde and p2p_config_defaults stubs (CFG-01, CFG-02) - Create p2p_status_tests.rs with p2p_status_format stub (OBS-02) - Register test modules in aggregator.rs - Add toml dev-dependency for TOML deserialization tests - Fix pre-existing compilation errors: update P2pStatus field refs and run_lookup_network signature - Fix subscribed_topics -> subscribed_services in broadcast integration tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…roadcast - Add test_status_connected_peers_after_broadcast stub in p2p_broadcast_tests.rs - Fix pre-existing compilation: add max_message_size/deque_size fields to all P2pConfig::Local constructors - Wave 0 stub verifies GetStatus works before broadcast exchange Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create 03-00-SUMMARY.md with execution results - Update STATE.md with plan progress (7/9 plans, 78%) - Update ROADMAP.md and REQUIREMENTS.md - Fix pre-existing P2pConfig field additions in connectivity and identity test files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create 03-01-SUMMARY.md with execution results - Update STATE.md with plan progress (8/9 plans, 89%) - Update ROADMAP.md phase 03 progress Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Arc<RwLock<Vec<String>>> connected_peers_tracker to both lookup and discovery bridge loops - Update tracker from broadcast acknowledgment recipients (sender side) - Track inbound message senders as connected peers (receiver side) - Replace hardcoded connected_peers: 0 / peer_ids: vec![] with real tracker data in GetStatus Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Verify GetStatus returns non-zero connected_peers after broadcast exchange - Verify peer_ids contain hex-encoded Ed25519 public keys (64 hex chars) - Verify node B's pubkey appears in node A's peer list (broadcast ack tracking) - Verify node A's pubkey appears in node B's peer list (inbound message tracking) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create 03-03-SUMMARY.md documenting peer tracking implementation - Update STATE.md with progress (100%), metrics, decisions - Update ROADMAP.md with phase 03 plan progress Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add PeerSubscriptionMap::tracked_peers() for heartbeat prune diffing (SUB-03) - Modify get_recipients() to accept connected_peers set and include un-announced peers for backward compatibility (COMPAT-03) - has_announced() now called in production code (dead_code warning resolved) - Update all ~24 existing test call sites with new signature - Add 8 new unit tests for tracked_peers, heartbeat prune, and COMPAT-03 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…loops - Add connected_peer_set state variable to both bridge loops (lookup + discovery) - Update connected_peer_set from broadcast ack and heartbeat ack in both loops - Add SUB-03 heartbeat prune block: diff tracked_peers vs connected, remove_peer + known_peers.remove for departed - Update all 6 production get_recipients() calls to pass &connected_peer_set - Prune block runs BEFORE retry drain so get_recipients uses pruned state - Both loops have character-for-character identical changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUB-03: heartbeat-driven peer pruning in both bridge loops - COMPAT-03: backward-compatible recipient resolution for pre-v1.3 nodes - 8 new unit tests, all 45 p2p tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…er Phase 18 gap closure
dd7fe77 to
887a378
Compare
aggregator_tests::send_to_self and submission_tests::collect_messages_with_wait called the two-arg form. The function now takes a SignatureAlgorithm to select secp256k1 vs BLS12-381 per service (matching dispatcher::add_service_to_managers), so the tests were the only static callers left over. Pass Secp256k1 to match mock_service's evm_default signature kind. Fixes "Test Suite (EigenLayer)" and "Test Suite (POA)" compile failures.
Fixes "Lint" check on PR #1143. No behavior changes — whitespace, line wrapping, and import ordering only. Verified with cargo fmt --all -- --check.
aggregator/p2p.rs:
- doc_lazy_continuation: blank-line-separate the final doc line above run_lookup_network
- cloned_ref_to_slice_refs (x3): parse_authorized_peers(std::slice::from_ref(...))
in the lookup-mode authorized-peer loop and both BlockPeer command handlers
- let_underscore_future (x4): the four retry-drain sites awaited mailbox.broadcast()
which yields a oneshot::Receiver<Vec<P>> (itself a Future). The intent has always
been fire-and-forget on the ack, so wrap in drop() to make that explicit
- too_many_arguments (x2): #[allow] on run_lookup_network and run_discovery_network;
threading 9 commonware/runtime params through these spawns is intrinsic to the
P2P bring-up shape
aggregator/submit.rs:
- clone_on_copy (x2): drop .clone() on envelope.eventId (FixedBytes<20>) and
envelope.ordering (FixedBytes<12>) when building BlsServiceManagerEnvelope —
both are Copy, the field accesses are auto-copies; envelope stays alive for
the subsequent send_bls_envelope_signatures call.
The bls-multi-operator e2e test deploys via PoaBlsMiddleware, which calls
forge against a `contracts/poa-middleware/` checkout one directory ABOVE
the WAVS workspace (i.e. resolve_poa_middleware_path looks at
workspace.parent()/contracts/poa-middleware). That path exists locally
because WAVS lives inside the Layer monorepo as a sibling of the
poa-middleware submodule, but standalone CI has nothing there and panics
during service-manager deployment.
Add a setup step that clones Lay3rLabs/poa-middleware (public, bls branch)
into ${{ runner.workspace }}/contracts/poa-middleware so the resolver finds
it. Shallow + branch-pinned; bump the branch or pin to a SHA here when
WAVS needs newer contracts.
Also drop the Test Suite (EigenLayer) job per direction; only POA is run
on PR CI now. test-suite aggregator updated to gate on test-poa only.
Clippy (rust 1.91.1, -D warnings):
- app/src-tauri/src/commands.rs:1312,1331 unnecessary_to_owned —
Credential implements AsRef<str> (and Deref<Target=str>), so pass
mnemonic.as_ref() directly instead of &mnemonic.to_string()
- packages/layer-tests/src/e2e/handles/evm.rs:59 unused_mut — the
LameAnvil args vec is built once and never mutated
These didn't surface in the previous CI run because cargo clippy bailed
early on the wavs-lib errors fixed in 717f1c5 before reaching wavs-app
and layer-tests.
CI: npm-install poa-middleware contracts after clone. foundry.toml
remaps @openzeppelin/ to node_modules/, so without `npm ci` forge fails
to find @openzeppelin/contracts/interfaces/IERC1271.sol when building
the bls profile. Uses the lockfile (npm ci) for reproducibility; --no-audit
--no-fund keeps CI output clean.
The Tauri commands + React UX layer for BLS / P2P dashboard / unified activity has been moved to a stacked follow-up PR so this PR can stay focused on the node-side Commonware + BLS work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
First-pass + second-pass security/correctness audit completed. I did not find a clear exploitable Critical/High issue in the reviewed areas, and targeted validation passed after fixing the local OpenSSL pkg-config environment:
Non-blocking items I recommend maintainers/reviewers confirm before final approval:
Recommendation from this audit: comment only; do not request changes based on these findings alone, but I would avoid final approval until the BLS end-to-end compatibility coverage/expectation is satisfactory to maintainers. |
layertau
left a comment
There was a problem hiding this comment.
Thanks for the substantial Commonware/BLS migration. The PR is well-scoped for a large protocol change, and I appreciate the new P2P coverage plus the explicit BLS e2e matrix.
I found one packaging/build issue that should be fixed before merge:
Important / blocking for published wavs-types feature consumers
packages/types/src/signing/signer.rs now uses tokio::task::spawn_blocking in the BLS signer path, but packages/types/Cargo.toml enables the optional Tokio dependency with only features = ["sync"]. In isolation, the advertised signer + bls/full feature set does not compile because spawn_blocking is gated behind Tokio's rt feature.
Repro:
cargo check -p wavs-types --no-default-features --features 'signer,bls'Error:
error[E0425]: cannot find function `spawn_blocking` in module `tokio::task`
--> packages/types/src/signing/signer.rs:184:53
note: the item is gated behind the `rt` feature
Recommended fix: add rt to the optional Tokio feature list for wavs-types (or avoid spawn_blocking in this crate), e.g.
tokio = { version = "1", optional = true, default-features = false, features = ["sync", "rt"] }Please also add/ensure a CI check covers wavs-types --no-default-features --features full or at least signer,bls so this does not regress for downstream users.
|
@layertau can you open up a new PR for this branch that fixes up the |
|
Opened the requested stacked fix PR: #1146 It adds Tokio
|
Fix wavs-types BLS feature build
|
CI triage for failed Basic Checks run 25866730788: The failure is in the POA workspace test run, specifically:
Failing test:
Observed assertion: There is also a concurrent Tokio/commonware shutdown panic in the same test binary: Assessment: this looks like a P2P test/runtime lifecycle reliability issue rather than the |
@layertau is that an issue introduced in the current PR or could that be existing previously already? |
|
@ueco-jb This appears introduced by the current PR. Evidence:
It may be exposing a broader async-test lifecycle pattern, but the concrete CI failure is from new Commonware P2P test/code paths added in this PR and should be treated as blocking for this PR. |
Summary
Replaces libp2p with commonware-p2p for the WAVS aggregator's
peer-to-peer layer, and adds BLS12-381 alongside secp256k1 as a per-service signature option.
App-side UX (Tauri commands + React pages exposing BLS keys, P2P dashboard, unified activity,
BLS-aware service deploy) is split out into a follow-up PR stacked on this one — keeps this review
focused on node-internal protocol work and lets the UI keep moving on
mainwithout conflicts.What's new
Commonware P2P (replaces libp2p)
packages/wavs/src/subsystems/aggregator/p2p.rsis essentially a rewrite aroundcommonware-p2p2026.3.0:
P2pConfig::Local): static peer addresses for local dev / testing.P2pConfig::Remote): bootstrapper-based discovery for production.catch-up; channel 1 is read by an inbound bridge task for real-time forwarding to the aggregator.
signing_mnemonic. Address format:<hex_ed25519_pubkey>@<host>:<port>.delivery only sends to subscribed peers, with hello-on-first-contact handshakes and
heartbeat-driven peer state pruning.
deque_size, default 128) replaces theold request/response protocol.
Dependencies dropped:
libp2p(and the gossipsub/kad/mdns/autonat/identify/request-responsestack). Added:
commonware-{p2p,broadcast,cryptography,math,runtime,utils,codec}andrand_chacha/rand_core/bip39.BLS12-381 signing
SignatureAlgorithmenum (packages/types/src/signing.rs) —Secp256k1|Bls12381,selectable per service.
WavsCryptoSigner(packages/types/src/signing/signer.rs) wraps either an alloyPrivateKeySigneror acommonware_cryptography::bls12381::PrivateKey.packages/utils/src/bls_signing.rs— deterministic BLS key derivation from BIP39 mnemonic +HD index, G1 pubkey (128 bytes) / G2 signature (256 bytes) serialization, proof-of-possession
signing.
packages/utils/src/test_utils/middleware/evm/middleware_poa_bls.rs) —drives the standalone
poa-middlewarerepo's BLS variant viaforgefor e2e tests.IPOAStakeRegistry,IWavsServiceHandler,IWavsServiceManagerBLSvariants (
packages/types/src/contracts/solidity/abi/bls/).SubmissionManager::add_service_keyis now algorithm-aware; aggregator submit constructs eitherECDSA
ServiceManagerEnvelopeorBlsServiceManagerEnvelope.Gated behind
blsfeature onpackages/wavs(default-on for the node; BLS-disabled builds returna clear error from
add_service_key).Tests
packages/wavs/tests/p2p_{broadcast,connectivity,identity}_tests.rsandpackages/wavs/src/subsystems/aggregator/p2p_{config,status}_tests.rscovering broadcast-to-all,service filtering, retry-on-no-peers, dedup by digest, catchup-after-reconnect, bounded cache,
peer-connected status, deterministic key derivation, mode-specific config parsing.
bls-multi-operatormatrix test inlayer-tests/layer-tests.toml— forcesEvmMiddlewareType::PoaBlsregardless of config to exercise the BLS path end-to-end.CI
test-eigenlayerjob is gone (no longer needed).Lay3rLabs/poa-middleware(blsbranch) into${{ runner.workspace }}/contracts/poa-middlewareand runsnpm cisoforgecan resolve theOpenZeppelin remap.
-D warningslints addressed.What's removed
PROTOCOL_VERSION = "/wavs/1.0.0"/CATCHUP_PROTOCOL = "/wavs/catchup/1.0.0"constants.peer_addressesin lookup mode)./ip4/.../tcp/.../p2p/...) — replaced bypubkey@host:port.Configuration migration
wavs.toml[wavs.p2p]section reshaped:p2p = { local = { listen_port = 9000 } }(mDNS auto-discovery)p2p = { local = { listen_port = 9000, peer_addresses = ["<pubkey>@127.0.0.1:9001"], authorized_peers = ["<pubkey>"] } }p2p = { remote = { listen_port = 9000, bootstrap_nodes = ["/ip4/1.2.3.4/tcp/9000/p2p/12D3KooW..."] } }p2p = { remote = { listen_port = 9000, bootstrappers = ["<pubkey>@1.2.3.4:9000"] } }Existing options like
max_retry_duration_secs,submission_ttl_secs,cleanup_interval_secsetc. are replaced by the broadcast Engine's simpler model (
max_message_size,deque_size). Seethe inline docs in
wavs.tomlfor the full list.Test plan
bls-multi-operatormatrix test).p2p = { local = ... }and confirm broadcast + catch-upacross restart.
poa-middleware(bls branch) deployment.Follow-up
The frontend layer (Tauri commands + new P2P / settings / activity pages + BLS-aware service
deploy) ships as a stacked PR on
bls-commonware-app.