feat: enhance NetworkProcessor for gloas#9025
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces new metrics and logic for handling 'awaiting block' and 'awaiting payload' gossip messages, particularly for the upcoming Gloas fork. This includes renaming existing reprocess metrics to awaitingBlockGossipMessages and adding new awaitingPayloadGossipMessages metrics, along with new searchUnknownEnvelope functionality and seenEnvelope checks in the chain and fork choice. However, the review identified several critical issues: an incorrect import of a test utility file into production code, the onPayloadProcessed method not being registered as an event handler, and incorrect resetting of gossip message counters, which could lead to unbounded memory growth. Improvements were suggested to correctly decrement these counters when messages are processed or expire.
Performance Report🚀🚀 Significant benchmark improvement detected
Full benchmark results
|
…l gossip types (#9059) **Motivation** This PR is a step toward #9025 where we enhance `NetworkProcessor` for GLOAS. The network processor's "await unknown block" mechanism was originally built specifically for attestation messages. As we add support for more gossip types that depend on block availability (and eventually payload envelopes in GLOAS), the code needs to be generalized before those new types are introduced. **Description** Generalizes the network processor's unknown-block-waiting mechanism from attestation-specific to all gossip message types: - Rename `searchUnknownSlotRoot` → `searchUnknownBlock` across `INetwork`, `Network`, `NetworkProcessor`, and all call sites - Flatten the two-level `awaitingGossipsubMessagesByRootBySlot: MapDef<Slot, MapDef<RootHex, Set<...>>>` into a single `awaitingBlockByRoot: MapDef<RootHex, Set<...>>`, using a separate `unknownBlocksBySlot` map for slot-based pruning - Rename metric group `reprocessGossipAttestations` → `awaitingBlockGossipMessages`; add a `topic` label so per-topic breakdown is visible in dashboards - Add `BlockInputSource.network_processor` to distinguish messages queued by the processor itself - Update Grafana dashboard panels to use the new metric names and `{{topic}}` legend format - Changed: - we cannot track pending messages by slot by root because message slot is not the same to block slot starting gloas - based on that, expiry windows are changed from 1 slot to 3 slots, to prepare for gloas. I think this does not affect much because we should have find the block in the same slot, if not it'll never arrive. **AI Assistance Disclosure** Used Claude Code to assist with implementation. --------- Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
552b9f8 to
7f9d42b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ced0c377c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // do not await the block, we want UnknownBlockSync to handle it. | ||
| preprocessResult = {action: PreprocessAction.PushToQueue}; |
There was a problem hiding this comment.
Keep unknown-block execution payload gossip pending
For execution_payload, the preprocessor now always sets PushToQueue even when the referenced block is unknown. In that case validation returns BLOCK_ROOT_UNKNOWN and the envelope is dropped (the new follow-up event is not consumed anywhere in beacon-node), so the message is never retried after block sync. This can leave blocks permanently missing their FULL payload variant and starve envelope-dependent gossip reprocessing.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
true concern, this is a TODO for UnknownBlockSync in the next PR
will add that comment
| // if validation fails, we will NOT forward this gossip block to peers | ||
| // blockInput will then be queued inside BlockInputSync | ||
| // if the gossip block is really invalid, it will be pruned there | ||
| const blockInput = chain.seenBlockInputCache.getByBlock({ |
There was a problem hiding this comment.
we have dead code without this change, see line 193 below (to handle unknownParent event)
| @@ -182,10 +182,7 @@ export function getSlotFromSingleAttestationSerialized(data: Uint8Array): Slot | | |||
| * Extract committee index from SingleAttestation serialized bytes. | |||
There was a problem hiding this comment.
I think you should update this comment since you are changing the naming of this function.
Index no longer means committee index post-electra
lodekeeper
left a comment
There was a problem hiding this comment.
Thorough review across bug hunting, architecture, security, maintainability, and approach challenge.
Architecture: The two-round extraction + PreprocessAction enum is a good model, but the envelope recovery path is asymmetric with block recovery — events are emitted with no sync subscriber, and the NetworkProcessor is absorbing protocol-layout interpretation that will grow with each new GLOAS topic type.
Approach: Core design is sound — ePBS needs envelope dependency tracking, and search-vs-await is the right abstraction. But the duplicated queue lifecycle (block-awaiting vs envelope-awaiting) and the fragile override chain in onPendingGossipsubMessage will accumulate maintenance cost. Consider extracting a shared GossipDependencyQueue class and/or a per-topic preprocessing descriptor.
Overall: The functionality is needed and the direction is right. Key concerns are the missing envelope sync path (events fire into the void) and the unbounded envelope search state. Details inline.
| preprocessResult = {action: PreprocessAction.PushToQueue}; | ||
| break; | ||
| } | ||
| case GossipType.execution_payload_bid: { |
There was a problem hiding this comment.
I think we talked about this briefly here https://discord.com/channels/593655374469660673/1372263082415493200/1481401039801417840 I think we didn't reach a conclusion
that maybe we don't need this mechanism for bids. Maybe we shouldn't trigger block and payload search when a bid has unknown block and payload. Maybe we should just discard it.
There was a problem hiding this comment.
initially I thought we skipped handling this topic to make the code simpler
but then we discussed that we want to forward this if it's valid to help spread to other nodes https://discord.com/channels/593655374469660673/1372263082415493200/1481599538211848242
now I don't see any down side to handle this
Maybe we shouldn't trigger block and payload search when a bid has unknown block and payload. Maybe we should just discard it.
I think we should search any blocks and payloads that are not known to us, it helps our node to backward sync when we're close to head and handle unstable network (ie get all possible chain segments/forks and decide which one is canonical)
| ); | ||
| } | ||
| if (protoBlock?.executionPayloadBlockHash && protoBlock.executionPayloadBlockHash !== parentBlockHash) { | ||
| this.searchUnknownEnvelope( |
There was a problem hiding this comment.
Note that if both parent block and parent payload are missing, then this.searchUnknownEnvelope won't be fired because protoBlock is null due to missing parent block.
Is this intended? If so, then it make sense for L374 to be else if. The code here reads like it is possible to fire both searchUnknownBlock(parentRoot) and searchUnknownEnvelope(parentRoot)
There was a problem hiding this comment.
Note that if both parent block and parent payload are missing, then this.searchUnknownEnvelope won't be fired because protoBlock is null due to missing parent block.
in that case we're not sure if there is payload for the block or not so we cannot trigger search. The problem with assuming there is a payload is that in UnknownBlockSync, it assumes peers know that, if it cannot search for that payload, it'll penalize peers
if parent block is missing, here we don't search for payload, and note that this is the 1st attempt, an eager search only. Later on, at gossip handler, we also fire https://github.com/ChainSafe/lodestar/pull/9025/changes#diff-c702dde1ffa90a6d2720c7b4930bc5c08cc1f302cfd09ed9e74d80755b92da8fR194
at that time, UnknownBlockSync should be smart enough to search for payload there when there is a gap between this block and parent block
There was a problem hiding this comment.
If so, then it make sense for L374 to be else if. The code here reads like it is possible to fire both searchUnknownBlock(parentRoot) and searchUnknownEnvelope(parentRoot)
switched to else if for better typing + add more comments for the above context in 64ec026
| events.on(NetworkEvent.pendingGossipsubMessage, this.onPendingGossipsubMessage.bind(this)); | ||
| this.chain.emitter.on(routes.events.EventType.block, this.onBlockProcessed.bind(this)); | ||
| this.chain.emitter.on( | ||
| routes.events.EventType.executionPayloadAvailable, |
There was a problem hiding this comment.
I just learned the other day that execution_payload_available really only means a payload is available. It doesn't mean payload is verified and valid.
ethereum/beacon-APIs#588 (comment)
So if we listen to this topic, then we are pushing a payload that can be potentially invalid to the queue. But if we listen to execution_payload event instead, which can arrive later, can cause delay on our side. I am not sure which event is the right call here.
There was a problem hiding this comment.
switched to awaiting for execution_payload since execution_payload_available is too early
we need to wait for a validated execution_payload anyway, if not waiting messages will be removed from the pending queues in the end
see 546a109
There was a problem hiding this comment.
execution_payload is definitely the right event here and also mirrors block event, right now I don't see a use case for execution_payload_available outside of notifying the PTC
64ec026 to
546a109
Compare
nflaig
left a comment
There was a problem hiding this comment.
looks great, thanks for all the helpful comments, made reviewing much simpler
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #9025 +/- ##
============================================
+ Coverage 52.50% 52.52% +0.01%
============================================
Files 848 848
Lines 61481 61447 -34
Branches 4528 4528
============================================
- Hits 32282 32272 -10
+ Misses 29134 29110 -24
Partials 65 65 🚀 New features to boost your workflow:
|
Fulu can cache a BlockInput from data columns before the full beacon block is available. PR #9025 narrowed seenBlock() to only treat roots with a cached block as seen, which caused the network processor to start unknownBlockRoot sync for roots that were already being assembled locally from columns. Restore the broader seen-root check for BlockInput cache entries while keeping hasBlock() available for the narrower "full block present" query. Add a regression test covering a Fulu column-first cache entry.
PR #9025 moved beacon_block parent search into the network processor preprocessing path for every fork. For pre-Gloas forks, including Fulu, that is too early. A beacon block should only trigger unknown-parent recovery after normal gossip validation concludes that the block is otherwise processable but blocked on a missing parent. Running the search earlier causes ignored pre-Gloas blocks to start unnecessary unknown-block sync work. Keep the proactive beacon_block parent and envelope search Gloas-only and restore the previous pre-Gloas behavior.
Follow up to #9025 to fix attestation data index extraction, currently it seems we extract the committee index but for how we use it (post gloas), we want the `data.index` relevant code https://github.com/ChainSafe/lodestar/blob/e8407e965790276fae8a22dd7ff57358b9d785e1/packages/beacon-node/src/network/processor/index.ts#L401-L405
Follow up to #9025, there is no need to delay processing payload attestations until the payload envelope is fully imported, the gossip validation and fork choice only require beacon block to be known. This is mostly problematic because PTC members will cast their vote based on the fact that the payload and all data is available, but not that the payload is valid (as per execution layer) and fully imported. This timing discrepancy means that it's very likely we receive `payload_attestation_message` on gossip before we have fully imported the corresponding payload and currently this means we would delay re-gossiping and importing these messages for no good reason.
|
🎉 This PR is included in v1.42.0 🎉 |
Motivation
In the GLOAS fork, several gossip message types depend on the execution payload envelope being known before they can be properly validated:
beacon_attestation/beacon_aggregate_and_proofwithattIndex === 1must wait for the envelopepayload_attestation_messagewithpayloadPresent = truemust wait for the envelopedata_column_sidecarin GLOAS triggers envelope search but does not await (pushes to queue immediately)execution_payload_bidtriggers parent block/envelope search and awaitexecution_payloadtriggers block search but does not await (pushes to queue immediately)beacon_blocktriggers parent block/envelope search but does not await (pushes to queue immediately)Previously the
NetworkProcessoronly had a mechanism to await unknown blocks. This PR extends it to also await unknown execution payload envelopes for GLOAS.Description
execution_payload(envelope),payload_attestation_message,execution_payload_bid, anddata_column_sidecar(block root in GLOAS)onPendingGossipsubMessage()into two extraction rounds:extractBlockSlotRootFn): extracts slot + block root from serialized bytes — common to all topics. If the block root is unknown, always triggerssearchUnknownBlock()and defaults toAwaitBlock.data_column_sidecartriggerssearchUnknownEnvelope()but does not await (pushes to queue immediately), whilebeacon_attestationwithindex = 1overrides the round-1 result and awaits the envelope instead.awaitingMessagesByPayloadBlockRoot— a new pending-message queue that holds gossip messages until the required execution payload envelope is known, mirroring the existingawaitingMessagesByBlockRootqueuesearchUnknownEnvelope()— the envelope-side counterpart tosearchUnknownBlock(), emittingChainEvent.unknownEnvelopeBlockRootto trigger the unknown-envelope sync pathonPendingGossipsubMessage()with aPreprocessActionenum (PushToQueue | AwaitBlock | AwaitEnvelope) to cleanly route each message to the right waiting queue based on GLOAS fork rulesonPayloadEnvelopeProcessed()— called when an envelope is processed, analogous toonBlockProcessed(), to flush and reprocess messages that were awaiting that envelopeawaitingPayloadGossipMessages.*)AI Assistance Disclosure
Used Claude Code (claude-sonnet-4-6) for co-working.