Skip to content

feat: devp2p peer jailing#2283

Merged
vbhattaccmu merged 7 commits into
developfrom
feat/devp2p-peer-jailing
Jun 25, 2026
Merged

feat: devp2p peer jailing#2283
vbhattaccmu merged 7 commits into
developfrom
feat/devp2p-peer-jailing

Conversation

@vbhattaccmu

Copy link
Copy Markdown
Member

Summary

On develop the downloader drops the offending peer on every classified sync failure and then resynchronises. That is right for genuinely bad data, but it also disconnects peers for transient problems (timeouts, stalls, a peer that is simply behind), which causes needless churn, and it does not actually help against a peer serving a pruned-sidechain "ghost-state" chain: once dropped it reconnects and immediately re-offers the same invalid chain, so the node churns on it. It also drops a peer for a checkpoint/milestone (whitelist) mismatch, even though that disagreement is frequently transient and the peer is otherwise useful.

This PR replaces the blanket drop with a graded response. classifySyncFailure maps each failure sentinel to a peerFailureReason, responseDecision turns that reason into an action (none, soft backoff, local jail, drop, a whitelist-mismatch escalation, or a ghost-state escalation), and respondToPeer applies it through jailPeer / backoffPeer / escalateMismatch / escalateGhostState / dropPeerForResponse. Genuinely bad data is still dropped. Transient failures get a short local backoff instead of a disconnect, and a peer that accumulates four soft strikes inside a ten minute window is escalated to the local jail. A whitelist mismatch is treated as its own graded class: a short backoff on the first strike, a local jail on the next, and a drop only after the mismatch persists within a thirty minute window, so a node briefly behind on checkpoints/milestones does not evict good peers while a genuinely divergent peer is still removed.

The pruned-sidechain ghost-state case is also graded rather than dropped on sight: the first occurrence is locally jailed for five minutes, and only a second occurrence inside a thirty minute window escalates to a drop. This gives an honest peer caught in a legitimate pruned-sidechain reorg the benefit of the doubt while still removing a peer that keeps replaying the attack. Every terminal drop (invalid chain, bad peer, invalid ancestor, the fourth whitelist mismatch, the second ghost-state) now also records a thirty minute local bench, so a dropped peer that reconnects with the same node id stays benched instead of immediately looping back into sync. Conversely, a sync that fails only because the node itself cancelled or is shutting down is never charged against the peer.

Failure class Trigger sentinel(s) develop This PR
Pruned-sidechain ghost-state errInvalidChain wrapping "sidechain ghost-state attack" drop local jail 5 min on the 1st occurrence, drop on the 2nd within a 30 min window; jail and bench persist across reconnects
Invalid chain errInvalidChain drop drop + 30 min bench (persists across reconnects)
Bad peer errBadPeer drop drop + 30 min bench
Invalid ancestor errInvalidAncestor drop drop + 30 min bench
Whitelist mismatch (checkpoint/milestone) whitelist.ErrMismatch drop graded within 30 min window: soft backoff 30s (1st strike), local jail 5 min (2nd–3rd), drop + 30 min bench (4th)
Timeout errTimeout drop soft backoff, 30s
Stalling errStallingPeer drop soft backoff, 30s
Unsynced peer errUnsyncedPeer drop soft backoff, 30s
Empty header set errEmptyHeaderSet drop soft backoff, 30s
Too old errTooOld drop soft backoff, 30s
Stale in-flight request past grace gradeStalePeer (errStallingPeer) drop soft backoff 30s, graded once after a 2 min grace; a peer that disconnects with a stale request is graded too
Concurrent-fetch master timeout errTimeout (master peer) drop aborts the sync cycle, then graded at the top level as a transient soft backoff
Repeated soft failures 4 soft strikes within 10 min drop (each time) escalates to local jail, 5 min
Sync canceled / node shutdown errCanceled / errCancelContentProcessing / errTerminated (bare or wrapped) drop early return, no penalty
Peers unavailable errPeersUnavailable drop none, no action
Already jailed errPeerBackedOff n/a early return, no drop, no re-grade
Unclassified (errInvalidBody, errInvalidReceipt, …) no-op no-op

Soft backoff, jail, the whitelist-mismatch ladder, and the ghost-state ladder share one mechanism: a per-connection backoff plus a jailed map keyed by peer id (benchPeer / recordJail), with dropPeerForResponse for the terminal drop. The four windowed strike counters are independent: recordSoftFailure, recordMismatch, and recordGhostState each keep their own per-peer count and window, so the classes escalate independently. The differences are duration and thresholds: four soft strikes inside the ten minute window escalate to a five minute jail; a whitelist mismatch jails on the second strike and drops on the fourth inside a thirty minute window; a ghost-state jails on the first occurrence and drops on the second inside a thirty minute window. dropPeerForResponse benches the peer for thirty minutes (peerDropBackoff) before disconnecting, so a terminal drop survives a same-id reconnect rather than resetting. All four per-peer maps (softStrikes, mismatchStrikes, ghostStrikes, jailed) are hard-capped at maxStrikeEntries with oldest-entry eviction, so peer-keyed memory stays bounded under connection churn.

Peer selection in the chain syncer (peerWithHighestTD / nextSyncOp), the skeleton header syncer, and the concurrent fetcher skips a backed-off peer and arms a wake timer for when the backoff expires, so there is no busy-poll and no missed retry. The backoff is re-applied to a reconnecting peer in Register and its map entry is pruned once expired, so a peer can neither dodge an active jail by reconnecting nor stay jailed forever. In the concurrent fetcher a timed-out request is parked as stale and graded exactly once by gradeStalePeer after a two minute grace; a peer that disconnects while a stale request is outstanding is graded on the way out so it cannot evade the strike, and a master-peer timeout aborts the cycle (and is then graded as a transient backoff at the top level). errPeerBackedOff is returned when a sync is attempted against an already backed-off peer and is treated as a clean early return, and a sync that ends in a cancellation sentinel (errCanceled, errCancelContentProcessing, errTerminated, bare or wrapped) is likewise returned without charging the peer.

Executed tests

Classifier, decision, routing, and the cancellation guard:

  1. TestClassifySyncFailureReasons, TestPeerResponseDecisionActions, TestRespondToPeerDropRoutesToDropPeer, TestIsSyncCancellation
  2. TestHandleSyncFailureClassified, TestHandleSyncFailureUnclassified, TestHandleSyncFailureSkipsUnknownPeer, TestHandleSyncFailureTimeoutBacksOff, TestDropPeerForResponseWithoutDropper, TestLiveOrCapturedPrefersRegisteredPeer
  3. TestClassifyPeerStallerBacksOff, TestBackoffEscalatesToJailAfterRepeatStrikes, TestRecordSoftFailureDecays, TestRecordSoftFailureIsolatesPeers, TestRecordSoftFailureWindowBoundary

Pruned-sidechain ghost-state (jail then drop):

  1. TestPrunedSidechainJailsPeer, TestPrunedSidechainEscalatesToDrop
  2. TestRecordGhostStateDecays, TestRecordGhostStateIsolatesPeers, TestRecordGhostStateWindowBoundary

Whitelist mismatch escalation:

  1. TestRecordMismatchDecays, TestRecordMismatchIsolatesPeers, TestRecordMismatchWindowBoundary
  2. TestWhitelistMismatchDoesNotDropOnFirstStrike, TestWhitelistMismatchEscalatesBackoffJailDrop

Jail/backoff persistence, durable drop, and bounded maps:

  1. TestJailSurvivesReconnect, TestRecordJailPrunesExpiredEntries, TestRecordJailPrunesExpiredOnInterval, TestRecordJailPropagatesToLivePeer, TestRecordJailKeepsLongerExpiry, TestPersistentBackoffPrunesExpiredEntries, TestPeerBackoffStateTransitions
  2. TestDropForResponseBenchesPeer, TestStrikeMapsAreBounded, TestStrikeEvictionRemovesOldest, TestJailEvictionRemovesSoonestExpiry

Backoff-aware selection, wake timers, and concurrent-fetch grading:

  1. TestQueueAcceptsPeer, TestCollectIdlePeersSurfacesBackoff, TestCollectIdlePeersAllStalePastGraceUnblocksExit, TestCollectIdlePeersWithinGraceKeepsWaiting, TestArmBackoffTimer, TestClassifyPeerBusyPeerNotIdle, TestFetchStallError
  2. TestGradeStalePeerClearsEntryAndStrikesOnce, TestConcurrentFetchReceipts_BackedOffPeer, TestConcurrentFetchGradesDepartedStalePeer, TestConcurrentFetchMasterTimeoutAborts
  3. TestEarlierBackoff, TestSkeletonAssignTasksReportsBackoff, TestSkeletonSyncWakesAfterBackoff, TestSkeletonSyncBacksOffOnTimeout
  4. TestChainSyncerNextSyncOpStates, TestChainSyncerNextSyncOpSkipsBackedOffPeer, TestChainSyncerLoopRetriesBackedOffPeer, TestResettableTimer, TestChainSyncerOnSyncDone, TestChainSyncerShutdownReturnsWithoutPendingSync, TestPeerWithHighestTDSkipsBackedOffPeers

End-to-end:

  1. TestFakeEthPeerPrunedSidechainIsLocallyJailed jails a fake eth peer on a real ghost-state mismatch and asserts it stays out of selection on reconnect.

go build ./eth/... and go vet ./eth/ ./eth/downloader/ are clean.
go test -race ./eth/ ./eth/downloader/ is clean.

Rollout notes

  1. Not consensus-affecting. Changes only how the local node reacts to its own sync peers (backoff, jail, or drop); does not alter block validation, fork choice, state, or any wire format.
  2. Fully backwards-compatible. No protocol or message changes, no new config or genesis fields, no coordinated upgrade.
  3. Operator-facing logs and metrics. New warn logs: "Downloader: backing off peer", "Downloader: locally jailing peer", "Downloader: escalating repeated soft failures to local jail"; whitelist path: "Downloader: backing off peer after whitelist mismatch", "Downloader: escalating repeated whitelist mismatch to local jail", "Downloader: dropping peer after persistent whitelist mismatch"; ghost-state path: "Downloader: jailing peer for sidechain ghost-state attack", "Downloader: dropping peer after repeated sidechain ghost-state attacks"; and the existing "Synchronisation failed, dropping peer" now carries a reason field. Four new Prometheus meters: eth/downloader/peer/response/backoff, eth/downloader/peer/response/jail, eth/downloader/peer/response/drop, and eth/downloader/peer/response/mismatch. Behaviourally: a transient failure benches a peer for 30s instead of disconnecting it; a checkpoint/milestone mismatch is backed off and only dropped after persistent disagreement; the ghost-state case is jailed for five minutes and dropped only on a repeat; every drop now keeps the peer benched for thirty minutes across reconnects; and a sync cancelled by node shutdown no longer penalizes the master peer.
  4. Tunable constants, no protocol change: peerSoftBackoff (30s), peerJailBackoff (5 min), peerDropBackoff (30 min), softFailureJailThreshold (4), softFailureWindow (10 min), whitelistMismatchWindow (30 min), whitelistMismatchJailThreshold (2), whitelistMismatchDropThreshold (4), prunedSidechainWindow (30 min), prunedSidechainDropThreshold (2), and maxStrikeEntries (per-peer map cap).

vbhattaccmu and others added 3 commits June 22, 2026 04:04
…eer jailing)

Replace the blanket peer-drop on sync failures with a graded, locally
enforced response.

Core grading:
- Drop peers that serve objectively invalid or malformed data (invalid
  chain, bad-peer, invalid ancestor).
- Soft-backoff peers that are merely slow, time out, or stall, or where
  malice cannot be proven; classification unwraps the error chain so a
  timeout wrapped in a bad-data sentinel is treated as transient, while a
  proven-bad-data sentinel is never downgraded by a stray deadline string.
- Treat whitelist (checkpoint/milestone) mismatches as a distinct graded
  class: short backoff on first occurrence, local jail on repeat, and drop
  only after persistent mismatch within a 30-minute window.
- Escalate repeated soft failures (4 within a 10-minute window) to a local
  jail, and persist backoffs across reconnects so the penalty cannot be
  reset by reconnecting under the same id.
- Bench stalling peers past a grace period instead of dropping them, and
  keep sync live when every peer is stalled by unblocking the fetch loop
  via errPeerBackedOff.

Robustness and hardening:
- A pruned-sidechain ghost-state attack escalates (jail first, drop on a
  second within 30m) instead of jailing forever.
- concurrentFetch timeouts route through the grading funnel exactly once
  (2-minute grace, no double strike); a master timeout aborts the cycle.
- Every terminal drop records a 30-minute durable bench that survives
  reconnect, including the skeleton beacon-sync invalid-headers drop, and
  even when the offending peer has already departed (recordJailByID).
- Per-peer strike and jail maps are hard-capped with oldest-strike /
  soonest-expiry eviction to bound peer-keyed allocation.
- Non-peer-fault conditions never penalize the peer: node-shutdown
  cancellations (errCanceled / errCancelContentProcessing / errTerminated /
  errCancelStateFetch / snap.ErrCancelled) return early, benign swarm
  sentinels (errPeersUnavailable, errNoPeers) classify as no-action, and a
  peer that simply disconnects mid-sync (errDisconnected) is graded as a soft
  backoff rather than mis-classified as a bad-peer drop.
- Cross-queue soft strikes are deduped via an atomic backoffForClaim, so a
  simultaneous multi-queue stall records one strike; skeleton header
  timeouts grade through the same backoff and mark the soft-backoff and jail
  meters. A peer benched by a skeleton timeout is requeued into the idle set
  so the scheduler arms its wake-up timer, and a timeout whose request was
  already reverted by a peer-leave is skipped via req.stale so a departed
  peer's same-id reconnect is never benched by a late timer.
- peerWithHighestTD iterates the peer set under RLock to avoid a hot-path
  allocation; nextSyncOp arms the retry timer when no eligible peer is
  available and for the backoff expiry of benched higher-TD peers while in
  sync with a lower-TD one, and PeerBackoff consults the persisted jail map
  first to keep the peerSet/peerConnection lock order explicit. An errPeersUnavailable
  sync result (e.g. snap sync with only eth/68 peers) arms a short chainSyncer
  cooldown, cleared on any peer join/leave, so the loop does not spin in
  continuous sync setup/teardown without penalizing an otherwise-fine peer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 22, 2026 09:00

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@vbhattaccmu

Copy link
Copy Markdown
Member Author

@claude review

@vbhattaccmu vbhattaccmu changed the title Feat/devp2p peer jailing feat: devp2p peer jailing Jun 22, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a graded peer-penalty mechanism for devp2p sync failures (soft backoff, local jail, and durable benching across reconnects) to reduce churn on transient issues while still escalating for persistent whitelist mismatches and pruned-sidechain “ghost-state” behavior. The changes span downloader peer response logic, backoff-aware peer selection/scheduling, and a large test suite (unit + p2p-level).

Changes:

  • Add a peer failure classifier + response decision engine (backoff/jail/drop with strike windows and persistent benching across reconnects).
  • Make sync scheduling and fetchers backoff-aware (skip benched peers and arm wake timers instead of busy retrying / disconnecting).
  • Add extensive unit/integration-style coverage (including an end-to-end p2p fixture for the ghost-state case) plus additional logging/metrics.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
eth/sync.go Chain sync loop now supports retry waits and peers-unavailable cooldown handling.
eth/sync_test.go Adds chainSyncer state/cooldown/backoff scheduling tests and helper setup.
eth/protocols/eth/peer.go Initializes td to avoid nil TD usage.
eth/protocols/eth/dispatcher.go Exports ErrDisconnected for downstream classification/backoff decisions.
eth/peerset.go Peer selection can skip backed-off peers and returns a retry hint.
eth/peerset_test.go Tests for backoff-aware highest-TD selection.
eth/peer_jail_p2p_test.go End-to-end p2p test fixture validating ghost-state jailing behavior.
eth/fetcher/tx_fetcher.go Enriches warn logs with origin/direct metadata.
eth/downloader/skeleton.go Removes random request IDs, adds backoff wake timers, and backs off (vs hard-drop) on timeouts; benches invalid skeleton header senders.
eth/downloader/skeleton_test.go Adds coverage for backoff timers, timeout backoff behavior, and invalid-header benching.
eth/downloader/peer.go Introduces peer backoff state + persistent jail/strike tracking with bounded maps.
eth/downloader/peer_response.go New failure classification + graded response routing (soft strikes, mismatch ladder, ghost-state ladder, durable benching).
eth/downloader/peer_response_test.go Comprehensive tests for classification/decision/escalation, decay windows, bounded maps, and reconnect persistence.
eth/downloader/metrics.go Adds meters for backoff/jail/drop/mismatch responses.
eth/downloader/bor_fetchers.go Ensures header fetch respects downloader cancellation.
eth/downloader/bor_fetchers_concurrent.go Reworks concurrent fetch to grade stalling peers via backoff/jail and to wake on backoff expiry.
eth/downloader/bor_fetchers_concurrent_test.go Updates/adds tests for backed-off peers, departed stale grading, and master timeout abort behavior.
eth/downloader/bor_fetch_helpers_test.go Adds focused tests for queue acceptance, idle peer collection, wake timers, and stall grading helpers.
eth/downloader/bor_downloader.go Integrates peer-response handling into LegacySync, adds PeerBackoff helpers, improves error wrapping and bad-block reporting.
eth/downloader/bor_downloader_test.go Updates attacker-drop expectations and adds tests for backoff helpers and ErrPeerBackedOff behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread eth/downloader/bor_downloader.go Outdated
Comment thread eth/downloader/bor_downloader.go

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large, design-level change to peer/sync behavior across ~20 files — worth a human reviewer eyeballing the tuning constants, the new error surface (ErrPeerBackedOff, ErrPeersUnavailable, ErrDisconnected exported), and the cross-component state machine, even though no bugs were flagged.

Extended reasoning...

Overview

This PR replaces the existing "drop peer on any classified sync failure" behavior with a graded response system spread across the downloader, skeleton syncer, concurrent fetcher, chain syncer, and peerset. It touches ~20 files, exports three previously-unexported error sentinels (ErrPeersUnavailable, ErrPeerBackedOff, ErrDisconnected), adds four new per-peer maps in peerSet (soft strikes, mismatch strikes, ghost strikes, jailed) with windowed counters and eviction, and introduces a peer-jail mechanism that persists across reconnects.

Security risks

No direct auth/crypto/permission surface. The main risk surface is denial-of-service resilience: the change makes the node less aggressive about dropping peers in some cases (transient failures get a 30s soft backoff instead of a disconnect) and more lenient on a first ghost-state mismatch (5 min jail rather than immediate drop). The threshold tuning (softFailureJailThreshold=4, whitelistMismatchDropThreshold=4, prunedSidechainDropThreshold=2, maxStrikeEntries=4096) determines whether a malicious peer can keep re-offering invalid chains within a window before being terminally dropped. A reviewer with networking context should sanity-check these against expected mainnet behavior.

Level of scrutiny

High. This is production-critical sync/networking logic that runs on every node, controls how the node reacts to its peers, and changes long-standing default behavior. It is not consensus-affecting per the description, but it changes how the local node interacts with the devp2p layer, and a tuning miscalibration could have operational consequences (e.g. peer churn under partial network conditions, or stickier jails than intended).

Other factors

The PR ships extensive test coverage — classifier, decision, routing, windowed counter decay/isolation/boundary tests, jail persistence, bounded-map eviction, backoff-aware peer selection, skeleton/concurrent-fetch grading, and an end-to-end TestFakeEthPeerPrunedSidechainIsLocallyJailed. The description claims go test -race and go vet are clean. That gives reasonable confidence in mechanical correctness, but the design-level decisions (threshold choices, the choice to grade ghost-state rather than drop on sight, the unified 30-minute drop bench) are the kind of judgment calls that benefit from human sign-off.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread eth/downloader/bor_fetchers_concurrent.go
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 22, 2026 09:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread eth/sync.go

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant security-adjacent change that introduces a new peer-grading state machine across the downloader, peerset, skeleton, and chain syncer — I'm deferring to a human reviewer rather than auto-approving.

Extended reasoning...

Overview

This PR replaces the downloader's blanket peer-drop on classified sync failures with a graded response: classifySyncFailureresponseDecisionrespondToPeer routes each failure to one of soft backoff, local jail, whitelist-mismatch escalation, ghost-state escalation, or terminal drop. New per-peer state (jailed, softStrikes, mismatchStrikes, ghostStrikes) is maintained in peerSet with windowed counters, oldest-entry eviction, and persistence across reconnect via Register. Backoff-aware peer selection is added to peerWithHighestTD, the skeleton syncer, and the concurrent fetcher, each with wake timers to avoid busy-polling. The cancellation path is also reworked so node shutdown no longer penalizes the master peer, and a new resettableTimer is added to the chain syncer loop. The diff touches 20 files spanning ~2,500 lines.

Security risks

The PR is not consensus-affecting and does not change validation or wire formats, but it materially changes the node's DoS posture toward its sync peers. Risks worth a careful look: (a) the strike-counter and jail maps are keyed by peer id and capped at maxStrikeEntries (4096) with oldest-entry eviction — under adversarial connection churn an attacker could potentially evict legitimate strikes; (b) the ghost-state classifier matches on a substring of the error string (isPrunedSidechainMismatch), which is brittle to upstream wording changes; (c) dropPeerForResponse persists a 30-minute bench across reconnects, meaning a misclassification has long-lived blast radius; (d) the master-peer timeout path now aborts the cycle with errTimeout and is graded as a transient backoff at the top level, which is a behavioral shift from the prior immediate drop.

Level of scrutiny

High. This is production-critical sync code that controls when peers are evicted, jailed, or trusted across reconnects. The constants (peerSoftBackoff, peerJailBackoff, peerDropBackoff, the strike thresholds, the 30-minute windows) are tuning decisions that warrant a human reviewer who understands the network's typical churn patterns on Polygon. The classifier ordering in classifySyncFailure (e.g., pruned-sidechain wins over wrapped deadline; whitelist-mismatch wins over invalid-chain) embeds non-obvious precedence assumptions that should get human eyes.

Other factors

Test coverage is extensive (the description lists ~50 new tests including an end-to-end TestFakeEthPeerPrunedSidechainIsLocallyJailed) and go test -race is reported clean. Copilot raised three substantive issues (stale comment in bor_downloader.go, error-attribution in importBlockResultsStateless, and zero-Sent shim requests in gradeStalePeer) and the author has addressed all three with follow-up commits. SonarQube passed. The bug-hunting system found nothing, but the size and criticality of this PR are exactly the case where the guidance says to defer — a thoughtful human reviewer is the right path here.

@vbhattaccmu vbhattaccmu requested a review from a team June 22, 2026 10:18
@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.17585% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.69%. Comparing base (5df879f) to head (525e2ed).

Files with missing lines Patch % Lines
eth/downloader/bor_downloader.go 82.53% 9 Missing and 2 partials ⚠️
eth/sync.go 85.71% 7 Missing and 4 partials ⚠️
eth/downloader/skeleton.go 92.79% 6 Missing and 2 partials ⚠️
eth/downloader/bor_fetchers_concurrent.go 93.33% 5 Missing and 2 partials ⚠️
eth/protocols/eth/dispatcher.go 0.00% 4 Missing ⚠️
eth/downloader/peer.go 98.70% 2 Missing and 1 partial ⚠️
eth/downloader/peer_response.go 98.02% 2 Missing and 1 partial ⚠️
eth/fetcher/tx_fetcher.go 40.00% 3 Missing ⚠️
eth/downloader/bor_fetchers.go 0.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2283      +/-   ##
===========================================
+ Coverage    53.44%   53.69%   +0.24%     
===========================================
  Files          896      897       +1     
  Lines       159770   160369     +599     
===========================================
+ Hits         85397    86108     +711     
+ Misses       69054    68920     -134     
- Partials      5319     5341      +22     
Files with missing lines Coverage Δ
eth/peerset.go 64.00% <100.00%> (+4.67%) ⬆️
eth/protocols/eth/peer.go 29.37% <100.00%> (+0.24%) ⬆️
eth/downloader/bor_fetchers.go 71.79% <0.00%> (-1.89%) ⬇️
eth/downloader/peer.go 92.56% <98.70%> (+10.47%) ⬆️
eth/downloader/peer_response.go 98.02% <98.02%> (ø)
eth/fetcher/tx_fetcher.go 88.05% <40.00%> (ø)
eth/protocols/eth/dispatcher.go 5.64% <0.00%> (ø)
eth/downloader/bor_fetchers_concurrent.go 78.93% <93.33%> (+26.03%) ⬆️
eth/downloader/skeleton.go 78.87% <92.79%> (+1.57%) ⬆️
eth/downloader/bor_downloader.go 64.83% <82.53%> (+1.46%) ⬆️
... and 1 more

... and 31 files with indirect coverage changes

Files with missing lines Coverage Δ
eth/peerset.go 64.00% <100.00%> (+4.67%) ⬆️
eth/protocols/eth/peer.go 29.37% <100.00%> (+0.24%) ⬆️
eth/downloader/bor_fetchers.go 71.79% <0.00%> (-1.89%) ⬇️
eth/downloader/peer.go 92.56% <98.70%> (+10.47%) ⬆️
eth/downloader/peer_response.go 98.02% <98.02%> (ø)
eth/fetcher/tx_fetcher.go 88.05% <40.00%> (ø)
eth/protocols/eth/dispatcher.go 5.64% <0.00%> (ø)
eth/downloader/bor_fetchers_concurrent.go 78.93% <93.33%> (+26.03%) ⬆️
eth/downloader/skeleton.go 78.87% <92.79%> (+1.57%) ⬆️
eth/downloader/bor_downloader.go 64.83% <82.53%> (+1.46%) ⬆️
... and 1 more

... and 31 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread eth/downloader/peer_response.go
Comment thread eth/downloader/peer_response.go
Comment thread eth/downloader/peer_response.go
Comment thread eth/downloader/peer_response.go
Restore a WARN for no-op peer responses so peers-unavailable and
whitelist no-remote stay visible instead of only logging at debug.
Treat whitelist ErrNoRemote as a no-op and arm the retry cooldown
instead of benching an honest peer that is merely lagging behind the
latest milestone. Add peerGhostStateMeter for parity with the mismatch
meter, and stop double-counting whitelist mismatches in the
soft-backoff meter.

@pratikspatil024 pratikspatil024 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two findings from local AI review:

  1. Blocker: nextSyncOp() can deadlock on peer selection. peerWithHighestTD() holds ps.lock.RLock(), but the callback passed from eth/sync.go calls back into Downloader.PeerBackoff(), which takes ps.lock.Lock() via persistentBackoff(). That is the same mutex in one call chain, so sync can hang as soon as backed-off peers are considered.

  2. Medium: isTransientFailure() is too broad. The fallback strings.Contains(err.Error(), context.DeadlineExceeded.Error()) can classify wrapped hard failures as timeouts just because their text mentions context deadline exceeded, which can downgrade invalid-chain / bad-peer / invalid-ancestor cases into soft backoffs.

@pratikspatil024 pratikspatil024 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just the 2 local AI points

@vbhattaccmu

Copy link
Copy Markdown
Member Author

Two findings from local AI review:

  1. Blocker: nextSyncOp() can deadlock on peer selection. peerWithHighestTD() holds ps.lock.RLock(), but the callback passed from eth/sync.go calls back into Downloader.PeerBackoff(), which takes ps.lock.Lock() via persistentBackoff(). That is the same mutex in one call chain, so sync can hang as soon as backed-off peers are considered.
  2. Medium: isTransientFailure() is too broad. The fallback strings.Contains(err.Error(), context.DeadlineExceeded.Error()) can classify wrapped hard failures as timeouts just because their text mentions context deadline exceeded, which can downgrade invalid-chain / bad-peer / invalid-ancestor cases into soft backoffs.

Have provided answers to the comments here. I believe they are both false positives

  1. peerWithHighestTD lives on the eth peerSet and holds that struct's RLock. PeerBackoff goes through persistentBackoff which locks the downloader peerSet. These are two separate structs in two separate packages that happen to share the name peerSet and each has its own lock field. They are not the same mutex so holding the eth RLock while taking the downloader Lock is not reentrancy and cannot deadlock. The order is also one directional since the downloader never reaches back into the eth peerSet so there is no inversion either. TestChainSyncerNextSyncOpSkipsBackedOffPeer covers this exact path with a backed off peer and the race suite passes.

  2. isTransientFailure returns false for errInvalidChain , errBadPeer and errInvalidAncestor before the strings. A wrapped invalid chain or bad peer or invalid ancestor therefore stays a hard failure even when its message contains context deadline exceeded. peer_response_test.go lines 69 to 71 pin exactly that. The text fallback only runs for errors outside those sentinels and it is there to catch deadline failures that come back as flattened text rather than a wrapped error.

@pratikspatil024 pratikspatil024 added the squash and merge This PR will be squashed and merged label Jun 24, 2026
Copilot AI review requested due to automatic review settings June 25, 2026 02:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread eth/downloader/peer_response.go
Comment thread eth/peer_jail_p2p_test.go
@vbhattaccmu vbhattaccmu merged commit a79d527 into develop Jun 25, 2026
21 checks passed
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

squash and merge This PR will be squashed and merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants