Skip to content

reconsider the per-peer dual-connection (primary + pair) mechanism #2359

Description

@gzliudan

Summary

XDPoSChain maintains two independent RLPx/TCP connections per remote peer (a "primary" and a "pair"). This is a private divergence from upstream go-ethereum, which has never allowed more than one connection per NodeID. The mechanism adds non-trivial state-machine complexity without public evidence that the latency/throughput gains justify the cost.


1. Evidence that XDPoSChain currently runs two connections per peer

All references below are against dev-upgrade.

1.1 The p2p server admits a second connection from the same NodeID.

In server.go, encHandshakeChecks only rejects a duplicate id when the existing peer already has a pair:

case peers[c.id] != nil:
    exitPeer := peers[c.id]
    if exitPeer.PairPeer() != nil {
        return DiscAlreadyConnected
    }
    return nil

The addpeer branch then explicitly stores the second connection as a pair rather than replacing the primary:

if peers[c.id] != nil {
    peers[c.id].SetPairPeer(p)
} else {
    peers[c.id] = p
}

dial.go mirrors this on the dialing side: checkDial returns errAlreadyConnected only when existPeer.PairPeer() != nil, so the dialer will actively open a second TCP connection to a peer it is already connected to.

1.2 Each connection is a fully independent RLPx session.

p2p.newPeer creates per-connection readLoop, pingLoop, writeStart, protoErr, and disc channels (peer.go). Both peers go through the full ECIES handshake, AES-CTR keystream, and HMAC state. There is no frame-level multiplexing between them — they are two separate kernel sockets.

1.3 The eth layer routes by primary/pair role.

In peer.go, peerSet.Register registers the first instance as primary and the second as isPair=true, attaches the pair's MsgReadWriter to the primary via existPeer.setPairWriter, and returns the sentinel p2p.ErrAddPairPeer. handler.handle treats this sentinel specially and lets the pair "ride along" without re-registering with the downloader.

peer.msgWriter() then selects the pair writer whenever available:

func (p *peer) msgWriter() p2p.MsgReadWriter {
    if pairRW := p.pairWriter(); pairRW != nil {
        return pairRW
    }
    return p.rw
}

The effective routing (from peer.go):

Traffic Connection
StatusMsg handshake per-connection
TxMsg, OrderTxMsg, LendingTxMsg, NewBlockHashesMsg primary (p.rw)
NewBlockMsg, BlockHeaders/Bodies/NodeData/Receipts (+ Get*) pair when present, else primary
VoteMsg, TimeoutMsg, SyncInfoMsg (XDPoS v2 BFT) pair when present, else primary

So in steady state two XDPoSChain peers really do hold two TCP connections, with tx-pool/announcement traffic on one and block-sync + BFT consensus on the other.

1.4 Recent bug history confirms the design is in active use.


2. Pros and cons of the dual-connection mechanism

Pros

  • RLPx head-of-line isolation. Each Peer.run uses a single-slot writeStart token; a ~2 MB BlockBodies write blocks any subsequent frame on that connection. Putting BFT messages on a separate TCP avoids serialization behind large sync frames.
  • Fault-domain separation. After faf067a5, a transient failure on the pair no longer takes down the primary's tx-broadcast path or downloader registration.
  • Low-invasiveness on the wire format. RLPx framing, devp2p capability negotiation, and the message id layout are unchanged; the divergence is confined to "how many connections per NodeID".

Cons

  • Doubled per-peer cost. Two sockets, two ECIES handshakes, two readLoop + pingLoop + writeLoop goroutine sets, two AES/HMAC streams. With dozens to hundreds of peers, this is a measurable CPU/memory tax on every node.
  • Same physical link. Both TCPs share the same IP path. Congestion control (CUBIC/BBR) will give them roughly equal share, so a BlockBodies burst still steals bandwidth from concurrent VoteMsg packets at the link level — the optimization is real only at the RLPx framing layer, not at the network layer.
  • MaxPeers semantics get blurred. inboundCount++ runs unconditionally in addpeer, so a single logical peer consumes two inbound slots. This silently halves effective inbound capacity vs. the configured MaxPeers.
  • State-machine complexity and bug surface. isPair, pairRW, pairPeer, ErrAddPairPeer, DiscPairPeerStop, removePeerByID, UnregisterPeer(*peer), the ClearPairPeer CAS in p2p.Peer.run's exit path, and the peerSet.Register early-return path all exist purely to support pairing. faf067a5 shows this surface produces real, hard-to-spot consensus-affecting bugs.
  • Maintenance friction vs. upstream. Patches touch server.go, dial.go, peer.go, peer.go, and handler.go — exactly the files most likely to conflict on any future upstream rebase. Per copilot-instructions.md, "unnecessary divergence from upstream" should be flagged.
  • Security surface. Doubled handshakes weaken resistance to dial-based DoS. The pair channel implicitly trusts the primary's NodeID authentication, which is fine cryptographically but means any bug that lets an attacker race in as the pair gains a BFT-injection vector that does not exist in a single-connection design.
  • No published evidence of benefit. I could not find benchmarks or production metrics in the repository showing the pair mode actually improves Vote/Timeout p99 latency, block time, or view-change count vs. a single-connection baseline.

3. Should the dual-connection mechanism be removed?

Yes, I think it should. Summary of reasoning:

  1. The HoL-blocking problem it targets is real but can be solved without doubling per-peer connections (see options below).
  2. The benefit is unmeasured; the cost is paid on every peer of every node.
  3. The complexity has already produced at least one consensus-path bug (faf067a5) and increases the risk surface of every future change in p2p and eth.
  4. Removing it shrinks the diff against upstream go-ethereum, easing future rebases — a stated repository goal.

If the maintainers disagree, I'd at minimum ask for: (a) production metrics quantifying BFT-message latency with vs. without pair, (b) counting pair connections separately in MaxPeers, and (c) declaring an explicit devp2p capability for pair so the design is negotiable rather than implicit.


4. Migration options to a single-connection design

Compatibility legend (XDPoSChain ↔ XDPoSChain only; geth interop is out of scope):

  • N = node running the proposed change.
  • L = legacy XDPoSChain node still running the current pair mechanism.

Option A — Keep a single connection, add a BFT write-priority lane

What changes. Only peer.go's write scheduling: introduce a high-priority write queue so that VoteMsg, TimeoutMsg, SyncInfoMsg can preempt the standard writeStart token ahead of large BlockBodies/NodeData writes. RLPx framing, message ids, capability strings, and handshake are unchanged. peerSet.Register reverts to "one peer per id"; encHandshakeChecks and checkDial revert to upstream semantics.

Compatibility N ↔ L.

  • N will not initiate a pair connection. L's dialer will still try; N's encHandshakeChecks returns DiscAlreadyConnected, so L's pair dial cleanly fails — L does not evict its primary because of DiscAlreadyConnected. Routine eth traffic, including the BFT messages (still at codes 0xe0–0xe2 in the eth segment), flows normally on the single connection.
  • L → N: BFT messages arrive on the primary and are handled the same way they always have been.
  • N → L: BFT messages are sent on the primary (since N has no pair). L accepts them: the primary on L is the same peer that processed StatusMsg and knows the protocol version.

Pros. No protocol break, deployable per-node without coordination, immediate state-machine simplification.
Cons. Smaller optimization than the current design claims to provide; the kernel-level bandwidth contention is not addressed.

Compatibility verdict: fully bidirectional with both N and L. Rolling upgrade safe.


Option B — Move BFT into its own devp2p subprotocol (Name="xdpos", Version=1)

What changes. Protocols() registers an additional p2p.Protocol{Name:"xdpos", Version:1, Length:3} with its own message id space (Vote=0, Timeout=1, SyncInfo=2). The "pure" form of B also removes 0xe0–0xe2 from the eth segment. devp2p multiplexes both subprotocols on a single RLPx connection.

Compatibility N ↔ L (pure form).

  • L's capability advertisement contains only eth. devp2p matchProtocols will not produce an xdpos channel between N and L. So N's BFT sends silently no-op, and L's BFT sends arrive at N inside the eth segment at codes 0xe0–0xe2 — but N has freed those codes. N's eth handler returns errInvalidMsgCode and drops the peer.
  • Result: a pure-form B node cannot talk BFT with L at all and will actively disconnect L. This is a hard incompatibility on a consensus path.

Compatibility N ↔ L (with a transitional bridge release N).*

  • Release N* first: still send BFT on the eth segment AND on the new xdpos subprotocol; on receive, accept either path; do not drop the peer when 0xe0–0xe2 arrive on eth. N* ↔ L works (L sees the duplicate via eth-segment codes and ignores the unknown xdpos cap). N* ↔ N* works (uses xdpos).
  • After the network has migrated to N*, ship the pure-form N that removes the eth-segment BFT codes.

Pros. Aligns with the upstream pattern (e.g. snap), permanently eliminates the pair mechanism, gives BFT its own per-subprotocol write scheduling within devp2p without a second TCP. Future independent versioning of BFT becomes possible.
Cons. Wire-protocol change, must be staged: N* bridge release → migration window → pure N. Requires release-note coordination with node operators (consistent with dev-upgrade policy in copilot-instructions.md).

Compatibility verdict: incompatible without a bridge release. With the staged plan, fully bidirectional throughout.


Option C — Option B plus broader eth/68-style splitting of block-sync traffic

What changes. Everything in B, plus splitting block-announce / block-bodies handling to mirror upstream eth/68 semantics. This is a much larger rework of protocol.go, handler.go, and peer.go.

Compatibility N ↔ L.

  • All of B's hard incompatibilities, plus changes to block-sync message formats and id assignments. L does not understand the new eth version; the negotiated common version becomes the existing private xdpos2=100, but C presumably removes BFT codes from that version, so L cannot exchange either BFT messages or new-format block-sync messages with N.
  • Requires a longer multi-step migration: N1 ships the bridge for BFT (as in B), N2 ships dual-mode block-sync, N3 removes legacy paths. Each step needs deployment coordination.

Pros. Strictly the most "upstream-shaped" outcome; removes both the pair mechanism and the longest-standing eth/63-era divergence in one program of work.
Cons. Largest scope, longest migration window, highest coordination cost. Realistically a multi-quarter program that should be tracked as a separate effort, not bundled with the pair-removal cleanup.

Compatibility verdict: incompatible without a multi-stage migration. Coordination cost dominates.


Recommendation

  • Adopt Option A as the immediate cleanup on dev-upgrade: it removes the pair mechanism, eliminates the bug class around pairRW/isPair/removePeerByID, restores MaxPeers semantics, and is safely rolling-deployable against today's mainnet.
  • Track Option B as a follow-up: introduce xdpos as a proper devp2p subprotocol via a staged bridge → pure release plan, so BFT can evolve independently of eth without ever reintroducing a second TCP.
  • Treat Option C as part of a separate upstream-rebase initiative, not as part of this cleanup.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions