Unsafe dictionary iteration in async context causes RuntimeError during Bitswap file sharing (Universal Connectivity DApp)
#1238
sumanjeet0012 started this conversation in General
Replies: 2 comments
Bitswap.file.sharing.mp4 (attached video)
@seetadev We need to implement certain optimizations in the codebase for file sharing using the Universal Connectivity DApp.
Summary
Using the Bitswap protocol for file sharing (via `MerkleDag.fetch_file()` → `BitswapClient.get_block()`) causes a `RuntimeError: dictionary changed size during iteration` crash. The error occurs because multiple methods across `pubsub.py`, `bitswap/client.py`, `swarm.py`, and `gossipsub.py` iterate over shared dictionaries using live views (`.values()`, `.items()`, `.keys()`, or the dict itself) while concurrent trio tasks, or synchronous callbacks within the same loop body, mutate those same dictionaries.

This is not a Bitswap-specific bug. The unsafe iteration patterns exist in core pubsub and swarm code and can be triggered by any workload that combines concurrent peer churn with message broadcasting; Bitswap file transfers just happen to expose it reliably.
Environment
- py-libp2p `main` (as of Feb 2026)

How to reproduce
1. Share a file from one node (build a `MerkleDag`, store the blocks, publish the CID via pubsub).
2. Fetch the file from another node (`MerkleDag.fetch_file()` → sequential `get_block()` calls).

The crash is non-deterministic but highly reproducible under any peer churn (peers connecting/disconnecting while pubsub messages are being broadcast).
Root cause
In Python, iterating over a dictionary view (`.keys()`, `.values()`, `.items()`, or the dict directly) while another piece of code adds or removes entries raises `RuntimeError`.

With trio's cooperative multitasking, every `await` inside a `for` loop is a yield point: other nursery tasks can run at that exact moment and mutate shared state. Additionally, synchronous callbacks (like exception handlers) can mutate the dict being iterated within the same call frame.
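As a standalone illustration (the names below are placeholders, not the actual py-libp2p code), this trio script reproduces the failure mode: one task iterates a shared dict with an `await` in the loop body, another task removes an entry at that yield point, and the iterator raises `RuntimeError` when it resumes.

```python
import trio

# Shared state standing in for something like Pubsub.peers.
peers = {f"peer{i}": f"stream{i}" for i in range(3)}

async def broadcaster() -> None:
    # Iterates a live view; every await below is a yield point where
    # other tasks may mutate `peers`.
    for stream in peers.values():
        await trio.sleep(0)  # stands in for `await stream.write(...)`

async def churn() -> None:
    await trio.sleep(0)
    del peers["peer1"]  # stands in for dead-peer cleanup

async def main() -> None:
    async with trio.open_nursery() as nursery:
        nursery.start_soon(broadcaster)
        nursery.start_soon(churn)

# Exits with RuntimeError: dictionary changed size during iteration
# (possibly wrapped in an ExceptionGroup, depending on the trio version).
trio.run(main)
```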
Primary crash site: `Pubsub.message_all_peers()`

Two problems here:

- `_handle_dead_peer()` does `del self.peers[peer_id]` synchronously inside the loop body's `except` handler, which directly mutates the dict the `for` loop is iterating over.
- `await stream.write(...)` is a yield point where other concurrent tasks (dead peer handler, new peer handler) can add or remove entries from `self.peers`.

All affected locations
I've identified 13 unsafe iteration patterns across 4 files:
`libp2p/pubsub/pubsub.py` (3 issues)

- `message_all_peers`: `for stream in self.peers.values()`; `_handle_dead_peer()` deletes from `self.peers` mid-iteration; `await` yield points
- `_handle_dead_peer`: `for topic in self.peer_topics` while mutating `self.peer_topics`
- `_teardown_if_connected`: `for _topic, peerset in self.peer_topics.items()` while mutating `self.peer_topics`

`libp2p/bitswap/client.py` (7 issues)

- `_broadcast_wantlist`: `peers = self.host.get_network().connections.keys()`
- `_broadcast_cancel`: `peers = self.host.get_network().connections.keys()`; `await new_stream()` is a yield point
- `_request_block` finally: `await self.cancel_want(cid)` + `del self._pending_requests[cid]`
- `_read_responses_from_stream` finally: `for i, cid in enumerate(self._expected_blocks[peer_id])` + `del self._expected_blocks[peer_id]`
- `_process_blocks_v100`: `for pid in self._expected_blocks`
- `_process_blocks_v110`: `for peer_id in self._expected_blocks`
- `_notify_peers_about_block`: `for peer_id, wantlist in self._peer_wantlists.items()`; `await` yield points in the loop body

`libp2p/network/swarm.py` (2 issues)

- `get_connections`: `for conns in self.connections.values()`
- `connections_legacy`: `for peer_id, conns in self.connections.items()`

`libp2p/pubsub/gossipsub.py` (1 issue)

- `leave`: `for peer in self.mesh[topic]` with `await self.emit_prune(...)`; `await` yield point inside loop

Proposed fix
The fix is straightforward and mechanical: snapshot all shared collections before iterating them using `list()`, and defer mutations until after iteration completes.

Pattern 1: Wrap dict/set views with `list()`
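For illustration, a minimal sketch of this pattern applied to the `message_all_peers()` loop described above (the exact signature is assumed, not copied from the source):

```python
async def message_all_peers(self, msg: bytes) -> None:
    # Unsafe: `for stream in self.peers.values()` iterates a live view.
    # Safe: list() copies the current values before the first await, so
    # concurrent adds/removes of peers cannot invalidate the iterator.
    for stream in list(self.peers.values()):
        await stream.write(msg)
```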
Pattern 2: Defer mutations until after iteration
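A sketch of the same loop with dead-peer handling deferred until the iteration has finished (the exception type and exact signatures here are placeholders):

```python
async def message_all_peers(self, msg: bytes) -> None:
    dead_peers = []
    for peer_id, stream in list(self.peers.items()):
        try:
            await stream.write(msg)
        except Exception:
            # Do not mutate self.peers here; just record the peer.
            dead_peers.append(peer_id)
    # Mutate shared state only after the loop has completed.
    for peer_id in dead_peers:
        self._handle_dead_peer(peer_id)
```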
Pattern 3: Use `pop()` instead of `del` for cleanup
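A sketch of the cleanup pattern in a `finally` block like the one in `_request_block()` (the helper called in the `try` body is a placeholder for the real request logic):

```python
async def _request_block(self, cid: str) -> bytes:
    try:
        return await self._wait_for_block(cid)  # placeholder for the real request path
    finally:
        # `del self._pending_requests[cid]` raises KeyError if another task
        # already removed the entry at an earlier yield point; pop() with a
        # default tolerates concurrent removal.
        self._pending_requests.pop(cid, None)
```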
Pattern 4: Add existence guards after snapshot
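A sketch of the guard applied to a loop like `_process_blocks_v110()` (the signature and the per-peer processing step are assumptions):

```python
async def _process_blocks_v110(self, blocks: list) -> None:
    # Snapshot the keys, then re-check each one: another task may have
    # removed the entry between the snapshot and this iteration step.
    for peer_id in list(self._expected_blocks):
        if peer_id not in self._expected_blocks:
            continue  # removed concurrently; nothing left to do
        expected = self._expected_blocks[peer_id]
        await self._handle_expected_blocks(peer_id, expected, blocks)  # placeholder
```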
Complete list of changes
`libp2p/pubsub/pubsub.py`

- `message_all_peers()`: snapshot `self.peers.values()` and defer dead-peer handling
- `_handle_dead_peer()`: snapshot `self.peer_topics`
- `_teardown_if_connected()`: snapshot `self.peer_topics.items()`

`libp2p/bitswap/client.py`

- `_broadcast_wantlist()` and `_broadcast_cancel()`: snapshot connections
- `_request_block()` finally: use `pop()` instead of `del`, skip broadcast during cleanup
- `_read_responses_from_stream()` finally: snapshot set, use `pop()`
- `_process_blocks_v100()` and `_process_blocks_v110()`: snapshot dict + guard
- `_notify_peers_about_block()`: snapshot + copy

`libp2p/network/swarm.py`

- `get_connections()` and `connections_legacy`: snapshot

`libp2p/pubsub/gossipsub.py`

- `leave()`: snapshot set

Why this matters beyond Bitswap
These patterns affect any concurrent workload on py-libp2p:
- pubsub broadcasting during peer churn → `message_all_peers` crash
- peer disconnects during pubsub cleanup → `_handle_dead_peer` crash
- connection queries during peer churn → `get_connections` / `connections_legacy` crash
- `host.new_stream()` in a loop over connected peers → connection dict mutation

Bitswap file sharing is just the most reliable trigger because it combines sustained pubsub broadcasting with constant peer churn and per-request bookkeeping on the same shared dictionaries.
Additional notes
- The `list()` snapshot pattern is the standard Python approach and has negligible overhead for typical peer counts (tens to low hundreds).
- Because trio is single-threaded and cooperative, `list()` creates the snapshot before any yield point, so no additional locking is needed.

I'm happy to submit a PR with these fixes if the maintainers agree with the approach.