Skip to content

fix: data race on AddrInfo.Addrs in queryPeer#1244

Merged
guillaumemichel merged 4 commits into
masterfrom
fix/issue-kubo-11287-queryevent-race
Apr 20, 2026
Merged

fix: data race on AddrInfo.Addrs in queryPeer#1244
guillaumemichel merged 4 commits into
masterfrom
fix/issue-kubo-11287-queryevent-race

Conversation

@lidel
Copy link
Copy Markdown
Member

@lidel lidel commented Apr 19, 2026

The race

findProvidersAsyncRoutine (and the GetValue / FindPeer) hand the same []*peer.AddrInfo to two places:

  1. routing.QueryEvent.Responses, observed by any RegisterForQueryEvents subscriber (e.g. handlers that json.Marshal the event).
  2. queryPeer in query.go, which did next.Addrs = append(next.Addrs, ...).

Two goroutines end up writing and reading the same AddrInfo.Addrs slice header. Under -race the detector flags the conflict; without -race, a torn slice header can crash inside go-multiaddr when the consumer walks Addrs.

Refs: ipfs/kubo#11287, ipfs/kubo#11116

The fix

Note

This class of errors could be fixed by upstream libp2p/go-libp2p#3490, but I also believe we should do this PR (belt-and-suspenders) so we don't wait for go-libp2p release or are hit by a future refactor there.

In queryPeer, build a fresh slice with slices.Concat(next.Addrs, curInfo.Addrs) instead of mutating next.Addrs. The combined addresses are passed to queryPeerFilter and maybeAddAddrs; the published *peer.AddrInfo is no longer touched.

slices.Concat is preferred over append: with spare capacity, append would reuse next.Addrs's backing array, which the consumer still references. slices.Concat always allocates a fresh array, so no memory is shared with the published event.

Also documents the read-only contract on the queryFn type so LLMs and humans avoid introducing regressions that mutate.

Tests

  • New TestFindProvidersAsyncQueryEventResponsesRace drives real FindProvidersAsync against an intercepted protoMessenger.
    • Fails on master under -race,
    • passes on this branch.
  • Full package suite passes under -race.

lidel added 3 commits April 19, 2026 21:29
findProvidersAsyncRoutine hands the same []*peer.AddrInfo to two
places: routing.QueryEvent.Responses (read by any consumer of
RegisterForQueryEvents, e.g. kubo's findprovs HTTP handler marshaling
events) and the return value, which queryPeer then mutates via
next.Addrs = append(next.Addrs, ...) in query.go.

The two goroutines race on AddrInfo.Addrs. Under -race the detector
reports the conflict; without -race the torn slice header can crash
inside go-multiaddr.

Refs: ipfs/kubo#11287, ipfs/kubo#11116.
Before queryPeer runs, the queryFn closures in routing.go publish
the returned []*peer.AddrInfo as routing.QueryEvent.Responses.
Consumers (e.g. kubo's findprovs HTTP handler) then read those
structs concurrently while queryPeer did
  next.Addrs = append(next.Addrs, curInfo.Addrs...)
racing on AddrInfo.Addrs.

Build the combined address slice locally and leave next untouched.
Covers GetValue, FindProviders, and FindPeer closures since all
three share queryPeer as the worker.

Refs: ipfs/kubo#11287, ipfs/kubo#11116.
@lidel lidel changed the title test: FindProvidersAsync AddrInfo.Addrs race fix: data race on AddrInfo.Addrs in queryPeer Apr 19, 2026
@lidel lidel requested a review from guillaumemichel April 19, 2026 20:09
@lidel lidel marked this pull request as ready for review April 19, 2026 20:10
Copy link
Copy Markdown
Collaborator

@guillaumemichel guillaumemichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great finding @lidel! Thanks for the fix and informative regression test!

@guillaumemichel guillaumemichel merged commit 248f53b into master Apr 20, 2026
9 checks passed
@guillaumemichel guillaumemichel deleted the fix/issue-kubo-11287-queryevent-race branch April 20, 2026 10:03
lidel added a commit to ipfs/kubo that referenced this pull request Apr 20, 2026
bump go-libp2p-kad-dht to master (248f53b7) to pick up the queryPeer
AddrInfo.Addrs race fix from libp2p/go-libp2p-kad-dht#1244, which resolves
the random daemon crashes reported in #11287 and #11116.
lidel added a commit to ipfs/kubo that referenced this pull request Apr 20, 2026
bump go-libp2p-kad-dht to v0.39.1 to pick up the queryPeer
AddrInfo.Addrs race fix from libp2p/go-libp2p-kad-dht#1244, which resolves
the random daemon crashes reported in #11287 and #11116.
lidel added a commit to ipfs/kubo that referenced this pull request Apr 20, 2026
bump go-libp2p-kad-dht to v0.39.1 to pick up the queryPeer
AddrInfo.Addrs race fix from libp2p/go-libp2p-kad-dht#1244, which resolves
the random daemon crashes reported in #11287 and #11116.
lidel added a commit to ipfs/kubo that referenced this pull request Apr 20, 2026
bump go-libp2p-kad-dht to v0.39.1 to pick up the queryPeer
AddrInfo.Addrs race fix from libp2p/go-libp2p-kad-dht#1244, which resolves
the random daemon crashes reported in #11287 and #11116.

(cherry picked from commit eb2cfac)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants