feat(kad-dht): add RoutingTableDiagnostics for routing health inspection #41
Open
bhuvan-somisetty wants to merge 18 commits into
Signed-off-by: Gautam Manchandani <gautammanch@Gautams-MacBook-Air.local>
…t-response-example Signed-off-by: Gautam Manchandani <gautammanch@Gautams-MacBook-Air.local> # Conflicts: # .github/workflows/tox.yml
…e bridge
- New libp2p/filecoin/address.py: Filecoin address parsing (f0/f1/f3/f4/f410), validation, and delegated address creation. 19 passing tests.
- Integrate Filecoin address validation into A2A payment authorization. Payer must be a valid f4/f410/f0/f1 address; invalid addresses are rejected.
- Add streaming task lifecycle: payment auth sets WORKING state with a completion deadline; GetTask auto-completes after the deadline expires. Libp2p demo polls GetTask to show visible state transitions.
- A2A agent card advertises streaming=True for libp2p protocol binding.
- Synapse bridge: secure command parsing (shlex), configurable timeout, subprocess.TimeoutExpired handling, and comprehensive architecture docs.
- All 53 related tests pass (address module, A2A payment, HTTP demo, agentic demo, request-response API).
…t-response-example
* feat: add ObservedAddrManager for NAT address discovery

  Add ObservedAddrManager that tracks externally observed addresses reported by peers via Identify, confirms them once enough distinct observer groups agree, and exposes them through BasicHost.get_addrs(). Integrates into BasicHost: records observations after Identify exchanges and cleans up on disconnect.

* docs: add newsfragment for ObservedAddrManager (libp2p#1284)

* Address review feedback on ObservedAddrManager integration (libp2p#1284)

  Follow-ups from maintainer review (acul71):

  * Narrow the exception handling in BasicHost._identify_peer's observed-address recording path. Expected multiaddr parse errors (MultiaddrError subclasses) and ValueError remain at debug level; any other failure is now surfaced as a warning with traceback so regressions aren't hidden.
  * Rename newsfragments/1284.feature.rst to newsfragments/1250.bugfix.rst so the release notes point at the user-facing bug (Listening Addresses on an AWS EC2 don't include public IPs, libp2p#1250), and update the text accordingly.
  * Expose BasicHost.get_nat_type() as a thin pass-through to the ObservedAddrManager equivalent (marked experimental) and re-export NATDeviceType from libp2p.host.basic_host imports for easy consumption by future AutoNAT code. Add unit tests covering the pass-through, the observed-addresses-appended path of get_addrs, dedup against listen addrs, and the announce_addrs override branch.
  * Document the invariant behind the TCP vs UDP classification loop in ObservedAddrManager.get_nat_type (all ext_tw_str keys in a single ext_map share a transport) and add a defensive guard that skips mixed entries instead of silently misclassifying them, matching go-libp2p's getNATType behaviour more explicitly.
  * Document the observed-address flow in docs/libp2p.host.rst and the announce_addrs vs observed-address interaction in the announce_addrs example doc; cross-link from the announce_addrs parameter docstring in BasicHost.__init__.

  Made-with: Cursor

* Add tests for ObservedAddrManager wiring and defensive guard (libp2p#1284)

  Close integration-layer test gaps identified during PR review:

  * BasicHost._identify_peer forwards observed_addr to record_observation (Gap 1).
  * The three narrow exception branches in _identify_peer (MultiaddrError, ValueError, and the generic Exception fallback) are exercised and the expected DEBUG / WARNING logs (with exc_info for the fallback) are emitted without propagating (Gap 2a/2b/2c).
  * BasicHost._on_notifee_disconnected calls ObservedAddrManager.remove_conn on peer disconnect, with a defensive no-op path when peer_id is missing (Gap 3).
  * White-box test for the defensive TCP/UDP skip guard in ObservedAddrManager.get_nat_type that would otherwise be unreachable from record_observation (Gap 4). Constructed so the result *differs* between guarded and unguarded implementations: a stray concentrated UDP entry inside a TCP bucket would flip the classification from ENDPOINT_DEPENDENT to ENDPOINT_INDEPENDENT without the guard.

  A small fixture (libp2p_log_propagate) re-enables propagation and lowers the libp2p logger level so pytest's caplog can observe records; the package's own setup_logging() sets propagate=False and WARNING at import time when LIBP2P_DEBUG is unset.

  Made-with: Cursor

* Make log-capture in _identify_peer tests xdist-safe (libp2p#1284)

  The previous fixture re-enabled propagation only on the ``libp2p`` logger, relying on ``caplog`` (attached at root) to pick records up. That's fragile: ``libp2p/utils/logging.py`` can also set ``propagate=False`` on intermediate loggers like ``libp2p.host`` depending on the ``LIBP2P_DEBUG`` env var, which breaks propagation from ``libp2p.host.basic_host`` to root. CI/xdist workers occasionally hit that state and the three log-assertion tests failed with empty ``caplog.records``.

  Replace the propagation trick with a handler attached directly to the ``libp2p.host.basic_host`` logger. The handler simply collects records into a list, which tests then inspect. This sidesteps propagation and parent-logger config entirely; the test only cares whether ``logger.debug(...)`` in ``basic_host.py`` fires, which it forces by setting ``target.setLevel(DEBUG)`` and ``target.disabled = False``.

  Verified green under:

  * plain ``pytest``
  * ``pytest -n auto`` (xdist)
  * ``LIBP2P_DEBUG=DEBUG``, ``LIBP2P_DEBUG=libp2p.host:DEBUG``, unset

  Made-with: Cursor

* fix(host): run outbound Identify on all muxers and inbound conns

  Match go-libp2p Connected→IdentifyWait behavior: _should_identify_peer accepts any open SwarmConn with a muxer (not QUIC-only), and the identify notifee schedules Identify for inbound connections too. Adds unit tests for _should_identify_peer and inbound notifee scheduling.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* test(identify_push): assert push targets only connected peers

  After inbound outbound Identify, host_a may retain host_c in peerstore after disconnect; replace peer_ids() checks with host_a not in host_c.get_connected_peers() for push_identify_to_peers.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: DEBUG for observed addr activation; fix identify-demo logging

  - ObservedAddrManager: log each observation and when ACTIVATION_THRESHOLD is reached for an external thin waist.
  - identify-demo: stop forcing libp2p to WARNING so LIBP2P_DEBUG works; leave commented quiet defaults for optional use.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* debug(host): trace Identify observed_addr and ObservedAddrManager skips

  Log when outbound Identify records an observed address or the peer omits it, and add DEBUG reasons for early returns and duplicate observations in ObservedAddrManager.record_observation.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(yamux): expose get_remote_address on muxed conn like Mplex

  Yamux IMuxedConn now delegates to secured_conn; YamuxStream uses the muxer. ObservedAddrManager._get_remote_addr only calls muxed_conn.get_remote_address (with callable guard and docstring). Add unit test and annotate test DummySecuredConn.get_remote_address for pyrefly.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: add advertising_addresses page under Examples, cross-link from host and announce_addrs

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: acul71 <34693171+acul71@users.noreply.github.com>
Co-authored-by: Manu Sheel Gupta <manusheel.edu@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…response-example feat/agentic-request-response-example
When a KadDHT node misbehaves — slow lookups, unreachable keys, failed
bootstraps — operators previously had no visibility into why. This commit
adds a first-class diagnostic surface that answers the core questions:
• Which k-buckets are under-populated or empty?
• Where are the keyspace coverage gaps?
• How fresh are my known peers? (fresh / aging / stale / very stale)
• What is the overall routing-table health as a single 0–100 score?
Changes:
libp2p/kad_dht/diagnostics.py
New RoutingTableDiagnostics class (read-only analyser).
Produces a RoutingTableReport with BucketStat list, CoverageGap list,
FreshnessDistribution, composite health score, and human-readable summary.
libp2p/kad_dht/routing_table.py
Add RoutingTable.get_diagnostics() convenience factory.
libp2p/kad_dht/kad_dht.py
Add KadDHT.get_diagnostics() convenience factory.
libp2p/kad_dht/__init__.py
Export all new public types.
tests/core/kad_dht/test_routing_table_diagnostics.py
27 unit tests; fully offline (mock host, no network required).
examples/kademlia/routing_table_diagnostics.py
Two-node demo that prints a full report after bootstrapping.
Usage:
report = dht.get_diagnostics().analyse()
print(report.summary())
print(f"Health score: {report.health_score}/100")
What problem does this solve?
When a KadDHT node starts misbehaving - slow lookups, failed bootstraps, keys that can't be found - operators are flying blind. The routing table is a complete black box. The only way to debug it today is to manually add print statements inside the Kademlia internals or stare at raw bucket lists hoping something stands out.
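This PR adds a first-class diagnostic surface so you can answer the real questions in seconds: which k-buckets are under-populated or empty, where the keyspace coverage gaps are, how fresh the known peers are (fresh / aging / stale / very stale), and what the overall routing-table health is as a single 0–100 score.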
Architecture
flowchart TD
    A[KadDHT] -->|get_diagnostics| D[RoutingTableDiagnostics]
    B[RoutingTable] -->|get_diagnostics| D
    B --> C1[KBucket 0\nclosest peers]
    B --> C2[KBucket 1]
    B --> C3[KBucket N\nfarthest peers]
    C1 & C2 & C3 --> D
    D -->|analyse| R[RoutingTableReport]
    R --> S1[BucketStat list\nfill rate · stale count]
    R --> S2[CoverageGap list\nkeyspace holes]
    R --> S3[FreshnessDistribution\nfresh · aging · stale]
    R --> S4[health_score 0–100\nverdict]

The analyser is read-only - it never modifies the routing table.
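To make the read-only analysis step concrete, here is a minimal sketch, not the code in this PR. It assumes each bucket can be viewed as a list of last-seen timestamps. The age cut-offs, the `bucket_size` default, and the names `classify_age` and `analyse_snapshot` are all hypothetical. Only the four freshness labels and the per-bucket fill rate and stale count come from the description above.

```python
from collections import Counter
from typing import Sequence

# Hypothetical age cut-offs in seconds; the real thresholds are not stated here.
FRESH_MAX, AGING_MAX, STALE_MAX = 60.0, 600.0, 3600.0


def classify_age(age_seconds: float) -> str:
    """Map the time since a peer was last seen to a freshness label."""
    if age_seconds < FRESH_MAX:
        return "fresh"
    if age_seconds < AGING_MAX:
        return "aging"
    if age_seconds < STALE_MAX:
        return "stale"
    return "very_stale"


def analyse_snapshot(
    buckets: Sequence[Sequence[float]], now: float, bucket_size: int = 20
) -> dict:
    """Summarise buckets given as lists of last-seen timestamps.

    The function only reads its inputs; nothing is added, removed, or reordered.
    bucket_size is an assumed capacity per bucket.
    """
    freshness = Counter(fresh=0, aging=0, stale=0, very_stale=0)
    bucket_stats = []
    for index, last_seen in enumerate(buckets):
        labels = [classify_age(now - ts) for ts in last_seen]
        freshness.update(labels)
        bucket_stats.append(
            {
                "index": index,
                "fill_rate": len(last_seen) / bucket_size,
                "stale_count": sum(
                    label in ("stale", "very_stale") for label in labels
                ),
            }
        )
    return {"buckets": bucket_stats, "freshness": dict(freshness)}
```

Because the sketch takes plain sequences and returns new dictionaries, calling it repeatedly on a live table snapshot cannot change routing behaviour, which is the property the analyser is meant to preserve.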
What was added
libp2p/kad_dht/diagnostics.py - core engine
Sample output:
Health score breakdown (0–100, composite):
Buckets closest to the local node are weighted exponentially higher - this reflects the Kademlia property that near buckets dominate routing success.
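The effect of exponential weighting can be illustrated with a small sketch. This is not the scoring formula used in this PR: the `weighted_health` name, the `decay` factor of 0.5, and the per-bucket health values in the range 0 to 1 are assumptions chosen only to show how near buckets come to dominate the result.

```python
def weighted_health(bucket_health: list[float], decay: float = 0.5) -> float:
    """Combine per-bucket health values (0.0 to 1.0) into a 0-100 score.

    bucket_health[0] is the bucket closest to the local node. Each further
    bucket gets a weight `decay` times smaller than the one before it.
    """
    if not bucket_health:
        return 0.0
    weights = [decay**i for i in range(len(bucket_health))]
    weighted_sum = sum(w * h for w, h in zip(weights, bucket_health))
    return 100.0 * weighted_sum / sum(weights)


# A healthy near bucket with an empty far bucket scores higher than the reverse.
near_healthy = weighted_health([1.0, 0.0])
far_healthy = weighted_health([0.0, 1.0])
print(near_healthy > far_healthy)
```

With two buckets and a decay of 0.5 the weights are 1 and 0.5, so the near bucket contributes two thirds of the score and the far bucket one third.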
Convenience entry points
Public types exported from libp2p.kad_dht
Tests
tests/core/kad_dht/test_routing_table_diagnostics.py - 27 unit tests, fully offline (mock host, no network required). Covers:
- BucketStat properties (is_full, is_empty, health)
- FreshnessDistribution ratios and totals
- RoutingTableReport.summary() output format
Runnable example
examples/kademlia/routing_table_diagnostics.py - two-node demo:
Non-goals / out of scope
Checklist
- libp2p/kad_dht/diagnostics.py with full docstrings
- RoutingTable.get_diagnostics() convenience factory
- KadDHT.get_diagnostics() convenience factory
- libp2p/kad_dht/__init__.py
- examples/kademlia/
- TYPE_CHECKING guard to avoid circular imports
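For readers unfamiliar with the last checklist item, the following is a minimal sketch of the general `TYPE_CHECKING` pattern, not the contents of diagnostics.py. The class `DiagnosticsSketch` and its method are hypothetical. The import path follows the libp2p/kad_dht/routing_table.py file named in this PR and is never executed at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers. At runtime TYPE_CHECKING is False,
    # so this import never runs and cannot create an import cycle.
    from libp2p.kad_dht.routing_table import RoutingTable


class DiagnosticsSketch:
    """Hypothetical analyser that holds a reference to a routing table."""

    # The annotation is a string, so the name is not looked up at runtime.
    def __init__(self, routing_table: "RoutingTable") -> None:
        self._routing_table = routing_table

    def routing_table_type_name(self) -> str:
        """Return the runtime class name of the wrapped object."""
        return type(self._routing_table).__name__
```

The pattern helps when the routing-table module needs to import the diagnostics module to build its factory while the diagnostics module only needs the routing-table type for annotations.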