Commit c9d1972

acul71 and cursoragent committed
Health monitoring: run_demo CLI, tests, docs, host.docker.internal target
- run_demo.py: add CLI for limits, interval, monitor options; fix line length
- Prometheus/configure: use host.docker.internal, update configure.py patterns
- tests: add test_health_monitoring_run_demo.py under tests/examples
- README: validation steps, testing section
- docs: state current API only in connection_health_monitoring.rst
- config/monitor: minor validation and comment tweaks

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent c230e8b commit c9d1972

8 files changed, +329 -39 lines changed


docs/examples.connection_health_monitoring.rst

Lines changed: 2 additions & 13 deletions
@@ -275,22 +275,11 @@ Health monitoring integrates seamlessly with existing host-based code:
 - Health monitoring can be enabled/disabled per host instance
 - Existing examples work unchanged - just add `connection_config` parameter
 - Backward compatibility is maintained
-- No need to switch from `new_host()` to low-level swarm APIs - the API inconsistency is fixed
-
-**Before (Previous Implementation - API Inconsistency):**
-
-.. code-block:: python
-
-# ❌ Forced to use different APIs
-host = new_host() # High-level API for basic usage
-# Health monitoring required low-level swarm API - INCONSISTENT!
-
-**After (Current Implementation - API Consistency):**
+- No need to switch from `new_host()` to low-level swarm APIs
 
 .. code-block:: python
 
-# ✅ Consistent API for all use cases
 host = new_host() # Basic usage
-host = new_host(connection_config=config) # Health monitoring - same API!
+host = new_host(connection_config=config) # Health monitoring - same API
 
 For more information, see the :doc:`../libp2p.network` module documentation.
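The backward-compatibility claim above rests on a common pattern: a single factory with an optional keyword argument that defaults to "off", so existing `new_host()` call sites behave exactly as before. The sketch below illustrates only that pattern. `make_host`, `HostSketch`, and `ConnectionConfigSketch` (including its `enable_health_monitoring` field) are hypothetical names, not py-libp2p's API, and how the real config switches monitoring on is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionConfigSketch:
    """Hypothetical stand-in for a connection config object."""

    enable_health_monitoring: bool = True  # hypothetical field name


@dataclass
class HostSketch:
    """Hypothetical host that only records whether monitoring is on."""

    connection_config: Optional[ConnectionConfigSketch]

    @property
    def health_monitoring_enabled(self) -> bool:
        cfg = self.connection_config
        return cfg is not None and cfg.enable_health_monitoring


def make_host(
    connection_config: Optional[ConnectionConfigSketch] = None,
) -> HostSketch:
    """One entry point for both cases: omit the config to keep old behaviour."""
    return HostSketch(connection_config=connection_config)


basic = make_host()  # existing call sites stay unchanged
monitored = make_host(connection_config=ConnectionConfigSketch())
opted_out = make_host(ConnectionConfigSketch(enable_health_monitoring=False))
```

Because the new parameter defaults to `None`, no existing caller has to change, and monitoring remains a per-instance decision.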

examples/health_monitoring/README.md

Lines changed: 37 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Health Monitoring Demo
 
+**Prerequisites:** The demo exposes metrics over HTTP for Prometheus. Install the client in your venv:
+
+```bash
+pip install prometheus-client
+```
+
 Configure Prometheus target (match exporter port):
 
 ```bash
@@ -21,10 +27,41 @@ Open UIs:
 - Prometheus: http://localhost:9090/targets
 - Grafana: http://localhost:3000
 
+**Validating the data**
+
+The demo uses fixed limits: **10 connections**, **20 streams**, **32 MB** memory. Each second it tries to add 1 connection, 1 stream (if there is at least one connection), and 100–500 KB memory per peer. So over time you should see usage rise until it hits the limits, then blocks.
+
+1. **Exporter vs logs** (with `run_demo.py` running):
+
+```bash
+curl -s http://localhost:8000/metrics | grep -E '^libp2p_rcmgr_(connections|streams|memory|blocked)'
+```
+
+Compare the numbers with what the demo prints: `Current: N conns, M streams, K bytes memory` and `Blocked: ...`. The gauges should match.
+
+1. **Prometheus** (http://localhost:9090 → Graph):
+
+- `libp2p_rcmgr_connections{scope="system"}` — total connections (should stay ≤ 10).
+- `libp2p_rcmgr_streams{scope="system"}` — total streams (≤ 20).
+- `libp2p_rcmgr_memory{scope="system"}` — bytes (≤ 32*1024*1024).
+- `libp2p_rcmgr_blocked_resources` — blocked events; should increase when you are at a limit.
+
+1. **Sanity checks**: Connections and streams should level off at 10 and 20; memory at or below 32 MB. After ~15–20 seconds you should see some blocked resources (connections or memory). The Grafana dashboard panels use these same metrics.
+
 Notes:
 
 - The Grafana dashboard `py-libp2p Resource Manager` is auto-provisioned.
 - If you change the exporter port, re-run `configure.py` and `docker compose restart prometheus`.
+- Prometheus reaches the host via `host.docker.internal` (docker-compose sets `host-gateway`). If the py-libp2p target stays DOWN, try the Docker bridge IP in `prometheus.yml` (e.g. `172.17.0.1:8000` from `ip addr show docker0`) or your machine’s IP.
+- If port 8000 is already in use, run the demo on another port (e.g. `python run_demo.py --port 8001`), then run `configure.py --port 8001` and `docker compose restart prometheus`.
+
+**Testing**
+
+Tests for `run_demo.py` (different parameters, limit enforcement) live under the main test suite:
+
+```bash
+pytest tests/examples/test_health_monitoring_run_demo.py -v
+```
 
 Stop:
 
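The README's "exporter vs limits" comparison can also be scripted. The sketch below is a hypothetical helper, not part of the demo: it parses Prometheus text-format output using only the standard library, keeps samples labelled `scope="system"`, and reports any gauge above the README's default limits (10 connections, 20 streams, 32 MB). It assumes one system-scope series per metric name; if the real exporter adds further labels (for example a direction), later samples silently overwrite earlier ones.

```python
import re
from typing import Dict, List

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)"
)


def system_gauges(exposition: str) -> Dict[str, float]:
    """Map metric name -> value for samples labelled scope="system"."""
    gauges: Dict[str, float] = {}
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match and 'scope="system"' in (match.group("labels") or ""):
            gauges[match.group("name")] = float(match.group("value"))
    return gauges


def check_limits(
    gauges: Dict[str, float],
    max_connections: int = 10,
    max_streams: int = 20,
    max_memory_mb: int = 32,
) -> List[str]:
    """Return one message per gauge that exceeds its limit (empty if none)."""
    limits = {
        "libp2p_rcmgr_connections": max_connections,
        "libp2p_rcmgr_streams": max_streams,
        "libp2p_rcmgr_memory": max_memory_mb * 1024 * 1024,
    }
    return [
        f"{name}={gauges[name]:g} exceeds {limit}"
        for name, limit in limits.items()
        if name in gauges and gauges[name] > limit
    ]


# Illustrative exporter output (made-up values, not captured from the demo).
SAMPLE = """\
# TYPE libp2p_rcmgr_connections gauge
libp2p_rcmgr_connections{scope="system"} 10.0
libp2p_rcmgr_streams{scope="system"} 20.0
libp2p_rcmgr_memory{scope="system"} 1048576.0
libp2p_rcmgr_connections{scope="peer"} 3.0
"""
```

To check a live demo, pass it the text returned by `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()`. An empty list from `check_limits` means every system-scope gauge is within its limit.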

examples/health_monitoring/configure.py

Lines changed: 7 additions & 4 deletions
@@ -9,10 +9,13 @@
 
 def set_exporter_port(port: int) -> None:
 content = PROM_PATH.read_text()
-pattern = r"host\\.docker\\.internal:\\d+"
-replacement = f"host.docker.internal:{port}"
-new = re.sub(pattern, replacement, content)
-PROM_PATH.write_text(new)
+# Update py-libp2p target port (host.docker.internal or legacy 172.17.0.1)
+for pattern, replacement in [
+(r"host\.docker\.internal:\d+", f"host.docker.internal:{port}"),
+(r"172\.17\.0\.1:\d+", f"172.17.0.1:{port}"),
+]:
+content = re.sub(pattern, replacement, content)
+PROM_PATH.write_text(content)
 print(f"Updated Prometheus target to host.docker.internal:{port}")
 
 

examples/health_monitoring/prometheus.yml

Lines changed: 11 additions & 11 deletions
@@ -22,10 +22,11 @@ scrape_configs:
 scheme: http
 
 # py-libp2p resource manager metrics
+# Host is reached via host.docker.internal (docker-compose sets host-gateway).
+# Run configure.py --port N to update the port.
 - job_name: 'py-libp2p'
 static_configs:
-# This target can be updated by the helper to match the chosen exporter port
-- targets: ['host.docker.internal:8000'] # Default py-libp2p metrics port
+- targets: ['host.docker.internal:8000']
 scrape_interval: 5s # More frequent scraping for libp2p metrics
 scrape_timeout: 1s # Again the scrape timeout should be less than the interval, otherwise prometheus will skip the scrape and give an error
 metrics_path: /metrics
@@ -41,12 +42,11 @@ scrape_configs:
 regex: 'true'
 replacement: 'py-libp2p'
 
-# Node Exporter metrics
-# Useful for monitoring system-level metrics of the host
-- job_name: 'node-exporter'
-static_configs:
-- targets: ['node-exporter:9100']
-scrape_interval: 15s
-scrape_timeout: 10s
-metrics_path: /metrics
-scheme: http
+# Node Exporter (optional): uncomment and add node-exporter to docker-compose if needed
+# - job_name: 'node-exporter'
+# static_configs:
+# - targets: ['node-exporter:9100']
+# scrape_interval: 15s
+# scrape_timeout: 10s
+# metrics_path: /metrics
+# scheme: http
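The `scrape_timeout` comment in the py-libp2p job reflects a rule Prometheus enforces when it loads its configuration: a job's `scrape_timeout` must not exceed its `scrape_interval`. A hypothetical pre-flight check for hand-edited values might look like the sketch below. Its duration parser accepts Prometheus-style strings such as `5s`, `1m30s`, or `500ms`, but it is looser than Prometheus's own parser (for example, it does not enforce unit order).

```python
import re

# Milliseconds per Prometheus duration unit (y = 365d, w = 7d).
_UNIT_MS = {
    "y": 365 * 24 * 3600 * 1000,
    "w": 7 * 24 * 3600 * 1000,
    "d": 24 * 3600 * 1000,
    "h": 3600 * 1000,
    "m": 60 * 1000,
    "s": 1000,
    "ms": 1,
}
# "ms" is tried before "m" so "500ms" is not split into 500 minutes plus a stray "s".
_PART = re.compile(r"(\d+)(ms|[ywdhms])")


def parse_duration_ms(text: str) -> int:
    """Parse a duration like "5s" or "1m30s" into integer milliseconds."""
    if text == "0":
        return 0
    total, pos = 0, 0
    for match in _PART.finditer(text):
        if match.start() != pos:
            break  # unparseable characters before this part
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def timeout_fits_interval(interval: str, timeout: str) -> bool:
    """True when scrape_timeout <= scrape_interval."""
    return parse_duration_ms(timeout) <= parse_duration_ms(interval)
```

For this file, `timeout_fits_interval("5s", "1s")` holds for the py-libp2p job, and the commented-out node-exporter job (`15s` / `10s`) would pass as well.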

examples/health_monitoring/run_demo.py

Lines changed: 59 additions & 11 deletions
@@ -20,6 +20,7 @@
 from libp2p.rcmgr import Direction
 from libp2p.rcmgr.manager import ResourceLimits, ResourceManager
 from libp2p.rcmgr.monitoring import Monitor
+from libp2p.rcmgr.prometheus_exporter import create_prometheus_exporter
 
 
 def _is_port_free(port: int) -> bool:
@@ -74,33 +75,80 @@ def main() -> None:
 type=str,
 default=os.getenv("DEMO_LOG_LEVEL", "INFO"),
 )
+parser.add_argument(
+"--max-connections",
+type=int,
+default=10,
+metavar="N",
+help="Resource limit: max connections (default: 10)",
+)
+parser.add_argument(
+"--max-streams",
+type=int,
+default=20,
+metavar="N",
+help="Resource limit: max streams (default: 20)",
+)
+parser.add_argument(
+"--max-memory-mb",
+type=int,
+default=32,
+metavar="MB",
+help="Resource limit: max memory in MB (default: 32)",
+)
+parser.add_argument(
+"--interval",
+type=float,
+default=1.0,
+metavar="SECS",
+help="Seconds between iterations (default: 1.0)",
+)
+parser.add_argument(
+"--no-connection-tracking",
+action="store_true",
+help="Disable connection tracking in the monitor",
+)
+parser.add_argument(
+"--no-protocol-metrics",
+action="store_true",
+help="Disable protocol metrics in the monitor",
+)
 args = parser.parse_args()
 
 _setup_logging(args.log_level)
 
 port = _pick_port(args.port)
 
 limits = ResourceLimits(
-max_connections=10,
-max_streams=20,
-max_memory_mb=32,
+max_connections=args.max_connections,
+max_streams=args.max_streams,
+max_memory_mb=args.max_memory_mb,
 )
 
+# Single shared exporter so only one HTTP server binds to the port
+shared_exporter = create_prometheus_exporter(port=port, enable_server=True)
+
 monitor = Monitor(
-enable_prometheus=True,
-prometheus_port=port,
-enable_connection_tracking=True,
-enable_protocol_metrics=True,
+prometheus_exporter=shared_exporter,
+enable_connection_tracking=not args.no_connection_tracking,
+enable_protocol_metrics=not args.no_protocol_metrics,
 )
 
 rcmgr = ResourceManager(
 limits=limits,
-enable_prometheus=True,
-prometheus_port=port,
+prometheus_exporter=shared_exporter,
 enable_metrics=True,
 )
 
-logging.info("Resource Manager initialized on port %s", port)
+logging.info(
+"Resource Manager initialized on port %s (limits: %s conns, %s streams, "
+"%s MB; interval %.2fs)",
+port,
+limits.max_connections,
+limits.max_streams,
+args.max_memory_mb,
+args.interval,
+)
 
 connection_count = 0
 blocked_connections = 0
@@ -276,7 +324,7 @@ def _handle_signal(signum: int, _: object) -> None:
 monitor.prometheus_exporter.update_from_metrics(rcmgr.metrics)
 
 iteration += 1
-time.sleep(1)
+time.sleep(args.interval)
 
 logging.info(
 "%s active connections, %s blocked",

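The new flags follow a common argparse convention: numeric limits get defaults equal to the old hard-coded values, and features that default to on are turned off with `--no-...` switches, which the demo inverts when it builds the `Monitor`. The sketch below isolates that parsing step so it can be exercised without starting the demo. `parse_demo_args` and `DemoSettings` are hypothetical names, and the positive-interval check is an extra guard that does not appear in the diff.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DemoSettings:
    """Hypothetical container for what run_demo.py derives from its flags."""

    max_connections: int
    max_streams: int
    max_memory_mb: int
    interval: float
    connection_tracking: bool
    protocol_metrics: bool


def parse_demo_args(argv: Optional[List[str]] = None) -> DemoSettings:
    parser = argparse.ArgumentParser(description="Health monitoring demo (sketch)")
    parser.add_argument("--max-connections", type=int, default=10, metavar="N")
    parser.add_argument("--max-streams", type=int, default=20, metavar="N")
    parser.add_argument("--max-memory-mb", type=int, default=32, metavar="MB")
    parser.add_argument("--interval", type=float, default=1.0, metavar="SECS")
    parser.add_argument("--no-connection-tracking", action="store_true")
    parser.add_argument("--no-protocol-metrics", action="store_true")
    args = parser.parse_args(argv)
    if args.interval <= 0:
        # Extra guard (not in the diff): parser.error() exits with status 2.
        parser.error("--interval must be positive")
    return DemoSettings(
        max_connections=args.max_connections,
        max_streams=args.max_streams,
        max_memory_mb=args.max_memory_mb,
        interval=args.interval,
        # Negative "store_true" flags are inverted into positive booleans,
        # mirroring `enable_connection_tracking=not args.no_connection_tracking`.
        connection_tracking=not args.no_connection_tracking,
        protocol_metrics=not args.no_protocol_metrics,
    )
```

On the command line this corresponds to invocations such as `python run_demo.py --max-connections 5 --interval 0.5 --no-protocol-metrics`.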
libp2p/network/config.py

Lines changed: 2 additions & 0 deletions
@@ -145,3 +145,5 @@ def __post_init__(self) -> None:
 raise ValueError(
 "Critical health threshold must be between 0.0 and 1.0"
 )
+if self.unhealthy_grace_period < 0:
+raise ValueError("unhealthy_grace_period must be non-negative")
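The new check follows the same `__post_init__` validation pattern as the existing threshold check, so a bad value fails when the config is constructed rather than later inside the monitor. A minimal dataclass sketch of the pattern follows. `HealthConfigSketch` and `critical_health_threshold` are hypothetical names (the diff shows only the threshold's error message, not the field or its exact bounds), the inclusive 0.0 to 1.0 range is an assumption, and the defaults are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class HealthConfigSketch:
    """Hypothetical subset of a health-monitoring config."""

    critical_health_threshold: float = 0.5  # hypothetical name, arbitrary default
    unhealthy_grace_period: float = 0.0  # name from the diff; units not stated

    def __post_init__(self) -> None:
        # Reject bad values at construction time, before any monitor uses them.
        if not 0.0 <= self.critical_health_threshold <= 1.0:
            raise ValueError("Critical health threshold must be between 0.0 and 1.0")
        if self.unhealthy_grace_period < 0:
            raise ValueError("unhealthy_grace_period must be non-negative")
```

Zero passes the check; only negative grace periods are rejected.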

libp2p/network/health/monitor.py

Lines changed: 2 additions & 0 deletions
@@ -130,6 +130,8 @@ async def _check_connection_health(self, peer_id: ID, conn: INetConn) -> None:
 warmup = getattr(self.config, "health_warmup_window", 0.0)
 if warmup:
 # Check if we have health data with established_at timestamp
+# Use time.time() (wall clock) to match ConnectionHealth.established_at,
+# which is set with time.time() in data_structures.
 if self._has_health_data(peer_id, conn):
 import time
 
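The new comment matters because wall-clock and monotonic timestamps cannot be mixed: `time.monotonic()` counts from an arbitrary reference point, so subtracting a `time.time()` value from it yields a meaningless age. The sketch below shows a warm-up test that stays on a single clock. `in_warmup` is a hypothetical helper; the diff does not show the monitor's actual comparison, only that it reads `health_warmup_window` (default `0.0`) and uses `time.time()`.

```python
import time
from typing import Optional


def in_warmup(established_at: float, warmup: float, now: Optional[float] = None) -> bool:
    """True while a connection is younger than `warmup` seconds.

    `established_at` is assumed to be a time.time() (wall-clock) timestamp,
    so `now` is taken from the same clock when not supplied.
    """
    if warmup <= 0:
        # 0 (the getattr default) disables warm-up; negatives are treated the same here.
        return False
    if now is None:
        now = time.time()
    return (now - established_at) < warmup
```

Passing `now` explicitly keeps the check deterministic in tests while production code can rely on the wall-clock default.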
