Health monitoring: run_demo CLI, tests, docs, host.docker.internal target

acul71 · cursoragent · acul71 · commit c9d1972f2e0f · 2026-02-12T14:29:45.000+01:00
- run_demo.py: add CLI for limits, interval, monitor options; fix line length
- Prometheus/configure: use host.docker.internal, update configure.py patterns
- tests: add test_health_monitoring_run_demo.py under tests/examples
- README: validation steps, testing section
- docs: state current API only in connection_health_monitoring.rst
- config/monitor: minor validation and comment tweaks

Co-authored-by: Cursor <cursoragent@cursor.com>
diff --git a/docs/examples.connection_health_monitoring.rst b/docs/examples.connection_health_monitoring.rst
@@ -275,22 +275,11 @@ Health monitoring integrates seamlessly with existing host-based code:
 - Health monitoring can be enabled/disabled per host instance
 - Existing examples work unchanged - just add `connection_config` parameter
 - Backward compatibility is maintained
-- No need to switch from `new_host()` to low-level swarm APIs - the API inconsistency is fixed
-
-**Before (Previous Implementation - API Inconsistency):**
-
-.. code-block:: python
-
-    # ❌ Forced to use different APIs
-    host = new_host()  # High-level API for basic usage
-    # Health monitoring required low-level swarm API - INCONSISTENT!
-
-**After (Current Implementation - API Consistency):**
+- No need to switch from `new_host()` to low-level swarm APIs
 
 .. code-block:: python
 
-    # ✅ Consistent API for all use cases
     host = new_host()  # Basic usage
-    host = new_host(connection_config=config)  # Health monitoring - same API!
+    host = new_host(connection_config=config)  # Health monitoring - same API
 
 For more information, see the :doc:`../libp2p.network` module documentation.
diff --git a/examples/health_monitoring/README.md b/examples/health_monitoring/README.md
@@ -1,5 +1,11 @@
 # Health Monitoring Demo
 
+**Prerequisites:** The demo exposes metrics over HTTP for Prometheus. Install the client in your venv:
+
+```bash
+pip install prometheus-client
+```
+
 Configure Prometheus target (match exporter port):
 
 ```bash
@@ -21,10 +27,41 @@ Open UIs:
 - Prometheus: http://localhost:9090/targets
 - Grafana: http://localhost:3000
 
+**Validating the data**
+
+The demo uses fixed limits: **10 connections**, **20 streams**, **32 MB** memory. Each second it tries to add 1 connection, 1 stream (if there is at least one connection), and 100–500 KB memory per peer. So over time you should see usage rise until it hits the limits, then blocks.
+
+1. **Exporter vs logs** (with `run_demo.py` running):
+
+   ```bash
+   curl -s http://localhost:8000/metrics | grep -E '^libp2p_rcmgr_(connections|streams|memory|blocked)'
+   ```
+
+   Compare the numbers with what the demo prints: `Current: N conns, M streams, K bytes memory` and `Blocked: ...`. The gauges should match.
+
+1. **Prometheus** (http://localhost:9090 → Graph):
+
+   - `libp2p_rcmgr_connections{scope="system"}` — total connections (should stay ≤ 10).
+   - `libp2p_rcmgr_streams{scope="system"}` — total streams (≤ 20).
+   - `libp2p_rcmgr_memory{scope="system"}` — bytes (≤ 32*1024*1024).
+   - `libp2p_rcmgr_blocked_resources` — blocked events; should increase when you are at a limit.
+
+1. **Sanity checks**: Connections and streams should level off at 10 and 20; memory at or below 32 MB. After ~15–20 seconds you should see some blocked resources (connections or memory). The Grafana dashboard panels use these same metrics.
+
 Notes:
 
 - The Grafana dashboard `py-libp2p Resource Manager` is auto-provisioned.
 - If you change the exporter port, re-run `configure.py` and `docker compose restart prometheus`.
+- Prometheus reaches the host via `host.docker.internal` (docker-compose sets `host-gateway`). If the py-libp2p target stays DOWN, try the Docker bridge IP in `prometheus.yml` (e.g. `172.17.0.1:8000` from `ip addr show docker0`) or your machine’s IP.
+- If port 8000 is already in use, run the demo on another port (e.g. `python run_demo.py --port 8001`), then run `configure.py --port 8001` and `docker compose restart prometheus`.
+
+**Testing**
+
+Tests for `run_demo.py` (different parameters, limit enforcement) live under the main test suite:
+
+```bash
+pytest tests/examples/test_health_monitoring_run_demo.py -v
+```
 
 Stop:
 
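The "Exporter vs logs" check in the README hunk above can also be scripted. The following is a minimal sketch, not part of the commit: it parses Prometheus text-format output and compares the system-scope gauges against the default limits. The sample payload and its values are made up for illustration, and the real exporter may emit additional labels or metrics; `parse_rcmgr_metrics` is a hypothetical helper name.

```python
import re

# Made-up sample in Prometheus text exposition format; in practice this text
# would come from `curl -s http://localhost:8000/metrics`.
SAMPLE = """\
# HELP libp2p_rcmgr_connections Current connections
# TYPE libp2p_rcmgr_connections gauge
libp2p_rcmgr_connections{scope="system"} 10.0
libp2p_rcmgr_streams{scope="system"} 20.0
libp2p_rcmgr_memory{scope="system"} 1048576.0
"""

# metric name, optional {label set}, then the sample value
LINE = re.compile(r"^(libp2p_rcmgr_\w+)(\{[^}]*\})?\s+(\S+)$")


def parse_rcmgr_metrics(text: str) -> dict[str, float]:
    """Hypothetical helper: map 'name{labels}' to its float value."""
    values: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = LINE.match(line)
        if match:
            key = match.group(1) + (match.group(2) or "")
            values[key] = float(match.group(3))
    return values


metrics = parse_rcmgr_metrics(SAMPLE)

# Default limits stated in the README: 10 connections, 20 streams, 32 MB.
within_limits = (
    metrics['libp2p_rcmgr_connections{scope="system"}'] <= 10
    and metrics['libp2p_rcmgr_streams{scope="system"}'] <= 20
    and metrics['libp2p_rcmgr_memory{scope="system"}'] <= 32 * 1024 * 1024
)
print("within limits:", within_limits)
```

If the demo is started with non-default `--max-*` values, the thresholds in the comparison would need to change accordingly.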
diff --git a/examples/health_monitoring/configure.py b/examples/health_monitoring/configure.py
@@ -9,10 +9,13 @@
 
 def set_exporter_port(port: int) -> None:
     content = PROM_PATH.read_text()
-    pattern = r"host\\.docker\\.internal:\\d+"
-    replacement = f"host.docker.internal:{port}"
-    new = re.sub(pattern, replacement, content)
-    PROM_PATH.write_text(new)
+    # Update py-libp2p target port (host.docker.internal or legacy 172.17.0.1)
+    for pattern, replacement in [
+        (r"host\.docker\.internal:\d+", f"host.docker.internal:{port}"),
+        (r"172\.17\.0\.1:\d+", f"172.17.0.1:{port}"),
+    ]:
+        content = re.sub(pattern, replacement, content)
+    PROM_PATH.write_text(content)
     print(f"Updated Prometheus target to host.docker.internal:{port}")
 
 
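To make the effect of the `configure.py` change concrete, here is a small sketch that applies the same two substitutions to in-memory strings instead of `PROM_PATH`. `rewrite_targets` is a hypothetical helper introduced only for this illustration. It also checks the removed pattern as it is rendered in the diff above: inside a raw string, `\\.` asks for a literal backslash followed by any character, so that pattern would not match a plain `host.docker.internal:8000` target.

```python
import re


def rewrite_targets(content: str, port: int) -> str:
    """Hypothetical helper: same loop as set_exporter_port, on a string."""
    for pattern, replacement in [
        (r"host\.docker\.internal:\d+", f"host.docker.internal:{port}"),
        (r"172\.17\.0\.1:\d+", f"172.17.0.1:{port}"),
    ]:
        content = re.sub(pattern, replacement, content)
    return content


current = "      - targets: ['host.docker.internal:8000']\n"
legacy = "      - targets: ['172.17.0.1:8000']\n"

updated_current = rewrite_targets(current, 8001)
updated_legacy = rewrite_targets(legacy, 8001)

# The removed pattern, copied as shown in the diff: it requires literal
# backslashes in the input, so it finds nothing in a normal target line.
old_pattern_match = re.search(r"host\\.docker\\.internal:\\d+", current)

print(updated_current, end="")
print(updated_legacy, end="")
print("old pattern matched:", old_pattern_match is not None)
```

Note that the print statement in `set_exporter_port` always reports `host.docker.internal`, even when only the legacy `172.17.0.1` target was present and rewritten.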
diff --git a/examples/health_monitoring/prometheus.yml b/examples/health_monitoring/prometheus.yml
@@ -22,10 +22,11 @@ scrape_configs:
     scheme: http
 
   # py-libp2p resource manager metrics
+  # Host is reached via host.docker.internal (docker-compose sets host-gateway).
+  # Run configure.py --port N to update the port.
   - job_name: 'py-libp2p'
     static_configs:
-      # This target can be updated by the helper to match the chosen exporter port
-      - targets: ['host.docker.internal:8000']  # Default py-libp2p metrics port
+      - targets: ['host.docker.internal:8000']
     scrape_interval: 5s   # More frequent scraping for libp2p metrics
     scrape_timeout: 1s  # Again the scrape timeout should be less than the interval, otherwise prometheus will skip the scrape and give an error
     metrics_path: /metrics
@@ -41,12 +42,11 @@ scrape_configs:
         regex: 'true'
         replacement: 'py-libp2p'
 
-  # Node Exporter metrics
-  # Useful for monitoring system-level metrics of the host
-  - job_name: 'node-exporter'
-    static_configs:
-      - targets: ['node-exporter:9100']
-    scrape_interval: 15s
-    scrape_timeout: 10s
-    metrics_path: /metrics
-    scheme: http
+  # Node Exporter (optional): uncomment and add node-exporter to docker-compose if needed
+  # - job_name: 'node-exporter'
+  #   static_configs:
+  #     - targets: ['node-exporter:9100']
+  #   scrape_interval: 15s
+  #   scrape_timeout: 10s
+  #   metrics_path: /metrics
+  #   scheme: http
diff --git a/examples/health_monitoring/run_demo.py b/examples/health_monitoring/run_demo.py
@@ -20,6 +20,7 @@
 from libp2p.rcmgr import Direction
 from libp2p.rcmgr.manager import ResourceLimits, ResourceManager
 from libp2p.rcmgr.monitoring import Monitor
+from libp2p.rcmgr.prometheus_exporter import create_prometheus_exporter
 
 
 def _is_port_free(port: int) -> bool:
@@ -74,33 +75,80 @@ def main() -> None:
         type=str,
         default=os.getenv("DEMO_LOG_LEVEL", "INFO"),
     )
+    parser.add_argument(
+        "--max-connections",
+        type=int,
+        default=10,
+        metavar="N",
+        help="Resource limit: max connections (default: 10)",
+    )
+    parser.add_argument(
+        "--max-streams",
+        type=int,
+        default=20,
+        metavar="N",
+        help="Resource limit: max streams (default: 20)",
+    )
+    parser.add_argument(
+        "--max-memory-mb",
+        type=int,
+        default=32,
+        metavar="MB",
+        help="Resource limit: max memory in MB (default: 32)",
+    )
+    parser.add_argument(
+        "--interval",
+        type=float,
+        default=1.0,
+        metavar="SECS",
+        help="Seconds between iterations (default: 1.0)",
+    )
+    parser.add_argument(
+        "--no-connection-tracking",
+        action="store_true",
+        help="Disable connection tracking in the monitor",
+    )
+    parser.add_argument(
+        "--no-protocol-metrics",
+        action="store_true",
+        help="Disable protocol metrics in the monitor",
+    )
     args = parser.parse_args()
 
     _setup_logging(args.log_level)
 
     port = _pick_port(args.port)
 
     limits = ResourceLimits(
-        max_connections=10,
-        max_streams=20,
-        max_memory_mb=32,
+        max_connections=args.max_connections,
+        max_streams=args.max_streams,
+        max_memory_mb=args.max_memory_mb,
     )
 
+    # Single shared exporter so only one HTTP server binds to the port
+    shared_exporter = create_prometheus_exporter(port=port, enable_server=True)
+
     monitor = Monitor(
-        enable_prometheus=True,
-        prometheus_port=port,
-        enable_connection_tracking=True,
-        enable_protocol_metrics=True,
+        prometheus_exporter=shared_exporter,
+        enable_connection_tracking=not args.no_connection_tracking,
+        enable_protocol_metrics=not args.no_protocol_metrics,
     )
 
     rcmgr = ResourceManager(
         limits=limits,
-        enable_prometheus=True,
-        prometheus_port=port,
+        prometheus_exporter=shared_exporter,
         enable_metrics=True,
     )
 
-    logging.info("Resource Manager initialized on port %s", port)
+    logging.info(
+        "Resource Manager initialized on port %s (limits: %s conns, %s streams, "
+        "%s MB; interval %.2fs)",
+        port,
+        limits.max_connections,
+        limits.max_streams,
+        args.max_memory_mb,
+        args.interval,
+    )
 
     connection_count = 0
     blocked_connections = 0
@@ -276,7 +324,7 @@ def _handle_signal(signum: int, _: object) -> None:
                 monitor.prometheus_exporter.update_from_metrics(rcmgr.metrics)
 
         iteration += 1
-        time.sleep(1)
+        time.sleep(args.interval)
 
     logging.info(
         "%s active connections, %s blocked",
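The new `run_demo.py` options can be exercised in isolation. The sketch below rebuilds only the arguments added in this commit (help strings omitted) and parses an example command line; `build_parser` is a hypothetical wrapper, since the real script builds its parser inside `main()` alongside other options such as `--port` and `--log-level`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper containing only the options added in this commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-connections", type=int, default=10, metavar="N")
    parser.add_argument("--max-streams", type=int, default=20, metavar="N")
    parser.add_argument("--max-memory-mb", type=int, default=32, metavar="MB")
    parser.add_argument("--interval", type=float, default=1.0, metavar="SECS")
    parser.add_argument("--no-connection-tracking", action="store_true")
    parser.add_argument("--no-protocol-metrics", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--max-connections", "5", "--interval", "0.25", "--no-protocol-metrics"]
)

# The "--no-*" flags are negated before being passed to Monitor in the diff.
enable_connection_tracking = not args.no_connection_tracking
enable_protocol_metrics = not args.no_protocol_metrics

print(args.max_connections, args.max_streams, args.max_memory_mb, args.interval)
print(enable_connection_tracking, enable_protocol_metrics)
```

Using `store_true` on negative flags keeps both monitor features enabled by default, which matches the previous hard-coded `True` values.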
diff --git a/libp2p/network/config.py b/libp2p/network/config.py
@@ -145,3 +145,5 @@ def __post_init__(self) -> None:
                 raise ValueError(
                     "Critical health threshold must be between 0.0 and 1.0"
                 )
+            if self.unhealthy_grace_period < 0:
+                raise ValueError("unhealthy_grace_period must be non-negative")
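The added check rejects negative grace periods at construction time. A minimal sketch of that behaviour follows, using a hypothetical stand-in dataclass rather than the real config class in `libp2p/network/config.py`; the field type is an assumption, and the real check sits inside a longer `__post_init__` alongside the threshold validation shown as context above.

```python
from dataclasses import dataclass


@dataclass
class GracePeriodConfig:
    """Hypothetical stand-in holding only the field validated in this hunk."""

    unhealthy_grace_period: float  # type assumed for illustration

    def __post_init__(self) -> None:
        # Same condition and message as the lines added in the diff.
        if self.unhealthy_grace_period < 0:
            raise ValueError("unhealthy_grace_period must be non-negative")


accepted = GracePeriodConfig(unhealthy_grace_period=0)  # zero is allowed

try:
    GracePeriodConfig(unhealthy_grace_period=-1)
    rejected = False
except ValueError as exc:
    rejected = True
    message = str(exc)

print("negative value rejected:", rejected)
```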
diff --git a/libp2p/network/health/monitor.py b/libp2p/network/health/monitor.py
@@ -130,6 +130,8 @@ async def _check_connection_health(self, peer_id: ID, conn: INetConn) -> None:
             warmup = getattr(self.config, "health_warmup_window", 0.0)
             if warmup:
                 # Check if we have health data with established_at timestamp
+                # Use time.time() (wall clock) to match ConnectionHealth.established_at,
+                # which is set with time.time() in data_structures.
                 if self._has_health_data(peer_id, conn):
                     import time
 
diff --git a/tests/examples/test_health_monitoring_run_demo.py b/tests/examples/test_health_monitoring_run_demo.py
