This module provides Prometheus metrics collection and monitoring for OverFast API.
- metrics.py: Centralized Prometheus metric definitions (counters, gauges, histograms)
- middleware.py: FastAPI middleware for tracking requests that reach the app (cache misses)
- helpers.py: Utility functions for metrics (endpoint normalization, URL normalization)
Without normalization, each unique player ID or hero key creates a separate metric label, leading to metric cardinality explosion. For example:
/players/TeKrop-2217/summary/players/Player-1234/summary/players/User-5678/summary
Would create 3 separate metrics instead of 1. Prometheus performance degrades with high cardinality.
Endpoint normalization is implemented in two places:
- Python (
app/monitoring/helpers.py): Used by FastAPI middleware for requests that reach the app - Lua (
build/nginx/prometheus_log.conf): Used by Nginx for ALL requests (including cache hits)
IMPORTANT: When modifying normalization logic, update BOTH implementations to avoid divergence.
Both implementations must follow these exact patterns:
| Pattern | Raw Path | Normalized Path |
|---|---|---|
| Player with trailing segments | /players/TeKrop-2217/summary |
/players/{player_id}/summary |
| Player without trailing segments | /players/TeKrop-2217 |
/players/{player_id} |
| Hero key (single segment, not "stats") | /heroes/ana |
/heroes/{hero_key} |
| Hero stats | /heroes/stats |
/heroes/stats (unchanged) |
| Heroes list | /heroes |
/heroes (unchanged) |
| All other paths | /maps, /gamemodes, etc. |
Unchanged |
When changing normalization logic:
- Update Python (
app/monitoring/helpers.py-normalize_endpoint()) - Update Lua (
build/nginx/prometheus_log.conf-normalize_endpoint()function) - Update tests (
tests/monitoring/test_helpers.py-TestNormalizeEndpoint) - Run tests to verify Python implementation
- Manual testing to verify Nginx/Lua implementation:
just up monitoring=true curl http://localhost:8080/players/TestPlayer-1234/summary curl http://localhost:9090/api/v1/query?query=nginx_http_requests_total # Verify label shows: endpoint="/players/{player_id}/summary"
Tests in tests/monitoring/test_helpers.py cover:
- Player paths with and without trailing segments
- Hero paths (single segment vs
/heroes/stats) - Edge cases (trailing slashes, multiple segments)
- Different player ID formats (BattleTag, Blizzard ID)
These tests validate the Python implementation only. The Lua implementation must be manually verified or tested through integration tests.
Separate from endpoint normalization, normalize_blizzard_url() in helpers.py normalizes Blizzard API URLs for tracking in blizzard_requests_total metrics. This prevents cardinality explosion from locale variations and dynamic IDs.
See function docstring and TestNormalizeBlizzardUrl tests for details.
Set PROMETHEUS_ENABLED=true in .env:
PROMETHEUS_ENABLED=true# Start API + Prometheus + Grafana
just up monitoring=true
# Start API + Prometheus only
just up prometheus=true
# Start API + Grafana only
just up grafana=true- FastAPI metrics:
http://localhost:8080/metrics(Docker network only, blocked from public) - Nginx metrics:
http://localhost:9145/metrics(Docker network only) - Prometheus:
http://localhost:9090 - Grafana:
http://localhost:3000(default admin/admin orGRAFANA_ADMIN_PASSWORD)
Grafana dashboards are auto-provisioned from build/grafana/provisioning/dashboards/:
- API Usage: Request rates, cache hit rates, endpoint popularity
- API Health: Error rates, response times, status code distribution
- Blizzard Calls: Upstream request tracking, rate limiting
- Tasks & Rate Limiting: Background task metrics, AIMD congestion control
- System Metrics: Nginx connections, in-progress requests
When PROMETHEUS_ENABLED=false (default):
- Zero overhead: No metrics collection, no middleware registration
- Nginx Lua blocks are empty (template substitution)
- FastAPI
/metricsendpoint not registered
When PROMETHEUS_ENABLED=true:
- Minimal overhead: ~5-10μs per request for Nginx, ~10-50μs for FastAPI
- Shared memory zone (10MB) for Lua metrics
- In-memory Python counters with minimal locking
- At 10 rps: <0.1% CPU, <1ms additional latency
- Check
PROMETHEUS_ENABLED=truein.env - Restart containers:
just down && just up monitoring=true - Check Prometheus targets:
http://localhost:9090/targets - Check logs:
docker compose logs prometheus
If you see warnings about high cardinality labels:
- Review endpoint normalization patterns
- Check for new dynamic URL segments not being normalized
- Update both Python and Lua implementations
Check Nginx error log:
docker compose logs nginx | grep errorCommon issues:
prometheusvariable nil:PROMETHEUS_ENABLEDnot set or template substitution failed- Pattern mismatch: Lua regex syntax differs from Python regex