@@ -8,6 +8,9 @@ A production-ready pipeline automation system built with:
 - **PostgreSQL** - Persistence
 - **AI Safety Module** - Failure prediction & anomaly handling
 - **BentoML + Feast + Kubeflow** - End-to-end model infrastructure
+- **Prometheus + Grafana** - Metrics collection and dashboards
+- **ELK Stack (Elasticsearch, Logstash, Kibana)** - Centralized logging
+- **Sentry** - Error monitoring and tracing
 
 ## Architecture Overview
 
@@ -115,6 +118,10 @@ docker-compose up -d
 | FastAPI Docs | http://localhost:8000/api/docs | - |
 | PostgreSQL | localhost:5432 | airflow / airflow |
 | Redis | localhost:6379 | - |
+| Prometheus | http://localhost:9090 | - |
+| Grafana | http://localhost:3000 | admin / admin |
+| Kibana | http://localhost:5601 | - |
+| Elasticsearch | http://localhost:9200 | - |
 
 ### 4. Create Your First Pipeline
 
@@ -193,6 +200,26 @@ curl -X POST http://localhost:8000/api/executions/pipeline-xxx/execute
 | GET | `/metrics` | Dashboard metrics |
 | GET | `/insights` | AI insights |
 
+
+## Observability & Monitoring
+
+### Metrics (Prometheus)
+- Backend exposes Prometheus metrics at `/metrics` via `prometheus-fastapi-instrumentator`.
+- Prometheus scrapes `backend:8000/metrics` every 15 seconds.
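
To see what the scrape target returns, the endpoint can be fetched and parsed by hand. The following is a minimal, stdlib-only sketch rather than part of this change: `parse_metrics` and `fetch_metrics` are hypothetical helper names, the default URL assumes the backend's port 8000 is published on localhost, and the parser covers only the common `name{labels} value` form of the Prometheus text exposition format.

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified sketch: comment lines (# HELP / # TYPE) are skipped and
    optional trailing timestamps are ignored.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split at the last closing brace so label values may contain spaces.
            name, rest = line.rsplit("}", 1)
            name += "}"
            fields = rest.split()
        else:
            name, *fields = line.split()
        if not fields:
            continue  # malformed line without a value
        samples[name] = float(fields[0])
    return samples


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> dict[str, float]:
    # Assumed URL: the backend published on the host at port 8000.
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` while the stack is running returns a dictionary keyed by series name, which is enough to confirm that the instrumentation is active before checking the target in Prometheus.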
+
+### Dashboards (Grafana)
+- Grafana runs on port `3000` and can connect to Prometheus (`http://prometheus:9090`) as a data source.
+- Default credentials are `admin/admin`; change them before running in production.
+
+### Centralized Logging (ELK Stack)
+- Elasticsearch stores indexed logs.
+- Logstash listens on ports `5000` (TCP, JSON) and `5044` (Beats) and forwards events to Elasticsearch.
+- Kibana provides visualization for indices like `flexiroaster-backend-*`.
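
One way an application could ship logs to the Logstash TCP input is a small handler built on the standard library. This is a sketch under assumptions, not the project's actual logging setup: it assumes the input on port `5000` accepts newline-delimited JSON, and the class name `JsonTcpHandler`, the logger name, and the field names are all illustrative.

```python
import json
import logging
import logging.handlers


class JsonTcpHandler(logging.handlers.SocketHandler):
    """Send each log record as one newline-terminated JSON object over TCP."""

    def makePickle(self, record: logging.LogRecord) -> bytes:
        # SocketHandler normally pickles the record; emit JSON instead.
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
            "created": record.created,  # Unix timestamp (seconds)
        }
        return (json.dumps(payload) + "\n").encode("utf-8")


def build_logger(host: str = "localhost", port: int = 5000) -> logging.Logger:
    # Hypothetical logger name; host and port assume the Logstash TCP input.
    logger = logging.getLogger("flexiroaster.backend")
    logger.setLevel(logging.INFO)
    logger.addHandler(JsonTcpHandler(host, port))
    return logger
```

`SocketHandler` connects lazily and drops records when the connection fails, so a logger built this way does not block the application if Logstash is down.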
+
+### Error Monitoring (Sentry)
+- Configure `SENTRY_DSN` to enable Sentry for FastAPI exception capture and tracing.
+- Optional tuning: `SENTRY_ENVIRONMENT`, `SENTRY_TRACES_SAMPLE_RATE`, and `SENTRY_PROFILES_SAMPLE_RATE`.
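
The variables above could be wired into the Sentry SDK roughly as follows. This is a hedged sketch, not the backend's actual startup code: `sentry_settings` and `init_sentry` are hypothetical helpers, and the `0.0` default for `SENTRY_PROFILES_SAMPLE_RATE` is an assumption because no default is documented here. The keyword arguments passed to `sentry_sdk.init` (`dsn`, `environment`, `traces_sample_rate`, `profiles_sample_rate`) are standard SDK options.

```python
import os
from typing import Mapping, Optional


def sentry_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Build sentry_sdk.init() keyword arguments, or None when Sentry is disabled."""
    dsn = env.get("SENTRY_DSN", "")
    if not dsn:
        return None  # an empty or missing DSN leaves Sentry off
    return {
        "dsn": dsn,
        "environment": env.get("SENTRY_ENVIRONMENT", "development"),
        "traces_sample_rate": float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        # Assumed default: profiling off unless explicitly configured.
        "profiles_sample_rate": float(env.get("SENTRY_PROFILES_SAMPLE_RATE", "0.0")),
    }


def init_sentry() -> bool:
    """Initialize Sentry if configured; return whether it was enabled."""
    settings = sentry_settings()
    if settings is None:
        return False
    import sentry_sdk  # imported lazily so the SDK is only needed when enabled

    sentry_sdk.init(**settings)
    return True
```

Keeping the environment parsing separate from the `sentry_sdk.init` call makes the configuration logic testable without the SDK installed.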
+
 ## Configuration
 
 ### Environment Variables
@@ -205,6 +232,9 @@ curl -X POST http://localhost:8000/api/executions/pipeline-xxx/execute
 | `EXECUTOR_STAGE_TIMEOUT` | `120` | Stage timeout in seconds |
 | `AI_BLOCK_HIGH_RISK` | `false` | Block high-risk executions |
 | `AI_RISK_THRESHOLD_HIGH` | `0.7` | High risk threshold |
+| `SENTRY_DSN` | `""` | Enables Sentry when set |
+| `SENTRY_ENVIRONMENT` | `development` | Sentry environment label |
+| `SENTRY_TRACES_SAMPLE_RATE` | `0.1` | Fraction of traced requests |
 
 ### Airflow Variables
 