Commit 596fe81

docs: add demo-guide, review README and copilot-instructions

- docs/demo-guide.md: step-by-step usage guide with 4 scenarios (Streamlit UI, YAML editor, Python SDK, Airflow DAG), troubleshooting table, and observability callouts
- README.md: fix container count (17→18), remove outdated manual admin creation block (now automatic), add cAdvisor to UIs table, add Documentation section linking demo-guide/architecture/access-credentials
- .github/copilot-instructions.md: rename to ArrowFlow, update project structure (new docs, prometheus/grafana/ provisioning paths), Docker Compose table +cAdvisor (18 services), Grafana auto-provisioning note, PromQL metric naming convention, Airflow idempotent user creation, Common Tasks cheat sheet updated with new doc refs
1 parent 887aa88 commit 596fe81

3 files changed

Lines changed: 317 additions & 17 deletions

.github/copilot-instructions.md

Lines changed: 31 additions & 7 deletions
@@ -1,11 +1,13 @@
-# Copilot Instructions — ETL Microservices Platform
+# Copilot Instructions — ArrowFlow (ETL Microservices Platform)

 > These instructions provide context for AI agents (GitHub Copilot, Copilot Chat, agentic workflows) working on this codebase. They describe architecture, conventions, patterns, constraints, and lessons learned so that an AI agent can make correct decisions without re-discovering project structure.

 ---

 ## 1. Project Identity

+**Project name:** ArrowFlow (GitHub repo: `VTvito/arrowflow`).
+
 **What this is:** An AI-assisted, modular ETL (Extract, Transform, Load) platform where each data operation is an independent Flask microservice. Pipelines are orchestrated via Apache Airflow DAGs or via the AI agent (natural language → YAML → execution).

 **Primary use case:** HR / People Analytics and E-commerce — the platform ships with production-ready pipelines for the IBM HR Attrition dataset and e-commerce order analytics, plus a weather API demo. Bundled demo datasets in `data/demo/` allow out-of-the-box testing.
@@ -93,8 +95,8 @@ All services propagate `X-Correlation-ID` header for end-to-end request tracing.

 ```
 etl_microservices/
-├── docker-compose.yml # Full stack: 11 services + postgres + airflow + prometheus + grafana + streamlit
-├── Makefile # Common commands: up, down, build, test, lint, benchmark
+├── docker-compose.yml # Full stack: 11 ETL services + cAdvisor + postgres + airflow + prometheus + grafana + streamlit (18 containers total)
+├── Makefile # Common commands: up, down, build, test, lint, benchmark, demo-data, quickstart
 ├── pyproject.toml # Project metadata, pytest/ruff/coverage config
 ├── README.md # Project overview and quickstart
 ├── .env # Environment variables (not committed)
@@ -159,7 +161,10 @@ etl_microservices/
 │ └── ecommerce_orders.csv # E-commerce orders sample (501 rows)

 ├── docs/
-│ └── extending.md # Developer guide: add new services & create pipelines
+│ ├── extending.md # Developer guide: add new services & create pipelines
+│ ├── demo-guide.md # Step-by-step usage guide (UI, YAML editor, SDK, Airflow)
+│ ├── access-credentials.md # All service URLs, ports, credentials, env vars
+│ └── architecture.md # Technical design: Arrow IPC, parallelism, Gunicorn, security

 ├── examples/
 │ └── pipelines/ # Ready-to-use YAML pipeline definitions
@@ -186,7 +191,13 @@ etl_microservices/
 │ └── workflows/ci.yml # GitHub Actions: lint → test-unit → test-integration → docker build

 └── prometheus/
-    └── prometheus.yml # Scrape targets for all 11 services
+    ├── prometheus.yml # Scrape targets for all 11 ETL services + cAdvisor + prometheus self-scrape
+    └── grafana/
+        ├── provisioning/
+        │   ├── datasources/prometheus.yml # Auto-configures Prometheus datasource (uid: prometheus-etl)
+        │   └── dashboards/dashboards.yml # Dashboard provider pointing to provisioned-dashboards/
+        └── dashboards/
+            └── etl_services_overview.json # Pre-built 15-panel monitoring dashboard (uid: etl-monitoring-v1)
 ```

 ---
@@ -406,15 +417,23 @@ Files stored at `/app/data/<dataset_name>/xcom/<step>_<timestamp>_<uuid>.arrow`.
 - **Base image:** `python:3.9-slim` for all services
 - **Shared volume:** `etl-containers-shared-data` mounted at `/app/data` across all containers

-### Docker Compose Services
+### Docker Compose Services (18 total)

 | Category | Services |
 |---|---|
-| **Infrastructure** | postgres, statsd-exporter, prometheus, grafana |
+| **Infrastructure** | postgres, statsd-exporter, prometheus, grafana, cadvisor |
 | **Orchestration** | airflow (webserver + scheduler) |
 | **ETL Services** | 11 microservices (ports 5001–5012) |
 | **UI** | streamlit-app (port 8501) |

+**cAdvisor**: `gcr.io/cadvisor/cadvisor:latest`, port 8088→8080, `--docker_only=true`. Provides per-container CPU/memory metrics scraped by Prometheus.
+
+**Grafana provisioning**: datasource and dashboard are auto-loaded at startup from `prometheus/grafana/provisioning/`. No manual configuration needed. Dashboard uid: `etl-monitoring-v1`.
+
+**Prometheus metric naming**: counters follow the pattern `{slug}_requests_total` / `{slug}_success_total` / `{slug}_error_total` where slug is the service key (e.g., `extract_csv_requests_total`). PromQL aggregation pattern: `{__name__=~".*_requests_total", job=~".+-service"}`.
+
+**Airflow admin user**: created automatically at first boot by the Dockerfile CMD (idempotent — skipped if already exists). Credentials: `admin` / `admin`.
+
 ### Network

 Single bridge network `etl-network`. Services reference each other by container name (e.g., `http://clean-nan-service:5002`).
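
Reviewer note on the metric naming convention added in the hunk above: the following is a minimal Python sketch, not code from the repository. It builds the three counter names for a service slug and checks names and job labels against the two regexes from the PromQL selector. The helper names (`counter_names`, `is_request_counter`, `is_service_job`) are hypothetical, and the hyphen-to-underscore normalization is an assumption; the commit only states that the slug is the service key.

```python
import re

# Counter suffixes named in the documented convention.
SUFFIXES = ("requests_total", "success_total", "error_total")

# Regexes copied from the documented PromQL selector:
#   {__name__=~".*_requests_total", job=~".+-service"}
NAME_PATTERN = re.compile(r".*_requests_total")
JOB_PATTERN = re.compile(r".+-service")


def counter_names(service_key: str) -> list:
    """Return the three counter names for a service key.

    Assumption: hyphens in the key are mapped to underscores, since
    Prometheus metric names cannot contain hyphens.
    """
    slug = service_key.strip().lower().replace("-", "_")
    return [f"{slug}_{suffix}" for suffix in SUFFIXES]


def is_request_counter(metric_name: str) -> bool:
    """Check a metric name against the __name__ matcher.

    PromQL regex matchers are fully anchored, so fullmatch is used.
    """
    return NAME_PATTERN.fullmatch(metric_name) is not None


def is_service_job(job_label: str) -> bool:
    """Check a job label against the job matcher (fully anchored)."""
    return JOB_PATTERN.fullmatch(job_label) is not None


if __name__ == "__main__":
    for name in counter_names("extract_csv"):
        print(name, is_request_counter(name))
```

Only the `*_requests_total` counter of each service passes the name matcher, which is why the documented selector can aggregate request counts across all ETL services without listing them one by one.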
@@ -641,13 +660,18 @@ These are hard-won insights from building and debugging the platform. They shoul
 | Task | Command / File |
 |---|---|
 | Start all services | `make up` or `docker compose up -d` |
+| Load demo datasets | `make demo-data` |
 | Run tests | `make test` |
 | Lint | `make lint` |
 | Add new service | Follow section 10 checklist |
 | Generate benchmark data | `make benchmark-data` |
 | Run benchmark | `make benchmark-all` |
 | Trigger HR pipeline | Airflow UI → `hr_analytics_pipeline` → Trigger with config |
 | Access Streamlit | http://localhost:8501 |
+| Grafana dashboard | http://localhost:3000 (admin / GF_SECURITY_ADMIN_PASSWORD) |
+| Service credentials | `docs/access-credentials.md` |
+| Architecture doc | `docs/architecture.md` |
+| Demo walkthrough | `docs/demo-guide.md` |

 ---

README.md

Lines changed: 12 additions & 10 deletions
@@ -39,24 +39,18 @@ cd arrowflow
 make quickstart
 ```

-This will build all images, start 17 containers, and load the demo datasets.
-
-### Create Airflow Admin (first time only)
-
-```bash
-docker exec -it airflow airflow users create \
---username admin --firstname Admin --lastname User \
---role Admin --email admin@example.com --password admin
-```
+This will build all images, start 18 containers, and load the demo datasets.
+The Airflow admin user (`admin`/`admin`) is created automatically on first boot.

 ### Open the UIs

 | Interface | URL | Credentials |
 |---|---|---|
 | **Streamlit** (AI Pipeline Builder) | http://localhost:8501 | &mdash; |
 | **Airflow** | http://localhost:8080 | admin / admin |
-| **Grafana** | http://localhost:3000 | admin / *GF_SECURITY_ADMIN_PASSWORD from .env* |
+| **Grafana** (pre-provisioned dashboard) | http://localhost:3000 | admin / *GF_SECURITY_ADMIN_PASSWORD from .env* |
 | **Prometheus** | http://localhost:9090 | &mdash; |
+| **cAdvisor** (container resources) | http://localhost:8088 | &mdash; |

 ### Try a Demo Pipeline

@@ -235,6 +229,14 @@ cp -r templates/new_service services/my-service

 Full walkthrough: [docs/extending.md](docs/extending.md)

+### Documentation
+
+| Doc | Contents |
+|---|---|
+| [docs/demo-guide.md](docs/demo-guide.md) | Step-by-step demo: UI, YAML editor, SDK, Airflow |
+| [docs/architecture.md](docs/architecture.md) | System design, Arrow IPC, parallelism, Gunicorn, security |
+| [docs/access-credentials.md](docs/access-credentials.md) | All service URLs, credentials, env vars |
+
 ### Project Structure

 <details>
