
Commit 85a08d0

johnmathews and claude committed
Update docs and journal for Day 3 progress
- CLAUDE.md: reflect completed stages 5-9 and new file locations
- architecture.md: add deploy.yml and docker.yml to CI/CD section
- implementation-plan.md: update progress dashboard with current status
- journal: add Day 3 entry covering all infrastructure work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4a41bd1 commit 85a08d0

4 files changed

Lines changed: 92 additions & 21 deletions


CLAUDE.md

Lines changed: 22 additions & 11 deletions
@@ -22,22 +22,32 @@ loan documents.
2222
- **Database:** PostgreSQL with pgvector (metadata + embeddings)
2323
- **Blob storage:** Azure Blob Storage (original PDFs, optional)
2424

25-
### Planned (not yet implemented)
26-
- **Autoscaling:** KEDA (queue-depth based)
27-
- **Monitoring:** Prometheus + Grafana
28-
- **Chaos engineering:** Chaos Mesh
25+
### Implemented (Day 3)
26+
- **Autoscaling:** KEDA ScaledObjects (queue-depth based, YAMLs in k8s/scaling/)
27+
- **Monitoring:** Prometheus + Grafana dashboard (grafana/documentstream-dashboard.json)
28+
- **Chaos engineering:** Chaos Mesh experiments (k8s/chaos/)
29+
- **Load testing:** Locust (locust/locustfile.py)
30+
- **CI/CD:** GitHub Actions deploy workflow (.github/workflows/deploy.yml)
31+
32+
### Not yet done (needs live AKS cluster)
33+
- Azure infra provisioning (scripts ready in infra/)
34+
- Build/push images to ACR and deploy to AKS
35+
- Import Grafana dashboard, apply KEDA/Chaos manifests
36+
- End-to-end demo rehearsal
2937

3038
## Project Structure
3139
- `src/gateway/` — FastAPI API + web UI (dual-mode: sync or async via Redis)
3240
- `src/worker/` — Extract, classify, semantic, store modules + Redis queue + worker runners
3341
- `src/generator/` — PDF document generator (5 templates, CLI tool)
3442
- `demo_samples/` — One complete loan scenario (5 PDFs, committed to git for visibility)
3543
- `tests/` — All tests (83 tests)
36-
- `k8s/` — Kubernetes manifests (empty — Day 2)
37-
- `infra/` — Azure setup/teardown scripts (empty — Day 2)
38-
- `locust/` — Load testing (empty — Day 3)
39-
- `grafana/` — Dashboard JSON (empty — Day 2)
40-
- `docs/` — Documentation (architecture, classification, demo guide, dictionary)
44+
- `k8s/base/` — Kubernetes base manifests (9 files: namespace, configmap, deployments, service, ingress, kustomization)
45+
- `k8s/scaling/` — KEDA ScaledObjects for extract, classify, store workers
46+
- `k8s/chaos/` — Chaos Mesh experiments (pod-kill, network-delay, cpu-stress)
47+
- `infra/` — Azure setup/teardown/helm-install scripts
48+
- `locust/` — Locust load test (locustfile.py)
49+
- `grafana/` — Grafana dashboard JSON (7 panels)
50+
- `docs/` — Documentation (architecture, classification, demo guide, dictionary, implementation plan)
4151
- `journal/` — Development journal
4252

4353
## Commands
@@ -59,7 +69,8 @@ loan documents.
5969
- Workers use Redis consumer groups for at-least-once delivery
6070
- SIGTERM graceful shutdown on all workers (finish current message before exiting)
6171

62-
### Target architecture (Day 2-3)
63-
- Each pipeline stage as a separate K8s Deployment scaled by KEDA
72+
### Architecture
73+
- Each pipeline stage is a separate K8s Deployment scaled by KEDA
6474
- Documents flow through Redis Streams: raw-docs → extracted → classified → stored
6575
- Store worker persists to PostgreSQL (pgvector) + Azure Blob Storage
76+
- CI/CD: ci.yml (lint+test), docker.yml (ghcr.io push), deploy.yml (ACR build + AKS deploy)

docs/architecture.md

Lines changed: 2 additions & 1 deletion
@@ -217,7 +217,8 @@ When stopped (`az aks stop` + `az postgres flexible-server stop`): ~€0.01/hr (
217217

218218
GitHub Actions workflows:
219219
- **ci.yml** — On push to main and PRs: ruff lint, ruff format check, pytest with coverage
220-
- **deploy.yml** *Not yet created.* Will build images → push to ACR → deploy to AKS
220+
- **docker.yml** — On push to main: build and push images to ghcr.io/johnmathews/k8s
221+
- **deploy.yml** — On push to main (src/ or k8s/ changes): build → push to ACR → deploy to AKS
221222

222223
---
223224

docs/implementation-plan.md

Lines changed: 9 additions & 9 deletions
@@ -2,7 +2,7 @@
22

33
**Timeline:** 3 days (2026-03-28 to 2026-03-30)
44
**Interview:** After Day 3
5-
**Last updated:** 2026-03-29
5+
**Last updated:** 2026-03-30
66

77
---
88

@@ -12,16 +12,16 @@
1212
|---|---|---|---|---|
1313
| -- | **Day 1: Foundation** | -- | -- | **DONE** |
1414
| 0 | Tool setup (helm, kustomize) | MUST | 15min | DONE |
15-
| 1 | Azure infrastructure | MUST | 1.5-2h | PARTIAL |
15+
| 1 | Azure infrastructure | MUST | 1.5-2h | PARTIAL (scripts written, not executed) |
1616
| 2 | Redis Streams pipeline refactor | MUST | 2.5-3h | DONE |
1717
| 3 | K8s manifests | MUST | 2-2.5h | DONE |
18-
| 4 | Build, push, deploy to AKS | MUST | 1-1.5h | TODO |
19-
| 5 | KEDA autoscaling | MUST | 1-1.5h | TODO |
20-
| 6 | Grafana dashboard | HIGH | 1.5-2h | TODO |
21-
| 7 | Chaos Mesh experiments | MEDIUM | 1h | TODO |
22-
| 8 | Locust load testing | MEDIUM | 1h | TODO |
23-
| 9 | CI/CD deploy workflow | MEDIUM | 1h | TODO |
24-
| 10 | Rolling update demo prep | LOW | 30min | TODO |
18+
| 4 | Build, push, deploy to AKS | MUST | 1-1.5h | TODO (needs Azure) |
19+
| 5 | KEDA autoscaling | MUST | 1-1.5h | DONE (YAMLs written, needs apply) |
20+
| 6 | Grafana dashboard | HIGH | 1.5-2h | DONE (JSON written, needs import) |
21+
| 7 | Chaos Mesh experiments | MEDIUM | 1h | DONE (YAMLs written, needs apply) |
22+
| 8 | Locust load testing | MEDIUM | 1h | DONE (locustfile written, needs run) |
23+
| 9 | CI/CD deploy workflow | MEDIUM | 1h | DONE |
24+
| 10 | Rolling update demo prep | LOW | 30min | TODO (live demo technique) |
2525
| 11 | Polish and demo rehearsal | MUST | 1.5-2h | TODO |
2626

2727
**If time runs short:** Cut from the bottom. Stages 0-5 + 11 are non-negotiable. Stage 6 (Grafana) is
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
# Day 3: KEDA, Grafana, Chaos Mesh, Locust, and CI/CD Deploy
2+
3+
**Date:** 2026-03-30
4+
5+
## What Was Done
6+
7+
Completed Stages 5-9 of the implementation plan. All files are written and ready to
8+
apply once the AKS cluster is provisioned. No new Python application code — this was
9+
entirely infrastructure/config work.
10+
11+
### Stage 5: KEDA ScaledObjects
12+
- Created 3 ScaledObject manifests in `k8s/scaling/`
13+
- Each targets a worker deployment and watches its corresponding Redis stream
14+
- Config: pollingInterval 15s, cooldownPeriod 60s, min 1 / max 8 replicas, lagCount threshold 5
15+
- Uses the `redis-streams` trigger type pointing at `redis-master.documentstream.svc.cluster.local:6379`
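
As a rough illustration of the Stage 5 settings above, the sketch below builds one ScaledObject as a plain Python dict and estimates how many replicas a given stream lag would ask for. The deployment name, consumer group name, the `-scaler` suffix, the `documentstream` namespace for the ScaledObject, and the stream-to-worker pairing are assumptions for illustration; the actual manifests in `k8s/scaling/` are not shown in this commit. The replica estimate is a simplified model (lag divided by the threshold, rounded up, clamped to min/max) and ignores HPA tolerance and stabilization windows.

```python
import math

# Redis address taken from the journal entry above.
REDIS_ADDRESS = "redis-master.documentstream.svc.cluster.local:6379"


def build_scaled_object(deployment: str, stream: str, consumer_group: str) -> dict:
    """Return a KEDA ScaledObject manifest as a dict (names are hypothetical)."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {
            "name": f"{deployment}-scaler",  # hypothetical naming scheme
            "namespace": "documentstream",   # assumed from the Redis DNS name
        },
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "pollingInterval": 15,   # seconds between trigger checks
            "cooldownPeriod": 60,    # seconds KEDA waits before scaling to zero (no effect while min is 1)
            "minReplicaCount": 1,
            "maxReplicaCount": 8,
            "triggers": [
                {
                    "type": "redis-streams",
                    "metadata": {
                        "address": REDIS_ADDRESS,
                        "stream": stream,
                        "consumerGroup": consumer_group,
                        "lagCount": "5",  # trigger metadata values are strings
                    },
                }
            ],
        },
    }


def estimated_replicas(lag: int, threshold: int = 5, lo: int = 1, hi: int = 8) -> int:
    """Simplified estimate: one replica per `threshold` lagging entries, clamped."""
    return max(lo, min(hi, math.ceil(lag / threshold)))


# Hypothetical pairing: the classify worker reads the "extracted" stream.
manifest = build_scaled_object("classify-worker", "extracted", "classify-group")
for lag in (0, 12, 100):
    print(lag, "->", estimated_replicas(lag))
```

Under this model a lag of 12 entries asks for 3 replicas, and anything above 40 entries is capped at the maximum of 8.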
16+
17+
### Stage 6: Grafana Dashboard
18+
- Created `grafana/documentstream-dashboard.json` with 7 panels
19+
- Row 1: Pod count (bar gauge), Redis queue depth (stat), Pod restarts (stat)
20+
- Row 2: CPU usage per pod (timeseries), Memory usage per pod (timeseries)
21+
- Row 3: KEDA scaling metrics (timeseries), Network I/O (timeseries)
22+
- 5-second auto-refresh for live demo, color thresholds on stat panels
23+
24+
### Stage 7: Chaos Mesh Experiments
25+
- `k8s/chaos/pod-kill.yaml` — Kills 2 classify-worker pods (demonstrates self-healing)
26+
- `k8s/chaos/network-delay.yaml` — 500ms latency on store-workers (demonstrates resilience)
27+
- `k8s/chaos/cpu-stress.yaml` — 80% CPU burn on classify-workers (demonstrates KEDA scale-up)
28+
- All include descriptive comments and usage commands
29+
30+
### Stage 8: Locust Load Test
31+
- `locust/locustfile.py` with 4 weighted tasks
32+
- Upload PDF (weight 3), generate scenario (weight 1), list docs (weight 5), health check (weight 2)
33+
- PDF generated once at class level, reused per request
34+
- Supports local, AKS, and headless (CI) modes
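
To make the Stage 8 task weights concrete: Locust picks each user's next task at random with probability proportional to its weight, so the four weights above translate into a traffic mix. The sketch below works out that mix. The task names are hypothetical labels, not the function names used in `locust/locustfile.py`.

```python
from fractions import Fraction

# Weights from the journal entry; the keys are hypothetical labels.
TASK_WEIGHTS = {
    "upload_pdf": 3,
    "generate_scenario": 1,
    "list_docs": 5,
    "health_check": 2,
}


def task_shares(weights: dict[str, int]) -> dict[str, Fraction]:
    """Return each task's expected share of requests as an exact fraction."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


shares = task_shares(TASK_WEIGHTS)
for name, share in shares.items():
    print(f"{name}: {share} ({float(share):.1%})")
```

With a total weight of 11, listing documents makes up 5 of every 11 task picks on average and PDF uploads 3 of 11, so the load is read-heavy while still pushing a steady stream of documents into the pipeline.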
35+
36+
### Stage 9: CI/CD Deploy Workflow
37+
- `.github/workflows/deploy.yml` — triggers on push to main when src/ or k8s/ files change
38+
- Builds both gateway and worker images to ACR with SHA tags
39+
- Applies K8s manifests and waits for rollout status
40+
- Added `workflow_call` trigger to `ci.yml` so deploy.yml can reuse it as a gate
41+
42+
### Documentation Updates
43+
- Updated CLAUDE.md to reflect all new files and current project state
44+
- Updated architecture.md with deploy.yml reference
45+
- Updated implementation-plan.md progress dashboard
46+
47+
## What's Left
48+
49+
All remaining work requires a live AKS cluster:
50+
1. Run `infra/setup.sh` to provision Azure resources
51+
2. Run `infra/helm-install.sh` to install Helm charts
52+
3. Build and push images to ACR
53+
4. `kubectl apply -k k8s/base/` + scaling + chaos manifests
54+
5. Import Grafana dashboard
55+
6. End-to-end demo rehearsal
56+
57+
## Test Status
58+
- 83 tests passing, 88% coverage
59+
- No new tests needed (all new files are YAML/JSON/config)
