Commit 460a331

johnmathews and claude committed
Simplify blob paths, fix Locust for KEDA scaling, clean up demo checklist
- Store blobs as {loan_id}/{doc_type}.pdf instead of {uuid}/{loan_id}/{doc_type}.pdf
- Reweight Locust to 80% async uploads (triggers KEDA), remove sync generate task
- Remove stale Postgres Flexible Server step from demo checklist (runs in-cluster)
- Update journal with full session work

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 44050d4 commit 460a331

5 files changed: 38 additions & 20 deletions

docs/demo-guide.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -10,8 +10,7 @@ commands, timing, talking points, and things to name-drop with context on **why**
 Before the interview:
 
 - [ ] AKS cluster is running (`az aks start -n DocumentStreamManagedCluster -g documentstream`)
-- [ ] PostgreSQL is running (`az postgres flexible-server start -n documentstream-pg -g documentstream`)
-- [ ] All pods are healthy (`kubectl get pods -n documentstream`)
+- [ ] All pods are healthy (`kubectl get pods -n documentstream`) — PostgreSQL runs in-cluster, no separate start needed
 - [ ] Grafana is accessible and dashboard is loaded
 - [ ] Chaos Mesh dashboard is accessible
 - [ ] Locust is ready (either in-cluster or local)
```

journal/260402-chaos-mesh-testing-and-demo-rehearsal.md

Lines changed: 25 additions & 1 deletion
```diff
@@ -55,10 +55,34 @@ Applied via `helm upgrade --install`. Chaos daemons restarted and all experiment
 4. **Node capacity:** Only 2 of 3 configured nodes are running. Some KEDA-scaled
    pods couldn't schedule. Not a blocker for the demo narrative but worth noting.
 
+## Additional fixes
+
+- **Blob storage paths:** Removed unnecessary UUID prefix from blob names. Documents
+  are now stored as `CRE-123456/contract.pdf` instead of `{uuid}/CRE-123456/contract.pdf`.
+  The loan ID grouping in the filename is sufficient.
+- **Locust load test:** Reweighted tasks so 80% of requests use the async `/api/documents`
+  endpoint (which goes through Redis and triggers KEDA). Removed the sync `/api/generate`
+  task which bypassed Redis entirely. Reduced wait time from 1-3s to 0.5-1.5s.
+- **Chaos Mesh dashboard RBAC:** Created a `chaos-admin` service account with cluster-admin
+  binding to fix the dashboard permission error. Generated a long-lived token for demo use.
+- **Demo checklist:** Removed the PostgreSQL Flexible Server start step — Postgres runs
+  in-cluster as a pod, starts automatically with AKS.
+- **Documentation audit:** Fixed test count in README (51 → 92), corrected Azure resource
+  names across all docs, updated implementation plan progress.
+- **CI fix:** `src/worker/store.py` had a ruff format issue (function args on one line).
+
 ## Files changed
 
 - `k8s/chaos/pod-kill.yaml` — changed mode from `fixed`/`value: "2"` to `one`
 - `infra/helm-install.sh` — added containerd runtime settings for Chaos Mesh
+- `infra/setup.sh` — corrected storage account name
 - `docs/chaos-experiments.md` — added containerd prerequisite note and verified results
-- `docs/demo-guide.md` — updated rolling update section to use gateway instead of workers
+- `docs/demo-guide.md` — updated rolling update section, removed stale Postgres checklist item
+- `docs/implementation-plan.md` — marked chaos, rolling update, demo rehearsal as DONE
+- `locust/locustfile.py` — reweighted for async pipeline, removed sync generate task
+- `src/worker/store.py` — simplified blob path (filename only, no UUID prefix)
+- `tests/test_store.py` — updated blob path assertions
+- `README.md` — corrected test count (51 → 92)
 - `CLAUDE.md` — removed chaos mesh and demo rehearsal from "Not yet done"
+- `.engineering-team/architecture-plan.md` — updated demo script and Azure resource names
+- `.github/workflows/deploy.yml` — corrected AKS cluster name and resource group
```
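The Locust reweighting above only matters because KEDA scales the workers off Redis backlog. For context, a KEDA Redis-list trigger generally looks like the sketch below; the resource names, list key, and threshold are illustrative assumptions, not values taken from this repo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler              # hypothetical name
  namespace: documentstream
spec:
  scaleTargetRef:
    name: documentstream-worker    # hypothetical worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis:6379        # assumed in-cluster Redis service
        listName: documents        # assumed queue key the gateway pushes to
        listLength: "5"            # scale out when backlog exceeds ~5 items per replica
```

With a trigger like this, only traffic that actually lands in the Redis list moves the scaler, which is why a sync endpoint that bypasses Redis adds load without ever triggering a scale-out.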

locust/locustfile.py

Lines changed: 8 additions & 13 deletions
```diff
@@ -33,7 +33,7 @@ def _generate_pdf() -> bytes:
 class DocumentStreamUser(HttpUser):
     """Simulates a user interacting with the DocumentStream gateway."""
 
-    wait_time = between(1, 3)
+    wait_time = between(0.5, 1.5)
 
     # Generate the PDF payload once for the entire class so every upload
     # reuses the same bytes instead of rebuilding a PDF per request.
@@ -43,33 +43,28 @@ class DocumentStreamUser(HttpUser):
     # Tasks #
     # ------------------------------------------------------------------ #
 
-    @task(3)
+    @task(8)
     def upload_pdf(self) -> None:
-        """Upload a small PDF via multipart form — the main pipeline driver."""
+        """Upload a small PDF via multipart form — the main pipeline driver.
+
+        This is the primary task because it pushes documents through the Redis
+        async pipeline, which is what KEDA monitors for autoscaling.
+        """
         self.client.post(
             "/api/documents",
             files={"file": ("loadtest.pdf", io.BytesIO(self._pdf_bytes), "application/pdf")},
             name="/api/documents [upload]",
         )
 
     @task(1)
-    def generate_scenario(self) -> None:
-        """Generate a full loan scenario (5 documents) via the API."""
-        self.client.post(
-            "/api/generate",
-            json={"count": 1},
-            name="/api/generate",
-        )
-
-    @task(5)
     def list_documents(self) -> None:
         """List processed documents — lightweight read, simulates monitoring."""
         self.client.get(
             "/api/documents",
             name="/api/documents [list]",
         )
 
-    @task(2)
+    @task(1)
     def health_check(self) -> None:
         """Hit the health endpoint — simulates K8s probes and monitoring."""
         self.client.get(
```
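As a quick sanity check on the new weights (a standalone sketch, not code from the repo): Locust schedules tasks in proportion to their `@task` weights, so 8/1/1 yields the 80% upload split the commit message describes.

```python
# Locust picks the next task with probability weight / sum(weights).
weights = {"upload_pdf": 8, "list_documents": 1, "health_check": 1}
total = sum(weights.values())
shares = {name: weight / total for name, weight in weights.items()}
print(shares)  # {'upload_pdf': 0.8, 'list_documents': 0.1, 'health_check': 0.1}
```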

src/worker/store.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -147,7 +147,7 @@ def upload_blob(
     """Upload a PDF to Azure Blob Storage.
 
     Args:
-        doc_id: Document ID (used as blob prefix).
+        doc_id: Document ID (for logging).
         filename: Original filename.
         pdf_bytes: Raw PDF content.
         doc_type: Document type label for metrics.
@@ -166,7 +166,7 @@ def upload_blob(
     container = blob_service.get_container_client(BLOB_CONTAINER)
     with contextlib.suppress(Exception):
         container.create_container()
-    blob_name = f"{doc_id}/{filename}"
+    blob_name = filename
     container.upload_blob(blob_name, pdf_bytes, overwrite=True)
 
     blob_uploads_total.labels(doc_type=doc_type).inc()
```
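The simplified path relies on the caller already embedding the loan ID in the filename, per the `{loan_id}/{doc_type}.pdf` convention in the commit message. A minimal sketch of building such a name (the helper is hypothetical, not a function from `store.py`):

```python
def blob_name_for(loan_id: str, doc_type: str) -> str:
    """Build a blob name in the {loan_id}/{doc_type}.pdf shape the commit describes."""
    return f"{loan_id}/{doc_type}.pdf"

print(blob_name_for("CRE-123456", "contract"))  # CRE-123456/contract.pdf
```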

tests/test_store.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -177,7 +177,7 @@ def test_successful_upload(self, mock_blob_class: MagicMock) -> None:
 
         result = upload_blob("doc-1", "test.pdf", b"pdf-content", doc_type="invoice")
 
-        assert result == "doc-1/test.pdf"
+        assert result == "test.pdf"
         mock_container.upload_blob.assert_called_once_with(
-            "doc-1/test.pdf", b"pdf-content", overwrite=True
+            "test.pdf", b"pdf-content", overwrite=True
         )
```
