
Commit 5e736a6

Demo 2 Submission
1 parent 5011758 commit 5e736a6

10 files changed

Lines changed: 305 additions & 0 deletions

demo2/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# Release Manifest: Core Features Sprint (A5)

## Code Repository

**Project Code Repository:** https://github.com/dcsil/PyGuard-Agentic-Agent

## What This Release Validates (Core Features / Proof of Value)

This release moves PyGuard from a proof of concept (a single web-chat interaction) to a proof of value by replacing the web frontend with **WhatsApp as the primary user interface**. The same multi-agent pipeline (research, email, calendar, Google Docs, data analysis) is now reachable from a messaging app users already have open, so they no longer need to learn a new UI or remember a deployment URL.

**Primary interaction (updated):** From a single WhatsApp message, the agent performs web research, generates a structured compliance report, and executes follow-up actions such as emailing the report and scheduling a meeting. All responses are delivered back in the same WhatsApp conversation.

**Key result:** Required-path task completion time dropped from 678 seconds (Sprint 1 web UI) to 479 seconds (Sprint 2 WhatsApp), a 29% reduction, while required-path context switches stayed at zero.
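The 29% figure can be reproduced from the two timings quoted above. The helper name `percent_reduction` is illustrative and does not come from the repository.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Relative drop from before_s to after_s, expressed as a percentage."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return (before_s - after_s) / before_s * 100


# Timings quoted in the key result: Sprint 1 web UI vs Sprint 2 WhatsApp.
reduction = percent_reduction(678, 479)
print(f"{reduction:.1f}% reduction")
```

The unrounded value is slightly above 29%, which clears the 25% target stated in the pivot contract.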

## Repository Index (Architectural Artifacts)

### 1) Feature Prioritization & Product Definition CUJ

- **File:** `feature-prioritization.md`
- Complete Product Definition CUJ mapping the end-to-end WhatsApp user journey (11 steps, each annotated with time, context switches, and friction severity). Feature-to-value mapping for 14 features, connecting each to specific journey steps and friction points. Strategic prioritization (P0–P3) with rationale. Implementation timeline showing Sprint 2 (built), Sprint 3 (planned), and future (deferred) features. Evolved system topology showing the architecture maturing from HTTP-only to dual-interface with the WhatsApp bridge layer.

### 2) Pivot Contract

- **File:** `pivot-contract.md`
- Pre-sprint pivot contract defining the quantified hypothesis (25% completion time reduction via WhatsApp), the kill metric (80% of test runs under 510s with zero context switches), the trigger date (end of Sprint 2), and two strategic fallback options.
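One way to make the kill metric mechanical is sketched below. It assumes the contract is read per run: a run counts only if it finishes under 510s and has zero context switches, and the sprint passes when at least 80% of runs count. The `Run` type, the function name, and the sample timings are all hypothetical and are not measured data.

```python
from dataclasses import dataclass


@dataclass
class Run:
    seconds: float          # required-path completion time for one test run
    context_switches: int   # required-path context switches in that run


def passes_kill_metric(runs: list[Run],
                       max_seconds: float = 510.0,
                       required_share: float = 0.80) -> bool:
    """True if enough runs are under the time limit with zero context switches."""
    if not runs:
        raise ValueError("at least one run is required")
    passing = sum(
        1 for r in runs
        if r.seconds < max_seconds and r.context_switches == 0
    )
    return passing / len(runs) >= required_share


# Illustrative runs only: four of five meet both conditions.
sample = [Run(470, 0), Run(485, 0), Run(502, 0), Run(530, 0), Run(479, 0)]
print(passes_kill_metric(sample))
```

Treating the two conditions jointly per run is the stricter reading; if the contract intends them as separate aggregate checks, the predicate would need to be split.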

### 3) Build Trap Post-Mortem

- **File:** `build-trap-postmortem.md`
- Post-sprint retrospective assessing whether the features delivered their hypothesized value and whether building was necessary to test those hypotheses. Covers the WhatsApp channel (validated), scheduling/cron (partially failed: agent tool misuse caused recursion), deferred features (why they were deprioritized), and process improvements for the next sprint.

### 4) Architecture & Topology Diagrams

- **System Topology Architecture Diagram (Deployment/Runtime View):** `assets/evolved-topology.jpg`

This diagram shows:

- User entry point and deployed components (frontend, backend/orchestrator, external services)
- Data/request flow between components
- Decoupling boundaries that support evolution (e.g., adding transaction ingestion and fraud analysis later)

### 5) Evidence / Screenshots

- **Folder:** `assets/`

demo2/assets/IMG_7154.PNG (597 KB)
demo2/assets/IMG_7155.PNG (540 KB)
demo2/assets/IMG_7156.PNG (588 KB)
demo2/assets/IMG_7157.PNG (585 KB)
demo2/assets/IMG_7158.PNG (557 KB)
demo2/assets/evolved-topology.jpg (662 KB)

demo2/build-trap-postmortem.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Build Trap Post-Mortem: Sprint 2

## Did We Need to Build to Learn?

### WhatsApp Channel (P0/P1 Features F1–F7)

- **Hypothesis was:** replacing the web frontend with WhatsApp would cut task completion time by 25% while keeping required-path context switches at zero.
- **Result:** required-path completion time dropped from 678s to ~479s (a 29% reduction), and context switches stayed at zero. The hypothesis held.
- **Did we need to build it?** Yes. This was capability validation: the question was "can we route the full multi-agent pipeline through WhatsApp reliably?" There was no way to answer that without wiring up the Baileys bridge, the WebSocket layer, and the message flow end to end. A mockup or user interview would not have told us whether Baileys handles reconnection correctly or whether the orchestrator's long-running execution blocks the async event loop.
- **What surprised us:** the Node.js bridge was essentially plug-and-play. The real integration cost was on the Python side: extracting the chat handler into a shared function so that both HTTP and WhatsApp call the same pipeline. That refactor was small, but we would not have known it was needed without building.
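The shared-handler refactor described above might look roughly like the sketch below. Every name here (`run_pipeline`, `handle_chat`, `http_chat_endpoint`, `whatsapp_on_message`) and both payload shapes are assumptions made for illustration; the actual module layout is not shown in this commit. Offloading the blocking pipeline with `asyncio.to_thread` is one possible answer to the event-loop concern, not necessarily what the project does.

```python
import asyncio


def run_pipeline(user_id: str, text: str) -> str:
    """Stand-in for the blocking multi-agent orchestrator (hypothetical)."""
    return f"[{user_id}] report for: {text}"


async def handle_chat(user_id: str, text: str) -> str:
    """Single entry point shared by every channel."""
    # Run the blocking pipeline in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(run_pipeline, user_id, text)


async def http_chat_endpoint(payload: dict) -> dict:
    """What an HTTP route might do with an already-parsed JSON body."""
    reply = await handle_chat(payload["user_id"], payload["message"])
    return {"reply": reply}


async def whatsapp_on_message(frame: dict) -> dict:
    """What the WebSocket consumer might do with a frame from the Node.js bridge."""
    reply = await handle_chat(frame["sender"], frame["text"])
    return {"to": frame["sender"], "text": reply}


if __name__ == "__main__":
    print(asyncio.run(whatsapp_on_message({"sender": "u1", "text": "hello"})))
```

Each channel adapter only translates its own payload shape, so a pipeline change lands in one place and both interfaces pick it up.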

### Scheduling / Cron (F8-adjacent, added mid-sprint)

- **Hypothesis was:** letting the agent schedule delayed and recurring tasks would let users make requests like "send this email in 5 minutes" in natural language.
- **Result:** the scheduler itself worked: jobs persisted to disk, timers fired on time, and responses were delivered back through WhatsApp. But the agent's use of the tool was unreliable:
  - The agent used `every_seconds=300` (recurring every 5 minutes) when the user said "in 5 minutes," creating an endlessly recurring job instead of a one-time delay.
  - When a cron job fired, it ran the job's message through the full agent pipeline, which included the scheduling tools. The agent saw the message and created more cron jobs, causing exponential recursion.
  - We had to patch both issues post-testing: we replaced `every_seconds` with `delay_seconds` for one-shot delays, and added a `from_cron` flag that disables scheduling tools during cron-triggered execution.
- **Did we need to build it?** Partially. We needed to build the cron service to prove the timer mechanism works. But the tool integration with the LLM-based orchestrator was where it fell apart, and that failure mode could have been predicted. We should have tested the agent's tool-calling behavior with a dry run (no actual scheduling, just logging what the agent tries to call) before wiring it into the live pipeline. That would have revealed the `every_seconds` versus one-time confusion without creating runaway jobs.
- **What we learned:** LLM tool usage is the riskiest layer. The infrastructure (timers, persistence, WebSocket delivery) was straightforward. The unpredictable part was the agent choosing the wrong parameter. Next sprint, any new tool should be tested with logged dry runs before it is connected to real side effects.
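The two patches described above can be sketched as follows. Only `every_seconds`, `delay_seconds`, and `from_cron` come from the post-mortem; the `Job` class, the tool names, and the "exactly one timing mode" check are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    message: str
    delay_seconds: Optional[int] = None   # one-shot: fire once after this delay
    every_seconds: Optional[int] = None   # recurring: fire repeatedly at this interval

    def __post_init__(self) -> None:
        # Assumed guard: a job must be either one-shot or recurring, never both or neither.
        if (self.delay_seconds is None) == (self.every_seconds is None):
            raise ValueError("set exactly one of delay_seconds or every_seconds")


# Hypothetical tool names; the real registry is not shown in this commit.
SCHEDULING_TOOLS = {"schedule_job", "cancel_job"}
ALL_TOOLS = {"web_research", "send_email", "create_event"} | SCHEDULING_TOOLS


def tools_for_run(from_cron: bool) -> set[str]:
    """Tools exposed to the agent for one pipeline run."""
    # Cron-triggered runs lose the scheduling tools, so a job cannot schedule more jobs.
    return ALL_TOOLS - SCHEDULING_TOOLS if from_cron else set(ALL_TOOLS)


one_shot = Job("send this email", delay_seconds=300)  # "in 5 minutes"
print(sorted(tools_for_run(from_cron=True)))
```

Separate parameters make the two modes explicit but do not stop the agent from picking the wrong one; the `from_cron` filter is what bounds the damage when it does.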

## Deferred Features: Why and What We Learned

- **F8 (progress messages) and F9 (acknowledgement):** deprioritized because WhatsApp message delivery is inherently asynchronous and there is no "typing indicator" API we control. We realized mid-sprint that implementing these requires hooks deep inside the orchestrator's sub-agent execution loop, which is more invasive than originally estimated. Still valuable, just larger than Sprint 2 scope.
- **F10 (confirmation before irreversible actions):** deprioritized because it requires pausing the orchestrator mid-execution, sending a WhatsApp message, waiting for the user's reply, and resuming. That is a fundamental change to the agent's execution model (currently fire-and-forget). We learned this is architecturally hard, not just a UX layer.
- **F12–F14 (voice, attachments, groups):** never considered for this sprint. They are new input modalities, not improvements to the existing text flow.
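To illustrate why F10 changes the execution model, here is a minimal sketch of the pause, ask, wait, resume cycle using an `asyncio` future. `ConfirmationGate`, its methods, and the `send` callback are hypothetical. The sketch covers only the in-memory waiting mechanism and ignores timeouts, persistence across restarts, and multiple pending questions per user, which is where most of the architectural cost would sit.

```python
import asyncio
from typing import Awaitable, Callable


class ConfirmationGate:
    """Parks a running task until the user replies (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def ask(self, user_id: str, prompt: str,
                  send: Callable[[str, str], Awaitable[None]]) -> bool:
        # Register a future, send the question, then suspend until resolve() is called.
        future = asyncio.get_running_loop().create_future()
        self._pending[user_id] = future
        await send(user_id, prompt)
        try:
            return await future
        finally:
            self._pending.pop(user_id, None)

    def resolve(self, user_id: str, reply: str) -> bool:
        """Route an incoming message to a waiting task; False if nothing is waiting."""
        future = self._pending.get(user_id)
        if future is None or future.done():
            return False
        future.set_result(reply.strip().lower() in {"yes", "y"})
        return True
```

The incoming-message handler would call `resolve` first and fall through to the normal pipeline only when it returns `False`, so a "yes" is consumed as a confirmation and not treated as a new task.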

## What We Would Change Next Time

- **Dry-run new tools before live wiring.** The cron recursion bug cost real API calls and sent duplicate emails. A logged-only mode would have caught both the parameter confusion and the recursive loop without side effects.
- **Separate demand validation from capability validation.** For WhatsApp, building was the right call because the question was technical. For scheduling, we could have validated demand first (does Bobby actually ask for delayed tasks, or does she always want immediate execution?) before investing in the cron engine.
- **Update the pivot contract to include tool-reliability metrics.** Our kill metric only measured task completion time and context switches. It did not account for the agent misusing tools. Next sprint's contract should include a "tool accuracy" metric: the percentage of tool calls where the agent selects the correct parameters.
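A logged-only mode and a tool accuracy score could be combined along the lines below. `DryRunTools`, `tool_accuracy`, and the tool names are hypothetical. The scoring rule is also an assumption: calls are compared position by position, and the denominator is the longer of the two lists so that extra calls, such as recursive scheduling, lower the score.

```python
from dataclasses import dataclass, field
from typing import Any

Call = tuple[str, dict[str, Any]]


@dataclass
class DryRunTools:
    """Records tool calls instead of executing them (hypothetical wrapper)."""
    calls: list[Call] = field(default_factory=list)

    def call(self, name: str, **params: Any) -> str:
        self.calls.append((name, params))
        return f"[dry-run] {name} logged, nothing executed"


def tool_accuracy(calls: list[Call], expected: list[Call]) -> float:
    """Share of calls whose tool name and parameters both match the expectation."""
    if not expected:
        raise ValueError("expected must not be empty")
    correct = sum(1 for got, want in zip(calls, expected) if got == want)
    return correct / max(len(calls), len(expected))


# Replay of the failure described above: recurring was chosen where one-shot was meant.
tools = DryRunTools()
tools.call("schedule_job", every_seconds=300)
tools.call("send_email", to="a@example.com")
expected = [
    ("schedule_job", {"delay_seconds": 300}),
    ("send_email", {"to": "a@example.com"}),
]
print(tool_accuracy(tools.calls, expected))
```

Running the agent against `DryRunTools` with a small set of scripted prompts would give a number that can sit in the pivot contract next to completion time and context switches.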
