Commit b2892ff

Merge pull request #6 from dcsil/a6
A6 Submission
2 parents 085cc64 + cfcfb5b commit b2892ff

6 files changed

Lines changed: 362 additions & 0 deletions

demo3/README.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Release Manifest: Slack Integration Sprint (Sprint 3)

## Code Repository

**Project Code Repository:** https://github.com/dcsil/PyGuard-Agentic-Agent

## What This Release Validates

This release demonstrates the **Slack-integrated proof of concept** for an autonomous personal multi-agent assistant. The primary interaction surface moves from WhatsApp (Sprint 2) to Slack, the communication platform Bobby already uses for work.

**Primary interaction:** From a single instruction sent via Slack DM, the agent performs web research, generates a structured compliance report in Google Docs, emails the report, and schedules a follow-up meeting. The whole flow runs end-to-end inside Slack, with zero context switches.

**Key architectural change:** The two-process WhatsApp bridge (Node.js + Python WebSocket) is replaced by a single Python service using Slack Socket Mode. This eliminates an entire class of infrastructure failures and adds native rich formatting, thread-based replies, and emoji reaction acknowledgement.

---

## Repository Index (Artifacts for This Sprint)

### 1) Feature Prioritization & Competitive Review CUJ

- **File:** `docs/feature-prioritization.md`
- Contains: User persona and value-driven goal; what changed from Sprint 2 and why; competitive CUJ executed across PyGuard (Slack), ChatGPT, and Lindy.ai under identical controlled conditions; gap analysis with performance ratios; feature-to-value mapping table (F1–F16) with journey step, friction point, hypothesis, success metric, priority, and sprint assignment; strategic prioritization rationale (P0–P3); implementation timeline across Sprint 3 / Sprint 4 / Future; and pivot criteria.

### 2) Pivot Contract

- **File:** `docs/pivot-contract.md`
- Contains: Pre-sprint hypothesis (quantified value claim), kill metric (falsifiable threshold with n and time target), trigger date (firm decision point), and two strategic fallback options defining what we would test if the hypothesis is violated.

### 3) Build Trap Post-Mortem

- **File:** `docs/build-trap-postmortem.md`
- Contains: Feature-by-feature retrospective on whether each Sprint 3 feature delivered its hypothesized value and whether building was necessary to validate the hypothesis; identification of demand assumptions we could have tested without code; honest assessment of minor over-engineering; reasoning behind deferred features; and what we will change in the next pivot contract.

### 4) Architectural Rationale

- **File:** `docs/architectural-rationale.md`
- Contains: What changed in the system architecture (new modules, removed modules, untouched pipeline); why the change was necessary (structural reliability problem with the two-process bridge, wrong-channel problem with WhatsApp); alternatives considered (Webhook mode, keeping the Node.js bridge with Slack, Bolt for Python); technical debt introduced and resolved; and limitations remaining for Sprint 4.

### 5) Previous Sprint Reference

- **File:** `docs/feature-prioritization-last-demo.md`
- The Sprint 2 Feature Prioritization & CUJ document. Retained as reference for the Sprint 2 WhatsApp baseline metrics (479s completion time, 0 context switches) that Sprint 3 is benchmarked against.

---

## Sprint 3 System Topology (Runtime View)

```
Slack User (DM or Channel @mention)
↕ Slack Socket Mode (persistent WebSocket; no public URL required)
SlackService (slack/service.py; Python, runs as FastAPI lifespan task)
↕ calls process_chat_message()
FastAPI Backend (main.py; single Python process)

Memory Agent → Orchestrator → Sub-agents
├── Research Agent (web search)
├── Email Agent (Gmail via Composio)
├── Calendar Agent (Google Calendar via Composio)
├── Report Writer (Google Docs via Composio)
└── Data Analyst (code interpreter)

SlackService._send_reply() → chat_postMessage → Slack User (reply in thread)
```
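The message path in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `FakeSlackClient`, `handle_message`, the `session_key` format, and the `process_chat_message` signature are all hypothetical, and the fake client only mimics the keyword arguments of the Slack Web API methods `reactions.add` and `chat.postMessage`.

```python
import asyncio


class FakeSlackClient:
    """Stand-in for a Slack web client: records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    async def reactions_add(self, *, channel, name, timestamp):
        self.calls.append(("reactions_add", channel, name, timestamp))

    async def chat_postMessage(self, *, channel, text, thread_ts=None):
        self.calls.append(("chat_postMessage", channel, text, thread_ts))


async def process_chat_message(user, text, session_key):
    # Hypothetical signature. The real pipeline would run the memory agent,
    # orchestrator, and sub-agents here.
    await asyncio.sleep(0)
    return f"Done: {text}"


async def handle_message(client, event):
    channel, ts = event["channel"], event["ts"]
    # Reply in the existing thread, or start one on the incoming message.
    thread_ts = event.get("thread_ts", ts)
    # Acknowledge receipt right away with the "eyes" reaction.
    await client.reactions_add(channel=channel, name="eyes", timestamp=ts)
    # Assumed key format: one session per channel and thread.
    session_key = f"slack:{channel}:{thread_ts}"
    reply = await process_chat_message(event["user"], event["text"], session_key)
    await client.chat_postMessage(channel=channel, text=reply, thread_ts=thread_ts)


client = FakeSlackClient()
event = {"channel": "D123", "ts": "1700000000.000100", "user": "U42", "text": "research X"}
asyncio.run(handle_message(client, event))
```

Recording calls on a fake client keeps the sketch runnable without Slack credentials; a real service would pass an authenticated client instead.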

demo3/architectural-rationale.md

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
# Architectural Rationale: Sprint 3 (Slack Integration)

## What Changed

The Sprint 2 messaging channel (a Node.js bridge process communicating over WebSocket with a Python `WhatsAppService`) was replaced entirely with a single Python `SlackService` using Slack's Socket Mode. Two new modules were added (`slack/config.py`, `slack/service.py`), one dependency layer was removed (the `PyGuard/bridge/` Node.js process and the `websockets` Python package), and two new dependencies were added (`slack-sdk`, `slackify-markdown`). The core agent pipeline (`chat_handler.process_chat_message()`) was untouched.

---

## Why It Was Necessary

The WhatsApp architecture had a structural problem that was not a bug: it required two processes to be running simultaneously. The Node.js bridge handled the WhatsApp Web protocol; the Python service connected to it over a local WebSocket. If the bridge crashed, the Python service would log a warning and retry, but any messages sent during the gap were silently lost. There was no message queue, no durability guarantee, and no way to recover in-flight requests. Beyond reliability, WhatsApp itself turned out to be the wrong channel: a consumer app for a workflow that lives entirely inside a corporate Slack environment. The architecture was sound for the problem it solved; it was just solving the wrong problem.

---

## Alternatives Considered

- **Slack Webhook mode instead of Socket Mode.** Webhook mode requires a publicly reachable HTTPS URL, which means either deploying to a server or running a tunnel (ngrok) locally. Socket Mode requires no public endpoint: the client opens an outbound WebSocket to Slack's infrastructure. For a development-stage product running on localhost, Socket Mode was the only option that did not require standing up additional infrastructure.

- **Keeping the bridge pattern and swapping WhatsApp for Slack inside it.** We could have kept the Node.js intermediary and replaced Baileys (the WhatsApp library) with a Node.js Slack SDK, maintaining the same WebSocket-to-Python handoff. This would have preserved architectural symmetry with Sprint 2 but reproduced the two-process reliability problem with no benefit. Slack has a mature Python SDK; there was no reason to introduce a Node.js layer.

- **Using an existing Slack bot framework (Bolt for Python).** Slack's Bolt framework wraps the SDK with route-style event handling. We chose the raw `slack-sdk` instead because Bolt brings its own app object and listener routing, typically mounted through web-framework adapters (Flask or FastAPI route handlers), and those abstractions would conflict with PyGuard's lifespan-task pattern. The raw SDK gave us direct control over the `SocketModeClient` lifecycle, which maps cleanly onto FastAPI's lifespan handler (an `asynccontextmanager` that covers startup and shutdown).
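The lifespan-task pattern mentioned in the last bullet can be illustrated with the standard library alone. This is a minimal sketch, assuming a hypothetical `StubSocketService` in place of `SlackService`; in a FastAPI application the same `lifespan` function would be passed as `FastAPI(lifespan=lifespan)` rather than entered by hand.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


class StubSocketService:
    """Hypothetical stand-in for SlackService; it opens no real connection."""

    async def run(self):
        events.append("connected")
        try:
            # Park here until the task is cancelled at shutdown.
            await asyncio.Event().wait()
        finally:
            events.append("disconnected")


@asynccontextmanager
async def lifespan(app):
    # Startup: launch the listener as a background task.
    service = StubSocketService()
    task = asyncio.create_task(service.run())
    await asyncio.sleep(0)  # give the task one turn so it can "connect"
    try:
        yield
    finally:
        # Shutdown: cancel the listener and wait for its cleanup to finish.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass


async def main():
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(main())
```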

---

## Technical Debt Introduced or Resolved

- **Resolved:** The two-process architecture and its associated failure modes are gone. There is no bridge to restart, no WebSocket reconnect loop to maintain, and no locally bound port to conflict with.
- **Resolved:** The `websockets` dependency (constrained to `>=16.0,<17.0` to match the bridge's protocol version) is removed.
- **Introduced:** The `_to_mrkdwn()` method in `SlackService` contains a regex-based Markdown table converter. It works, but it is fragile: edge cases in table formatting (merged cells, missing separators, malformed pipes) will silently produce garbled output rather than raising an error. This is acceptable for now because agent-generated tables are rare, but it should be replaced with a proper Markdown parser before file-upload responses (F12) ship.
- **Introduced:** The Slack user ID (`sender_id`) is passed directly as the `user` field to `process_chat_message()`. The DB lookup tries `get_user_by_email()` and then `get_user_by_name()`. Neither matches a Slack user ID, so first-time Slack users will always fail identity resolution silently. This is a known gap documented as F11 in the feature plan and must be resolved before the product goes beyond a single known user.
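The identity gap in the last bullet, and one possible way to close it, can be sketched with an in-memory store. Only `get_user_by_email()` and `get_user_by_name()` come from the text above; `User`, `UserStore`, `get_user_by_slack_id()`, and the auto-registration step are assumptions made for illustration and do not describe how F11 will actually be implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    name: str
    email: Optional[str] = None
    slack_id: Optional[str] = None


@dataclass
class UserStore:
    """Hypothetical in-memory replacement for the real database layer."""

    users: list = field(default_factory=list)

    def get_user_by_email(self, value):
        return next((u for u in self.users if u.email == value), None)

    def get_user_by_name(self, value):
        return next((u for u in self.users if u.name == value), None)

    def get_user_by_slack_id(self, value):  # hypothetical extra lookup
        return next((u for u in self.users if u.slack_id == value), None)


def resolve_current(store, identifier):
    # Lookup order described above: email first, then name.
    return store.get_user_by_email(identifier) or store.get_user_by_name(identifier)


def resolve_with_slack_id(store, identifier, display_name=None):
    # Assumed fix: also try the Slack ID, then register the user on first contact.
    user = resolve_current(store, identifier) or store.get_user_by_slack_id(identifier)
    if user is None:
        user = User(name=display_name or identifier, slack_id=identifier)
        store.users.append(user)
    return user


store = UserStore([User(name="Bobby", email="bobby@example.com")])
missing = resolve_current(store, "U42ABC")  # a Slack ID matches neither field
created = resolve_with_slack_id(store, "U42ABC", display_name="Bobby S.")
again = resolve_with_slack_id(store, "U42ABC")  # second contact reuses the record
```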

---

## Limitations Remaining for Future Sprints

- **No mid-execution feedback.** The `SlackService` calls `process_chat_message()` as a single `await` that does not return until the whole pipeline has finished. There is no mechanism to emit intermediate Slack messages ("Researching...", "Writing doc...") without refactoring the orchestrator to accept a callback or message-bus hook. This is the highest-impact limitation remaining.
- **No stateful conversation per thread.** Replies arrive in the correct thread and the `session_key` is thread-scoped, but when a user sends a follow-up message in the same thread the agent has no special awareness that it is continuing a prior task. Multi-turn within a thread is supported by DB history, but the thread context is not surfaced to the orchestrator explicitly.
- **File outputs are text-only.** The data analyst sub-agent produces charts and CSVs as files on disk, but the current `_send_reply()` only posts text. File upload via `files_upload_v2` is architected into the service but not wired to the agent output path yet.
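The callback option named in the first limitation could look something like the sketch below. It assumes a hypothetical `on_progress` parameter and a simplified `process_chat_message` signature; the real orchestrator would need to pass the callback down to its sub-agents.

```python
import asyncio
from typing import Awaitable, Callable, Optional

ProgressCallback = Callable[[str], Awaitable[None]]


async def process_chat_message(
    text: str, on_progress: Optional[ProgressCallback] = None
) -> str:
    async def report(stage: str) -> None:
        # Callers that pass no callback keep today's single-await behaviour.
        if on_progress is not None:
            await on_progress(stage)

    await report("Researching...")
    await asyncio.sleep(0)  # placeholder for the research step
    await report("Writing doc...")
    await asyncio.sleep(0)  # placeholder for the report-writing step
    return f"Finished: {text}"


async def main():
    posted = []

    async def post_to_thread(message: str) -> None:
        # A real service would post this message into the Slack thread.
        posted.append(message)

    final = await process_chat_message("compliance report", on_progress=post_to_thread)
    posted.append(final)
    return posted


posted = asyncio.run(main())
```

Making the callback optional means existing callers would not need to change until they want intermediate updates.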

demo3/build-trap-postmortem.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
# Build Trap Post-Mortem: Sprint 3 (Slack Integration)

## Did the features deliver the value we expected?

- **Slack as the primary interface (F1–F3): Yes, and it was necessary to build.**
The core bet this sprint was that moving from WhatsApp to Slack would eliminate organizational friction: Bobby wouldn't realistically text a WhatsApp number to do her job. That assumption turned out to be right, but we couldn't have validated it without actually building the integration. The value wasn't just "Slack is more professional." It was the absence of a Node.js bridge process that kept failing, native mrkdwn formatting that made report summaries actually readable, and thread replies that kept the workspace clean. None of that was testable by just asking users if they'd prefer Slack. We had to ship it and run the journey to see the difference.

- **Emoji reaction acknowledgement (F5): Didn't need to build to validate the hypothesis.**
We knew from Sprint 2 that black-box waiting was a friction point. The 👀 reaction is a two-line implementation and it was obviously the right call, but we didn't need to build it to confirm that users want feedback during a 5-minute execution. A simple user interview question ("what would make you more confident the message was received?") would have validated this immediately. This is a mild build-trap moment. Low cost, so not a disaster, but we should have been more honest with ourselves about it.

- **Thread-based replies (F6) and mrkdwn rendering (F7): Built correctly, but over-engineered slightly.**
Both features delivered real value: formatting is visibly better than Sprint 2's plain text, and threading keeps things organized. The build-trap risk here is the table-conversion logic in `_to_mrkdwn()`. We built a regex-based Markdown table converter that handles multi-column tables with alignment separators. In practice, agent responses that contain full Markdown tables are rare. We could have shipped a simpler version (just pass text to `slackify_markdown` directly) and only added the table logic if a user actually hit the formatting problem. We pre-solved a problem before confirming its frequency.
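If the table logic is kept, one way to limit its risk is to make it fail loudly on input it does not understand. The sketch below is not the project's `_to_mrkdwn()`; `split_row`, `is_separator`, and `table_to_mrkdwn` are hypothetical names. It renders a well-formed pipe table as a monospace block, a common workaround since Slack mrkdwn has no table syntax, and raises `ValueError` otherwise. Escaped pipes (`\|`) inside cells are not handled.

```python
FENCE = "`" * 3  # built at runtime so this example does not close its own code fence


def split_row(line):
    line = line.strip()
    if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
        raise ValueError(f"not a table row: {line!r}")
    return [cell.strip() for cell in line[1:-1].split("|")]


def is_separator(cells):
    # Separator cells look like ---, :---, ---: or :---:
    return all(cell and set(cell) <= set("-:") and "-" in cell for cell in cells)


def table_to_mrkdwn(table):
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("a table needs a header row and a separator row")
    rows = [split_row(ln) for ln in lines]
    header, separator, body = rows[0], rows[1], rows[2:]
    if not is_separator(separator):
        raise ValueError("second row is not a separator row")
    if any(len(row) != len(header) for row in [separator] + body):
        raise ValueError("rows have inconsistent column counts")
    # Pad every column to its widest cell so the block lines up in monospace.
    widths = [max(len(row[i]) for row in [header] + body) for i in range(len(header))]

    def fmt(row):
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    out = [fmt(header), "  ".join("-" * w for w in widths)] + [fmt(row) for row in body]
    return FENCE + "\n" + "\n".join(out) + "\n" + FENCE


table = """
| Step | Time (s) |
|------|---------:|
| Research | 120 |
| Email | 15 |
"""
rendered = table_to_mrkdwn(table)
```

A caller could catch the `ValueError` and fall back to sending the original text unchanged, which keeps a malformed table readable instead of garbled.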

## Did we build things we could have tested without building?

- **Access control (F8): demand was assumed, not validated.**
We built `dm_policy` and `group_policy` enforcement before confirming that unauthorized access is a real problem at our current scale. This is a personal assistant with one primary user. Nobody is rushing to DM a PyGuard bot they don't know exists. The feature is correct long-term, but we treated a theoretical production concern as a sprint-critical requirement. We could have shipped with allow-all, observed actual usage, and added access control when there was an actual reason to.
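For scale, the kind of check under discussion is small. In the sketch below only the setting names `dm_policy` and `group_policy` come from the text; the policy values (`"open"`, `"allowlist"`, `"disabled"`), the `AccessConfig` class, and `is_allowed` are assumptions for illustration. The default configuration corresponds to the allow-all starting point described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessConfig:
    dm_policy: str = "open"      # assumed values: "open", "allowlist", "disabled"
    group_policy: str = "open"
    allowed_users: frozenset = field(default_factory=frozenset)


def is_allowed(config, user_id, is_dm):
    # DMs and channel mentions are governed by separate policies.
    policy = config.dm_policy if is_dm else config.group_policy
    if policy == "open":
        return True
    if policy == "allowlist":
        return user_id in config.allowed_users
    if policy == "disabled":
        return False
    raise ValueError(f"unknown policy: {policy!r}")


allow_all = AccessConfig()
locked = AccessConfig(
    dm_policy="allowlist",
    group_policy="disabled",
    allowed_users=frozenset({"U42"}),
)
```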

## Why deferred features were deprioritized, and what we learned

- **In-progress status messages (F9) and confirmation before irreversible actions (F10) were the right calls to defer.** Both address real friction. Both also require meaningful orchestrator refactoring: threaded state management, mid-execution pauses, callback handling. Shipping placeholder features here would have been worse than deferring. Deferring was a strategic decision, not a lazy one.

- **Auto-register Slack user (F11) is the one we regret deferring.** The first time a brand-new user DMs PyGuard and the agent can't resolve "email it to me" because there's no DB record, the experience breaks. This should have been Sprint 3 P1, not Sprint 4. We underweighted first-run failure in the prioritization.

## What drove premature building?

- We defaulted to "this is easy, just build it" logic for low-effort features (F5, the access control config) instead of asking whether the value hypothesis was validated. Easy to build does not mean necessary to build right now.
- The competitive analysis against Lindy created urgency to match features (real-time feedback, access policies) that Lindy has, even for a use case with a single-digit number of users. Competitor parity is not a user need.

## What we'll change in the next pivot contract

- Add a check before each P1/P2 feature: "Can we validate demand for this in under an hour without writing code?" If yes, validate first.
- Separate infrastructure features (access control, reconnect logic) from UX features (reactions, formatting) in the prioritization table. They have different validation paths and we kept conflating them.

demo3/evolved-topology.jpg

954 KB