Commit 7531d91

feat(agents): control-system fix — kill QA dup-issues, reshape Analyst, model bumps, skill realloc (#186)
Stage 1 of the agent-org plan from docs/paperclip-native-migration.md + docs/agent-org-audit-2026-04-24.md.

Audit of the last 100 issues showed the firefighting problem isn't routines: it's QA filing duplicate incident tickets (DB pool x4, describe_memes x6, score column x4) and Analyst dumping reactive numbers into CEO's inbox.

Changes:
- QA: explicit DO-NOT-FILE list for known recurring incidents (describe_memes, db-pool, OpenRouter, Forbidden errors), 3-issue/scan output cap, dedup preflight made mandatory. Removed duplicated issue-hygiene + MCP-tools blocks. Skills: -design-consultation +devex-review.
- Analyst: daily report rewritten to a fixed 4-section shape: one hypothesis, one recommended bet for CEO, severity-gated incident digest (max 5 bullets), open hypotheses status. Anti-patterns named explicitly. Skills: +learn,+codex.
- Models: CEO/CTO/Staff Eng bumped claude-opus-4-6 → claude-opus-4-7.
- Skill reallocation per gstack agent-company best practices: CEO +learn; CTO -plan-design-review; Staff Eng +cso (scoped to auth/payments/uploads/infra PRs only); Release Eng +canary,+benchmark (post-deploy monitoring is theirs, not QA's); Comms +learn.
- agents/_sync_config.py: new Python helper that diffs current adapterConfig + desiredSkills + heartbeat against the manifest+frontmatter and PATCHes only on change. Preserves paperclipai/* skill paths. Permissions routed to dedicated /permissions endpoint.
- agents/deploy.sh: invokes _sync_config.py as second pass after the existing markdown PUT pass.
- workflow: pip install pyyaml so the new sync helper runs on the runner.

Verified locally: dry-run after apply shows zero drift across all 7 agents. CEO desiredSkills now mixes 5 gstack + 4 preserved paperclipai paths.

CEO mission reframe + gbrain integration are deferred to Stage 2 per plan (land after this Stage's effect on inbox volume can be measured).
1 parent 4daa145 commit 7531d91

12 files changed

Lines changed: 325 additions & 39 deletions

.github/workflows/paperclip-deploy-agents.yml

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
+      - name: Install Python deps for config sync
+        run: pip install --quiet pyyaml
+
       - name: Dry-run agent sync
         env:
           PAPERCLIP_URL: ${{ secrets.PAPERCLIP_URL }}

agents/.paperclip.yaml

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 300
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         enabled: true
@@ -85,7 +85,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 300
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         maxConcurrentRuns: 1
@@ -210,7 +210,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 200
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         maxConcurrentRuns: 1

agents/_sync_config.py

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+"""Diff-first PATCH of agent adapterConfig + desiredSkills + heartbeat from manifest.
+
+Called by `agents/deploy.sh` after the markdown PUT pass. Reads `.paperclip.yaml`
+and per-agent `AGENTS.md` frontmatter, compares with prod via Paperclip API,
+PATCHes only agents whose config actually drifted.
+
+Env: PAPERCLIP_URL, PAPERCLIP_API_KEY, COMPANY_ID, SCRIPT_DIR, DRY_RUN.
+"""
+
+import json
+import os
+import re
+import sys
+import urllib.error
+import urllib.request
+
+import yaml
+
+URL = os.environ["PAPERCLIP_URL"]
+KEY = os.environ["PAPERCLIP_API_KEY"]
+COMPANY = os.environ["COMPANY_ID"]
+SCRIPT_DIR = os.environ["SCRIPT_DIR"]
+DRY = os.environ.get("DRY_RUN", "0") == "1"
+
+# Skills published under paperclipai/paperclip/ — preserve when present, don't expect them in frontmatter.
+PAPERCLIP_NS_SKILLS = {
+    "paperclip",
+    "paperclip-create-agent",
+    "paperclip-create-plugin",
+    "para-memory-files",
+}
+
+
+def api(method: str, path: str, body=None):
+    req = urllib.request.Request(URL + path, method=method)
+    req.add_header("Authorization", f"Bearer {KEY}")
+    req.add_header("Content-Type", "application/json")
+    # Cloudflare in front of org.ffmemes.com blocks default Python-urllib UA (error 1010).
+    req.add_header("User-Agent", "ffmemes-deploy.sh/1.0")
+    data = json.dumps(body).encode() if body is not None else None
+    try:
+        with urllib.request.urlopen(req, data=data) as resp:
+            return json.loads(resp.read())
+    except urllib.error.HTTPError as e:
+        print(f" HTTP {e.code} on {method} {path}: {e.read().decode()[:300]}", file=sys.stderr)
+        raise
+
+
+def skill_to_path(slug: str) -> str:
+    if slug in PAPERCLIP_NS_SKILLS:
+        return f"paperclipai/paperclip/{slug}"
+    return f"garrytan/gstack/{slug}"
+
+
+def read_frontmatter_skills(agents_md_path: str) -> list[str]:
+    if not os.path.exists(agents_md_path):
+        return []
+    with open(agents_md_path) as f:
+        text = f.read()
+    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
+    if not m:
+        return []
+    fm = yaml.safe_load(m.group(1)) or {}
+    return list(fm.get("skills") or [])
+
+
+def main() -> int:
+    with open(f"{SCRIPT_DIR}/.paperclip.yaml") as f:
+        manifest = yaml.safe_load(f)
+
+    agents_list = api("GET", f"/api/companies/{COMPANY}/agents")
+    by_slug = {a["urlKey"]: a for a in agents_list}
+
+    patched = 0
+    skipped = 0
+    failed = 0
+    would_patch = 0
+    for slug, mblock in (manifest.get("agents") or {}).items():
+        if slug not in by_slug:
+            print(f" SKIP {slug} — not in prod")
+            continue
+        cur = by_slug[slug]
+
+        # Targets from manifest
+        ad_cfg = (mblock.get("adapter") or {}).get("config") or {}
+        target_model = ad_cfg.get("model")
+        target_max_turns = ad_cfg.get("maxTurnsPerRun")
+        target_heartbeat = (mblock.get("runtime") or {}).get("heartbeat") or {}
+        target_perms = mblock.get("permissions") or {}
+
+        # Frontmatter → desiredSkills (preserve any paperclipai/* currently attached)
+        fm_skills = read_frontmatter_skills(f"{SCRIPT_DIR}/{slug}/AGENTS.md")
+        cur_ac = cur.get("adapterConfig") or {}
+        cur_skills = ((cur_ac.get("paperclipSkillSync") or {}).get("desiredSkills")) or []
+        preserved = [s for s in cur_skills if s.startswith("paperclipai/")]
+        target_skills = sorted(set(preserved + [skill_to_path(s) for s in fm_skills]))
+        cur_skills_sorted = sorted(cur_skills)
+
+        # Diff
+        changes: list[str] = []
+        if target_model and cur_ac.get("model") != target_model:
+            changes.append(f"model: {cur_ac.get('model')} -> {target_model}")
+        if target_max_turns and cur_ac.get("maxTurnsPerRun") != target_max_turns:
+            changes.append(
+                f"maxTurnsPerRun: {cur_ac.get('maxTurnsPerRun')} -> {target_max_turns}"
+            )
+        if target_skills != cur_skills_sorted:
+            added = sorted(set(target_skills) - set(cur_skills_sorted))
+            removed = sorted(set(cur_skills_sorted) - set(target_skills))
+            if added:
+                changes.append(f"+skills: {added}")
+            if removed:
+                changes.append(f"-skills: {removed}")
+
+        cur_rt = cur.get("runtimeConfig") or {}
+        cur_hb = cur_rt.get("heartbeat") or {}
+        for k, v in target_heartbeat.items():
+            if cur_hb.get(k) != v:
+                changes.append(f"heartbeat.{k}: {cur_hb.get(k)} -> {v}")
+
+        cur_perms = cur.get("permissions") or {}
+        perm_changes: list[tuple[str, object]] = []
+        for k, v in target_perms.items():
+            if cur_perms.get(k) != v:
+                perm_changes.append((k, v))
+                changes.append(f"permissions.{k}: {cur_perms.get(k)} -> {v}")
+
+        if not changes:
+            print(f" skip {slug} (no config drift)")
+            skipped += 1
+            continue
+
+        if DRY:
+            print(f" WOULD PATCH {slug}: {'; '.join(changes)}")
+            would_patch += 1
+            continue
+
+        # Build merged payload — preserve everything else (instructionsFilePath, env, etc.)
+        new_ac = dict(cur_ac)
+        if target_model:
+            new_ac["model"] = target_model
+        if target_max_turns:
+            new_ac["maxTurnsPerRun"] = target_max_turns
+        new_skill_sync = dict(new_ac.get("paperclipSkillSync") or {})
+        new_skill_sync["desiredSkills"] = target_skills
+        new_ac["paperclipSkillSync"] = new_skill_sync
+
+        new_rt = dict(cur_rt)
+        new_hb = dict(cur_hb)
+        new_hb.update(target_heartbeat)
+        new_rt["heartbeat"] = new_hb
+
+        # Permissions go through a separate endpoint (PATCH /api/agents/:id rejects them).
+        body = {
+            "adapterConfig": new_ac,
+            "runtimeConfig": new_rt,
+        }
+        try:
+            api("PATCH", f"/api/agents/{cur['id']}", body)
+            print(f" PATCHED {slug}: {'; '.join(changes)}")
+            patched += 1
+            if perm_changes:
+                # Best-effort permissions update via dedicated endpoint.
+                new_perms = dict(cur_perms)
+                new_perms.update(target_perms)
+                try:
+                    api("PATCH", f"/api/agents/{cur['id']}/permissions", new_perms)
+                    print(f" + permissions updated: {dict(perm_changes)}")
+                except Exception as pe:
+                    print(f" WARN permissions sync failed for {slug}: {pe}", file=sys.stderr)
+        except Exception as e:
+            print(f" ERROR PATCH {slug}: {e}", file=sys.stderr)
+            failed += 1
+
+    if DRY:
+        print(f"\nConfig sync (dry-run): would patch {would_patch}, skip {skipped} (no drift).")
+    else:
+        print(f"\nConfig sync: patched={patched}, skipped={skipped}, failed={failed}.")
+    return 1 if failed > 0 else 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
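
Review note: the skill-merge step in `_sync_config.py` is the part most likely to surprise, so here is a minimal standalone sketch of it. `skill_to_path` and the namespace set are copied from the diff; the wrapper `merge_desired_skills` and the sample skill lists are hypothetical, made up for illustration only.

```python
# Sketch of the desiredSkills merge, isolated from the API calls.
PAPERCLIP_NS_SKILLS = {
    "paperclip",
    "paperclip-create-agent",
    "paperclip-create-plugin",
    "para-memory-files",
}


def skill_to_path(slug: str) -> str:
    # Same mapping as the diff: known Paperclip slugs keep their own namespace.
    if slug in PAPERCLIP_NS_SKILLS:
        return f"paperclipai/paperclip/{slug}"
    return f"garrytan/gstack/{slug}"


def merge_desired_skills(current: list[str], frontmatter: list[str]):
    """Hypothetical helper: return (target, added, removed) for one agent."""
    # Anything already attached under paperclipai/ survives, even if the
    # frontmatter never mentions it.
    preserved = [s for s in current if s.startswith("paperclipai/")]
    target = sorted(set(preserved + [skill_to_path(s) for s in frontmatter]))
    added = sorted(set(target) - set(current))
    removed = sorted(set(current) - set(target))
    return target, added, removed


# Illustrative input only (not taken from prod).
current = [
    "garrytan/gstack/plan-design-review",
    "garrytan/gstack/retro",
    "paperclipai/paperclip/paperclip",
]
frontmatter = ["retro", "learn"]
target, added, removed = merge_desired_skills(current, frontmatter)
print(target, added, removed)
```

One consequence worth noting: a gstack skill that is attached in prod but missing from the frontmatter is removed, while an empty frontmatter list would strip every gstack skill and keep only the `paperclipai/` paths.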

agents/analyst/AGENTS.md

Lines changed: 31 additions & 7 deletions
@@ -6,6 +6,8 @@ skills:
   - investigate
   - browse
   - retro
+  - learn
+  - codex
 ---
 
 # Analyst Agent — Operating Instructions
@@ -126,14 +128,36 @@ Flag if pct_reacted drops below 55% or pct_delivered drops below 90%.
 
 IMPORTANT: The old dashboard metric (user_stats.nmemes_sent > 0) measures "reacted", not "received". Always use user_meme_reaction directly for delivery measurement. See comments in metrics.sql for details.
 
-### 8. Write Daily Report
-Create a report file at `experiments/reports/YYYY-MM-DD-HHmm.md` following the format in `experiments/README.md`.
+### 8. Write Daily Report (FIXED SHAPE — do not deviate)
 
-The report should tell a **story**, not just dump numbers:
-- What changed since the last report?
-- What's working? What's not?
-- What trends are emerging?
-- What should the CEO pay attention to?
+Create the report at `experiments/reports/YYYY-MM-DD-HHmm.md`. The report has **four sections** in this order. Do not add other sections, do not omit any. Brevity is mandatory.
+
+```markdown
+# Daily report YYYY-MM-DD
+
+## 1. The hypothesis
+<≤1 short paragraph. The most-surprising data point this run, and what it might mean. One claim. No hedging.>
+
+## 2. Recommended bet for CEO
+<≤1 short paragraph. Which research-idea (`memory:project_research_ideas.md`) or TODO (`TODOS.md`) is this evidence making *ripe to ship now*? Name the file/section. If nothing's ripe, write: "No new bet — keep advancing current bet." Never recommend a bug fix here — that's a different lane.>
+
+## 3. Incident digest (max 5 bullets)
+<Only incidents that crossed a SEVERITY THRESHOLD this run. Threshold = errors > 1% of requests, OR North Star (session length median) drop > 10% week-over-week, OR a public outage / user-visible failure / moderator-flagged content surge. Below-threshold noise goes to the footer. If nothing crossed the threshold, write a single line: "No severe incidents this run." DO NOT rehash known recurring issues (describe_memes, OpenRouter, db-pool) — those are tracked elsewhere.>
+
+## 4. Open hypotheses (1 line each)
+<Each running experiment from `experiments/active/`. Format: "experiment-name — current Δ on metric (vs baseline) — days remaining." If conclusion-ready, say so.>
+
+---
+**Footer** (raw numbers, optional): copy the JSONL entry from §9 here for grep-ability. No prose.
+```
+
+**Anti-patterns** — kill these on sight:
+- "the most-surprising thing was a 12% jump in WAU and also a 7% drop in session length and the cold-start funnel improved by..." — pick ONE for §1.
+- "we should look into describe_memes coverage" — describe_memes is HARD-banned from §3 and §2.
+- "todo: investigate X, Y, Z" — that's reactive routing, not bet recommendation.
+- "incident digest" with 12 bullets — cap is 5, use the threshold filter.
+
+The CEO reads §1 and §2 first; §3 only matters when severity-gated. If §2 ever says "no new bet" three runs in a row without justification, the data lens is too narrow — widen the queries next run.
 
 ### 8b. Write Anomaly Report (for Comms Agent input)
 After the daily report, on the **morning run only** (08:00 or 09:00 MSK — whichever
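
Review note: the §3 severity gate in the new report template is a three-way OR plus a bullet cap, which can be sketched as below. Only the thresholds (1% error rate, 10% week-over-week drop in median session length, 5 bullets) and the banned recurring issues come from the template; every function and parameter name is hypothetical, and how the agent actually gathers these numbers is not specified in the diff.

```python
# Hypothetical sketch of the section-3 gate; names are illustrative only.
KNOWN_RECURRING = ("describe_memes", "openrouter", "db-pool")  # banned from the digest
MAX_BULLETS = 5


def crosses_threshold(errors: int, requests: int,
                      session_median_prev: float, session_median_now: float,
                      user_visible: bool = False) -> bool:
    """True if any one of the three severity conditions holds."""
    error_rate = errors / requests if requests else 0.0
    drop = ((session_median_prev - session_median_now) / session_median_prev
            if session_median_prev else 0.0)
    # user_visible stands in for: public outage, user-visible failure,
    # or a moderator-flagged content surge.
    return error_rate > 0.01 or drop > 0.10 or user_visible


def incident_digest(severe_incident_names: list[str]) -> list[str]:
    """Apply the recurring-issue ban and the 5-bullet cap to already-gated incidents."""
    kept = [name for name in severe_incident_names
            if not any(k in name.lower() for k in KNOWN_RECURRING)]
    if not kept:
        return ["No severe incidents this run."]
    return [f"- {name}" for name in kept[:MAX_BULLETS]]


print(crosses_threshold(errors=20, requests=1000,
                        session_median_prev=100.0, session_median_now=100.0))
print(incident_digest(["webhook 502 burst", "describe_memes timeout"]))
```

The template does not say how to rank incidents when more than five cross the threshold; this sketch simply keeps the first five in input order.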

agents/ceo/AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ skills:
   - office-hours
   - autoplan
   - retro
+  - learn
 ---
 
 # CEO Agent — Operating Instructions

agents/comms-manager/AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ reportsTo: ceo
 skills:
   - browse
   - frontend-design
+  - learn
 ---
 
 # Comms Manager — Operating Instructions

agents/cto/AGENTS.md

Lines changed: 0 additions & 1 deletion
@@ -4,7 +4,6 @@ title: Chief Technology Officer
 reportsTo: ceo
 skills:
   - plan-eng-review
-  - plan-design-review
   - retro
   - cso
   - codex

agents/deploy.sh

Lines changed: 12 additions & 1 deletion
@@ -86,11 +86,22 @@ for agent_dir in "$SCRIPT_DIR"/*/; do
   done
 done
 
+echo
+echo "Syncing adapter config + skills (diff-first PATCH)..."
+
+# Pass 2: model, maxTurnsPerRun, desiredSkills, runtime.heartbeat, permissions.
+# Diff first; PATCH only on change so we don't spam Paperclip's config-revision history.
+COMPANY_ID="$COMPANY_ID" SCRIPT_DIR="$SCRIPT_DIR" DRY_RUN="$DRY_RUN" \
+  python3 "$SCRIPT_DIR/_sync_config.py" || {
+  echo "Config sync failed." >&2
+  errors=$((errors + 1))
+}
+
 echo
 if [[ $DRY_RUN -eq 1 ]]; then
   echo "Dry-run complete. Re-run without --dry-run to apply."
 elif [[ $errors -gt 0 ]]; then
-  echo "Synced $synced_files files, $errors errors."
+  echo "Synced $synced_files files; $errors errors during apply."
   exit 1
 else
   echo "Synced $synced_files files. Changes take effect on next agent wake."

agents/qa-engineer/AGENTS.md

Lines changed: 17 additions & 27 deletions
@@ -9,7 +9,7 @@ skills:
   - benchmark
   - canary
   - design-review
-  - design-consultation
+  - devex-review
   - setup-browser-cookies
   - health
   - investigate
@@ -52,30 +52,6 @@ You have Paperclip MCP tools available. Use them for all Paperclip operations in
 <!-- END: issue-hygiene-v1 -->
 
 
-## Paperclip MCP Tools
-
-You have Paperclip MCP tools available. Use them for all Paperclip operations instead of curl:
-- `paperclipGetIssue` — fetch an issue by ID
-- `paperclipUpdateIssue` — update issue status/fields (use to mark done)
-- `paperclipCheckoutIssue` / `paperclipReleaseIssue` — check out / release issues
-- `paperclipInboxLite` — check your inbox for assignments
-- `paperclipCreateIssue` — create issues (for bug reports to CTO)
-- `paperclipAddComment` — comment on an issue
-- `paperclipApiRequest` — escape hatch for any `/api` endpoint
-
-<!-- BEGIN: issue-hygiene-v1 (prompt hotfix — remove when Paperclip ships dedupe + slug + sweep) -->
-## Issue Hygiene (v1)
-
-**Slug-first titles.** Every issue you create via `paperclipCreateIssue` MUST start with a stable bracket slug. Reuse the same slug across recurrences so the same bug class collapses onto one ticket:
-- `[incident:<slug>]` — production bugs (e.g. `[incident:db-pool]`, `[incident:describe-memes-timeout]`, `[incident:webhook-502]`)
-- `[deploy:<branch-or-pr>]`, `[report:YYYY-MM-DD]`, `[maintenance:<slug>]`, `[postmortem:<slug>]`
-
-**Dedupe preflight.** Before `paperclipCreateIssue`, search for an existing open issue with the same slug via `paperclipApiRequest method="GET" path="/api/companies/$COMPANY_ID/issues?search=<slug>"`. If any match is `todo|in_progress|blocked|backlog`, comment on it via `paperclipAddComment` with your new evidence instead of creating a new ticket. Critical: this kills the "DB pool exhausted ×3 tickets" pattern.
-
-**Single-writer rule.** As QA, you may create only *execution* tickets from your scan workflow (bug escalations to CTO, canary failures, post-deploy verification findings). Don't open planning/strategic tickets — those belong to CEO.
-<!-- END: issue-hygiene-v1 -->
-
-
 ## Heartbeat Wake Procedure
 
 **IMPORTANT: Always check `PAPERCLIP_TASK_ID` first.** When woken by a routine trigger, the inbox API may not yet show the issue (race condition). If `PAPERCLIP_TASK_ID` is set:
@@ -101,8 +77,22 @@ Check Sentry, Coolify logs, DB health.
 - **Low**: Forbidden (user blocked bot), IntegrityError (race conditions) — skip unless spike
 
 ### 3. Create Bug Reports & Auto-Escalate
-For **Critical**: run `/investigate` on the error to produce a root-cause report, then create a HIGH priority Paperclip task for **CTO** with the investigation attached, log source, and proposed fix. Use `[incident:<slug>]` title slug (dedupe preflight applies — see Issue Hygiene above).
-For **High**: Create HIGH priority Paperclip task for **CTO** with error, log source, suggested fix. Run `/investigate` first if the root cause is unclear.
+
+**DO NOT FILE these recurring incident classes — comment on the existing ticket instead.** A 2026-04-24 audit found these accounted for ~21 of 38 QA-filed issues over 4 weeks, almost all duplicates:
+
+- `describe_memes` failures, OpenRouter rate-limits, free-tier exhaustion, 402s, circuit-breaker trips. **Do not file at all** — known issue, tracked elsewhere. (See `feedback_describe_memes_no_issues.md` memory.)
+- DB connection pool exhaustion (`asyncpg.exceptions.TooManyConnectionsError`, "InterfaceError"). **Comment on `[incident:db-pool]`** if it exists; only create new if no open ticket and the rate is ≥10× normal.
+- `score column does not exist` / similar `ProgrammingError` from a known unmigrated branch. **Comment on `[incident:goat-score-column]`**, don't refile.
+- Telegram `Forbidden` errors for blocked users, IntegrityError race conditions. **Skip entirely** unless rate spikes >50/h.
+
+For everything else:
+
+- **Critical** (production down, users can't use bot, data loss): run `/investigate`, create HIGH priority `[incident:<slug>]` ticket for CTO with investigation + proposed fix.
+- **High** (errors affecting UX, recurring TypeError/AttributeError in hot paths): create HIGH `[incident:<slug>]` ticket for CTO. Run `/investigate` first if root cause unclear.
+
+**Dedupe preflight is mandatory.** Before `paperclipCreateIssue`, search `paperclipApiRequest method="GET" path="/api/companies/$COMPANY_ID/issues?search=<slug>"`. If any match is `todo|in_progress|blocked|backlog`, `paperclipAddComment` instead. Critical: this kills the "DB pool exhausted ×3 tickets in one afternoon" pattern.
+
+**Cap output per scan.** A single 1h scan should produce at most **3 new issues**. If you find more, batch the rest into a single `[scan:YYYY-MM-DD-HHmm]` summary ticket with bulleted findings.
 
 ### 4. Write QA Report
 `experiments/reports/qa-YYYY-MM-DD-HHmm.md`:
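
Review note: the dedupe preflight and the per-scan cap in the new QA instructions combine into one decision procedure, sketched below as a pure function with no API calls. The open-status list, the cap of 3, and the `[incident:<slug>]` / `[scan:...]` title slugs come from the diff. The function name, the `(slug, evidence)` finding shape, and the assumption that issues carry `title` and `status` fields are hypothetical. The instructions also do not say whether the summary ticket counts against the cap; this sketch treats it as separate.

```python
# Hypothetical sketch: decide comment vs. create vs. batch for one scan.
OPEN_STATUSES = {"todo", "in_progress", "blocked", "backlog"}
MAX_NEW_ISSUES_PER_SCAN = 3


def plan_scan_output(findings, existing_issues, scan_stamp):
    """Return a list of (action, title, body) tuples.

    findings: list of (slug, evidence) pairs from this scan.
    existing_issues: list of dicts with assumed 'title' and 'status' keys.
    scan_stamp: 'YYYY-MM-DD-HHmm' string used for the summary ticket slug.
    """
    actions = []
    overflow = []
    new_count = 0
    for slug, evidence in findings:
        tag = f"[incident:{slug}]"
        # Dedupe preflight: an open ticket with the same slug wins.
        match = next(
            (i for i in existing_issues
             if i["title"].startswith(tag) and i["status"] in OPEN_STATUSES),
            None,
        )
        if match is not None:
            actions.append(("comment", match["title"], evidence))
        elif new_count < MAX_NEW_ISSUES_PER_SCAN:
            actions.append(("create", tag, evidence))
            new_count += 1
        else:
            overflow.append(f"- {slug}: {evidence}")
    if overflow:
        # Everything past the cap collapses into one bulleted summary ticket.
        actions.append(("create", f"[scan:{scan_stamp}]", "\n".join(overflow)))
    return actions


existing = [
    {"title": "[incident:db-pool] pool exhausted", "status": "in_progress"},
    {"title": "[incident:webhook-502] gateway errors", "status": "done"},
]
findings = [("db-pool", "e1"), ("webhook-502", "e2"), ("a", "e3"),
            ("b", "e4"), ("c", "e5"), ("d", "e6")]
for action in plan_scan_output(findings, existing, "2026-04-24-1000"):
    print(action)
```

The DO-NOT-FILE classes would need to be filtered out before this step; that filter is left out here because its matching rules are written for the agent in prose and are not expressed as data in the diff.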

agents/release-engineer/AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ skills:
   - canary
   - document-release
   - setup-deploy
+  - canary
+  - benchmark
 ---
 
 # Release Engineer — Operating Instructions
