feat(agents): control-system fix — kill QA dup-issues, reshape Analyst, model bumps, skill realloc (#186)

ohld · web-flow · commit 7531d9152f5e · 2026-04-24T20:47:40.000+08:00
Stage 1 of the agent-org plan from docs/paperclip-native-migration.md and
docs/agent-org-audit-2026-04-24.md. An audit of the last 100 issues showed
that the firefighting problem is not the routines: it is QA filing duplicate
incident tickets (DB pool x4, describe_memes x6, score column x4) and the
Analyst dumping reactive numbers into the CEO's inbox.

Changes:
- QA: explicit DO-NOT-FILE list for known recurring incidents
  (describe_memes, db-pool, OpenRouter, Forbidden errors), 3-issue/scan
  output cap, dedup preflight made mandatory. Removed duplicated
  issue-hygiene + MCP-tools blocks. Skills: -design-consultation +devex-review.
- Analyst: daily report rewritten to fixed 4-section shape — one hypothesis,
  one recommended bet for CEO, severity-gated incident digest (max 5 bullets),
  open hypotheses status. Anti-patterns named explicitly. Skills: +learn,+codex.
- Models: CEO/CTO/Staff Eng bumped claude-opus-4-6 → claude-opus-4-7.
- Skill reallocation per gstack agent-company best practices:
  CEO +learn; CTO -plan-design-review; Staff Eng +cso (scoped to
  auth/payments/uploads/infra PRs only); Release Eng +canary,+benchmark
  (post-deploy monitoring is theirs, not QA's); Comms +learn.
- agents/_sync_config.py: new Python helper that diffs current adapterConfig
  + desiredSkills + heartbeat against the manifest+frontmatter and
  PATCHes only on change. Preserves paperclipai/* skill paths. Permissions
  routed to dedicated /permissions endpoint.
- agents/deploy.sh: invokes _sync_config.py as second pass after the
  existing markdown PUT pass.
- workflow: pip install pyyaml so the new sync helper runs on the runner.
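
A minimal sketch of how the QA filing rules in the list above (DO-NOT-FILE
list, mandatory dedup preflight, 3-issue/scan cap) could combine. The real
rules live as prose in the QA prompt; `Finding`, `triage_findings`, and the
slug values here are hypothetical names used only for illustration, and the
rate thresholds (for example the ≥10× rule for db-pool) are left out.

```python
from dataclasses import dataclass

MAX_NEW_ISSUES_PER_SCAN = 3  # the per-scan output cap described above

# Hypothetical slugs standing in for the "do not file at all" classes.
DO_NOT_FILE = {"describe-memes", "openrouter"}


@dataclass(frozen=True)
class Finding:
    slug: str     # stable bracket slug, e.g. "db-pool"
    summary: str  # one-line description of the evidence


def triage_findings(findings, open_slugs):
    """Sort scan findings into skip / comment / create / batch buckets.

    open_slugs is the dedup preflight result: slugs that already have an
    open ticket. Anything past the cap is batched for one summary ticket.
    """
    plan = {"skip": [], "comment": [], "create": [], "batch": []}
    created_this_scan = set()
    for finding in findings:
        if finding.slug in DO_NOT_FILE:
            plan["skip"].append(finding)
        elif finding.slug in open_slugs or finding.slug in created_this_scan:
            # A ticket exists (or was just planned): add evidence as a comment.
            plan["comment"].append(finding)
        elif len(plan["create"]) < MAX_NEW_ISSUES_PER_SCAN:
            plan["create"].append(finding)
            created_this_scan.add(finding.slug)
        else:
            plan["batch"].append(finding)
    return plan
```

Keeping the decision in a pure function like this would make the cap and the
preflight testable without calling the Paperclip API.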

Verified locally: dry-run after apply shows zero drift across all 7 agents.
CEO desiredSkills now mixes 5 gstack + 4 preserved paperclipai paths.
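
The skill merge behind that mix can be sketched as below, mirroring the logic
in agents/_sync_config.py: paths already attached under paperclipai/ are
preserved, frontmatter slugs are mapped to garrytan/gstack/ paths, and a PATCH
is only needed when something was added or removed. The function name
`merge_desired_skills` and the example skill lists are illustrative
assumptions, not the CEO's actual configuration.

```python
PAPERCLIP_NS_SKILLS = {
    "paperclip",
    "paperclip-create-agent",
    "paperclip-create-plugin",
    "para-memory-files",
}


def skill_to_path(slug):
    """Map a frontmatter slug to its namespaced skill path."""
    if slug in PAPERCLIP_NS_SKILLS:
        return f"paperclipai/paperclip/{slug}"
    return f"garrytan/gstack/{slug}"


def merge_desired_skills(current_paths, frontmatter_slugs):
    """Return (target, added, removed) for one agent's desiredSkills."""
    preserved = [p for p in current_paths if p.startswith("paperclipai/")]
    mapped = [skill_to_path(s) for s in frontmatter_slugs]
    target = sorted(set(preserved + mapped))
    added = sorted(set(target) - set(current_paths))
    removed = sorted(set(current_paths) - set(target))
    return target, added, removed


if __name__ == "__main__":
    # Hypothetical starting state: one preserved path, one stale gstack skill.
    current = [
        "paperclipai/paperclip/paperclip",
        "garrytan/gstack/retro",
        "garrytan/gstack/old-skill",
    ]
    target, added, removed = merge_desired_skills(current, ["retro", "learn"])
    print("added:", added, "removed:", removed)
    # Re-running against the applied target reports no drift.
    print(merge_desired_skills(target, ["retro", "learn"])[1:])
```

Because the target is recomputed from the attached paths plus the frontmatter,
a second run against an already-synced agent yields empty added and removed
lists, which is what a zero-drift dry-run corresponds to.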

CEO mission reframe + gbrain integration are deferred to Stage 2 per plan
(land after this Stage's effect on inbox volume can be measured).
diff --git a/.github/workflows/paperclip-deploy-agents.yml b/.github/workflows/paperclip-deploy-agents.yml
@@ -18,6 +18,9 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
+      - name: Install Python deps for config sync
+        run: pip install --quiet pyyaml
+
       - name: Dry-run agent sync
         env:
           PAPERCLIP_URL: ${{ secrets.PAPERCLIP_URL }}
diff --git a/agents/.paperclip.yaml b/agents/.paperclip.yaml
@@ -34,7 +34,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 300
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         enabled: true
@@ -85,7 +85,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 300
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         maxConcurrentRuns: 1
@@ -210,7 +210,7 @@ agents:
       config:
         dangerouslySkipPermissions: true
         maxTurnsPerRun: 200
-        model: "claude-opus-4-6"
+        model: "claude-opus-4-7"
     runtime:
       heartbeat:
         maxConcurrentRuns: 1
diff --git a/agents/_sync_config.py b/agents/_sync_config.py
@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+"""Diff-first PATCH of agent adapterConfig + desiredSkills + heartbeat from manifest.
+
+Called by `agents/deploy.sh` after the markdown PUT pass. Reads `.paperclip.yaml`
+and per-agent `AGENTS.md` frontmatter, compares with prod via Paperclip API,
+PATCHes only agents whose config actually drifted.
+
+Env: PAPERCLIP_URL, PAPERCLIP_API_KEY, COMPANY_ID, SCRIPT_DIR, DRY_RUN.
+"""
+
+import json
+import os
+import re
+import sys
+import urllib.error
+import urllib.request
+
+import yaml
+
+URL = os.environ["PAPERCLIP_URL"]
+KEY = os.environ["PAPERCLIP_API_KEY"]
+COMPANY = os.environ["COMPANY_ID"]
+SCRIPT_DIR = os.environ["SCRIPT_DIR"]
+DRY = os.environ.get("DRY_RUN", "0") == "1"
+
+# Skills published under paperclipai/paperclip/ — preserve when present, don't expect them in frontmatter.
+PAPERCLIP_NS_SKILLS = {
+    "paperclip",
+    "paperclip-create-agent",
+    "paperclip-create-plugin",
+    "para-memory-files",
+}
+
+
+def api(method: str, path: str, body=None):
+    req = urllib.request.Request(URL + path, method=method)
+    req.add_header("Authorization", f"Bearer {KEY}")
+    req.add_header("Content-Type", "application/json")
+    # Cloudflare in front of org.ffmemes.com blocks default Python-urllib UA (error 1010).
+    req.add_header("User-Agent", "ffmemes-deploy.sh/1.0")
+    data = json.dumps(body).encode() if body is not None else None
+    try:
+        with urllib.request.urlopen(req, data=data) as resp:
+            return json.loads(resp.read())
+    except urllib.error.HTTPError as e:
+        print(f"  HTTP {e.code} on {method} {path}: {e.read().decode()[:300]}", file=sys.stderr)
+        raise
+
+
+def skill_to_path(slug: str) -> str:
+    if slug in PAPERCLIP_NS_SKILLS:
+        return f"paperclipai/paperclip/{slug}"
+    return f"garrytan/gstack/{slug}"
+
+
+def read_frontmatter_skills(agents_md_path: str) -> list[str]:
+    if not os.path.exists(agents_md_path):
+        return []
+    with open(agents_md_path) as f:
+        text = f.read()
+    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
+    if not m:
+        return []
+    fm = yaml.safe_load(m.group(1)) or {}
+    return list(fm.get("skills") or [])
+
+
+def main() -> int:
+    with open(f"{SCRIPT_DIR}/.paperclip.yaml") as f:
+        manifest = yaml.safe_load(f)
+
+    agents_list = api("GET", f"/api/companies/{COMPANY}/agents")
+    by_slug = {a["urlKey"]: a for a in agents_list}
+
+    patched = 0
+    skipped = 0
+    failed = 0
+    would_patch = 0
+    for slug, mblock in (manifest.get("agents") or {}).items():
+        if slug not in by_slug:
+            print(f"  SKIP {slug} — not in prod")
+            continue
+        cur = by_slug[slug]
+
+        # Targets from manifest
+        ad_cfg = (mblock.get("adapter") or {}).get("config") or {}
+        target_model = ad_cfg.get("model")
+        target_max_turns = ad_cfg.get("maxTurnsPerRun")
+        target_heartbeat = (mblock.get("runtime") or {}).get("heartbeat") or {}
+        target_perms = mblock.get("permissions") or {}
+
+        # Frontmatter → desiredSkills (preserve any paperclipai/* currently attached)
+        fm_skills = read_frontmatter_skills(f"{SCRIPT_DIR}/{slug}/AGENTS.md")
+        cur_ac = cur.get("adapterConfig") or {}
+        cur_skills = ((cur_ac.get("paperclipSkillSync") or {}).get("desiredSkills")) or []
+        preserved = [s for s in cur_skills if s.startswith("paperclipai/")]
+        target_skills = sorted(set(preserved + [skill_to_path(s) for s in fm_skills]))
+        cur_skills_sorted = sorted(cur_skills)
+
+        # Diff
+        changes: list[str] = []
+        if target_model and cur_ac.get("model") != target_model:
+            changes.append(f"model: {cur_ac.get('model')} → {target_model}")
+        if target_max_turns and cur_ac.get("maxTurnsPerRun") != target_max_turns:
+            changes.append(
+                f"maxTurnsPerRun: {cur_ac.get('maxTurnsPerRun')} → {target_max_turns}"
+            )
+        if target_skills != cur_skills_sorted:
+            added = sorted(set(target_skills) - set(cur_skills_sorted))
+            removed = sorted(set(cur_skills_sorted) - set(target_skills))
+            if added:
+                changes.append(f"+skills: {added}")
+            if removed:
+                changes.append(f"-skills: {removed}")
+
+        cur_rt = cur.get("runtimeConfig") or {}
+        cur_hb = cur_rt.get("heartbeat") or {}
+        for k, v in target_heartbeat.items():
+            if cur_hb.get(k) != v:
+                changes.append(f"heartbeat.{k}: {cur_hb.get(k)} → {v}")
+
+        cur_perms = cur.get("permissions") or {}
+        perm_changes: list[tuple[str, object]] = []
+        for k, v in target_perms.items():
+            if cur_perms.get(k) != v:
+                perm_changes.append((k, v))
+                changes.append(f"permissions.{k}: {cur_perms.get(k)} → {v}")
+
+        if not changes:
+            print(f"  skip {slug} (no config drift)")
+            skipped += 1
+            continue
+
+        if DRY:
+            print(f"  WOULD PATCH {slug}: {'; '.join(changes)}")
+            would_patch += 1
+            continue
+
+        # Build merged payload — preserve everything else (instructionsFilePath, env, etc.)
+        new_ac = dict(cur_ac)
+        if target_model:
+            new_ac["model"] = target_model
+        if target_max_turns:
+            new_ac["maxTurnsPerRun"] = target_max_turns
+        new_skill_sync = dict(new_ac.get("paperclipSkillSync") or {})
+        new_skill_sync["desiredSkills"] = target_skills
+        new_ac["paperclipSkillSync"] = new_skill_sync
+
+        new_rt = dict(cur_rt)
+        new_hb = dict(cur_hb)
+        new_hb.update(target_heartbeat)
+        new_rt["heartbeat"] = new_hb
+
+        # Permissions go through a separate endpoint (PATCH /api/agents/:id rejects them).
+        body = {
+            "adapterConfig": new_ac,
+            "runtimeConfig": new_rt,
+        }
+        try:
+            api("PATCH", f"/api/agents/{cur['id']}", body)
+            print(f"  PATCHED {slug}: {'; '.join(changes)}")
+            patched += 1
+            if perm_changes:
+                # Best-effort permissions update via dedicated endpoint.
+                new_perms = dict(cur_perms)
+                new_perms.update(target_perms)
+                try:
+                    api("PATCH", f"/api/agents/{cur['id']}/permissions", new_perms)
+                    print(f"    + permissions updated: {dict(perm_changes)}")
+                except Exception as pe:
+                    print(f"    WARN permissions sync failed for {slug}: {pe}", file=sys.stderr)
+        except Exception as e:
+            print(f"  ERROR PATCH {slug}: {e}", file=sys.stderr)
+            failed += 1
+
+    if DRY:
+        print(f"\nConfig sync (dry-run): would patch {would_patch}, skip {skipped} (no drift).")
+    else:
+        print(f"\nConfig sync: patched={patched}, skipped={skipped}, failed={failed}.")
+    return 1 if failed > 0 else 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/agents/analyst/AGENTS.md b/agents/analyst/AGENTS.md
@@ -6,6 +6,8 @@ skills:
   - investigate
   - browse
   - retro
+  - learn
+  - codex
 ---
 
 # Analyst Agent — Operating Instructions
@@ -126,14 +128,36 @@ Flag if pct_reacted drops below 55% or pct_delivered drops below 90%.
 
 IMPORTANT: The old dashboard metric (user_stats.nmemes_sent > 0) measures "reacted", not "received". Always use user_meme_reaction directly for delivery measurement. See comments in metrics.sql for details.
 
-### 8. Write Daily Report
-Create a report file at `experiments/reports/YYYY-MM-DD-HHmm.md` following the format in `experiments/README.md`.
+### 8. Write Daily Report (FIXED SHAPE — do not deviate)
 
-The report should tell a **story**, not just dump numbers:
-- What changed since the last report?
-- What's working? What's not?
-- What trends are emerging?
-- What should the CEO pay attention to?
+Create the report at `experiments/reports/YYYY-MM-DD-HHmm.md`. The report has **four sections** in this order. Do not add other sections, do not omit any. Brevity is mandatory.
+
+```markdown
+# Daily report YYYY-MM-DD
+
+## 1. The hypothesis
+<≤1 short paragraph. The most-surprising data point this run, and what it might mean. One claim. No hedging.>
+
+## 2. Recommended bet for CEO
+<≤1 short paragraph. Which research-idea (`memory:project_research_ideas.md`) or TODO (`TODOS.md`) is this evidence making *ripe to ship now*? Name the file/section. If nothing's ripe, write: "No new bet — keep advancing current bet."  Never recommend a bug fix here — that's a different lane.>
+
+## 3. Incident digest (max 5 bullets)
+<Only incidents that crossed a SEVERITY THRESHOLD this run. Threshold = errors > 1% of requests, OR North Star (session length median) drop > 10% week-over-week, OR a public outage / user-visible failure / moderator-flagged content surge. Below-threshold noise goes to the footer. If nothing crossed the threshold, write a single line: "No severe incidents this run." DO NOT rehash known recurring issues (describe_memes, OpenRouter, db-pool) — those are tracked elsewhere.>
+
+## 4. Open hypotheses (1 line each)
+<Each running experiment from `experiments/active/`. Format: "experiment-name — current Δ on metric (vs baseline) — days remaining." If conclusion-ready, say so.>
+
+---
+**Footer** (raw numbers, optional): copy the JSONL entry from §9 here for grep-ability. No prose.
+```
+
+**Anti-patterns** — kill these on sight:
+- "the most-surprising thing was a 12% jump in WAU and also a 7% drop in session length and the cold-start funnel improved by..." — pick ONE for §1.
+- "we should look into describe_memes coverage" — describe_memes is HARD-banned from §3 and §2.
+- "todo: investigate X, Y, Z" — that's reactive routing, not bet recommendation.
+- "incident digest" with 12 bullets — cap is 5, use the threshold filter.
+
+The CEO reads §1 and §2 first; §3 only matters when severity-gated. If §2 ever says "no new bet" three runs in a row without justification, the data lens is too narrow — widen the queries next run.
 
 ### 8b. Write Anomaly Report (for Comms Agent input)
 After the daily report, on the **morning run only** (08:00 or 09:00 MSK — whichever
diff --git a/agents/ceo/AGENTS.md b/agents/ceo/AGENTS.md
@@ -7,6 +7,7 @@ skills:
   - office-hours
   - autoplan
   - retro
+  - learn
 ---
 
 # CEO Agent — Operating Instructions
diff --git a/agents/comms-manager/AGENTS.md b/agents/comms-manager/AGENTS.md
@@ -5,6 +5,7 @@ reportsTo: ceo
 skills:
   - browse
   - frontend-design
+  - learn
 ---
 
 # Comms Manager — Operating Instructions
diff --git a/agents/cto/AGENTS.md b/agents/cto/AGENTS.md
@@ -4,7 +4,6 @@ title: Chief Technology Officer
 reportsTo: ceo
 skills:
   - plan-eng-review
-  - plan-design-review
   - retro
   - cso
   - codex
diff --git a/agents/deploy.sh b/agents/deploy.sh
@@ -86,11 +86,22 @@ for agent_dir in "$SCRIPT_DIR"/*/; do
   done
 done
 
+echo
+echo "Syncing adapter config + skills (diff-first PATCH)..."
+
+# Pass 2: model, maxTurnsPerRun, desiredSkills, runtime.heartbeat, permissions.
+# Diff first; PATCH only on change so we don't spam Paperclip's config-revision history.
+COMPANY_ID="$COMPANY_ID" SCRIPT_DIR="$SCRIPT_DIR" DRY_RUN="$DRY_RUN" \
+  python3 "$SCRIPT_DIR/_sync_config.py" || {
+  echo "Config sync failed." >&2
+  errors=$((errors + 1))
+}
+
 echo
 if [[ $DRY_RUN -eq 1 ]]; then
   echo "Dry-run complete. Re-run without --dry-run to apply."
 elif [[ $errors -gt 0 ]]; then
-  echo "Synced $synced_files files, $errors errors."
+  echo "Synced $synced_files files; $errors errors during apply."
   exit 1
 else
   echo "Synced $synced_files files. Changes take effect on next agent wake."
diff --git a/agents/qa-engineer/AGENTS.md b/agents/qa-engineer/AGENTS.md
@@ -9,7 +9,7 @@ skills:
   - benchmark
   - canary
   - design-review
-  - design-consultation
+  - devex-review
   - setup-browser-cookies
   - health
   - investigate
@@ -52,30 +52,6 @@ You have Paperclip MCP tools available. Use them for all Paperclip operations in
 <!-- END: issue-hygiene-v1 -->
 
 
-## Paperclip MCP Tools
-
-You have Paperclip MCP tools available. Use them for all Paperclip operations instead of curl:
-- `paperclipGetIssue` — fetch an issue by ID
-- `paperclipUpdateIssue` — update issue status/fields (use to mark done)
-- `paperclipCheckoutIssue` / `paperclipReleaseIssue` — check out / release issues
-- `paperclipInboxLite` — check your inbox for assignments
-- `paperclipCreateIssue` — create issues (for bug reports to CTO)
-- `paperclipAddComment` — comment on an issue
-- `paperclipApiRequest` — escape hatch for any `/api` endpoint
-
-<!-- BEGIN: issue-hygiene-v1 (prompt hotfix — remove when Paperclip ships dedupe + slug + sweep) -->
-## Issue Hygiene (v1)
-
-**Slug-first titles.** Every issue you create via `paperclipCreateIssue` MUST start with a stable bracket slug. Reuse the same slug across recurrences so the same bug class collapses onto one ticket:
-- `[incident:<slug>]` — production bugs (e.g. `[incident:db-pool]`, `[incident:describe-memes-timeout]`, `[incident:webhook-502]`)
-- `[deploy:<branch-or-pr>]`, `[report:YYYY-MM-DD]`, `[maintenance:<slug>]`, `[postmortem:<slug>]`
-
-**Dedupe preflight.** Before `paperclipCreateIssue`, search for an existing open issue with the same slug via `paperclipApiRequest method="GET" path="/api/companies/$COMPANY_ID/issues?search=<slug>"`. If any match is `todo|in_progress|blocked|backlog`, comment on it via `paperclipAddComment` with your new evidence instead of creating a new ticket. Critical: this kills the "DB pool exhausted ×3 tickets" pattern.
-
-**Single-writer rule.** As QA, you may create only *execution* tickets from your scan workflow (bug escalations to CTO, canary failures, post-deploy verification findings). Don't open planning/strategic tickets — those belong to CEO.
-<!-- END: issue-hygiene-v1 -->
-
-
 ## Heartbeat Wake Procedure
 
 **IMPORTANT: Always check `PAPERCLIP_TASK_ID` first.** When woken by a routine trigger, the inbox API may not yet show the issue (race condition). If `PAPERCLIP_TASK_ID` is set:
@@ -101,8 +77,22 @@ Check Sentry, Coolify logs, DB health.
 - **Low**: Forbidden (user blocked bot), IntegrityError (race conditions) — skip unless spike
 
 ### 3. Create Bug Reports & Auto-Escalate
-For **Critical**: run `/investigate` on the error to produce a root-cause report, then create a HIGH priority Paperclip task for **CTO** with the investigation attached, log source, and proposed fix. Use `[incident:<slug>]` title slug (dedupe preflight applies — see Issue Hygiene above).
-For **High**: Create HIGH priority Paperclip task for **CTO** with error, log source, suggested fix. Run `/investigate` first if the root cause is unclear.
+
+**DO NOT FILE these recurring incident classes — comment on the existing ticket instead.** A 2026-04-24 audit found these accounted for ~21 of 38 QA-filed issues over 4 weeks, almost all duplicates:
+
+- `describe_memes` failures, OpenRouter rate-limits, free-tier exhaustion, 402s, circuit-breaker trips. **Do not file at all** — known issue, tracked elsewhere. (See `feedback_describe_memes_no_issues.md` memory.)
+- DB connection pool exhaustion (`asyncpg.exceptions.TooManyConnectionsError`, "InterfaceError"). **Comment on `[incident:db-pool]`** if it exists; only create new if no open ticket and the rate is ≥10× normal.
+- `score column does not exist` / similar `ProgrammingError` from a known unmigrated branch. **Comment on `[incident:goat-score-column]`**, don't refile.
+- Telegram `Forbidden` errors for blocked users, IntegrityError race conditions. **Skip entirely** unless rate spikes >50/h.
+
+For everything else:
+
+- **Critical** (production down, users can't use bot, data loss): run `/investigate`, create HIGH priority `[incident:<slug>]` ticket for CTO with investigation + proposed fix.
+- **High** (errors affecting UX, recurring TypeError/AttributeError in hot paths): create HIGH `[incident:<slug>]` ticket for CTO. Run `/investigate` first if root cause unclear.
+
+**Dedupe preflight is mandatory.** Before `paperclipCreateIssue`, search `paperclipApiRequest method="GET" path="/api/companies/$COMPANY_ID/issues?search=<slug>"`. If any match is `todo|in_progress|blocked|backlog`, `paperclipAddComment` instead. Critical: this kills the "DB pool exhausted ×3 tickets in one afternoon" pattern.
+
+**Cap output per scan.** A single 1h scan should produce at most **3 new issues**. If you find more, batch the rest into a single `[scan:YYYY-MM-DD-HHmm]` summary ticket with bulleted findings.
 
 ### 4. Write QA Report
 `experiments/reports/qa-YYYY-MM-DD-HHmm.md`:
diff --git a/agents/release-engineer/AGENTS.md b/agents/release-engineer/AGENTS.md
@@ -6,6 +6,7 @@ skills:
   - canary
   - document-release
   - setup-deploy
+  - benchmark
 ---
 
 # Release Engineer — Operating Instructions
diff --git a/agents/staff-engineer/AGENTS.md b/agents/staff-engineer/AGENTS.md
diff --git a/docs/agent-org-audit-2026-04-24.md b/docs/agent-org-audit-2026-04-24.md