Skip to content

Commit 3fd7a33

Browse files
committed
1.1.0: Atomic signal protocol, unstick script, deadlock-resistant handoffs
Fixes a recurring cross-turn deadlock where one side wrote State.json and the other side never woke up to see it. Signals now explicitly pair the State.json flip with a backgrounded wait-for-state.sh watcher, and a new unstick.sh diagnostic recovers from lingering stalls.
1 parent 1205206 commit 3fd7a33

5 files changed

Lines changed: 341 additions & 38 deletions

File tree

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "tandemkit",
3-
"version": "1.0.8",
3+
"version": "1.1.0",
44
"description": "Describe your goal, approve the spec, then step away — Claude and Codex loop together until it's right.",
55
"author": {
66
"name": "Cihat Gündüz",

commands/init.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -369,6 +369,14 @@ Populate each with project-specific context. Key requirements:
369369

370370
**Generator.md should reference existing project skills** that are relevant (e.g., "Load `swift-code-context` before writing Swift code", "Load `swiftui-code-context` for SwiftUI work", "Use `review-swift-changes` for validation"). **For macOS apps, reference `macos-peekaboo` (runtime UI automation) and `macos-accessibility-ids` (so new/touched SwiftUI/AppKit views are automatable by Peekaboo and XCUITest out of the box).**
371371

372+
**Generator.md AND Evaluator.md must both include a top-of-file reminder** pointing at the Signal Protocol in SKILL.md. Suggested exact wording (add verbatim at the top of each file, under the first heading):
373+
374+
```markdown
375+
> **Signal Protocol — Atomic (NON-NEGOTIABLE):** every round handoff is the two-step SIGNAL from the Generator/Evaluator SKILL.md §"Signal Protocol" — flip State.json **and** launch `wait-for-state.sh` via `Bash run_in_background: true` **before the response ends**. A State.json write without the watcher deadlocks the loop; a foreground `ls`/`until` poll is NOT a substitute — those die when the turn ends. If the user ever asks "why did you stop?", run `scripts/unstick.sh <mission>` and follow the "If the user asks…" section of SKILL.md.
376+
```
377+
378+
This reminder is the project-level safety net that makes the skill-level rule impossible to miss, even if the agent somehow skims past it in SKILL.md.
379+
372380
**Planner.md should reference ALL major documentation areas** — not just code docs. Include research folders, exploration docs, context directories, domain-specific reference material with "When to read" guidance.
373381

374382
### Verify Build Commands

scripts/unstick.sh

Lines changed: 149 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,149 @@
1+
#!/usr/bin/env bash
2+
# TandemKit — unstick
3+
#
4+
# Diagnoses the Generator↔Evaluator signal loop when either side has stopped, and optionally re-fires the other
5+
# side's live watcher by refreshing State.json's `updated` timestamp. Designed to be invoked the moment the user
6+
# asks "why did you stop?" / "what are you waiting for?" / "why are both frozen?" — from either chat, without the
7+
# agent needing to know in advance which side is at fault.
8+
#
9+
# Usage:
10+
# unstick.sh <mission-folder> # diagnose only
11+
# unstick.sh <mission-folder> --touch # refresh State.json's `updated` field to re-fire any live watcher
12+
#
13+
# The script itself takes no action beyond reading State.json and (optionally) touching its timestamp. The agent
14+
# reading the output decides what to do next:
15+
#
16+
# - If the agent IS the at-fault side, it resumes work immediately per its own SKILL.md. No --touch needed —
17+
# doing the work IS the fix.
18+
# - If the OTHER side is at fault, re-run with --touch so any still-alive watcher on that side picks up a fresh
19+
# file event and wakes. If nothing wakes within a minute, that side's background task has died with its
20+
# Claude Code session (usage limit / compaction / crash) and only the user can nudge the dead session.
21+
#
22+
# Exit 0 = diagnosis printed successfully. Exit 1 = bad args or missing State.json.
23+
24+
set -euo pipefail
25+
26+
if [[ $# -lt 1 ]]; then
27+
echo "Usage: unstick.sh <mission-folder> [--touch]" >&2
28+
exit 1
29+
fi
30+
31+
MISSION="$1"
32+
TOUCH=false
33+
if [[ "${2:-}" == "--touch" ]]; then TOUCH=true; fi
34+
35+
STATE_FILE="$MISSION/State.json"
36+
37+
if [[ ! -f "$STATE_FILE" ]]; then
38+
echo "ERROR: State.json not found at $STATE_FILE" >&2
39+
exit 1
40+
fi
41+
42+
read_field() {
43+
python3 -c "
44+
import json
45+
with open('$STATE_FILE') as f:
46+
print(json.load(f).get('$1', 'null'))
47+
" 2>/dev/null || echo "null"
48+
}
49+
50+
ROUND=$(read_field round)
51+
PHASE=$(read_field phase)
52+
G_STATUS=$(read_field generatorStatus)
53+
E_STATUS=$(read_field evaluatorStatus)
54+
VERDICT=$(read_field verdict)
55+
UPDATED=$(read_field updated)
56+
57+
# Count live wait-for-state.sh watchers for this mission, grouped by which side they serve, per the canonical
58+
# Signal Protocol in each side's SKILL.md:
59+
# - Evaluator-side watchers wait for `generatorStatus ready-for-eval` (next Generator round signal).
60+
# - Generator-side watchers wait for `evaluatorStatus done` (verdict) or `evaluatorStatus watching` (initial
61+
# Evaluator-ready signal on first mission start).
62+
# Non-canonical watcher patterns (e.g. watching one's own field flip) are deliberately not counted — they mask
63+
# real deadlocks by letting the script report a healthy watcher when the actual protocol-defined watcher is dead.
64+
MISSION_WATCHERS=$(pgrep -fa wait-for-state.sh 2>/dev/null | grep -F "$MISSION" || true)
65+
WATCHERS_EVAL_SIDE=$(echo "$MISSION_WATCHERS" | grep -c -E "generatorStatus ready-for-eval" 2>/dev/null || true)
66+
WATCHERS_GEN_SIDE=$(echo "$MISSION_WATCHERS" | grep -c -E "evaluatorStatus (done|watching)" 2>/dev/null || true)
67+
WATCHERS_COMPLETION=$(echo "$MISSION_WATCHERS" | grep -c "phase complete" 2>/dev/null || true)
68+
WATCHERS_EVAL_SIDE=${WATCHERS_EVAL_SIDE:-0}
69+
WATCHERS_GEN_SIDE=${WATCHERS_GEN_SIDE:-0}
70+
WATCHERS_COMPLETION=${WATCHERS_COMPLETION:-0}
71+
72+
# Diagnose at-fault side from the state combination. "At fault" here means "the side that owes the next write" —
73+
# not a value judgement. Healthy in-flight work (e.g. generatorStatus=working while Generator implements) shows as
74+
# "generator" because the Generator is who will move next; the script cannot distinguish "actively working" from
75+
# "stalled mid-work" without additional signal.
76+
AT_FAULT="unknown"
77+
ACTION=""
78+
case "$G_STATUS/$E_STATUS" in
79+
"ready-for-eval/pending"|"ready-for-eval/watching")
80+
AT_FAULT="evaluator"
81+
ACTION="Evaluator should read Generator/Round-$ROUND.md and evaluate."
82+
;;
83+
"ready-for-eval/evaluating")
84+
AT_FAULT="evaluator"
85+
ACTION="Evaluator claimed evaluating but no verdict landed. Resume evaluation or signal done with the atomic template."
86+
;;
87+
"ready-for-eval/done")
88+
AT_FAULT="generator"
89+
ACTION="Generator should read Evaluator/Round-$ROUND.md (verdict: $VERDICT) and start the next round."
90+
;;
91+
"working/"*)
92+
AT_FAULT="generator"
93+
ACTION="Generator is working (healthy if actively implementing; stalled if no progress for 15+ min). If stalled, resume implementation and signal ready-for-eval when done."
94+
;;
95+
"researching/"*)
96+
AT_FAULT="generator"
97+
ACTION="Generator is in research mode. If the Evaluator is already watching, proceed to implementation."
98+
;;
99+
*)
100+
ACTION="State combination ($G_STATUS/$E_STATUS) does not match a known waiting pattern — investigate manually."
101+
;;
102+
esac
103+
104+
# Flag the likely watcher-missing side separately from the at-fault side. The two can diverge: e.g. the Evaluator
105+
# might be at-fault (their turn to act) AND their watcher is dead, which compounds the deadlock because their
106+
# dead watcher means the next Generator signal will also be missed.
107+
WATCHER_NOTE=""
108+
if [[ "$AT_FAULT" == "evaluator" && "$WATCHERS_EVAL_SIDE" == "0" ]]; then
109+
WATCHER_NOTE="⚠ Evaluator has no live watcher — it will not wake when the Generator signals again."
110+
elif [[ "$AT_FAULT" == "generator" && "$WATCHERS_GEN_SIDE" == "0" ]]; then
111+
WATCHER_NOTE="⚠ Generator has no live watcher — it will not wake when the Evaluator signals again."
112+
fi
113+
114+
cat <<EOF
115+
═══ TandemKit signal-loop diagnosis ═══
116+
mission: $MISSION
117+
round: $ROUND (phase: $PHASE)
118+
generatorStatus: $G_STATUS
119+
evaluatorStatus: $E_STATUS
120+
verdict: $VERDICT
121+
last updated: $UPDATED
122+
123+
evaluator-side watchers (generatorStatus ready-for-eval): $WATCHERS_EVAL_SIDE
124+
generator-side watchers (evaluatorStatus done/watching): $WATCHERS_GEN_SIDE
125+
completion watchers (phase complete): $WATCHERS_COMPLETION
126+
127+
at-fault side: $AT_FAULT
128+
next action: $ACTION
129+
EOF
130+
131+
if [[ -n "$WATCHER_NOTE" ]]; then
132+
echo " $WATCHER_NOTE"
133+
fi
134+
135+
if [[ "$TOUCH" == true ]]; then
136+
python3 <<PYEOF
137+
import json, datetime
138+
with open('$STATE_FILE') as f:
139+
s = json.load(f)
140+
s['updated'] = datetime.datetime.utcnow().isoformat() + 'Z'
141+
with open('$STATE_FILE', 'w') as f:
142+
json.dump(s, f, indent=2)
143+
f.write('\n')
144+
PYEOF
145+
echo
146+
echo " ✓ State.json \`updated\` timestamp refreshed — any live watcher on the other side should pick up"
147+
echo " the file event and wake. If nothing wakes within a minute, that side's background task is dead"
148+
echo " (session reset / usage limit) — only the user can re-arm it by nudging that session directly."
149+
fi

skills/evaluator/SKILL.md

Lines changed: 95 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,67 @@ You are the Evaluator. Your job is to verify the Generator's work against the sp
1818
2. **Report format** is in `templates/Evaluator-Round-Format.md`. **Strategies** are in `strategies/`.
1919
3. Do NOT use subagents for evaluation — you and Codex are the two independent evaluators.
2020

21+
## ⛔ Signal Protocol — Atomic (NON-NEGOTIABLE) ⛔
22+
23+
**A "signal" from the Evaluator to the Generator is NOT just a State.json write. It is a two-step atomic operation, and both steps must happen before your response ends. Skipping the second step deadlocks the loop — the Generator can flip to `ready-for-eval` in the next round but nothing will wake you to respond.**
24+
25+
The same applies to the readiness signal at Step 2 (`evaluatorStatus: watching`) and to the "keep watching" watchers after every verdict.
26+
27+
### The SIGNAL template — use this EVERY time you hand off or wait for a round
28+
29+
```bash
30+
# Step 1 of 2 — Flip State.json (Edit/Write):
31+
# evaluatorStatus: "watching" | "evaluating" | "done"
32+
# verdict: PASS / PASS_WITH_GAPS / FAIL / BLOCKED (after Step 4)
33+
# round: N
34+
# updated: <now>
35+
#
36+
# Step 2 of 2 — IMMEDIATELY launch the wake-up watcher in background.
37+
# Use the Bash tool with run_in_background: true. Do NOT foreground.
38+
# After a verdict, arm BOTH watchers (see Step 6: next-round + completion).
39+
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" \
40+
"$(pwd)/TandemKit/<mission>" generatorStatus ready-for-eval
41+
```
42+
43+
**A signal is incomplete without both steps.** If you wrote Step 1 and did not start Step 2 before the response ended, you violated the protocol. The Generator's next round signal will sit unseen until the user manually intervenes.
44+
45+
### Why it MUST be the backgrounded watcher
46+
47+
Within one turn, foreground `ls` polls or `until` loops inside a single Bash call work fine. But **the moment your response ends, foreground polls die**. The only thing that wakes you across turn boundaries is a `run_in_background: true` Bash task completing and firing a `<task-notification>` into your session. `wait-for-state.sh` exists specifically for this purpose:
48+
49+
- It uses watchman-wait when available, else md5-polls State.json every 5 seconds.
50+
- When the watched field matches, it prints `READY` and exits cleanly.
51+
- Exit → Claude Code fires `<task-notification>` → your next turn starts automatically → read current State.json, do the work, signal again (with the same atomic template).
52+
53+
### Before your response ends — pre-flight checklist
54+
55+
If your response is about to end, verify **all three** of these:
56+
- [ ] State.json is in the correct state (`watching` / `evaluating` / `done` + `verdict` if applicable).
57+
- [ ] A `wait-for-state.sh … generatorStatus ready-for-eval` watcher is running via `Bash run_in_background: true` (plus a `phase complete` watcher after a verdict — see Step 6).
58+
- [ ] The last thing you did was a tool call (ideally the watcher launch), not explanatory text. Closing narration like "Watching for next round…" = the deadlock pattern.
59+
60+
If any box is unchecked: **do not let the response end.** Fix it with another tool call.
61+
62+
### Why this is non-negotiable
63+
64+
This pattern has caused real cross-turn deadlocks in live missions in BOTH directions — Evaluator PASSes sitting unseen because the Generator didn't arm its wake-up watcher, and Generator signals sitting unseen because the Evaluator ended its response after writing the verdict without arming the next-round watcher. The atomic template above is the only reliable fix.
65+
66+
## If the user asks "why did you stop?" / "what are you waiting for?" / "why are both frozen?"
67+
68+
Treat it as an unstick request. Run the diagnostic:
69+
70+
```bash
71+
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/unstick.sh" \
72+
"$(pwd)/TandemKit/NNN-MissionName"
73+
```
74+
75+
Interpret `at-fault side`:
76+
77+
- **If YOU (Evaluator) are at-fault:** resume work immediately — re-read State.json, pick up the current round, do the next step per this SKILL. Re-signal at the end with the atomic template. No `--touch` needed; doing the work IS the fix.
78+
- **If the Generator is at-fault and your watcher is alive:** no action — your watcher will fire when they move. Show the diagnosis to the user.
79+
- **If the Generator is at-fault and your watcher is dead:** re-arm it immediately (Step 6 of this SKILL — both watchers) so their eventual signal doesn't get missed a second time.
80+
- **If the diagnosis says your watcher is alive but the Generator is stuck:** re-run with `--touch` to refresh State.json's mtime. That re-fires the Generator's live watcher if theirs is still alive. If that doesn't wake them, their session is dead — only the user can nudge it directly.
81+
2182
## Mindset + Anti-Bias Rules
2283

2384
- **Assume the Generator made mistakes.** Your job is to find them.
@@ -82,19 +143,23 @@ The user invokes this skill with `/tandemkit:evaluator NNN-MissionName`. First r
82143
4. Read any `UserFeedback/` files if this is a post-feedback round
83144
5. **Scan `.claude/skills/` for skills relevant to this mission's topic.** Load any that seem related — they may contain domain knowledge, validation rules, or conventions critical for correct evaluation. If the Spec mentions specific skills, load those too.
84145

85-
## Step 2 — Signal Readiness and Wait for Generator
146+
## Step 2 — Signal Readiness and Wait for Generator (ATOMIC SIGNAL)
147+
148+
This is a SIGNAL per the "⛔ Signal Protocol" section above. Both halves mandatory before response ends.
149+
150+
5. **Half 1: flip State.json**`evaluatorStatus: "watching"`. Read-modify-write only your field.
86151

87-
5. Update State.json: `evaluatorStatus: "watching"`. Read-modify-write only your field.
88-
6. Check `generatorStatus`:
89-
- `"ready-for-eval"` → Proceed to Step 3 immediately.
90-
- Any other value → Wait:
152+
6. **Half 2: launch the wake-up watcher.** Check `generatorStatus`:
153+
- If already `"ready-for-eval"` → proceed to Step 3 immediately within this turn (no watcher needed — just continue).
154+
- Otherwise → **before ending this response**, launch the watcher via `Bash run_in_background: true`:
91155
```bash
92-
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" "$(pwd)/TandemKit/NNN-MissionName" generatorStatus ready-for-eval
156+
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" \
157+
"$(pwd)/TandemKit/NNN-MissionName" generatorStatus ready-for-eval
93158
```
94-
Run with `run_in_background: true`. When it prints "READY", proceed to Step 3.
159+
The script's exit fires a `<task-notification>` that auto-starts your next turn with Step 3. Foreground polls won't survive the turn boundary.
95160

96161
════════════════════════════════════════
97-
→ Watching — Waiting for Generator
162+
→ Watching — watcher armed, waiting for Generator
98163
════════════════════════════════════════
99164

100165
## Step 3 — Parallel Independent Evaluation (Round 1 of each eval cycle)
@@ -239,37 +304,38 @@ The user invokes this skill with `/tandemkit:evaluator NNN-MissionName`. First r
239304
240305
**Efficiency tip:** When the changes between rounds are small (e.g., a few findings adjusted, one section updated), consider copying the previous file and editing only the changed parts (`cp` + Edit tool) instead of writing the entire file from scratch. This saves output tokens and time. Use your judgment — if the restructuring is substantial, a fresh Write is cleaner.
241306
242-
## Step 5 — Signal Generator
307+
## Step 5 — Signal Generator + Arm Watchers (ATOMIC SIGNAL)
308+
309+
**This is a SIGNAL per the "⛔ Signal Protocol" section above. All three halves mandatory before response ends.** Writing the verdict without arming the watchers is the deadlock pattern — do not skip any half.
310+
311+
22. **Half 1: flip State.json** → `evaluatorStatus: "done"`, `verdict: "..."`, `round: N`, `updated: <now>`.
312+
313+
23. **Half 2: arm the next-round watcher** via `Bash run_in_background: true`:
314+
```bash
315+
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" \
316+
"$(pwd)/TandemKit/NNN-MissionName" generatorStatus ready-for-eval
317+
```
318+
When it fires, read `round` from State.json to learn which `Round-NN.md` to evaluate, then re-enter **Step 3**.
243319
244-
22. Update State.json: `evaluatorStatus: "done"`, `verdict: "..."`, `round: N`
320+
24. **Half 3: arm the completion watcher** via `Bash run_in_background: true`:
321+
```bash
322+
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" \
323+
"$(pwd)/TandemKit/NNN-MissionName" phase complete
324+
```
325+
When it fires, print the closing banner and stop — the mission is over.
245326
246327
════════════════════════════════════════
247-
→ Verdict: [PASS/FAIL/PASS_WITH_GAPS/BLOCKED] — Watching for next round
328+
→ Verdict: [PASS/FAIL/PASS_WITH_GAPS/BLOCKED]
329+
→ Both watchers armed — response may end safely
248330
════════════════════════════════════════
249331
250332
## Step 6 — Keep Watching
251333
252334
**CRITICAL: You are NEVER done until `phase` is `"complete"`.** A PASS verdict does NOT end your watch duty. The user may give feedback, the Generator will iterate, and you will evaluate again. Only `phase: "complete"` (set by the user through the Generator) or the user exiting your session ends your job.
253335
254-
After writing your verdict, IMMEDIATELY start TWO background watchers:
255-
256-
1. **Next round watcher:**
257-
```bash
258-
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" "$(pwd)/TandemKit/NNN-MissionName" generatorStatus ready-for-eval
259-
```
260-
261-
Read `round` from `State.json` at wake time to know which `Round-NN.md` to evaluate.
262-
263-
2. **Completion watcher:**
264-
```bash
265-
bash "$HOME/.claude/plugins/cache/FlineDev/tandemkit/latest/scripts/wait-for-state.sh" "$(pwd)/TandemKit/NNN-MissionName" phase complete
266-
```
267-
268-
Run both with `run_in_background: true`. When either returns:
269-
- If `generatorStatus: ready-for-eval` → go back to **Step 3**
270-
- If `phase: complete` → print the closing banner and stop
336+
The two watchers from Step 5 Halves 2 and 3 are the mechanism that lets you end the current response safely. Their completion triggers new turns automatically.
271337
272-
If a watch times out (10 minutes), re-read State.json and restart the watchers. NEVER go idle.
338+
If a watcher times out (default 10 min in `wait-for-state.sh`), the runtime delivers a completion notification anyway — on that wake-up, re-read State.json, decide next action, and re-arm watchers if still waiting. NEVER go idle without a watcher armed.
273339
274340
### Catchup case — Evaluator has fallen behind
275341

0 commit comments

Comments
 (0)