Two Claude Code windows on the same branch (or just both on main) silently
corrupt each other's /tmp/ci_watch_* state, leak watchers, and produce
mis-routed notifications. We introduce a per-session identity primitive
(session_id from the hook payload, bridged via a clone-local token file
written at SessionStart) and rewrite every /tmp filename to be keyed by
(branch, session). We also fix a pile of latent shell bugs surfaced while
auditing the same scripts: set -e + command-substitution traps, JSON parse
error handling, lock-file PID reuse, transcript-scan misses, atomic cache
writes, and a Python string-injection in stop_failure__rate_limit.sh.
Relevant files:
- `/Users/yonichechik/.claude/scripts/ci_watch_persistent.sh`: the watcher. A detached process, launched from the `/ci` skill via shell backgrounding. It does NOT receive a hook stdin payload. It gets `PORT`, `BRANCH`, and `SESSION_TOKEN` as argv from the skill (where `SESSION_TOKEN` is the webhook MCP's UUID, not the Claude `session_id`).
- `/Users/yonichechik/.claude/scripts/status_line.sh`: runs once per refresh. It RECEIVES JSON on stdin (`workspace`, `context_window`, `rate_limits`), but the payload does NOT include `session_id`. Reads `/tmp/ci_watch_*`.
- `/Users/yonichechik/.claude/scripts/stop__sound.sh`: Stop hook. Receives the full hook JSON on stdin, including `session_id` and `transcript_path`.
- `/Users/yonichechik/.claude/scripts/session_start.sh`: SessionStart hook. Receives `session_id` on stdin (per docs/multi-session-architecture.md line 63: "Every hook receives a JSON payload on stdin that includes `session_id`, `cwd`, `tool_name`, and `tool_input`"). It currently does not read it.
- `/Users/yonichechik/.claude/scripts/stop_failure__rate_limit.sh`: StopFailure hook. Runs an inline `python3 -c "..."` script that interpolates `$CLAUDE_JSON` directly into the Python source (code-injection risk if `$HOME` is exotic, e.g. contains a `'`).
- `/Users/yonichechik/.claude/scripts/notification__sound.sh`, `notify_waiting.sh`: sound + tab-title hooks. They already route via `/dev/tty`, so per-tty isolation handles them. Only minor issues (blocking `afplay`, no `command -v` guard).
- `/Users/yonichechik/.claude/settings.json`: hook registrations.
- `/Users/yonichechik/.claude/docs/multi-session-architecture.md`: exists and documents the current state, including all known gaps. Confirms `session_id` is on every hook stdin payload.
- `/Users/yonichechik/.claude/channel/webhook.ts`: MCP webhook server. Mints `sessionToken = randomUUID()` at startup; the `get_port` tool returns `${httpPort}:${sessionToken}`. This token is per-MCP-process, NOT the Claude session UUID (a different concept).
- `/Users/yonichechik/.claude/skills/ci/SKILL.md`: the `/ci` invocation.
| Option | Evaluation |
|---|---|
| A. `$CLAUDE_SESSION_ID` env var | Not present: `grep -rn 'CLAUDE_SESSION'` across scripts/docs returns nothing. Not exposed by the harness. |
| B. Webhook `TOKEN` written by `session_start.sh`, read via parent PID | The webhook `TOKEN` is per-MCP-server, not per-Claude-session. The SessionStart hook also can't easily get the webhook port without an MCP call. |
| C. Clone-local token file written by `session_start.sh`; hooks read it via `git rev-parse --show-toplevel` | Workable for hooks running inside the clone. But (1) two windows on the same clone overwrite each other's token; (2) `status_line.sh` can resolve the clone root, but the watcher and hooks see only their own cwd. Not a clean primitive. |
| D. Parent PID of the hook process | The hook's parent is the Claude Code process: stable for a session's lifetime and unique across windows. It is the harness's OS PID, readable from inside any hook via `ps -o ppid= -p $$` (or bash's `$PPID`). Reliable, no setup. Drawback: if Claude Code spawns hooks via an intermediate shell, the parent PID may not be the Claude process. |
| E. Hook payload `session_id` (UUID) | Already on stdin for every hook. At 36 chars it is unwieldy in `/tmp` filenames as-is, but the first 8 hex chars are sufficient (collision probability is negligible across the handful of concurrent sessions a user has). Zero MCP calls. Works in every hook context. Drawback: `status_line.sh`'s stdin payload does NOT contain `session_id` (only `workspace`/`context_window`), and `ci_watch_persistent.sh` receives no stdin payload at all. |
Hybrid (E + C): Hook payload session_id is the source of truth. We
shorten it to its first 8 hex chars (SID8 = ${session_id:0:8}) and use
(branch, SID8) as the /tmp namespace key. For the two contexts that
do NOT receive session_id on stdin — status_line.sh and
ci_watch_persistent.sh — session_start.sh writes a small
session-discovery file:
```
~/.claude/session-env/<sessionId>/
  sid8      # 8-char short id
  cwd       # session cwd at start
  branch    # branch at start
```
Plus a cwd → sid8 reverse lookup at a per-cwd path:
```
~/.claude/cache/cwd-session/<sha1(cwd), first 12 hex chars>   # contains sid8
```
status_line.sh reads its sid8 from ~/.claude/cache/cwd-session/<sha1(cwd)>.
ci_watch_persistent.sh receives sid8 as a NEW 4th argv argument from the
/ci skill (which already has access to it via the SessionStart-written
file — the skill resolves cwd → sid8 the same way).
Why this works:
- Simple: hooks that have `session_id` on stdin use it directly; the two scripts that don't have a single, well-defined fallback path.
- Reliable: SessionStart fires once at start and writes both files atomically (`mktemp` + `mv`). The cwd-keyed file is overwritten if another session opens in the same cwd; last-writer-wins is correct here, since the new session is the live one.
- No MCP calls: pure filesystem reads.
- Works across all hook types.
Drawbacks accepted:
- Two simultaneous sessions in the same cwd (which the workflow explicitly discourages) will contend on the cwd→sid8 file. We treat this as out-of-scope; the cwd→sid8 file is purely a discoverability hint for status_line.
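- 8 hex chars give 16^8 = 2^32 ≈ 4.3B possible values. By the birthday bound, a 50% chance of any collision needs roughly 77k concurrent sessions (√(2^32) ≈ 65k sessions gives about 39%), which is irrelevant at our scale.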
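As a rough sanity check on that claim, the usual birthday approximation for k concurrent sessions is shown below. It assumes the 8-char prefix is uniformly random, which holds for a random version-4 UUID; if the harness ever switched to time-ordered UUIDv7, those 8 hex chars would be timestamp bits, and sessions started within about a minute of each other would likely share them.

$$
p(k) \approx 1 - \exp\left(-\frac{k(k-1)}{2N}\right), \qquad N = 16^{8} = 2^{32}
$$

With k = 10 open windows there are 45 pairs, so p ≈ 45 / 2^32 ≈ 1.0 × 10^-8.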
Primitive: `SID8` = first 8 hex chars of the payload `session_id`. All `/tmp` filenames key on `(BRANCH_KEY, SID8)`:
```
/tmp/ci_watch_state_${BRANCH_KEY}_${SID8}
/tmp/ci_watch_lock_${BRANCH_KEY}_${SID8}
/tmp/ci_watch_pr_${BRANCH_KEY}_${SID8}
/tmp/ci_watch_${BRANCH_KEY}_${SID8}.log
```
where `BRANCH_KEY="${BRANCH//\//__}"` (every `/` replaced by `__`).

Discovery for scripts without `session_id` on stdin: read
`~/.claude/cache/cwd-session/$(printf '%s' "$cwd" | shasum -a 1 | cut -c1-12)`,
which `session_start.sh` populates.
What:
- In `session_start.sh`, before the existing logic, read stdin once into `INPUT`, extract `session_id` via `jq -r .session_id`, and compute `SID8="${session_id:0:8}"`. If the result is empty or the literal string `null` (what `jq -r` prints for a missing field), fall back to `unknown` and log a warning to `~/.claude/logs/session_start.log` so we can detect harness changes.
- Compute `cwd_hash=$(printf '%s' "$PWD" | shasum -a 1 | cut -c1-12)`, then `mkdir -p ~/.claude/cache/cwd-session ~/.claude/session-env/$session_id`.
- Atomically write `~/.claude/cache/cwd-session/$cwd_hash` and `~/.claude/session-env/$session_id/sid8` using `mktemp` + `mv` (create the temp file in the destination directory, so the `mv` is a same-filesystem rename).
- Add a tiny shared helper, `~/.claude/scripts/lib/session_id.sh`, that defines two functions:
  - `sid8_from_payload <json>`: `jq -r .session_id`, then cut to 8 chars.
  - `sid8_from_cwd <cwd>`: sha1 + cut, then read the cache file. Echoes an empty string on a miss.
- Update `/Users/yonichechik/.claude/skills/ci/SKILL.md`: after parsing the branch, also resolve `SID8` via `sid8_from_cwd "$PWD"` and pass it as the 4th argument to the watcher.
What:
- In `ci_watch_persistent.sh`: add `BRANCH_KEY="${BRANCH//\//__}"` near the top, and replace every `/tmp` path use of `${BRANCH}` with `${BRANCH_KEY}`.
- In `status_line.sh`: apply the same sanitization for cache-file reads (build `branch_key="${branch//\//__}"`).
- In `skills/ci/SKILL.md`: replace `/tmp/ci_watch_${BRANCH}.log` with the sanitized form.
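`${BRANCH//\//__}` is bash's replace-all expansion (`${var//pattern/replacement}`, with the `/` in the pattern escaped). A tiny illustration, wrapped in a hypothetical `branch_key` helper (the plan itself inlines the expansion). One caveat: the mapping is not injective, since `feat/x` and `feat__x` yield the same key; we assume branch names of the latter shape don't occur in practice.

```shell
#!/usr/bin/env bash
# branch_key <branch>: hypothetical helper; replaces every "/" with "__"
# so the branch fits in a single /tmp filename component.
branch_key() {
  local b=$1
  printf '%s' "${b//\//__}"
}

BRANCH=feat/ci/watcher
BRANCH_KEY=$(branch_key "$BRANCH")           # feat__ci__watcher
LOG_FILE="/tmp/ci_watch_${BRANCH_KEY}.log"   # /tmp/ci_watch_feat__ci__watcher.log
```

Without the sanitization, `/tmp/ci_watch_feat/ci/watcher.log` would point into a nonexistent subdirectory of `/tmp`.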
What:
- `ci_watch_persistent.sh`:
  - Accept a new, required 4th argv, `SID8`. Update the usage string.
  - Define `SLOT="${BRANCH_KEY}_${SID8}"` and rewrite all `/tmp/ci_watch_*` paths to use `$SLOT`.
  - Update the lock-file logic (Task 6 will tighten it further).
- `status_line.sh`:
  - Resolve `SID8` via the cwd→session cache file, using the `dir` field from its stdin payload. On a miss, fall back to the legacy `${branch}` filename for one release (read-only), so a status line started before the watcher is restarted still shows something.
  - Use `slot="${branch_key}_${sid8}"` for state/PR cache reads.
- `/ci` skill: pass `SID8` as the 4th argv on the watcher launch line.
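A minimal sketch of the `status_line.sh` lookup, assuming `sid8` has already been resolved from the cwd cache (empty on a miss). `resolve_state_file` is a hypothetical name, and `CI_WATCH_DIR` is a hypothetical override for `/tmp` added only for testing. As in the plan, the legacy per-branch filename is consulted only when `sid8` is unknown; the caller still checks that the returned file exists.

```shell
#!/usr/bin/env bash
# resolve_state_file <branch> <sid8>: print the state-file path that
# status_line.sh should try. sid8 may be empty (cwd-cache miss).
resolve_state_file() {
  local dir="${CI_WATCH_DIR:-/tmp}" branch=$1 sid8=$2
  local branch_key="${branch//\//__}"
  if [ -n "$sid8" ]; then
    printf '%s\n' "$dir/ci_watch_state_${branch_key}_${sid8}"
  elif [ -f "$dir/ci_watch_state_${branch}" ]; then
    # Legacy pre-session filename: read-only fallback for one release.
    printf '%s\n' "$dir/ci_watch_state_${branch}"
  fi
  return 0
}
```

Keeping the fallback behind the `sid8` miss means a session that does know its `sid8` never reads another window's legacy file.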
What:
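- `fetch_runs_for()`: replace
  ```bash
  output=$(gh run list ... 2>&1)
  exit_code=$?
  ```
  with
  ```bash
  if ! output=$(gh run list ... 2>&1); then
    echo "Warning: ..." >&2
    echo "[]"
    return 0
  fi
  ```
  Under `set -e`, the assign-then-check form is doubly fragile. A failing `output=$(...)` exits the script before `exit_code=$?` ever runs (unless the function happens to be called from an `if`/`&&`/`||` context, where errexit is suppressed). And even when it does run, `$?` is only correct while the assignment is the immediately preceding command, so adding a log line in between silently breaks it. The inline `if !` is robust to both.
- Audit `detect_new_sha`, `get_sha_runs_for`, the merged-path block, and every `RUNS_JSON=$(...)` / `pr_checks_json=$(...)` for the same pattern.
- Replace any `local var=$(cmd)` (which masks `$?`: the status reported is that of `local` itself) with declare-then-assign on two lines.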
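A self-contained demonstration of both traps, runnable without `gh`: `false` stands in for a failing `gh run list`, and the `fetch_runs_for` inside the second case is a stub, not the watcher's real function. Each `set -e` case runs in a fresh `"$BASH" -c` so its exit can be observed without killing the demo.

```shell
#!/usr/bin/env bash
# Trap 1: under set -e, a failing `var=$(cmd)` exits the shell at once,
# so the `code=$?` line after it never runs.
old_shape=$("$BASH" -c 'set -e; out=$(false); code=$?; echo "reached:$code"' || echo "aborted")

# Fixed shape: `if !` consumes the failure, so set -e never fires and the
# stub returns its "[]" fallback.
new_shape=$("$BASH" -c '
  set -e
  fetch_runs_for() {
    local output
    if ! output=$(false 2>&1); then
      echo "Warning: fetch failed" >&2
      echo "[]"
      return 0
    fi
    printf "%s\n" "$output"
  }
  fetch_runs_for
' 2>/dev/null)

# Trap 2: `local v=$(cmd)` reports local's own status (0), not cmd's.
masked()   { local v=$(false); echo "$?"; }
unmasked() { local v; v=$(false); echo "$?"; }
```

`old_shape` ends up as `aborted` (the `reached:` line is never printed), `new_shape` as `[]`, and `masked`/`unmasked` print `0` and `1` respectively.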
What:
- Read a real transcript file and document the actual JSONL event shapes in a comment. Confirm the field names: is it `agentId` or `agent_id`? Is the completion delivered as a `<task-id>` matching the `agentId`, or are these different namespaces?
- Update `extract_completed_ids` to match all terminal statuses with a regex over the status field: `re.search(r"<status>(completed|failed|cancelled|error|timeout|aborted)</status>", text)`.
- If the launch records an `agentId` and the completion records a `<task-id>` that is actually the SAME id, keep the set arithmetic. If they are different namespaces, look up the launch record's `taskId`/`task_id` instead and compare against that.
- Add a debug log, gated behind the `CLAUDE_DEBUG_STOP=1` env var, that prints the launched/completed sets when they don't match.
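Because the real transcript shapes still need to be confirmed, this is only a shell sketch of the set arithmetic and the debug gate, assuming launched and completed ids have already been extracted one per line. `is_terminal` and `pending_agents` are hypothetical names, the debug line format is made up, and "don't match" is interpreted as "some launched id has no completion". The terminal-status regex mirrors the proposed Python one as a `grep -E` pattern, and `comm -23` plays the role of a set difference.

```shell
#!/usr/bin/env bash
# Terminal statuses, mirroring the proposed Python regex.
TERMINAL_RE='<status>(completed|failed|cancelled|error|timeout|aborted)</status>'

# is_terminal <text>: succeeds if <text> carries any terminal status.
is_terminal() {
  printf '%s' "$1" | grep -Eq "$TERMINAL_RE"
}

# pending_agents <launched> <completed>: print ids that were launched but
# never completed. Arguments are newline-separated id lists.
pending_agents() {
  local pending
  # comm -23 keeps lines found only in the first input; both inputs must
  # be sorted with the same collation, hence LC_ALL=C on every step.
  pending=$(LC_ALL=C comm -23 <(printf '%s\n' "$1" | LC_ALL=C sort -u) \
                              <(printf '%s\n' "$2" | LC_ALL=C sort -u))
  if [ "${CLAUDE_DEBUG_STOP:-0}" = 1 ] && [ -n "$pending" ]; then
    printf 'stop-debug launched=[%s] completed=[%s]\n' \
      "$(printf '%s' "$1" | tr '\n' ' ')" "$(printf '%s' "$2" | tr '\n' ' ')" >&2
  fi
  printf '%s' "$pending"
}
```

For example, `pending_agents $'a1\nb2\nc3' 'b2'` leaves `a1` and `c3` outstanding.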
What:
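- Replace
  ```bash
  while IFS= read ...; do fields+=("$line"); done < <(echo "$input" | jq ...)
  if [ $? -ne 0 ]; then ...
  ```
  with
  ```bash
  if ! parsed=$(echo "$input" | jq -r '...' 2>/dev/null); then
    printf '%b' "${red}(status_line.sh: json parse error)${reset}"
    exit 0
  fi
  # Split line by line. A read loop (bash 3.2 safe) keeps empty lines;
  # `IFS=$'\n' read -r -d '' -a fields <<< "$parsed" || true` would
  # collapse them and shift every later field if jq emits an empty value.
  fields=()
  while IFS= read -r line; do fields+=("$line"); done <<< "$parsed"
  ```
  The current `$?` check sees the exit status of the `while` loop (its last body command, so effectively always 0 here), not `jq`'s: the `jq` pipeline runs inside a process substitution, whose exit status is discarded. The error path is effectively dead.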
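A runnable demonstration of why the old check never fires, with `false` standing in for a failing `jq`. It also shows a related splitting detail: with `IFS=$'\n'`, `read -a` collapses consecutive newlines, so an empty value in the middle of the output disappears, while a `while IFS= read -r` loop over a here-string keeps it (and still works on macOS's bash 3.2, which lacks `mapfile`).

```shell
#!/usr/bin/env bash
# 1) Old shape: the producer runs inside < <(...), so its failure is lost.
#    `false` stands in for a jq parse error.
fields=()
while IFS= read -r line; do fields+=("$line"); done < <(false)
loop_status=$?   # the while loop's own status: 0 here, despite the failure

# 2) New shape: capture first, so the producer's exit status drives the if.
parse_failed=0
if ! parsed=$(false); then
  parse_failed=1
fi

# 3) Splitting: IFS=$'\n' read -a collapses the empty middle line...
parsed=$'a\n\nc'
IFS=$'\n' read -r -d '' -a collapsed <<< "$parsed" || true
# ...while a read loop over a here-string keeps it.
kept=()
while IFS= read -r line; do kept+=("$line"); done <<< "$parsed"
```

After running, `collapsed` holds two elements (`a`, `c`) and `kept` holds three (`a`, empty, `c`).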
What:
- After `OLD_PID=$(cat "$LOCK_FILE")` and the existence check, validate that the process is actually our watcher before killing it:
  ```bash
  if kill -0 "$OLD_PID" 2>/dev/null \
     && ps -p "$OLD_PID" -o args= 2>/dev/null | grep -q ci_watch_persistent; then
    kill "$OLD_PID" || true
    # Wait for it to actually exit, up to 10s, in 1s ticks (per CLAUDE.md
    # rule: no fixed sleeps, poll instead).
    for _ in 1 2 3 4 5 6 7 8 9 10; do
      kill -0 "$OLD_PID" 2>/dev/null || break
      sleep 1
    done
  fi
  ```
- Replace the existing `sleep 1` with the polling loop above.
What:
- After `detect_new_sha` and `SHA_RUNS=$(get_sha_runs_for ...)`, when the length is 0 on the BRANCH path (currently a silent `continue`):
  - Track `SHA_RUNS_EMPTY_COUNT`. Increment it when empty; reset it to 0 whenever we get a non-empty result.
  - When the count reaches `SHA_RUNS_EMPTY_MAX=24` (24 polls × 5s = 2 minutes), fire a one-shot webhook: "⚠️ No CI runs visible for ${BRANCH} after 2 min; workflow may be missing." Set a `REPORTED_NO_RUNS` flag so we don't spam.
  - Reset `REPORTED_NO_RUNS` and `SHA_RUNS_EMPTY_COUNT` whenever a new SHA is detected (in `detect_new_sha`).
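A sketch of the counter logic as two small functions, so it can be exercised without GitHub. `note_sha_runs`, `on_new_sha`, and `notify_webhook` are hypothetical names (the last stands in for the watcher's existing webhook call, which must be defined by the caller), and the caller is assumed to pass the run count, e.g. from `jq length`.

```shell
#!/usr/bin/env bash
# Empty-runs watchdog (sketch). notify_webhook is assumed to be provided
# by the watcher; it is not defined here.
SHA_RUNS_EMPTY_MAX=24          # 24 polls x 5s = 2 minutes
SHA_RUNS_EMPTY_COUNT=0
REPORTED_NO_RUNS=0

# on_new_sha: call from detect_new_sha when HEAD moves.
on_new_sha() {
  SHA_RUNS_EMPTY_COUNT=0
  REPORTED_NO_RUNS=0
}

# note_sha_runs <run-count>: call once per poll with the length of SHA_RUNS.
note_sha_runs() {
  if [ "$1" -gt 0 ]; then
    SHA_RUNS_EMPTY_COUNT=0
    return 0
  fi
  SHA_RUNS_EMPTY_COUNT=$((SHA_RUNS_EMPTY_COUNT + 1))
  if [ "$SHA_RUNS_EMPTY_COUNT" -ge "$SHA_RUNS_EMPTY_MAX" ] && [ "$REPORTED_NO_RUNS" -eq 0 ]; then
    notify_webhook "No CI runs visible for ${BRANCH} after 2 min; workflow may be missing."
    REPORTED_NO_RUNS=1
  fi
}
```

A non-empty poll resets only the counter; the one-shot flag is cleared solely by `on_new_sha`, matching the plan's "reset whenever a new SHA is detected".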
What:
- In `fetch_runs_for`, the `gh run list` success path can still yield a non-JSON string if the API returns 200 with HTML (rare, but possible during GitHub maintenance). Pass the value through a `jq` validator:
  ```bash
  if ! echo "$output" | jq empty 2>/dev/null; then
    echo "[]"
    return 0
  fi
  ```
- Same guard inside the merged-path branch on `MAIN_RUNS_JSON`.
- Add a single line right before any `jq` use of `$RUNS_JSON`/`$SHA_RUNS`: `[ -z "$RUNS_JSON" ] && RUNS_JSON='[]'` (defensive and near-free; it also covers empty output, which `jq empty` accepts as valid).
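A sketch folding both guards into one reusable filter. `json_or_empty_array` is a hypothetical name and requires `jq`. It also shows the `: "${VAR:=[]}"` defaulting idiom as an alternative to `[ -z "$RUNS_JSON" ] && RUNS_JSON='[]'`: the latter is fine as a standalone statement, but it returns 1 when the variable is already set, which becomes a trap if it ever ends up as a function's last command under `set -e`.

```shell
#!/usr/bin/env bash
# json_or_empty_array <raw>: print <raw> if it parses as JSON, else "[]".
json_or_empty_array() {
  # `jq empty` fails on a parse error (e.g. an HTML maintenance page) but
  # succeeds on empty input, so the empty case is checked separately.
  if [ -z "$1" ] || ! printf '%s' "$1" | jq empty >/dev/null 2>&1; then
    echo "[]"
  else
    printf '%s\n' "$1"
  fi
}

# Defaulting idiom: assigns [] when RUNS_JSON is unset or empty; the `:`
# builtin always returns 0, so it is safe under set -e anywhere.
RUNS_JSON=""
: "${RUNS_JSON:=[]}"
```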
What:
- Replace
  ```bash
  printf '%s' "$PR_JSON" > "/tmp/ci_watch_pr_${SLOT}"
  ```
  with
  ```bash
  tmp=$(mktemp "/tmp/ci_watch_pr_${SLOT}.XXXXXX")
  printf '%s' "$PR_JSON" > "$tmp"
  mv -f "$tmp" "/tmp/ci_watch_pr_${SLOT}"
  ```
  so `status_line.sh` never reads a half-written file (the temp file lives in the same directory, so `mv` is an atomic rename).
- Same pattern for `/tmp/ci_watch_state_${SLOT}`. State writes happen in multiple paths, so define a small helper, `write_state()`.
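A minimal sketch of the `write_state()` helper, assuming `SLOT` is already set by the watcher. The cleanup-on-failure branch is our addition, so that a failed write does not leave `.XXXXXX` temp files behind in `/tmp`.

```shell
#!/usr/bin/env bash
# write_state <content>: atomically replace /tmp/ci_watch_state_${SLOT}.
# Assumes SLOT ("${BRANCH_KEY}_${SID8}") is already set by the watcher.
write_state() {
  local dest="/tmp/ci_watch_state_${SLOT}" tmp
  # The temp file sits next to the destination, so mv is a same-directory
  # rename: readers see the old content or the new, never a partial write.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if printf '%s' "$1" > "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"   # don't leave .XXXXXX debris in /tmp on failure
  return 1
}
```

One side effect: `mktemp` creates files with mode 0600, so the renamed state file is owner-only. That is fine here, because `status_line.sh` runs as the same user.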
What:
- Replace
  ```bash
  ORG_NAME=$(python3 -c "
  import json, sys
  d = json.load(open('$CLAUDE_JSON'))
  ...
  ")
  ```
  with env-var passing:
  ```bash
  ORG_NAME=$(CLAUDE_JSON="$CLAUDE_JSON" python3 - <<'PY' 2>/dev/null
  import json, os
  try:
      with open(os.environ["CLAUDE_JSON"]) as f:
          d = json.load(f)
      print(d.get("oauthAccount", {}).get("organizationName", ""))
  except Exception:
      print("")
  PY
  )
  ```
  The quoted heredoc delimiter (`'PY'`) prevents shell expansion, and passing the path through the environment prevents injection if `$HOME` ever contains a `'`.
What:
- `json_escape()` is only half dead: its main caller was replaced by `jq -Rs .` on line 129, but the unconditional `printf` for the jq-missing path on line 40 still uses it. Either keep both paths consistent by using `jq` only (the early exit is already gated on jq's presence, so that path can simply emit a hardcoded fallback JSON string), or remove the function and inline a fallback.
- Replace `grep -q "refs/heads/$branch"` with `grep -qF "refs/heads/$branch"` so a branch name containing regex metacharacters (rare but possible: `feat-foo.bar`) doesn't false-match.
- The `git branch -vv` parser's `[[ $line == \** ]] && continue` skips lines starting with `*` (the current branch), but git also prefixes branches checked out in other worktrees with `+`. Skip both with a single `[[ $line =~ ^[*+] ]] && continue`.
- Insert Task 0's session-id capture logic at the very top of the script, BEFORE the existing logic, so even early failures still register the session.
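A self-contained illustration of the two parsing fixes, run against literal samples rather than a real repository (the `git branch -vv` lines below are made up for the demo).

```shell
#!/usr/bin/env bash
# 1) Regex vs fixed-string matching: "." in the branch name matches any char.
ref_line='refs/heads/feat-fooXbar'
branch='feat-foo.bar'
printf '%s\n' "$ref_line" | grep -q "refs/heads/$branch" && regex_hit=1 || regex_hit=0
printf '%s\n' "$ref_line" | grep -qF "refs/heads/$branch" && fixed_hit=1 || fixed_hit=0

# 2) Skip the current branch (*) and branches checked out in other
#    worktrees (+) when walking `git branch -vv`-style output.
sample='* main      abc1234 [origin/main] tip
+ feat/wt   def5678 [origin/feat/wt] wip
  old-topic 0123456 [origin/old-topic: gone] stale'
candidates=()
while IFS= read -r line; do
  [[ $line =~ ^[*+] ]] && continue
  read -r name _ <<< "$line"   # first field is the branch name
  candidates+=("$name")
done <<< "$sample"
```

Note that `-F` fixes the metacharacter problem but still matches substrings (`refs/heads/feat` also matches `refs/heads/feature`); depending on what the grepped output looks like, `-x` or a trailing delimiter in the pattern would make the match exact.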
What:
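- Add an mtime check before trusting the cached PR JSON:
  ```bash
  pr_cache_file="/tmp/ci_watch_pr_${slot}"
  if [ -f "$pr_cache_file" ]; then
    now=$(date +%s)
    # Try GNU stat first: BSD/macOS stat rejects -c without printing to
    # stdout, whereas GNU stat treats -f as "file system status" and would
    # print a multi-line report (then fail on "%m"), corrupting mtime.
    mtime=$(stat -c %Y "$pr_cache_file" 2>/dev/null || stat -f %m "$pr_cache_file" 2>/dev/null || echo 0)
    if [ $((now - mtime)) -gt 600 ]; then
      # Stale: pretend it doesn't exist.
      pr_cache_file=""
    fi
  fi
  ```
- Apply the same age check to `/tmp/ci_watch_state_${slot}`, with a shorter threshold than the PR cache's 600s. Keep it above the longest gap between the watcher's state writes, though: a state file older than the watcher's 5-min health-retry window is definitely orphaned, whereas a 60s cutoff could also hide a live watcher that is mid-retry.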
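A small sketch factoring the age check into reusable helpers, so the PR cache and the state file can share it with different thresholds. `file_age_secs` and `fresh_path` are hypothetical names; the GNU-then-BSD `stat` order is deliberate, and a missing or unreadable file is treated as very old.

```shell
#!/usr/bin/env bash
# file_age_secs <path>: seconds since last modification. Prints a huge
# value if the file is missing or stat fails, so callers treat it as stale.
file_age_secs() {
  local mtime
  # GNU stat first; BSD/macOS stat rejects -c without writing to stdout.
  if ! mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null); then
    echo 999999999
    return 0
  fi
  echo $(( $(date +%s) - mtime ))
}

# fresh_path <path> <max_age_secs>: print <path> only if it exists and is
# no older than max_age_secs; print nothing otherwise.
fresh_path() {
  if [ -f "$1" ] && [ "$(file_age_secs "$1")" -le "$2" ]; then
    printf '%s\n' "$1"
  fi
  return 0
}
```

Usage would look like `pr_cache_file=$(fresh_path "/tmp/ci_watch_pr_${slot}" 600)`, with the state file getting its own threshold.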
What:
- Change
  ```bash
  afplay /System/Library/Sounds/Glass.aiff
  ```
  to
  ```bash
  if command -v afplay >/dev/null 2>&1; then
    afplay /System/Library/Sounds/Glass.aiff </dev/null >/dev/null 2>&1 &
    disown
  fi
  ```
  so a slow audio system doesn't block the Stop hook (which Claude Code waits on synchronously). Redirecting all three streams also stops the backgrounded `afplay` from holding the hook's stdout/stderr pipes open.
- Apply the same pattern in any other script that calls `afplay` directly; run `grep -rn afplay ~/.claude/scripts/` to verify.
- Task 0 is the prerequisite for Tasks 1 and 2 and for the `/ci` skill change.
- Tasks 1 and 3–13 can be done in parallel after Task 0.
- Task 2 depends on Task 0 and Task 1 (it needs `BRANCH_KEY`).
Out of scope:
- Cleanup of orphaned `/tmp/ci_watch_*` files from killed sessions (documented gap; separate sweeper).
- A central registry of live sessions + webhook ports.
- Restructuring `session_start.sh` so it doesn't race when two windows share a base repo.