Skip to content

fix(session): empty Claude command no-ops restart, leaving pane dead#855

Open
JMBattista wants to merge 2 commits intoasheshgoplani:mainfrom
JMBattista:feature/sessions-dispear-on-restart
Open

fix(session): empty Claude command no-ops restart, leaving pane dead#855
JMBattista wants to merge 2 commits intoasheshgoplani:mainfrom
JMBattista:feature/sessions-dispear-on-restart

Conversation

@JMBattista
Copy link
Copy Markdown
Contributor

Summary

Fixes a hard-to-spot restart loop on Claude-tool sessions whose stored Command is empty.

When an Instance row has tool=claude, command="", and tool_data="{}" (no ClaudeSessionID), Restart() calls buildClaudeCommand(i.Command)buildClaudeCommandWithMessage("", ""). The function gates its claude-build branch on baseCommand == "claude"; an empty value falls through to the custom-command branch and returns just the env-export prefix with no executable. The pane runs that, exits immediately, status loops back to error, and pressing R/Enter repeats forever.

Live debug.log signature from the user-reported repro:

export COLORFGBG='15;0' && unset TELEGRAM_STATE_DIR && export AGENTDECK_INSTANCE_ID=9e618f9f-1773801500;

Zero claude invocations.

Fix

internal/session/instance.go — in buildClaudeCommandWithMessage, after the IsClaudeCompatible gate, default an empty baseCommand to "claude" so the Claude-build branch runs. 3 lines.

Tests added (TDD-first per CLAUDE.md)

  • TestBuildClaudeCommand_EmptyBaseCommand_StillEmitsClaudeBinary — pins that buildClaudeCommand("") for tool=claude contains claude and --session-id, and does not end at the AGENTDECK_INSTANCE_ID export. RED before the fix, GREEN after.
  • TestRestart_ShellSession_PostRestartIsHealthy + TestRestart_ShellSession_AdoptsLiveTmuxOnNameMismatch — guard the broader Shell-restart contract (status not-error, single live tmux session, no orphan, sweep-by-instance-id heals stale name).
  • TestRestartMsg_DoesNotPrunePeerErrorSessions_Unfiltered + _FilteredToErrors — guard that a successful sessionRestartedMsg does not prune peer error rows from flatItems (the user-reported intermittent symptom; today green, codified to catch regressions).

Verification

  • go test -run TestPersistence_ ./internal/session/... -race -count=1 — green
  • go test ./internal/session/... ./internal/ui/... -race -count=1 — green (modulo two pre-existing flakes called out below)
  • Manually reproduced against the patched binary on the user's busted Smithy row: pressing R now spawns a real Claude process, status flips out of error, peers remain visible.

Out of scope

Two pre-existing tests (TestStatusCycle_ShellSessionWithCommand, TestLifecycle_StoppedRestartedRunningError) were silently skipping before this branch due to skipIfNoTmuxServer requiring a non-bootstrap session. They now run and are flaky under concurrent tmux load. Not caused by this fix's logic — recommend a follow-up PR.

Test plan

  • Mandated session-persistence suite green
  • New regression tests RED → GREEN
  • Manual: R on a busted tool=claude, command="" row launches Claude and clears error
  • Manual: peer error sessions remain visible post-restart
  • Reviewer: run bash scripts/verify-session-persistence.sh on a Linux+systemd host before merge

Pins the contracts the user noticed when filing
feature/sessions-dispear-on-restart:

  TestRestart_ShellSession_PostRestartIsHealthy — basic R on a Shell
    session yields a non-error status and exactly one live tmux session
    (no orphans).

  TestRestart_ShellSession_AdoptsLiveTmuxOnNameMismatch — when the
    in-memory tmuxSession.Name no longer matches any live tmux session
    but a session of the same prefix is alive, Restart() must heal
    cleanly (current behavior: sweepDuplicateToolSessions reaps the
    orphan via AGENTDECK_INSTANCE_ID; this test pins that).

  TestRestartMsg_DoesNotPrunePeerErrorSessions_Unfiltered /
  TestRestartMsg_DoesNotPrunePeerErrorSessions_FilteredToErrors —
    sessionRestartedMsg success path must not remove peer sessions
    (including other error-state ones) from h.flatItems.

All four pass against the current code; they exist as regression guards
so a future change that breaks any of these contracts is caught.
…ually runs the binary

Repro: an Instance row with tool=claude, Command="", and no
ClaudeSessionID (e.g. a session whose tool_data lost its session
fingerprint) hits the Restart() fallback path which selects
buildClaudeCommand(i.Command) → buildClaudeCommandWithMessage("", "").
The latter only enters its claude-build branch when baseCommand ==
"claude" and otherwise falls through to the custom-command branch,
returning JUST the env prefix:

  export ... && unset TELEGRAM_STATE_DIR && export
  AGENTDECK_INSTANCE_ID=<id>;

The pane runs that, has nothing executable to launch, exits, status
flips back to error. Pressing R or Enter loops indefinitely with
no surfaced error — captured live from a user's debug.log on
2026-04-27 (feature/sessions-dispear-on-restart).

Fix: when the tool is Claude-compatible and baseCommand is empty,
default it to "claude" so the existing claude-build branch runs
and emits a real `claude --session-id <uuid>` invocation.

Pinned by TestBuildClaudeCommand_EmptyBaseCommand_StillEmitsClaudeBinary
in the same package, which was RED before this change.
Copilot AI review requested due to automatic review settings May 4, 2026 03:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a restart loop where Claude-compatible sessions with an empty stored Command generate an env-only shell line (no executable), causing the tmux pane to exit immediately and the session to remain “dead” on repeated restarts.

Changes:

  • Default empty Claude baseCommand to "claude" inside buildClaudeCommandWithMessage so Claude’s command-building path always emits a runnable binary invocation.
  • Add a regression test to pin buildClaudeCommand("") behavior for Claude-compatible sessions.
  • Add restart regression tests for shell sessions and UI regression tests to ensure peer sessions aren’t pruned from the TUI list after a successful restart message.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
internal/session/instance.go Ensures empty Claude base command defaults to "claude" to avoid env-only no-op restart commands.
internal/session/empty_command_claude_restart_test.go Adds regression coverage for the empty-command Claude restart-loop failure mode.
internal/session/shell_restart_test.go Adds tmux-backed integration tests for shell Restart() health and “adopt live session” behavior.
internal/ui/sessions_disappear_on_restart_test.go Adds UI-level regression tests asserting peer sessions remain visible after sessionRestartedMsg.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// never assigned an explicit Command) otherwise falls all the way
// through to the custom-command branch and returns just the env
// prefix — pane runs `export ...;` and exits, status loops to error.
// See feature/sessions-dispear-on-restart, Smithy repro 2026-04-27.
Comment on lines +28 to +42
home := NewHome()
home.storage = nil // prevent disk writes during saveInstances
home.statusFilter = ""

smithy := newShellInstanceForRestartTest("smithy-id", "Smithy", session.StatusError)
foo := newShellInstanceForRestartTest("foo-id", "Foo", session.StatusError)
bar := newShellInstanceForRestartTest("bar-id", "Bar", session.StatusRunning)

home.instances = []*session.Instance{smithy, foo, bar}
home.instanceByID = map[string]*session.Instance{
smithy.ID: smithy,
foo.ID: foo,
bar.ID: bar,
}
home.groupTree = session.NewGroupTree(home.instances)
Comment on lines +19 to +21
// All tests run pure in-memory: no SQLite, no tmux, no PTY. h.storage is
// nilled out so saveInstances() is a no-op and we can drive the message
// handler directly.
)

// Tests in this file are regression guards for the bug that owns this branch
// (feature/sessions-dispear-on-restart). User report: pressing R on a busted
)

// Tests in this file pin the regression that owns this branch
// (feature/sessions-dispear-on-restart). User report: pressing R on a
}

matches := listTmuxSessionsWithPrefix(tmux.SessionPrefix + sanitizeTitleForPrefix(title) + "_")
_ = liveName

// TestBuildClaudeCommand_EmptyBaseCommand_StillEmitsClaudeBinary pins the
// regression behind the user-reported "R/Enter loop forever" symptom on
// feature/sessions-dispear-on-restart. Repro state, captured live from
Comment on lines +593 to +600
// Default empty baseCommand to "claude" so the Claude-build branch below
// runs. An Instance row with tool=claude and an empty Command field
// (e.g. a session whose tool_data lost its ClaudeSessionID and was
// never assigned an explicit Command) otherwise falls all the way
// through to the custom-command branch and returns just the env
// prefix — pane runs `export ...;` and exits, status loops to error.
// See feature/sessions-dispear-on-restart, Smithy repro 2026-04-27.
if baseCommand == "" {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants