|
| 1 | +# Research: Send() blocks indefinitely when ReadyForInitialPrompt returns false (#209) |
| 2 | + |
| 3 | +## Problem Context |
| 4 | + |
| 5 | +When `POST /message` with `type: "user"` is called, `Send()` hangs indefinitely if `ReadyForInitialPrompt` never returns `true` for the current screen content, even though `GET /status` returns `"stable"`. |
| 6 | + |
| 7 | +**Root cause chain:** |
| 8 | + |
| 9 | +1. `statusLocked()` does NOT check `initialPromptReady`. It returns `"stable"` when the screen is stable and the queue is empty. |
| 10 | +2. `Send()` checks `statusLocked() != ConversationStatusStable` — this passes, so the message is enqueued. |
| 11 | +3. The send loop blocks on `<-c.stableSignal`. |
| 12 | +4. The snapshot loop only fires `stableSignal` when `c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked()`. |
| 13 | +5. Since `initialPromptReady` is `false` and never becomes `true`, the signal never fires. |
| 14 | +6. `Send()` blocks forever on `<-errCh`. |
| 15 | + |
| 16 | +**Real-world trigger:** Claude Code v2.1.87 shows a theme selection onboarding screen using `╌╌╌` (U+254C, BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL) instead of `───` (U+2500, BOX DRAWINGS LIGHT HORIZONTAL). The message box detection fails because `findGreaterThanMessageBox` / `findGenericSlimMessageBox` look for `───────────────` which isn't present. `ReadyForInitialPrompt` stays `false`. |
| 17 | + |
| 18 | +## Code Analysis |
| 19 | + |
| 20 | +### `statusLocked()` — `lib/screentracker/pty_conversation.go:505-537` |
| 21 | + |
| 22 | +```go |
| 23 | +func (c *PTYConversation) statusLocked() ConversationStatus { |
| 24 | + // ...sanity checks... |
| 25 | + snapshots := c.snapshotBuffer.GetAll() |
| 26 | + if len(c.messages) > 0 && c.messages[len(c.messages)-1].Role == ConversationRoleUser { |
| 27 | + return ConversationStatusChanging |
| 28 | + } |
| 29 | + if len(snapshots) != c.stableSnapshotsThreshold { |
| 30 | + return ConversationStatusInitializing |
| 31 | + } |
| 32 | + if !c.isScreenStableLocked() { |
| 33 | + return ConversationStatusChanging |
| 34 | + } |
| 35 | + // Handle initial prompt readiness: report "changing" until the queue is drained |
| 36 | + if len(c.outboundQueue) > 0 || c.sendingMessage { |
| 37 | + return ConversationStatusChanging |
| 38 | + } |
| 39 | + return ConversationStatusStable |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +**Key observation:** `initialPromptReady` is never consulted. The status can be `"stable"` even when `initialPromptReady` is `false`. |
| 44 | + |
| 45 | +### `Send()` — `lib/screentracker/pty_conversation.go:358-378` |
| 46 | + |
| 47 | +```go |
| 48 | +func (c *PTYConversation) Send(messageParts ...MessagePart) error { |
| 49 | + // ...validation... |
| 50 | + c.lock.Lock() |
| 51 | + if c.statusLocked() != ConversationStatusStable { |
| 52 | + c.lock.Unlock() |
| 53 | + return ErrMessageValidationChanging |
| 54 | + } |
| 55 | + c.lock.Unlock() |
| 56 | + errCh := make(chan error, 1) |
| 57 | + c.outboundQueue <- outboundMessage{parts: messageParts, errCh: errCh} |
| 58 | + return <-errCh // blocks forever if stableSignal never fires |
| 59 | +} |
| 60 | +``` |
| 61 | + |
| 62 | +### Snapshot loop signal logic — `lib/screentracker/pty_conversation.go:229-236` |
| 63 | + |
| 64 | +```go |
| 65 | +if c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked() { |
| 66 | + select { |
| 67 | + case c.stableSignal <- struct{}{}: |
| 68 | + c.sendingMessage = true |
| 69 | + default: |
| 70 | + } |
| 71 | +} |
| 72 | +``` |
| 73 | + |
| 74 | +### `findGreaterThanMessageBox` — `lib/msgfmt/message_box.go:11-22` |
| 75 | + |
| 76 | +```go |
| 77 | +func findGreaterThanMessageBox(lines []string) int { |
| 78 | + for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- { |
| 79 | + if strings.Contains(lines[i], ">") { |
| 80 | + if i > 0 && strings.Contains(lines[i-1], "───────────────") { |
| 81 | + return i - 1 |
| 82 | + } |
| 83 | + return i |
| 84 | + } |
| 85 | + } |
| 86 | + return -1 |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +Only checks for `───────────────` (U+2500). Does not handle `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C). |
| 91 | + |
| 92 | +### `findGenericSlimMessageBox` — `lib/msgfmt/message_box.go:28-38` |
| 93 | + |
| 94 | +```go |
| 95 | +func findGenericSlimMessageBox(lines []string) int { |
| 96 | + for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- { |
| 97 | + if strings.Contains(lines[i], "───────────────") && |
| 98 | + (strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "│") || strings.Contains(lines[i+1], "❯")) && |
| 99 | + strings.Contains(lines[i+2], "───────────────") { |
| 100 | + return i |
| 101 | + } |
| 102 | + } |
| 103 | + return -1 |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +Same limitation — only checks for U+2500. |
| 108 | + |
| 109 | +### `isGenericAgentReadyForInitialPrompt` — `lib/msgfmt/agent_readiness.go:34-38` |
| 110 | + |
| 111 | +```go |
| 112 | +func isGenericAgentReadyForInitialPrompt(message string) bool { |
| 113 | + message = trimEmptyLines(message) |
| 114 | + messageWithoutInputBox := removeMessageBox(message) |
| 115 | + return len(messageWithoutInputBox) != len(message) |
| 116 | +} |
| 117 | +``` |
| 118 | + |
| 119 | +Returns `true` only if `removeMessageBox` actually removes something. If neither message box detector matches, the message is unchanged and readiness returns `false`. |
| 120 | + |
| 121 | +### Existing test: "agent not ready - status is stable" — `pty_conversation_test.go:~1082` |
| 122 | + |
| 123 | +The existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` already asserts that status is `stable` when the agent is not ready. This is the **current expected behavior** when there IS an initial prompt configured but the agent hasn't become ready yet. |
| 124 | + |
| 125 | +However, this test ALSO has `InitialPrompt` configured. The issue scenario is different: `Send()` is called by the user (not as initial prompt) while `initialPromptReady` is `false`. |
| 126 | + |
| 127 | +### Existing test: "no initial prompt - normal status logic applies" — `pty_conversation_test.go:~1160` |
| 128 | + |
| 129 | +When `ReadyForInitialPrompt` always returns `false` AND there is no `InitialPrompt` configured, status is currently `stable`. From a pure screen-stability perspective this is correct — the screen IS stable. |
| 130 | + |
| 131 | +However, `Send()` will still block in this state because `stableSignal` requires `initialPromptReady` to fire. This means status says `stable` but the system cannot actually process user messages — an inconsistency that is the root cause of the bug. |
| 132 | + |
| 133 | +### Existing test: "no initial prompt configured - normal status logic applies" — `pty_conversation_test.go:~1207` |
| 134 | + |
| 135 | +When `ReadyForInitialPrompt` is NOT set (nil → defaults to `return true`) and no `InitialPrompt` is configured, status correctly reaches `stable`. This test is UNAFFECTED by the fix because `initialPromptReady` becomes `true` on the first snapshot tick via the default function. |
| 136 | + |
| 137 | +## Approaches |
| 138 | + |
| 139 | +### Approach A: Guard `statusLocked()` with `initialPromptReady` check |
| 140 | + |
| 141 | +**Description:** When `initialPromptReady` is `false`, return `ConversationStatusChanging` (or a new status) from `statusLocked()`. This prevents `Send()` from enqueueing and returns `ErrMessageValidationChanging` immediately. |
| 142 | + |
| 143 | +**Precedent:** `statusLocked()` already returns `ConversationStatusChanging` when there are items in the outbound queue or a message is being sent. This follows the same pattern. |
| 144 | + |
| 145 | +**Strongest argument for:** Fail-fast. Any future detection failures fail immediately with a clear error instead of hanging. This is a general safety net. |
| 146 | + |
| 147 | +**Strongest argument against:** Changes the public status semantics. Currently, `statusLocked()` reports on screen state. Adding `initialPromptReady` couples it to agent detection. Also, callers currently expect `"stable"` to mean "screen is stable" — now it would also mean "agent detection succeeded". This could break the existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` which explicitly asserts status is `stable` when readiness is `false`. |
| 148 | + |
| 149 | +**Consideration:** The `stableSignal` only gates the signal when `initialPromptReady` is false. But this is orthogonal to the **user-initiated** `Send()` path. The initial prompt path and the user-message path both go through the same queue and same signal. The real issue is that `initialPromptReady` gates the signal for ALL queued messages, not just the initial prompt. |
| 150 | + |
| 151 | +**Nuance:** We need to be careful about when `ReadyForInitialPrompt` is `nil` (defaults to `func(string) bool { return true }`). When there's no readiness function, `initialPromptReady` becomes `true` on the first snapshot tick. This won't cause regressions. |
| 152 | + |
| 153 | +### Approach B: Decouple `stableSignal` from `initialPromptReady` for user-sent messages |
| 154 | + |
| 155 | +**Description:** Only gate the `stableSignal` on `initialPromptReady` for the initial prompt. For user-sent messages, fire the signal based purely on screen stability. This could be done by tracking whether the queued message is the initial prompt or a user message. |
| 156 | + |
| 157 | +**Strongest argument for:** Precisely targets the bug without changing status semantics. The initial prompt legitimately needs readiness gating; user messages do not. |
| 158 | + |
| 159 | +**Strongest argument against:** Adds complexity to the queue/signal mechanism. The `outboundQueue` currently treats all messages uniformly. Adding message-type awareness complicates the design. Also, if the agent truly isn't ready, sending a user message to it may not work correctly anyway. |
| 160 | + |
| 161 | +**What this makes easy:** Preserves existing status semantics and test assertions. |
| 162 | +**What this makes hard:** Complicating the send path and potentially allowing messages to be sent to an unready agent. |
| 163 | + |
| 164 | +### Approach C: Improve message box detection to handle `╌` (U+254C) |
| 165 | + |
| 166 | +**Description:** Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` as an alternative pattern in `findGreaterThanMessageBox` and `findGenericSlimMessageBox`. |
| 167 | + |
| 168 | +**Strongest argument for:** Fixes the specific real-world trigger. The onboarding screen for Claude Code v2.1.87 uses this character. |
| 169 | + |
| 170 | +**Strongest argument against:** Only fixes this one variant. Future Claude Code versions might use yet another character. Does not prevent the indefinite hang for other detection failures. |
| 171 | + |
| 172 | +**What this makes easy:** Simple, targeted fix. |
| 173 | +**What this makes hard:** Doesn't address the systemic issue. |
| 174 | + |
| 175 | +### Approach D: Both A and C (recommended by the issue) |
| 176 | + |
| 177 | +**Description:** Fix the detection for this specific Claude Code version (C) AND add the `statusLocked()` guard (A) so future detection failures fail fast. |
| 178 | + |
| 179 | +**Strongest argument for:** Belt and suspenders. Fixes the immediate problem and prevents the class of bugs. |
| 180 | + |
| 181 | +**Strongest argument against:** Status semantics change (same as A). However, the issue author explicitly recommends this. |
| 182 | + |
| 183 | +## Decisions |
| 184 | + |
| 185 | +### Decision 1: Guard `statusLocked()` with `initialPromptReady` AND fix detection |
| 186 | +- **Question:** How to prevent `Send()` from hanging when readiness detection fails? |
| 187 | +- **Options:** (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C |
| 188 | +- **Chosen:** D — both guard and detection |
| 189 | +- **Classification:** Agent-recommended (issue author also recommends option D) |
| 190 | +- **Reasoning:** The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this. The detection fix handles the immediate trigger; the guard prevents the class of bugs. |
| 191 | + |
| 192 | +### Decision 2: Return `ConversationStatusChanging` when `initialPromptReady` is false |
| 193 | +- **Question:** What status to return when readiness is false? |
| 194 | +- **Options:** `changing` vs `initializing` |
| 195 | +- **Chosen:** `changing` |
| 196 | +- **Classification:** Agent-recommended |
| 197 | +- **Reasoning:** The snapshot buffer IS full (past the `initializing` phase). `changing` matches the error `ErrMessageValidationChanging`. |
| 198 | + |
| 199 | +### Decision 3: Apply the guard unconditionally (not only when `InitialPrompt` is configured) |
| 200 | +- **Question:** Should the `initialPromptReady` guard only apply when `InitialPrompt` is configured? (Open Question 2) |
| 201 | +- **Options:** Conditional (only when InitialPrompt set) vs unconditional |
| 202 | +- **Chosen:** Unconditional |
| 203 | +- **Classification:** Agent-recommended |
| 204 | +- **Reasoning:** The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could transition to `stable`. So the unconditional guard causes no regressions for the default case. |
| 205 | + |
| 206 | +### Decision 4: Add U+254C only to detection |
| 207 | +- **Question:** Which additional Unicode box-drawing characters to support? |
| 208 | +- **Options:** Just U+254C, also U+254D, broad set |
| 209 | +- **Chosen:** U+254C only |
| 210 | +- **Classification:** Agent-recommended |
| 211 | +- **Reasoning:** This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters. |
| 212 | + |
| 213 | +## Open Questions |
| 214 | + |
| 215 | +All open questions have been resolved — see Decisions section above. |
0 commit comments