Skip to content

Commit e80b116

Browse files
committed
docs: add research and plan for issue #209
1 parent 761b7d5 commit e80b116

2 files changed

Lines changed: 269 additions & 0 deletions

File tree

docs/plans/issue-209-research.md

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
# Research: Send() blocks indefinitely when ReadyForInitialPrompt returns false (#209)
2+
3+
## Problem Context
4+
5+
When `POST /message` with `type: "user"` is called, `Send()` hangs indefinitely if `ReadyForInitialPrompt` never returns `true` for the current screen content, even though `GET /status` returns `"stable"`.
6+
7+
**Root cause chain:**
8+
9+
1. `statusLocked()` does NOT check `initialPromptReady`. It returns `"stable"` when the screen is stable and the queue is empty.
10+
2. `Send()` checks `statusLocked() != ConversationStatusStable` — this passes, so the message is enqueued.
11+
3. The send loop blocks on `<-c.stableSignal`.
12+
4. The snapshot loop only fires `stableSignal` when `c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked()`.
13+
5. Since `initialPromptReady` is `false` and never becomes `true`, the signal never fires.
14+
6. `Send()` blocks forever on `<-errCh`.
15+
16+
**Real-world trigger:** Claude Code v2.1.87 shows a theme selection onboarding screen using `╌╌╌` (U+254C, BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL) instead of `───` (U+2500, BOX DRAWINGS LIGHT HORIZONTAL). The message box detection fails because `findGreaterThanMessageBox` / `findGenericSlimMessageBox` look for `───────────────` which isn't present. `ReadyForInitialPrompt` stays `false`.
17+
18+
## Code Analysis
19+
20+
### `statusLocked()``lib/screentracker/pty_conversation.go:505-537`
21+
22+
```go
23+
func (c *PTYConversation) statusLocked() ConversationStatus {
24+
// ...sanity checks...
25+
snapshots := c.snapshotBuffer.GetAll()
26+
if len(c.messages) > 0 && c.messages[len(c.messages)-1].Role == ConversationRoleUser {
27+
return ConversationStatusChanging
28+
}
29+
if len(snapshots) != c.stableSnapshotsThreshold {
30+
return ConversationStatusInitializing
31+
}
32+
if !c.isScreenStableLocked() {
33+
return ConversationStatusChanging
34+
}
35+
// Handle initial prompt readiness: report "changing" until the queue is drained
36+
if len(c.outboundQueue) > 0 || c.sendingMessage {
37+
return ConversationStatusChanging
38+
}
39+
return ConversationStatusStable
40+
}
41+
```
42+
43+
**Key observation:** `initialPromptReady` is never consulted. The status can be `"stable"` even when `initialPromptReady` is `false`.
44+
45+
### `Send()``lib/screentracker/pty_conversation.go:358-378`
46+
47+
```go
48+
func (c *PTYConversation) Send(messageParts ...MessagePart) error {
49+
// ...validation...
50+
c.lock.Lock()
51+
if c.statusLocked() != ConversationStatusStable {
52+
c.lock.Unlock()
53+
return ErrMessageValidationChanging
54+
}
55+
c.lock.Unlock()
56+
errCh := make(chan error, 1)
57+
c.outboundQueue <- outboundMessage{parts: messageParts, errCh: errCh}
58+
return <-errCh // blocks forever if stableSignal never fires
59+
}
60+
```
61+
62+
### Snapshot loop signal logic — `lib/screentracker/pty_conversation.go:229-236`
63+
64+
```go
65+
if c.initialPromptReady && len(c.outboundQueue) > 0 && c.isScreenStableLocked() {
66+
select {
67+
case c.stableSignal <- struct{}{}:
68+
c.sendingMessage = true
69+
default:
70+
}
71+
}
72+
```
73+
74+
### `findGreaterThanMessageBox``lib/msgfmt/message_box.go:11-22`
75+
76+
```go
77+
func findGreaterThanMessageBox(lines []string) int {
78+
for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- {
79+
if strings.Contains(lines[i], ">") {
80+
if i > 0 && strings.Contains(lines[i-1], "───────────────") {
81+
return i - 1
82+
}
83+
return i
84+
}
85+
}
86+
return -1
87+
}
88+
```
89+
90+
Only checks for `───────────────` (U+2500). Does not handle `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C).
91+
92+
### `findGenericSlimMessageBox``lib/msgfmt/message_box.go:28-38`
93+
94+
```go
95+
func findGenericSlimMessageBox(lines []string) int {
96+
for i := len(lines) - 3; i >= max(len(lines)-9, 0); i-- {
97+
if strings.Contains(lines[i], "───────────────") &&
98+
(strings.Contains(lines[i+1], "|") || strings.Contains(lines[i+1], "") || strings.Contains(lines[i+1], "")) &&
99+
strings.Contains(lines[i+2], "───────────────") {
100+
return i
101+
}
102+
}
103+
return -1
104+
}
105+
```
106+
107+
Same limitation — only checks for U+2500.
108+
109+
### `isGenericAgentReadyForInitialPrompt``lib/msgfmt/agent_readiness.go:34-38`
110+
111+
```go
112+
func isGenericAgentReadyForInitialPrompt(message string) bool {
113+
message = trimEmptyLines(message)
114+
messageWithoutInputBox := removeMessageBox(message)
115+
return len(messageWithoutInputBox) != len(message)
116+
}
117+
```
118+
119+
Returns `true` only if `removeMessageBox` actually removes something. If neither message box detector matches, the message is unchanged and readiness returns `false`.
120+
121+
### Existing test: "agent not ready - status is stable" — `pty_conversation_test.go:~1082`
122+
123+
The existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` already asserts that status is `stable` when the agent is not ready. This is the **current expected behavior** when there IS an initial prompt configured but the agent hasn't become ready yet.
124+
125+
However, this test ALSO has `InitialPrompt` configured. The issue scenario is different: `Send()` is called by the user (not as initial prompt) while `initialPromptReady` is `false`.
126+
127+
### Existing test: "no initial prompt - normal status logic applies" — `pty_conversation_test.go:~1160`
128+
129+
When `ReadyForInitialPrompt` always returns `false` AND there is no `InitialPrompt` configured, status is currently `stable`. From a pure screen-stability perspective this is correct — the screen IS stable.
130+
131+
However, `Send()` will still block in this state because `stableSignal` requires `initialPromptReady` to fire. This means status says `stable` but the system cannot actually process user messages — an inconsistency that is the root cause of the bug.
132+
133+
### Existing test: "no initial prompt configured - normal status logic applies" — `pty_conversation_test.go:~1207`
134+
135+
When `ReadyForInitialPrompt` is NOT set (nil → defaults to `return true`) and no `InitialPrompt` is configured, status correctly reaches `stable`. This test is UNAFFECTED by the fix because `initialPromptReady` becomes `true` on the first snapshot tick via the default function.
136+
137+
## Approaches
138+
139+
### Approach A: Guard `statusLocked()` with `initialPromptReady` check
140+
141+
**Description:** When `initialPromptReady` is `false`, return `ConversationStatusChanging` (or a new status) from `statusLocked()`. This prevents `Send()` from enqueueing and returns `ErrMessageValidationChanging` immediately.
142+
143+
**Precedent:** `statusLocked()` already returns `ConversationStatusChanging` when there are items in the outbound queue or a message is being sent. This follows the same pattern.
144+
145+
**Strongest argument for:** Fail-fast. Any future detection failures fail immediately with a clear error instead of hanging. This is a general safety net.
146+
147+
**Strongest argument against:** Changes the public status semantics. Currently, `statusLocked()` reports on screen state. Adding `initialPromptReady` couples it to agent detection. Also, callers currently expect `"stable"` to mean "screen is stable" — now it would also mean "agent detection succeeded". This could break the existing test `TestInitialPromptReadiness/"agent not ready - status is stable until agent becomes ready"` which explicitly asserts status is `stable` when readiness is `false`.
148+
149+
**Consideration:** The `stableSignal` only gates the signal when `initialPromptReady` is false. But this is orthogonal to the **user-initiated** `Send()` path. The initial prompt path and the user-message path both go through the same queue and same signal. The real issue is that `initialPromptReady` gates the signal for ALL queued messages, not just the initial prompt.
150+
151+
**Nuance:** We need to be careful about when `ReadyForInitialPrompt` is `nil` (defaults to `func(string) bool { return true }`). When there's no readiness function, `initialPromptReady` becomes `true` on the first snapshot tick. This won't cause regressions.
152+
153+
### Approach B: Decouple `stableSignal` from `initialPromptReady` for user-sent messages
154+
155+
**Description:** Only gate the `stableSignal` on `initialPromptReady` for the initial prompt. For user-sent messages, fire the signal based purely on screen stability. This could be done by tracking whether the queued message is the initial prompt or a user message.
156+
157+
**Strongest argument for:** Precisely targets the bug without changing status semantics. The initial prompt legitimately needs readiness gating; user messages do not.
158+
159+
**Strongest argument against:** Adds complexity to the queue/signal mechanism. The `outboundQueue` currently treats all messages uniformly. Adding message-type awareness complicates the design. Also, if the agent truly isn't ready, sending a user message to it may not work correctly anyway.
160+
161+
**What this makes easy:** Preserves existing status semantics and test assertions.
162+
**What this makes hard:** Complicating the send path and potentially allowing messages to be sent to an unready agent.
163+
164+
### Approach C: Improve message box detection to handle `` (U+254C)
165+
166+
**Description:** Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` as an alternative pattern in `findGreaterThanMessageBox` and `findGenericSlimMessageBox`.
167+
168+
**Strongest argument for:** Fixes the specific real-world trigger. The onboarding screen for Claude Code v2.1.87 uses this character.
169+
170+
**Strongest argument against:** Only fixes this one variant. Future Claude Code versions might use yet another character. Does not prevent the indefinite hang for other detection failures.
171+
172+
**What this makes easy:** Simple, targeted fix.
173+
**What this makes hard:** Doesn't address the systemic issue.
174+
175+
### Approach D: Both A and C (recommended by the issue)
176+
177+
**Description:** Fix the detection for this specific Claude Code version (C) AND add the `statusLocked()` guard (A) so future detection failures fail fast.
178+
179+
**Strongest argument for:** Belt and suspenders. Fixes the immediate problem and prevents the class of bugs.
180+
181+
**Strongest argument against:** Status semantics change (same as A). However, the issue author explicitly recommends this.
182+
183+
## Decisions
184+
185+
### Decision 1: Guard `statusLocked()` with `initialPromptReady` AND fix detection
186+
- **Question:** How to prevent `Send()` from hanging when readiness detection fails?
187+
- **Options:** (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C
188+
- **Chosen:** D — both guard and detection
189+
- **Classification:** Agent-recommended (issue author also recommends option D)
190+
- **Reasoning:** The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this. The detection fix handles the immediate trigger; the guard prevents the class of bugs.
191+
192+
### Decision 2: Return `ConversationStatusChanging` when `initialPromptReady` is false
193+
- **Question:** What status to return when readiness is false?
194+
- **Options:** `changing` vs `initializing`
195+
- **Chosen:** `changing`
196+
- **Classification:** Agent-recommended
197+
- **Reasoning:** The snapshot buffer IS full (past the `initializing` phase). `changing` matches the error `ErrMessageValidationChanging`.
198+
199+
### Decision 3: Apply the guard unconditionally (not only when `InitialPrompt` is configured)
200+
- **Question:** Should the `initialPromptReady` guard only apply when `InitialPrompt` is configured? (Open Question 2)
201+
- **Options:** Conditional (only when InitialPrompt set) vs unconditional
202+
- **Chosen:** Unconditional
203+
- **Classification:** Agent-recommended
204+
- **Reasoning:** The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could transition to `stable`. So the unconditional guard causes no regressions for the default case.
205+
206+
### Decision 4: Add U+254C only to detection
207+
- **Question:** Which additional Unicode box-drawing characters to support?
208+
- **Options:** Just U+254C, also U+254D, broad set
209+
- **Chosen:** U+254C only
210+
- **Classification:** Agent-recommended
211+
- **Reasoning:** This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters.
212+
213+
## Open Questions
214+
215+
All open questions have been resolved — see Decisions section above.

docs/plans/issue-209.md

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
# Plan: Fix Send() blocking indefinitely when ReadyForInitialPrompt returns false (#209)
2+
3+
Research: [issue-209-research.md](./issue-209-research.md)
4+
5+
Stacked on: [PR #208](https://github.com/coder/agentapi/pull/208) (`fix/write-stabilize-non-fatal-phase1`)
6+
7+
## Problem
8+
9+
`Send()` hangs indefinitely when `ReadyForInitialPrompt` never returns `true`, even though `Status()` reports `"stable"`. This is because `statusLocked()` doesn't check `initialPromptReady`, allowing `Send()` to enqueue messages, but the `stableSignal` (which the send loop waits on) requires `initialPromptReady == true`. If readiness never arrives, the signal never fires and `Send()` blocks forever. (See research: Problem Context)
10+
11+
The real-world trigger is Claude Code v2.1.87's onboarding screen using `` (U+254C) instead of `` (U+2500) in its box-drawing characters, causing message box detection to fail. (See research: Code Analysis — findGreaterThanMessageBox)
12+
13+
## Decisions
14+
15+
1. **Guard `statusLocked()` with `initialPromptReady` check AND fix detection.**
16+
- Options: (A) guard statusLocked only, (B) decouple stableSignal, (C) fix detection only, (D) both A+C.
17+
- Chosen: D — both guard and detection.
18+
- Classification: Agent-recommended (issue author also recommends option D).
19+
- Reasoning: The `stableSignal` gates ALL outbound messages on `initialPromptReady`. `statusLocked()` must reflect this — otherwise status says "stable" but the system cannot process messages. The detection fix handles the immediate trigger; the guard prevents the class of bugs.
20+
21+
2. **Return `ConversationStatusChanging` (not `Initializing`) when `initialPromptReady` is false.**
22+
- Options: `changing` vs `initializing`.
23+
- Chosen: `changing`.
24+
- Classification: Agent-recommended.
25+
- Reasoning: The snapshot buffer IS full (past the `initializing` phase). The error `ErrMessageValidationChanging` says "message can only be sent when the agent is waiting for user input" — which is semantically correct when readiness hasn't been detected.
26+
27+
3. **Apply the guard unconditionally (not only when `InitialPrompt` is configured).**
28+
- Options: Conditional (only when InitialPrompt set) vs unconditional.
29+
- Chosen: Unconditional.
30+
- Classification: Agent-recommended.
31+
- Reasoning: The `stableSignal` gates on `initialPromptReady` for ALL queued messages, not just the initial prompt. If `initialPromptReady` is false and a user calls `Send()`, the message hangs regardless of whether `InitialPrompt` is configured. Status must reflect actual send capability. When `ReadyForInitialPrompt` is nil (the default), it auto-returns `true` and `initialPromptReady` becomes `true` on the first snapshot tick — before status could ever transition to `stable`. So the unconditional guard causes no regressions for the default case. (See research: Approach A — Nuance)
32+
33+
4. **Add `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` (U+254C repeated) as an alternative pattern in message box detection.**
34+
- Options: just U+254C, also U+254D, broad set of horizontal box-drawing characters.
35+
- Chosen: U+254C only.
36+
- Classification: Agent-recommended.
37+
- Reasoning: This is the specific character seen in the wild. The `statusLocked()` guard provides the safety net for future unknown characters. Adding a broad set risks false positives.
38+
39+
5. **Update existing tests that assert `stable` when `initialPromptReady` is false.**
40+
- The tests `"agent not ready - status is stable until agent becomes ready"` and `"no initial prompt - normal status logic applies"` currently assert `stable` when readiness is false. The research notes the second test's assertion was correct from a pure screen-stability perspective, but is inconsistent with `Send()` behavior: `stableSignal` gates on `initialPromptReady` for ALL messages, so `Send()` would hang despite `stable` status. The status must reflect actual send capability.
41+
- The test `"no initial prompt configured - normal status logic applies"` is **unaffected** because it doesn't set `ReadyForInitialPrompt`, so the default (`return true`) applies and `initialPromptReady` becomes `true` before stability is reached.
42+
- Classification: Agent-recommended.
43+
44+
## Implementation Flow
45+
46+
1. **Message box detection** — Extend both message box detection functions to also match `╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌` patterns. Add a testdata fixture for the Claude onboarding screen with `` characters. Run readiness tests to verify.
47+
48+
2. **Status guard** — Add a check in the status logic: when `initialPromptReady` is false and the screen is otherwise stable, return `changing` instead of `stable`. This prevents `Send()` from enqueueing messages that can never be processed.
49+
50+
3. **Update existing tests** — Fix the two tests that assert `stable` when readiness is false. Update them to expect `changing`. Confirm the third "no initial prompt configured" test is unaffected.
51+
52+
4. **Add reproducing test** — Add a test that demonstrates `Send()` returns an error instead of hanging when `initialPromptReady` is false.
53+
54+
5. **Run full test suite** — Verify no regressions.

0 commit comments

Comments
 (0)