Skip to content

Fix #278: bound the OpenAI-compatible ESC-cancel queue#310

Open
ericleepi314 wants to merge 1 commit into
fix/issue-277-mcp-abortfrom
fix/issue-278-bounded-cancel-queue
Open

Fix #278: bound the OpenAI-compatible ESC-cancel queue#310
ericleepi314 wants to merge 1 commit into
fix/issue-277-mcp-abortfrom
fix/issue-278-bounded-cancel-queue

Conversation

@ericleepi314

Copy link
Copy Markdown
Collaborator

Closes #278

Stacked on #308 (← #307#306#305#304). Merge in order; GitHub retargets automatically.

Summary

The ESC-cancel worker-thread pattern in OpenAICompatibleProvider.chat_stream_response used an unbounded queue.Queue — flagged as a follow-up in PR #148. A pathological proxy that keeps sending bytes after ESC (and never closes the SDK iterator) accumulated megabytes while the consumer had already raised AbortError and left.

  • Queue bounded at 64.
  • Worker-side drop: _put_or_drop_on_abort stops the worker from reading the stream at all once the abort trips — the queue drains, the consumer hits the Empty tick, and AbortError raises within ~100ms even against a firehose.
  • consumer_gone event (from critic review): without it, the bounded queue introduced a new failure mode the unbounded one didn't have — a consumer that unwinds for a non-abort reason (e.g. on_text_chunk raising) left the worker retrying a full queue forever, pinning the httpx connection open. The consumer scope sets the event via contextlib.ExitStack on every exit path; the worker bails within one 0.25s poll.
  • A genuine stream error that loses the race against ESC is logged at debug instead of vanishing.
  • Consumer loop deliberately unchanged: chunks received before the abort are still processed (pinned by test_in_loop_check_catches_abort_between_chunks); the worker-side drop is what guarantees nothing arrives after.

Test plan

  • New: firehose stream (ignores close(), yields forever) → AbortError <2s and the yield counter settles (worker stopped reading); consumer-crash (callback raises ValueError) → error propagates with class intact and the worker is released; 150-chunk backpressure happy path → exact content/order through the bounded queue
  • All 11 tests in the file + 71 provider-abort tests pass
  • Full suite on the stack: 7797 passed, 0 failed, 5 skipped
  • Critic review loop: APPROVE after 1 revision round (the consumer-crash livelock was found there, repro'd empirically)

🤖 Generated with Claude Code

…#278)

The worker-thread ESC-cancel pattern used an unbounded queue.Queue: a
pathological proxy that keeps sending bytes after ESC (and never
closes the SDK iterator) accumulated chunks without limit while the
consumer had already raised AbortError and left.

- chunk_queue bounded at 64 (post-abort staleness drains trivially;
  the producer gets slack against transient consumer pauses)
- the worker drops items and stops READING the stream once the abort
  trips or the consumer exits (consumer_gone, set via ExitStack on
  every consumer exit path) — without the latter, a consumer dying
  for a non-abort reason (on_text_chunk raising) would leave the
  worker retrying a full queue forever, pinning the httpx connection
- a genuine stream error losing the race against ESC is logged at
  debug instead of vanishing

Closes #278

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant