Skip to content

Session hangs ~8 minutes after compaction when prompt cache misses — no timeout or feedback #1614

@loganrosen

Description

@loganrosen

Describe the bug

After compaction runs mid-session, the next model API call can hang for ~8 minutes with no user feedback beyond "Thinking…". The CLI appears completely stuck — there's no way to tell if it's still working or frozen.

Root cause from logs: Compaction replaces the conversation history with a summary, which can invalidate the prompt cache. The next API call sends ~80K tokens with cache_read_tokens: 0, and the server takes an extremely long time to respond. There is no client-side timeout.

Log evidence

19:04:12.493Z  CompactionProcessor: Compaction complete - replaced 182 messages
               with summary + 10 new messages, saved ~87835 tokens
19:04:12.602Z  --- Start of group: Sending request to the AI model ---
19:04:12.624Z  Flushed 3 events to session ...
                 *** 8 MINUTES OF COMPLETE SILENCE ***
19:12:08.176Z  [response finally arrives]

Telemetry for the hung call:

{
  "input_tokens": 80135,
  "output_tokens": 88,
  "cache_read_tokens": 0,
  "duration": 475571
}

It's intermittent — cache hit vs miss

Across all session logs I checked, there were 5 compactions total. 4 got cache hits and completed fast. 1 got a cache miss and hung for 8 minutes:

Session Compaction # cache_read_tokens Duration
7312 1 72,529 ✅ 3.7s
7312 2 72,642 ✅ 3.4s
26123 1 72,666 ✅ 3.4s
26123 2 72,693 ✅ 4.3s
26123 3 0 ❌ 475s

The cache miss case is a 100x+ outlier. It was the 3rd compaction in the same session, so it's not just a first-compaction issue — looks like a server-side cache eviction that the client has no protection against.

Why this is a problem

  1. No timeout: An 8-minute model call is never aborted or retried.
  2. No progress feedback: "Thinking…" with a spinner for 8 minutes. No elapsed time shown, no indication of whether the server is responding.
  3. Users will Escape out: Without feedback, most users will assume it's frozen and cancel — losing the agentic turn that was in progress.

Suggested improvements

  • Add a client-side timeout for model API calls (e.g., 2-3 minutes) with automatic retry
  • Show elapsed time on the "Thinking…" indicator so users can make informed decisions
  • Investigate whether compaction output can be structured to preserve the prompt cache prefix

Affected version

0.0.414

Steps to reproduce

  1. Run a long session until compaction triggers multiple times (150+ turns)
  2. After compaction runs, observe the next model call
  3. If the prompt cache is not hit (intermittent — happened on 1 of 5 compactions in my logs), the response takes 5-10 minutes

Environment

  • macOS darwin arm64
  • Node v24.11.1
  • Model: claude-sonnet-4.6

Additional context

Distinct from #1036 (OAuth agent memory retry loops) and #888 (compaction losing context). This is specifically about the model API call itself hanging when the prompt cache misses after compaction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:context-memoryContext window, memory, compaction, checkpoints, and instruction loadingarea:sessionsSession management, resume, history, session picker, and session state

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions