
CLI headless mode: internal state accumulation causes progressive latency degradation across sessions #2755

@pablocastilla

Description

Summary

When running the Copilot CLI in headless mode (copilot --headless --port <port>) with BYOK (Azure OpenAI), response latency degrades progressively with each new session, even after properly disconnecting and deleting sessions via the SDK. The first request after a fresh CLI start completes in ~1–3s, but subsequent requests degrade to 17–30s. Only killing and restarting the CLI process restores performance.

Environment

  • CLI version: 1.0.28
  • SDK version (Python): 0.2.2 (container), 0.1.32 (host)
  • OS: Debian (python:3.13-slim Docker image), Linux amd64
  • Provider: Azure OpenAI (BYOK) — gpt-4o-mini deployment
  • Provider wire API: responses
  • Azure OpenAI API version: 2025-01-01-preview

Steps to Reproduce

  1. Start CLI in headless mode inside a container:

    copilot --headless --port 14321
  2. From an external process, connect via SDK, create a session, send a message, disconnect, delete the session:

    client = CopilotClient({"cli_url": "container-host:4321"})
    await client.start()
    
    session = await client.create_session(
        session_id="test-1",
        model="gpt-4o-mini",
        provider={"type": "azure", "base_url": "...", "api_key": "...", "azure": {"api_version": "..."}},
        system_message={"mode": "replace", "content": "Answer briefly."},
        streaming=True,
    )
    response = await session.send_and_wait("What is 2+2?")  # ~1-3s ✅
    await session.disconnect()
    
    # Cleanup
    sessions = await client.list_sessions()
    for s in sessions:
        await client.delete_session(s.session_id)
    await client.stop()
  3. From a new process (fresh Python interpreter, new CopilotClient instance), repeat step 2 with a different session_id:

    # New process, new client, new session_id
    response = await session.send_and_wait("What is 2+2?")  # ~17-30s ❌
  4. Each subsequent new-process request gets progressively slower, stabilizing at ~28-30s.
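The per-request latencies quoted in steps 2–4 can be captured with a small wall-clock helper (a sketch; `send_and_wait` is the SDK call from step 2, everything else is standard library):

```python
import asyncio
import time

async def time_call(coro_factory):
    """Await the coroutine produced by coro_factory; return elapsed milliseconds."""
    t0 = time.perf_counter()
    await coro_factory()
    return (time.perf_counter() - t0) * 1000

# Against the SDK (names from step 2):
#     latency_ms = await time_call(lambda: session.send_and_wait("What is 2+2?"))

# Self-check with a stand-in coroutine; prints roughly 50 ms:
print(f"{asyncio.run(time_call(lambda: asyncio.sleep(0.05))):.0f} ms")
```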

Expected Behavior

After session.disconnect() + client.delete_session(), the CLI should fully release all resources associated with that session. New sessions — whether from the same or a different SDK client process — should have consistent latency (~1-3s for a simple prompt when the LLM backend responds in <1s).

Actual Behavior

Request #               CLI instance   send_and_wait latency   Notes
1                       Fresh start    ~1,000–3,000 ms         ✅ Fast
2                       Reused         ~17,000 ms              ❌ Degraded
3                       Reused         ~28,000–30,000 ms       ❌ Severely degraded
4+                      Reused         ~28,000–30,000 ms       ❌ Plateaus at ~29s
1 (after kill+restart)  Fresh start    ~1,000–3,000 ms         ✅ Fast again

Key observations:

  • Azure OpenAI is NOT the bottleneck: Direct HTTP calls to the same Azure endpoint consistently return in ~900ms (verified with httpx, bypassing CLI/SDK entirely).
  • delete_session() does not fully clean up: After list_sessions() + delete_session() for all sessions, list_sessions() returns empty, BUT the CLI process still retains internal state that causes degradation.
  • Wiping session-state files doesn't help: Deleting /root/.copilot/session-state/* on disk while the CLI is running has no effect — the state is held in memory.
  • Only a process kill fixes it: killall copilot followed by a fresh copilot --headless --port <port> restores full performance immediately.
  • Pattern is consistent: Reproduced dozens of times across multiple hours. The degradation occurs even when each request uses a unique session_id and a completely fresh CopilotClient from a new OS process.
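The direct-to-Azure check from the first bullet can be reproduced in a few lines (a sketch: the chat-completions path is shown, so adjust it if your deployment is wired to the responses API; env var names follow the reproduction-script section of this issue, and `httpx` is a third-party dependency):

```python
import os
import time

def azure_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Chat-completions URL for an Azure OpenAI deployment (adapt for the responses API)."""
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

def time_direct_call() -> float:
    """Time one raw HTTP round trip to the provider, bypassing CLI and SDK."""
    import httpx  # third-party; imported lazily so azure_url stays dependency-free

    url = azure_url(
        os.environ["AZURE_OPENAI_ENDPOINT"],
        "gpt-4o-mini",  # deployment name from this report
        os.environ.get("AZURE_OPENAI_API_VERSION", "2025-01-01-preview"),
    )
    t0 = time.perf_counter()
    r = httpx.post(
        url,
        headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
        json={"messages": [{"role": "user", "content": "What is 2+2?"}],
              "max_tokens": 16},
        timeout=60,
    )
    r.raise_for_status()
    return (time.perf_counter() - t0) * 1000
```

A consistent sub-second result here, against 17–30s through the CLI, isolates the degradation to the CLI process.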

Workaround (current)

We run the CLI via an entrypoint script that auto-restarts it when killed. After each SDK request, the application kills the CLI process via docker exec killall copilot, and the entrypoint restarts it with clean state. This is fragile and adds ~3-5s of restart overhead per request.

#!/bin/sh
# entrypoint.sh (simplified)
copilot --headless --port 14321 &
CLI_PID=$!
while true; do
    if ! kill -0 "$CLI_PID" 2>/dev/null; then
        rm -rf /root/.copilot/session-state/*
        copilot --headless --port 14321 &
        CLI_PID=$!
    fi
    sleep 1
done

Impact

This makes the CLI unsuitable for any multi-request server workload (web backends, API services, chatbots) without the kill-restart hack. In production, our chat feature degrades from sub-3s responses to 30s responses after just 2 user messages.

Possible Root Cause (speculation)

The CLI appears to retain conversation context or model state in memory across sessions even after delete_session(). This accumulated context may be sent with each new request to the LLM provider, causing the provider to process increasingly large payloads (explaining the progressive slowdown). The ~29s plateau could be the provider's timeout or max-context processing time.
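If that hypothesis holds, the payload the provider receives grows roughly linearly with the number of requests the CLI has served. A toy illustration of the suspected accumulation (numbers are illustrative, not measured from the CLI):

```python
# Toy model: each new "session" is sent along with all previously accumulated
# turns, so the payload never resets even though the sessions were deleted.
history: list[str] = []
for request_num in range(1, 5):
    history.append("user: What is 2+2?")
    history.append("assistant: 4")
    payload_chars = sum(len(turn) for turn in history)
    print(f"request {request_num}: payload ~{payload_chars} chars")
# The payload grows by a fixed amount per request instead of staying constant,
# which would explain latency climbing until it hits a provider-side
# max-context or timeout ceiling.
```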

Reproduction Script

Full reproduction script available: creates N sequential requests with fresh SDK clients, measures latency, and optionally kills CLI between requests to demonstrate the fix.

# Set environment variables:
# COPILOT_CLI_URL=localhost:4321
# AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
# AZURE_OPENAI_API_KEY=your-key
# AZURE_OPENAI_API_VERSION=2025-01-01-preview
# NUM_REQUESTS=3
# KILL_CLI=0  (set to 1 to kill CLI between requests — makes all fast)

python tools/test_sdk_perf.py
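A minimal sketch of what such a script can look like, built only from the SDK calls already shown in the repro steps (the `CopilotClient` import path is a placeholder; error handling omitted):

```python
import asyncio
import os
import time

# Placeholder import; use the SDK package's real module name:
# from copilot import CopilotClient

async def one_request(i: int) -> float:
    """Fresh client + session + one prompt, mirroring repro step 2; returns ms."""
    client = CopilotClient({"cli_url": os.environ["COPILOT_CLI_URL"]})
    await client.start()
    session = await client.create_session(
        session_id=f"perf-{i}",
        model="gpt-4o-mini",
        provider={
            "type": "azure",
            "base_url": os.environ["AZURE_OPENAI_ENDPOINT"],
            "api_key": os.environ["AZURE_OPENAI_API_KEY"],
            "azure": {"api_version": os.environ["AZURE_OPENAI_API_VERSION"]},
        },
        system_message={"mode": "replace", "content": "Answer briefly."},
        streaming=True,
    )
    t0 = time.perf_counter()
    await session.send_and_wait("What is 2+2?")
    elapsed_ms = (time.perf_counter() - t0) * 1000
    await session.disconnect()
    for s in await client.list_sessions():
        await client.delete_session(s.session_id)
    await client.stop()
    return elapsed_ms

async def main() -> None:
    for i in range(int(os.environ.get("NUM_REQUESTS", "3"))):
        print(f"request {i + 1}: {await one_request(i):.0f} ms")
```

Run with `asyncio.run(main())` after exporting the environment variables above; on an unaffected CLI every iteration should report comparable latency.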



    Labels

    area:non-interactive: Non-interactive mode (-p), CI/CD, ACP protocol, and headless automation
    area:sessions: Session management, resume, history, session picker, and session state
