Issue: Foreground subagents exhaust single API key rate limit, causing 429 errors and hangs #2368

@Liewzheng

What feature would you like to see?

Problem Description

When running multiple foreground subagents concurrently (e.g., 3–4 coder or explore subagents working on independent tasks), all subagents share the same API key as the root runtime. This leads to severe rate-limit contention:

  1. Rate limit exhaustion: A single KIMI_API_KEY has finite TPM/RPM quotas. With 3–4 subagents each making multi-step LLM calls, the quota is consumed almost instantly.
  2. 429 errors and retries: Subsequent requests hit 429 Too Many Requests. Subagents either retry (wasting tokens) or hang waiting for quota recovery.
  3. Poor user experience: From the user's perspective, subagents that should complete in seconds instead take minutes or fail silently. The shell UI shows subagents as "running" with no visible progress.
  4. No backend attribution: All requests appear in the Kimi console as KimiCLI/1.44.0 with no way to distinguish root agent calls from subagent calls, making it impossible to diagnose which subagent is consuming quota.
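
To illustrate points 1 and 2, here is a minimal sketch of a bounded retry loop that gives up after a fixed number of attempts instead of hanging. It is not Kimi CLI's actual retry logic: `send`, `call_with_retry`, and `backoff_delays` are hypothetical names, and it assumes the server may supply a retry delay in seconds (for example via a Retry-After header), which I have not verified for the Kimi API.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Yield capped exponential backoff delays with full jitter."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def call_with_retry(send, max_retries=5, sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning (status, retry_after), where
    retry_after is a delay in seconds suggested by the server, or None.
    """
    status, retry_after = send()
    for delay in backoff_delays(max_retries, rng=rng):
        if status != 429:
            break
        # Prefer the server's hint when present, otherwise use jittered backoff.
        sleep(retry_after if retry_after is not None else delay)
        status, retry_after = send()
    return status  # still 429 if every retry was rate limited
```

Because the loop is bounded, a subagent that stays rate limited surfaces a 429 to the caller instead of waiting indefinitely, which would at least make the failure visible in the shell UI.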

Reproduction Steps

  1. Configure a single API key via /login or KIMI_API_KEY.
  2. Launch 3+ foreground subagents concurrently:
    /coder "Analyze app.py"
    /coder "Review key pool design"
    /coder "Check test coverage"
    
  3. Observe that:
    • Subagent response latency increases dramatically after the first few LLM calls
    • 429 errors appear in logs (if debug mode is enabled)
    • Subagents may exceed the default timeout and get killed

Expected Behavior

  • Each concurrent subagent should use a distinct API key when multiple keys are available
  • The system should enforce a concurrency limit based on available key count to avoid exhausting all keys
  • Subagent requests should carry a discernible User-Agent so backend monitoring can attribute quota consumption correctly
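
A rough sketch of how the three expectations above could fit together, assuming an asyncio-based runtime. `KeyPool`, `subagent_headers`, and `run_subagent` are hypothetical names, not existing Kimi CLI APIs; the Bearer authorization scheme and the User-Agent suffix format are also assumptions made for illustration.

```python
import asyncio
from contextlib import asynccontextmanager


class KeyPool:
    """Lease each API key to at most one subagent at a time (hypothetical helper)."""

    def __init__(self, keys):
        self._free = asyncio.Queue()
        for key in keys:
            self._free.put_nowait(key)

    @asynccontextmanager
    async def lease(self):
        key = await self._free.get()  # waits while every key is in use
        try:
            yield key
        finally:
            self._free.put_nowait(key)  # hand the key back for the next subagent


def subagent_headers(key, base_agent, subagent_type):
    """Build request headers that tag the call with the subagent type."""
    return {
        "Authorization": f"Bearer {key}",  # assumed auth scheme
        "User-Agent": f"{base_agent} (subagent={subagent_type})",  # assumed format
    }


async def run_subagent(pool, subagent_type, work):
    """Run work(headers) while holding one key from the pool."""
    async with pool.lease() as key:
        headers = subagent_headers(key, "KimiCLI/1.44.0", subagent_type)
        return await work(headers)
```

With this shape the concurrency limit falls out of the pool size: with two keys configured, a third subagent waits for a free key rather than competing for quota on a key that is already in use.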

Environment

  • Kimi CLI version: 1.44.0
  • OS: macOS / Linux
  • Python: 3.14
  • Provider: kimi (Kimi Code platform)

Additional Context

  • The root agent itself also consumes the same key for compaction, user replies, etc. With subagents added, the contention becomes even worse.
  • There is currently no concurrency limit for foreground subagents beyond the hardcoded background task limit, so a user can accidentally spawn an unbounded number of subagents and overwhelm their own API key.
  • The timeout description in the Agent tool schema claims "Foreground: no default timeout (runs until completion)", which means a hung subagent (due to rate-limit backoff) will never be killed.
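
For the timeout point, a minimal sketch of a default deadline for foreground subagents, assuming the subagent runs as an asyncio coroutine. `run_with_deadline` is a hypothetical helper, not part of the Agent tool.

```python
import asyncio


async def run_with_deadline(coro, timeout_s):
    """Await coro, cancelling it if it runs longer than timeout_s seconds."""
    try:
        return "done", await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner task at this point.
        return "timed_out", None
```

A subagent stuck in rate-limit backoff would then be reported as timed out rather than staying "running" indefinitely.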

Labels: enhancement (New feature or request)
