What feature would you like to see?
Problem Description
When running multiple foreground subagents concurrently (e.g., 3–4 coder or explore subagents working on independent tasks), all subagents share the same API key as the root runtime. This leads to severe rate-limit contention:
- Rate limit exhaustion: A single KIMI_API_KEY has finite TPM/RPM quotas. With 3–4 subagents each making multi-step LLM calls, the quota is consumed almost instantly.
- 429 errors and retries: Subsequent requests hit 429 Too Many Requests. Subagents either retry (wasting tokens) or hang waiting for quota recovery.
- Poor user experience: From the user's perspective, subagents that should complete in seconds instead take minutes or fail silently. The shell UI shows subagents as "running" with no visible progress.
- No backend attribution: All requests appear in the Kimi console as KimiCLI/1.44.0 with no way to distinguish root agent calls from subagent calls, making it impossible to diagnose which subagent is consuming quota.
Reproduction Steps
- Configure a single API key via /login or KIMI_API_KEY.
- Launch 3+ foreground subagents concurrently:
/coder "Analyze app.py"
/coder "Review key pool design"
/coder "Check test coverage"
- Observe that:
- Subagent response latency increases dramatically after the first few LLM calls
- 429 errors appear in logs (if debug mode is enabled)
- Subagents may exceed the default timeout and get killed
Expected Behavior
- Each concurrent subagent should use a distinct API key when multiple keys are available
- The system should enforce a concurrency limit based on available key count to avoid exhausting all keys
- Subagent requests should carry a discernible User-Agent so backend monitoring can attribute quota consumption correctly
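The three expectations above could be sketched roughly as follows. This is only an illustration, not Kimi CLI's actual implementation: KeyPool, lease, subagent_user_agent, the "subagent/<type>-<index>" suffix, and the Bearer header shape are all hypothetical names and formats chosen for this example. Lending each key to one subagent at a time caps concurrency at the number of available keys.

```python
import asyncio
from contextlib import asynccontextmanager


class KeyPool:
    """Hypothetical pool that lends each API key to one subagent at a time."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._free: asyncio.Queue[str] = asyncio.Queue()
        for key in keys:
            self._free.put_nowait(key)

    @asynccontextmanager
    async def lease(self):
        # Waits while every key is in use, so concurrency never exceeds
        # the number of keys in the pool.
        key = await self._free.get()
        try:
            yield key
        finally:
            self._free.put_nowait(key)


def subagent_user_agent(base: str, subagent_type: str, index: int) -> str:
    """Append a subagent tag to the base User-Agent (format is an assumption)."""
    return f"{base} subagent/{subagent_type}-{index}"


async def demo() -> int:
    """Run 4 fake subagents against 2 keys; return the peak concurrency seen."""
    pool = KeyPool(["key-a", "key-b"])  # placeholder keys
    active = 0
    peak = 0

    async def subagent(index: int) -> None:
        nonlocal active, peak
        async with pool.lease() as key:
            active += 1
            peak = max(peak, active)
            # Headers a real request might carry (shape is an assumption).
            headers = {
                "Authorization": f"Bearer {key}",
                "User-Agent": subagent_user_agent("KimiCLI/1.44.0", "coder", index),
            }
            await asyncio.sleep(0.01)  # stand-in for the real LLM call
            active -= 1

    await asyncio.gather(*(subagent(i) for i in range(4)))
    return peak


if __name__ == "__main__":
    print("peak concurrent subagents:", asyncio.run(demo()))
```

With two keys and four subagents, the third and fourth subagents wait for a key to be returned instead of competing for the same quota.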
Environment
- Kimi CLI version: 1.44.0
- OS: macOS / Linux
- Python: 3.14
- Provider: kimi (Kimi Code platform)
Additional Context
- The root agent itself also consumes the same key for compaction, user replies, etc. With subagents added, the contention becomes even worse.
- There is currently no concurrency limit for foreground subagents beyond the hardcoded background task limit, so a user can accidentally spawn an unbounded number of subagents and exhaust their own API key's quota.
- The timeout description in the Agent tool schema claims "Foreground: no default timeout (runs until completion)", which means a hung subagent (due to rate-limit backoff) will never be killed.
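One way to bound a hung foreground subagent, shown as a minimal sketch rather than a proposal for the actual Agent tool: wrap the subagent coroutine in a deadline. run_with_deadline and its timeout_s parameter are hypothetical names used only for this example; asyncio.wait_for is the standard-library call that cancels the wrapped coroutine when the timeout expires.

```python
import asyncio
from typing import Any, Awaitable


async def run_with_deadline(coro: Awaitable[Any], timeout_s: float) -> tuple[bool, Any]:
    """Return (True, result) on completion, or (False, None) once the deadline passes."""
    try:
        result = await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the wrapped coroutine at this point.
        return False, None
    return True, result


async def hung_subagent() -> str:
    # Stand-in for a subagent stuck in rate-limit backoff.
    await asyncio.sleep(60)
    return "never reached"


if __name__ == "__main__":
    print(asyncio.run(run_with_deadline(hung_subagent(), timeout_s=0.1)))
```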
Additional information
No response