Problem
When using the distributed sandbox warm pool (e.g. code-interpreter template), pool.acquire() intermittently returns sandboxes that expire during the checkReady phase, causing READY_TIMEOUT failures. The renew call never gets a chance to execute.
User report: "Just acquired it asynchronously, next second it's already expired, check fails, and forceRenew never runs."
Root cause: tryTakeIdle uses a binary expiry check (expiresAt > now) — a sandbox with 1ms remaining TTL still passes. During the subsequent checkReady polling (up to 30s), the sandbox expires server-side and becomes unreachable.
Affected code paths:
| Store | Current logic |
| --- | --- |
| InMemoryPoolStateStore.tryTakeIdle | if (entry.expiresAt.isAfter(now)) return sandboxId |
| RedisPoolStateStore TAKE_IDLE_SCRIPT (Lua) | if tonumber(expires_at) > now_ms then return sandbox_id |
Additionally, reconciler.reapExpiredIdle only cleans up already-expired entries — it won't proactively reclaim near-expiry sandboxes to trigger replenishment.
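The timing problem can be reproduced with plain java.time arithmetic. This is a minimal sketch, not code from the SDK: passesBinaryCheck mirrors the condition in the table above, while survivesReadyWindow is a hypothetical helper that asks whether the sandbox would still be alive after a worst-case 30s checkReady poll.

```kotlin
import java.time.Duration
import java.time.Instant

// Same shape as the current store condition: any positive remaining TTL passes.
fun passesBinaryCheck(expiresAt: Instant, now: Instant): Boolean =
    expiresAt.isAfter(now)

// Hypothetical helper (not part of the SDK): is the sandbox still alive
// once a readiness poll of the given length has finished?
fun survivesReadyWindow(expiresAt: Instant, now: Instant, readyWindow: Duration): Boolean =
    expiresAt.isAfter(now.plus(readyWindow))

fun main() {
    val now = Instant.parse("2024-01-01T00:00:00Z")
    val expiresAt = now.plusMillis(1)         // 1 ms of TTL left
    val readyWindow = Duration.ofSeconds(30)  // worst-case checkReady polling

    println(passesBinaryCheck(expiresAt, now))                 // true
    println(survivesReadyWindow(expiresAt, now, readyWindow))  // false
}
```

The sandbox is handed out by the first check but cannot outlive the second, which matches the reported READY_TIMEOUT.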
Proposed Solution
1. Add acquireMinRemainingTtl to PoolConfig
data class PoolConfig(
    // ... existing fields
    val acquireMinRemainingTtl: Duration = Duration.ofSeconds(60),
)
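One possible refinement is to validate the new field at construction time. The sketch below is an assumption, not part of the proposal: idleTtl is a hypothetical stand-in for whatever field holds the sandbox lifetime in the real PoolConfig, and both require checks are suggestions.

```kotlin
import java.time.Duration

// Sketch only: `idleTtl` is a hypothetical field name; the real PoolConfig
// may expose the sandbox lifetime differently.
data class PoolConfig(
    val idleTtl: Duration = Duration.ofMinutes(10),
    val acquireMinRemainingTtl: Duration = Duration.ofSeconds(60),
) {
    init {
        require(!acquireMinRemainingTtl.isNegative) {
            "acquireMinRemainingTtl must not be negative"
        }
        // A margin that is not shorter than the lifetime would reject every idle sandbox.
        require(acquireMinRemainingTtl < idleTtl) {
            "acquireMinRemainingTtl must be shorter than idleTtl"
        }
    }
}

fun main() {
    val config = PoolConfig(acquireMinRemainingTtl = Duration.ofSeconds(90))
    println(config.acquireMinRemainingTtl.seconds) // 90
}
```

Under this sketch, Duration.ZERO is still accepted and reduces the new condition to the existing expiresAt > now check.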
2. Update tryTakeIdle condition
Before:
if (entry.expiresAt.isAfter(now)) return sandboxId
After:
if (entry.expiresAt.isAfter(now.plus(minRemainingTtl))) return sandboxId
// else: discard and continue to next entry
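For context, here is a minimal self-contained sketch of how the in-memory path might look with the new condition inside the take loop. The IdleEntry and SketchIdleStore types are hypothetical and only stand in for the real InMemoryPoolStateStore internals, which this issue does not show.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical idle entry; the real store's entry type may carry more fields.
data class IdleEntry(val sandboxId: String, val expiresAt: Instant)

// Hypothetical stand-in for the in-memory store's idle queue.
class SketchIdleStore {
    private val idle = ArrayDeque<IdleEntry>()

    @Synchronized
    fun putIdle(entry: IdleEntry) {
        idle.addLast(entry)
    }

    // Pops entries oldest-first and returns the first one with enough TTL left.
    // Entries that fail the check are dropped, not re-queued.
    @Synchronized
    fun tryTakeIdle(now: Instant, minRemainingTtl: Duration): String? {
        val threshold = now.plus(minRemainingTtl)
        while (true) {
            val entry = idle.removeFirstOrNull() ?: return null
            if (entry.expiresAt.isAfter(threshold)) return entry.sandboxId
            // else: discard and continue to next entry
        }
    }
}

fun main() {
    val now = Instant.parse("2024-01-01T00:00:00Z")
    val store = SketchIdleStore()
    store.putIdle(IdleEntry("sbx-near-expiry", now.plusSeconds(5)))
    store.putIdle(IdleEntry("sbx-fresh", now.plusSeconds(300)))

    println(store.tryTakeIdle(now, Duration.ofSeconds(60))) // sbx-fresh
    println(store.tryTakeIdle(now, Duration.ofSeconds(60))) // null
}
```

In this sketch a discarded sandbox is simply forgotten by the pool. Whether it should also be terminated server-side or left to expire on its own is a separate decision the proposal leaves open.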
Redis Lua — before:
if tonumber(expires_at) > now_ms then return sandbox_id
After:
if tonumber(expires_at) > (now_ms + min_remaining_ttl_ms) then return sandbox_id
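On the caller side, the Lua script needs the margin passed in as an extra argument. The sketch below assumes the script receives now_ms and min_remaining_ttl_ms as two ARGV strings in that order; the real TAKE_IDLE_SCRIPT signature is not shown in this issue, so both helper functions are hypothetical. luaWouldAccept mirrors the Lua comparison in Kotlin so the boundary can be unit-tested without a Redis instance.

```kotlin
import java.time.Duration
import java.time.Instant

// Builds ARGV for a script shaped like the one above.
// The argument order (now_ms, min_remaining_ttl_ms) is an assumption.
fun takeIdleArgs(now: Instant, minRemainingTtl: Duration): List<String> =
    listOf(now.toEpochMilli().toString(), minRemainingTtl.toMillis().toString())

// Kotlin mirror of the Lua condition:
//   tonumber(expires_at) > (now_ms + min_remaining_ttl_ms)
fun luaWouldAccept(expiresAtMs: Long, args: List<String>): Boolean {
    val nowMs = args[0].toLong()
    val minRemainingTtlMs = args[1].toLong()
    return expiresAtMs > nowMs + minRemainingTtlMs
}

fun main() {
    val now = Instant.ofEpochMilli(1_700_000_000_000)
    val args = takeIdleArgs(now, Duration.ofSeconds(60))

    println(args)                                     // [1700000000000, 60000]
    println(luaWouldAccept(1_700_000_060_000, args))  // false: exactly 60s left
    println(luaWouldAccept(1_700_000_060_001, args))  // true: 60s + 1ms left
}
```

Because ARGV values arrive in Lua as strings, the script would need tonumber on both before adding them. Epoch milliseconds fit exactly in the doubles that Redis Lua uses for numbers.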
3. (Optional) Proactive reconciler cleanup
Make reapExpiredIdle also evict entries where TTL < minRemainingTtl, triggering the pool to replenish with fresh sandboxes.
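A rough sketch of what that could look like, assuming the reconciler can iterate over idle entries and that the caller replenishes based on the returned ids. The IdleSandbox type and the function signature are hypothetical, not the existing reconciler API.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical idle entry type for illustration.
data class IdleSandbox(val sandboxId: String, val expiresAt: Instant)

// Removes idle entries that are expired or within minRemainingTtl of expiry,
// and returns their ids so the caller can schedule replacements.
fun reapExpiredIdle(
    idle: MutableList<IdleSandbox>,
    now: Instant,
    minRemainingTtl: Duration,
): List<String> {
    val threshold = now.plus(minRemainingTtl)
    val reaped = mutableListOf<String>()
    val iterator = idle.iterator()
    while (iterator.hasNext()) {
        val entry = iterator.next()
        // Same boundary as acquire: evict exactly what tryTakeIdle would reject.
        if (!entry.expiresAt.isAfter(threshold)) {
            reaped += entry.sandboxId
            iterator.remove()
        }
    }
    return reaped
}

fun main() {
    val now = Instant.parse("2024-01-01T00:00:00Z")
    val idle = mutableListOf(
        IdleSandbox("sbx-expired", now.minusSeconds(1)),
        IdleSandbox("sbx-near-expiry", now.plusSeconds(30)),
        IdleSandbox("sbx-fresh", now.plusSeconds(600)),
    )
    val reaped = reapExpiredIdle(idle, now, Duration.ofSeconds(60))

    println(reaped)                     // [sbx-expired, sbx-near-expiry]
    println(idle.map { it.sandboxId })  // [sbx-fresh]
}
```

Using the same threshold as acquire keeps the two paths consistent: the reconciler never leaves behind an entry that acquire would immediately discard.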
Change Scope
| File | Change | Lines |
| --- | --- | --- |
| PoolConfig | Add acquireMinRemainingTtl field + default | ~3 |
| InMemoryPoolStateStore.tryTakeIdle | Update expiry condition | ~2 |
| RedisPoolStateStore TAKE_IDLE_SCRIPT | Update Lua condition | ~1 |
| SandboxPool.acquire / store interface | Pass minRemainingTtl parameter | ~3 |
| Total | | ~10-15 lines |
Acceptance Criteria
- acquireMinRemainingTtl defaults to 60s; no behavior change for users who don't set it
- Sandboxes with remaining TTL below minRemainingTtl are skipped during acquire (discarded, try next)
- ... minRemainingTtl = 60s
Labels
enhancement, sdk, pool, reliability
Priority
High: causes intermittent READY_TIMEOUT in production with no user-side workaround.