How to pre-load heavy imports in a pool pod before the first run_python()? #834

@xavieralmendros-aily

We're running OpenSandbox on Kubernetes with a Pool CRD pre-warming Python sandbox pods (bufferMin: 1). Our image is custom (built from our internal ai-python:3.12 base) and includes pandas, numpy, matplotlib, ipykernel==6.29.5, jupyter_server==2.17.0. code-interpreter (Python SDK) works against it correctly.

Versions

  • opensandbox-server: chart kubernetes/charts/opensandbox, server image v0.1.11
  • opensandbox-controller: same chart, v0.1.11
  • execd: opensandbox/execd:v1.0.13 (server), opensandbox/execd:v1.0.6 (pool init container)
  • opensandbox Python SDK: 0.1.7
  • opensandbox-code-interpreter Python SDK: 0.1.2

Problem

Even with a pre-warmed pool, the first run_python() call in a session pays the full kernel boot + heavy-imports cost (~3–5 s on our setup, dominated by import pandas, numpy, matplotlib). Subsequent calls in the same session are fast because sys.modules is populated. For interactive AI-agent workloads, that first-call latency is the dealbreaker.
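To make the caching effect concrete, here is a minimal sketch of how the cold versus warm import cost could be measured inside a kernel. The helper name `timed_import` is hypothetical, and `json` is only a stand-in so the snippet runs anywhere; real timings depend on the image and the packages involved.

```python
import importlib
import sys
import time


def timed_import(name: str) -> tuple[float, bool]:
    """Import `name`; return (seconds taken, whether it was already in sys.modules)."""
    was_cached = name in sys.modules
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start, was_cached


# "json" is a stand-in; substitute "pandas", "numpy", "matplotlib" in the real image.
first_s, first_cached = timed_import("json")
second_s, second_cached = timed_import("json")
print(f"first: {first_s:.4f}s (cached={first_cached}), "
      f"second: {second_s:.4f}s (cached={second_cached})")
```

The second call only performs a `sys.modules` lookup, which is why later calls in the same session are fast.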

Our design goal: when a pool pod reaches ready, it should already have a Python kernel process running with those heavy packages imported in memory. The agent claiming the pod should hit a warm kernel on call #1.

What we tried

We used the BOOTSTRAP_CMD env var in the pool's sandbox container to call execd's REST API before sleep infinity:

POST http://localhost:44772/code/context {"language": "python"}
POST http://localhost:44772/code {"code": "", "context": {"language": "python"}}

The idea was: create the default Python context, run a warmup script that does import pandas, numpy, matplotlib, and then let sleep infinity keep the pod alive so the kernel enters the pool already warm.
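Roughly, the bootstrap step looks like the sketch below. The two endpoints and the port are the ones shown above; everything else (the helper names `post_json` and `warm_kernel`, the retry policy, the exact warmup string, and reading the whole response body) is an assumption made for illustration, not execd's documented contract.

```python
import json
import time
import urllib.request
from typing import Callable

EXECD_URL = "http://localhost:44772"  # port taken from the calls above

# Assumption: the warmup is a plain import statement sent as the "code" field.
WARMUP_CODE = "import pandas, numpy, matplotlib"


def post_json(path: str, payload: dict) -> bytes:
    """POST a JSON body to execd and return the raw response body."""
    req = urllib.request.Request(
        EXECD_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def warm_kernel(post: Callable[[str, dict], bytes] = post_json,
                attempts: int = 10, delay_s: float = 1.0) -> int:
    """Create the Python context, then run the warmup code.

    Retries the whole sequence on OSError (connection refused, HTTP errors)
    and returns the number of attempts used.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            post("/code/context", {"language": "python"})
            post("/code", {"code": WARMUP_CODE, "context": {"language": "python"}})
            return attempt
        except OSError as exc:
            last_error = exc
            time.sleep(delay_s)
    raise RuntimeError(f"kernel warmup failed after {attempts} attempts") from last_error
```

In `BOOTSTRAP_CMD` this would run as something like `python warmup.py && sleep infinity`, where `warmup.py` is a hypothetical file name. Retrying the full sequence may create more than one context if the first call succeeds and the second fails; we have not verified how execd handles that.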

This had two problems:

  1. Timing against lazy kernelspec loading. After Jupyter Server prints "is running", GET /api/kernelspecs returns {} for a few seconds while the KernelSpecManager initializes. During that window execd logs "failed to create session, retrying: no kernel specs found". Our retry loop handles it, but it's fragile.
  2. Even when pre-warm succeeds, the kernel may not be safe to reuse. Issue #743 ("[BUG] pre-warmed pod won't be deleted never") reports that a pod returned to the pool keeps residual state from the previous session. That means a pre-warmed kernel used by one session could leak variables/data to the next session, which is unacceptable for multi-tenant use. We'd want the pre-warm to carry imports-only state, with user-defined state reset between sessions, but we don't see a primitive for that.
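To illustrate what we mean by the two problems above, here is a rough sketch of (a) polling until kernelspecs are listed and (b) resetting a namespace back to an imports-only baseline. `JUPYTER_URL`, `fetch_kernelspecs`, `wait_for_kernelspecs`, `snapshot_names`, and `reset_to_baseline` are all hypothetical names, auth is omitted, and a plain dict stands in for the kernel's user namespace. This is not an existing OpenSandbox or execd primitive.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: hypothetical address of the in-pod Jupyter server; auth is omitted.
JUPYTER_URL = "http://localhost:8888"


def fetch_kernelspecs() -> dict:
    """GET /api/kernelspecs and decode the JSON body."""
    with urllib.request.urlopen(JUPYTER_URL + "/api/kernelspecs", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_kernelspecs(fetch: Callable[[], dict] = fetch_kernelspecs,
                         timeout_s: float = 30.0, interval_s: float = 0.5) -> dict:
    """Poll until at least one kernelspec is listed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            body = fetch()
        except OSError:
            body = {}
        # Treat both `{}` and an empty "kernelspecs" map as "not ready yet".
        if body.get("kernelspecs"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("no kernel specs found before the deadline")
        time.sleep(interval_s)


def snapshot_names(namespace: dict) -> frozenset:
    """Record the names present right after warmup (imports only)."""
    return frozenset(namespace)


def reset_to_baseline(namespace: dict, baseline: frozenset) -> list:
    """Delete every name added after the snapshot; return the removed names."""
    removed = sorted(set(namespace) - baseline)
    for name in removed:
        del namespace[name]
    return removed


# Demo with a plain dict standing in for the kernel's user namespace.
ns = {}
exec("import json, math", ns)                            # warmup: imports only
baseline = snapshot_names(ns)
exec("secret = 'tenant-a data'; rows = [1, 2, 3]", ns)   # a user session
removed = reset_to_baseline(ns, baseline)
print(removed)
```

A reset like this only removes top-level names. It does not undo mutations to already-imported modules (changed options, monkeypatches), files written to disk, or child processes, so on its own it is not a tenant isolation boundary, which is why we are asking for a supported primitive.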

Questions

  1. Is there a supported pattern for pre-loading heavy Python imports into a pool pod's kernel, such that the pod enters ready with those imports already resident in memory?

  2. If not, is there a roadmap item for it? The closest related issues we found are #268 ("FEATURE: Client-driven Sandboxes Connection Pool (ms-level sandbox load)") and #780 ("[Feature] Official support for multi-sandbox per pod / fast-sandbox runtime (OSEP-0007)"). Both focus on pool-capacity latency rather than kernel-state latency.

  3. Longer-term: is there any plan for process snapshotting (CRIU-style or equivalent) so that a pool pod can restore a pre-imported Python process in sub-second time? E2B and Modal use this approach to get <1 s first-call latency on full scientific-Python stacks.
