Description
When the code interpreter is deployed on AWS ECS (EC2 launch type) with an IAM task role, user code executed inside the nsjail sub-sandbox cannot authenticate to AWS services via boto3. The sub-sandbox env whitelist in _build_sanitized_env (src/services/sandbox/executor.py) does not propagate AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, AWS_REGION, or S3_* variables. Even when those vars are propagated, the sandbox process (UID 1001) cannot reach the ECS metadata endpoint at 169.254.170.2 because ECS restricts that endpoint to root (UID 0).
Net effect: any boto3 call from user code in the sandbox fails with NoCredentialsError or EndpointConnectionError, even though the outer container (running as root) authenticates correctly.
Steps to Reproduce
- Deploy the code interpreter on AWS ECS (EC2 launch type) with an IAM task role granting S3 read/write on a target bucket.
- Set the following container env vars: AWS_REGION, S3_ENDPOINT=s3.<region>.amazonaws.com, S3_BUCKET, S3_REGION, S3_SECURE=true.
- Configure SANDBOX_EGRESS_ALLOWLIST to include the S3 hosts (*.s3.<region>.amazonaws.com).
- Verify that access from the outer container works:
```bash
# inside the running container (ecs execute-command)
python3 -c "import boto3; print(boto3.client('s3', region_name='eu-central-1').list_objects_v2(Bucket='<bucket>', MaxKeys=1)['KeyCount'])"
# → prints 1
```
- Run the same code in the sub-sandbox via POST /exec:
```bash
curl -s -X POST http://localhost:8000/exec \
  -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
  -d '{"lang":"py","code":"import boto3, os\nprint(boto3.client(\"s3\").list_objects_v2(Bucket=os.environ[\"S3_BUCKET\"], MaxKeys=1).get(\"KeyCount\"))"}'
```
- Observe failure.
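Before comparing failure modes, it can help to confirm which relevant variables actually reach user code. The sketch below is meant to be submitted as the code field of a POST /exec request. The variable list is taken from this issue, visible_aws_vars is a hypothetical helper, and only presence is reported so that no secret value is ever echoed.

```python
import os

# Variables discussed in this issue; extend the tuple for other deployments.
AWS_VARS = (
    "AWS_REGION",
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "S3_ENDPOINT",
    "S3_BUCKET",
    "S3_REGION",
    "S3_SECURE",
)


def visible_aws_vars(environ=None):
    """Map each variable name to True/False (present or not); values are never returned."""
    environ = os.environ if environ is None else environ
    return {name: name in environ for name in AWS_VARS}


# Print a presence table for the current process (e.g. the sandboxed user code).
for name, present in visible_aws_vars().items():
    print(f"{name:40} {'set' if present else 'missing'}")
```

In the first failure mode below, every line of this table would be expected to read "missing".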
Expected Behavior
User code in the sub-sandbox can authenticate to AWS services using the ECS task role, equivalent to what works in the outer container. boto3 picks up credentials transparently without requiring user-supplied keys.
Actual Behavior
Three failure modes observed in sequence while diagnosing:
- Without any AWS env propagation: NoCredentialsError: Unable to locate credentials. The sub-sandbox sees no AWS_* vars at all.
- With AWS_CONTAINER_CREDENTIALS_RELATIVE_URI propagated:
urllib3.exceptions.NewConnectionError: AWSHTTPConnection(host='169.254.170.2', port=80): Failed to establish a new connection: [Errno 101] Network is unreachable
Verified UID-restricted: the same endpoint returns HTTP 200 as root and fails as UID 1001.
- With NO_PROXY=...,.amazonaws.com (so that boto3 bypasses the egress proxy): EndpointConnectionError to S3, because the sandbox has no direct internet route and must go through the egress proxy at 127.0.0.1:18443.
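Errno 101 is ENETUNREACH on Linux: the connection attempt failed for lack of a route, which is distinct from a refused connection or a timeout. A probe that does not involve boto3 can make that distinction explicit. This is a sketch; probe_tcp is a hypothetical helper, and running it once as root and once as UID 1001 inside the container is one way to repeat the comparison reported above.

```python
import errno
import socket


def probe_tcp(host, port, timeout=2.0):
    """Attempt a plain TCP connect and classify the outcome.

    Returns "ok", "timeout", or the symbolic errno name (e.g. "ENETUNREACH",
    "ECONNREFUSED"); falls back to the exception text if errno is unknown.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:  # must precede OSError, of which it is a subclass
        return "timeout"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))


# ECS credential endpoint from this issue (port 80, plain HTTP).
print("169.254.170.2:80 ->", probe_tcp("169.254.170.2", 80))
```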
Root Cause
Two interacting issues, one in src/services/sandbox/executor.py and one at the host level:
- _build_sanitized_env hardcodes a small env whitelist (PATH, HOME, TMPDIR, language-specific vars, proxy vars) and does not pass through any AWS-related variables.
- The ECS metadata endpoint (169.254.170.2) is UID-restricted on the host, so even propagating AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is insufficient: the sandbox UID cannot reach it.
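The first issue can be illustrated with a simplified model of whitelist-based env construction. build_env and BASE_KEYS are hypothetical stand-ins, not the actual contents of _build_sanitized_env; the point is only that any key not named in the whitelist is silently dropped, and the relative URI value is a placeholder.

```python
# Hypothetical base whitelist modeled on the categories above (PATH, HOME,
# TMPDIR, proxy vars); the real list in executor.py may differ.
BASE_KEYS = ("PATH", "HOME", "TMPDIR", "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def build_env(parent, extra_keys=()):
    """Copy only whitelisted keys from the parent environment; drop everything else."""
    allowed = set(BASE_KEYS) | set(extra_keys)
    return {k: v for k, v in parent.items() if k in allowed}


parent = {
    "PATH": "/usr/bin",
    "HOME": "/root",
    "AWS_REGION": "eu-central-1",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/placeholder",
}
print(sorted(build_env(parent)))                                 # ['HOME', 'PATH']
print(sorted(build_env(parent, extra_keys=("AWS_REGION",))))     # ['AWS_REGION', 'HOME', 'PATH']
```

Adding AWS_CONTAINER_CREDENTIALS_RELATIVE_URI to extra_keys would make the variable visible to user code, but per the second issue the URI it points to would still be unreachable from UID 1001.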
Proposed Fix
Resolve task role credentials in the outer container (which has UID 0 and thus access to 169.254.170.2) and inject them as static env vars per execution. boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN natively.
Patch in src/services/sandbox/executor.py, inside _build_sanitized_env, right after the base whitelist is built:
```python
# AWS env passthrough + pre-fetched task role credentials.
# Required on ECS, where the metadata endpoint (169.254.170.2) is
# UID-restricted to root and unreachable from sandbox UID 1001.
# Fragment: runs inside _build_sanitized_env, where env_whitelist is the
# dict of variables handed to the sandbox.
import os as _os, json as _json, urllib.request as _ur

# Pass region and S3 settings through unchanged when set on the container.
for _v in (
    "AWS_REGION", "AWS_DEFAULT_REGION",
    "S3_ENDPOINT", "S3_BUCKET", "S3_REGION", "S3_SECURE",
):
    if (_val := _os.environ.get(_v)) is not None:
        env_whitelist[_v] = _val

# ECS: resolve the task role via the relative URI (reachable as root only).
_rel = _os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
if _rel:
    try:
        with _ur.urlopen(f"http://169.254.170.2{_rel}", timeout=2) as _r:
            _creds = _json.loads(_r.read())
        env_whitelist["AWS_ACCESS_KEY_ID"] = _creds["AccessKeyId"]
        env_whitelist["AWS_SECRET_ACCESS_KEY"] = _creds["SecretAccessKey"]
        env_whitelist["AWS_SESSION_TOKEN"] = _creds["Token"]
    except Exception:
        pass  # best effort: the sandbox simply runs without credentials

# Also handle EKS / generic case (full URI plus optional authorization token).
_full = _os.environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
if _full and "AWS_ACCESS_KEY_ID" not in env_whitelist:
    try:
        _req = _ur.Request(_full)
        if _tok := _os.environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN"):
            _req.add_header("Authorization", _tok)
        with _ur.urlopen(_req, timeout=2) as _r:
            _creds = _json.loads(_r.read())
        env_whitelist["AWS_ACCESS_KEY_ID"] = _creds["AccessKeyId"]
        env_whitelist["AWS_SECRET_ACCESS_KEY"] = _creds["SecretAccessKey"]
        env_whitelist["AWS_SESSION_TOKEN"] = _creds["Token"]
    except Exception:
        pass
```
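The two branches of the patch share the same fetch, parse and inject logic, and as written a response missing one of the three keys can leave AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set without AWS_SESSION_TOKEN, which also suppresses the FULL_URI fallback. A possible refactor builds the result in one step. This is a sketch: task_role_env and _default_fetch are hypothetical names, and the fetch parameter exists only so the logic can be exercised without a metadata endpoint.

```python
import json
import urllib.request

# Host taken from the patch above; used only with the relative URI.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def _default_fetch(req, timeout):
    """Perform the HTTP request and decode the JSON credentials document."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def task_role_env(environ, fetch=_default_fetch, timeout=2.0):
    """Return AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN.

    Tries AWS_CONTAINER_CREDENTIALS_RELATIVE_URI first, then
    AWS_CONTAINER_CREDENTIALS_FULL_URI (with AWS_CONTAINER_AUTHORIZATION_TOKEN
    if present). The result dict is built only after all three keys are read,
    so a partial response is never returned. Returns {} if every source fails.
    """
    sources = []
    if rel := environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        sources.append((ECS_CREDENTIALS_HOST + rel, None))
    if full := environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI"):
        sources.append((full, environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")))

    for url, token in sources:
        try:
            req = urllib.request.Request(url)
            if token:
                req.add_header("Authorization", token)
            creds = fetch(req, timeout)
            return {
                "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
                "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
                "AWS_SESSION_TOKEN": creds["Token"],
            }
        except Exception:
            continue  # same best-effort policy as the patch: try the next source
    return {}
```

Inside _build_sanitized_env, the credential part of the patch would then reduce to env_whitelist.update(task_role_env(os.environ)).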
NO_PROXY is intentionally left as 127.0.0.1,localhost — S3 traffic must go through the egress proxy on 127.0.0.1:18443, since the sandbox has no direct internet route. Operators must include *.s3.<region>.amazonaws.com in SANDBOX_EGRESS_ALLOWLIST.
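How SANDBOX_EGRESS_ALLOWLIST patterns are matched is not described in this issue. Assuming shell-style globbing like Python's fnmatch (an assumption, not the proxy's confirmed behavior), a pattern such as *.s3.<region>.amazonaws.com matches virtual-hosted-style bucket hosts, which boto3 typically uses for bucket-scoped calls like list_objects_v2, but not the bare regional endpoint that non-bucket calls typically target. host_allowed is a hypothetical helper for checking a pattern list offline.

```python
from fnmatch import fnmatchcase


def host_allowed(host, patterns):
    """Case-insensitive shell-style match of a hostname against allowlist patterns."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, p.lower()) for p in patterns)


allowlist = ["*.s3.eu-central-1.amazonaws.com"]
print(host_allowed("my-bucket.s3.eu-central-1.amazonaws.com", allowlist))  # True
print(host_allowed("s3.eu-central-1.amazonaws.com", allowlist))            # False
```

Under this assumption, listing both the wildcard and the bare regional host covers both request styles.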
Verification
After applying the patch and redeploying:
```bash
KEY=$(env | grep '^API_KEY=' | cut -d= -f2-)

# Single call
curl -s -X POST http://localhost:8000/exec \
  -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
  -d '{"lang":"py","code":"import boto3, os\nr = boto3.client(\"s3\").list_objects_v2(Bucket=os.environ[\"S3_BUCKET\"], MaxKeys=3)\nprint([o[\"Key\"] for o in r.get(\"Contents\", [])])"}'
# → ["images/...", "lead-generator/...", ...]

# Paginated (validates session-long use)
curl -s -X POST http://localhost:8000/exec \
  -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
  -d '{"lang":"py","code":"import boto3, os\np = boto3.client(\"s3\").get_paginator(\"list_objects_v2\")\nn=0\nfor pg in p.paginate(Bucket=os.environ[\"S3_BUCKET\"]):\n n += pg.get(\"KeyCount\", 0)\nprint(\"total:\", n)"}'
# → "total: 329"
```
Verified working with REPL pool sandboxes >1h after spawn (credentials are re-fetched per /exec call).
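The curl commands embed Python source inside a hand-escaped JSON string, where every quote and newline must be escaped correctly. A small client that lets json.dumps do the escaping may be easier for iterating on test snippets. This is a sketch: only the request shape shown in this issue (lang and code fields, X-API-Key header) is assumed, build_exec_request and run_exec are hypothetical names, and since the response format is not documented here the body is returned as raw text.

```python
import json
import urllib.request


def build_exec_request(base_url, api_key, code, lang="py"):
    """Build a POST /exec request; json.dumps escapes quotes and newlines in code."""
    body = json.dumps({"lang": lang, "code": code}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/exec",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def run_exec(base_url, api_key, code, timeout=30):
    """Send the request and return the raw response body as text."""
    req = build_exec_request(base_url, api_key, code)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# The paginated check, written as ordinary Python source instead of escaped JSON.
PAGINATED_CHECK = '''
import boto3, os
p = boto3.client("s3").get_paginator("list_objects_v2")
n = 0
for pg in p.paginate(Bucket=os.environ["S3_BUCKET"]):
    n += pg.get("KeyCount", 0)
print("total:", n)
'''
```

From inside the container, run_exec("http://localhost:8000", os.environ["API_KEY"], PAGINATED_CHECK) would correspond to the second curl call.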
Environment
- OS: Amazon Linux 2023 (ECS-optimized AMI), m5.large EC2 launch type
- Python: 3.12 (per upstream image)
- Code Interpreter version: 1.2.0
- Region: eu-central-1
- Auth: ECS IAM task role, no static credentials
- pydantic-settings: v2.x
Misleading Diagnostic Path
The host-local fallback URL (localhost:3900) initially observed in early tests made this look like an S3Config / Pydantic v1→v2 config-loading bug. After thorough investigation, S3Config was confirmed to load env vars correctly under pydantic-settings v2 — the apparent default fallback was actually the sub-sandbox seeing no S3_* vars at all, masquerading as a config-loading issue. The following were also investigated and ruled out:
- SANDBOX_EGRESS_ALLOWLIST populated with all S3 domain variants: the proxy correctly tunnels CONNECT to S3.
- ENABLE_NETWORK_ISOLATION toggles, HTTP_PROXY / HTTPS_PROXY adjustments, VPC endpoints: not the cause.
- iptables-level UID matching — actual restriction sits at the ECS credential provider layer, not iptables.
Root cause is the env-passthrough whitelist combined with UID-restricted access to the ECS metadata endpoint.
Acceptance Criteria
- boto3.client("s3").list_objects_v2(...) against an AWS bucket using only the IAM task role (no static keys).
- S3_ACCESS_KEY / S3_SECRET_KEY env vars (unchanged code path).
References