Sub-sandbox env passthrough drops AWS task-role credentials, blocking boto3 in user code #113

@Zirkonium88

Description

When the code interpreter is deployed on AWS ECS (EC2 launch type) with an IAM task role, user code executed inside the nsjail sub-sandbox cannot authenticate to AWS services via boto3. The sub-sandbox env whitelist in _build_sanitized_env (src/services/sandbox/executor.py) does not propagate AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, AWS_REGION, or S3_* variables. Even when those vars are propagated, the sandbox process (UID 1001) cannot reach the ECS metadata endpoint at 169.254.170.2 because ECS restricts that endpoint to root (UID 0).

Net effect: any boto3 call from user code in the sandbox fails with NoCredentialsError or EndpointConnectionError, even though the outer container (running as root) authenticates correctly.

Steps to Reproduce

  1. Deploy the code interpreter on AWS ECS (EC2 launch type) with an IAM task role granting S3 read/write on a target bucket.
  2. Set the following container env vars: AWS_REGION, S3_ENDPOINT=s3.<region>.amazonaws.com, S3_BUCKET, S3_REGION, S3_SECURE=true.
  3. Configure SANDBOX_EGRESS_ALLOWLIST to include the S3 hosts (*.s3.<region>.amazonaws.com).
  4. Verify outer-container access works:
    # inside the running container (via aws ecs execute-command)
    python3 -c "import boto3; print(boto3.client('s3', region_name='eu-central-1').list_objects_v2(Bucket='<bucket>', MaxKeys=1)['KeyCount'])"
    # → prints 1
  5. Run the same code in the sub-sandbox via POST /exec:
    curl -s -X POST http://localhost:8000/exec \
      -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
      -d '{"lang":"py","code":"import boto3, os\nprint(boto3.client(\"s3\").list_objects_v2(Bucket=os.environ[\"S3_BUCKET\"], MaxKeys=1).get(\"KeyCount\"))"}'
  6. Observe failure.

Expected Behavior

User code in the sub-sandbox can authenticate to AWS services using the ECS task role, equivalent to what works in the outer container. boto3 picks up credentials transparently without requiring user-supplied keys.

Actual Behavior

Three failure modes observed in sequence while diagnosing:

  1. Without any AWS env propagation: NoCredentialsError: Unable to locate credentials. Sub-sandbox sees no AWS_* vars at all.
  2. With AWS_CONTAINER_CREDENTIALS_RELATIVE_URI propagated:
    urllib3.exceptions.NewConnectionError:
    AWSHTTPConnection(host='169.254.170.2', port=80):
    Failed to establish a new connection: [Errno 101] Network is unreachable
    
    Verified UID-restricted: same endpoint returns HTTP 200 as root, fails as UID 1001.
  3. With NO_PROXY=...,.amazonaws.com (so boto3 bypasses egress proxy): EndpointConnectionError to S3, because the sandbox has no direct internet route — it must go through the egress proxy at 127.0.0.1:18443.
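
The three failure modes can be told apart from inside the sandbox with a small probe. The following is a minimal sketch, not part of the project: the helper names `visible_credential_vars`, `can_connect`, and `probe` are hypothetical, and the host and port values are the ones quoted in this report. It reports only variable names, never their values.

```python
import os
import socket

# Variables that boto3/botocore read for static or container-based credentials.
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
)


def visible_credential_vars(environ):
    """Return the names (never the values) of credential-related vars that are set."""
    return sorted(name for name in CREDENTIAL_VARS if environ.get(name))


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(environ=os.environ):
    """Collect the facts that separate failure modes 1, 2 and 3."""
    return {
        "credential_vars": visible_credential_vars(environ),
        "metadata_endpoint_reachable": can_connect("169.254.170.2", 80),
        "egress_proxy_reachable": can_connect("127.0.0.1", 18443),
    }
```

Sending this through POST /exec with a trailing `print(probe())` should show an empty `credential_vars` list for mode 1 and an unreachable metadata endpoint for mode 2, assuming the sandbox behaves as described above.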

Root Cause

Two interacting issues in src/services/sandbox/executor.py:

  1. _build_sanitized_env hardcodes a small env whitelist (PATH, HOME, TMPDIR, language-specific vars, proxy vars) and does not pass through any AWS-related variables.
  2. The ECS metadata endpoint (169.254.170.2) is UID-restricted on the host, so even propagating AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is insufficient — the sandbox UID cannot reach it.
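
To illustrate the first issue, here is a simplified sketch of how a name-based whitelist drops everything it does not list. This is not the actual body of `_build_sanitized_env`; the function `build_sandbox_env`, the sample environment, and the whitelist subset are all hypothetical.

```python
def build_sandbox_env(parent_env, allowed_names):
    """Copy only explicitly allowed variables from the parent environment."""
    return {name: parent_env[name] for name in allowed_names if name in parent_env}


# Hypothetical environment of the outer container.
parent = {
    "PATH": "/usr/bin",
    "HOME": "/root",
    "AWS_REGION": "eu-central-1",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example",
    "S3_BUCKET": "example-bucket",
}

# Illustrative subset of the base whitelist (no AWS_* or S3_* names).
base_allowed = ("PATH", "HOME", "TMPDIR", "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")

sandbox_env = build_sandbox_env(parent, base_allowed)
print(sorted(sandbox_env))  # → ['HOME', 'PATH']
```

With such a filter, user code sees neither the region nor the credentials URI, which matches the NoCredentialsError in the first failure mode.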

Proposed Fix

Resolve task role credentials in the outer container (which has UID 0 and thus access to 169.254.170.2) and inject them as static env vars per execution. boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN natively.

Patch in src/services/sandbox/executor.py, inside _build_sanitized_env, right after the base whitelist is built:

# AWS env passthrough + pre-fetched task role credentials.
# Required on ECS where the metadata endpoint (169.254.170.2) is
# UID-restricted to root and unreachable from sandbox UID 1001.
import os as _os, json as _json, urllib.request as _ur

for _v in (
    "AWS_REGION", "AWS_DEFAULT_REGION",
    "S3_ENDPOINT", "S3_BUCKET", "S3_REGION", "S3_SECURE",
):
    if (_val := _os.environ.get(_v)) is not None:
        env_whitelist[_v] = _val

_rel = _os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
if _rel:
    try:
        with _ur.urlopen(f"http://169.254.170.2{_rel}", timeout=2) as _r:
            _creds = _json.loads(_r.read())
        env_whitelist["AWS_ACCESS_KEY_ID"] = _creds["AccessKeyId"]
        env_whitelist["AWS_SECRET_ACCESS_KEY"] = _creds["SecretAccessKey"]
        env_whitelist["AWS_SESSION_TOKEN"] = _creds["Token"]
    except Exception:
        # Non-fatal: fall through to the FULL_URI lookup below.
        pass

# Also handle EKS / generic case
_full = _os.environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
if _full and "AWS_ACCESS_KEY_ID" not in env_whitelist:
    try:
        _req = _ur.Request(_full)
        if _tok := _os.environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN"):
            _req.add_header("Authorization", _tok)
        with _ur.urlopen(_req, timeout=2) as _r:
            _creds = _json.loads(_r.read())
        env_whitelist["AWS_ACCESS_KEY_ID"] = _creds["AccessKeyId"]
        env_whitelist["AWS_SECRET_ACCESS_KEY"] = _creds["SecretAccessKey"]
        env_whitelist["AWS_SESSION_TOKEN"] = _creds["Token"]
    except Exception:
        # Non-fatal: user code runs without injected AWS credentials.
        pass

NO_PROXY is intentionally left as 127.0.0.1,localhost — S3 traffic must go through the egress proxy on 127.0.0.1:18443, since the sandbox has no direct internet route. Operators must include *.s3.<region>.amazonaws.com in SANDBOX_EGRESS_ALLOWLIST.
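
The effect of the NO_PROXY value on S3 hosts can be checked with the standard library. This is a rough sketch: it uses `urllib.request.proxy_bypass_environment` as an approximation, and botocore applies its own matching logic, which may differ in edge cases. The bucket host and the helper `bypasses_proxy` are hypothetical.

```python
from urllib.request import proxy_bypass_environment


def bypasses_proxy(host, no_proxy):
    """Return True if `host` would skip the proxy under the given NO_PROXY value."""
    return bool(proxy_bypass_environment(host, proxies={"no": no_proxy}))


s3_host = "example-bucket.s3.eu-central-1.amazonaws.com"  # hypothetical bucket host

# Value kept by the patch: S3 traffic still goes through the egress proxy.
print(bypasses_proxy(s3_host, "127.0.0.1,localhost"))  # → False

# Value from the third failure mode: S3 traffic skips the proxy and has no route.
print(bypasses_proxy(s3_host, "127.0.0.1,localhost,.amazonaws.com"))  # → True
```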

Verification

After applying the patch and redeploying:

KEY=$(env | grep '^API_KEY=' | cut -d= -f2-)

# Single call
curl -s -X POST http://localhost:8000/exec \
  -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
  -d '{"lang":"py","code":"import boto3, os\nr = boto3.client(\"s3\").list_objects_v2(Bucket=os.environ[\"S3_BUCKET\"], MaxKeys=3)\nprint([o[\"Key\"] for o in r.get(\"Contents\", [])])"}'
# → ["images/...", "lead-generator/...", ...]

# Paginated (validates session-long use)
curl -s -X POST http://localhost:8000/exec \
  -H 'Content-Type: application/json' -H "X-API-Key: $KEY" \
  -d '{"lang":"py","code":"import boto3, os\np = boto3.client(\"s3\").get_paginator(\"list_objects_v2\")\nn=0\nfor pg in p.paginate(Bucket=os.environ[\"S3_BUCKET\"]):\n    n += pg.get(\"KeyCount\", 0)\nprint(\"total:\", n)"}'
# → "total: 329"

Verified working with REPL pool sandboxes >1h after spawn (credentials are re-fetched per /exec call).

Environment

  • OS: Amazon Linux 2023 (ECS-optimized AMI), m5.large EC2 launch type
  • Python: 3.12 (per upstream image)
  • Code Interpreter version: 1.2.0
  • Region: eu-central-1
  • Auth: ECS IAM task role, no static credentials
  • pydantic-settings: v2.x

Misleading Diagnostic Path

The host-local fallback URL (localhost:3900) initially observed in early tests made this look like an S3Config / Pydantic v1→v2 config-loading bug. After thorough investigation, S3Config was confirmed to load env vars correctly under pydantic-settings v2 — the apparent default fallback was actually the sub-sandbox seeing no S3_* vars at all, masquerading as a config-loading issue. The following were also investigated and ruled out:

  • SANDBOX_EGRESS_ALLOWLIST populated with all S3 domain variants — proxy correctly tunnels CONNECT to S3.
  • ENABLE_NETWORK_ISOLATION toggles, HTTP_PROXY / HTTPS_PROXY adjustments, VPC endpoints — not the cause.
  • iptables-level UID matching — actual restriction sits at the ECS credential provider layer, not iptables.

Root cause is the env-passthrough whitelist combined with UID-restricted access to the ECS metadata endpoint.

Acceptance Criteria

  • User code in the sub-sandbox can call boto3.client("s3").list_objects_v2(...) against an AWS bucket using only the IAM task role (no static keys).
  • Garage / MinIO setups remain functional via static S3_ACCESS_KEY / S3_SECRET_KEY env vars (unchanged code path).
  • Pool-warmed REPL sandboxes >1h old still authenticate successfully (credentials refreshed per execution).
  • Failure to reach the metadata endpoint is non-fatal (caught silently); user code without AWS dependencies is unaffected.
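
One way to make the non-fatal and refresh criteria testable is to pull the credential lookup into a helper with an injectable fetch function. The sketch below is an assumption about how this could be structured, not the project's code: `http_fetch`, `resolve_task_role_env`, and `failing_fetch` are hypothetical names. The JSON keys and environment variable names are the ones used in the proposed patch.

```python
import json
import urllib.request

ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def http_fetch(url, headers, timeout=2.0):
    """Fetch a URL and return the decoded JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def resolve_task_role_env(environ, fetch=http_fetch):
    """Return static AWS credential env vars, or {} if none can be resolved."""
    candidates = []
    relative = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative:
        candidates.append((ECS_CREDENTIALS_HOST + relative, {}))
    full = environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if full:
        token = environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")
        candidates.append((full, {"Authorization": token} if token else {}))

    for url, headers in candidates:
        try:
            creds = fetch(url, headers)
            return {
                "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
                "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
                "AWS_SESSION_TOKEN": creds["Token"],
            }
        except Exception:
            continue  # Non-fatal: try the next source, then give up quietly.
    return {}


def failing_fetch(url, headers):
    """Stand-in for an unreachable metadata endpoint."""
    raise OSError("metadata endpoint unreachable")


env = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example"}
print(resolve_task_role_env(env, fetch=failing_fetch))  # → {}
```

Calling the helper once per /exec request, rather than once at sandbox spawn, is what would keep pool-warmed sandboxes older than one hour working, since each call returns whatever credentials the endpoint currently serves.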
