2 changes: 1 addition & 1 deletion README.md
@@ -25,7 +25,7 @@ We now ask: **What if our agent was 100x simpler, and still worked nearly as wel
- **Minimal**: Just some 100 lines of python for the [agent class](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/agents/default.py) (and a bit more for the [environment](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/environments/local.py),
[model](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/models/litellm_model.py), and [run script](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/run/hello_world.py)) — no fancy dependencies!
- **Performant:** Scores >74% on the [SWE-bench verified benchmark](https://www.swebench.com/); starts much faster than Claude Code
- **Deployable:** Supports **local environments**, **docker/podman**, **singularity/apptainer**, **bublewrap**, **contree**, and more
- **Deployable:** Supports **local environments**, **docker/podman**, **singularity/apptainer**, **bubblewrap**, **contree**, **[E2B](https://e2b.dev)** (no local Docker required), and more
- **Compatible:** Supports all models via **litellm**, **openrouter**, **portkey**, and more. Support for `/completion` and `/response` endpoints, interleaved thinking etc.
- Built by the Princeton & Stanford team behind [SWE-bench](https://swebench.com), [SWE-agent](https://swe-agent.com), and more
- **Tested:** [![Codecov](https://img.shields.io/codecov/c/github/swe-agent/mini-swe-agent?style=flat-square)](https://codecov.io/gh/SWE-agent/mini-swe-agent)
2 changes: 2 additions & 0 deletions docs/advanced/environments.md
@@ -30,3 +30,5 @@ On top, there are a few more specialized environment classes that you can use:

* **`contree`** ([`ContreeEnvironment`](../reference/environments/contree.md)) - Uses [ConTree](https://contree.dev/) for safe code execution sandboxing. ConTree is a platform built for agents that supports Git-like execution.

* **`e2b`** ([`E2BEnvironment`](../reference/environments/e2b.md)) - [E2B](https://e2b.dev) cloud sandbox execution. Converts Docker images into persistent E2B templates so **no local Docker daemon is required**. Suitable for large-scale, fully-remote SWE-bench evaluations.

78 changes: 78 additions & 0 deletions docs/reference/environments/e2b.md
@@ -0,0 +1,78 @@
# E2B

!!! note "E2B Environment class"

    - [Read on GitHub](https://github.com/swe-agent/mini-swe-agent/blob/main/src/minisweagent/environments/extra/e2b.py)
    - Requires an [E2B](https://e2b.dev) account and API key

    ??? note "Full source code"

        ```python
        --8<-- "src/minisweagent/environments/extra/e2b.py"
        ```

::: minisweagent.environments.extra.e2b

This environment executes commands in [E2B](https://e2b.dev) cloud sandboxes.
E2B converts Docker images into persistent sandbox templates, so **no local Docker daemon is required** — everything runs in the cloud.

This makes it well-suited for:

- Large-scale, fully-remote SWE-bench evaluations
- Environments where Docker is unavailable (CI, serverless)
- Parallel agent runs without managing local container infrastructure

## How it works

The first time a Docker image is used, `E2BEnvironment` builds a persistent E2B template from that image (via `Template.build`). Subsequent runs reuse the cached template, so the build cost is paid only once per unique image.
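
As a rough illustration of the build-once behavior, the sketch below reimplements the image-to-template-name mapping from this PR's `E2BTemplateManager` and pairs it with an in-memory store. The `FakeTemplateStore` class and its `build_count` counter are hypothetical and exist only for this example; the real code asks E2B whether the template exists and calls `Template.build` when it does not.

```python
import hashlib
import re


def image_to_template_name(docker_image: str) -> str:
    """Map a Docker image reference to a template-safe name (mirrors the PR's helper)."""
    hash_suffix = hashlib.sha256(docker_image.encode()).hexdigest()[:8]
    name = re.sub(r"[^a-zA-Z0-9-]", "-", docker_image)  # replace anything unsafe with "-"
    name = re.sub(r"-{3,}", "--", name).lower()  # collapse long hyphen runs
    prefix = name[:54].strip("-")  # leave room for "-" plus the 8-char hash
    return f"{prefix}-{hash_suffix}" if prefix else hash_suffix


class FakeTemplateStore:
    """Hypothetical in-memory stand-in for the remote template registry."""

    def __init__(self) -> None:
        self.built: set[str] = set()
        self.build_count = 0

    def get_or_build(self, docker_image: str, skip_cache: bool = False) -> str:
        name = image_to_template_name(docker_image)
        if skip_cache or name not in self.built:
            self.build_count += 1  # stands in for the expensive template build
            self.built.add(name)
        return name


store = FakeTemplateStore()
image = "swebench/sweb.eval.x86_64.django__django-11099:latest"
first = store.get_or_build(image)
second = store.get_or_build(image)
print(first == second, store.build_count)  # True 1
```

Because the name is derived deterministically from the image reference, parallel workers that use the same image resolve to the same template name without coordinating.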

## Setup

1. Install the E2B extra:
    ```bash
    pip install "mini-swe-agent[e2b]"
    ```

2. Set your E2B API key:
    ```bash
    export E2B_API_KEY="your-e2b-api-key"
    ```
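
Before launching a long evaluation, it can help to confirm that both setup steps took effect. The following is a minimal sketch using only the standard library; `preflight` is a hypothetical helper written for this example and is not part of mini-swe-agent.

```python
from __future__ import annotations

import importlib.util
import os
from collections.abc import Mapping


def preflight(env: Mapping[str, str] | None = None) -> list[str]:
    """Return human-readable setup problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems: list[str] = []
    # Step 1: the optional extra installs the `e2b` package.
    if importlib.util.find_spec("e2b") is None:
        problems.append('e2b package not importable; try: pip install "mini-swe-agent[e2b]"')
    # Step 2: the API key is read from the environment unless passed explicitly.
    if not env.get("E2B_API_KEY"):
        problems.append("E2B_API_KEY is not set")
    return problems


for problem in preflight():
    print("setup problem:", problem)
```

The check only verifies that the key is present, not that E2B accepts it.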

## Usage

Evaluate on SWE-bench using E2B as the sandbox backend:
```bash
mini-extra swebench \
--subset verified \
--split test \
--workers 50 \
--environment-class e2b
```

Or specify it in your YAML config:
```yaml
environment:
  environment_class: e2b
  sandbox_timeout: 3600  # seconds the sandbox stays alive
  cpu_count: 2
  memory_mb: 2048
```

## Configuration reference

| Field | Default | Description |
|-------|---------|-------------|
| `image` | *(required)* | Docker Hub image to use as the sandbox base |
| `cwd` | `/` | Default working directory for commands |
| `timeout` | `30` | Per-command timeout in seconds |
| `env` | `{}` | Environment variables set in every command |
| `sandbox_timeout` | `3600` | How long the sandbox stays alive (seconds) |
| `cpu_count` | `2` | vCPUs allocated to the sandbox |
| `memory_mb` | `2048` | Memory allocated to the sandbox (MiB) |
| `build_timeout` | `1800` | Max seconds to wait for a template build |
| `skip_cache` | `False` | Force-rebuild the template even if it exists |
| `tags` | `[]` | Optional tags to attach to the template |
| `api_key` | `None` | E2B API key (falls back to `E2B_API_KEY` env var) |
| `registry_username` | `None` | Username for private Docker registry auth |
| `registry_password` | `None` | Password for private Docker registry auth |
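
The source in this PR enforces `build_timeout` by running the build in a single-worker thread pool and waiting on the future, since `signal.alarm` only works on the main thread. A simplified sketch of that pattern follows; `run_with_timeout` is a hypothetical name, and `cancel_futures` requires Python 3.9 or newer.

```python
import concurrent.futures


def run_with_timeout(fn, timeout_s: float):
    """Run fn in a worker thread; raise TimeoutError if it is not done within timeout_s."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as e:
        raise TimeoutError(f"timed out after {timeout_s}s") from e
    finally:
        # wait=False returns immediately. A call that is already running is
        # not interrupted; it keeps going in the background thread.
        executor.shutdown(wait=False, cancel_futures=True)


print(run_with_timeout(lambda: "done", timeout_s=1.0))  # done
```

Note the limitation: the timeout stops the caller from waiting, but it does not cancel work that has already started.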

{% include-markdown "../../_footer.md" %}
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -81,6 +81,10 @@ contree = [
    "contree-sdk>=0.2.0",
]

e2b = [
    "e2b>=1.0.0",
]

[project.urls]
Documentation = "https://mini-swe-agent.com/latest/"
Repository = "https://github.com/SWE-agent/mini-swe-agent"
1 change: 1 addition & 0 deletions src/minisweagent/environments/__init__.py
@@ -13,6 +13,7 @@
    "swerex_modal": "minisweagent.environments.extra.swerex_modal.SwerexModalEnvironment",
    "bubblewrap": "minisweagent.environments.extra.bubblewrap.BubblewrapEnvironment",
    "contree": "minisweagent.environments.extra.contree.ContreeEnvironment",
    "e2b": "minisweagent.environments.extra.e2b.E2BEnvironment",
}


278 changes: 278 additions & 0 deletions src/minisweagent/environments/extra/e2b.py
@@ -0,0 +1,278 @@
"""E2B cloud sandbox environment implementation."""

from __future__ import annotations

import atexit
import concurrent.futures
import hashlib
import logging
import re
from typing import Any

from pydantic import BaseModel, Field

# Module-level registry of live sandboxes for best-effort cleanup on exit
# (covers Ctrl+C and unhandled exceptions where __del__ may not be called).
_active_sandboxes: set[E2BEnvironment] = set()


def _cleanup_all_sandboxes() -> None:
    """Kill all sandboxes that are still alive at interpreter shutdown."""
    for env in list(_active_sandboxes):
        env.stop()


atexit.register(_cleanup_all_sandboxes)


class E2BEnvironmentConfig(BaseModel):
    image: str
    """Docker Hub image name to use as the E2B template base.
    Example: ``'swebench/sweb.eval.x86_64.django__django-11099:latest'``
    """
    cwd: str = "/"
    """Working directory in which to execute commands."""
    timeout: int = 30
    """Timeout (in seconds) for executing commands in the sandbox."""
    env: dict[str, str] = Field(default_factory=dict)
    """Environment variables to set when executing commands."""
    sandbox_timeout: int = 3600
    """How long (in seconds) the sandbox is allowed to stay alive."""

    # Template build options (passed to Template.build())
    cpu_count: int = 2
    """Number of vCPUs allocated to the sandbox."""
    memory_mb: int = 2048
    """Memory allocated to the sandbox in MiB. Default is higher than E2B's 1024 MiB default
    to accommodate larger SWE-bench images."""
    skip_cache: bool = False
    """If True, force-rebuild the template even if it already exists."""
    tags: list[str] = Field(default_factory=list)
    """Optional tags to attach to the template."""
    build_timeout: int = 1800
    """Timeout for template builds in seconds (default 30 min to handle large images)."""

    # E2B authentication (can also be set via the E2B_API_KEY env var)
    api_key: str | None = None
    """E2B API key. Falls back to the E2B_API_KEY environment variable."""

    # Private registry credentials (passed to Template().from_image())
    registry_username: str | None = None
    """Username for authenticating against a private Docker registry."""
    registry_password: str | None = None
    """Password for authenticating against a private Docker registry."""


class E2BTemplateManager:
    """Converts Docker images to E2B templates and manages their lifecycle.

    Can be used independently of :class:`E2BEnvironment` for pre-building
    templates in batch scripts.
    """

    def __init__(self, config: E2BEnvironmentConfig) -> None:
        self.config = config
        self.logger = logging.getLogger("minisweagent.environment.e2b")

    @staticmethod
    def _image_to_template_name(docker_image: str) -> str:
        """Deterministically map a Docker image name to a valid E2B template name.

        An 8-character sha256 suffix is appended to avoid collisions between
        images that produce the same sanitized prefix. The result is at most
        63 characters and contains only lower-case alphanumerics and hyphens.

        Example (the hash suffix shown is illustrative)::

            'swebench/sweb.eval.x86_64.django__django-11099:latest'
            → 'swebench-sweb-eval-x86-64-django--django-11099-latest-a1b2c3d4'
        """
        hash_suffix = hashlib.sha256(docker_image.encode()).hexdigest()[:8]
        name = re.sub(r"[^a-zA-Z0-9-]", "-", docker_image)
        name = re.sub(r"-{3,}", "--", name)
        name = name.lower()
        # Reserve 9 characters for "-" + 8-char hash suffix → prefix max 54 chars
        prefix = name[:54].strip("-")
        if not prefix:
            return hash_suffix
        return f"{prefix}-{hash_suffix}"

    def get_or_build(self, docker_image: str) -> str:
        """Return the E2B template name for *docker_image*, building it if needed."""
        from e2b import Template

        template_name = self._image_to_template_name(docker_image)
        if not Template.exists(template_name, api_key=self.config.api_key) or self.config.skip_cache:
            self.logger.info(
                "E2B template %s not found. Starting build (up to %d seconds)...",
                template_name,
                self.config.build_timeout,
            )
            self._build_template(docker_image, template_name)
            self.logger.info("E2B template %s built successfully.", template_name)
        else:
            self.logger.debug("E2B template %s already exists.", template_name)
        return template_name

    def rebuild(self, docker_image: str) -> str:
        """Force-rebuild the E2B template for *docker_image*."""
        template_name = self._image_to_template_name(docker_image)
        self.logger.info("Rebuilding E2B template %s...", template_name)
        self._build_template(docker_image, template_name)
        self.logger.info("E2B template %s rebuilt successfully.", template_name)
        return template_name

    def _build_template(self, docker_image: str, template_name: str) -> None:
        """Build an E2B template from *docker_image*.

        Uses :class:`concurrent.futures.ThreadPoolExecutor` for timeout
        enforcement because ``signal.alarm`` only works on the main thread
        and this method may be called from worker threads.
        """
        from e2b import Template

        template = Template().from_image(
            docker_image,
            username=self.config.registry_username,
            password=self.config.registry_password,
        )

        def _do_build() -> None:
            Template.build(
                template,
                template_name,
                cpu_count=self.config.cpu_count,
                memory_mb=self.config.memory_mb,
                skip_cache=self.config.skip_cache,
                tags=self.config.tags or None,
                api_key=self.config.api_key,
            )

        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(_do_build)
        try:
            future.result(timeout=self.config.build_timeout)
        except concurrent.futures.TimeoutError as e:
            executor.shutdown(wait=False, cancel_futures=True)
            msg = f"E2B template build timed out after {self.config.build_timeout}s: {template_name}"
            raise TimeoutError(msg) from e
        except Exception:
            executor.shutdown(wait=False, cancel_futures=True)
            raise
        else:
            executor.shutdown(wait=True)


class E2BEnvironment:
    """Executes bash commands inside an E2B cloud sandbox.

    `E2B <https://e2b.dev>`_ provides isolated cloud sandboxes that can run
    arbitrary Docker images without requiring a local Docker daemon. This
    makes it suitable for large-scale, fully-remote SWE-bench evaluations.

    The first time a Docker image is used it is converted into a persistent
    E2B template; subsequent runs reuse the cached template.

    See :class:`E2BEnvironmentConfig` for keyword arguments.
    """

    def __init__(self, **kwargs: Any) -> None:
        from e2b import Sandbox
        from e2b.exceptions import SandboxException

        self.logger = logging.getLogger("minisweagent.environment.e2b")
        self.config = E2BEnvironmentConfig(**kwargs)
        manager = E2BTemplateManager(self.config)
        template_name = manager.get_or_build(self.config.image)
        self.logger.info("Creating E2B sandbox (template: %s)...", template_name)
        try:
            self.sandbox = Sandbox.create(
                template=template_name,
                timeout=self.config.sandbox_timeout,
                api_key=self.config.api_key,
                metadata={"user": "junyeoplee2"},  # TEMP. DO NOT MERGE
            )
        except SandboxException as e:
            if "404" not in str(e):
                raise
            self.logger.warning("Template %s not found (stale cache). Rebuilding...", template_name)
            manager.rebuild(self.config.image)
            self.sandbox = Sandbox.create(
                template=template_name,
                timeout=self.config.sandbox_timeout,
                api_key=self.config.api_key,
                metadata={"user": "junyeoplee2"},  # TEMP. DO NOT MERGE
            )
        self.logger.info("E2B sandbox ready (id: %s)", self.sandbox.sandbox_id)
        _active_sandboxes.add(self)

    def execute(self, action: dict, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
        """Execute a command in the sandbox and return the output."""
        command = action.get("command", "") if isinstance(action, dict) else action
        try:
            result = self.sandbox.commands.run(
                command,
                user="root",
                cwd=cwd or self.config.cwd,
                timeout=timeout or self.config.timeout,
                envs=self.config.env or None,
            )
            output: dict[str, Any] = {
                "output": result.stdout + result.stderr,
                "returncode": result.exit_code,
                "exception_info": "",
            }
        except Exception as e:
            output = {
                "output": "",
                "returncode": -1,
                "exception_info": f"An error occurred while executing the command: {e}",
                "extra": {"exception_type": type(e).__name__, "exception": str(e)},
            }
        self._check_finished(output)
        return output

    def _check_finished(self, output: dict) -> None:
        """Raise :class:`~minisweagent.exceptions.Submitted` when the task-submission marker is detected."""
        from minisweagent.exceptions import Submitted

        lines = output.get("output", "").lstrip().splitlines(keepends=True)
        if lines and lines[0].strip() == "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" and output["returncode"] == 0:
            submission = "".join(lines[1:])
            raise Submitted(
                {
                    "role": "exit",
                    "content": submission,
                    "extra": {"exit_status": "Submitted", "submission": submission},
                }
            )

    def get_template_vars(self, **kwargs: Any) -> dict[str, Any]:
        from minisweagent.utils.serialize import recursive_merge

        return recursive_merge(self.config.model_dump(), kwargs)

    def serialize(self) -> dict:
        return {
            "info": {
                "config": {
                    "environment": self.config.model_dump(
                        mode="json",
                        exclude={"api_key", "registry_password"},
                    ),
                    "environment_type": f"{self.__class__.__module__}.{self.__class__.__name__}",
                }
            }
        }

    def stop(self) -> None:
        _active_sandboxes.discard(self)
        sandbox = getattr(self, "sandbox", None)
        if sandbox is not None:
            try:
                sandbox.kill()
            except Exception:
                pass

    def __del__(self) -> None:
        self.stop()