wip – opened by mistake #2689
Closed
sdcoffey wants to merge 11 commits into
…#6)
This pull request adds a native sandbox runtime to `openai-agents-python` by porting the relevant `universal_computer` sandbox pieces into `src/agents/sandbox` and reshaping them around the existing Agents SDK model instead of preserving the old UC agent stack. The main change is a new `SandboxAgent` + `SandboxAgentRunner` flow that fits the normal `Agent`/`Runner` paradigm while still supporting manifests, capabilities, session state, snapshots, and sandbox-backed tools. Along the way, this PR ports the sandbox sessions, manifests, entries, utilities, and Docker/Modal/E2B/Unix backends, removes the stale `universal_computer` import paths, and makes the sandbox code Python 3.10 compatible.
This pull request also cleans up the port rather than carrying forward legacy UC shapes wholesale. In particular, it drops the old PTY/session-terminal surface and the legacy context-manager-oriented DX, uses concrete sandbox types instead of a protocol-heavy type layer, and keeps Docker as an optional extra rather than a core dependency.
To make the new flow easier to try, it adds a Docker example that streams agent text and tool activity as it runs, plus focused sandbox tests covering manifests, snapshots, session behavior, and the runner integration.
Co-authored-by: Kazuhiro Sera <seratch@openai.com>
This pull request fixes a shared sandbox persistence contract bug across
Docker, E2B, Modal, and Unix-local.
Before this change, Docker had branch-local logic to account for actual
in-workspace mount targets, but the other backends still reasoned mostly
from the logical manifest key. A manifest such as `entries={"logical":
Mount(..., mount_path=Path("actual"))}` could therefore exclude
`logical` while still persisting or snapshotting `actual` in Unix-local,
Modal, and parts of E2B. E2B also enumerated mounts only from top-level
manifest entries, so nested mounts under `Dir(...)` were never unmounted
before snapshotting.
The fix moves that contract into shared manifest and mount helpers. The
manifest now computes effective ephemeral persistence paths from both
logical entry paths and resolved in-workspace mount targets, and it
exposes deepest-first ephemeral mount targets for backends that need
temporary unmount/remount around persistence. Docker is refactored onto
those shared helpers, Unix-local and Modal now exclude relocated mount
targets consistently, and E2B now handles nested mounts plus
rollback-safe remount behavior when persistence setup fails partway
through.
The regression coverage locks the contract at both the shared-helper and
backend levels. The new tests cover relocated `mount_path` exclusion,
nested mount traversal and ordering, E2B unmount/remount rollback, and
Modal persistence modes so these backends do not drift apart again.
This pull request adds a dedicated Codex sandbox artifact and updates sandbox write plumbing so the GitHub release archive can be streamed directly into the target box before being extracted in place.
- add target-aware Codex GitHub asset resolution for Linux and macOS, with Windows explicitly unsupported for now
- auto-add a dummy-version Codex entry from the sandbox manifest at `codex_relpath` unless an explicit entry already exists there
- stream write payloads through unix-local and Docker sandboxes so artifact downloads no longer need SDK-side buffering
- add focused sandbox manifest, session, entry, and extract coverage for the new behavior
This pull request fixes a sandbox ordering bug that broke the contract of `InputGuardrail(run_in_parallel=False)` on the first turn.
Before this change, both `Runner.run()` and `Runner.run_streamed()` called `SandboxRuntime.prepare_agent()` before running first-turn sequential input guardrails. In sandbox-backed runs, that meant a guardrail trip could still happen after sandbox side effects had already occurred. Concretely, a blocked run could still create and start a runner-owned sandbox session, start a stopped injected session, or apply capability-driven manifest deltas to an already-running injected session.
That behavior is wrong because sequential input guardrails are supposed to run before the agent starts. For sandbox agents, "agent start" was effectively happening too early through sandbox preparation and session materialization.
This change moves first-turn sequential input guardrails ahead of sandbox preparation for sandbox-enabled runs in both the non-streaming and streaming execution paths. Parallel input guardrails still run alongside the model work as before, so the fix is narrowly scoped to the blocking-before-start path. The streamed guardrail helper was also updated so early guardrail execution still records results correctly even before an agent span exists.
The new regression tests cover all affected combinations:
- non-streamed runner-owned sandbox sessions,
- non-streamed running injected sessions,
- streamed runner-owned sandbox sessions,
- streamed running injected sessions.
Each test verifies that when the guardrail trips, sandbox preparation produces no side effects: no session creation, no session start, and no live-session manifest materialization.
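The ordering that this fix enforces can be illustrated with a small sketch. It does not use the SDK's `Runner` or `SandboxRuntime`; `run_first_turn`, `GuardrailTripped`, and the callables passed in are hypothetical names chosen for the example. The only point is that blocking guardrails complete before any sandbox preparation happens.

```python
import asyncio
from typing import Awaitable, Callable


class GuardrailTripped(Exception):
    """Raised when a blocking input guardrail rejects the input."""


async def run_first_turn(
    user_input: str,
    sequential_guardrails: list[Callable[[str], Awaitable[bool]]],
    prepare_sandbox: Callable[[], Awaitable[object]],
    run_model: Callable[[object, str], Awaitable[str]],
) -> str:
    # 1. Blocking guardrails run first; each returns True when it trips.
    for guardrail in sequential_guardrails:
        if await guardrail(user_input):
            raise GuardrailTripped(getattr(guardrail, "__name__", "guardrail"))
    # 2. Only after every blocking guardrail passes is the sandbox touched.
    session = await prepare_sandbox()
    # 3. The model turn runs against the prepared session.
    return await run_model(session, user_input)


events: list[str] = []


async def block_everything(text: str) -> bool:
    return True


async def allow(text: str) -> bool:
    return False


async def prepare() -> object:
    events.append("prepare")
    return "session"


async def model(session: object, text: str) -> str:
    events.append("model")
    return "ok"
```

With `block_everything` in the guardrail list, `prepare` is never called, which mirrors what the regression tests assert: a tripped guardrail leaves no sandbox side effects behind.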
#18) This pull request fixes two correctness bugs in `WorkspaceJsonlSink` by preserving pre-existing outbox history on the first flush and by excluding ephemeral sink outputs through runtime-only persistence paths instead of mutating the manifest. It updates the shared sandbox session layer so Unix-local, Docker, Modal, and E2B persistence all honor the same skip-path set, which keeps durable siblings under existing directories while still pruning the sink outbox. It also adds regressions covering pre-populated outboxes, repeated flushes, existing-parent persistence, Docker staged-copy pruning, and Modal/E2B tar exclusion wiring.
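One way to picture the first-flush fix is a sink that appends to its outbox instead of truncating it. The sketch below is an assumption about the general technique, not the actual `WorkspaceJsonlSink` code: `JsonlOutbox` and its methods are hypothetical, and it writes to a local path rather than through a sandbox session.

```python
import json
import tempfile
from pathlib import Path


class JsonlOutbox:
    """Hypothetical JSONL sink; not the SDK's WorkspaceJsonlSink."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._pending: list[dict] = []

    def write(self, event: dict) -> None:
        self._pending.append(event)

    def flush(self) -> None:
        if not self._pending:
            return
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Append mode keeps whatever history the outbox already held,
        # including lines written before this sink instance existed.
        with self.path.open("a", encoding="utf-8") as handle:
            for event in self._pending:
                handle.write(json.dumps(event) + "\n")
        self._pending.clear()


def read_events(path: Path) -> list[dict]:
    """Parse every JSONL line currently in the outbox."""
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]


workdir = Path(tempfile.mkdtemp())
outbox_path = workdir / "outbox" / "events.jsonl"
outbox_path.parent.mkdir(parents=True)
outbox_path.write_text(json.dumps({"seq": 0}) + "\n", encoding="utf-8")  # pre-existing history

sink = JsonlOutbox(outbox_path)
sink.write({"seq": 1})
sink.flush()
sink.write({"seq": 2})
sink.flush()
recorded = read_events(outbox_path)
```

The pre-populated line survives the first flush and repeated flushes do not duplicate earlier events, which are the two cases the regressions above are described as covering.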
This pull request fixes Modal snapshot-filesystem persistence so cleanup failures while stripping ephemeral paths fail closed instead of silently taking a snapshot. It checks the pre-snapshot `rm -rf` result before calling `snapshot_filesystem()`, restores the backed-up ephemeral payload when cleanup fails, and raises a structured archive-read error with exit-code context. The change also adds regression coverage for the non-zero cleanup path to ensure snapshotting is skipped and restore still runs.
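The fail-closed behavior can be sketched as a guard around the snapshot call. The names below (`ExecResult`, `CleanupFailedError`, `snapshot_fail_closed`) are hypothetical and the callables stand in for the real Modal operations; only the control flow reflects the description above. What happens to the backup after a successful snapshot is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecResult:
    """Hypothetical result of the pre-snapshot cleanup command."""
    exit_code: int
    stderr: str = ""


class CleanupFailedError(RuntimeError):
    """Hypothetical structured error that carries exit-code context."""

    def __init__(self, exit_code: int, stderr: str) -> None:
        super().__init__(f"ephemeral cleanup failed with exit code {exit_code}: {stderr}")
        self.exit_code = exit_code
        self.stderr = stderr


def snapshot_fail_closed(
    run_cleanup: Callable[[], ExecResult],
    restore_backup: Callable[[], None],
    take_snapshot: Callable[[], str],
) -> str:
    result = run_cleanup()
    if result.exit_code != 0:
        # Put the backed-up ephemeral payload back, then refuse to snapshot.
        restore_backup()
        raise CleanupFailedError(result.exit_code, result.stderr)
    return take_snapshot()


calls: list[str] = []


def failing_cleanup() -> ExecResult:
    calls.append("cleanup")
    return ExecResult(exit_code=1, stderr="rm: permission denied")


def restore() -> None:
    calls.append("restore")


def snapshot() -> str:
    calls.append("snapshot")
    return "snapshot-id"
```

Calling `snapshot_fail_closed(failing_cleanup, restore, snapshot)` raises `CleanupFailedError` after restoring, and `snapshot` is never invoked.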
This pull request fixes a behavioral mismatch where `SandboxRunConfig(session=...)` skipped `Capability.process_manifest()` while `session_state` resume flows and fresh `client.create(...)` sessions did not. In practice, that meant capability-provided manifest changes only worked in some sandbox startup modes.
A capability that adds workspace scaffolding such as `README.md`, `cap.txt`, helper files, or other manifest-backed setup would behave correctly when the runtime created or resumed a session, but silently fail when a caller injected an already-created live session. The same agent and capability could therefore produce different instructions, different visible workspace files, and different tool preconditions depending only on whether the run used `session=...` or `session_state=...`.
This was especially risky for long-lived injected sessions. Capability tools still bound to the session, but any files or manifest-backed instructions those tools expected were never added to `session.state.manifest`, and for already-running sessions they were never materialized into the workspace either. That creates a hard-to-diagnose failure mode where capability-dependent runs break only in the live-session configuration.
This change makes injected live sessions follow the same manifest-processing path as the other sandbox acquisition modes. The runtime now applies capability manifest mutations to the injected session state, reapplies the manifest once for already-running injected sessions so capability-owned files exist without restarting the session, and preserves the existing caller-owned lifecycle semantics for injected sessions.
This pull request fixes sandbox ZIP extraction compatibility for helper-call paths that receive non-seekable archive streams. The change replaces the old `seekable()`-presence heuristic with an actual random-access probe in `src/agents/sandbox/session/archive_extraction.py`. Streams that already support `tell()` and `seek()` are passed through unchanged, while non-seekable streams are copied into a rewindable `SpooledTemporaryFile` before `zipfile.ZipFile(...)` reads them. This keeps the normal `BaseSandboxSession.extract()` behavior unchanged while making the lower-level ZIP helper correct for future direct callers. The pull request also removes the now-unused private `_zipfile_compatible_stream()` seam from `src/agents/sandbox/session/base_sandbox_session.py` and updates `tests/test_sandbox_extract.py` to cover both valid random-access streams without a `seekable()` method and streams whose `seekable()` method explicitly returns `False`.
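The probe-then-spool approach can be sketched with the standard library alone. This is an illustration of the technique rather than the contents of `archive_extraction.py`: `supports_random_access`, `as_zip_compatible`, and the two small stream classes are hypothetical names, and the in-memory spool threshold is an arbitrary assumption.

```python
import io
import shutil
import tempfile
import zipfile


def supports_random_access(stream) -> bool:
    """Probe tell()/seek() directly instead of trusting seekable()."""
    try:
        position = stream.tell()
        stream.seek(position)
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation subclasses both OSError and ValueError.
        return False
    return True


def as_zip_compatible(stream, max_memory: int = 8 * 1024 * 1024):
    """Return the stream unchanged if rewindable, else a spooled copy."""
    if supports_random_access(stream):
        return stream
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(stream, spooled)
    spooled.seek(0)
    return spooled


class OneWayStream:
    """Read-only stream whose seekable() explicitly returns False."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)

    def seekable(self) -> bool:
        return False

    def seek(self, *args):
        raise io.UnsupportedOperation("seek")

    def tell(self):
        raise io.UnsupportedOperation("tell")


class NoSeekableMethod:
    """Random-access stream that simply lacks a seekable() method."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)
        self.read = self._buffer.read
        self.seek = self._buffer.seek
        self.tell = self._buffer.tell


def make_zip_bytes() -> bytes:
    payload = io.BytesIO()
    with zipfile.ZipFile(payload, "w") as archive:
        archive.writestr("hello.txt", "hi")
    return payload.getvalue()


zip_bytes = make_zip_bytes()
with zipfile.ZipFile(as_zip_compatible(OneWayStream(zip_bytes))) as archive:
    names = archive.namelist()
```

A stream like `NoSeekableMethod` passes the probe and is handed to `zipfile.ZipFile` untouched, while `OneWayStream` is copied into the spooled file first; those are the two cases the updated tests are described as covering.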
This pull request adds a sandbox `Skills` capability and ports the tax-prep packaged-agent demo onto the sandbox runtime used in this repository.
This branch includes four follow-up commits:
- `f25d142f` fixes Modal archive writes and hardens `ls` path parsing. It also renames the Docker runner example to `examples/sandbox/basic.py`, updates the extensions README, and adds regression coverage.
- `62a37eda` makes sandbox Codex installs ephemeral by default so Codex artifacts are treated as runtime-only state.
- `3cb9e996` adds the new skills capability, focused tests, a sandbox-backed apply-patch helper for examples, and a Docker-based `examples/sandbox/tax_prep.py` demo with sample tax PDFs.
- `88e45e3e` thins the sandbox skills capability so it only mounts skills into a Codex auto-discovery root (defaulting to `.agents/skills`) and no longer injects prompt-side skill indexes or custom skill instructions.
Notes:
- I intentionally left unrelated local files uncommitted, including `2026-03-16__victoria_zheng/`.
- The full local verification stack now passes: `make format`, `make lint`, `make typecheck`, and `make tests`.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b7b7b60fc
No description provided.