feat(sandbox): integrate OpenShell sandbox and supervisor session management tools by artdroz · Pull Request #1052 · cnoe-io/ai-platform-engineering

artdroz · 2026-03-30T12:57:33Z

Summary

OpenShell Sandbox Integration: Custom dynamic agents can now run inside isolated OpenShell sandbox containers with configurable network/filesystem policies, providing secure code execution via deepagents SandboxBackendProtocol
Supervisor Session Management Tools: The single-node supervisor gains 7 tools (sessions_spawn, sessions_yield, sessions_send, sessions_history, sessions_list, session_status, subagents) for multi-agent orchestration
UI Enhancements: Top-level Policy tab with live YAML editor and event stream, sandbox status badges with error surfacing, host/sandbox tool distinction in builtin tools picker, conversation history viewer with scroll fix
Sandbox SSL/Auth Hardening: Auto-inject OpenShell proxy CA cert into sandbox trust store, git credential helper for PAT auth, and expanded default policy for full GitHub/git operations

Architecture

UI (:3000) ---- Supervisor A2A (:8000) ---- LLM (Bedrock)
    |                    |
    |              session_tools ---- Dynamic Agents API (:8100)
    |                                       |
    +------------------------------- Agent Runtime
                                      |          |
                                  MCP Servers   OpenShell Sandbox (gRPC)
                                  (host-side)    +-- execute, filesystem
                                                 +-- Policy Engine

How the Sandbox Works

Sandbox Lifecycle

Each custom dynamic agent can opt into sandbox execution via SandboxConfig (stored in MongoDB). When sandbox.enabled = true, the following lifecycle runs during agent initialization:

Agent Create/Update (UI)           Agent Runtime Init (first chat)
        |                                    |
  SandboxConfig saved               _setup_sandbox_backend()
  to MongoDB with:                          |
  - enabled: true                  SandboxManager.get_or_create_sandbox()
  - policy_template: permissive             |
  - sandbox_name: "da-coding-agent"  +------+------+
                                     |             |
                              sandbox exists?   no -> CreateSandbox gRPC
                                     |             |   (named, persistent)
                                   yes             |
                                     |        wait_ready()
                                     +------+------+
                                            |
                                    SandboxSession connected
                                            |
                                    OpenShellBackend wraps session
                                            |
                                    initialize_policy()
                                    (template -> YAML -> CLI -> hot-reload)
                                            |
                                    _configure_sandbox_env()
                                      |                |
                              _inject_ca_cert()   _inject_git_credentials()
                              (gateway CA ->      (PAT -> credential helper
                               trust store)        + GIT_TERMINAL_PROMPT=0)
                                            |
                                    CompositeBackend(default=OpenShellBackend)
                                    passed to create_deep_agent(backend=...)

Gateway Discovery & Provisioning

The SandboxManager singleton connects to the OpenShell Gateway in one of two modes:

Explicit endpoint (OPENSHELL_GATEWAY env var) -- connects via SandboxClient. For an HTTPS URL, the manager parses the endpoint and uses SandboxClient.from_active_cluster(), which handles TLS and strips the scheme for gRPC. For a bare host:port, it connects directly.
Local auto-start -- if no endpoint is set, the manager runs openshell gateway start to spin up a local k3s-in-Docker gateway (idempotent, ~60s first run). It then waits for gateway metadata files at ~/.config/openshell/gateways/<name>/metadata.json before connecting via SandboxClient.from_active_cluster().

All openshell CLI invocations use the configurable openshell_cli_path setting, pass explicit --gateway <name> flags, and strip the OPENSHELL_GATEWAY env var from subprocess environments to prevent the CLI from misinterpreting URLs as gateway names.
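
The endpoint handling and environment sanitization described above can be sketched roughly as follows. This is a minimal illustration, not the code in sandbox.py: the helper names grpc_target and cli_env are hypothetical, and the default ports are an assumption.

```python
import os
from urllib.parse import urlparse


def grpc_target(endpoint: str) -> str:
    """Return a host:port target for gRPC from a URL or a bare host:port.

    Hypothetical helper; the default ports below are an assumption.
    """
    if "://" in endpoint:
        parsed = urlparse(endpoint)
        default_port = 443 if parsed.scheme == "https" else 80
        return f"{parsed.hostname}:{parsed.port or default_port}"
    # Bare host:port is already usable as a gRPC target.
    return endpoint


def cli_env(base_env=None) -> dict:
    """Copy the environment without OPENSHELL_GATEWAY for CLI subprocesses."""
    env = dict(os.environ if base_env is None else base_env)
    # Removing the variable keeps the CLI from reading a URL as a gateway name.
    env.pop("OPENSHELL_GATEWAY", None)
    return env
```

A subprocess call would then pass env=cli_env() together with an explicit --gateway <name> flag.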

Sandbox Identity & Persistence

Each agent gets a named sandbox derived from its agent ID: da-{agent_id} (e.g., da-coding-agent)
Named sandboxes are persistent -- the same container is reused across chat sessions, so cloned repos, installed packages, and filesystem state survive between conversations
On first chat, get_or_create_sandbox() tries client.get(name) to reconnect to an existing sandbox and only creates a new one via the CreateSandbox gRPC call if none exists
The SandboxSession object wraps the gRPC connection and provides exec() for command execution
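
A minimal sketch of the naming and get-or-create flow, using an in-memory stand-in for the client. FakeSandboxClient and its create() method are hypothetical; the real client may signal a missing sandbox differently (for example by raising), so treat the None check as an assumption.

```python
class FakeSandboxClient:
    """In-memory stand-in for the real client (hypothetical, for illustration)."""

    def __init__(self):
        self.sandboxes = {}
        self.created = 0

    def get(self, name):
        # Assumption: a missing sandbox is reported as None.
        return self.sandboxes.get(name)

    def create(self, name):
        self.created += 1
        self.sandboxes[name] = {"name": name}
        return self.sandboxes[name]


def sandbox_name(agent_id: str) -> str:
    """Derive the persistent sandbox name from the agent ID."""
    return f"da-{agent_id}"


def get_or_create_sandbox(client, agent_id: str):
    """Reconnect to the named sandbox if it exists, otherwise create it."""
    name = sandbox_name(agent_id)
    existing = client.get(name)
    if existing is not None:
        return existing
    return client.create(name)
```

Calling get_or_create_sandbox() twice for the same agent returns the same sandbox, which is what lets cloned repos and installed packages survive between conversations.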

Command Execution (OpenShellBackend)

The OpenShellBackend implements the deepagents SandboxBackendProtocol:

execute(command) -- runs shell commands inside the sandbox. All deepagents filesystem tools (read_file, write_file, edit_file, grep, glob, ls) are inherited from BaseSandbox and delegate to execute() under the hood.
gRPC newline workaround -- OpenShell's gRPC protocol rejects command arguments containing \n or \r. When the LLM produces multi-line commands, we pipe them via stdin (session.exec(["bash"], stdin=command.encode())) instead of passing them as a bash -c argument.
upload_files() -- pipes file content via stdin to cat > dest inside the sandbox
download_files() -- reads files via base64 encoding and decodes on the host

The backend is wrapped in a CompositeBackend(default=OpenShellBackend) and passed to create_deep_agent(backend=...), which routes all filesystem and execute middleware through it.
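
The gRPC newline workaround in execute() can be illustrated with a small helper that decides how a command is handed to session.exec(). The function name build_exec_request is hypothetical; only the stdin form for multi-line commands is taken from the description above.

```python
def build_exec_request(command: str):
    """Return (argv, stdin_bytes) for a sandbox exec call.

    Multi-line commands go through stdin because the gRPC layer rejects
    arguments containing newline or carriage-return characters.
    """
    if "\n" in command or "\r" in command:
        return ["bash"], command.encode()
    # Single-line commands are safe to pass as a -c argument.
    return ["bash", "-c", command], None
```

For example, build_exec_request("cd /tmp\nls") yields (["bash"], b"cd /tmp\nls"), which maps onto session.exec(["bash"], stdin=...).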

Policy Engine

Policies control what the sandbox can access at the network and filesystem level. They are applied via the openshell CLI and hot-reloaded without restarting the sandbox.

Policy templates:

Template	Network	Filesystem	Use Case
`permissive`	PyPI, npm, GitHub (incl. codeload), AWS Bedrock, Azure OpenAI	RW: `/sandbox`, `/tmp`, `/workspace`, CA cert paths; RO: `/usr`, `/lib`, `/proc`, `/etc`	Default for coding agents that need package managers and LLM APIs
`restrictive`	None (fully air-gapped)	RW: `/sandbox`, `/tmp`, CA cert paths; RO: `/usr`, `/lib`, `/proc`, `/etc`	High-security workloads
`custom`	User-defined YAML	User-defined	Full control via YAML editor in UI

Network rules are scoped by endpoint (host:port) and binary path. For example, the pypi rule only allows /sandbox/.venv/bin/pip and /usr/local/bin/uv to reach pypi.org:443 -- even if curl runs in the sandbox, it cannot access PyPI. The github rule includes all binaries needed for git operations: git, git-remote-https, git-core/**, gh, curl, wget, and Python interpreters.
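
To make the endpoint-plus-binary scoping concrete, here is a small model of the matching semantics. It is only a sketch of the idea, not how OpenShell enforces policy; the rule dictionary layout and the "*/git-core/**" pattern are assumptions.

```python
from fnmatch import fnmatchcase

# Assumed in-memory shape of a rule; the real policy schema may differ.
PYPI_RULE = {
    "endpoints": [("pypi.org", 443)],
    "binaries": ["/sandbox/.venv/bin/pip", "/usr/local/bin/uv"],
}


def is_allowed(rule, binary: str, host: str, port: int) -> bool:
    """Allow a connection only if both the endpoint and the binary match."""
    endpoint_ok = (host, port) in rule["endpoints"]
    binary_ok = any(fnmatchcase(binary, pattern) for pattern in rule["binaries"])
    return endpoint_ok and binary_ok
```

Under this model pip may reach pypi.org:443 while curl may not, even though both run inside the same sandbox.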

Policy lifecycle:

initialize_policy(sandbox_name, template) -- builds YAML from template, writes to temp file, applies via openshell policy set <name> --gateway <gw> --policy <file> --wait
update_policy() -- same flow, used for live edits from the UI Policy tab
add_allow_rule() / remove_rule() -- mutate the in-memory policy dict and hot-reload
cleanup_temporary_rules() -- removes rules marked _temporary (session-scoped)
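
The rule mutation helpers can be sketched as plain dictionary operations. The policy layout (a network_policies mapping) and the function signatures are assumptions for illustration; in the real flow each mutation is followed by a hot-reload through the CLI, which is omitted here.

```python
def add_allow_rule(policy: dict, name: str, host: str, port: int,
                   binaries, temporary: bool = False) -> dict:
    """Add a named allow rule to the in-memory policy dict."""
    rule = {
        "endpoints": [{"host": host, "port": port}],
        "binaries": list(binaries),
    }
    if temporary:
        # Session-scoped rules carry the _temporary marker.
        rule["_temporary"] = True
    policy.setdefault("network_policies", {})[name] = rule
    return policy


def cleanup_temporary_rules(policy: dict) -> list:
    """Remove rules marked _temporary and return their names."""
    rules = policy.get("network_policies", {})
    removed = [name for name, rule in rules.items() if rule.get("_temporary")]
    for name in removed:
        del rules[name]
    return removed
```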

Policy status is monitored via _query_policy_status() which parses openshell policy get <name> --gateway <gw> CLI output for Status: loaded|failed and error details.
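
A rough sketch of the status parsing follows. Only the Status: loaded|failed line comes from the description above; the Error: line format and the function name parse_policy_status are assumptions about the CLI output.

```python
import re


def parse_policy_status(cli_output: str):
    """Extract (status, error) from policy CLI output.

    Returns "unknown" when no Status line is found. The "Error:" line
    format is an assumption, not confirmed CLI behaviour.
    """
    match = re.search(r"^\s*Status:\s*(loaded|failed)\b", cli_output,
                      re.MULTILINE | re.IGNORECASE)
    status = match.group(1).lower() if match else "unknown"
    error = None
    if status == "failed":
        err = re.search(r"^\s*Error:\s*(.+)$", cli_output, re.MULTILINE)
        error = err.group(1).strip() if err else None
    return status, error
```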

SSL/TLS Trust Setup

The OpenShell proxy performs TLS interception on all outbound network traffic from the sandbox. Without the proxy's CA certificate in the sandbox trust store, git clone, pip install, curl, and Python requests all fail with "server certificate verification failed".

On sandbox init, _inject_ca_cert():

Reads the gateway CA cert from ~/.config/openshell/gateways/<name>/mtls/ca.crt on the host
Writes it to /usr/local/share/ca-certificates/openshell-proxy.crt inside the sandbox
Runs update-ca-certificates (or appends to the bundle as fallback)
Sets git config --global http.sslCAInfo /etc/ssl/certs/ca-certificates.crt

Both the permissive and restrictive policy templates include /usr/local/share/ca-certificates and /etc/ssl/certs as read-write paths to allow this injection.
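
The steps above can be sketched as a host-side path lookup plus the shell commands run inside the sandbox. The helper names are hypothetical, and the exact fallback command (appending the cert to the bundle with cat) is an assumption about how the fallback might be written.

```python
from pathlib import Path

SANDBOX_CA_PATH = "/usr/local/share/ca-certificates/openshell-proxy.crt"
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def host_ca_path(gateway_name: str, home: Path) -> Path:
    """Location of the gateway CA cert on the host."""
    return home / ".config" / "openshell" / "gateways" / gateway_name / "mtls" / "ca.crt"


def trust_store_commands() -> list:
    """Shell commands to run in the sandbox after the cert has been written."""
    return [
        # Fall back to appending the cert if update-ca-certificates fails.
        f"update-ca-certificates || cat {SANDBOX_CA_PATH} >> {CA_BUNDLE}",
        f"git config --global http.sslCAInfo {CA_BUNDLE}",
    ]
```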

Credential Injection

_inject_git_credentials() runs after CA cert setup to configure GitHub authentication:

Creates a git credential helper script at /sandbox/.git-credentials/helper.sh that returns the PAT using git's credential protocol (protocol=https, host=github.com, username=x-access-token, password=<PAT>)
Configures git config --global credential.helper to use this script
Sets GIT_TERMINAL_PROMPT=0 in .bashrc and .profile to prevent git from ever hanging on interactive auth prompts

This approach is more robust than the previous url.insteadOf method, which could trigger password prompts on some git versions when the PAT was invalid or expired.
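
For illustration, a credential helper script of this kind could be rendered as shown below. The function render_credential_helper is hypothetical and the script body is a sketch of git's credential helper protocol (key=value lines printed in response to a get request), not the script shipped in this PR.

```python
import shlex


def render_credential_helper(pat: str) -> str:
    """Render a shell script that answers git credential 'get' requests."""
    lines = [
        "#!/bin/sh",
        # git calls the helper with get, store, or erase; only get is answered.
        '[ "$1" = "get" ] || exit 0',
        "echo protocol=https",
        "echo host=github.com",
        "echo username=x-access-token",
        # shlex.quote keeps unusual characters in the token shell-safe.
        f"printf 'password=%s\\n' {shlex.quote(pat)}",
    ]
    return "\n".join(lines) + "\n"
```

The rendered text would be written to /sandbox/.git-credentials/helper.sh, made executable, and registered with git config --global credential.helper.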

Event Streaming (Pub/Sub)

The SandboxManager implements a pub/sub pattern for sandbox events (denials, policy updates):

subscribe(sandbox_name) returns a dedicated asyncio.Queue per consumer
_broadcast() pushes events to all subscriber queues (dead queues are auto-pruned)
Multiple SSE clients (chat panel, Policy tab) each get their own queue and receive all events independently
start_watch() runs a background asyncio.Task that polls /proc/openshell/denials inside the sandbox and broadcasts denial events
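
The queue-per-consumer pattern can be sketched as below. EventBus is a hypothetical stand-in for the relevant part of SandboxManager, and treating a full queue as "dead" is an assumption about how pruning is decided.

```python
import asyncio


class EventBus:
    """Fan-out of sandbox events to one asyncio.Queue per subscriber."""

    def __init__(self):
        self._subscribers = {}  # sandbox_name -> list of queues

    def subscribe(self, sandbox_name: str, maxsize: int = 100) -> asyncio.Queue:
        queue = asyncio.Queue(maxsize=maxsize)
        self._subscribers.setdefault(sandbox_name, []).append(queue)
        return queue

    def broadcast(self, sandbox_name: str, event) -> None:
        alive = []
        for queue in self._subscribers.get(sandbox_name, []):
            try:
                queue.put_nowait(event)
                alive.append(queue)
            except asyncio.QueueFull:
                # Assumption: a consumer that stopped draining is dead; prune it.
                pass
        self._subscribers[sandbox_name] = alive


async def demo():
    bus = EventBus()
    chat_panel = bus.subscribe("da-coding-agent")
    policy_tab = bus.subscribe("da-coding-agent")
    bus.broadcast("da-coding-agent", {"type": "denial", "host": "example.com"})
    # Each subscriber receives its own copy of the event reference.
    return await chat_panel.get(), await policy_tab.get()
```

Giving each SSE client its own queue means a slow consumer cannot steal or delay events meant for another.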

Host-Side Tools vs Sandbox Tools

When sandbox is enabled, tools are split into two execution domains:

Domain	Tools	Execution
Sandbox (inside container)	`execute`, `read_file`, `write_file`, `edit_file`, `grep`, `glob`, `ls`	Via `OpenShellBackend` -> gRPC -> sandbox container
Host (outside sandbox)	`fetch_url`, `current_datetime`, `user_info`, `sleep`, MCP tools	In the Dynamic Agents Python process, NOT subject to sandbox policies

The BuiltinToolDefinition model includes runs_in_sandbox: bool and sandbox_warning: str | None metadata so the UI can clearly label which tools bypass the sandbox. When fetch_url is enabled on a sandboxed agent, the backend logs a warning and the UI shows an amber "host" badge.
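
A simplified view of the metadata and how a UI might derive the badge from it. The dataclass stands in for the Pydantic model, the name field and the ui_badge helper are hypothetical, and the warning text is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuiltinToolDefinition:
    """Dataclass stand-in for the Pydantic model (illustration only)."""
    name: str
    runs_in_sandbox: bool
    sandbox_warning: Optional[str] = None


def ui_badge(tool: BuiltinToolDefinition, sandbox_enabled: bool) -> Optional[str]:
    """Return "host" for tools that bypass the sandbox on a sandboxed agent."""
    if sandbox_enabled and not tool.runs_in_sandbox:
        return "host"
    return None


fetch_url = BuiltinToolDefinition(
    name="fetch_url",
    runs_in_sandbox=False,
    sandbox_warning="Runs on the host and is not subject to sandbox policies",
)
execute = BuiltinToolDefinition(name="execute", runs_in_sandbox=True)
```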

Backend Changes

OpenShell Sandbox (4 new files)

openshell_backend.py -- SandboxBackendProtocol implementation wrapping SandboxSession. Pipes multi-line commands via stdin to work around gRPC newline restriction.
sandbox.py -- SandboxManager singleton for sandbox lifecycle, policy management, and pub/sub event broadcasting. Handles gateway URL parsing for gRPC, configurable CLI path, and env sanitization for subprocess calls.
sandbox_policy.py -- Permissive/restrictive policy templates (PyPI, npm, GitHub, AWS Bedrock, Azure OpenAI) with rule mutation helpers. Includes CA cert write paths and comprehensive GitHub binary allowlists.
routes/sandbox.py -- REST/SSE endpoints for status, live events, policy updates, and allow-rule management.

Agent Runtime

Sets up OpenShellBackend with CompositeBackend when sandbox is enabled
_inject_ca_cert(): Reads OpenShell gateway CA cert from host, installs into sandbox trust store, configures git sslCAInfo
_inject_git_credentials(): Creates git credential helper script with PAT, sets GIT_TERMINAL_PROMPT=0
Logs warnings for host-side tools (fetch_url) running outside sandbox
Policy initialization with error detection and reporting

Session Management Tools (1 new file)

session_tools.py -- 7 LangChain tools communicating with Dynamic Agents API via HTTP:
- sessions_spawn: fire-and-forget (background thread) or blocking mode
- sessions_yield: stabilization-based polling (waits for message count to settle over 2+ consecutive polls before returning)
- sessions_send/history/list/status: conversation management
- subagents: agent discovery and status
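
The stabilization-based polling used by sessions_yield can be sketched as follows. This is one reading of "settle over 2+ consecutive polls" (two consecutive polls returning the same count); the function name, parameters, and timeout behaviour are assumptions.

```python
import time


def wait_until_stable(get_message_count, stable_polls: int = 2,
                      interval: float = 1.0, max_polls: int = 60,
                      sleep=time.sleep):
    """Poll until the message count is unchanged for stable_polls polls.

    Returns the settled count, or the last observed count if max_polls
    is exhausted first.
    """
    last = None
    streak = 0
    for _ in range(max_polls):
        count = get_message_count()
        if count == last:
            streak += 1
        else:
            # The count moved, so restart the streak at this reading.
            last = count
            streak = 1
        if streak >= stable_polls:
            return count
        sleep(interval)
    return last
```

In session_tools.py the count would come from an HTTP call to the Dynamic Agents API; here get_message_count is any callable, which also makes the loop easy to test with a fake sleep.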

Models and Config

SandboxConfig, SandboxPolicyTemplate Pydantic models
BuiltinToolDefinition extended with runs_in_sandbox and sandbox_warning metadata
OpenShell settings in config.py (gateway, gateway_name, timeout, cli_path)
GitHub MCP server in config.yaml

Frontend Changes

New Components (5 files)

SandboxPolicyTab.tsx -- Top-level Policy tab: per-agent policy editor, network rules display, filesystem permissions, live event stream
SandboxPolicyPanel.tsx -- Inline policy editor for chat sidebar
SandboxDenialCard.tsx / SandboxRequestStream.tsx / SandboxToolCard.tsx -- Event rendering components

Enhanced Components

DynamicAgentEditor.tsx -- Sandbox toggle with name, policy template selector, custom YAML editor
DynamicAgentContext.tsx -- SandboxStatusBadge with policy error surfacing, allow-rule form with visible errors
BuiltinToolsPicker.tsx -- host badge on non-sandbox tools when sandbox enabled, warning banners, info note
ConversationsTab.tsx -- Scroll fix (native overflow-y-auto), agent grouping, click-to-view history
page.tsx -- Added Policy tab to Custom Agents page

New API Routes (4 files)

/api/dynamic-agents/sandbox/events/[agentId] -- SSE proxy
/api/dynamic-agents/sandbox/status/[agentId] -- Status proxy
/api/dynamic-agents/sandbox/policy/[agentId] -- Policy proxy
/api/dynamic-agents/conversations/[id]/messages -- Messages proxy

Stats

50+ files changed: 34 modified, 16+ new
+5,800 / -240 lines

Test plan

TODO: Follow-up Work

Kubernetes Chart / Deployment Model

Live Policy Validation & Enforcement

…agement tools

Add OpenShell sandbox support for custom dynamic agents with policy-based network/filesystem isolation, and session management tools for multi-agent orchestration from the supervisor.

Sandbox integration:
- OpenShellBackend implementing deepagents SandboxBackendProtocol
- SandboxManager with pub/sub event broadcasting for concurrent SSE clients
- Policy templates (permissive/restrictive/custom) with live YAML editing
- Sandbox routes for status, events (SSE), policy updates, and allow rules
- GitHub PAT injection into sandbox via git config URL rewriting
- gRPC newline workaround (pipe multi-line commands via stdin)

Supervisor session tools:
- sessions_spawn/yield/send/history/list/status and subagents tools
- Fire-and-forget spawning with background thread for non-blocking mode
- Stabilization-based yield polling (waits for message count to settle)
- System prompt guidance for timing and tool usage patterns

UI enhancements:
- Top-level Policy tab with live YAML editor and request stream
- Sandbox status badges with error surfacing in chat sidebar
- Built-in tools picker shows host/sandbox distinction with warnings
- Conversations tab with scroll fix and click-to-view session history
- SSE event components for sandbox denials and tool executions

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>

github-actions · 2026-03-30T12:57:52Z

✅ No proprietary content detected. This PR is clear for review!

github-actions · 2026-03-30T12:59:00Z

📊 Test Coverage Report

Main Tests Coverage

Metric	Coverage	Details
Lines	37.4%	7616/20348 lines
Branches	0.0%	0/0 branches

📁 Coverage Artifacts

Main tests: coverage-reports-main artifact
RAG tests: coverage-reports-rag artifact (not available)
Download artifacts to view detailed HTML coverage reports

github-actions · 2026-03-30T12:59:31Z

🧪 CAIPE UI Test Results

✅ All tests passed

🔴 Overall Coverage: 29%

📊 Detailed Coverage

Metric	Covered	Total	Percentage
Lines	6485	20681	31.35%
Statements	6956	22557	30.83%
Functions	1162	4430	26.23%
Branches	4333	16052	26.99%

✅ Test Suites

✅ auth-guard.test.tsx - Route protection & authorization
✅ token-expiry-guard.test.tsx - Token expiry handling
✅ a2a-sdk-client.test.ts - A2A streaming SDK
✅ auth-utils.test.ts - Authentication utilities (100% coverage)
✅ auth-config.test.ts - OIDC configuration

📈 Coverage Thresholds

Threshold	Target	Current	Status
Minimum	40%	29%	❌ Fail
Good	60%	29%	⚠️ Below target
Excellent	80%	29%	⚠️ Below target

⚠️ Areas Needing Tests

High Priority:

hooks/use-a2a-streaming.ts - Core streaming functionality
store/chat-store.ts - Chat state management
store/agent-skills-store.ts - Agent skills
lib/api-client.ts - API communication
lib/storage-mode.ts - MongoDB/localStorage switching

Medium Priority:

components/chat/ChatPanel.tsx - Main chat interface
components/agent-builder/* - Agent builder UI
lib/mongodb.ts - MongoDB integration

💡 Run locally: make caipe-ui-tests
📦 Full report: Check workflow artifacts

github-actions · 2026-03-30T13:05:21Z

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-2

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-2

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

Add local copy of wrap_tools_with_error_handling utility (dynamic_agents has its own venv and cannot import from the parent package) and apply it in _build_subagent_tools so MCP tool failures return error messages to the LLM instead of crashing the subagent graph.

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>
Made-with: Cursor

github-actions · 2026-03-30T14:35:30Z

✅ No proprietary content detected. This PR is clear for review!

github-actions · 2026-03-30T14:42:50Z

📊 Test Coverage Report

Main Tests Coverage

Metric	Coverage	Details
Lines	37.4%	7616/20348 lines
Branches	0.0%	0/0 branches

📁 Coverage Artifacts

Main tests: coverage-reports-main artifact
RAG tests: coverage-reports-rag artifact (not available)
Download artifacts to view detailed HTML coverage reports

github-actions · 2026-03-30T14:44:20Z

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-3

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-3

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

…t/SSL

The OpenShell proxy performs TLS interception, causing git/curl/pip inside the sandbox to fail with 'server certificate verification failed'. The previous url.insteadOf approach for PAT injection could also trigger password prompts.

Changes:
- Inject OpenShell gateway CA cert into sandbox trust store on init
- Replace url.insteadOf with a proper git credential helper script
- Set GIT_TERMINAL_PROMPT=0 to prevent git from hanging on auth prompts
- Add CA cert paths as read-write in both policy templates
- Add codeload.github.com, git-remote-https, git-core/** to GitHub policy
- Add openshell_cli_path setting for configurable CLI path
- Fix sandbox.py to strip OPENSHELL_GATEWAY env and use --gateway flag

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>

github-actions · 2026-03-31T11:47:48Z

✅ No proprietary content detected. This PR is clear for review!

github-actions · 2026-03-31T11:48:51Z

📊 Test Coverage Report

Main Tests Coverage

Metric	Coverage	Details
Lines	37.4%	7616/20348 lines
Branches	0.0%	0/0 branches

📁 Coverage Artifacts

Main tests: coverage-reports-main artifact
RAG tests: coverage-reports-rag artifact (not available)
Download artifacts to view detailed HTML coverage reports

github-actions · 2026-03-31T11:56:30Z

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-4

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-4

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

github-project-automation Bot added this to CAIPE (AI Platform Engineering) Project Backlog Mar 30, 2026

sriaradhyula added the 0.5.0 label Apr 12, 2026

sriaradhyula added this to the 0.5.0 milestone Apr 12, 2026

sriaradhyula added 0.6.0 and removed 0.5.0 labels May 24, 2026

artdroz wants to merge 3 commits into main from prebuild/feat/openshell-sandbox-session-tools

2 participants