
feat(sandbox): integrate OpenShell sandbox and supervisor session management tools #1052

Draft
artdroz wants to merge 3 commits into main from prebuild/feat/openshell-sandbox-session-tools

@artdroz artdroz commented Mar 30, 2026

Summary

  • OpenShell Sandbox Integration: Custom dynamic agents can now run inside isolated OpenShell sandbox containers with configurable network/filesystem policies, providing secure code execution via deepagents SandboxBackendProtocol
  • Supervisor Session Management Tools: The single-node supervisor gains 7 tools (sessions_spawn, sessions_yield, sessions_send, sessions_history, sessions_list, session_status, subagents) for multi-agent orchestration
  • UI Enhancements: Top-level Policy tab with live YAML editor and event stream, sandbox status badges with error surfacing, host/sandbox tool distinction in builtin tools picker, conversation history viewer with scroll fix
  • Sandbox SSL/Auth Hardening: Auto-inject OpenShell proxy CA cert into sandbox trust store, git credential helper for PAT auth, and expanded default policy for full GitHub/git operations

Architecture

UI (:3000) ---- Supervisor A2A (:8000) ---- LLM (Bedrock)
    |                    |
    |              session_tools ---- Dynamic Agents API (:8100)
    |                                       |
    +------------------------------- Agent Runtime
                                      |          |
                                  MCP Servers   OpenShell Sandbox (gRPC)
                                  (host-side)    +-- execute, filesystem
                                                 +-- Policy Engine

How the Sandbox Works

Sandbox Lifecycle

Each custom dynamic agent can opt into sandbox execution via SandboxConfig (stored in MongoDB). When sandbox.enabled = true, the following lifecycle runs during agent initialization:

Agent Create/Update (UI)           Agent Runtime Init (first chat)
        |                                    |
  SandboxConfig saved               _setup_sandbox_backend()
  to MongoDB with:                          |
  - enabled: true                  SandboxManager.get_or_create_sandbox()
  - policy_template: permissive             |
  - sandbox_name: "da-coding-agent"  +------+------+
                                     |             |
                              sandbox exists?   no -> CreateSandbox gRPC
                                     |             |   (named, persistent)
                                   yes             |
                                     |        wait_ready()
                                     +------+------+
                                            |
                                    SandboxSession connected
                                            |
                                    OpenShellBackend wraps session
                                            |
                                    initialize_policy()
                                    (template -> YAML -> CLI -> hot-reload)
                                            |
                                    _configure_sandbox_env()
                                      |                |
                              _inject_ca_cert()   _inject_git_credentials()
                              (gateway CA ->      (PAT -> credential helper
                               trust store)        + GIT_TERMINAL_PROMPT=0)
                                            |
                                    CompositeBackend(default=OpenShellBackend)
                                    passed to create_deep_agent(backend=...)

Gateway Discovery & Provisioning

The SandboxManager singleton connects to the OpenShell Gateway in one of two modes:

  1. Explicit endpoint (OPENSHELL_GATEWAY env var) -- connects directly via SandboxClient. When the endpoint is an HTTPS URL, the manager parses it and uses SandboxClient.from_active_cluster() which correctly handles TLS and strips the scheme for gRPC. When it's a bare host:port, it connects directly.
  2. Local auto-start -- if no endpoint is set, the manager runs openshell gateway start to spin up a local k3s-in-Docker gateway (idempotent, ~60s first run). It then waits for gateway metadata files at ~/.config/openshell/gateways/<name>/metadata.json before connecting via SandboxClient.from_active_cluster().

All openshell CLI invocations use the configurable openshell_cli_path setting, pass explicit --gateway <name> flags, and strip the OPENSHELL_GATEWAY env var from subprocess environments to prevent the CLI from misinterpreting URLs as gateway names.
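
The endpoint handling and environment sanitization described above can be sketched roughly as follows. This is a minimal illustration, not the code in sandbox.py: the function names parse_gateway_endpoint and build_policy_set_call are hypothetical, and the default port of 443 for HTTPS URLs is an assumption. The CLI argument order follows the `openshell policy set` form quoted in the Policy Engine section.

```python
import os
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


def parse_gateway_endpoint(endpoint: str) -> Tuple[str, Optional[str]]:
    """Split an endpoint into (host:port, scheme).

    Assumption: a URL with a scheme is reduced to host:port for gRPC,
    while a bare host:port is passed through unchanged (scheme is None).
    """
    if "://" in endpoint:
        parts = urlsplit(endpoint)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        return f"{parts.hostname}:{port}", parts.scheme
    return endpoint, None


def build_policy_set_call(
    cli_path: str,
    sandbox_name: str,
    gateway_name: str,
    policy_file: str,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and a sanitized environment for one CLI subprocess call."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop the variable so the CLI cannot mistake a URL for a gateway name.
    env.pop("OPENSHELL_GATEWAY", None)
    argv = [
        cli_path, "policy", "set", sandbox_name,
        "--gateway", gateway_name,
        "--policy", policy_file,
        "--wait",
    ]
    return argv, env
```

The returned pair could then be handed to subprocess.run(argv, env=env); copying the environment before removing the variable leaves the parent process untouched.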

Sandbox Identity & Persistence

  • Each agent gets a named sandbox derived from its agent ID: da-{agent_id} (e.g., da-coding-agent)
  • Named sandboxes are persistent -- the same container is reused across chat sessions, so cloned repos, installed packages, and filesystem state survive between conversations
  • On first chat, get_or_create_sandbox() tries client.get(name) to reconnect to an existing sandbox and creates a new one via the CreateSandbox gRPC call only if none exists
  • The SandboxSession object wraps the gRPC connection and provides exec() for command execution
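
The reconnect-or-create behaviour in the list above might look like the sketch below. FakeSandboxClient is a hypothetical in-memory stand-in for the real gRPC client, and it assumes that get() returns None for a missing sandbox; the real client may raise instead.

```python
class FakeSandboxClient:
    """Hypothetical in-memory stand-in for the gRPC sandbox client."""

    def __init__(self):
        self.sandboxes = {}
        self.create_calls = 0

    def get(self, name):
        # Assumption: a missing sandbox is reported as None.
        return self.sandboxes.get(name)

    def create(self, name):
        self.create_calls += 1
        self.sandboxes[name] = {"name": name}
        return self.sandboxes[name]


def sandbox_name_for(agent_id: str) -> str:
    """Derive the persistent sandbox name from the agent ID."""
    return f"da-{agent_id}"


def get_or_create_sandbox(client, agent_id: str):
    """Reuse the named sandbox if it exists, otherwise create it once."""
    name = sandbox_name_for(agent_id)
    existing = client.get(name)
    if existing is not None:
        return existing
    return client.create(name)
```

Because the name is a pure function of the agent ID, a second chat session resolves to the same sandbox and therefore sees the same filesystem state.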

Command Execution (OpenShellBackend)

The OpenShellBackend implements the deepagents SandboxBackendProtocol:

  • execute(command) -- runs shell commands inside the sandbox. All deepagents filesystem tools (read_file, write_file, edit_file, grep, glob, ls) are inherited from BaseSandbox and delegate to execute() under the hood.
  • gRPC newline workaround -- OpenShell's gRPC protocol rejects command arguments containing \n or \r. When the LLM produces a multi-line command, we pipe it to bash via stdin (session.exec(["bash"], stdin=command.encode())) instead of passing it as a -c argument.
  • upload_files() -- pipes file content via stdin to cat > dest inside the sandbox
  • download_files() -- reads files via base64 encoding and decodes on the host

The backend is wrapped in a CompositeBackend(default=OpenShellBackend) and passed to create_deep_agent(backend=...), which routes all filesystem and execute middleware through it.
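
As a rough sketch of the newline workaround and the base64 download path, the helpers below decide how a command would be dispatched and how downloaded content would be decoded. The names plan_exec and decode_download are hypothetical, and the single-line branch (bash -c) is inferred from the description rather than copied from openshell_backend.py.

```python
import base64
from typing import List, Optional, Tuple


def plan_exec(command: str) -> Tuple[List[str], Optional[bytes]]:
    """Return (argv, stdin_bytes) for running a command in the sandbox.

    Commands containing a newline or carriage return cannot travel as a
    gRPC argument, so they are fed to bash on stdin instead.
    """
    if "\n" in command or "\r" in command:
        return ["bash"], command.encode()
    return ["bash", "-c", command], None


def decode_download(b64_stdout: str) -> bytes:
    """Decode base64 text produced inside the sandbox back into bytes."""
    # b64decode ignores line breaks by default, so wrapped output is fine.
    return base64.b64decode(b64_stdout)
```

The resulting pair maps onto a call such as session.exec(argv, stdin=stdin_bytes), which is the form quoted in the bullet above for the multi-line case.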

Policy Engine

Policies control what the sandbox can access at the network and filesystem level. They are applied via the openshell CLI and hot-reloaded without restarting the sandbox.

Policy templates:

| Template | Network | Filesystem | Use Case |
|---|---|---|---|
| permissive | PyPI, npm, GitHub (incl. codeload), AWS Bedrock, Azure OpenAI | RW: /sandbox, /tmp, /workspace, CA cert paths; RO: /usr, /lib, /proc, /etc | Default for coding agents that need package managers and LLM APIs |
| restrictive | None (fully air-gapped) | RW: /sandbox, /tmp, CA cert paths; RO: /usr, /lib, /proc, /etc | High-security workloads |
| custom | User-defined YAML | User-defined | Full control via YAML editor in UI |

Network rules are scoped by endpoint (host:port) and binary path. For example, the pypi rule only allows /sandbox/.venv/bin/pip and /usr/local/bin/uv to reach pypi.org:443 -- even if curl runs in the sandbox, it cannot access PyPI. The github rule includes all binaries needed for git operations: git, git-remote-https, git-core/**, gh, curl, wget, and Python interpreters.
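
To make the endpoint-plus-binary scoping concrete, here is a small model of how such a rule could be evaluated. It is only an illustration of the matching idea: the real enforcement happens in the OpenShell policy engine, and the NetworkRule class and is_allowed function are hypothetical. Binary patterns are matched with shell-style globs.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Tuple


@dataclass(frozen=True)
class NetworkRule:
    name: str
    endpoints: Tuple[Tuple[str, int], ...]  # allowed (host, port) pairs
    binaries: Tuple[str, ...]               # glob patterns for binary paths


PYPI_RULE = NetworkRule(
    name="pypi",
    endpoints=(("pypi.org", 443),),
    binaries=("/sandbox/.venv/bin/pip", "/usr/local/bin/uv"),
)


def is_allowed(rules: Iterable[NetworkRule], binary: str, host: str, port: int) -> bool:
    """A connection passes only if one rule matches endpoint AND binary."""
    for rule in rules:
        endpoint_ok = (host, port) in rule.endpoints
        binary_ok = any(fnmatchcase(binary, pattern) for pattern in rule.binaries)
        if endpoint_ok and binary_ok:
            return True
    return False
```

With this rule, pip can reach pypi.org:443 while curl cannot, which mirrors the example in the paragraph above.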

Policy lifecycle:

  1. initialize_policy(sandbox_name, template) -- builds YAML from template, writes to temp file, applies via openshell policy set <name> --gateway <gw> --policy <file> --wait
  2. update_policy() -- same flow, used for live edits from the UI Policy tab
  3. add_allow_rule() / remove_rule() -- mutate the in-memory policy dict and hot-reload
  4. cleanup_temporary_rules() -- removes rules marked _temporary (session-scoped)
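
Steps 3 and 4 could be sketched as plain dictionary mutations, as below. The layout of the policy dict (a network_policies mapping keyed by rule name) is an assumption made for illustration; only the _temporary marker comes from the description above. The hot-reload step that would follow each mutation is omitted.

```python
from typing import Dict, Iterable, List


def add_allow_rule(
    policy: Dict,
    name: str,
    host: str,
    port: int,
    binaries: Iterable[str],
    temporary: bool = False,
) -> Dict:
    """Add or overwrite a named network rule in the in-memory policy."""
    rule = {
        "endpoints": [{"host": host, "port": port}],
        "binaries": list(binaries),
    }
    if temporary:
        # Session-scoped rules carry a marker so they can be swept later.
        rule["_temporary"] = True
    policy.setdefault("network_policies", {})[name] = rule
    return policy


def cleanup_temporary_rules(policy: Dict) -> List[str]:
    """Remove every rule marked _temporary and return the removed names."""
    rules = policy.get("network_policies", {})
    removed = [name for name, rule in rules.items() if rule.get("_temporary")]
    for name in removed:
        del rules[name]
    return removed
```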

Policy status is monitored via _query_policy_status() which parses openshell policy get <name> --gateway <gw> CLI output for Status: loaded|failed and error details.
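
A status parser along these lines might look like the following sketch. The exact shape of the CLI output is not shown in this PR, so both the "Status:" line format and the "Error:" line used for error details are assumptions, and parse_policy_status is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Assumed output format: one "Status: loaded" or "Status: failed" line,
# optionally accompanied by an "Error: ..." line.
_STATUS_RE = re.compile(r"^\s*Status:\s*(loaded|failed)\s*$", re.MULTILINE | re.IGNORECASE)
_ERROR_RE = re.compile(r"^\s*Error:\s*(.+)$", re.MULTILINE)


def parse_policy_status(cli_output: str) -> Dict[str, Optional[str]]:
    """Extract the policy status and any error detail from CLI text."""
    status_match = _STATUS_RE.search(cli_output)
    error_match = _ERROR_RE.search(cli_output)
    return {
        "status": status_match.group(1).lower() if status_match else "unknown",
        "error": error_match.group(1).strip() if error_match else None,
    }
```

Returning "unknown" instead of raising keeps a status badge renderable even when the output format changes.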

SSL/TLS Trust Setup

The OpenShell proxy performs TLS interception on all outbound network traffic from the sandbox. Without the proxy's CA certificate in the sandbox trust store, git clone, pip install, curl, and Python requests all fail with "server certificate verification failed".

On sandbox init, _inject_ca_cert():

  1. Reads the gateway CA cert from ~/.config/openshell/gateways/<name>/mtls/ca.crt on the host
  2. Writes it to /usr/local/share/ca-certificates/openshell-proxy.crt inside the sandbox
  3. Runs update-ca-certificates (or appends to the bundle as fallback)
  4. Sets git config --global http.sslCAInfo /etc/ssl/certs/ca-certificates.crt

Both the permissive and restrictive policy templates include /usr/local/share/ca-certificates and /etc/ssl/certs as read-write paths to allow this injection.
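
The host-side lookup (step 1) and the in-sandbox commands (steps 3 and 4) can be approximated as below. This is a sketch under assumptions: the function names are hypothetical, the fallback is assumed to trigger when update-ca-certificates is not installed, and step 2 (writing the certificate into the sandbox) is assumed to have happened already, for example via upload_files().

```python
import shlex
from pathlib import Path

CERT_DEST = "/usr/local/share/ca-certificates/openshell-proxy.crt"
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def gateway_ca_path(gateway_name: str, home: Path) -> Path:
    """Host-side location of the gateway CA certificate."""
    return home / ".config" / "openshell" / "gateways" / gateway_name / "mtls" / "ca.crt"


def build_ca_install_script(cert_dest: str = CERT_DEST, bundle: str = CA_BUNDLE) -> str:
    """Shell script that trusts the uploaded cert and points git at the bundle."""
    dest = shlex.quote(cert_dest)
    bundle_q = shlex.quote(bundle)
    lines = [
        "set -e",
        # Prefer the distro tool; otherwise append the cert to the bundle.
        "if command -v update-ca-certificates >/dev/null 2>&1; then",
        "  update-ca-certificates",
        "else",
        f"  cat {dest} >> {bundle_q}",
        "fi",
        f"git config --global http.sslCAInfo {bundle_q}",
    ]
    return "\n".join(lines) + "\n"
```

Since the script spans several lines, it would need to be piped to bash via stdin rather than passed as an argument, in line with the gRPC newline restriction described earlier.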

Credential Injection

_inject_git_credentials() runs after CA cert setup to configure GitHub authentication:

  • Creates a git credential helper script at /sandbox/.git-credentials/helper.sh that returns the PAT using git's credential protocol (protocol=https, host=github.com, username=x-access-token, password=<PAT>)
  • Configures git config --global credential.helper to use this script
  • Sets GIT_TERMINAL_PROMPT=0 in .bashrc and .profile to prevent git from ever hanging on interactive auth prompts

This approach is more robust than the previous url.insteadOf method, which could trigger password prompts on some git versions when the PAT was invalid or expired.
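
A credential helper of this kind could be rendered as in the sketch below. The function name is hypothetical, and answering only the "get" operation is an assumption about the script's behaviour; git passes the operation (get, store, or erase) as the helper's last argument and reads key=value lines from its stdout.

```python
import shlex

HELPER_PATH = "/sandbox/.git-credentials/helper.sh"
GIT_ENV_LINE = "export GIT_TERMINAL_PROMPT=0"


def render_credential_helper(pat: str) -> str:
    """Render a POSIX shell credential helper that returns the PAT."""
    fields = [
        "protocol=https",
        "host=github.com",
        "username=x-access-token",
        f"password={pat}",
    ]
    lines = [
        "#!/bin/sh",
        # Assumption: only answer lookups; ignore store and erase requests.
        '[ "$1" = "get" ] || exit 0',
    ]
    # Quote each field so unusual characters in the token stay literal.
    lines += [f"printf '%s\\n' {shlex.quote(field)}" for field in fields]
    return "\n".join(lines) + "\n"
```

The rendered file would be written to HELPER_PATH, made executable, and registered with git config --global credential.helper, while GIT_ENV_LINE would be appended to .bashrc and .profile.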

Event Streaming (Pub/Sub)

The SandboxManager implements a pub/sub pattern for sandbox events (denials, policy updates):

  • subscribe(sandbox_name) returns a dedicated asyncio.Queue per consumer
  • _broadcast() pushes events to all subscriber queues (dead queues are auto-pruned)
  • Multiple SSE clients (chat panel, Policy tab) each get their own queue and receive all events independently
  • start_watch() runs a background asyncio.Task that polls /proc/openshell/denials inside the sandbox and broadcasts denial events
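
The per-consumer queue pattern above can be illustrated with a small broadcaster. This is a sketch, not the SandboxManager code: the EventBroadcaster class is hypothetical, and treating a full bounded queue as "dead" is an assumption about how pruning is decided.

```python
import asyncio
from typing import Dict, List


class EventBroadcaster:
    """Fan out sandbox events to one asyncio.Queue per subscriber."""

    def __init__(self, max_queue_size: int = 100):
        self._subscribers: Dict[str, List[asyncio.Queue]] = {}
        self._max_queue_size = max_queue_size

    def subscribe(self, sandbox_name: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_queue_size)
        self._subscribers.setdefault(sandbox_name, []).append(queue)
        return queue

    def broadcast(self, sandbox_name: str, event: dict) -> int:
        """Deliver an event to every live queue; return how many remain."""
        alive = []
        for queue in self._subscribers.get(sandbox_name, []):
            try:
                queue.put_nowait(event)
                alive.append(queue)
            except asyncio.QueueFull:
                # Assumption: a consumer that stopped draining is dead.
                pass
        self._subscribers[sandbox_name] = alive
        return len(alive)


async def demo():
    broadcaster = EventBroadcaster(max_queue_size=1)
    chat_panel = broadcaster.subscribe("da-coding-agent")
    broadcaster.subscribe("da-coding-agent")  # Policy tab, never drained
    delivered = broadcaster.broadcast("da-coding-agent", {"type": "denial"})
    first = chat_panel.get_nowait()
    remaining = broadcaster.broadcast("da-coding-agent", {"type": "policy_update"})
    return delivered, first, remaining, chat_panel.get_nowait()
```

Giving each consumer its own queue means a slow SSE client cannot steal or delay events intended for another client.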

Host-Side Tools vs Sandbox Tools

When sandbox is enabled, tools are split into two execution domains:

| Domain | Tools | Execution |
|---|---|---|
| Sandbox (inside container) | execute, read_file, write_file, edit_file, grep, glob, ls | Via OpenShellBackend -> gRPC -> sandbox container |
| Host (outside sandbox) | fetch_url, current_datetime, user_info, sleep, MCP tools | In the Dynamic Agents Python process, NOT subject to sandbox policies |

The BuiltinToolDefinition model includes runs_in_sandbox: bool and sandbox_warning: str | None metadata so the UI can clearly label which tools bypass the sandbox. When fetch_url is enabled on a sandboxed agent, the backend logs a warning and the UI shows an amber "host" badge.
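
The metadata could be used along the lines of the sketch below. The PR describes a Pydantic model; a standard-library dataclass is used here only to keep the example dependency-free, and the host_side_tools helper and the warning text are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BuiltinToolDefinition:
    name: str
    runs_in_sandbox: bool
    sandbox_warning: Optional[str] = None


def host_side_tools(
    enabled: Iterable[BuiltinToolDefinition], sandbox_enabled: bool
) -> List[BuiltinToolDefinition]:
    """Tools that would bypass sandbox policies for a sandboxed agent."""
    if not sandbox_enabled:
        return []
    return [tool for tool in enabled if not tool.runs_in_sandbox]


TOOLS = [
    BuiltinToolDefinition("execute", runs_in_sandbox=True),
    BuiltinToolDefinition(
        "fetch_url",
        runs_in_sandbox=False,
        sandbox_warning="Runs on the host and is not subject to sandbox policies.",
    ),
]
```

A backend could log one warning per entry returned by host_side_tools(TOOLS, sandbox_enabled=True), and a UI could attach the "host" badge to the same entries.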


Backend Changes

OpenShell Sandbox (4 new files)

  • openshell_backend.py -- SandboxBackendProtocol implementation wrapping SandboxSession. Pipes multi-line commands via stdin to work around gRPC newline restriction.
  • sandbox.py -- SandboxManager singleton for sandbox lifecycle, policy management, and pub/sub event broadcasting. Handles gateway URL parsing for gRPC, configurable CLI path, and env sanitization for subprocess calls.
  • sandbox_policy.py -- Permissive/restrictive policy templates (PyPI, npm, GitHub, AWS Bedrock, Azure OpenAI) with rule mutation helpers. Includes CA cert write paths and comprehensive GitHub binary allowlists.
  • routes/sandbox.py -- REST/SSE endpoints for status, live events, policy updates, and allow-rule management.

Agent Runtime

  • Sets up OpenShellBackend with CompositeBackend when sandbox is enabled
  • _inject_ca_cert(): Reads OpenShell gateway CA cert from host, installs into sandbox trust store, configures git sslCAInfo
  • _inject_git_credentials(): Creates git credential helper script with PAT, sets GIT_TERMINAL_PROMPT=0
  • Logs warnings for host-side tools (fetch_url) running outside sandbox
  • Policy initialization with error detection and reporting

Session Management Tools (1 new file)

  • session_tools.py -- 7 LangChain tools communicating with Dynamic Agents API via HTTP:
    • sessions_spawn: fire-and-forget (background thread) or blocking mode
    • sessions_yield: stabilization-based polling (waits for message count to settle over 2+ consecutive polls before returning)
    • sessions_send, sessions_history, sessions_list, session_status: conversation management
    • subagents: agent discovery and status
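
Stabilization-based polling, as described for sessions_yield, might be sketched as follows. The function name, the min_count guard (which avoids returning on an empty conversation that has not started yet), and the timeout behaviour are all assumptions made for illustration, not details taken from session_tools.py.

```python
import time
from typing import Callable


def wait_until_stable(
    get_message_count: Callable[[], int],
    stable_polls: int = 2,
    min_count: int = 1,
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the same count is seen on `stable_polls` consecutive polls."""
    last = None
    run = 0  # number of consecutive polls that returned the same count
    for _ in range(max_polls):
        count = get_message_count()
        run = run + 1 if count == last else 1
        last = count
        if run >= stable_polls and count >= min_count:
            return count
        sleep(interval)
    raise TimeoutError("message count did not settle")
```

In practice get_message_count would wrap an HTTP call to the Dynamic Agents API; injecting sleep makes the loop testable without real delays.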

Models and Config

  • SandboxConfig, SandboxPolicyTemplate Pydantic models
  • BuiltinToolDefinition extended with runs_in_sandbox and sandbox_warning metadata
  • OpenShell settings in config.py (gateway, gateway_name, timeout, cli_path)
  • GitHub MCP server in config.yaml

Frontend Changes

New Components (5 files)

  • SandboxPolicyTab.tsx -- Top-level Policy tab: per-agent policy editor, network rules display, filesystem permissions, live event stream
  • SandboxPolicyPanel.tsx -- Inline policy editor for chat sidebar
  • SandboxDenialCard.tsx / SandboxRequestStream.tsx / SandboxToolCard.tsx -- Event rendering components

Enhanced Components

  • DynamicAgentEditor.tsx -- Sandbox toggle with name, policy template selector, custom YAML editor
  • DynamicAgentContext.tsx -- SandboxStatusBadge with policy error surfacing, allow-rule form with visible errors
  • BuiltinToolsPicker.tsx -- host badge on non-sandbox tools when sandbox enabled, warning banners, info note
  • ConversationsTab.tsx -- Scroll fix (native overflow-y-auto), agent grouping, click-to-view history
  • page.tsx -- Added Policy tab to Custom Agents page

New API Routes (4 files)

  • /api/dynamic-agents/sandbox/events/[agentId] -- SSE proxy
  • /api/dynamic-agents/sandbox/status/[agentId] -- Status proxy
  • /api/dynamic-agents/sandbox/policy/[agentId] -- Policy proxy
  • /api/dynamic-agents/conversations/[id]/messages -- Messages proxy

Stats

  • 50+ files changed: 34 modified, 16+ new
  • +5,800 / -240 lines

Test plan

  • Create a custom agent with sandbox enabled (permissive policy) -- verify sandbox provisions and policy loads
  • Chat with sandboxed agent -- verify execute and filesystem tools work inside sandbox
  • Verify git clone works inside sandbox (CA cert + credential helper)
  • Verify pip install works inside sandbox (CA cert trusted)
  • Check Policy tab shows agent with correct policy, network rules, and filesystem permissions
  • Verify live event stream shows sandbox events in Policy tab
  • Update policy via YAML editor -- verify changes apply
  • Create agent without sandbox -- verify no sandbox-related UI elements
  • Enable fetch_url on sandboxed agent -- verify host badge and warning appear
  • Use supervisor to sessions_spawn a task to the coding agent -- verify fire-and-forget works
  • Use sessions_yield -- verify it waits for completion (not returning immediately)
  • Use sessions_history -- verify transcript is returned
  • Click a conversation in Conversations tab -- verify scroll works and messages display
  • Verify sandbox status badge shows errors when policy fails

TODO: Follow-up Work

Kubernetes Chart / Deployment Model

  • Add Helm chart values for OpenShell Gateway sidecar or DaemonSet deployment alongside dynamic agents
  • Define per-agent sandbox provisioning in Kubernetes (one sandbox container per custom subagent, not shared)
  • Configure sandbox resource limits (CPU, memory, ephemeral storage) in chart values with sensible defaults
  • Add OpenShell Gateway service discovery (internal ClusterIP service, gRPC port) so dynamic agents auto-connect
  • Support sandbox image registry configuration (private registry, image pull secrets)
  • Add sandbox readiness/liveness probes to the Helm chart for health monitoring
  • Configure persistent volume claims for sandbox workspace if state needs to survive restarts
  • Add NetworkPolicy resources to Kubernetes to enforce sandbox egress rules at the cluster level (defense-in-depth alongside OpenShell policy)
  • Support multi-tenant sandbox isolation -- namespace-per-tenant or pod-security-standards for sandbox pods
  • Add Helm chart tests validating sandbox provisioning and policy application on deploy

Live Policy Validation & Enforcement

  • Validate policy YAML schema before applying -- reject malformed policies with clear error messages in the UI
  • Add dry-run mode for policy changes -- preview what would be allowed/denied before committing
  • Implement policy diff view in the UI -- show what changed between current and proposed policy
  • Add real-time policy violation alerts (toast notifications) when sandbox denials occur during agent execution
  • Support policy versioning -- track policy change history per agent with rollback capability
  • Add policy audit log -- record who changed what policy and when (for compliance)
  • Validate network rules against DNS resolution -- warn if a rule references an unreachable host
  • Add policy presets library -- curated templates beyond permissive/restrictive (e.g., "GitHub-only", "AWS-only", "air-gapped")
  • Support wildcard validation in network rules -- warn on overly broad patterns (e.g., *:*)
  • Enforce minimum-security baseline -- prevent disabling all network restrictions or granting unrestricted filesystem access
  • Add CI/CD policy-as-code support -- allow defining agent sandbox policies in git alongside agent configs

…agement tools

Add OpenShell sandbox support for custom dynamic agents with policy-based
network/filesystem isolation, and session management tools for multi-agent
orchestration from the supervisor.

Sandbox integration:
- OpenShellBackend implementing deepagents SandboxBackendProtocol
- SandboxManager with pub/sub event broadcasting for concurrent SSE clients
- Policy templates (permissive/restrictive/custom) with live YAML editing
- Sandbox routes for status, events (SSE), policy updates, and allow rules
- GitHub PAT injection into sandbox via git config URL rewriting
- gRPC newline workaround (pipe multi-line commands via stdin)

Supervisor session tools:
- sessions_spawn/yield/send/history/list/status and subagents tools
- Fire-and-forget spawning with background thread for non-blocking mode
- Stabilization-based yield polling (waits for message count to settle)
- System prompt guidance for timing and tool usage patterns

UI enhancements:
- Top-level Policy tab with live YAML editor and request stream
- Sandbox status badges with error surfacing in chat sidebar
- Built-in tools picker shows host/sandbox distinction with warnings
- Conversations tab with scroll fix and click-to-view session history
- SSE event components for sandbox denials and tool executions

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>
@github-actions

✅ No proprietary content detected. This PR is clear for review!

@github-actions

📊 Test Coverage Report

Main Tests Coverage

| Metric | Coverage | Details |
|---|---|---|
| Lines | 37.4% | 7616/20348 lines |
| Branches | 0.0% | 0/0 branches |

📁 Coverage Artifacts

  • Main tests: coverage-reports-main artifact
  • RAG tests: coverage-reports-rag artifact (not available)
  • Download artifacts to view detailed HTML coverage reports

github-actions Bot commented Mar 30, 2026

🧪 CAIPE UI Test Results

All tests passed

🔴 Overall Coverage: 29%

📊 Detailed Coverage

| Metric | Covered | Total | Percentage |
|---|---|---|---|
| Lines | 6485 | 20681 | 31.35% |
| Statements | 6956 | 22557 | 30.83% |
| Functions | 1162 | 4430 | 26.23% |
| Branches | 4333 | 16052 | 26.99% |

✅ Test Suites

  • ✅ auth-guard.test.tsx - Route protection & authorization
  • ✅ token-expiry-guard.test.tsx - Token expiry handling
  • ✅ a2a-sdk-client.test.ts - A2A streaming SDK
  • ✅ auth-utils.test.ts - Authentication utilities (100% coverage)
  • ✅ auth-config.test.ts - OIDC configuration

📈 Coverage Thresholds

| Threshold | Target | Current | Status |
|---|---|---|---|
| Minimum | 40% | 29% | ❌ Fail |
| Good | 60% | 29% | ⚠️ Below target |
| Excellent | 80% | 29% | ⚠️ Below target |

⚠️ Areas Needing Tests

High Priority:

  • hooks/use-a2a-streaming.ts - Core streaming functionality
  • store/chat-store.ts - Chat state management
  • store/agent-skills-store.ts - Agent skills
  • lib/api-client.ts - API communication
  • lib/storage-mode.ts - MongoDB/localStorage switching

Medium Priority:

  • components/chat/ChatPanel.tsx - Main chat interface
  • components/agent-builder/* - Agent builder UI
  • lib/mongodb.ts - MongoDB integration

💡 Run locally: make caipe-ui-tests
📦 Full report: Check workflow artifacts

@github-actions

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-2

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-2

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

Add local copy of wrap_tools_with_error_handling utility (dynamic_agents
has its own venv and cannot import from the parent package) and apply
it in _build_subagent_tools so MCP tool failures return error messages
to the LLM instead of crashing the subagent graph.

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>
Made-with: Cursor
@github-actions

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-3

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-3

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

…t/SSL

The OpenShell proxy performs TLS interception, causing git/curl/pip
inside the sandbox to fail with 'server certificate verification failed'.
The previous url.insteadOf approach for PAT injection could also trigger
password prompts.

Changes:
- Inject OpenShell gateway CA cert into sandbox trust store on init
- Replace url.insteadOf with a proper git credential helper script
- Set GIT_TERMINAL_PROMPT=0 to prevent git from hanging on auth prompts
- Add CA cert paths as read-write in both policy templates
- Add codeload.github.com, git-remote-https, git-core/** to GitHub policy
- Add openshell_cli_path setting for configurable CLI path
- Fix sandbox.py to strip OPENSHELL_GATEWAY env and use --gateway flag

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>
@github-actions

🐳 Prebuild Docker Image Published

Repository: ghcr.io/cnoe-io/prebuild/ai-platform-engineering
Tag: feat-openshell-sandbox-session-tools-4

Usage

docker pull ghcr.io/cnoe-io/prebuild/ai-platform-engineering:feat-openshell-sandbox-session-tools-4

Note: This prebuild image will be automatically cleaned up when the PR is closed or merged.

@sriaradhyula sriaradhyula added this to the 0.5.0 milestone Apr 12, 2026
@sriaradhyula sriaradhyula added 0.6.0 and removed 0.5.0 labels May 24, 2026