Commit ef78196

Merge pull request #63 from usnavy13/feat/bash-and-interop
feat: Bash execution, PTC, and session isolation with full test coverage
2 parents 5634235 + 1a6dcf4 commit ef78196

32 files changed

Lines changed: 4449 additions & 177 deletions

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added
 - nsjail-based sandboxing for code execution (replaces Docker socket-based approach)
-- Single unified Docker image with all 12 language runtimes
+- Single unified Docker image with all 13 language runtimes
 - Hour and day periods for execution heatmap visualizations
 - MyPy type checking integration with comprehensive type hints
 - Dynamic Content Security Policy headers based on request path
@@ -33,7 +33,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

 #### Core Features
-- Multi-language code execution supporting 12 languages: Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, and D
+- Multi-language code execution supporting 13 languages: Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, D, and Bash
 - FastAPI-based REST API with interactive documentation
 - Sandboxed execution environments with comprehensive security controls
 - Redis-based session management with automatic cleanup

Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -267,8 +267,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 # REPL Server + entrypoint
 # ============================================
 COPY docker/repl_server.py /opt/repl_server.py
+COPY docker/ptc_server.py /opt/ptc_server.py
 COPY docker/entrypoint.sh /opt/entrypoint.sh
-RUN chmod +x /opt/repl_server.py /opt/entrypoint.sh
+RUN chmod +x /opt/repl_server.py /opt/ptc_server.py /opt/entrypoint.sh

 # ============================================
 # Sandbox directory structure

README.md

Lines changed: 4 additions & 4 deletions
@@ -30,7 +30,7 @@ Get up and running in minutes by building the execution environment.
 docker build -t code-interpreter:nsjail .
 ```

-This builds a single image containing all 12 language runtimes and nsjail for sandboxed execution.
+This builds a single image containing all 13 language runtimes and nsjail for sandboxed execution.

 4. **Start the API**

@@ -55,7 +55,7 @@ The dashboard requires the master API key for authentication.

 ## Features

-- **Multi-language Support**: Execute code in 12 languages - Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, and D
+- **Multi-language Support**: Execute code in 13 languages - Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, D, and Bash
 - **Sub-50ms Python Execution**: Pre-warmed REPL sandboxes achieve ~20-40ms latency for simple Python code
 - **Sandbox Pool**: Pre-warmed nsjail sandboxes provide ~3ms acquisition time (vs 500-2000ms cold start)
 - **High Concurrency**: Thread-safe execution supporting 10+ concurrent requests
@@ -88,7 +88,7 @@ For a deep dive into the system design, components, and request flows, see [ARCH

 The API provides endpoints for code execution, file management, and session state control.

-- `POST /exec`: Execute code in one of the 12 supported languages.
+- `POST /exec`: Execute code in one of the 13 supported languages.
 - `POST /upload`: Upload files for processing.
 - `GET /download`: Retrieve generated files.

@@ -98,7 +98,7 @@ For detailed information on all endpoints and specific language notes, see [ARCH

 ## Supported Languages

-We support 12 programming languages including Python, JavaScript, TypeScript, Go, Rust, and more. Each language has optimized execution paths and resource limits.
+We support 13 programming languages including Python, JavaScript, TypeScript, Go, Rust, Bash, and more. Each language has optimized execution paths and resource limits.

 See the [Supported Languages table](docs/ARCHITECTURE.md#supported-languages) for details on versions and included libraries.

docker/ptc_server.py

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
+#!/usr/bin/env python3
+"""Programmatic Tool Calling (PTC) Server for nsjail sandbox execution.
+
+This script runs INSIDE the nsjail sandbox and provides a Python execution
+environment where code can call externally-defined tools. Tool calls are
+serialized as JSON over stdin/stdout, allowing the host process to fulfill
+them and send results back.
+
+Protocol:
+    1. Host sends initial request via stdin:
+       {"code": "...", "tools": [{"name": "...", "description": "...", "parameters": {...}}]}
+
+    2. Code executes. When a tool stub is called, PTC server writes to stdout:
+       {"type": "tool_calls", "calls": [{"id": "...", "name": "...", "input": {...}}]}
+
+    3. Host reads tool_calls, fulfills them, and writes results to stdin:
+       {"type": "tool_results", "results": [{"call_id": "...", "result": ..., "is_error": false}]}
+
+    4. Code continues. On completion, PTC server writes:
+       {"type": "completed", "stdout": "...", "stderr": "..."}
+
+    5. On error, PTC server writes:
+       {"type": "error", "error": "..."}
+"""
+
+import asyncio
+import json
+import os
+import sys
+import traceback
+import uuid
+from io import StringIO
+
+DELIMITER = "\n---PTC_END---\n"
+
+# Keep references to the REAL stdin/stdout for protocol communication.
+# User code's print() will be redirected to a StringIO capture buffer.
+_real_stdin = sys.stdin
+_real_stdout = sys.stdout
+_real_stderr = sys.stderr
+
+
+def _write_message(msg: dict) -> None:
+    """Write a JSON message to the host via the real stdout."""
+    data = json.dumps(msg) + DELIMITER
+    _real_stdout.write(data)
+    _real_stdout.flush()
+
+
+def _read_message() -> dict:
+    """Read a JSON message from the host via the real stdin."""
+    buf = ""
+    while True:
+        line = _real_stdin.readline()
+        if not line:
+            raise EOFError("stdin closed")
+        buf += line
+        if DELIMITER in buf:
+            json_part = buf.split(DELIMITER)[0]
+            return json.loads(json_part)
+
+
+# Pending tool calls collected during async execution
+_pending_calls = []
+_tool_results_map = {}  # call_id -> result
+
+
+def _make_tool_stub(tool_name: str) -> callable:
+    """Create an async function stub for a tool."""
+
+    async def tool_stub(**kwargs):
+        call_id = uuid.uuid4().hex[:12]
+        call_info = {
+            "id": call_id,
+            "name": tool_name,
+            "input": kwargs,
+        }
+        _pending_calls.append(call_info)
+
+        # Wait for result - the main loop will flush calls and read results
+        while call_id not in _tool_results_map:
+            await asyncio.sleep(0.01)
+
+        result_info = _tool_results_map.pop(call_id)
+        if result_info.get("is_error"):
+            raise RuntimeError(
+                result_info.get("error_message", "Tool call failed")
+            )
+        return result_info.get("result")
+
+    tool_stub.__name__ = tool_name
+    tool_stub.__qualname__ = tool_name
+    return tool_stub
+
+
+async def _execute_with_tools(
+    code: str, tools: list, user_stdout: StringIO, user_stderr: StringIO
+) -> dict:
+    """Execute code with tool stubs, capturing user output."""
+    global _pending_calls, _tool_results_map
+
+    _pending_calls = []
+    _tool_results_map = {}
+
+    # Build namespace with tool stubs
+    namespace = {"__builtins__": __builtins__, "__name__": "__main__"}
+
+    try:
+        import json as _json
+
+        namespace["json"] = _json
+    except ImportError:
+        pass
+
+    for tool in tools:
+        namespace[tool["name"]] = _make_tool_stub(tool["name"])
+
+    # Wrap user code in async function
+    indented_code = "\n".join(" " + line for line in code.split("\n"))
+    wrapped_code = f"async def __ptc_main__():\n{indented_code}\n"
+
+    try:
+        compiled = compile(wrapped_code, "<ptc_code>", "exec")
+        exec(compiled, namespace)
+    except SyntaxError as e:
+        return {"type": "error", "error": f"SyntaxError: {e}"}
+
+    main_func = namespace["__ptc_main__"]
+    main_task = asyncio.ensure_future(main_func())
+
+    try:
+        while not main_task.done():
+            # Let the task run briefly to accumulate batched calls
+            await asyncio.sleep(0.05)
+
+            if _pending_calls and not main_task.done():
+                calls_to_send = list(_pending_calls)
+                _pending_calls.clear()
+
+                _write_message({
+                    "type": "tool_calls",
+                    "calls": calls_to_send,
+                })
+
+                # Wait for results from host
+                response = _read_message()
+
+                if response.get("type") != "tool_results":
+                    return {
+                        "type": "error",
+                        "error": f"Expected tool_results, got "
+                        f"{response.get('type')}",
+                    }
+
+                for result in response.get("results", []):
+                    _tool_results_map[result["call_id"]] = result
+
+        # Task completed
+        main_task.result()
+        return {"type": "completed"}
+
+    except Exception as e:
+        tb = traceback.format_exc()
+        return {
+            "type": "error",
+            "error": str(e),
+            "stderr_extra": tb,
+        }
+
+
+def main():
+    """Main entry point for PTC server."""
+    try:
+        os.chdir("/mnt/data")
+    except OSError:
+        pass
+
+    # Read initial request
+    try:
+        request = _read_message()
+    except Exception as e:
+        _write_message({
+            "type": "error",
+            "error": f"Failed to read initial request: {e}",
+        })
+        return
+
+    code = request.get("code", "")
+    tools = request.get("tools", [])
+
+    if not code:
+        _write_message({"type": "error", "error": "No code provided"})
+        return
+
+    # Redirect sys.stdout and sys.stderr so user print() calls
+    # are captured, not mixed with our protocol messages.
+    user_stdout = StringIO()
+    user_stderr = StringIO()
+    sys.stdout = user_stdout
+    sys.stderr = user_stderr
+
+    try:
+        result = asyncio.run(
+            _execute_with_tools(code, tools, user_stdout, user_stderr)
+        )
+    except Exception as e:
+        result = {
+            "type": "error",
+            "error": str(e),
+        }
+
+    # Restore real stdout for final message
+    sys.stdout = _real_stdout
+    sys.stderr = _real_stderr
+
+    # Attach captured user output
+    result["stdout"] = user_stdout.getvalue()
+    stderr_val = user_stderr.getvalue()
+    if result.get("stderr_extra"):
+        stderr_val += result.pop("stderr_extra")
+    result["stderr"] = stderr_val
+
+    _write_message(result)
+
+
+if __name__ == "__main__":
+    main()
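
Reviewer note: the host side of the protocol in `docker/ptc_server.py` might be sketched as follows. This is not code from this commit. It only mirrors the framing (`DELIMITER`) and the message shapes from the module docstring, plus the `error_message` key that `tool_stub` reads on failure. The helper names `encode_message`, `read_message`, and `answer_tool_calls`, and the `handlers` mapping, are hypothetical.

```python
import io
import json

# Same framing string as DELIMITER in docker/ptc_server.py.
DELIMITER = "\n---PTC_END---\n"


def encode_message(msg: dict) -> str:
    """Serialize one protocol message followed by the delimiter."""
    return json.dumps(msg) + DELIMITER


def read_message(stream) -> dict:
    """Read lines from a text stream until one delimited message is complete."""
    buf = ""
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before a full message arrived")
        buf += line
        if buf.endswith(DELIMITER):
            return json.loads(buf[: -len(DELIMITER)])


def answer_tool_calls(message: dict, handlers: dict) -> dict:
    """Build a tool_results reply for a tool_calls message.

    `handlers` maps a tool name to a plain callable (an assumption of
    this sketch; the real host may dispatch differently).
    """
    results = []
    for call in message["calls"]:
        try:
            value = handlers[call["name"]](**call["input"])
            results.append(
                {"call_id": call["id"], "result": value, "is_error": False}
            )
        except Exception as exc:
            # tool_stub raises RuntimeError(error_message) when is_error is set.
            results.append(
                {
                    "call_id": call["id"],
                    "result": None,
                    "is_error": True,
                    "error_message": str(exc),
                }
            )
    return {"type": "tool_results", "results": results}


# Simulate the sandbox's stdout with an in-memory stream.
sandbox_out = io.StringIO(
    encode_message(
        {
            "type": "tool_calls",
            "calls": [{"id": "abc123", "name": "add", "input": {"a": 1, "b": 2}}],
        }
    )
)
reply = answer_tool_calls(read_message(sandbox_out), {"add": lambda a, b: a + b})
```

In a real host, `sandbox_out` would be the sandbox process's stdout pipe and `encode_message(reply)` would be written to its stdin.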

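
Reviewer note: `_execute_with_tools` wraps the user snippet in `async def __ptc_main__()` so that top-level `await` on a tool stub is legal. A minimal standalone sketch of that idea is below. It swaps the manual per-line join for `textwrap.indent`, and `double`, `wrap_as_coroutine`, and `run_snippet` are hypothetical names used only for illustration.

```python
import asyncio
import textwrap


def wrap_as_coroutine(code: str, name: str = "__ptc_main__") -> str:
    """Indent the snippet and place it inside an async function definition."""
    return f"async def {name}():\n{textwrap.indent(code, '    ')}\n"


async def double(x):
    """Hypothetical stand-in for a tool stub; yields once, then returns."""
    await asyncio.sleep(0)
    return x * 2


def run_snippet(code: str):
    """Compile the wrapped snippet, then run the resulting coroutine."""
    namespace = {"double": double}
    exec(compile(wrap_as_coroutine(code), "<snippet>", "exec"), namespace)
    return asyncio.run(namespace["__ptc_main__"]())


result = run_snippet("value = await double(21)\nreturn value")
```

One caveat applies to both this sketch and the per-line join in the commit: indenting every line also shifts the contents of multi-line string literals in the user snippet.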
scripts/load_test/config.py

Lines changed: 16 additions & 6 deletions
@@ -60,7 +60,21 @@
 }

 # Supported languages
-SUPPORTED_LANGUAGES = ["py", "js", "ts", "go", "java", "c", "cpp", "php", "rs", "r", "f90", "d"]
+SUPPORTED_LANGUAGES = [
+    "py",
+    "js",
+    "ts",
+    "go",
+    "java",
+    "c",
+    "cpp",
+    "php",
+    "rs",
+    "r",
+    "f90",
+    "d",
+    "bash",
+]


 @dataclass
@@ -112,11 +126,7 @@ def get_api_key(self) -> str:
 }


-def get_vm_type(
-    cpu_cores: int,
-    memory_gb: int,
-    provider: str = "azure"
-) -> str:
+def get_vm_type(cpu_cores: int, memory_gb: int, provider: str = "azure") -> str:
     """Get recommended VM type for given resources."""
     vm_maps = {
         "azure": AZURE_VM_TYPES,

scripts/load_test/scenarios/multi_language.py

Lines changed: 10 additions & 2 deletions
@@ -1,9 +1,8 @@
-"""Multi-language test scenarios for all 12 supported languages."""
+"""Multi-language test scenarios for all 13 supported languages."""

 from typing import List
 from .base import BaseScenario

-
 # Language-specific hello world and compute code
 LANGUAGE_CODE = {
     "py": {
@@ -150,6 +149,15 @@
     writeln("D compute result: ", result);
 }""",
     },
+    "bash": {
+        "baseline": 'echo "Hello from Bash"',
+        "compute": """sum=0
+for i in $(seq 0 9999); do
+    sum=$((sum + i * i))
+done
+echo "Bash compute result: $sum"
+""",
+    },
 }

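
Reviewer note: the new Bash `compute` scenario sums `i * i` for `i` from 0 to 9999. If a load test wants to assert on the output, the expected line can be derived independently, as in the sketch below. The function names are hypothetical and nothing here is part of the commit.

```python
def sum_of_squares_closed_form(n: int) -> int:
    """Sum of i*i for i in 0..n-1, using (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def expected_bash_output(n: int = 10_000) -> str:
    """Line the Bash compute snippet should echo for `seq 0 (n-1)`."""
    total = sum(i * i for i in range(n))
    # Cross-check the brute-force loop against the closed form.
    assert total == sum_of_squares_closed_form(n)
    return f"Bash compute result: {total}"


expected = expected_bash_output()  # "Bash compute result: 333283335000"
```

The total fits comfortably in a 64-bit integer, so Bash arithmetic expansion should not overflow on this workload.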
