Add Windows sandbox launcher (secai-sandbox.cmd)

SecAI-Hub · SecAI-Hub · commit 5c3991f592b7 · 2026-04-28T18:43:00.000-07:00
diff --git a/README.md b/README.md
@@ -316,7 +316,7 @@ All CI jobs are defined in [`.github/workflows/ci.yml`](.github/workflows/ci.yml
 | Job | Workflow Link | What It Proves |
 |-----|--------------|---------------|
 | `go-build-and-test` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | 428 Go tests across 9 services with `-race` (build, test, vet) |
-| `python-test` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | 1,132 Python tests (unit/integration + adversarial/acceptance), ruff lint, bandit security scan (enforced on HIGH/HIGH), mypy type checking |
+| `python-test` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | 1,133 Python tests (unit/integration + adversarial/acceptance), ruff lint, bandit security scan (enforced on HIGH/HIGH), mypy type checking |
 | `appsec-lint` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Hadolint for container build files and Semgrep project security rules |
 | `security-regression` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Adversarial test suite: prompt injection, policy bypass, containment, recovery |
 | `supply-chain-verify` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | SBOM generation via Syft, cosign availability, provenance keywords in release/build workflows |
@@ -338,7 +338,7 @@ All CI jobs are defined in [`.github/workflows/ci.yml`](.github/workflows/ci.yml
 | [API Reference](docs/api.md) | HTTP API for all services |
 | [Policy Schema](docs/policy-schema.md) | Full policy.yaml schema reference |
 | [Security Status](docs/security-status.md) | Implementation status of all 54 milestones |
-| [Test Matrix](docs/test-matrix.md) | Test coverage: 1,560 tests across Go and Python (see [test-counts.json](docs/test-counts.json)) |
+| [Test Matrix](docs/test-matrix.md) | Test coverage: 1,561 tests across Go and Python (see [test-counts.json](docs/test-counts.json)) |
 | [Compatibility Matrix](docs/compatibility-matrix.md) | GPU, VM, and hardware support |
 | [Security Test Matrix](docs/security-test-matrix.md) | Security feature test coverage |
 | [FAQ](docs/faq.md) | Common questions |
@@ -462,7 +462,7 @@ for svc in airlock registry tool-firewall gpu-integrity-watch mcp-firewall \
   (cd services/$svc && go test -v -race ./...)
 done
 
-# Python tests (1,132 total)
+# Python tests (1,133 total)
 python -m pip install -r requirements-ci.txt
 PYTHONPATH=services python -m pytest tests/ -v
 
@@ -564,7 +564,7 @@ services/
   search-mediator/          Python -- Tor-routed web search (:8485)
   ui/                       Python/Flask -- Web UI (:8480)
   common/                   Python -- Shared utilities (audit, auth, mlock)
-tests/                      1,132 Python tests, 428 Go tests (1,560 total)
+tests/                      1,133 Python tests, 428 Go tests (1,561 total)
 docs/                       Architecture, API, threat model, install guides
 schemas/                    OpenAPI spec, JSON Schema for config files
 examples/                   Task-oriented walkthroughs
diff --git a/docs/install/sandbox.md b/docs/install/sandbox.md
@@ -34,6 +34,13 @@ Treat the host OS, container runtime, and anyone with host admin access as fully
 
 ## Start The Stack
 
+**Windows (one-command launcher from the repo root)**
+
+```powershell
+.\secai-sandbox.cmd start
+.\secai-sandbox.cmd open
+```
+
 **Linux / macOS**
 
 ```bash
@@ -123,6 +130,12 @@ bash scripts/sandbox/start.sh --with-search --with-airlock --with-inference
 
 ## Stop The Stack
 
+**Windows (one-command launcher from the repo root)**
+
+```powershell
+.\secai-sandbox.cmd stop
+```
+
 **Linux / macOS**
 
 ```bash
diff --git a/docs/security-status.md b/docs/security-status.md
@@ -19,7 +19,7 @@ All M5 security assurance criteria are met. The controls below have been impleme
 | Tool Firewall, default-deny policy | Implemented | M4 | Go tool-firewall service on :8475, default-deny egress |
 | Online Airlock, sanitization | Implemented | M5 | Go airlock service on :8490, disabled by default (privacy risk) |
 | Systemd sandboxing, kernel hardening, nftables | Implemented | M6 | Systemd unit hardening, sysctl tuning, nftables rules |
-| CI/CD, Go/Python tests, shellcheck | Implemented | M7 | GitHub Actions ci.yml. See docs/test-counts.json for current counts (428 Go, 1132 Python as of 2026-04-29) |
+| CI/CD, Go/Python tests, shellcheck | Implemented | M7 | GitHub Actions ci.yml. See docs/test-counts.json for current counts (428 Go, 1133 Python as of 2026-04-29) |
 | Image/video generation, diffusion worker | Implemented | M8 | Diffusion worker for image generation workloads |
 | Multi-GPU support (NVIDIA/AMD/Intel/Apple) | Implemented | M9 | CUDA, ROCm/HIP, XPU/Vulkan, Metal/MPS backends |
 | Tor-routed search, SearXNG, PII stripping | Implemented | M10 | Search mediator with Tor routing and PII redaction |
@@ -60,7 +60,7 @@ All M5 security assurance criteria are met. The controls below have been impleme
 | Production readiness hardening | Implemented | M45 | Incident recorder file-backed persistence (survives restarts), graceful shutdown (SIGTERM/SIGINT with connection draining) for all 9 Go services, HTTP server timeouts for mcp-firewall and gpu-integrity-watch, systemd production hardening (TimeoutStartSec, TimeoutStopSec, StartLimitInterval, StartLimitBurst) for all 12 daemon units, first-boot health validation script, audit log rotation via logrotate, CI dependency vulnerability scanning (govulncheck + pip-audit), production operations guide (upgrade, key rotation, capacity limits, monitoring) |
 | Operational maturity | Implemented | M46 | Bootstrap trust gap fix (cosign verify before unverified rebase, documented trust gap rationale), CI runs on all changes (removed blanket paths-ignore for .md files), Python quality gates (ruff lint + bandit security scan + split test suites into unit/integration and adversarial/acceptance), docs-validation CI job (broken link detection, required docs check, test-counts.json validation), production-readiness checklist (formal release gate), SLOs (availability/latency/correctness targets + alerting thresholds), release channel policy (stable/candidate/dev + versioning + upgrade paths + security patch SLA), support lifecycle (hardware matrix, driver versions, support windows, deprecation policy, scope boundaries), CI evidence table with current job descriptions and workflow links, sample verification output for verify-release.sh |
 | CI enforcement hardening | Implemented | M47 | Enforced vulnerability scanning: bandit fails CI on HIGH-severity/HIGH-confidence findings, govulncheck fails on unwaived Go vulns, pip-audit fails on unwaived Python vulns. Waiver mechanism (`.github/vuln-waivers.json`) with mandatory expiry dates for reviewed/accepted findings. mypy type checking gate for security-sensitive services (common, agent, quarantine, ui). Pinned reproducible Python CI dependencies (`requirements-ci.txt`). Go 1.23->1.25 upgrade fixing 12 stdlib CVEs (crypto/tls, crypto/x509, encoding/asn1, net/url, os). Flask 3.1.1->3.1.3 (GHSA-68rp-wp8r-4726). Verification-first bootstrap documentation (signed rebase as default quickstart, unverified bootstrap moved to labeled recovery section). |
-| Production hardening | Implemented | M48 | Build script fail-closed for required services, quarantine scanners, search mediator, and signing policy material; final binary verification gate; incident store fsync (f.Sync() before close on both incident persistence and audit log writes); GPU backend metadata recording (`/etc/secure-ai/gpu-backend.json` written at build time with backend/version/timestamp); llama-server watchdog (Type=notify wrapper with startup health gate + WatchdogSec=30 continuous monitoring); model catalog externalization (`/etc/secure-ai/model-catalog.yaml` with YAML loading + hardcoded fallback); circuit breaker for Python services; post-upgrade model verification in Greenboot; cosign key rotation documentation. Current automated suite: 428 Go + 1132 Python tests (1,560 total). |
+| Production hardening | Implemented | M48 | Build script fail-closed for required services, quarantine scanners, search mediator, and signing policy material; final binary verification gate; incident store fsync (f.Sync() before close on both incident persistence and audit log writes); GPU backend metadata recording (`/etc/secure-ai/gpu-backend.json` written at build time with backend/version/timestamp); llama-server watchdog (Type=notify wrapper with startup health gate + WatchdogSec=30 continuous monitoring); model catalog externalization (`/etc/secure-ai/model-catalog.yaml` with YAML loading + hardcoded fallback); circuit breaker for Python services; post-upgrade model verification in Greenboot; cosign key rotation documentation. Current automated suite: 428 Go + 1133 Python tests (1,561 total). |
 | Signed-first install path | Implemented | M49 | Signed bootstrap script (`secai-bootstrap.sh`) configures container signing policy (policy.json + registries.d + cosign public key) before first rebase -- eliminates unverified transport from production install path. Digest-pinned install flow (CI publishes image digest in build summary and release assets). First-boot setup wizard (interactive verification of image integrity, transport, vault setup, TPM2 sealing, health check). Signing policy files baked into OS image (`/etc/pki/containers/secai-cosign.pub`, `/etc/containers/registries.d/secai-os.yaml`, policy.json merge in build script). Recovery/dev bootstrap path separated into dedicated doc with clear warnings. |
 | Production operations package | Implemented | M50 | Backup script (`secai-backup.sh`) with full/config/logs/keys categories, age/gpg encryption, internal SHA256 manifest, LUKS header backup. Restore script (`secai-restore.sh`) with integrity verification, staging extraction, double-confirmation LUKS header restore, post-restore health check. Production operations doc extended with rollback decision matrix (Greenboot auto-rollback triggers + manual criteria), 5 break-glass recovery procedures (token loss, attestation failure, Level 1 panic lockout, signing policy break, Greenboot exhaustion), formal data retention policy (7 data classes with retention periods, disk capacity thresholds at 70/80/90/95%). |
 | Stronger observability | Implemented | M51 | Unified appliance health dashboard (trusted/degraded/recovery_required state derived from runtime attestor + integrity monitor + incident recorder). Live SLO compliance monitoring (in-process tracker measuring uptime % and P95 latency against docs/slos.md targets, 7-day rolling window). Webhook alerting hooks for containment events (fire-and-forget POST with retry, configurable per-event-type filtering in incident-containment.yaml). Forensic bundle export wired to HTTP mux (was implemented but unregistered), enriched with real audit log entries and policy digest, accessible via UI download button, Flask proxy, and CLI script (`secai-forensic.sh`). Recovery ceremony endpoints also wired (ack, reattest, status). |
diff --git a/docs/security-test-matrix.md b/docs/security-test-matrix.md
@@ -78,9 +78,9 @@ Last updated: 2026-04-29
 
 | Language | Current Automated Tests | Source of Truth |
 |----------|--------------------------|-----------------|
-| Python | 1132 | `docs/test-counts.json` and `pytest --collect-only` |
+| Python | 1133 | `docs/test-counts.json` and `pytest --collect-only` |
 | Go | 428 | `docs/test-counts.json` and `go test -v -count=1 ./...` |
-| **Total** | **1560** | Enforced by `.github/scripts/check-test-counts.sh` |
+| **Total** | **1561** | Enforced by `.github/scripts/check-test-counts.sh` |
 
 Security coverage overlaps heavily with functional coverage, so the feature tables above use exact file or service totals rather than attempting to split each test into exclusive "security" and "non-security" buckets.
 
diff --git a/docs/test-counts.json b/docs/test-counts.json
@@ -12,6 +12,6 @@
     "incident-recorder": 97
   },
   "go_total": 428,
-  "python_total": 1132,
-  "grand_total": 1560
+  "python_total": 1133,
+  "grand_total": 1561
 }
diff --git a/docs/test-matrix.md b/docs/test-matrix.md
@@ -12,7 +12,7 @@ Last updated: 2026-04-29
 | Language | Test Count | Runner |
 |----------|-----------|--------|
 | Go | 428 | `go test -race ./...` |
-| Python | 1132 | `pytest` |
+| Python | 1133 | `pytest` |
 | Shell | CI-scoped scripts plus Makefile target for all repo shell scripts | `shellcheck` |
 
 ## Go Tests (428 total)
@@ -29,7 +29,7 @@ Last updated: 2026-04-29
 | Integrity Monitor | services/integrity-monitor/ | 50 | Baseline computation, continuous scanning, violation detection, state machine, HMAC baselines, incident-recorder integration |
 | Incident Recorder | services/incident-recorder/ | 97 | Incident creation, auto-containment, lifecycle management, severity ranking, policy loading, containment execution, enforcement chain integration, recovery ceremony, severity escalation, forensic bundle export (M43), persistence durability (fsync) |
 
-## Python Tests (1132 total)
+## Python Tests (1133 total)
 
 | Test File | Location | Tests | Description |
 |-----------|----------|-------|-------------|
@@ -59,7 +59,7 @@ Last updated: 2026-04-29
 | test_recipe_validation.py | tests/ | 26 | Recipe and packaged-file validation |
 | test_release_artifacts.py | tests/ | 52 | Release workflow, artifact manifest, and verification UX consistency |
 | test_sandbox.py | tests/ | 31 | Sandbox compose, policy, and runtime constraints |
-| test_sandbox_bundle.py | tests/ | 7 | Sandbox bundle and artifact checks |
+| test_sandbox_bundle.py | tests/ | 8 | Sandbox bundle and artifact checks |
 | test_search.py | tests/ | 36 | Search mediator, PII stripping, injection detection |
 | test_secure_boot.py | tests/ | 38 | Secure boot and measured boot behavior |
 | test_traffic_analysis.py | tests/ | 41 | Padding, timing jitter, dummy traffic generation |
diff --git a/secai-sandbox.cmd b/secai-sandbox.cmd
@@ -0,0 +1,150 @@
+@echo off
+setlocal EnableExtensions EnableDelayedExpansion
+
+set "REPO_ROOT=%~dp0"
+set "START_SCRIPT=%REPO_ROOT%scripts\sandbox\start.ps1"
+set "STOP_SCRIPT=%REPO_ROOT%scripts\sandbox\stop.ps1"
+set "COMPOSE_FILE=%REPO_ROOT%deploy\sandbox\compose.yaml"
+set "ENV_FILE=%REPO_ROOT%deploy\sandbox\.env"
+
+set "ACTION=%~1"
+if "%ACTION%"=="" set "ACTION=start"
+if /I "%ACTION%"=="help" goto help
+if "%ACTION%"=="-h" goto help
+if "%ACTION%"=="--help" goto help
+
+if /I "%ACTION%"=="start" (
+    shift
+    goto start_stack
+)
+if /I "%ACTION%"=="up" (
+    shift
+    goto start_stack
+)
+if /I "%ACTION%"=="stop" goto stop_stack
+if /I "%ACTION%"=="down" goto stop_stack
+if /I "%ACTION%"=="restart" goto restart_stack
+if /I "%ACTION%"=="status" goto status_stack
+if /I "%ACTION%"=="ps" goto status_stack
+if /I "%ACTION%"=="logs" goto logs_stack
+if /I "%ACTION%"=="open" goto open_ui
+
+echo Unknown command: %ACTION%
+echo.
+goto help_error
+
+:start_stack
+set "PS_ARGS="
+:parse_start_args
+if "%~1"=="" goto run_start
+if /I "%~1"=="--with-inference" (
+    set "PS_ARGS=!PS_ARGS! -WithInference"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="-WithInference" (
+    set "PS_ARGS=!PS_ARGS! -WithInference"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="--with-diffusion" (
+    set "PS_ARGS=!PS_ARGS! -WithDiffusion"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="-WithDiffusion" (
+    set "PS_ARGS=!PS_ARGS! -WithDiffusion"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="--with-search" (
+    set "PS_ARGS=!PS_ARGS! -WithSearch"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="-WithSearch" (
+    set "PS_ARGS=!PS_ARGS! -WithSearch"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="--with-airlock" (
+    set "PS_ARGS=!PS_ARGS! -WithAirlock"
+    shift
+    goto parse_start_args
+)
+if /I "%~1"=="-WithAirlock" (
+    set "PS_ARGS=!PS_ARGS! -WithAirlock"
+    shift
+    goto parse_start_args
+)
+echo Unknown start option: %~1
+exit /b 2
+
+:run_start
+powershell -NoProfile -ExecutionPolicy Bypass -File "%START_SCRIPT%" !PS_ARGS!
+exit /b %ERRORLEVEL%
+
+:stop_stack
+powershell -NoProfile -ExecutionPolicy Bypass -File "%STOP_SCRIPT%"
+exit /b %ERRORLEVEL%
+
+:restart_stack
+call "%~f0" stop
+if errorlevel 1 exit /b %ERRORLEVEL%
+call "%~f0" start
+exit /b %ERRORLEVEL%
+
+:status_stack
+where docker >nul 2>nul
+if errorlevel 1 (
+    echo Docker was not found in PATH.
+    exit /b 1
+)
+docker compose -f "%COMPOSE_FILE%" --profile search --profile llm --profile diffusion ps
+exit /b %ERRORLEVEL%
+
+:logs_stack
+where docker >nul 2>nul
+if errorlevel 1 (
+    echo Docker was not found in PATH.
+    exit /b 1
+)
+docker compose -f "%COMPOSE_FILE%" --profile search --profile llm --profile diffusion logs -f --tail=100
+exit /b %ERRORLEVEL%
+
+:open_ui
+set "UI_PORT=8480"
+if exist "%ENV_FILE%" (
+    for /f "tokens=1,* delims==" %%A in ('findstr /R "^SECAI_UI_PORT=" "%ENV_FILE%"') do set "UI_PORT=%%B"
+)
+start "" "http://127.0.0.1:%UI_PORT%"
+exit /b 0
+
+:help
+echo SecAI OS Docker sandbox launcher
+echo.
+echo Usage:
+echo   secai-sandbox.cmd [command] [options]
+echo.
+echo Commands:
+echo   start       Build and start the sandbox stack ^(default; alias: up^)
+echo   stop        Stop the sandbox stack ^(alias: down^)
+echo   restart     Stop, then start the sandbox stack
+echo   status      Show container status ^(alias: ps^)
+echo   logs        Follow sandbox logs
+echo   open        Open the UI in your default browser
+echo   help        Show this help
+echo.
+echo Start options:
+echo   --with-search       Enable Tor and SearXNG search sidecars
+echo   --with-airlock      Enable airlock policy in sandbox mode
+echo   --with-inference    Enable local LLM inference profile
+echo   --with-diffusion    Enable diffusion worker profile
+echo.
+echo UI:
+echo   http://127.0.0.1:8480 ^(default; the open command reads SECAI_UI_PORT from deploy\sandbox\.env^)
+exit /b 0
+
+:help_error
+call "%~f0" help
+exit /b 2
diff --git a/tests/test_sandbox_bundle.py b/tests/test_sandbox_bundle.py
@@ -80,6 +80,7 @@ def test_sandbox_bundle_has_docs_and_helpers():
         "scripts/sandbox/stop.sh",
         "scripts/sandbox/start.ps1",
         "scripts/sandbox/stop.ps1",
+        "secai-sandbox.cmd",
     ]:
         assert (REPO_ROOT / rel_path).exists()
 
@@ -98,6 +99,22 @@ def test_sandbox_start_helpers_use_digest_pinned_alpine():
     assert "docker.io/library/alpine:3.20" not in powershell_helper
 
 
+def test_windows_sandbox_launcher_delegates_to_hardened_helpers():
+    launcher = (REPO_ROOT / "secai-sandbox.cmd").read_text(encoding="utf-8")
+
+    assert "scripts\\sandbox\\start.ps1" in launcher
+    assert "scripts\\sandbox\\stop.ps1" in launcher
+    assert "-ExecutionPolicy Bypass" in launcher
+    for option, ps_switch in {
+        "--with-search": "-WithSearch",
+        "--with-airlock": "-WithAirlock",
+        "--with-inference": "-WithInference",
+        "--with-diffusion": "-WithDiffusion",
+    }.items():
+        assert option in launcher
+        assert ps_switch in launcher
+
+
 def test_sandbox_stop_helpers_include_optional_profiles():
     shell_helper = (REPO_ROOT / "scripts" / "sandbox" / "stop.sh").read_text(
         encoding="utf-8"
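
Reviewer note: the `:parse_start_args` loop in `secai-sandbox.cmd` accepts either the GNU-style option or the PowerShell switch, matches case-insensitively (`if /I`), and rejects anything else with exit code 2. The Python sketch below restates that mapping so the intended behavior is easy to check at a glance. It is illustrative only: `translate_start_args` is a hypothetical helper that does not exist in the repository, and raising `ValueError` stands in for the launcher's `exit /b 2`.

```python
# Illustrative sketch of the option handling in secai-sandbox.cmd.
# translate_start_args is hypothetical and is not part of the repository.

# GNU-style options and the PowerShell switches they map to (taken from the diff).
OPTION_TO_SWITCH = {
    "--with-inference": "-WithInference",
    "--with-diffusion": "-WithDiffusion",
    "--with-search": "-WithSearch",
    "--with-airlock": "-WithAirlock",
}


def translate_start_args(args):
    """Translate launcher start options into PowerShell switches.

    Both spellings are accepted and compared case-insensitively, as the
    batch file does with `if /I`. An unknown option raises ValueError,
    standing in for the launcher printing an error and exiting with code 2.
    """
    # Build a lowercase lookup that accepts either spelling of each option.
    lookup = {}
    for option, switch in OPTION_TO_SWITCH.items():
        lookup[option.lower()] = switch
        lookup[switch.lower()] = switch

    switches = []
    for arg in args:
        switch = lookup.get(arg.lower())
        if switch is None:
            raise ValueError(f"Unknown start option: {arg}")
        switches.append(switch)
    return switches
```

For example, `translate_start_args(["--with-search", "-withairlock"])` returns `["-WithSearch", "-WithAirlock"]`, which corresponds to the arguments the launcher would append to the `start.ps1` invocation.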
