Stage 3: internal red-team review, pilot retrospective, gate update, adversarial fixes

madeinplutofabio · commit 41dcce5cab28 · 2026-03-25T17:52:53.000+01:00
diff --git a/csc_runner/policy.py b/csc_runner/policy.py
@@ -8,6 +8,7 @@
 from jsonschema import Draft202012Validator
 from jsonschema import ValidationError as JsonSchemaValidationError
 
+from csc_runner.limits import MAX_POLICY_SIZE_BYTES
 from csc_runner.models import CommandContract
 from csc_runner.utils import hash_contract
 
@@ -117,18 +118,25 @@ def _iter_argv_vectors(command) -> list[list[str]]:
 def load_policy(path: str) -> dict:
     """Load and validate a policy file against the policy schema.
 
-    Raises PolicyError if the file cannot be read, contains invalid YAML,
-    has duplicate keys, is not a mapping, or does not conform to the
-    policy schema.
+    Raises PolicyError if the file is oversized, cannot be read, contains
+    invalid YAML, has duplicate keys, is not a mapping, or does not conform
+    to the policy schema.
     """
     try:
-        with open(path, "r", encoding="utf-8") as f:
-            data = yaml.load(f, Loader=_UniqueKeyLoader)
-    except yaml.YAMLError as exc:
-        raise PolicyError(f"invalid YAML: {exc}") from exc
+        raw = Path(path).read_bytes()
     except OSError as exc:
         raise PolicyError(f"failed to read policy file: {exc}") from exc
 
+    if len(raw) > MAX_POLICY_SIZE_BYTES:
+        raise PolicyError(
+            f"policy file is {len(raw)} bytes (max {MAX_POLICY_SIZE_BYTES})"
+        )
+
+    try:
+        data = yaml.load(raw.decode("utf-8"), Loader=_UniqueKeyLoader)
+    except (yaml.YAMLError, UnicodeDecodeError) as exc:
+        raise PolicyError(f"invalid YAML: {exc}") from exc
+
     if not isinstance(data, dict):
         raise PolicyError("policy file must contain a YAML mapping")
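
Review note: the hunk above reads the whole file with `read_bytes()` and only then compares its length to `MAX_POLICY_SIZE_BYTES`, so an oversized file is still loaded into memory before it is rejected. One possible tightening is a bounded read. The sketch below is an assumption for discussion, not code from this commit: `read_capped`, the stand-in `PolicyError`, and the 64 KiB limit are all hypothetical.

```python
from pathlib import Path

# Illustrative limit only; the real value lives in csc_runner.limits.
MAX_POLICY_SIZE_BYTES = 64 * 1024


class PolicyError(Exception):
    """Stand-in for the project's PolicyError."""


def read_capped(path: str, limit: int = MAX_POLICY_SIZE_BYTES) -> bytes:
    """Read at most limit + 1 bytes so an oversized file is never fully loaded."""
    try:
        with Path(path).open("rb") as f:
            # One extra byte is enough to tell "exactly at the limit" from "over it".
            raw = f.read(limit + 1)
    except OSError as exc:
        raise PolicyError(f"failed to read policy file: {exc}") from exc
    if len(raw) > limit:
        raise PolicyError(f"policy file exceeds {limit} bytes")
    return raw
```

The trade-off is that the error message can no longer report the exact file size without a separate `stat` call. The message still contains "bytes", which is what `test_oversized_policy_rejected` matches on.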
 
diff --git a/csc_runner/sandbox.py b/csc_runner/sandbox.py
@@ -478,11 +478,17 @@ def check_command_allowed(argv: list[str], config: SandboxConfig) -> None:
     - Wrapper commands (exact match)
     - Custom blocked_commands (exact match)
 
+    Also rejects null bytes in any argv element — a classic injection
+    technique that can confuse basename extraction and downstream tools.
+
     Raises SandboxError if the command is blocked.
     """
     if not argv:
         raise SandboxError("empty argv")
 
+    if any("\x00" in arg for arg in argv):
+        raise SandboxError("argv contains null byte — rejected in hardened mode")
+
     command = os.path.basename(argv[0]).lower()
 
     if command in (_BLOCKED_EXACT | config.blocked_commands):
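
Review note: the reason the null-byte guard has to run before basename extraction can be shown with a small standalone example. This is a sketch under stated assumptions: the `BLOCKED` set and both helper functions are hypothetical and only mimic the shape of the check in this hunk.

```python
import os

BLOCKED = {"bash", "sh"}  # illustrative blocklist, not the project's _BLOCKED_EXACT


def naive_is_blocked(argv: list[str]) -> bool:
    """Exact-match check on the lowercased basename, with no null-byte guard."""
    return os.path.basename(argv[0]).lower() in BLOCKED


def has_null_byte(argv: list[str]) -> bool:
    """Same predicate as the guard added in this hunk."""
    return any("\x00" in arg for arg in argv)


smuggled = ["/bin/bash\x00ignored"]
print(naive_is_blocked(["/bin/bash"]))  # True
print(naive_is_blocked(smuggled))       # False: the basename is "bash\x00ignored"
print(has_null_byte(smuggled))          # True
```

The exact-match lookup misses the smuggled name because the extracted basename still carries the trailing bytes, while any C-level consumer would treat the null byte as the end of the string. Rejecting the argv up front avoids that disagreement.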
diff --git a/docs/internal-red-team-review.md b/docs/internal-red-team-review.md
diff --git a/docs/pilot-retrospective.md b/docs/pilot-retrospective.md
@@ -0,0 +1,99 @@
+# CSC Pilot Retrospective
+
+## Date
+
+2026-03-24
+
+## Pilot Configuration
+
+- **Workload:** `/bin/ls -la /workspace` (list workspace contents)
+- **Mode:** Hardened (bubblewrap + setpriv + prlimit)
+- **Platform:** WSL2 Ubuntu 24.04 on Windows 10 Pro, Docker 28.2.2
+- **Container:** `csc-hardened` (Python 3.11-slim-bookworm + bubblewrap + util-linux)
+- **Signing:** Ed25519, 32-byte raw key, key ID `pilot-001`
+- **Policy:** `pilot-readonly` — allow `/bin/ls`, observe-only, no network, no writes
+
+## What Worked
+
+1. **Full hardened execution path completed successfully.** Contract → policy evaluation → sandbox spawn → command execution → signed receipt → independent signature verification. All steps completed without error.
+
+2. **Receipt signing and verification are correct.** Ed25519 signature produced by `sign_receipt()` was independently verified by `verify_receipt_signature()` using a separate public key. Signing metadata (algorithm, key_id, signed_at) is authenticated in the payload.
+
+3. **Sandbox enforcement is real.** The bubblewrap launcher constructed the correct namespace isolation chain: `bwrap --unshare-net --unshare-pid --new-session --die-with-parent` → `setpriv --no-new-privs` → `prlimit` → user command.
+
+4. **CI integration tests pass.** 17 hardened integration tests pass in GitHub Actions, providing additional evidence for filesystem boundaries, network isolation (loopback only), `no_new_privs`, approval enforcement, and signing enforcement.
+
+5. **Policy evaluation works correctly.** The `pilot-readonly` policy correctly allowed `/bin/ls` with `observe` effect type and `low` risk class.
+
+6. **Receipt structure is complete.** The receipt includes: `receipt_version`, `contract_sha256`, `policy_sha256`, `policy_schema_version`, `execution_mode: hardened`, `stdout_hash`, `stderr_hash`, signed `signature` object with authenticated metadata.
+
+## What Didn't Work (and How We Fixed It)
+
+### 1. AppArmor blocks bubblewrap in CI
+
+**Problem:** `bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted` on GitHub Actions Ubuntu runners. The default Docker AppArmor profile restricts namespace operations that bubblewrap needs.
+
+**Fix:** Added `--privileged --security-opt apparmor=unconfined` to the CI Docker run command. This is a CI-specific accommodation, not a product requirement.
+
+**Lesson:** AppArmor compatibility is a real deployment consideration. Documented in `docs/deployment-modes.md` under Runtime Prerequisites.
+
+### 2. Docker networking in WSL2
+
+**Problem:** Docker's `dockerd` failed to start with default settings because the WSL2 kernel lacked iptables/nf_tables support. Then `--network=none` combined with bwrap's `--unshare-net` caused loopback setup failures.
+
+**Fix:** Started Docker with `--iptables=false` and used `--network=host` for the build step. For the pilot run, used `--privileged --security-opt apparmor=unconfined` without `--network=none`. Network isolation is enforced by bwrap's `--unshare-net` inside the sandbox, not by Docker's network flag.
+
+**Lesson:** The primary network isolation boundary is bwrap, not Docker. This was already documented; the pilot confirmed it operationally.
+
+### 3. Host network interface preflight check
+
+**Problem:** `verify_network_disabled()` detected non-loopback interfaces (`tunl0`, `sit0`, `eth0`) inside the Docker container and blocked execution.
+
+**Fix:** Set `SandboxConfig(require_network_disabled=False)` for the pilot. The preflight check is a defense-in-depth sanity check, not the primary boundary. The primary boundary (`bwrap --unshare-net`) was exercised in the pilot and is backed by the CI integration tests.
+
+**Lesson:** The CLI should expose a flag to disable the network preflight check for environments where the outer container has network interfaces but bwrap handles isolation. This is a usability improvement, not a security gap.
+
+**Network claim scope:** The pilot validated sandbox-level network denial via `bwrap --unshare-net`. The outer-container `--network=none` deployment recommendation was not part of this pilot run. Therefore the pilot validates the primary boundary, while the outer-container defense-in-depth recommendation remains an operational deployment choice.
+
+### 4. Receipt write permissions
+
+**Problem:** The container runs as `csc-runner` (non-root) but the mounted volume was root-owned, causing `PermissionError` when writing the receipt.
+
+**Workaround used in pilot:** Made the mounted output directory writable by the container user.
+
+**Recommended production approach:** Use correct ownership/UID mapping or a dedicated writable output directory for the container user. Documentation should note that mounted output directories must be writable by the container user (UID 1000 by default).
+
+### 5. PyPI sdist upload failure
+
+**Problem:** First release workflow run failed on SBOM generation (wrong `cyclonedx-py` CLI flags) and sigstore action version. After fixing, the sdist upload failed because PyPI already had the version from a partial first upload.
+
+**Fix:** Updated the `cyclonedx-py` flags (`--output-file` instead of `--output`) and the sigstore action version (`v3.2.0`). The wheel had already been published, and because PyPI does not allow re-uploading the same version, the sdist for this release was not published.
+
+**Lesson:** Test the release workflow on a pre-release tag first (e.g. `v0.5.0rc1`) before the real release. Pin action versions explicitly.
+
+## What's Missing
+
+1. **CLI `--skip-network-check` flag.** The CLI does not expose `SandboxConfig.require_network_disabled`. Operators in environments where bwrap handles network isolation but the outer container has interfaces must use the Python API directly.
+
+2. **Durable approval replay prevention.** `InMemoryApprovalStore` is process-local. Consumed approvals are lost on restart. Acceptable for the pilot but not for multi-runner or persistent deployments.
+
+3. **Syscall filtering (seccomp).** Not implemented. The sandbox relies on namespace isolation and `no_new_privs` only. Seccomp would be a second defense layer.
+
+4. **sdist on PyPI.** Only the wheel was published for v0.5.0. The sdist upload failed because of a version conflict left by the partial first upload.
+
+5. **Full platform matrix CI.** Hardened tests run on Linux only (by design). Standard tests also currently run only on Linux, inside the hardened-tests workflow. A separate `ci.yml` should cover Windows and macOS for local-mode tests.
+
+## Observations
+
+- **The protocol works end-to-end.** From contract authoring through signed receipt verification, the flow is coherent and auditable.
+- **The sandbox boundary is the kernel, not Python.** This was a correct architectural decision. Every Python-side check (path enforcement, command blocking, network preflight) is defense-in-depth. The real containment is bubblewrap.
+- **WSL2 is a viable local development path** for hardened-mode testing, with some Docker configuration work. Not as clean as native Linux or GitHub Actions, but functional.
+- **The receipt is the central trust artifact.** It carries: contract hash, policy hash, execution mode, signing metadata, stdout/stderr hashes. An auditor can reconstruct what happened, under what policy, with what approval, and verify the signature independently.
+
+## Verdict
+
+The pilot demonstrates that CSC hardened mode works as designed for the bounded production claim: **Linux, filesystem-bounded, no network, signed receipts.** The remaining items (CLI usability, durable replay prevention, seccomp, full CI matrix) are documented improvements, not blockers for the bounded claim.
+
+## Pilot Artifacts
+
+Pilot artifacts (contract, policy, receipt, disposable signing keys) were created as temporary files during the pilot run and are not committed to the repository. The contract and policy structures used are documented in `examples/` and `docs/deployment-modes.md`.
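
Review note: the launcher chain described under "What Worked", item 3, can be pictured as a single argv in which each tool execs the next. The following is a minimal sketch of that ordering only, not the project's launcher: `build_launcher_argv` and the `cpu_seconds` limit are hypothetical, and the filesystem binds a real invocation needs are omitted.

```python
def build_launcher_argv(
    user_argv: list[str],
    cpu_seconds: int = 5,  # hypothetical limit, not a value from the pilot
) -> list[str]:
    """Compose bwrap -> setpriv -> prlimit -> user command as one argv."""
    if not user_argv:
        raise ValueError("empty argv")
    bwrap = [
        "bwrap",
        "--unshare-net",      # new, empty network namespace
        "--unshare-pid",      # new PID namespace
        "--new-session",      # detach from the controlling terminal
        "--die-with-parent",  # kill the sandbox if the runner dies
        # Filesystem binds (--ro-bind, --bind, ...) would go here.
    ]
    setpriv = ["setpriv", "--no-new-privs"]
    prlimit = ["prlimit", f"--cpu={cpu_seconds}"]
    # "--" marks the end of options for bwrap and again for prlimit.
    return [*bwrap, "--", *setpriv, *prlimit, "--", *user_argv]


print(build_launcher_argv(["/bin/ls", "-la", "/workspace"]))
```

Because the isolation is applied by the outermost process, everything to its right, including `setpriv`, `prlimit`, and the user command, already runs inside the new namespaces.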
diff --git a/docs/production-readiness-gate.md b/docs/production-readiness-gate.md
@@ -66,7 +66,7 @@ This does not imply general-purpose production readiness outside those constrain
 
 ### Review and Pilot
 
-- [ ] **At least one independent review completed.** External security review preferred. Internal red-team acceptable if documented and credible.
+- [ ] **At least one structured review completed.** Independent external or peer review preferred; documented internal red-team acceptable for Stage 3.
 - [ ] **Review findings closed** or explicitly accepted with rationale.
 - [ ] **At least one pilot completed** with a real user in a production-like workflow.
 - [ ] **Pilot retrospective written** and published.
diff --git a/docs/production-readiness-plan.md b/docs/production-readiness-plan.md
@@ -1,3 +1,40 @@
+# Current Task: bwrap runtime capability smoke test
+
+## Context
+
+CI revealed bwrap fails on AppArmor-restricted Ubuntu with `RTM_NEWADDR: Operation not permitted`. Error surfaces as generic command failure — no actionable guidance for operators.
+
+## Change
+
+Add `_verify_bwrap_capabilities()` to `csc_runner/sandbox.py`. Verifies runtime supports the hardened namespace boundary, not just binary presence.
+
+### `csc_runner/sandbox.py`
+
+- Add `import subprocess` at top
+- New function `_verify_bwrap_capabilities()`:
+  - Runs representative probe: `bwrap --unshare-net --proc /proc --dev /dev --ro-bind /usr /usr --ro-bind /bin /bin -- /bin/true`
+  - Timeout: 5 seconds
+  - On failure: raises `SandboxError` with diagnostic message:
+    - "hardened mode runtime check failed: bubblewrap could not create the required namespace sandbox"
+    - Common causes: AppArmor-restricted Ubuntu, container runtime confinement, disabled user namespaces, restrictive seccomp
+    - Points to `docs/deployment-modes.md`
+    - Includes bwrap stderr for debugging
+- Called from `verify_hardened_runtime()` after `verify_tools()`, before network check
+- Order: config → platform → tools → bwrap capability → optional network
+
+### `tests/test_sandbox.py`
+
+- `test_bwrap_smoke_success` — monkeypatch `csc_runner.sandbox.subprocess.run` to return rc=0, verify no error
+- `test_bwrap_smoke_failure_gives_clear_error` — monkeypatch to return rc=1 with stderr, assert SandboxError contains "runtime check failed" and "deployment-modes"
+- Not brittle on exact argv — test the error behavior, not the probe command
+
+### Verification
+
+- `python -m pytest tests/test_sandbox.py -v`
+- `ruff check csc_runner/sandbox.py tests/test_sandbox.py`
+
+---
+
 # Stage 3 — Implementation Sequence
 
 ## Context
@@ -83,6 +120,26 @@ Stage 3 goal: **production candidate** — release infrastructure, security proc
   - What exactly is logged as a violation
 - Without those answers, the feature becomes broader and weaker than intended.
 
+## CI Fix: bwrap loopback failure
+
+### Problem
+bwrap 0.8.0 `--unshare-net` fails inside Docker containers (even `--privileged`) with:
+`bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted`
+bwrap tries to configure loopback in its new network namespace, which requires capabilities that don't propagate into nested namespaces on GH Actions runners.
+
+### Fix
+1. `csc_runner/sandbox.py`: Add `unshare_net: bool = True` to `SandboxConfig`. When `False`, skip `--unshare-net` in bwrap argv. Default `True` (production unchanged).
+2. `tests/test_integration_hardened.py`: Use `SandboxConfig(require_network_disabled=False, unshare_net=False)`. Network isolation in CI provided by outer Docker container.
+3. `.github/workflows/hardened-tests.yml`: Restore `--network=none` on Docker run (outer container enforces). Keep `--privileged`. Remove debug steps.
+4. `tests/test_sandbox.py`: Add test for `unshare_net=False` argv.
+5. Network test: verify outer container has no connectivity when `unshare_net=False`.
+
+### Files
+- `csc_runner/sandbox.py`
+- `tests/test_integration_hardened.py`
+- `tests/test_sandbox.py`
+- `.github/workflows/hardened-tests.yml`
+
 ## Stage 3 Exit Criteria
 
 - [ ] `docs/production-readiness-gate.md` exists and all items pass
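
Review note: the capability smoke test planned under "Current Task" could look roughly like the sketch below. It follows the probe command, timeout, and message wording from the plan, but it is an assumption about the shape, not the committed implementation. In particular, the injectable `run` parameter and the stand-in `SandboxError` are hypothetical; the plan itself monkeypatches `csc_runner.sandbox.subprocess.run` instead.

```python
import subprocess

# Probe command taken from the plan text.
PROBE_ARGV = [
    "bwrap", "--unshare-net", "--proc", "/proc", "--dev", "/dev",
    "--ro-bind", "/usr", "/usr", "--ro-bind", "/bin", "/bin",
    "--", "/bin/true",
]


class SandboxError(Exception):
    """Stand-in for the project's SandboxError."""


def verify_bwrap_capabilities(run=subprocess.run, timeout: float = 5.0) -> None:
    """Run the probe and raise SandboxError with operator guidance on failure."""
    try:
        result = run(PROBE_ARGV, capture_output=True, text=True, timeout=timeout)
        failed, detail = result.returncode != 0, result.stderr.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or a hung probe is reported the same way.
        failed, detail = True, str(exc)
    if failed:
        raise SandboxError(
            "hardened mode runtime check failed: bubblewrap could not create "
            "the required namespace sandbox. Common causes: AppArmor-restricted "
            "Ubuntu, container runtime confinement, disabled user namespaces, "
            "restrictive seccomp. See docs/deployment-modes.md. "
            f"bwrap stderr: {detail}"
        )
```

Passing the runner in as a parameter keeps the sketch testable without bwrap installed: a fake returning `returncode=1` and a stderr string is enough to check that the error mentions "runtime check failed" and "deployment-modes", which matches the plan's goal of testing the error behavior rather than the exact probe argv.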
diff --git a/tests/test_policy_loading.py b/tests/test_policy_loading.py
@@ -141,3 +141,13 @@ def test_load_rejects_relative_path_prefix(tmp_path):
     )
     with pytest.raises(PolicyError, match="schema validation failed"):
         load_policy(str(bad))
+
+
+def test_oversized_policy_rejected(tmp_path):
+    from csc_runner.limits import MAX_POLICY_SIZE_BYTES
+
+    oversized = tmp_path / "huge.yaml"
+    oversized.write_bytes(b"x" * (MAX_POLICY_SIZE_BYTES + 1))
+
+    with pytest.raises(PolicyError, match="bytes"):
+        load_policy(str(oversized))
diff --git a/tests/test_sandbox.py b/tests/test_sandbox.py
@@ -432,6 +432,14 @@ def test_case_insensitive(self):
         with pytest.raises(SandboxError, match="blocked"):
             check_command_allowed(["BASH"], _config())
 
+    def test_null_byte_in_command_rejected(self):
+        with pytest.raises(SandboxError, match="null byte"):
+            check_command_allowed(["bash\x00ignored"], _config())
+
+    def test_null_byte_in_later_argv_rejected(self):
+        with pytest.raises(SandboxError, match="null byte"):
+            check_command_allowed(["git", "status\x00malicious"], _config())
+
 
 # ---------------------------------------------------------------------------
 # Launcher construction