Commit 3b7c6a5

Merge pull request #2 from eleostech/add-docker-support
Add docker support
2 parents: ee72b9f + 9cd2a9b

10 files changed

Lines changed: 125 additions & 61 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -4,6 +4,7 @@ on:
   push:
     branches: [main]
   pull_request:
+  merge_group:
   schedule:
     - cron: "0 6 * * 1" # Weekly Monday 6am UTC — catches upstream Debian image drift
```

CLAUDE.md

Lines changed: 3 additions & 0 deletions
````diff
@@ -4,6 +4,9 @@
 
 ## Development workflow
 
+Before running the e2e tests, ensure the test prerequisites from
+HACKING.md are installed.
+
 Always run the test suite before committing:
 
 ```bash
````

HACKING.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -35,6 +35,11 @@ curl -fL https://github.com/astral-sh/uv/releases/latest/download/uv-aarch64-unk
 
 No sudo is required to run the VM or the test suite.
 
+**Running tests inside a VM (nested):** If you are running the e2e
+tests from inside the VM itself, uncomment the Debian cloud image
+offloader rules in `allowlist.txt` — the image download redirects to
+hosts outside `*.debian.org` that are blocked by default.
+
 ## Running the tests
 
 ```bash
````
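Uncommenting the offloader rules by hand means editing one line per host. For scripted nested runs, a hypothetical helper (not part of this repository) could do it instead. It assumes the rules keep the `# GET https://<host>.ftp.acc.umu.se/...` shape used in the "Debian cloud images" section of `allowlist.txt`, and it leaves every other line, including the commented-out Ubuntu rules, untouched.

```python
import re

# Hypothetical pattern: matches a commented-out offloader rule such as
# "# GET https://gemmei.ftp.acc.umu.se/images/cloud/*" and captures the
# rule itself (everything after the leading "# ").
OFFLOADER_RULE = re.compile(r"^#\s*(GET https://[a-z]+\.ftp\.acc\.umu\.se/\S*)\s*$")


def enable_offloader_rules(text: str) -> str:
    """Return allowlist text with only the offloader rules uncommented."""
    lines = []
    for line in text.splitlines():
        match = OFFLOADER_RULE.match(line)
        # Keep the captured rule for offloader lines; copy everything else as-is.
        lines.append(match.group(1) if match else line)
    return "\n".join(lines) + "\n"
```

Writing the result back with `path.write_text(enable_offloader_rules(path.read_text()))` should be enough; per the `filter.py` docstring, the filter reloads `allowlist.txt` automatically when it changes.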

allowlist.txt

Lines changed: 53 additions & 5 deletions
```diff
@@ -13,6 +13,30 @@
 # POST https://api.github.com/repos/myorg/myrepo/issues
 # GET https://api.openweathermap.org/data/2.5/weather
 
+# ── mitmproxy CA certificate ───────────────────────────────────────
+# Magic domain served by mitmproxy over plain HTTP. The guest
+# fetches the CA cert at boot before any HTTPS traffic.
+GET http://mitm.it/cert/pem
+
+# ── OS package repos (Debian) ─────────────────────────────────────
+# The VM runs Debian. deb.debian.org is the primary apt CDN (Fastly).
+# cloud.debian.org hosts cloud image checksums (large files redirect
+# to offloaders — see the "Debian cloud images" section below).
+GET https://deb.debian.org/*
+GET https://security.debian.org/*
+GET https://cloud.debian.org/*
+
+# ── OS package repos (Ubuntu — uncomment if using an Ubuntu image) ─
+# GET https://archive.ubuntu.com/*
+# GET https://security.ubuntu.com/*
+# GET https://ports.ubuntu.com/*
+# If your Ubuntu mirror is a geo subdomain (e.g. us.archive.ubuntu.com),
+# add it here — domain wildcards are not supported.
+
+# ── Python package repos ──────────────────────────────────────────
+GET https://pypi.org/*
+GET https://files.pythonhosted.org/*
+
 # ── Claude Code ────────────────────────────────────────────────────
 # Anthropic API — scoped to the v1 API prefix so only API calls are
 # permitted, not arbitrary requests to the domain.
@@ -24,17 +48,41 @@ GET https://api.anthropic.com/v1/*
 GET https://api.anthropic.com/api/*
 POST https://api.anthropic.com/api/*
 
-# Claude Code binary downloads from Google Cloud Storage. GET-only
-# to prevent POST-based exfiltration. Scoped to the known Anthropic
-# release bucket; paths vary by version and platform.
+# Claude Code binary downloads from Google Cloud Storage. Scoped
+# to the known Anthropic release bucket; paths vary by version and platform.
 GET https://storage.googleapis.com/claude-code-dist-86c565f3-f756-42ad-8dfa-d59b1c096819/*
 GET https://downloads.claude.ai/claude-code-releases/*
 GET https://api.anthropic.com/api/hello
 
 # ── uv (Python package manager) ───────────────────────────────────
 # Installer script and binary download. The install script lives at
-# astral.sh and redirects to a GitHub release asset whose URL varies
-# by version and platform.
+# astral.sh; binary downloads come from releases.astral.sh (or GitHub
+# release assets as a fallback). URLs vary by version and platform.
 GET https://astral.sh/uv/install.sh
+GET https://releases.astral.sh/github/uv/releases/*
 GET https://github.com/astral-sh/uv/releases/*
 GET https://release-assets.githubusercontent.com/github-production-release-asset/*
+
+# ── Docker Hub ────────────────────────────────────────────────────
+# Registry API — paths vary by image name, tag, and sha256 digest
+# (e.g. /v2/library/hello-world/manifests/latest). Scoped to /v2/.
+GET https://registry-1.docker.io/v2/*
+# Auth tokens — the registry returns 401 with a token URL whose
+# query parameters vary per request (scope, service, etc.).
+GET https://auth.docker.io/token*
+# Blob storage — the registry redirects layer downloads to this
+# Cloudflare R2 bucket. Paths contain per-blob sha256 digests.
+GET https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/*
+
+# ── Debian cloud images (nested VM testing only) ──────────────────
+# Only needed when running the e2e test suite inside a VM (i.e. the
+# tests boot a nested QEMU guest). See HACKING.md for details.
+# cloud.debian.org (*.debian.org, already trusted) serves checksums
+# directly but 302-redirects large files (qcow2) to offloader hosts
+# at Umea University. The offloader is deterministic per-URL (hash),
+# so different images may hit different hosts. Paths vary by arch,
+# release, and date.
+# GET https://gemmei.ftp.acc.umu.se/images/cloud/*
+# GET https://saimei.ftp.acc.umu.se/images/cloud/*
+# GET https://laotzu.ftp.acc.umu.se/images/cloud/*
+# GET https://chuangtzu.ftp.acc.umu.se/images/cloud/*
```
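Every active rule in `allowlist.txt` has the form `METHOD URL`, with `*` allowed only in the path. The diff does not show how `filter.py` reads the file, so the following is only a sketch of a parser for that format. `parse_allowlist` is a hypothetical name, and raising on a malformed line (rather than skipping it) is an assumption.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_allowlist(text: str) -> list[tuple[str, str]]:
    """Parse allowlist text into (METHOD, url_pattern) pairs (sketch).

    Blank lines and '#' comments are skipped, so commented-out rules
    (the Ubuntu and offloader entries, for example) have no effect.
    """
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'METHOD URL', got {raw!r}")
        method, url = parts
        host = urlparse(url).hostname
        if not host:
            raise ValueError(f"line {lineno}: no hostname in {url!r}")
        # Wildcards are only allowed in the path, never in the hostname.
        if "*" in host:
            raise ValueError(f"line {lineno}: wildcards are only allowed in the path")
        rules.append((method.upper(), url))
    return rules
```

Rejecting `*` in the hostname up front mirrors the allowlist's own note that domain wildcards are not supported, so a rule such as `GET https://*.archive.ubuntu.com/*` fails loudly instead of silently never matching.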

cloud-init/user-data

Lines changed: 17 additions & 4 deletions
```diff
@@ -5,6 +5,7 @@ users:
     lock_passwd: true
     sudo: ALL=(ALL) NOPASSWD:ALL
     shell: /bin/bash
+    groups: docker, kvm
     ssh_authorized_keys:
       - __SSH_PUB_KEY__
 
@@ -53,6 +54,16 @@ write_files:
       #!/bin/sh
       printf '\n\033]8;;%s\a%s\033]8;;\a\n\n' "$1" "$1"
 
+  # Docker daemon proxy configuration. write_files runs before the
+  # packages stage, so this override is already in place when docker.io
+  # is installed and systemd first loads the docker.service unit.
+  - path: /etc/systemd/system/docker.service.d/proxy.conf
+    content: |
+      [Service]
+      Environment="HTTP_PROXY=http://__HOST_IP__:__PROXY_PORT__"
+      Environment="HTTPS_PROXY=http://__HOST_IP__:__PROXY_PORT__"
+      Environment="NO_PROXY=localhost,127.0.0.1,__HOST_IP__"
+
   - path: /etc/systemd/system/mnt-9p.mount
     content: |
       [Unit]
@@ -144,11 +155,12 @@ write_files:
       > POST https://api.anthropic.com/v1/*
       > GET https://api.anthropic.com/v1/*
 
-      ## Trusted infrastructure (always allowed)
+      ## Default allowlist
 
-      Package repos (debian.org, ubuntu.com, pypi.org) and the
-      mitmproxy CA endpoint (mitm.it) are trusted at the proxy level
-      and need no allowlist rules.
+      Package repos (deb.debian.org, pypi.org), the mitmproxy CA
+      endpoint (mitm.it), and other infrastructure are included in the
+      default allowlist.txt. All network access is governed by that
+      single file — there are no hidden trusted domains.
 
   - path: /etc/systemd/system/home-vm-shared.service
     content: |
@@ -169,6 +181,7 @@ packages:
   - curl
   - bindfs
   - git
+  - docker.io
 
 runcmd:
   - mkdir -p /mnt/9p /home/vm/shared
```
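The Docker drop-in takes its proxy address from the `__HOST_IP__` and `__PROXY_PORT__` placeholders, which presumably get filled in on the host before the VM boots; that step is not part of this diff. A minimal sketch of such a substitution, using a hypothetical `render` helper, follows. It refuses to return text that still contains a `__NAME__` placeholder, since a literal `__HOST_IP__` left in `HTTP_PROXY` would likely surface only later, as a failed image pull.

```python
from __future__ import annotations

import re

# The docker.service drop-in from cloud-init/user-data, placeholders intact.
PROXY_DROPIN = """\
[Service]
Environment="HTTP_PROXY=http://__HOST_IP__:__PROXY_PORT__"
Environment="HTTPS_PROXY=http://__HOST_IP__:__PROXY_PORT__"
Environment="NO_PROXY=localhost,127.0.0.1,__HOST_IP__"
"""

# Any __UPPER_CASE__ token still present after substitution.
PLACEHOLDER = re.compile(r"__[A-Z][A-Z0-9_]*__")


def render(template: str, values: dict[str, str]) -> str:
    """Fill __NAME__ placeholders; refuse to return a half-filled template."""
    for name, value in values.items():
        template = template.replace(f"__{name}__", value)
    leftover = sorted(set(PLACEHOLDER.findall(template)))
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return template
```

For example, `render(PROXY_DROPIN, {"HOST_IP": "10.0.2.2", "PROXY_PORT": "8080"})` yields `HTTP_PROXY=http://10.0.2.2:8080`; both values are illustrative only, and the real ones come from however `vm.py` configures the VM and mitmproxy.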

filter.py

Lines changed: 7 additions & 37 deletions
```diff
@@ -1,19 +1,13 @@
 """
 mitmproxy allowlist filter — controls what the VM can access.
 
-Traffic is filtered at two levels:
+All network access is governed by allowlist.txt. Each non-blank,
+non-comment line must be:
 
-1. **Trusted domains** (below): infrastructure the VM needs to function —
-   package repos, CA cert endpoint. All HTTP methods and paths are allowed.
-   Edit these only when changing system-level dependencies.
+    METHOD https://hostname/path/pattern
 
-2. **User rules** (allowlist.txt): per-method, per-URL patterns that grant
-   access to specific endpoints. Each non-blank, non-comment line must be:
-
-       METHOD https://hostname/path/pattern
-
-   Wildcards (*) are allowed only in the path, not in the hostname.
-   The filter reloads the file automatically when it changes.
+Wildcards (*) are allowed only in the path, not in the hostname.
+The filter reloads the file automatically when it changes.
 """
 
 import json
@@ -24,26 +18,6 @@
 
 from mitmproxy import http
 
-# ── Trusted domains ─────────────────────────────────────────────────
-# Full-domain allowlist for system infrastructure. Patterns are
-# matched with re.fullmatch against the request hostname.
-TRUSTED_DOMAINS: list[str] = [
-    # OS package repos — scoped to actual apt hostnames
-    r".*\.debian\.org",
-    "archive.ubuntu.com",
-    "security.ubuntu.com",
-    "ports.ubuntu.com",
-    r".*\.archive\.ubuntu\.com",
-    # Python package repos
-    "pypi.org",
-    r".*\.pypi\.org",
-    "files.pythonhosted.org",
-    # mitmproxy's magic domain that serves the CA cert
-    "mitm.it",
-]
-
-_trusted = [re.compile(p) for p in TRUSTED_DOMAINS]
-
 # ── Paths ───────────────────────────────────────────────────────────
 ALLOWLIST_PATH = Path(__file__).parent / "allowlist.txt"
 BLOCKED_LOG = Path(__file__).parent / ".vm" / "blocked.jsonl"
@@ -109,13 +83,9 @@ def is_allowed(
 ) -> bool:
     """Return True if the request is permitted.
 
-    Checks trusted domains first (all methods/paths allowed), then user
-    rules. A ``GET`` rule implicitly allows ``HEAD`` requests to the
-    same URL pattern.
+    A ``GET`` rule implicitly allows ``HEAD`` requests to the same URL
+    pattern.
     """
-    if any(p.fullmatch(host) for p in _trusted):
-        return True
-
     req_path = urlparse(url).path or "/"
 
     for rule_method, url_pattern in rules:
```
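Only the first lines of `is_allowed` appear in the diff: it extracts the request path and loops over `(rule_method, url_pattern)` pairs, and the docstring adds that a `GET` rule also admits `HEAD`. The function below is a hypothetical re-creation of that behavior for illustration, not the repository's code. It assumes that the hostname must match exactly, that the scheme is ignored, and that `*` in the pattern path matches any run of characters, including `/`. With an empty rule list it returns `False` for everything, which is the default-deny behavior that the updated `test_empty_rules_block_known_domains` test checks against the real function.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse


def _path_regex(pattern_path: str) -> re.Pattern[str]:
    # Escape the whole path, then turn each escaped '*' back into '.*',
    # so '*' is the only wildcard and it may span '/' characters.
    return re.compile(re.escape(pattern_path).replace(r"\*", ".*"))


def is_allowed_sketch(
    rules: list[tuple[str, str]], method: str, host: str, url: str
) -> bool:
    """Default-deny: True only if some (method, url_pattern) rule matches."""
    req_path = urlparse(url).path or "/"  # the query string is not compared
    for rule_method, url_pattern in rules:
        # A GET rule also admits HEAD to the same URL pattern.
        method_ok = method == rule_method or (method == "HEAD" and rule_method == "GET")
        if not method_ok:
            continue
        pattern = urlparse(url_pattern)
        # Exact hostname comparison: hostnames never contain wildcards.
        if pattern.hostname != host:
            continue
        if _path_regex(pattern.path or "/").fullmatch(req_path):
            return True
    return False
```

Because only the path is compared in this sketch, the varying query string on Docker's token URLs never reaches the matcher, and a rule like `GET https://auth.docker.io/token*` matches `/token?scope=...` through its path `/token`.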

tests/test_e2e.py

Lines changed: 27 additions & 2 deletions
```diff
@@ -237,7 +237,7 @@ def test_cloud_init_success(running_vm):
     SSH subprocess open during the entire cloud-init run (which includes
     package installation and can take several minutes in TCG mode).
     """
-    deadline = time.monotonic() + 300
+    deadline = time.monotonic() + 600
     last_detail = ""
     while time.monotonic() < deadline:
         try:
@@ -262,7 +262,7 @@ def test_cloud_init_success(running_vm):
         if "status: error" in r.stdout:
             pytest.fail(f"cloud-init finished with errors:\n{r.stdout}")
         time.sleep(10)
-    pytest.fail("cloud-init did not complete within 300s")
+    pytest.fail("cloud-init did not complete within 600s")
 
 
 def test_curl_http_pypi_org(running_vm):
@@ -307,6 +307,31 @@ def test_curl_https_pypi_org(running_vm):
     )
 
 
+def test_docker_hello_world(running_vm):
+    """docker run hello-world should pull the image and print the greeting.
+
+    Exercises the Docker daemon's proxy configuration (systemd service
+    override) and the Docker Hub allowlist rules. The daemon pulls the
+    image through mitmproxy, then runs the container locally.
+    """
+    _progress("Running docker hello-world (includes image pull)…")
+    result = _vm_ssh(
+        "docker run hello-world 2>&1",
+        timeout=180,
+    )
+    if result.returncode != 0:
+        _dump_logs()
+        pytest.fail(
+            f"docker run hello-world failed (rc={result.returncode})\n"
+            f"stdout: {result.stdout[:1000]}\n"
+            f"stderr: {result.stderr[:1000]}"
+        )
+    assert "Hello from Docker!" in result.stdout, (
+        f"Expected 'Hello from Docker!' in output.\n"
+        f"stdout: {result.stdout[:1000]}"
+    )
+
+
 def test_blocked_domain(running_vm):
     """Requests to domains not in filter.py's allowlist should be blocked with 403."""
     result = _vm_ssh(
```
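Raising the cloud-init timeout meant changing two literals, the deadline and the failure message. A polling helper that takes the timeout as a parameter would keep them in one place. The sketch below is hypothetical (`wait_until` is an invented name, not part of the test suite) and reuses the same `time.monotonic()` deadline pattern as the test.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 10.0) -> bool:
    """Call `check` until it returns True or `timeout` seconds pass.

    time.monotonic() is used for the deadline, as in the cloud-init test,
    so wall-clock adjustments cannot shorten or extend the wait.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        # Sleep for the poll interval, but never past the deadline.
        time.sleep(max(0.0, min(interval, deadline - time.monotonic())))
    return False
```

In the test, the loop body would become `check` (it can still call `pytest.fail` itself on `status: error`), and the final message could interpolate the same `timeout` value instead of repeating `600`.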

tests/test_filter.py

Lines changed: 5 additions & 8 deletions
```diff
@@ -107,15 +107,12 @@ def test_missing_file_returns_empty(self, tmp_path):
 # ---------------------------------------------------------------------------
 
 class TestIsAllowed:
-    def test_trusted_domain_allows_any_method(self):
-        assert fm.is_allowed([], "GET", "pypi.org", "https://pypi.org/simple/")
-        assert fm.is_allowed([], "POST", "pypi.org", "https://pypi.org/")
+    def test_empty_rules_block_known_domains(self):
+        """With no rules, even well-known domains are blocked."""
+        assert not fm.is_allowed([], "GET", "pypi.org", "https://pypi.org/simple/")
+        assert not fm.is_allowed([], "GET", "deb.debian.org", "http://deb.debian.org/")
 
-    def test_trusted_domain_regex(self):
-        assert fm.is_allowed([], "GET", "ftp.debian.org", "http://ftp.debian.org/")
-        assert fm.is_allowed([], "GET", "security.debian.org", "http://security.debian.org/")
-
-    def test_non_trusted_domain_blocked(self):
+    def test_non_matching_domain_blocked(self):
         assert not fm.is_allowed([], "GET", "example.com", "http://example.com/")
 
     def test_method_url_rule_matching(self):
```

uv.lock

Lines changed: 0 additions & 4 deletions
(Generated file; diff not rendered.)

vm.py

Lines changed: 7 additions & 1 deletion
```diff
@@ -438,7 +438,13 @@ def build_qemu_args(backend: Backend, memory: str) -> list[str]:
 def start_mitmproxy(proxy_port: int = PROXY_PORT) -> subprocess.Popen:
     """Start mitmdump in the background, logging to .vm/mitmdump.log."""
     log_path = STATE_DIR / "mitmdump.log"
-    cmd = ["mitmdump", "--listen-host", "127.0.0.1", "-p", str(proxy_port)]
+    cmd = [
+        "mitmdump", "--listen-host", "127.0.0.1", "-p", str(proxy_port),
+        # Stream large responses instead of buffering them in memory.
+        # Without this, a 200+ MB download (e.g. Claude Code binary) can
+        # OOM the process — especially in a nested VM with limited RAM.
+        "--set", "stream_large_bodies=1m",
+    ]
 
     # If this host itself uses an upstream proxy (e.g. we're inside a sandboxed
     # VM), forward mitmproxy's own outbound traffic through it.
```
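The new `--set stream_large_bodies=1m` option tells mitmproxy to stream bodies larger than 1 MB through instead of holding them in memory. If the argument list were factored out of `start_mitmproxy`, it might look like the sketch below. `build_mitmdump_cmd` is hypothetical, and because the diff cuts off before showing how `vm.py` forwards traffic to an upstream proxy, the `--mode upstream:URL` branch is an assumption based on mitmproxy's upstream proxy mode, not on this repository.

```python
from __future__ import annotations


def build_mitmdump_cmd(
    proxy_port: int,
    stream_threshold: str = "1m",
    upstream: str | None = None,
) -> list[str]:
    """Assemble a mitmdump argv list (hypothetical helper)."""
    cmd = [
        "mitmdump", "--listen-host", "127.0.0.1", "-p", str(proxy_port),
        # Bodies larger than the threshold are streamed, not buffered.
        "--set", f"stream_large_bodies={stream_threshold}",
    ]
    if upstream:
        # Assumed approach: route mitmproxy's own outbound traffic
        # through another proxy via mitmproxy's upstream mode.
        cmd += ["--mode", f"upstream:{upstream}"]
    return cmd
```

The list would be passed to `subprocess.Popen` as before; the upstream URL in a call such as `build_mitmdump_cmd(8080, upstream="http://proxy.internal:3128")` is made up for illustration.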
