Commit d980b3b

Pangjiping and claude authored
docs(egress): add supervisor docs, fix API/config documentation gaps (#984)
- Add opensandbox-supervisor README (components/internal/supervisor/) covering all flags, backoff, crashloop breaker, hooks, event log schema
- Add mitmproxy process supervisor (crash recovery) section with generation tagging, health gate, and observability docs
- Fix egress README: Go 1.24 → 1.25, add PUT/DELETE/healthz endpoints, add always-rules files, add DNS upstream env vars, add SSL_INSECURE
- Egress README supervisor section now links to internal supervisor docs

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 54f516e commit d980b3b

3 files changed

Lines changed: 266 additions & 4 deletions

components/egress/README.md

Lines changed: 47 additions & 4 deletions
@@ -55,18 +55,46 @@ Optional advanced features:
5555
- Nameserver bypass: `OPENSANDBOX_EGRESS_NAMESERVER_EXEMPT`
5656
- Denied hostname webhook: `OPENSANDBOX_EGRESS_DENY_WEBHOOK`, `OPENSANDBOX_EGRESS_SANDBOX_ID`
5757
- DoH/DoT controls: `OPENSANDBOX_EGRESS_BLOCK_DOH_443`, `OPENSANDBOX_EGRESS_DOH_BLOCKLIST`
58+
- Custom DNS upstream: `OPENSANDBOX_EGRESS_DNS_UPSTREAM` (comma-separated IPs, optional `:port`), `OPENSANDBOX_EGRESS_DNS_UPSTREAM_TIMEOUT` (default `5` seconds)
59+
- DNS upstream health probe: `OPENSANDBOX_EGRESS_DNS_UPSTREAM_PROBE` (enable), `OPENSANDBOX_EGRESS_DNS_UPSTREAM_PROBE_INTERVAL_SEC`
60+
61+
### Always-Rules Files
62+
63+
Static rule files under `/var/egress/rules/` are loaded at startup and take priority over dynamic API rules:
64+
65+
| File | Purpose |
66+
|------|---------|
67+
| `/var/egress/rules/deny.always` | Domains always denied, overrides user and allow rules |
68+
| `/var/egress/rules/allow.always` | Domains always allowed, overrides user rules |
69+
| `/var/egress/rules/log_skip.always` | Domain patterns whose DNS blocks are not logged (noise reduction) |
70+
71+
Format: one domain per line (supports wildcards like `*.example.com`). Lines starting with `#` are comments. Missing files are silently ignored.
72+
73+
Rule precedence: `deny.always` > `allow.always` > user policy (API/env).
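
The precedence chain above can be sketched as a small decision function. This is only an illustration, not the sidecar's actual code: the names `matches`, `anyMatch`, and `decide` are hypothetical, and the wildcard semantics (a leading `*.` matches subdomains but not the bare apex) are an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether host matches pattern. A leading "*." is treated as
// "any subdomain" (assumption: the bare apex domain is not matched).
func matches(pattern, host string) bool {
	pattern, host = strings.ToLower(pattern), strings.ToLower(host)
	if strings.HasPrefix(pattern, "*.") {
		suffix := pattern[1:] // ".example.com"
		return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
	}
	return pattern == host
}

// anyMatch reports whether host matches at least one pattern.
func anyMatch(patterns []string, host string) bool {
	for _, p := range patterns {
		if matches(p, host) {
			return true
		}
	}
	return false
}

// decide applies deny.always > allow.always > user policy.
// user stands in for the dynamic API/env policy (hypothetical callback).
func decide(denyAlways, allowAlways []string, user func(host string) string, host string) string {
	if anyMatch(denyAlways, host) {
		return "deny"
	}
	if anyMatch(allowAlways, host) {
		return "allow"
	}
	return user(host)
}

func main() {
	deny := []string{"*.tracker.example"}
	allow := []string{"*.example.com"}
	user := func(string) string { return "deny" } // stand-in for a deny-all user policy
	for _, h := range []string{"api.example.com", "ads.tracker.example", "other.org"} {
		fmt.Printf("%s -> %s\n", h, decide(deny, allow, user, h))
	}
}
```

Checking `deny.always` first means a domain listed in both files is still denied, which is the behavior the precedence line describes.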
74+
75+
Always-rules are hot-reloaded: the sidecar polls the files once per minute and applies changes without restart.
5876

5977
### Runtime HTTP API
6078

61-
- `GET /policy`: get current policy
62-
- `POST /policy`: replace policy (`{}`, `null`, empty body => reset to deny-all)
63-
- `PATCH /policy`: merge/append rules (body is JSON array of egress rules)
79+
| Method | Path | Description |
80+
|--------|------|-------------|
81+
| `GET` | `/policy` | Get current policy and enforcement mode |
82+
| `POST` | `/policy` | Replace policy (`{}`, `null`, empty body => reset to deny-all) |
83+
| `PUT` | `/policy` | Alias for `POST` |
84+
| `PATCH` | `/policy` | Merge/append rules (body is JSON array of egress rules) |
85+
| `DELETE` | `/policy` | Remove specific targets (body is JSON string array, e.g. `["*.example.com"]`) |
86+
| `GET` | `/healthz` | Health check; returns `200 ok` or `503 mitmproxy not ready` (when transparent MITM is enabled but not yet initialized) |
6487

6588
Quick example:
6689

6790
```bash
91+
# Replace policy
6892
curl -XPOST http://127.0.0.1:18080/policy \
6993
-d '{"defaultAction":"deny","egress":[{"action":"allow","target":"*.example.com"}]}'
94+
95+
# Remove specific targets
96+
curl -XDELETE http://127.0.0.1:18080/policy \
97+
-d '["*.example.com"]'
7098
```
7199

72100
### Experimental: Transparent MITM (mitmproxy)
@@ -128,7 +156,7 @@ curl -I https://github.com
128156

129157
## Development
130158

131-
- **Language**: Go 1.24+
159+
- **Language**: Go 1.25+
132160
- **Key Packages**:
133161
- `pkg/dnsproxy`: DNS server and policy matching logic.
134162
- `pkg/iptables`: `iptables` rule management.
@@ -152,6 +180,21 @@ An end-to-end benchmark compares **dns** (pass-through, no nft write) and **dns+
152180

153181
More details in [docs/benchmark.md](docs/benchmark.md).
154182

183+
## Process Supervisor
184+
185+
The egress container runs under [`opensandbox-supervisor`](../../components/internal/supervisor/README.md), a lightweight process wrapper that restarts the egress worker on crash with exponential backoff, a crashloop circuit breaker, and structured JSONL event logging.
186+
187+
```
188+
ENTRYPOINT: supervisor --pre-start=cleanup.sh --name=egress --grace-period=20s -- /opt/opensandbox-egress/egress
189+
```
190+
191+
Egress-specific configuration:
192+
193+
- **`--grace-period=20s`**: Egress needs extra time to drain DNS connections and tear down iptables/nft rules on shutdown (default is 10 s).
194+
- **Pre-start hook** (`cleanup.sh`): Reaps orphaned `mitmdump` processes from a previous crash so the new egress can bind the MITM listen port. Intentionally does NOT tear down iptables/nft rules — keeping enforcement active during the backoff window protects the workload.
195+
196+
For full supervisor documentation (all flags, backoff behavior, crashloop breaker, event log schema, library API), see the [supervisor README](../../components/internal/supervisor/README.md).
197+
155198
## Troubleshooting
156199

157200
- **"iptables setup failed"**: ensure sidecar has `--cap-add=NET_ADMIN`.

components/egress/docs/mitmproxy-transparent.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@ export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com;.*\.exa
4747
| `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` | No | Host/IP regex list for TLS pass-through (`;` separated) | Empty |
4848
| `OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR` | No | mitm config and CA directory (passed as `--set confdir=`, also used as `HOME`) | Default directory under `/var/lib/mitmproxy` |
4949
| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style) | `/etc/ssl/certs` |
50+
| `OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE` | No | Skip upstream TLS certificate verification (`1/true/on`). Needed when clients connect by IP (no SNI → hostname mismatch). | Disabled |
5051

5152
Notes:
5253

@@ -112,3 +113,59 @@ Limits:
112113
- Currently IPv4 `iptables` only; IPv6 is not automatically handled.
113114
- Non-Linux environments (for example local macOS runtime) are not supported for transparent mode.
114115
- Full HTTPS decryption introduces CPU/memory and certificate trust overhead; benchmark before production rollout.
116+
117+
## Process Supervisor (Crash Recovery)
118+
119+
The egress sidecar includes a built-in supervisor that monitors the `mitmdump` child process and automatically restarts it on unexpected exits.
120+
121+
### Restart behavior
122+
123+
When `mitmdump` exits unexpectedly, the supervisor restarts it with **exponential backoff**: 1 s, 2 s, 4 s, ..., capped at **30 s**. Retries continue indefinitely until the process starts successfully or the egress sidecar itself shuts down.
124+
125+
A successful restart requires two conditions:
126+
127+
1. `mitmdump` process starts without error.
128+
2. The listen port (`127.0.0.1:<port>`) accepts TCP connections within 15 seconds.
129+
130+
If the listener does not come up in time, the half-started process is gracefully terminated (SIGTERM → wait → SIGKILL) before the next attempt, so the port is released cleanly.
131+
132+
### Generation tagging
133+
134+
Each `mitmdump` launch is assigned a monotonically increasing **generation number**. When a process exits, the exit event carries the generation it was launched with. The supervisor compares this against the currently-live generation:
135+
136+
- **Match**: the live process just died — trigger restart.
137+
- **Mismatch**: a stale process from a previous failed attempt was reaped — ignore.
138+
139+
This prevents restart storms where multiple rapid failures queue up cascading restart attempts.
140+
141+
### Health gate integration
142+
143+
When transparent mitmproxy is enabled:
144+
145+
- `/healthz` returns **503** until the full mitm stack is ready (process started, listener up, iptables installed, CA exported).
146+
- On crash, the health gate is set back to not-ready (503) immediately.
147+
- After a successful restart and listener readiness, the health gate is restored.
148+
149+
Kubernetes readiness probes that hit `/healthz` will stop routing traffic to the sandbox during the restart window.
150+
151+
### Graceful shutdown
152+
153+
When the egress sidecar receives `SIGTERM` or `SIGINT`:
154+
155+
1. The supervisor watcher goroutine exits (context cancelled).
156+
2. `iptables` transparent redirect rules are removed.
157+
3. `mitmdump` receives `SIGTERM`; if it does not exit within 5 seconds, `SIGKILL` is sent.
158+
159+
Any `OnExit` callbacks still blocked on the restart channel are unblocked via a dedicated shutdown channel, preventing goroutine leaks.
160+
161+
### Observability
162+
163+
All supervisor activity is logged with the `[mitmproxy]` prefix:
164+
165+
| Log pattern | Meaning |
166+
|-------------|---------|
167+
| `mitmdump exited (gen=N): <error>; restarting...` | Live process crashed; restart initiated |
168+
| `ignoring stale exit event (gen=N, current=M)` | Old generation reaped; no action needed |
169+
| `restart attempt N failed; retrying in Xs` | Launch or listener wait failed; backing off |
170+
| `mitmdump restarted (pid P, gen N, attempt M)` | Successful restart |
171+
| `dropping exit event during shutdown` | Exit event discarded because egress is shutting down |
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
# opensandbox-supervisor
2+
3+
A lightweight process supervisor that wraps a single worker with restart backoff, lifecycle hooks, a crashloop circuit breaker, and a structured event log. Designed to run as a container `ENTRYPOINT` or as a child of another process; it does not assume PID 1 and performs no zombie reaping.
4+
5+
## Usage
6+
7+
```
8+
opensandbox-supervisor [flags] -- <worker-cmd> [worker-args...]
9+
```
10+
11+
Everything after `--` is the worker command. The supervisor starts the worker, monitors it, and restarts it on unexpected exits.
12+
13+
### Example (egress sidecar)
14+
15+
```dockerfile
16+
ENTRYPOINT ["/opt/opensandbox-egress/supervisor", \
17+
"--pre-start=/opt/opensandbox-egress/cleanup.sh", \
18+
"--name=egress", \
19+
"--grace-period=20s", \
20+
"--", \
21+
"/opt/opensandbox-egress/egress"]
22+
```
23+
24+
## Flags
25+
26+
| Flag | Default | Description |
27+
|------|---------|-------------|
28+
| `--pre-start` | _(none)_ | Executable to run before each worker launch (repeatable). No shell expansion; wrap in a script if needed. |
29+
| `--post-exit` | _(none)_ | Executable to run after each worker exit (repeatable). Receives `WORKER_*` env vars. Failures are logged, not fatal. |
30+
| `--event-log` | stderr | Path to JSONL event log file. Supports rotation via lumberjack. |
31+
| `--backoff-min` | `1s` | Minimum restart backoff. |
32+
| `--backoff-max` | `30s` | Maximum restart backoff (exponential growth capped here). |
33+
| `--backoff-jitter` | `0.1` | Jitter fraction (±10%). Set to `0` to disable. |
34+
| `--stable-after` | `60s` | Worker uptime after which backoff resets to minimum. |
35+
| `--burst-window` | `5m` | Sliding window for crashloop detection. |
36+
| `--burst-max` | `10` | Maximum launches allowed within `burst-window` before the breaker trips. |
37+
| `--on-burst-exit` | `true` | `true`: supervisor exits non-zero when burst budget trips (lets kubelet react). `false`: keep retrying indefinitely. |
38+
| `--grace-period` | `10s` | Time between SIGTERM and SIGKILL when shutting the worker down. |
39+
| `--pre-start-timeout` | `30s` | Timeout for each pre-start hook execution. |
40+
| `--post-exit-timeout` | `30s` | Timeout for each post-exit hook execution. |
41+
| `--name` | _(basename of worker cmd)_ | Worker name shown in logs and events. |
42+
| `--log-level` | `info` | Supervisor diagnostic log level (`debug`\|`info`\|`warn`\|`error`). |
43+
44+
## Restart Behavior
45+
46+
### Exponential Backoff
47+
48+
When the worker exits unexpectedly, the supervisor sleeps before restarting:
49+
50+
```
51+
1s → 2s → 4s → 8s → 16s → 30s → 30s → ...
52+
```
53+
54+
Each delay is perturbed by ±`backoff-jitter` (default ±10%) to avoid thundering herds. After the worker has been alive at least `stable-after` (default 60 s), the backoff resets to `backoff-min`.
55+
56+
### Crashloop Circuit Breaker
57+
58+
A sliding-window counter tracks launches. If more than `burst-max` (default 10) launches occur within `burst-window` (default 5 min), the supervisor either:
59+
60+
- **Exits non-zero** (`--on-burst-exit=true`, default) — surfacing the crashloop via Kubernetes pod status instead of silently retrying.
61+
- **Continues retrying** (`--on-burst-exit=false`) — for environments without an outer restart supervisor.
62+
63+
## Lifecycle Hooks
64+
65+
### Pre-start hooks
66+
67+
Run **before each worker launch**. A non-zero exit aborts that launch attempt and counts toward the crashloop budget. Use for cleanup tasks like reaping orphaned child processes from a previous crash.
68+
69+
### Post-exit hooks
70+
71+
Run **after the worker has been reaped**. Failures are logged but do not block the restart loop. Post-exit hooks run to completion even during shutdown (bounded by `--post-exit-timeout`) so cleanup paths are not aborted.
72+
73+
Post-exit hooks receive these environment variables:
74+
75+
| Variable | Description |
76+
|----------|-------------|
77+
| `WORKER_EXIT_CODE` | Worker's exit code (`-1` if not available) |
78+
| `WORKER_SIGNAL` | Signal name if worker was signaled (e.g. `terminated`, `killed`) |
79+
| `WORKER_DURATION_MS` | Wall-clock worker runtime in milliseconds |
80+
| `WORKER_PID` | Worker's PID |
81+
| `WORKER_ATTEMPT` | Launch attempt number (1-based) |
82+
83+
## Graceful Shutdown
84+
85+
On context cancellation (typically from `SIGTERM` or `SIGINT`):
86+
87+
1. Supervisor sends `SIGTERM` to the worker.
88+
2. Waits up to `--grace-period` for the worker to exit on its own.
89+
3. Sends `SIGKILL` if the worker does not exit in time.
90+
91+
### Signal Handling
92+
93+
- The supervisor does **not** install `signal.Notify` itself; the caller (e.g. `cmd/supervisor/main.go`) translates OS signals into context cancellation.
94+
- `SIGINT` and `SIGTERM` both result in `SIGTERM` to the worker.
95+
- Other signals (`SIGHUP`, `SIGUSR1`, etc.) are **not forwarded**. Add forwarding in the caller if the worker needs them.
96+
97+
### Process Group Isolation
98+
99+
The worker is started with `Setpgid=true` on Unix so signals delivered to the supervisor's process group do not reach the worker by side channel. The supervisor signals the worker explicitly via its PID.
100+
101+
## Structured Event Log
102+
103+
One JSONL record per lifecycle event, written to stderr by default or to the file specified by `--event-log` (with automatic rotation).
104+
105+
### Event Kinds
106+
107+
| Event | When | Key Fields |
108+
|-------|------|------------|
109+
| `start` | Worker process launched | `pid`, `gen`, `attempt` |
110+
| `exit` | Worker exited | `pid`, `gen`, `attempt`, `exit_code`, `signal`, `duration_ms`, `reason` |
111+
| `prestart` | Pre-start hook ran | `hook`, `exit_code`, `duration_ms` |
112+
| `postexit` | Post-exit hook ran | `hook`, `exit_code`, `duration_ms` |
113+
| `backoff` | Sleeping before next restart | `sleep_ms`, `next_attempt` |
114+
| `stable` | Worker uptime exceeded `stable-after`; backoff reset | `pid`, `gen`, `duration_ms`, `reset_backoff` |
115+
| `burst_exit` | Crashloop budget exceeded | `attempts`, `window` |
116+
| `shutdown` | Supervisor shutting down | `reason` |
117+
118+
### Example Events
119+
120+
```jsonl
121+
{"ts":"2026-01-15T10:30:00Z","name":"egress","event":"start","pid":42,"gen":1,"attempt":1}
122+
{"ts":"2026-01-15T10:30:00.15Z","name":"egress","event":"exit","pid":42,"gen":1,"attempt":1,"exit_code":1,"duration_ms":150,"reason":"crashed"}
123+
{"ts":"2026-01-15T10:30:00.15Z","name":"egress","event":"backoff","sleep_ms":1000,"next_attempt":2}
124+
{"ts":"2026-01-15T10:30:01.15Z","name":"egress","event":"prestart","hook":"cleanup.sh","exit_code":0,"duration_ms":50}
125+
{"ts":"2026-01-15T10:30:01.2Z","name":"egress","event":"start","pid":43,"gen":2,"attempt":2}
126+
```
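
Because each record is a single JSON object per line, the log can be consumed with a line scanner and `encoding/json`. The struct below is a hypothetical consumer-side type that covers only some of the fields from the Event Kinds table; it is not the supervisor's own type.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event mirrors a subset of the documented JSONL fields.
type event struct {
	TS         string `json:"ts"`
	Name       string `json:"name"`
	Event      string `json:"event"`
	PID        int    `json:"pid,omitempty"`
	Gen        int    `json:"gen,omitempty"`
	Attempt    int    `json:"attempt,omitempty"`
	ExitCode   *int   `json:"exit_code,omitempty"` // pointer distinguishes 0 from absent
	DurationMS int64  `json:"duration_ms,omitempty"`
	Reason     string `json:"reason,omitempty"`
}

// countCrashes scans JSONL input and counts exit events with reason "crashed".
func countCrashes(log string) (int, error) {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return n, fmt.Errorf("bad record %q: %w", line, err)
		}
		if ev.Event == "exit" && ev.Reason == "crashed" {
			n++
		}
	}
	return n, sc.Err()
}

func main() {
	log := `{"ts":"2026-01-15T10:30:00Z","name":"egress","event":"start","pid":42,"gen":1,"attempt":1}
{"ts":"2026-01-15T10:30:00.15Z","name":"egress","event":"exit","pid":42,"gen":1,"attempt":1,"exit_code":1,"duration_ms":150,"reason":"crashed"}
{"ts":"2026-01-15T10:30:00.15Z","name":"egress","event":"backoff","sleep_ms":1000,"next_attempt":2}`
	n, err := countCrashes(log)
	fmt.Println(n, err)
}
```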
127+
128+
### Exit Reasons
129+
130+
| Reason | Meaning |
131+
|--------|---------|
132+
| `exited` | Worker exited with code 0 |
133+
| `crashed` | Worker exited with non-zero code |
134+
| `signaled` | Worker killed by signal |
135+
| `shutdown` | Supervisor-initiated stop (context cancelled) |
136+
| `launch_failed` | Worker binary could not be started |
137+
| `no_processstate` | Unexpected: no process state available |
138+
139+
## Library Usage
140+
141+
The `internal/supervisor` package can be used programmatically:
142+
143+
```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"

	"github.com/alibaba/opensandbox/internal/supervisor"
)

func main() {
	spec := supervisor.Spec{
		Name:        "my-worker",
		Cmd:         "/usr/local/bin/worker",
		Args:        []string{"--config", "/etc/worker.toml"},
		PreStart:    []supervisor.Hook{{Argv: []string{"/usr/local/bin/cleanup.sh"}}},
		BackoffMin:  time.Second,
		BackoffMax:  30 * time.Second,
		GracePeriod: 15 * time.Second,
	}

	// The caller translates SIGINT/SIGTERM into context cancellation.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	if err := supervisor.Run(ctx, spec); err != nil {
		log.Printf("supervisor stopped: %v", err)
	}
}
```
161+
162+
`Run` blocks until context cancellation or `ErrBurstExceeded`. Zero-valued fields fall back to the defaults listed in the Flags table above.
