Skip to content

Commit 2a91287

Browse files
Pangjipingclaude
andauthored
feat(egress): wrap with supervisor + cleanup hook (#951)
* feat(egress): wrap with supervisor + cleanup hook Hard-crashed egress leaves stale iptables/nft rules and a zombie mitmdump holding port 18081; restarting the container then accumulates duplicate rules and the new mitmdump fails to bind, sending the in-process mitm watchdog (PR #942) into a retry loop. This change keeps the egress process under a dedicated supervisor so restarts are deterministic and the dirty state is reset on every launch and exit. components/internal/supervisor: new shared single-worker supervisor. Exponential backoff with jitter, pre-start / post-exit hooks (failures non-fatal), crashloop circuit breaker, JSONL event log. SIGTERM is forwarded to the worker with a configurable grace period before SIGKILL. Includes unit + integration tests using a re-exec'd test binary as a fake child. components/internal/cmd/supervisor: opensandbox-supervisor binary built from the same module; flag-driven, no new external deps. components/egress/scripts/cleanup.sh: best-effort, idempotent reset of the iptables DNS REDIRECT rules, transparent-HTTP rules, the nftables `opensandbox` table, and stray mitmdump processes. Hard contract: never exit non-zero so a misbehaving cleanup cannot block restarts. components/egress/Dockerfile: builds and installs the supervisor and the cleanup script alongside the egress binary under /opt/opensandbox-egress/, then switches the ENTRYPOINT to run the supervisor with cleanup as both pre-start and post-exit, grace 20s to cover the egress-internal shutdown budget. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style(supervisor): gofmt Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(supervisor): jitter-disable, env aliasing, cleanup dedupe, tests Self-review fixes: - BackoffJitter is now *float64 so callers can pass &zero to disable jitter explicitly. The previous default override turned 0 into 0.1, making "no jitter" impossible. cmd/supervisor exposes the value via --backoff-jitter (default 0.1). - Build hookEnv into a fresh slice instead of `append(spec.Env, ...)`, which could write into spec.Env's underlying array when cap > len. - Hoist delete_until_gone to file scope in cleanup.sh; remove the two inline duplicates. - Add cmd/supervisor argv tests: splitOnDoubleDash table cases, toHooks, openEventLog stderr + dir creation, eventLogDest label. - Add backoff tests covering jitter=0 and applyDefaults() pointer semantics. - Document signal handling in the package doc: SIGINT/SIGTERM trigger shutdown via ctx; SIGHUP / SIGUSR1 / SIGUSR2 / SIGWINCH / SIGQUIT are NOT forwarded. - Remove dead fakeClock.advance helper. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(egress): keep enforcement active during restart backoff PR review (#951, Codex P1) caught a real security regression: the post-exit cleanup hook was tearing down the iptables DNS REDIRECT rules and the `inet opensandbox` nft table before the supervisor slept for backoff and relaunched egress. Because the egress sidecar shares its network namespace with the workload it is meant to filter, that window left the workload with unfiltered egress instead of the stale default-deny rules continuing to protect it. With a worst-case crashloop budget of 10 launches over 5 minutes, that window can stretch to minutes. The fix is to leave netfilter state alone between runs: - Drop the post-exit hook entirely. The backoff window now keeps the previous run's enforcement rules in place. - Slim cleanup.sh to mitmdump-reaping only. iptables rule accumulation across many restarts is a slower-burn drift that egress's own SetupRedirect tolerates (first match wins); the nftables manager already prepends `delete table` to its ruleset script, so ApplyStatic is idempotent. Neither needs hook intervention. - Keep the pre-start mitmdump kill so the new egress can bind the transparent-MITM listen port without colliding with an orphan. (Codex P2 — zombie reaping when supervisor is PID 1 — is intentionally not addressed in this commit; it does not gate the security fix.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent ce17955 commit 2a91287

15 files changed

Lines changed: 1882 additions & 3 deletions

File tree

components/egress/Dockerfile

Lines changed: 43 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,9 @@ WORKDIR /workspace/components/egress
4040
# Static-ish build (no cgo by default) to simplify runtime deps.
4141
RUN go mod download
4242

43+
# Pre-download internal-module deps for the supervisor build below.
44+
RUN cd /workspace/components/internal && go mod download
45+
4346
# Copy the rest of the egress sources
4447
COPY components/egress ./
4548
RUN if [ -n "${CC}" ]; then export CC; fi; \
@@ -55,6 +58,23 @@ RUN if [ -n "${CC}" ]; then export CC; fi; \
5558
-X 'github.com/alibaba/opensandbox/internal/version.GitCommit=${GIT_COMMIT}'" \
5659
-o /out/egress .
5760

61+
# Build the opensandbox-supervisor binary from the internal module.
62+
# Installed alongside /egress so a future ENTRYPOINT switch can wrap egress
63+
# without changing this stage again.
64+
RUN cd /workspace/components/internal && \
65+
if [ -n "${CC}" ]; then export CC; fi; \
66+
if [ -n "${CXX}" ]; then export CXX; fi; \
67+
export CGO_ENABLED="${CGO_ENABLED}" \
68+
CGO_CFLAGS="${CGO_CFLAGS:-${CFLAGS}}" \
69+
CGO_CXXFLAGS="${CGO_CXXFLAGS:-${CXXFLAGS}}" \
70+
CGO_LDFLAGS="${CGO_LDFLAGS}"; \
71+
go build ${GOFLAGS} -trimpath -buildvcs=false \
72+
-ldflags "${LDFLAGS} -buildid= -B none \
73+
-X 'github.com/alibaba/opensandbox/internal/version.Version=${VERSION}' \
74+
-X 'github.com/alibaba/opensandbox/internal/version.BuildTime=${BUILD_TIME}' \
75+
-X 'github.com/alibaba/opensandbox/internal/version.GitCommit=${GIT_COMMIT}'" \
76+
-o /out/opensandbox-supervisor ./cmd/supervisor
77+
5878
FROM debian:bookworm-slim
5979

6080
# iptables is needed for DNS REDIRECT; ca-certificates for TLS to upstream resolvers
@@ -91,9 +111,29 @@ RUN useradd -r -u 10042 -d /var/lib/mitmproxy -s /usr/sbin/nologin mitmproxy \
91111
&& (command -v mitmdump && mitmdump --version) \
92112
&& mkdir -p /var/egress/mitmscripts
93113

94-
COPY --from=builder /out/egress /egress
114+
# All egress runtime artifacts live under one directory to keep paths grouped.
115+
COPY --from=builder /out/egress /opt/opensandbox-egress/egress
116+
COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervisor
117+
# Pre-start hook: reap any mitmdump left over from a previous crashed
118+
# egress so the new launch can bind the transparent-MITM listen port.
119+
# Intentionally does NOT touch iptables/nft rules — the sidecar shares
120+
# a network namespace with the workload, so leaving rules in place keeps
121+
# egress filtering active across the supervisor's backoff window.
122+
COPY components/egress/scripts/cleanup.sh /opt/opensandbox-egress/cleanup.sh
123+
RUN chmod 0755 /opt/opensandbox-egress/cleanup.sh \
124+
/opt/opensandbox-egress/egress \
125+
/opt/opensandbox-egress/supervisor
95126

96127
COPY components/egress/mitmscripts /var/egress/mitmscripts
97128

98-
# Default entrypoint; expects OPENSANDBOX_NETWORK_POLICY env at runtime.
99-
ENTRYPOINT ["/egress"]
129+
# Supervisor wraps the egress binary: restarts on crash with backoff and
130+
# forwards SIGTERM gracefully. The cleanup hook runs only as pre-start;
131+
# running it on post-exit would tear down enforcement during the backoff
132+
# window and leave the workload unprotected.
133+
# Expects OPENSANDBOX_NETWORK_POLICY env at runtime.
134+
ENTRYPOINT ["/opt/opensandbox-egress/supervisor", \
135+
"--pre-start=/opt/opensandbox-egress/cleanup.sh", \
136+
"--name=egress", \
137+
"--grace-period=20s", \
138+
"--", \
139+
"/opt/opensandbox-egress/egress"]
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
#!/bin/sh
2+
# Copyright 2026 Alibaba Group Holding Ltd.
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
#
16+
# Pre-start hook for opensandbox-supervisor wrapping the egress worker.
17+
# Reaps any mitmdump left over from a previous crashed egress so the next
18+
# launch can bind the transparent-MITM listen port (default 18081).
19+
#
20+
# Scope is deliberately narrow:
21+
# * iptables NAT rules are NOT torn down here. The egress sidecar shares
22+
# a network namespace with the workload it protects; tearing rules
23+
# down between crashes would leave the workload with unfiltered egress
24+
# for the full backoff window. Egress's own SetupRedirect is additive
25+
# and tolerates pre-existing rules (first match wins).
26+
# * The `inet opensandbox` nft table is NOT touched here either. The
27+
# egress nftables manager already prepends `delete table inet
28+
# opensandbox` to its ruleset script, so ApplyStatic is idempotent.
29+
#
30+
# Hard contract: this script MUST NOT exit non-zero. A misbehaving cleanup
31+
# hook is worse than a stray mitmdump; supervisor would treat the hook
32+
# failure as a launch attempt and trip its crashloop budget faster.
33+
34+
# Intentionally no `set -e`. `set -u` for typo safety on env names only.
35+
set -u
36+
37+
log() { printf '[egress-cleanup] %s\n' "$*" >&2; }
38+
39+
# Wraps a command so non-zero exit is silently absorbed. Output goes to
40+
# stderr so it shows up in container logs without polluting the event log.
41+
try() { "$@" 2>&1 | sed 's/^/ /' >&2; return 0; }
42+
43+
# ─── stray mitmdump (orphaned after hard crash) ──────────────────────
44+
kill_stray_mitmdump() {
45+
command -v pkill >/dev/null 2>&1 || { log "pkill not present; skipping mitmdump reap"; return 0; }
46+
# mitmdump runs as the `mitmproxy` user (uid 10042 per egress Dockerfile).
47+
# `-u mitmproxy` scopes pkill to that uid so we never touch anything else;
48+
# `-f mitmdump` is the cmdline match safety net inside that uid.
49+
# SIGTERM first; give it a moment; SIGKILL anything that ignored TERM.
50+
try pkill -TERM -u mitmproxy -f mitmdump
51+
# Short sleep, but bounded so this hook still finishes inside the
52+
# supervisor's PreStartTimeout (default 30s) with plenty of headroom.
53+
sleep 1
54+
try pkill -KILL -u mitmproxy -f mitmdump
55+
log "stray mitmdump processes reaped (best-effort)"
56+
}
57+
58+
main() {
59+
log "starting (worker_exit_code=${WORKER_EXIT_CODE:-?} signal=${WORKER_SIGNAL:-?} attempt=${WORKER_ATTEMPT:-?})"
60+
kill_stray_mitmdump
61+
log "done"
62+
exit 0
63+
}
64+
65+
# Trap unexpected interpreter errors so we still exit 0.
66+
trap 'log "cleanup hit shell error on line $LINENO; exiting 0 anyway"; exit 0' HUP INT TERM
67+
main "$@" || true
68+
exit 0
Lines changed: 194 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,194 @@
1+
// Copyright 2026 Alibaba Group Holding Ltd.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
// Command opensandbox-supervisor wraps a single worker process with restart
16+
// backoff, lifecycle hooks, and a structured event log. It is designed to
17+
// run as a container ENTRYPOINT or as a child of another process; it does
18+
// not assume PID 1 and performs no zombie reaping.
19+
//
20+
// Usage:
21+
//
22+
// opensandbox-supervisor [flags] -- <worker-cmd> [worker-args...]
23+
package main
24+
25+
import (
26+
"context"
27+
"errors"
28+
"flag"
29+
"fmt"
30+
"io"
31+
"os"
32+
"os/signal"
33+
"path/filepath"
34+
"syscall"
35+
"time"
36+
37+
"github.com/alibaba/opensandbox/internal/logger"
38+
"github.com/alibaba/opensandbox/internal/supervisor"
39+
"github.com/alibaba/opensandbox/internal/version"
40+
"gopkg.in/natefinch/lumberjack.v2"
41+
)
42+
43+
// multiFlag collects a repeatable flag into a string slice.
44+
type multiFlag []string
45+
46+
func (m *multiFlag) String() string { return fmt.Sprintf("%v", *m) }
47+
func (m *multiFlag) Set(v string) error { *m = append(*m, v); return nil }
48+
49+
func main() {
50+
version.EchoVersion("OpenSandbox Supervisor")
51+
52+
var (
53+
preStart multiFlag
54+
postExit multiFlag
55+
eventLog string
56+
backoffMin time.Duration
57+
backoffMax time.Duration
58+
backoffJitter float64
59+
stableAfter time.Duration
60+
burstWindow time.Duration
61+
burstMax int
62+
onBurst bool
63+
grace time.Duration
64+
preTimeout time.Duration
65+
postTimeout time.Duration
66+
name string
67+
logLevel string
68+
)
69+
70+
fs := flag.NewFlagSet("opensandbox-supervisor", flag.ExitOnError)
71+
fs.Var(&preStart, "pre-start", "Executable to run before each worker launch (repeatable). No shell expansion; wrap in a script if needed.")
72+
fs.Var(&postExit, "post-exit", "Executable to run after each worker exit (repeatable). Receives WORKER_* env. Failures are logged, not fatal.")
73+
fs.StringVar(&eventLog, "event-log", "", "Path to JSONL event log. Empty = stderr.")
74+
fs.DurationVar(&backoffMin, "backoff-min", time.Second, "Minimum restart backoff.")
75+
fs.DurationVar(&backoffMax, "backoff-max", 30*time.Second, "Maximum restart backoff (exponential capped here).")
76+
fs.Float64Var(&backoffJitter, "backoff-jitter", 0.1, "Backoff jitter fraction (0 disables, e.g. 0.1 = ±10%). Negative clamped to 0.")
77+
fs.DurationVar(&stableAfter, "stable-after", 60*time.Second, "Worker uptime after which backoff resets.")
78+
fs.DurationVar(&burstWindow, "burst-window", 5*time.Minute, "Crashloop budget sliding window.")
79+
fs.IntVar(&burstMax, "burst-max", 10, "Max launches inside burst-window before tripping the breaker.")
80+
fs.BoolVar(&onBurst, "on-burst-exit", true, "true: supervisor exits non-zero when the burst budget trips, so a higher-level supervisor (e.g. kubelet) reacts. false: keep retrying.")
81+
fs.DurationVar(&grace, "grace-period", 10*time.Second, "Time between SIGTERM and SIGKILL when shutting the worker down.")
82+
fs.DurationVar(&preTimeout, "pre-start-timeout", 30*time.Second, "Timeout for each pre-start hook.")
83+
fs.DurationVar(&postTimeout, "post-exit-timeout", 30*time.Second, "Timeout for each post-exit hook.")
84+
fs.StringVar(&name, "name", "", "Worker name shown in logs and events (default: basename of the worker cmd).")
85+
fs.StringVar(&logLevel, "log-level", "info", "Supervisor diagnostic log level (debug|info|warn|error).")
86+
87+
args := os.Args[1:]
88+
workerArgs := splitOnDoubleDash(&args)
89+
if err := fs.Parse(args); err != nil {
90+
os.Exit(2)
91+
}
92+
if len(workerArgs) == 0 {
93+
fmt.Fprintln(os.Stderr, "opensandbox-supervisor: missing worker command after `--`")
94+
fs.Usage()
95+
os.Exit(2)
96+
}
97+
98+
log := logger.MustNew(logger.Config{Level: logLevel}).Named("supervisor")
99+
defer log.Sync()
100+
101+
eventWriter, closer, err := openEventLog(eventLog)
102+
if err != nil {
103+
log.Errorf("event log: %v", err)
104+
os.Exit(2)
105+
}
106+
defer closer()
107+
108+
spec := supervisor.Spec{
109+
Name: name,
110+
Cmd: workerArgs[0],
111+
Args: workerArgs[1:],
112+
PreStart: toHooks(preStart),
113+
PostExit: toHooks(postExit),
114+
BackoffMin: backoffMin,
115+
BackoffMax: backoffMax,
116+
BackoffJitter: &backoffJitter,
117+
StableAfter: stableAfter,
118+
BurstWindow: burstWindow,
119+
BurstMax: burstMax,
120+
OnBurstExit: &onBurst,
121+
GracePeriod: grace,
122+
PreStartTimeout: preTimeout,
123+
PostExitTimeout: postTimeout,
124+
EventLog: eventWriter,
125+
Logger: log,
126+
}
127+
if spec.Name == "" {
128+
spec.Name = filepath.Base(spec.Cmd)
129+
}
130+
131+
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
132+
defer cancel()
133+
134+
log.Infof("supervising %q (event-log=%s)", spec.Cmd, eventLogDest(eventLog))
135+
err = supervisor.Run(ctx, spec)
136+
switch {
137+
case err == nil, errors.Is(err, context.Canceled):
138+
os.Exit(0)
139+
case errors.Is(err, supervisor.ErrBurstExceeded):
140+
log.Errorf("supervisor: %v", err)
141+
os.Exit(1)
142+
default:
143+
log.Errorf("supervisor: %v", err)
144+
os.Exit(2)
145+
}
146+
}
147+
148+
// splitOnDoubleDash takes everything after the first "--" as the worker
149+
// argv and trims the supervisor flag slice in place.
150+
func splitOnDoubleDash(args *[]string) []string {
151+
for i, a := range *args {
152+
if a == "--" {
153+
worker := append([]string(nil), (*args)[i+1:]...)
154+
*args = (*args)[:i]
155+
return worker
156+
}
157+
}
158+
return nil
159+
}
160+
161+
func toHooks(paths []string) []supervisor.Hook {
162+
if len(paths) == 0 {
163+
return nil
164+
}
165+
out := make([]supervisor.Hook, 0, len(paths))
166+
for _, p := range paths {
167+
out = append(out, supervisor.Hook{Argv: []string{p}})
168+
}
169+
return out
170+
}
171+
172+
func openEventLog(path string) (io.Writer, func(), error) {
173+
if path == "" {
174+
return os.Stderr, func() {}, nil
175+
}
176+
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
177+
return nil, nil, fmt.Errorf("mkdir %s: %w", filepath.Dir(path), err)
178+
}
179+
lj := &lumberjack.Logger{
180+
Filename: path,
181+
MaxSize: logger.DefaultRotateMaxSize,
182+
MaxAge: logger.DefaultRotateMaxAge,
183+
MaxBackups: logger.DefaultRotateMaxBackups,
184+
Compress: true,
185+
}
186+
return lj, func() { _ = lj.Close() }, nil
187+
}
188+
189+
func eventLogDest(path string) string {
190+
if path == "" {
191+
return "stderr"
192+
}
193+
return path
194+
}

0 commit comments

Comments
 (0)