Commit e74b059

refactor(egress): split mitmproxy config into yaml (static) vs env (dynamic)
Move fleet-wide, rarely-changing mitmproxy options into a baked-in config.yaml under the standard mitm confdir layout, so launch.go only emits per-deployment dynamic overrides via --set.

This eliminates two classes of bug along the way:

- stream_large_bodies was set in two places (launch.go --set 1m and custom.py ctx.options 10m), with the addon silently winning — making the launch.go line dead code. Now declared once in config.yaml (10m).
- ignore_hosts was env-driven with `;`-separated values, but each value was passed as a separate --set, and mitmproxy --set on a list option REPLACES the list — so configuring multiple bypass patterns silently only kept the last one. config.yaml uses a native YAML list with no override semantics.

Static options now in /var/lib/mitmproxy/.mitmproxy/config.yaml: mode, listen_host, connection_strategy (eager), stream_large_bodies (10m), http2, ignore_hosts (empty default), ssl_verify_upstream_trusted_confdir (default).

Dynamic overrides remain env-driven and applied as --set in launch.go (precedence: --set > config.yaml > mitm defaults):

- OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT (toggle)
- OPENSANDBOX_EGRESS_MITMPROXY_PORT
- OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT
- OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE
- OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR

Removed env vars (no internal use, replaced by config.yaml):

- OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR — confdir is the mitm user's home (/var/lib/mitmproxy), which is also where config.yaml lives; splitting them via env created an unused escape hatch that would have broken config.yaml discovery.
- OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS — replaced by ignore_hosts in config.yaml (native list, no covert-overwrite bug).

The mitmproxy.Config struct loses its ConfDir field accordingly. SyncRootCA still accepts an optional confDirEnv argument so the existing candidate-path search behavior is preserved if a future caller needs to plumb it back in.
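The ignore_hosts bug described in the commit message can be looked at in isolation. The following is a minimal Go sketch, not the project's code: `oldIgnoreHostsArgs` mirrors the loop this commit removes from launch.go, while `effectiveValue` is a hypothetical helper that models the replace-on-`--set` behavior exactly as the commit message describes it. That behavior is an assumption of the sketch and is not verified against mitmproxy here.

```go
package main

import (
	"fmt"
	"strings"
)

// oldIgnoreHostsArgs mirrors the removed loop: every `;`-separated pattern
// becomes its own `--set ignore_hosts=<pattern>` argument pair.
func oldIgnoreHostsArgs(env string) []string {
	var args []string
	for _, p := range strings.Split(env, ";") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		args = append(args, "--set", "ignore_hosts="+p)
	}
	return args
}

// effectiveValue is a hypothetical model of the semantics described in the
// commit message (assumption): a later --set for the same key replaces the
// earlier one, so only the last value survives.
func effectiveValue(args []string, key string) string {
	prefix := key + "="
	value := ""
	for i := 0; i+1 < len(args); i++ {
		if args[i] == "--set" && strings.HasPrefix(args[i+1], prefix) {
			value = strings.TrimPrefix(args[i+1], prefix)
		}
	}
	return value
}

func main() {
	args := oldIgnoreHostsArgs(`.*\.log\.aliyuncs\.com;.*\.example\.internal`)
	fmt.Println("args:", args)
	fmt.Println("effective under replace semantics:", effectiveValue(args, "ignore_hosts"))
}
```

Under that model, two configured patterns produce two `--set` pairs but only one effective value, which is the silent loss the commit message reports.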
1 parent 2a91287 commit e74b059

6 files changed

Lines changed: 144 additions & 48 deletions

components/egress/Dockerfile

Lines changed: 11 additions & 2 deletions
@@ -104,13 +104,22 @@ RUN apt-get update \
 && rm -rf /var/lib/apt/lists/*

 # Python mitmproxy (transparent mode): mitmdump runs as user mitmproxy; iptables skips this uid.
+# /var/lib/mitmproxy is mitm's home, used as the confdir (CA + config.yaml live under .mitmproxy/).
 RUN useradd -r -u 10042 -d /var/lib/mitmproxy -s /usr/sbin/nologin mitmproxy \
-&& mkdir -p /var/lib/mitmproxy \
-&& chown mitmproxy:mitmproxy /var/lib/mitmproxy \
+&& mkdir -p /var/lib/mitmproxy/.mitmproxy \
+&& chown -R mitmproxy:mitmproxy /var/lib/mitmproxy \
 && pip3 install --no-cache-dir --break-system-packages 'mitmproxy>=10,<11' \
 && (command -v mitmdump && mitmdump --version) \
 && mkdir -p /var/egress/mitmscripts

+# Static mitmproxy options (mode, listen_host, connection_strategy, stream_large_bodies,
+# http2, ignore_hosts, ssl_verify_upstream_trusted_confdir). mitmdump auto-loads
+# config.yaml from its confdir. Dynamic per-deployment options stay env-driven and
+# are applied as --set by launch.go (which overrides values declared here).
+COPY components/egress/mitmproxy/config.yaml /var/lib/mitmproxy/.mitmproxy/config.yaml
+RUN chown mitmproxy:mitmproxy /var/lib/mitmproxy/.mitmproxy/config.yaml \
+&& chmod 0644 /var/lib/mitmproxy/.mitmproxy/config.yaml
+
 # All egress runtime artifacts live under one directory to keep paths grouped.
 COPY --from=builder /out/egress /opt/opensandbox-egress/egress
 COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervisor

components/egress/docs/mitmproxy-transparent.md

Lines changed: 37 additions & 10 deletions
@@ -32,28 +32,48 @@ export OPENSANDBOX_EGRESS_MITMPROXY_PORT=18081

 # Optional: load an additional user-defined mitm addon (loaded after the system addon)
 export OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT=/path/to/your/addon.py
-
-# Optional: bypass decryption for selected domains (semicolon-separated regex list)
-export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com;.*\.example\.internal'
 ```

+To bypass decryption for selected domains, edit the baked-in
+`components/egress/mitmproxy/config.yaml` and rebuild the image — see
+"Static Configuration (config.yaml)" below.
+
 ## Configuration Reference

+### Environment Variables (Per-Deployment Overrides)
+
 | Variable | Required | Purpose | Default |
 |------|----------|------|--------|
 | `OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT` | Yes | Enable transparent mitmproxy (`1/true/on`, etc.) | Disabled |
 | `OPENSANDBOX_EGRESS_MITMPROXY_PORT` | No | mitmdump listen port; `iptables` redirects `80/443` here | `18081` |
 | `OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT` | No | Additional user mitm addon script path (`-s`); loaded after the system addon | Empty |
-| `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` | No | Host/IP regex list for TLS pass-through (`;` separated) | Empty |
-| `OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR` | No | mitm config and CA directory (passed as `--set confdir=`, also used as `HOME`) | Default directory under `/var/lib/mitmproxy` |
-| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style) | `/etc/ssl/certs` |
+| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style); overrides the config.yaml default | `/etc/ssl/certs` |
+| `OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE` | No | Skip upstream TLS verification (`1/true/on`); use when clients connect by IP and SNI is unavailable | Disabled |

 Notes:

-- `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` means **no decryption**, not “completely bypass mitm process”.
 - In transparent mode, mitmproxy generally recommends matching by IP/range; verify SNI/resolve behavior if using domain regex only.
 - Before mitm, `iptables`, and CA export are ready, `GET /healthz` returns `503 (mitm not ready)` to prevent premature readiness.

+### Static Configuration (config.yaml)
+
+Fleet-wide, rarely-changing mitm options live in
+`components/egress/mitmproxy/config.yaml`, baked into the image at
+`/var/lib/mitmproxy/.mitmproxy/config.yaml` and auto-loaded by mitmdump.
+This is the single source of truth for:
+
+- `mode` (`transparent`)
+- `listen_host` (`127.0.0.1`)
+- `connection_strategy` (`eager`)
+- `stream_large_bodies` (`10m`)
+- `http2` (`true`)
+- `ignore_hosts` (regex list for TLS pass-through; empty by default — append entries here rather than via env, because `--set` on a list option REPLACES the entire list)
+- `ssl_verify_upstream_trusted_confdir` (default `/etc/ssl/certs`; overridable per-deployment via env)
+
+Precedence: command-line `--set` (from env overrides) > `config.yaml` > mitmproxy built-in defaults.
+
+To change a static option for the whole fleet: edit `config.yaml`, rebuild the egress image, redeploy. To bypass decryption for a specific host **temporarily** in one deployment, the option is to edit and remount `config.yaml` rather than pass an env override.
+
 ## Common Configuration Templates

 ### 1) Enable Transparent MITM Only
@@ -81,11 +101,18 @@ The user addon is loaded after the system addon (`-s system.py -s user.py`), so

 ### 4) Bypass Decryption for Specific Domains (e.g. log upload)

-```bash
-export OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT=true
-export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com'
+Edit `components/egress/mitmproxy/config.yaml` and append to `ignore_hosts`,
+then rebuild the egress image:
+
+```yaml
+ignore_hosts:
+- '.*\.log\.aliyuncs\.com'
 ```

+`ignore_hosts` means **no decryption**, not "completely bypass mitm process":
+mitm still proxies the TCP connection, it just forwards bytes without
+breaking TLS, and addons do not see request/response content.
+
 ### 5) Use a Fixed CA (consistent fingerprint across replicas)

 If CA files already exist in `confdir`, mitmproxy reuses them instead of regenerating on each startup. Typical paths:
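
The precedence rule stated in the docs change above (command-line `--set` > `config.yaml` > mitmproxy built-in defaults) amounts to a first-match lookup across three layers. A minimal Go sketch of that rule follows. The `resolve` function, the `/opt/private-ca` path, and the `example_option` key are hypothetical names used only for illustration. This models the documented ordering and is not how mitmproxy itself is implemented.

```go
package main

import "fmt"

// resolve returns the effective value for key by checking the layers in the
// documented order: --set overrides first, then config.yaml, then defaults.
// The boolean reports whether any layer defined the key.
func resolve(key string, setOverrides, configYAML, defaults map[string]string) (string, bool) {
	for _, layer := range []map[string]string{setOverrides, configYAML, defaults} {
		if v, ok := layer[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Values taken from the config.yaml added in this commit.
	configYAML := map[string]string{
		"stream_large_bodies":                 "10m",
		"ssl_verify_upstream_trusted_confdir": "/etc/ssl/certs",
	}
	// Hypothetical per-deployment override, as launch.go would pass via --set.
	setOverrides := map[string]string{
		"ssl_verify_upstream_trusted_confdir": "/opt/private-ca",
	}
	// Placeholder for built-in defaults; the key and value are illustrative.
	defaults := map[string]string{"example_option": "builtin"}

	for _, key := range []string{
		"ssl_verify_upstream_trusted_confdir",
		"stream_large_bodies",
		"example_option",
	} {
		v, _ := resolve(key, setOverrides, configYAML, defaults)
		fmt.Printf("%s = %s\n", key, v)
	}
}
```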
components/egress/mitmproxy/config.yaml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# Baked-in static mitmproxy options for the OpenSandbox egress sidecar.
+#
+# Loaded automatically by mitmdump from <confdir>/config.yaml. The mitmproxy
+# user's home is /var/lib/mitmproxy, so the effective path inside the image is
+# /var/lib/mitmproxy/.mitmproxy/config.yaml (mitm's default confdir layout).
+#
+# This file is the single source of truth for fleet-wide, rarely-changing
+# mitmproxy options. Per-deployment overrides (listen port, debug flags,
+# user addon path, upstream TLS trust dir) remain env-driven and are
+# applied as `--set key=value` by launch.go, which takes precedence over
+# values declared here.
+#
+# Precedence: command line --set > this file > mitm built-in defaults
+#
+# DO NOT put per-sandbox or per-pipeline-run values here. DO NOT put
+# secrets here. This file ships in the image and is identical for every
+# replica.
+
+mode:
+- transparent
+
+# Loopback only: transparent mode receives traffic via iptables REDIRECT
+# from inside the same network namespace; never expose mitm on 0.0.0.0.
+listen_host: 127.0.0.1
+
+# Eager: open the upstream connection alongside the client connection so
+# mitm's IO loop continuously observes upstream FIN/RST. Lazy defers the
+# upstream open until the full request is buffered and checks the
+# upstream pool per-request; on h1 keepalive sessions that exposes a
+# stale-connection race where the second request on a reused TCP picks
+# a peer-closed conn, surfacing as a silent transport error on the client
+# (e.g. `git fetch` exiting 128 with empty stderr after POST
+# /git-upload-pack right after GET /info/refs). Eager trades a wasted
+# TCP/TLS handshake for the small fraction of requests denied by the
+# egress addon, which is acceptable because a denied flow already
+# short-circuits with `flow.response = ...` before any HTTP write
+# reaches upstream.
+connection_strategy: eager
+
+# Threshold above which mitm streams response bodies chunk-by-chunk
+# instead of buffering them in memory. Set conservatively to keep RSS
+# bounded; chunked / SSE responses are forced to stream regardless via
+# the system addon's responseheaders hook.
+stream_large_bodies: 10m
+
+# Path-style trust store for upstream TLS verification (OpenSSL c_rehash
+# layout). Debian / RHEL ship the system CA bundle directory here.
+ssl_verify_upstream_trusted_confdir: /etc/ssl/certs
+
+# HTTP/2 negotiation enabled (default). Keep ALPN h2/http1.1 negotiation
+# on so upstreams that only advertise h2 are reachable.
+http2: true
+
+# Hosts (regex) for which mitm performs TLS pass-through: TCP forwarded
+# without decryption, addons do not see request/response content.
+#
+# Use this for high-throughput hosts where mitm interception adds no
+# value and only risks performance/race issues — e.g. code hosting that
+# does not need L7 policy, large file uploads, hosts known to behave
+# poorly through a TLS-terminating proxy.
+#
+# Each entry is a Python regex matched against the host. Use anchors and
+# escape dots. Example:
+# ignore_hosts:
+# - 'gitlab\.example\.com'
+# - '.*\.internal-registry\.example\.com'
+#
+# Empty by default. Operators extending the image should append entries
+# here rather than passing --set on the command line, because --set on
+# a list option REPLACES the entire list.
+ignore_hosts: []
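
The advice in the config.yaml comments to escape dots in `ignore_hosts` entries can be checked with a small experiment. The sketch below uses Go's `regexp` package purely as a stand-in. mitmproxy evaluates these patterns with Python's regex engine, and how it applies them (search versus full match, host versus host and port) is not modeled here. The `matches` helper and the sample hostnames are hypothetical. For simple patterns like these, an unescaped `.` matches any character in both engines.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern is found anywhere in host.
// An invalid pattern is treated as "no match" in this sketch.
func matches(pattern, host string) bool {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	return re.MatchString(host)
}

func main() {
	escaped := `.*\.log\.aliyuncs\.com` // pattern style used in this commit
	unescaped := `.*.log.aliyuncs.com`  // same pattern with dots left unescaped

	hosts := []string{
		"cn-hangzhou.log.aliyuncs.com", // intended target (sample name)
		"evil-log-aliyuncs-com",        // lookalike containing no literal dots
	}
	for _, h := range hosts {
		fmt.Printf("%-30s escaped=%v unescaped=%v\n",
			h, matches(escaped, h), matches(unescaped, h))
	}
}
```

The unescaped variant also accepts the lookalike host, so a pass-through rule written that way would exempt more traffic from inspection than intended.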

components/egress/mitmproxy_transparent.go

Lines changed: 1 addition & 3 deletions
@@ -106,7 +106,6 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
 cfg := mitmproxy.Config{
 ListenPort: mpPort,
 UserName: mitmproxy.RunAsUser,
-ConfDir: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir)),
 ScriptPath: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyScript)),
 }
 // Buffer absorbs OnExit events from a retry storm so OnExit goroutines
@@ -131,8 +130,7 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
 }
 log.Infof("mitmproxy: transparent intercept active (OUTPUT tcp 80,443 -> %d; trust mitm CA in clients)", mpPort)

-confDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir))
-if err := mitmproxy.SyncRootCA(confDir, mpHome); err != nil {
+if err := mitmproxy.SyncRootCA("", mpHome); err != nil {
 return nil, fmt.Errorf("mitm CA export: %w", err)
 }
 return &mitmTransparent{

components/egress/pkg/constants/configuration.go

Lines changed: 3 additions & 2 deletions
@@ -36,12 +36,13 @@ const (
 EnvNameserverExempt = "OPENSANDBOX_EGRESS_NAMESERVER_EXEMPT"

 // MITM: mitmdump transparent; Linux + CAP_NET_ADMIN, runs as a dedicated user.
+// Static mitm options (mode, listen_host, connection_strategy, stream_large_bodies,
+// http2, ignore_hosts, ssl_verify_upstream_trusted_confdir default) live in
+// /var/lib/mitmproxy/.mitmproxy/config.yaml; only per-deployment overrides are env-driven.
 EnvMitmproxyTransparent = "OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT"
 EnvMitmproxyPort = "OPENSANDBOX_EGRESS_MITMPROXY_PORT"
-EnvMitmproxyConfDir = "OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR"
 EnvMitmproxyScript = "OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT"
 EnvMitmproxyUpstreamTrustDir = "OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR"
-EnvMitmproxyIgnoreHosts = "OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS"
 EnvMitmproxySslInsecure = "OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE"

 // Comma-separated upstream resolvers: literal IP only (optional :port) — no hostnames (see dnsproxy REDIRECT note).

components/egress/pkg/mitmproxy/launch.go

Lines changed: 21 additions & 31 deletions
@@ -32,17 +32,23 @@ import (
 const RunAsUser = "mitmproxy"

 // Loopback: transparent mode receives via REDIRECT; do not listen on 0.0.0.0 in the netns.
+// Kept as a Go constant only for the startup log line; the actual listen_host is set in
+// /var/lib/mitmproxy/.mitmproxy/config.yaml (shipped via the egress Dockerfile).
 const listenHostLoopback = "127.0.0.1"

 // systemScriptPath: bundled system addon shipped via the egress Dockerfile
 // (COPY components/egress/mitmscripts /var/egress/mitmscripts). Always loaded.
 const systemScriptPath = "/var/egress/mitmscripts/system.py"

-// Config: mitmdump --mode transparent; UserName must match iptables ! --uid-owner, ConfDir is mitm state/CA.
+// Config: mitmdump --mode transparent. Static options (mode, listen_host,
+// connection_strategy, stream_large_bodies, http2, ignore_hosts,
+// ssl_verify_upstream_trusted_confdir) live in
+// /var/lib/mitmproxy/.mitmproxy/config.yaml and are auto-loaded by mitmdump.
+// This struct carries only per-launch dynamic values that override those
+// defaults via `--set`.
 type Config struct {
 ListenPort int
 UserName string
-ConfDir string
 // ScriptPath is an optional user-supplied addon, loaded after the system addon.
 ScriptPath string
 // OnExit is called (if non-nil) when mitmdump exits. Called from a background goroutine.
@@ -92,24 +98,21 @@ func Launch(cfg Config) (*Running, error) {
 return nil, fmt.Errorf("mitmproxy: lookup user %q: %w", uname, err)
 }

+// Only per-launch dynamic values are passed on the command line. Static
+// options (mode, listen_host, connection_strategy, stream_large_bodies,
+// http2, ignore_hosts, ssl_verify_upstream_trusted_confdir) come from
+// /var/lib/mitmproxy/.mitmproxy/config.yaml shipped in the egress image.
+// `--set` overrides config.yaml, so the env-driven overrides below take
+// precedence at runtime without rebuilding the image.
 args := []string{
-"--mode", "transparent",
-"--listen-host", listenHostLoopback,
 "--listen-port", strconv.Itoa(cfg.ListenPort),
 }

-trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir))
-if trustDir == "" {
-trustDir = "/etc/ssl/certs"
+// Upstream cert trust path override. Default in config.yaml is /etc/ssl/certs;
+// override per-deployment when the upstream uses a private CA bundle.
+if trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir)); trustDir != "" {
+args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)
 }
-args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)
-
-// Stream large bodies instead of buffering them in memory (OOM prevention).
-args = append(args, "--set", "stream_large_bodies=1m")
-
-// Lazy connection strategy: defer upstream connection until the request is fully received,
-// which avoids unnecessary connections for blocked/filtered requests.
-args = append(args, "--set", "connection_strategy=lazy")

 // Transparent mode redirects TCP to IP addresses. Clients connecting to IPs
 // do not send SNI, so upstream TLS cert hostname verification fails with
@@ -119,34 +122,21 @@ func Launch(cfg Config) (*Running, error) {
 args = append(args, "--set", "ssl_insecure=true")
 }

-homeEnv := home
-if strings.TrimSpace(cfg.ConfDir) != "" {
-cd := strings.TrimSpace(cfg.ConfDir)
-args = append(args, "--set", "confdir="+cd)
-homeEnv = cd
-}
 // Load the system addon first so user addons can observe / override its hooks.
 args = append(args, "-s", systemScriptPath)
 if user := strings.TrimSpace(cfg.ScriptPath); user != "" {
 args = append(args, "-s", user)
 }

-// Upstream passthrough: each pattern becomes --set ignore_hosts= (regex; IP ranges are practical in transparent mode).
-for _, p := range strings.Split(os.Getenv(constants.EnvMitmproxyIgnoreHosts), ";") {
-p = strings.TrimSpace(p)
-if p == "" {
-continue
-}
-args = append(args, "--set", "ignore_hosts="+p)
-}
-
 cmd := exec.Command("mitmdump", args...)
 cmd.Stdout = os.Stdout
 cmd.Stderr = os.Stderr
 cmd.SysProcAttr = &syscall.SysProcAttr{
 Credential: &syscall.Credential{Uid: uid, Gid: gid},
 }
-cmd.Env = append(os.Environ(), "HOME="+homeEnv)
+// HOME determines mitm's confdir (~/.mitmproxy) which holds both the CA
+// and the baked-in config.yaml.
+cmd.Env = append(os.Environ(), "HOME="+home)

 if err := cmd.Start(); err != nil {
 return nil, fmt.Errorf("mitmproxy: start mitmdump: %w", err)
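
Taken together, the launch.go hunks above reduce the mitmdump command line to a handful of dynamic arguments. The following condensed Go sketch shows the argument list that results after this commit. It is a standalone approximation rather than the file's actual code: the `launchInputs` struct and `buildArgs` function are hypothetical, and the environment lookups and the condition guarding `ssl_insecure` (not visible in the diff) are replaced by plain fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// systemScriptPath matches the constant shown in launch.go.
const systemScriptPath = "/var/egress/mitmscripts/system.py"

// launchInputs is a hypothetical bundle of the per-deployment values that
// launch.go reads from Config and the environment.
type launchInputs struct {
	ListenPort  int
	TrustDir    string // from OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR
	SSLInsecure bool   // from OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE
	ScriptPath  string // from OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT
}

// buildArgs emits only dynamic overrides; static options are expected to
// come from the baked-in config.yaml and are deliberately absent here.
func buildArgs(in launchInputs) []string {
	args := []string{"--listen-port", strconv.Itoa(in.ListenPort)}
	if dir := strings.TrimSpace(in.TrustDir); dir != "" {
		args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+dir)
	}
	if in.SSLInsecure {
		args = append(args, "--set", "ssl_insecure=true")
	}
	// System addon first, optional user addon second.
	args = append(args, "-s", systemScriptPath)
	if user := strings.TrimSpace(in.ScriptPath); user != "" {
		args = append(args, "-s", user)
	}
	return args
}

func main() {
	fmt.Println(buildArgs(launchInputs{ListenPort: 18081}))
	fmt.Println(buildArgs(launchInputs{
		ListenPort: 18081,
		TrustDir:   "/opt/private-ca", // hypothetical path
		ScriptPath: "/path/to/your/addon.py",
	}))
}
```

With no overrides set, the list contains only the listen port and the system addon, so mode, listen_host, and the other static options depend entirely on config.yaml being found under the mitmproxy user's HOME.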
