16 changes: 13 additions & 3 deletions components/egress/Dockerfile
@@ -104,13 +104,22 @@ RUN apt-get update \
&& rm -rf /var/lib/apt/lists/*

# Python mitmproxy (transparent mode): mitmdump runs as user mitmproxy; iptables skips this uid.
# /var/lib/mitmproxy is mitm's home, used as the confdir (CA + config.yaml live under .mitmproxy/).
RUN useradd -r -u 10042 -d /var/lib/mitmproxy -s /usr/sbin/nologin mitmproxy \
&& mkdir -p /var/lib/mitmproxy \
&& chown mitmproxy:mitmproxy /var/lib/mitmproxy \
&& mkdir -p /var/lib/mitmproxy/.mitmproxy \
&& chown -R mitmproxy:mitmproxy /var/lib/mitmproxy \
&& pip3 install --no-cache-dir --break-system-packages 'mitmproxy>=10,<11' \
&& (command -v mitmdump && mitmdump --version) \
&& mkdir -p /var/egress/mitmscripts

# Static mitmproxy options (mode, listen_host, connection_strategy, stream_large_bodies,
# http2, ignore_hosts, ssl_verify_upstream_trusted_confdir). mitmdump auto-loads
# config.yaml from its confdir. Dynamic per-deployment options stay env-driven and
# are applied as --set by launch.go (which overrides values declared here).
COPY components/egress/mitmproxy/config.yaml /var/lib/mitmproxy/.mitmproxy/config.yaml
RUN chown mitmproxy:mitmproxy /var/lib/mitmproxy/.mitmproxy/config.yaml \
&& chmod 0644 /var/lib/mitmproxy/.mitmproxy/config.yaml

# All egress runtime artifacts live under one directory to keep paths grouped.
COPY --from=builder /out/egress /opt/opensandbox-egress/egress
COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervisor
@@ -122,7 +131,8 @@ COPY --from=builder /out/opensandbox-supervisor /opt/opensandbox-egress/supervis
COPY components/egress/scripts/cleanup.sh /opt/opensandbox-egress/cleanup.sh
RUN chmod 0755 /opt/opensandbox-egress/cleanup.sh \
/opt/opensandbox-egress/egress \
/opt/opensandbox-egress/supervisor
/opt/opensandbox-egress/supervisor \
&& ln -s /opt/opensandbox-egress/egress /egress

COPY components/egress/mitmscripts /var/egress/mitmscripts

81 changes: 71 additions & 10 deletions components/egress/docs/mitmproxy-transparent.md
@@ -32,28 +32,82 @@ export OPENSANDBOX_EGRESS_MITMPROXY_PORT=18081

# Optional: load an additional user-defined mitm addon (loaded after the system addon)
export OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT=/path/to/your/addon.py

# Optional: bypass decryption for selected domains (semicolon-separated regex list)
export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com;.*\.example\.internal'
```

To bypass decryption for selected domains, edit the baked-in
`components/egress/mitmproxy/config.yaml` and rebuild the image — see
"Static Configuration (config.yaml)" below.

## Configuration Reference

### Environment Variables (Per-Deployment Overrides)

| Variable | Required | Purpose | Default |
|------|----------|------|--------|
| `OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT` | Yes | Enable transparent mitmproxy (`1/true/on`, etc.) | Disabled |
| `OPENSANDBOX_EGRESS_MITMPROXY_PORT` | No | mitmdump listen port; `iptables` redirects `80/443` here | `18081` |
| `OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT` | No | Additional user mitm addon script path (`-s`); loaded after the system addon | Empty |
| `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` | No | Host/IP regex list for TLS pass-through (`;` separated) | Empty |
| `OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR` | No | mitm config and CA directory (passed as `--set confdir=`, also used as `HOME`) | Default directory under `/var/lib/mitmproxy` |
| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style) | `/etc/ssl/certs` |
| `OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR` | No | Trust directory for upstream TLS verification (OpenSSL style); overrides the config.yaml default | `/etc/ssl/certs` |
| `OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE` | No | Skip upstream TLS verification (`1/true/on`); use when clients connect by IP and SNI is unavailable | Disabled |

Notes:

- `OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS` means **no decryption**, not “completely bypass mitm process”.
- In transparent mode, mitmproxy generally recommends matching by IP/range; verify SNI/resolve behavior if using domain regex only.
- Before mitm, `iptables`, and CA export are ready, `GET /healthz` returns `503 (mitm not ready)` to prevent premature readiness.

### Static Configuration (config.yaml)

Fleet-wide, rarely-changing mitm options live in
`components/egress/mitmproxy/config.yaml`, baked into the image at
`/var/lib/mitmproxy/.mitmproxy/config.yaml` and auto-loaded by mitmdump.
This is the single source of truth for:

- `mode` (`transparent`) — mitm default is `regular`
- `listen_host` (`127.0.0.1`) — mitm default is `0.0.0.0`
- `stream_large_bodies` (`10m`) — mitm default is unset (entire body buffered)
- `ssl_verify_upstream_trusted_confdir` (`/etc/ssl/certs`) — mitm default is unset; overridable per-deployment via env
- `ignore_hosts` (`[]`) — matches the mitm default; kept in the file as a discoverable extension point for operators adding TLS pass-through entries

Only deviations from the mitm built-in defaults are declared in `config.yaml` (the `ignore_hosts` line is the one intentional exception, kept for discoverability). Other options that happen to match the default (`connection_strategy=lazy`, `http2=true`, etc.) are omitted — the file is the diff against upstream defaults, not a full enumeration.

Precedence: command-line `--set` (from env overrides) > `config.yaml` > mitmproxy built-in defaults.
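
The precedence rule above can be sketched in Go as a three-level lookup. This is only an illustration under stated assumptions: `resolve` and the three maps are hypothetical stand-ins, not mitmproxy or launch.go data structures, and `/opt/private-ca/certs` is an invented override path. The `mode` and trust-dir values are the ones this document lists.

```go
package main

import "fmt"

// resolve returns the effective value of one option and where it came from,
// following the documented order: --set override > config.yaml > built-in default.
// The maps are hypothetical stand-ins for the three configuration layers.
func resolve(key string, setFlags, configYAML, builtin map[string]string) (string, string) {
	if v, ok := setFlags[key]; ok {
		return v, "--set"
	}
	if v, ok := configYAML[key]; ok {
		return v, "config.yaml"
	}
	return builtin[key], "built-in default"
}

func main() {
	builtin := map[string]string{"mode": "regular"}
	configYAML := map[string]string{
		"mode":                                "transparent",
		"ssl_verify_upstream_trusted_confdir": "/etc/ssl/certs",
	}
	// Hypothetical per-deployment override, as an env-driven --set would supply.
	setFlags := map[string]string{
		"ssl_verify_upstream_trusted_confdir": "/opt/private-ca/certs",
	}

	for _, key := range []string{"mode", "ssl_verify_upstream_trusted_confdir"} {
		value, source := resolve(key, setFlags, configYAML, builtin)
		fmt.Printf("%s = %s (from %s)\n", key, value, source)
	}
}
```

An option that appears in none of the layers the repository controls falls through to the built-in default, which is why an option omitted from `config.yaml` tracks whatever the installed mitmproxy version ships.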

#### Overriding the built-in config.yaml

There is no env var to point mitm at an alternate config file. Operators who need different static defaults (e.g. a different `ignore_hosts` list, `connection_strategy`, or `stream_large_bodies`) should pick one of the following:

1. **Build a downstream image** that derives from the official egress image and replaces the file:

```dockerfile
FROM <opensandbox-egress-image>:<tag>
COPY my-config.yaml /var/lib/mitmproxy/.mitmproxy/config.yaml
RUN chown mitmproxy:mitmproxy /var/lib/mitmproxy/.mitmproxy/config.yaml \
&& chmod 0644 /var/lib/mitmproxy/.mitmproxy/config.yaml
```

This is the recommended path because the override is version-controlled, reviewable, and reproducible.

2. **Mount an override file at runtime** over the baked-in path. For Kubernetes, mount a `ConfigMap` as a file at `/var/lib/mitmproxy/.mitmproxy/config.yaml` (be aware that a `ConfigMap` file mount typically lands as read-only with the original UID, so verify the mitmproxy user can read it):

```yaml
volumeMounts:
- name: mitm-config
mountPath: /var/lib/mitmproxy/.mitmproxy/config.yaml
subPath: config.yaml
readOnly: true
volumes:
- name: mitm-config
configMap:
name: egress-mitm-config
defaultMode: 0644
```

Useful for staged rollouts or per-environment overrides without rebuilding the image.

3. **Single-option escape hatch via env-driven `--set`** (already supported for the documented env variables above). This only works for options exposed via env and only for the single specific override; it cannot replace the whole file.

Do not edit `config.yaml` inside a running container — the file lives in the container layer, edits are lost on restart, and the mitmproxy user has read-only access by design.

## Common Configuration Templates

### 1) Enable Transparent MITM Only
@@ -81,11 +135,18 @@ The user addon is loaded after the system addon (`-s system.py -s user.py`), so

### 4) Bypass Decryption for Specific Domains (e.g. log upload)

```bash
export OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT=true
export OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS='.*\.log\.aliyuncs\.com'
Edit `components/egress/mitmproxy/config.yaml` and append to `ignore_hosts`,
then rebuild the egress image:

```yaml
ignore_hosts:
- '.*\.log\.aliyuncs\.com'
```

`ignore_hosts` means **no decryption**, not "completely bypass mitm process":
mitm still proxies the TCP connection, it just forwards bytes without
breaking TLS, and addons do not see request/response content.
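
Before rebuilding the image, an operator may want to check that a new `ignore_hosts` pattern matches the intended hosts. The Go sketch below is a rough pre-flight check, not mitmproxy's matcher. It assumes patterns are searched case-insensitively against a `host:port` string (my understanding of mitmproxy's behavior, worth verifying against the installed version), and Go's RE2 syntax is only a subset of Python `re`, so patterns using lookarounds or backreferences will not compile here. `ignored` and the sample hostnames are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignored reports whether any pattern matches "host:port" using an
// unanchored, case-insensitive search. This approximates the assumed
// ignore_hosts semantics; it is not mitmproxy's implementation.
func ignored(patterns []string, host string, port int) (bool, error) {
	target := fmt.Sprintf("%s:%d", host, port)
	for _, p := range patterns {
		re, err := regexp.Compile("(?i)" + p)
		if err != nil {
			return false, fmt.Errorf("bad pattern %q: %w", p, err)
		}
		if re.MatchString(target) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{`.*\.log\.aliyuncs\.com`}
	// Hypothetical hostnames used only to exercise the pattern.
	for _, host := range []string{"cn-hangzhou.log.aliyuncs.com", "api.example.com"} {
		ok, err := ignored(patterns, host, 443)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s pass-through=%v\n", host, ok)
	}
}
```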

### 5) Use a Fixed CA (consistent fingerprint across replicas)

If CA files already exist in `confdir`, mitmproxy reuses them instead of regenerating on each startup. Typical paths:
38 changes: 38 additions & 0 deletions components/egress/mitmproxy/config.yaml
@@ -0,0 +1,38 @@
# Static mitmproxy options that override mitm built-in defaults for the
# OpenSandbox egress sidecar. Loaded automatically by mitmdump from
# /var/lib/mitmproxy/.mitmproxy/config.yaml.
#
# Only deviations from mitm defaults are listed here. Options that
# happen to match the mitm default (connection_strategy=lazy, http2=true,
# etc.) are intentionally omitted — the file is meant to be the diff
Comment on lines +5 to +7

Potential behavioral change: connection_strategy silently switches from lazy to eager.

The old launch.go explicitly passed --set connection_strategy=lazy. This PR removes that flag and does not add connection_strategy to config.yaml, so mitmdump falls back to the mitmproxy default. The comment here says connection_strategy=lazy "matches the mitm default", but for the version the Dockerfile installs ('mitmproxy>=10,<11') the default is eager, not lazy.

With eager, mitmproxy opens upstream connections immediately on TCP handshake rather than waiting for the full request. This changes behavior for blocked/filtered requests (the old lazy code avoided unnecessary upstream connections for them).

The commit message even notes: "connection_strategy (lazy — historical default preserved here)" — but it's NOT preserved anywhere. Either add connection_strategy: lazy to config.yaml explicitly, or acknowledge the intentional switch to eager in the PR description.
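
One way to catch this kind of silent fallback is a small guard that fails when an option the old code set explicitly is no longer declared in config.yaml. The Go sketch below is only an illustration: `missingPinnedKeys` is a hypothetical helper, not code from this PR, and it uses a plain line scan instead of a YAML parser, which is enough for a flat `key: value` file like this one.

```go
package main

import (
	"fmt"
	"strings"
)

// missingPinnedKeys reports which required option names are not declared as
// top-level keys in a config.yaml-style text. Comments, blank lines, list
// items, and indented lines are skipped; no YAML parsing is attempted.
func missingPinnedKeys(configText string, required []string) []string {
	declared := map[string]bool{}
	for _, line := range strings.Split(configText, "\n") {
		if line == "" || strings.HasPrefix(line, "#") ||
			strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") ||
			strings.HasPrefix(line, "-") {
			continue
		}
		if i := strings.Index(line, ":"); i > 0 {
			declared[strings.TrimSpace(line[:i])] = true
		}
	}
	var missing []string
	for _, key := range required {
		if !declared[key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Abbreviated copy of the options this PR declares in config.yaml.
	cfg := "mode:\n  - transparent\nlisten_host: 127.0.0.1\nstream_large_bodies: 10m\n"
	fmt.Println(missingPinnedKeys(cfg, []string{"listen_host", "connection_strategy"}))
	// prints: [connection_strategy]
}
```

Run as a unit test against the real file, a check like this would fail if the explicit value were dropped again.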

# against upstream defaults, not a full enumeration. The one exception
# is ignore_hosts: even though [] matches the mitm default, it is kept
# as a discoverable extension point for operators who want to bypass
# decryption for specific hosts without hunting through the docs.
#
# Per-deployment overrides remain env-driven and applied as --set by
# launch.go. Precedence: command-line --set > this file > mitm defaults.

mode:
- transparent

# mitm default 0.0.0.0; transparent mode must only accept loopback inside
# the netns (iptables REDIRECT pushes outbound traffic here, and exposing
# mitm on the LAN would route any inbound connection through it).
listen_host: 127.0.0.1

# mitm default None (whole body buffered in memory). 10m bounds RSS for
# the allow path; chunked / SSE responses are forced to stream regardless
# by the system addon's responseheaders hook.
stream_large_bodies: 10m

# mitm default None (Python certifi bundle). Match the OS trust store so
# private-CA additions land where mitm reads them.
ssl_verify_upstream_trusted_confdir: /etc/ssl/certs

# Hosts (Python regex) for TLS pass-through: mitm forwards bytes without
# decryption and addons do not see request/response content. Empty matches
# the mitm default; kept here as a discoverable extension point. Append
# entries here rather than passing --set on the command line, because
# --set on a list option REPLACES the entire list.
ignore_hosts: []
4 changes: 1 addition & 3 deletions components/egress/mitmproxy_transparent.go
@@ -106,7 +106,6 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
cfg := mitmproxy.Config{
ListenPort: mpPort,
UserName: mitmproxy.RunAsUser,
ConfDir: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir)),
ScriptPath: strings.TrimSpace(os.Getenv(constants.EnvMitmproxyScript)),
}
// Buffer absorbs OnExit events from a retry storm so OnExit goroutines
@@ -131,8 +130,7 @@ func startMitmproxyTransparentIfEnabled() (*mitmTransparent, error) {
}
log.Infof("mitmproxy: transparent intercept active (OUTPUT tcp 80,443 -> %d; trust mitm CA in clients)", mpPort)

confDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyConfDir))
if err := mitmproxy.SyncRootCA(confDir, mpHome); err != nil {
if err := mitmproxy.SyncRootCA("", mpHome); err != nil {
return nil, fmt.Errorf("mitm CA export: %w", err)
}
return &mitmTransparent{
5 changes: 3 additions & 2 deletions components/egress/pkg/constants/configuration.go
@@ -36,12 +36,13 @@ const (
EnvNameserverExempt = "OPENSANDBOX_EGRESS_NAMESERVER_EXEMPT"

// MITM: mitmdump transparent; Linux + CAP_NET_ADMIN, runs as a dedicated user.
// Static mitm options (mode, listen_host, connection_strategy, stream_large_bodies,
// http2, ignore_hosts, ssl_verify_upstream_trusted_confdir default) live in
// /var/lib/mitmproxy/.mitmproxy/config.yaml; only per-deployment overrides are env-driven.
Comment on lines +39 to +41

Same as the launch.go comment: this lists connection_strategy among the options that "live in config.yaml", but it is omitted from the file and relies on the mitmproxy default. Because that default (eager) differs from the lazy value the old code set explicitly, behavior changes silently with no trace in this repo's config.

EnvMitmproxyTransparent = "OPENSANDBOX_EGRESS_MITMPROXY_TRANSPARENT"
EnvMitmproxyPort = "OPENSANDBOX_EGRESS_MITMPROXY_PORT"
EnvMitmproxyConfDir = "OPENSANDBOX_EGRESS_MITMPROXY_CONFDIR"
EnvMitmproxyScript = "OPENSANDBOX_EGRESS_MITMPROXY_SCRIPT"
EnvMitmproxyUpstreamTrustDir = "OPENSANDBOX_EGRESS_MITMPROXY_UPSTREAM_TRUST_DIR"
EnvMitmproxyIgnoreHosts = "OPENSANDBOX_EGRESS_MITMPROXY_IGNORE_HOSTS"
EnvMitmproxySslInsecure = "OPENSANDBOX_EGRESS_MITMPROXY_SSL_INSECURE"

// Comma-separated upstream resolvers: literal IP only (optional :port) — no hostnames (see dnsproxy REDIRECT note).
52 changes: 21 additions & 31 deletions components/egress/pkg/mitmproxy/launch.go
@@ -32,17 +32,23 @@ import (
const RunAsUser = "mitmproxy"

// Loopback: transparent mode receives via REDIRECT; do not listen on 0.0.0.0 in the netns.
// Kept as a Go constant only for the startup log line; the actual listen_host is set in
// /var/lib/mitmproxy/.mitmproxy/config.yaml (shipped via the egress Dockerfile).
const listenHostLoopback = "127.0.0.1"

// systemScriptPath: bundled system addon shipped via the egress Dockerfile
// (COPY components/egress/mitmscripts /var/egress/mitmscripts). Always loaded.
const systemScriptPath = "/var/egress/mitmscripts/system.py"

// Config: mitmdump --mode transparent; UserName must match iptables ! --uid-owner, ConfDir is mitm state/CA.
// Config: mitmdump --mode transparent. Static options (mode, listen_host,
// connection_strategy, stream_large_bodies, http2, ignore_hosts,
// ssl_verify_upstream_trusted_confdir) live in
// /var/lib/mitmproxy/.mitmproxy/config.yaml and are auto-loaded by mitmdump.
Comment on lines +43 to +46

Nit: this comment lists connection_strategy and http2 as living in config.yaml, but they are intentionally omitted from the file (relying on mitmproxy defaults). The comment is inaccurate — it should say these options rely on mitm defaults, not that they're declared in config.yaml.

Same inaccuracy in configuration.go:39-41 and launch.go:102-104.

// This struct carries only per-launch dynamic values that override those
// defaults via `--set`.
type Config struct {
ListenPort int
UserName string
ConfDir string
// ScriptPath is an optional user-supplied addon, loaded after the system addon.
ScriptPath string
// OnExit is called (if non-nil) when mitmdump exits. Called from a background goroutine.
@@ -92,24 +98,21 @@ func Launch(cfg Config) (*Running, error) {
return nil, fmt.Errorf("mitmproxy: lookup user %q: %w", uname, err)
}

// Only per-launch dynamic values are passed on the command line. Static
// options (mode, listen_host, connection_strategy, stream_large_bodies,
// http2, ignore_hosts, ssl_verify_upstream_trusted_confdir) come from
// /var/lib/mitmproxy/.mitmproxy/config.yaml shipped in the egress image.
// `--set` overrides config.yaml, so the env-driven overrides below take
// precedence at runtime without rebuilding the image.
args := []string{
"--mode", "transparent",
"--listen-host", listenHostLoopback,
"--listen-port", strconv.Itoa(cfg.ListenPort),
}

trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir))
if trustDir == "" {
trustDir = "/etc/ssl/certs"
// Upstream cert trust path override. Default in config.yaml is /etc/ssl/certs;
// override per-deployment when the upstream uses a private CA bundle.
if trustDir := strings.TrimSpace(os.Getenv(constants.EnvMitmproxyUpstreamTrustDir)); trustDir != "" {
args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)
}
args = append(args, "--set", "ssl_verify_upstream_trusted_confdir="+trustDir)

// Stream large bodies instead of buffering them in memory (OOM prevention).
args = append(args, "--set", "stream_large_bodies=1m")

// Lazy connection strategy: defer upstream connection until the request is fully received,
// which avoids unnecessary connections for blocked/filtered requests.
args = append(args, "--set", "connection_strategy=lazy")

// Transparent mode redirects TCP to IP addresses. Clients connecting to IPs
// do not send SNI, so upstream TLS cert hostname verification fails with
@@ -119,34 +122,21 @@ func Launch(cfg Config) (*Running, error) {
args = append(args, "--set", "ssl_insecure=true")
}

homeEnv := home
if strings.TrimSpace(cfg.ConfDir) != "" {
cd := strings.TrimSpace(cfg.ConfDir)
args = append(args, "--set", "confdir="+cd)
homeEnv = cd
}
// Load the system addon first so user addons can observe / override its hooks.
args = append(args, "-s", systemScriptPath)
if user := strings.TrimSpace(cfg.ScriptPath); user != "" {
args = append(args, "-s", user)
}

// Upstream passthrough: each pattern becomes --set ignore_hosts= (regex; IP ranges are practical in transparent mode).
for _, p := range strings.Split(os.Getenv(constants.EnvMitmproxyIgnoreHosts), ";") {
p = strings.TrimSpace(p)
if p == "" {
continue
}
args = append(args, "--set", "ignore_hosts="+p)
}

cmd := exec.Command("mitmdump", args...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Credential: &syscall.Credential{Uid: uid, Gid: gid},
}
cmd.Env = append(os.Environ(), "HOME="+homeEnv)
// HOME determines mitm's confdir (~/.mitmproxy) which holds both the CA
// and the baked-in config.yaml.
cmd.Env = append(os.Environ(), "HOME="+home)

if err := cmd.Start(); err != nil {
return nil, fmt.Errorf("mitmproxy: start mitmdump: %w", err)