
feat(rootfs): isolate envd in a dedicated network namespace#2700

Closed
ValentaTomas wants to merge 2 commits into `main` from `feat/envd-network-namespace`

@ValentaTomas

Member

Moves `eth0` + envd into a new `envd-ns` so customer iptables changes in the default namespace can no longer break envd's MMDS / `/init` path (cf. the recent customer who DNAT'd :80 → :4200 and broke MMDS token fetch).

Topology

```
default ns (customer)          envd-ns
veth-customer 192.168.250.1    veth-envd 192.168.250.2
                               eth0 169.254.0.21 (moved in)
```

  • envd-ns runs as a router: `POSTROUTING MASQUERADE` on eth0 for customer outbound; `PREROUTING DNAT` to `192.168.250.1` for inbound, with a `RETURN` exception for port 49983 (envd) so `/init` reaches envd directly.
  • Customer ns picks up forwarded packets on `veth-customer`, and the existing blanket PREROUTING DNAT from #2698 (feat(rootfs): blanket IPv4 PREROUTING DNAT to 127.0.0.1) forwards them to `127.0.0.1`.
  • envd binds inside envd-ns via systemd `NetworkNamespacePath=/run/netns/envd-ns`.
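
A minimal sketch of the topology and router rules described above, not the script shipped in this PR. It prints the commands by default and only executes them when `APPLY=1` is set. The `/30` prefix lengths, the TCP-only match on port 49983, and the helper names (`run`, `setup_topology`, `setup_router`) are assumptions. The upstream default route inside `envd-ns` is left out because the gateway address is not stated here.

```shell
#!/bin/sh
# Dry-run sketch: prints each command unless APPLY=1 (run as root inside the guest).
# Assumed, not taken from the PR: /30 prefixes, the TCP-only RETURN match, helper names.

NS=envd-ns
CUSTOMER_IP=192.168.250.1
ENVD_IP=192.168.250.2
ETH0_IP=169.254.0.21
ENVD_PORT=49983

run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

setup_topology() {
  run ip netns add "$NS"
  run ip link add veth-customer type veth peer name veth-envd
  run ip link set veth-envd netns "$NS"
  run ip link set eth0 netns "$NS"
  # Moving a link to another namespace drops its addresses, so re-add them.
  run ip -n "$NS" addr add "$ETH0_IP/30" dev eth0
  run ip -n "$NS" addr add "$ENVD_IP/30" dev veth-envd
  run ip addr add "$CUSTOMER_IP/30" dev veth-customer
  run ip -n "$NS" link set lo up
  run ip -n "$NS" link set eth0 up
  run ip -n "$NS" link set veth-envd up
  run ip link set veth-customer up
  # The customer namespace now routes everything through envd-ns.
  run ip route add default via "$ENVD_IP"
}

setup_router() {
  run ip netns exec "$NS" sysctl -w net.ipv4.ip_forward=1
  # Customer outbound traffic leaves with eth0's address.
  run ip netns exec "$NS" iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
  # The RETURN rule must precede the DNAT so envd's port stays in envd-ns.
  run ip netns exec "$NS" iptables -t nat -A PREROUTING -i eth0 -p tcp --dport "$ENVD_PORT" -j RETURN
  run ip netns exec "$NS" iptables -t nat -A PREROUTING -i eth0 -j DNAT --to-destination "$CUSTOMER_IP"
}

setup_topology
setup_router
```

Rule order matters in the last two lines: `RETURN` in a built-in chain falls through to the chain policy, so a packet for port 49983 skips the DNAT and is delivered locally in `envd-ns`.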

Notes

  • `e2b-netns.service` is oneshot, gated `Before=network-pre.target sysinit.target`, idempotent.
  • Customer's view of the network changes: their "eth0" is now `veth-customer` (`192.168.250.1`) and gateway is `192.168.250.2`. Apps binding `0.0.0.0` or `127.0.0.1` continue to work; apps explicitly binding `169.254.0.21` would not (rare).
  • The existing per-port socat / port forwarder code is unchanged in this PR; it remains the bridge from `veth-customer:port` to `127.0.0.1:port` in the customer ns. Combined with #2698 (feat(rootfs): blanket IPv4 PREROUTING DNAT to 127.0.0.1), the path works end-to-end.
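
To make the unit wiring concrete, here is a hedged sketch that prints what the two systemd pieces could look like. Only `Before=network-pre.target sysinit.target`, the oneshot type, and `NetworkNamespacePath=/run/netns/envd-ns` come from the notes above. The setup script path, `DefaultDependencies=no`, `RemainAfterExit=yes`, the `envd.service` unit name, and the use of a drop-in are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: prints illustrative unit text to stdout; nothing is installed.
# Assumed, not from the PR: ExecStart path, DefaultDependencies=no,
# RemainAfterExit=yes, and a drop-in for a unit named envd.service.

netns_unit() {
  cat <<'EOF'
[Unit]
Description=Create envd-ns and move eth0 into it
DefaultDependencies=no
Before=network-pre.target sysinit.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/e2b-netns-setup.sh
EOF
}

envd_dropin() {
  cat <<'EOF'
[Unit]
Requires=e2b-netns.service
After=e2b-netns.service

[Service]
NetworkNamespacePath=/run/netns/envd-ns
EOF
}

echo "# e2b-netns.service"
netns_unit
echo "# envd.service.d/10-netns.conf"
envd_dropin
```

With `Requires=` plus `After=`, envd would not start if the namespace unit failed, which is one way to avoid envd coming up in a half-built namespace.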

Test plan

  • `ip netns list` inside the guest shows `envd-ns`; `ip -n envd-ns addr` shows `eth0 169.254.0.21`.
  • Orchestrator can reach `169.254.0.21:49983` (envd `/init`).
  • Customer process binding `127.0.0.1:8080` is reachable from the orchestrator via the standard port-forward path.
  • Customer running `iptables -t nat -A OUTPUT -d 169.254.169.254 -p tcp --dport 80 -j REDIRECT --to-ports 4200` no longer affects envd's MMDS fetch.
  • Customer outbound to the public internet via the egress proxy still works.
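
The first test-plan item can be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and each helper reads command output on stdin so it can be exercised without a running guest.

```shell
#!/bin/sh
# Sketch: helpers that inspect `ip` output passed on stdin.
# Helper names are hypothetical; they are not part of this PR.

# Succeeds if `ip netns list` output contains a namespace named envd-ns.
has_envd_ns() {
  grep -Eq '^envd-ns( |$)'
}

# Succeeds if `ip -n envd-ns -4 addr show dev eth0` output carries 169.254.0.21.
has_eth0_addr() {
  grep -Eq 'inet 169\.254\.0\.21/'
}

# Usage inside the guest (as root):
#   ip netns list | has_envd_ns && echo "envd-ns present"
#   ip -n envd-ns -4 addr show dev eth0 | has_eth0_addr && echo "eth0 moved"
```

The remaining items (reaching `169.254.0.21:49983` from the orchestrator and the customer `REDIRECT` rule) need a live sandbox and are not covered by this sketch.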

Customer iptables in the default namespace can break envd's MMDS / /init
routing (cf. customer DNAT'ing :80 to a different port and breaking the
MMDS token fetch). Move eth0 + envd into envd-ns; route customer traffic
through a veth pair. Customer can only modify their own namespace.

Port forwarding chain:
  orchestrator -> eth0 (envd-ns) -> DNAT to 192.168.250.1
  -> veth-customer (default ns) -> existing PREROUTING DNAT to 127.0.0.1.

A RETURN rule exempts port 49983 (envd) from the envd-ns DNAT, so /init
reaches envd directly without traversing the customer namespace.

@cla-bot added the `cla-signed` label May 17, 2026
@cursor

cursor Bot commented May 17, 2026


PR Summary

Medium Risk
Touches early-boot networking (systemd units, netns, iptables, addressing) and changes interface/gateway expectations, so mis-ordering or missing tooling could break connectivity or envd startup. Failures are partially ignored for idempotency, which can also mask real setup errors.

Overview
The new netns/veth + iptables setup changes guest networking semantics and relies on e2b-netns.service ordering and the presence of ip/iptables. If the oneshot unit partially fails (many commands are `-`-prefixed, so systemd ignores their failures), envd may start in a broken namespace with hard-to-debug connectivity loss. The DNAT/RETURN rules are broad and could unintentionally capture traffic if ports/addresses drift, and moving eth0 plus changing the customer-side interface/gateway may break workloads that bind or route to the old link-local address.
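
One way to keep the setup idempotent without ignoring failures is to check for a rule before appending it, using `iptables -C`. The following is a sketch, not the approach taken in this PR; `ensure_rule` and the `IPTABLES` override are hypothetical names.

```shell
#!/bin/sh
# Sketch: append a rule only when `iptables -C` reports it is missing.
# A failing append still returns non-zero, so real errors are not masked.
# IPTABLES can be overridden (for example to wrap the call in `ip netns exec`).
IPTABLES=${IPTABLES:-iptables}

# ensure_rule TABLE CHAIN RULE-SPEC...
ensure_rule() {
  table=$1
  chain=$2
  shift 2
  if $IPTABLES -t "$table" -C "$chain" "$@" 2>/dev/null; then
    return 0  # rule already present, nothing to do
  fi
  $IPTABLES -t "$table" -A "$chain" "$@"
}

# Usage (as root):
#   IPTABLES="ip netns exec envd-ns iptables"
#   ensure_rule nat POSTROUTING -o eth0 -j MASQUERADE
```

Running the same call twice leaves a single rule in place, and the unit can then drop the `-` prefix on those commands.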

Reviewed by Cursor Bugbot for commit 91fea54.

@codecov

codecov Bot commented May 17, 2026


❌ 19 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2620 | 19 | 2601 | 7 |
View the full list of 22 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestTeamMetrics

Flake rate in main: 71.70% (Passed 221 times, Failed 560 times)

Stack Traces | 0.95s run time
=== RUN   TestTeamMetrics
=== PAUSE TestTeamMetrics
=== CONT  TestTeamMetrics
    team_metrics_test.go:61: 
        	Error Trace:	.../api/metrics/team_metrics_test.go:61
        	Error:      	Should be true
        	Test:       	TestTeamMetrics
        	Messages:   	MaxConcurrentSandboxes should be >= 0
--- FAIL: TestTeamMetrics (0.95s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestNoNetworkConfig_SSHWorks

Flake rate in main: 54.95% (Passed 223 times, Failed 272 times)

Stack Traces | 5.75s run time
=== RUN   TestNoNetworkConfig_SSHWorks
=== PAUSE TestNoNetworkConfig_SSHWorks
=== CONT  TestNoNetworkConfig_SSHWorks
Executing command curl in sandbox i464t9c41olee3a52qf0q
    sandbox_network_out_test.go:918: Testing SSH connection to github.com (port 22)...
Executing command curl in sandbox imobzu2attt23r77b4vf1
    sandbox_network_out_test.go:919: Command [ssh] output: event:{start:{pid:1350}}
    sandbox_network_out_test.go:919: Command [ssh] output: event:{data:{stderr:"Connection timed out during banner exchange\r\nConnection to 140.82.114.3 port 22 timed out\r\n"}}
    sandbox_network_out_test.go:919: Command [ssh] output: event:{end:{exit_code:255 exited:true status:"exit status 255" error:"exit status 255"}}
    sandbox_network_out_test.go:923: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:923
        	Error:      	"Connection timed out during banner exchange\r\nConnection to 140.82.114.3 port 22 timed out\r\n" does not contain "Permission denied (publickey)"
        	Test:       	TestNoNetworkConfig_SSHWorks
--- FAIL: TestNoNetworkConfig_SSHWorks (5.75s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSnapshotTemplateList

Flake rate in main: 54.84% (Passed 224 times, Failed 272 times)

Stack Traces | 0s run time
=== RUN   TestSnapshotTemplateList
=== PAUSE TestSnapshotTemplateList
=== CONT  TestSnapshotTemplateList
--- FAIL: TestSnapshotTemplateList (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSnapshotTemplateList/list_snapshots

Flake rate in main: 54.66% (Passed 224 times, Failed 270 times)

Stack Traces | 19.8s run time
=== RUN   TestSnapshotTemplateList/list_snapshots
=== PAUSE TestSnapshotTemplateList/list_snapshots
=== CONT  TestSnapshotTemplateList/list_snapshots
    snapshot_template_test.go:123: 
        	Error Trace:	.../api/sandboxes/snapshot_template_test.go:37
        	            				.../api/sandboxes/snapshot_template_test.go:123
        	Error:      	Not equal: 
        	            	expected: 201
        	            	actual  : 500
        	Test:       	TestSnapshotTemplateList/list_snapshots
--- FAIL: TestSnapshotTemplateList/list_snapshots (19.81s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.85% (Passed 231 times, Failed 767 times)

Stack Traces | 215s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (214.94s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/14_allow_internet_access_omitted_is_noop

Flake rate in main: 55.65% (Passed 216 times, Failed 271 times)

Stack Traces | 0.62s run time
=== RUN   TestUpdateNetworkConfig/14_allow_internet_access_omitted_is_noop
Executing command curl in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:328: Command [curl] output: event:{start:{pid:1381}}
    sandbox_network_update_test.go:328: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Sun, 17 May 2026 21:05:45 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:328: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:328: Command [curl] completed successfully in sandbox ip5z8kqb53l490qpvyl0q
Executing command curl in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:328: Command [curl] output: event:{start:{pid:1382}}
    sandbox_network_update_test.go:328: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:67
        	            				.../api/sandboxes/sandbox_network_update_test.go:58
        	            				.../api/sandboxes/sandbox_network_update_test.go:328
        	Error:      	Received unexpected error:
        	            	failed to execute command curl in sandbox ip5z8kqb53l490qpvyl0q: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestUpdateNetworkConfig/14_allow_internet_access_omitted_is_noop
        	Messages:   	https://1.1.1.1 should be reachable
--- FAIL: TestUpdateNetworkConfig/14_allow_internet_access_omitted_is_noop (0.62s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.39% (Passed 222 times, Failed 760 times)

Stack Traces | 7.07s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1389}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1390}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1391}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Sun, 17 May 2026 21:06:03 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox ip5z8kqb53l490qpvyl0q
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (7.07s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV

Flake rate in main: 59.78% (Passed 220 times, Failed 327 times)

Stack Traces | 0s run time
=== RUN   TestTemplateBuildENV
=== PAUSE TestTemplateBuildENV
=== CONT  TestTemplateBuildENV
--- FAIL: TestTemplateBuildENV (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV/ENV_with_multiline_value

Flake rate in main: 60.34% (Passed 213 times, Failed 324 times)

Stack Traces | 7.09s run time
=== RUN   TestTemplateBuildENV/ENV_with_multiline_value
=== PAUSE TestTemplateBuildENV/ENV_with_multiline_value
=== CONT  TestTemplateBuildENV/ENV_with_multiline_value
    build_template_test.go:134: test-ubuntu-env-multiline: [info] Building template asdudg0jc8j9ys1yskom/a377394c-cce4-4a71-a5d7-6b96f6cf908b
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] FROM ubuntu:22.04 [ffd709f131f42dfab282de47a91dd2c139e900c1c11fc574b49b517a05ef0a32]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] DEFAULT USER user [90bdd4afa342293c931373351bf578872dec9179214ba3e8bf9edba311466213]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 1/2] ENV MULTILINE line1
        line2
        line3 [e93da3f3765f20eb6407c336b9e4e0b9321d994ec5f6cb547743a2a4070eed23]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 2/2] RUN [[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1 [477610d61cdf858776262d3331809539bcbcf16f706aac18515a57337bae1786]
    build_template_test.go:134: test-ubuntu-env-multiline: [error] Build failed: failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1
    build_template_test.go:374: Build failed: {<nil> failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1 0xc0004c7c50}
--- FAIL: TestTemplateBuildENV/ENV_with_multiline_value (7.09s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 56.36% (Passed 391 times, Failed 505 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_0_0_0_0

Flake rate in main: 63.07% (Passed 219 times, Failed 374 times)

Stack Traces | 23.6s run time
=== RUN   TestBindLocalhost/bind_0_0_0_0
=== PAUSE TestBindLocalhost/bind_0_0_0_0
=== CONT  TestBindLocalhost/bind_0_0_0_0
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1302}}
    localhost_bind_test.go:87: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:87
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestBindLocalhost/bind_0_0_0_0
        	Messages:   	Failed to connect to server bound to 0.0.0.0
--- FAIL: TestBindLocalhost/bind_0_0_0_0 (23.62s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_127_0_0_1

Flake rate in main: 58.96% (Passed 220 times, Failed 316 times)

Stack Traces | 15.8s run time
=== RUN   TestBindLocalhost/bind_127_0_0_1
=== PAUSE TestBindLocalhost/bind_127_0_0_1
=== CONT  TestBindLocalhost/bind_127_0_0_1
Executing command python in sandbox iquu4584795z31mjwixsc
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1301}}
    localhost_bind_test.go:87: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:87
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestBindLocalhost/bind_127_0_0_1
        	Messages:   	Failed to connect to server bound to 127.0.0.1
--- FAIL: TestBindLocalhost/bind_127_0_0_1 (15.77s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::

Flake rate in main: 57.12% (Passed 220 times, Failed 293 times)

Stack Traces | 15.2s run time
=== RUN   TestBindLocalhost/bind_::
=== PAUSE TestBindLocalhost/bind_::
=== CONT  TestBindLocalhost/bind_::
Executing command python in sandbox ivokw7g7c2xth5896gf6t
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1301}}
    localhost_bind_test.go:87: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:87
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestBindLocalhost/bind_::
        	Messages:   	Failed to connect to server bound to ::
--- FAIL: TestBindLocalhost/bind_:: (15.24s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.45% (Passed 219 times, Failed 397 times)

Stack Traces | 15.9s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1301}}
    localhost_bind_test.go:87: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:87
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Failed to connect to server bound to ::1
--- FAIL: TestBindLocalhost/bind_::1 (15.89s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_localhost

Flake rate in main: 64.33% (Passed 219 times, Failed 395 times)

Stack Traces | 17.9s run time
=== RUN   TestBindLocalhost/bind_localhost
=== PAUSE TestBindLocalhost/bind_localhost
=== CONT  TestBindLocalhost/bind_localhost
Executing command python in sandbox io0lphx7yxzqqg3rddivc
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1301}}
Executing command python in sandbox iedyx0meb2r39zc8zn4x8
    localhost_bind_test.go:87: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:87
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestBindLocalhost/bind_localhost
        	Messages:   	Failed to connect to server bound to localhost
--- FAIL: TestBindLocalhost/bind_localhost (17.89s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.17% (Passed 229 times, Failed 448 times)

Stack Traces | 84.9s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (84.92s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 66.87% (Passed 219 times, Failed 442 times)

Stack Traces | 26.9s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1294}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 186 MB\nFree memory before tmpfs mount: 798 MB\nMemory to use in integrity test (80% of free, min 64MB): 638 MB\n"}}
Executing command bash in sandbox i1g1p8rqdjvih0x6d2yj1 (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"638+0 records in\n638+0 records out\n668991488 bytes (669 MB, 638 MiB) copied, 3.28563 s, 204 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"C"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"o"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"a"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"b"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"i"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"g timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=638\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.25\n\tPercent of CPU this job got: 98%\n\tElapsed (w"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"all clock) time (h:mm:ss or m:ss): 0:03.28\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2628\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 342\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 118\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 832 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i02ttjgqcna4ufh823hwm
Executing command bash in sandbox i02ttjgqcna4ufh823hwm (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1311}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"f6221abc989c80cc26eba2ef2e0a22436b69863413b322f13d5b6c3c452a04e6\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox i02ttjgqcna4ufh823hwm
Executing command bash in sandbox i02ttjgqcna4ufh823hwm (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1315}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i02ttjgqcna4ufh823hwm: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (26.89s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestMaskRequestHostAPIParameter

Flake rate in main: 54.75% (Passed 224 times, Failed 271 times)

Stack Traces | 32.3s run time
=== RUN   TestMaskRequestHostAPIParameter
=== PAUSE TestMaskRequestHostAPIParameter
=== CONT  TestMaskRequestHostAPIParameter
Executing command apt-get in sandbox i69ifaqjck2k6vh0zxnbh (user: root)
    mask_request_host_test.go:39: Command [apt-get] output: event:{start:{pid:1301}}
Executing command ls in sandbox ilji2cy662j8dlgmjvv5w
    mask_request_host_test.go:39: Command [apt-get] output: event:{data:{stdout:"Hit:1 http://deb.debian.org/debian bookworm InRelease\nHit:2 http://deb.debian.org/debian bookworm-updates InRelease\n"}}
    mask_request_host_test.go:39: Command [apt-get] output: event:{data:{stdout:"Hit:3 http://deb.debian.org/debian-security bookworm-security InRelease\n"}}
    mask_request_host_test.go:39: Command [apt-get] output: event:{data:{stdout:"Reading package lists..."}}
    mask_request_host_test.go:39: Command [apt-get] output: event:{data:{stdout:"\n"}}
    mask_request_host_test.go:39: Command [apt-get] output: event:{end:{exited:true  status:"exit status 0"}}
    mask_request_host_test.go:39: Command [apt-get] completed successfully in sandbox i69ifaqjck2k6vh0zxnbh
Executing command apt-get in sandbox i69ifaqjck2k6vh0zxnbh (user: root)
    mask_request_host_test.go:41: Command [apt-get] output: event:{start:{pid:1402}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Reading package lists..."}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Building dependency tree..."}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"\nReading state information..."}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"The following NEW packages will be installed:\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"  netcat-traditional\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"0 upgraded, 1 newly installed, 0 to remove and 154 not upgraded.\nNeed to get 67.9 kB of archives.\nAfter this operation, 146 kB of additional disk space will be used.\nGet:1 http://deb.debian.org/debian bookworm/main amd64 netcat-traditional amd64 1.10-47 [67.9 kB]\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stderr:"debconf: delaying package configuration, since apt-utils is not installed\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Fetched 67.9 kB in 0s (635 kB/s)\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Selecting previously unselected package netcat-traditional.\r\n(Reading database ... \r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 5%\r(Reading database ... 10%\r(Reading database ... 15%\r(Reading database ... 20%\r(Reading database ... 25%\r(Reading database ... 30%\r(Reading database ... 35%\r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 40%\r(Reading database ... 45%\r(Reading database ... 50%\r(Reading database ... 55%\r(Reading database ... 60%\r(Reading database ... 65%\r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 70%\r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 75%\r"}}
Executing command ls in sandbox ivsqi53k1n6khl4bj861m
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 80%\r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 85%\r"}}
Executing command ls in sandbox ivsqi53k1n6khl4bj861m
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 90%\r"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 95%\r"}}
Executing command ls in sandbox ij8qocb5gr1kgh4wr9928
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"(Reading database ... 100%\r(Reading database ... 25971 files and directories currently installed.)\r\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Preparing to unpack .../netcat-traditional_1.10-47_amd64.deb ...\r\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Unpacking netcat-traditional (1.10-47) ...\r\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"Setting up netcat-traditional (1.10-47) ...\r\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{data:{stdout:"update-alternatives: using /bin/nc.traditional to provide /bin/nc (nc) in auto mode\r\n"}}
    mask_request_host_test.go:41: Command [apt-get] output: event:{end:{exited:true  status:"exit status 0"}}
    mask_request_host_test.go:41: Command [apt-get] completed successfully in sandbox i69ifaqjck2k6vh0zxnbh
Executing command /bin/bash in sandbox i69ifaqjck2k6vh0zxnbh (user: root)
    mask_request_host_test.go:48: Command [/bin/bash] output: event:{start:{pid:1430}}
Executing command ls in sandbox ij8qocb5gr1kgh4wr9928
    mask_request_host_test.go:73: Command [cat] output: event:{start:{pid:1432}}
    mask_request_host_test.go:73: Command [cat] output: event:{end:{exited:true  status:"exit status 0"}}
    mask_request_host_test.go:73: Command [cat] completed successfully in sandbox i69ifaqjck2k6vh0zxnbh
    mask_request_host_test.go:77: Data: 
    mask_request_host_test.go:78: 
        	Error Trace:	.../tests/proxies/mask_request_host_test.go:78
        	Error:      	"" does not contain "Host: localhost:8080"
        	Test:       	TestMaskRequestHostAPIParameter
--- FAIL: TestMaskRequestHostAPIParameter (32.26s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxAutoResumeViaProxy

Flake rate in main: 55.40% (Passed 223 times, Failed 277 times)

Stack Traces | 51.9s run time
=== RUN   TestSandboxAutoResumeViaProxy
=== PAUSE TestSandboxAutoResumeViaProxy
=== CONT  TestSandboxAutoResumeViaProxy
Executing command cat in sandbox i69ifaqjck2k6vh0zxnbh (user: root)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:97: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:98: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:98
        	Error:      	Expected value not to be nil.
        	Test:       	TestSandboxAutoResumeViaProxy
--- FAIL: TestSandboxAutoResumeViaProxy (51.91s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxNoAutoResumeViaProxyWithoutFlag

Flake rate in main: 54.66% (Passed 224 times, Failed 270 times)

Stack Traces | 35.5s run time
=== RUN   TestSandboxNoAutoResumeViaProxyWithoutFlag
=== PAUSE TestSandboxNoAutoResumeViaProxyWithoutFlag
=== CONT  TestSandboxNoAutoResumeViaProxyWithoutFlag
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    auto_resume_test.go:145: [Status code: 502] Response body: {"sandboxId":"ivt3yen0h89445wn6bvcr","message":"The sandbox was not found","code":502}
    auto_resume_test.go:145: [Status code: 502] Response body: {"sandboxId":"ivt3yen0h89445wn6bvcr","message":"The sandbox was not found","code":502}
    auto_resume_test.go:145: [Status code: 502] Response body: {"sandboxId":"ivt3yen0h89445wn6bvcr","message":"The sandbox was not found","code":502}
    auto_resume_test.go:145: [Status code: 502] Response body: {"sandboxId":"ivt3yen0h89445wn6bvcr","message":"The sandbox was not found","code":502}
    auto_resume_test.go:146: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:146
        	Error:      	Expected value not to be nil.
        	Test:       	TestSandboxNoAutoResumeViaProxyWithoutFlag
--- FAIL: TestSandboxNoAutoResumeViaProxyWithoutFlag (35.50s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxProxyWorkingPort

Flake rate in main: 54.93% (Passed 224 times, Failed 273 times)

Stack Traces | 13.1s run time
=== RUN   TestSandboxProxyWorkingPort
=== PAUSE TestSandboxProxyWorkingPort
=== CONT  TestSandboxProxyWorkingPort
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:63: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    closed_port_test.go:64: 
        	Error Trace:	.../tests/proxies/closed_port_test.go:64
        	Error:      	Expected value not to be nil.
        	Test:       	TestSandboxProxyWorkingPort
--- FAIL: TestSandboxProxyWorkingPort (13.10s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxWithTrafficAccessTokenAutoResumeViaProxy

Flake rate in main: 55.40% (Passed 223 times, Failed 277 times)

Stack Traces | 54.1s run time
=== RUN   TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
=== PAUSE TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
=== CONT  TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
    traffic_access_token_test.go:263: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    traffic_access_token_test.go:263: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    traffic_access_token_test.go:263: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i3aiadlvghg0u4exyhall","message":"The sandbox was not found","code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i3aiadlvghg0u4exyhall","message":"The sandbox was not found","code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i3aiadlvghg0u4exyhall","message":"The sandbox was not found","code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i3aiadlvghg0u4exyhall","message":"The sandbox was not found","code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i3aiadlvghg0u4exyhall","message":"The sandbox was not found","code":502}
    traffic_access_token_test.go:263: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    traffic_access_token_test.go:263: Error: Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    traffic_access_token_test.go:264: 
        	Error Trace:	.../tests/proxies/traffic_access_token_test.go:264
        	Error:      	Expected value not to be nil.
        	Test:       	TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
--- FAIL: TestSandboxWithTrafficAccessTokenAutoResumeViaProxy (54.15s)


gemini-code-assist Bot left a comment

Contributor


Code Review

The iptables DNAT rule in the setup script is restricted to the TCP protocol, which prevents other traffic types like UDP or ICMP from being forwarded to the customer namespace. Removing the protocol restriction from the DNAT rule ensures all non-envd traffic is correctly routed to the customer namespace.

Comment on lines +57 to +58
# Forward all other inbound TCP to the customer-side veth peer.
ip netns exec "${NS}" iptables -t nat -A PREROUTING -i eth0 -p tcp -j DNAT --to-destination "${VETH_HOST_IP}"


high

The iptables DNAT rule in the setup script is restricted to the TCP protocol, which prevents other traffic types like UDP or ICMP from being forwarded to the customer namespace. This breaks any UDP-based services or diagnostic tools that the customer might be running. Removing the protocol restriction from the DNAT rule ensures all non-envd traffic is correctly routed to the customer namespace.

# Forward all other inbound traffic to the customer-side veth peer.
ip netns exec "${NS}" iptables -t nat -A PREROUTING -i eth0 -j DNAT --to-destination "${VETH_HOST_IP}"
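
The broadened rule matches every protocol, so the port 49983 `RETURN` exception described in the PR has to stay ahead of it in `PREROUTING`; otherwise envd's `/init` traffic would be DNAT'ed to the customer side as well. A minimal sketch of an ordering check follows. `return_precedes_dnat` is a hypothetical helper that is not part of this PR, and it assumes the usual `iptables -S` output format.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): reads `iptables -t nat -S PREROUTING`
# output on stdin and succeeds only if the envd RETURN rule for port 49983
# appears before the first DNAT rule in the chain.
return_precedes_dnat() {
    awk '
        /--dport 49983 .*-j RETURN/ && !ret { ret = NR }   # first RETURN rule
        /-j DNAT/ && !dnat { dnat = NR }                   # first DNAT rule
        END { exit !(ret && dnat && ret < dnat) }
    '
}
```

It could be fed from `ip netns exec envd-ns iptables -t nat -S PREROUTING` once the setup has run; a non-zero exit would suggest the catch-all DNAT shadows the envd exception.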

…in in unit

Per Petr's review: split the imperative shell script into:
- .netdev for the veth pair
- .network for customer-side veth IP + gateway
- e2b-netns.service uses individual ExecStart lines (with - prefix for
  idempotent ip-netns/ip-link operations); iptables rules remain wrapped
  in sh -c with -C check before -A so they don't accumulate across reboots.
@ValentaTomas

Member Author

Refactored per @arkamar's suggestion (91fea54):

  • Veth pair declared in 10-e2b-veth.netdev.
  • Customer-side IP + gateway in 10-e2b-veth-customer.network.
  • e2b-netns.service is now a chain of ExecStart=- lines for the irreducibly imperative bits (ip netns add, ip link set ... netns, configuring eth0 + veth-envd inside envd-ns); the - prefix makes systemd ignore a non-zero exit status, so re-runs tolerate "already exists" errors.
  • iptables rules still need -C check before -A to be safely re-runnable across boots, wrapped in sh -c (no clean systemd primitive for "insert if missing").
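
The check-then-append pattern from the last bullet could be factored into a small helper. This is a sketch only: `ensure_rule` and the `IPTABLES` override are hypothetical names that do not appear in the PR, and the unit itself inlines the same logic in sh -c.

```shell
#!/bin/sh
# Hypothetical helper: append an iptables rule only when an identical rule is
# not already present. IPTABLES can be overridden, for example with
# "ip netns exec envd-ns iptables", or with a stub when testing without root.
IPTABLES="${IPTABLES:-iptables}"

ensure_rule() {
    table="$1"; chain="$2"; shift 2
    # -C exits 0 when a matching rule exists and non-zero otherwise,
    # so -A only runs for rules that are missing.
    if ! $IPTABLES -t "$table" -C "$chain" "$@" 2>/dev/null; then
        $IPTABLES -t "$table" -A "$chain" "$@"
    fi
}

# Example call (left commented out because it needs root and the namespace):
#   IPTABLES="ip netns exec envd-ns iptables"
#   ensure_rule nat POSTROUTING -o eth0 -j MASQUERADE
```

Calling it twice with the same arguments should leave a single rule in the chain, which is the property the -C guard is there to provide.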

Trade-offs vs. fully-declarative-everywhere:

  • ip netns add and ip link set <if> netns <ns> don't have native systemd directives.
  • systemd-networkd doesn't run inside envd-ns, so the envd-side interface config is in the unit's ExecStart chain.

LMK if you'd like to push further (e.g. install systemd-networkd inside envd-ns too) or if this strikes the right balance.

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Child processes spawned by envd inherit wrong network namespace
    • Removed NetworkNamespacePath from envd.service so envd runs in the default namespace, updated socat to bind to veth-customer IP (192.168.250.1), and removed port 49983 RETURN rule so all traffic gets DNAT'ed correctly.

Preview (d12aa6ed65)
diff --git a/packages/envd/internal/port/forward.go b/packages/envd/internal/port/forward.go
--- a/packages/envd/internal/port/forward.go
+++ b/packages/envd/internal/port/forward.go
@@ -25,7 +25,7 @@
 	PortStateDelete  PortState = "DELETE"
 )
 
-var defaultGatewayIP = net.IPv4(169, 254, 0, 21)
+var defaultGatewayIP = net.IPv4(192, 168, 250, 1)
 
 type PortToForward struct {
 	socat *exec.Cmd

diff --git a/packages/orchestrator/pkg/template/build/core/rootfs/files/e2b-netns.service.tpl b/packages/orchestrator/pkg/template/build/core/rootfs/files/e2b-netns.service.tpl
--- a/packages/orchestrator/pkg/template/build/core/rootfs/files/e2b-netns.service.tpl
+++ b/packages/orchestrator/pkg/template/build/core/rootfs/files/e2b-netns.service.tpl
@@ -30,7 +30,6 @@
 # Idempotent iptables: -C check before -A. Wrapped in sh because there is no
 # declarative systemd directive for "add rule if not present".
 ExecStart=/bin/sh -c 'ip netns exec envd-ns iptables -t nat -C POSTROUTING -o eth0 -j MASQUERADE 2>/dev/null || ip netns exec envd-ns iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE'
-ExecStart=/bin/sh -c 'ip netns exec envd-ns iptables -t nat -C PREROUTING -i eth0 -p tcp --dport 49983 -j RETURN 2>/dev/null || ip netns exec envd-ns iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 49983 -j RETURN'
 ExecStart=/bin/sh -c 'ip netns exec envd-ns iptables -t nat -C PREROUTING -i eth0 -p tcp -j DNAT --to-destination 192.168.250.1 2>/dev/null || ip netns exec envd-ns iptables -t nat -A PREROUTING -i eth0 -p tcp -j DNAT --to-destination 192.168.250.1'
 
 [Install]

diff --git a/packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl b/packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl
--- a/packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl
+++ b/packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl
@@ -13,9 +13,6 @@
 Restart=always
 User=root
 Group=root
-# Run envd inside the dedicated network namespace set up by e2b-netns.service.
-# Customer iptables in the default namespace cannot reach this namespace.
-NetworkNamespacePath=/run/netns/envd-ns
 Environment=GOTRACEBACK=all
 LimitCORE=infinity
 ExecStartPre=/bin/sh -c 'mountpoint -q /etc/ssl/certs || (mkdir -p /run/e2b/certs && mount --bind /run/e2b/certs /etc/ssl/certs) && ([ -s /etc/ssl/certs/ca-certificates.crt ] || update-ca-certificates)'

Reviewed by Cursor Bugbot for commit 91fea54.

Group=root
# Run envd inside the dedicated network namespace set up by e2b-netns.service.
# Customer iptables in the default namespace cannot reach this namespace.
NetworkNamespacePath=/run/netns/envd-ns


Child processes spawned by envd inherit wrong network namespace

High Severity

NetworkNamespacePath=/run/netns/envd-ns causes all child processes forked by envd (socat port forwarders, user terminal sessions, command execution) to inherit envd-ns. The existing socat port forwarder binds to 169.254.0.21:port in envd-ns and tries to connect to localhost:port in envd-ns — but customer processes listen in the default namespace's loopback, which is unreachable from envd-ns. Additionally, any user-facing process spawning (terminals, exec) would place the user process in the wrong network namespace where veth-customer (192.168.250.1) isn't visible.
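
The inheritance described above can be observed without any namespace setup. A rough sketch, assuming a Linux guest with /proc mounted: a forked child reports the same network-namespace link as its parent unless something moves it explicitly.

```shell
#!/bin/sh
# Compare the network namespace of this shell with that of a child process.
# /proc/<pid>/ns/net is a symlink whose target (net:[inode]) identifies the
# namespace; "unavailable" is printed where /proc cannot be read.
parent_ns=$(readlink "/proc/$$/ns/net" 2>/dev/null || echo unavailable)
child_ns=$(sh -c 'readlink "/proc/$$/ns/net" 2>/dev/null || echo unavailable')

echo "parent: $parent_ns"
echo "child:  $child_ns"
```

With NetworkNamespacePath= set on envd.service, the same comparison run from envd and from anything it spawns (socat, terminals, exec'd commands) would presumably show the envd-ns identity on both lines, which is the behaviour the finding is about.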


@ValentaTomas

Member Author

Closing in favor of the dirty self-heal pin (#2701). The dedicated network namespace is still the right structural fix and we'll likely reopen / rebase this once the immediate Bitfrost mitigation lands and we have time to coordinate the systemd-networkd + iptables-in-customer-ns story properly.

ValentaTomas deleted the feat/envd-network-namespace branch May 18, 2026 01:04