
feat(provision): mask periodic rootfs-dirtying timers#2564

Closed
ValentaTomas wants to merge 1 commit into main from feat/mask-idle-daemons
Conversation

ValentaTomas (Member) commented May 5, 2026

Mask periodic timers/daemons that scribble small files into the rootfs (audit, apt-daily, motd-news, man-db, e2scrub, fstrim, logrotate, unattended-upgrades, accounts-daemon, udisks2). Each wake-up dirties at least one 4 KiB block per run with no value for ephemeral sandboxes.

systemctl mask is a no-op on units not present, so this is safe across all customer base images.

Mask audit, apt-daily, motd-news, man-db, e2scrub_all, fstrim,
logrotate, unattended-upgrades, accounts-daemon, udisks2 timers and
services in the base provisioning script. Each periodic wake-up
scribbles small files into the rootfs and dirties at least one 4 KiB
block per run, all of which end up in the next snapshot diff with no
value for ephemeral sandboxes.

`systemctl mask` is a no-op on units that aren't installed, so this
is safe across all customer base images. Trailing `|| true` keeps the
script running if the systemctl invocation hits an unexpected error
on a minimal base image.

cursor Bot commented May 5, 2026

PR Summary

Medium Risk
Disables services like unattended-upgrades/auditd, which can change security/logging behavior inside sandboxes. The trailing || true can hide unexpected systemctl failures, making it harder to notice when masking didn’t apply.

Overview
Masking unattended-upgrades, auditd, and other system services can break workloads that rely on updates, auditing, or device management inside the sandbox, and this change makes that the default. The unconditional || true suppresses errors from systemctl mask, which can silently leave the system in an unexpected state if masking fails for reasons other than a missing unit.

Reviewed by Cursor Bugbot for commit 121ca16.


codecov Bot commented May 5, 2026

❌ 4 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2355 | 4 | 2351 | 5 |
View the top 1 failed test(s) by shortest run time
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV/ENV_with_multiline_value
Stack Traces | 7.07s run time
=== RUN   TestTemplateBuildENV/ENV_with_multiline_value
=== PAUSE TestTemplateBuildENV/ENV_with_multiline_value
=== CONT  TestTemplateBuildENV/ENV_with_multiline_value
    build_template_test.go:134: test-ubuntu-env-multiline: [info] Building template c0mxh8ilxc7n2fhs1gub/ee762d77-b3c5-4427-a74a-916028d91adb
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] FROM ubuntu:22.04 [d348e2cff10c25f36746ae9dedfe29467eb1263cf1791c5ae2f92931669846a8]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] DEFAULT USER user [c525ee1881b40a4e83308206a0611834576850c851fc6c70665a63de7920b1f1]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 1/2] ENV MULTILINE line1
        line2
        line3 [cdc2d756da206dedd399917260b9c4e50416e84ada565bafb39c320bea21b414]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 2/2] RUN [[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1 [71b515bc75c971261de5a53fe159d57c0264dba2c3b30b5a2deb325dae91097d]
    build_template_test.go:134: test-ubuntu-env-multiline: [error] Build failed: failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1
    build_template_test.go:374: Build failed: {<nil> failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1 0xc00059f610}
--- FAIL: TestTemplateBuildENV/ENV_with_multiline_value (7.07s)
View the full list of 5 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestTeamMetrics

Flake rate in main: 55.56% (Passed 4 times, Failed 5 times)

Stack Traces | 2.87s run time
=== RUN   TestTeamMetrics
=== PAUSE TestTeamMetrics
=== CONT  TestTeamMetrics
    team_metrics_test.go:61: 
        	Error Trace:	.../api/metrics/team_metrics_test.go:61
        	Error:      	Should be true
        	Test:       	TestTeamMetrics
        	Messages:   	MaxConcurrentSandboxes should be >= 0
--- FAIL: TestTeamMetrics (2.87s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 57.14% (Passed 6 times, Failed 8 times)

Stack Traces | 185s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (185.41s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 50.00% (Passed 6 times, Failed 6 times)

Stack Traces | 0.87s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox if8rnojxhazptyqwp9ucl
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox if8rnojxhazptyqwp9ucl
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1366}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox if8rnojxhazptyqwp9ucl
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1367}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Tue, 05 May 2026 02:24:30 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox if8rnojxhazptyqwp9ucl
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (0.87s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestDeleteTemplateWithAccessToken

Flake rate in main: 66.67% (Passed 1 times, Failed 2 times)

Stack Traces | 300s run time
=== RUN   TestDeleteTemplateWithAccessToken
=== PAUSE TestDeleteTemplateWithAccessToken
=== CONT  TestDeleteTemplateWithAccessToken
    delete_template_test.go:47: Build timeout exceeded
--- FAIL: TestDeleteTemplateWithAccessToken (300.07s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV

Flake rate in main: 66.67% (Passed 1 times, Failed 2 times)

Stack Traces | 0s run time
=== RUN   TestTemplateBuildENV
=== PAUSE TestTemplateBuildENV
=== CONT  TestTemplateBuildENV
--- FAIL: TestTemplateBuildENV (0.00s)


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The provision.sh script is updated to mask various systemd timers and services to prevent unnecessary rootfs writes. I have no feedback to provide.

@ValentaTomas ValentaTomas marked this pull request as ready for review May 5, 2026 02:10
@qodo-code-review

PR Reviewer Guide 🔍

Warning

/review is deprecated. Use /agentic_review instead (removed after 2026-05-31).

Here are some key observations to aid the review process:

⚡ Recommended focus areas for review

Ineffective Mask

systemctl mask does not stop already-active units; if any of these services/timers are started earlier in the build or enabled by dependencies, they can still run until the next boot (or until explicitly stopped). Consider whether masking should be paired with stopping/disabling (or --now where applicable) to guarantee they don’t scribble during the remainder of provisioning.

systemctl mask --quiet \
    apt-daily.timer apt-daily-upgrade.timer \
    motd-news.service motd-news.timer \
    man-db.timer e2scrub_all.timer fstrim.timer \
    logrotate.timer \
    unattended-upgrades.service \
    accounts-daemon.service \
    udisks2.service \
    auditd.service \
    || true
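
A minimal sketch of pairing the mask with a stop, assuming systemd is actually running while the provisioning script executes. `--now` is an existing `systemctl` flag that, combined with `mask`, also stops the named units. The `mask_and_stop` helper and the `SYSTEMCTL` override are hypothetical names used only for illustration; they are not part of provision.sh.

```shell
# Hypothetical helper: mask the given units and stop any that are
# already active, so they cannot write again before the next boot.
# SYSTEMCTL can be overridden to exercise the function without systemd.
mask_and_stop() {
    "${SYSTEMCTL:-systemctl}" mask --now --quiet "$@"
}

# Example call (assumed unit list, taken from the snippet above):
# mask_and_stop apt-daily.timer apt-daily-upgrade.timer fstrim.timer
```

Stopping a unit that is not installed may make the call return non-zero, so the caller still has to decide how to treat a failing exit status.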
Hidden Failure

The trailing || true suppresses all failures from the masking call (not just “unit not found”), which can silently miss intended masks (e.g., due to lack of systemd, DBus issues, permission problems, or transient systemctl errors), undermining the stated guarantee.

systemctl mask --quiet \
    apt-daily.timer apt-daily-upgrade.timer \
    motd-news.service motd-news.timer \
    man-db.timer e2scrub_all.timer fstrim.timer \
    logrotate.timer \
    unattended-upgrades.service \
    accounts-daemon.service \
    udisks2.service \
    auditd.service \
    || true
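
One possible way to keep the script running without hiding failures, sketched under the assumption that masking units one at a time is acceptable for provisioning. The `mask_units` function and the `SYSTEMCTL` override are hypothetical names, not code from this PR.

```shell
# Hypothetical helper: mask each unit separately and report the ones
# that failed on stderr instead of discarding every error.
# SYSTEMCTL can be overridden to exercise the function without systemd.
mask_units() {
    failed=""
    for unit in "$@"; do
        if ! "${SYSTEMCTL:-systemctl}" mask --quiet "$unit"; then
            failed="$failed $unit"
        fi
    done
    if [ -n "$failed" ]; then
        echo "warning: failed to mask:$failed" >&2
        return 1
    fi
}

# Example call: the build continues, but the failure is visible in logs.
# mask_units apt-daily.timer logrotate.timer || echo "continuing without full masking" >&2
```

The caller can still choose to continue on a non-zero return, but the names of the units that were not masked end up in the build output.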
Functional Breakage

Masking auditd.service, accounts-daemon.service, and udisks2.service can break software that expects these D-Bus/system services to exist (even in “ephemeral” environments), causing hard failures rather than just reducing disk churn; validate this won’t regress workloads that rely on device discovery, user/session metadata, or audit hooks.

unattended-upgrades.service \
accounts-daemon.service \
udisks2.service \
auditd.service \
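
As a rough aid for the validation suggested above, the sketch below lists what pulls in each unit, using the existing `systemctl list-dependencies --reverse` command. It assumes a running systemd inside a representative base image; `show_dependents` and the `SYSTEMCTL` override are hypothetical names introduced here.

```shell
# Hypothetical helper: for each unit, print a header line followed by
# the units that depend on it (reverse dependencies).
# SYSTEMCTL can be overridden to exercise the function without systemd.
show_dependents() {
    for unit in "$@"; do
        echo "== $unit"
        "${SYSTEMCTL:-systemctl}" list-dependencies --reverse "$unit"
    done
}

# Example call (unit names taken from the snippet above):
# show_dependents accounts-daemon.service udisks2.service auditd.service
```

This only shows systemd-level dependencies; software that talks to these services over D-Bus without a unit dependency would not appear in the output.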

@ValentaTomas ValentaTomas marked this pull request as draft May 5, 2026 02:16
ValentaTomas (Member, Author) commented

Closing for now; will reopen after measuring.

@ValentaTomas ValentaTomas deleted the feat/mask-idle-daemons branch May 6, 2026 08:09