You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(sandbox): pre-pause guest reclaim via envd (#2551)
Adds an opt-in pre-pause step that runs `sync`, `drop_caches`,
`compact_memory`, and `fstrim -av` on the live VM via envd's Process
service to shrink the memfile/rootfs diff. Each step is wrapped in
`timeout -s KILL` with its own cap, so a stuck step (most realistically
a slow `sync` on a large dirty backlog) cannot starve the rest — and a
killed step does not abort the chain (`;`-separated, not `&&`).
Pausing FC is unaffected by an in-flight guest `sync` we time out: FC
only drains in-flight virtio I/O before completing the pause; any
unflushed dirty pages stay in the memfile snapshot and converge on
resume. Per-step timeouts trade reclaim payoff, never correctness —
`drop_caches` is documented non-destructive, `fstrim` consults FS
allocation metadata not pagecache, and a partial `compact_memory` is
just less-compacted.
Disabled by default — the LD flag's null default leaves every step at 0
(skipped). Missing keys, zero, negative, and wrong-type values all
collapse to "skip". The orchestrator skips the envd call entirely when
the chain is empty. The outer `Connect-Timeout-Ms` is the sum of
per-step caps plus a small slack.
Single LD flag, one rule per cohort:
- `guest-pause-reclaim` (JSON) — per-step caps in milliseconds keyed by
step name, evaluated against sandbox / team / template LD contexts so
targeting is configured in LaunchDarkly.
Example value:
```json
{"sync":500,"drop_caches":200,"compact_memory":1000,"fstrim":500}
```
`resume-build` exposes `-reclaim` to inject the example values into the
offline LD store for local testing.
Pairs cleanly with #2553 (disable proactive compaction in the guest base
image), but is independent of it and of FPH (#2552). Split out from
#2550.
0 commit comments