fix(orchestrator): fix race condition in Cleanup causing missed cleanup functions #3136

Open
AdaAibaby wants to merge 4 commits into e2b-dev:main from AdaAibaby:fix/cleanup-race-condition

@AdaAibaby

Contributor

There was a TOCTOU race between Add()/AddPriority() and run():

  1. Add() checks hasRun (false) outside the lock
  2. Add() blocks waiting for mu.Lock()
  3. run() acquires lock, sets hasRun=true, executes all cleanups, unlocks
  4. Add() acquires lock, appends f — but run() already finished

Result: f is never executed, potentially leaking resources (network namespaces, cgroups, file descriptors, etc.).
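The interleaving above can be replayed in a single goroutine to make the outcome concrete. This is a minimal sketch, not the orchestrator's code: `racyCleanup`, its fields, and `replayRace` are hypothetical names, and cleanup functions are reduced to `func()`.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// racyCleanup is a hypothetical reduction of the pre-fix pattern:
// hasRun is read outside the mutex that guards the cleanup slice.
type racyCleanup struct {
	mu      sync.Mutex
	hasRun  atomic.Bool
	cleanup []func()
}

// replayRace performs steps 1-4 sequentially, in the order the race
// would interleave them, and reports whether f ran and how many
// functions are left stranded in the slice.
func replayRace() (ran bool, pending int) {
	c := &racyCleanup{}
	f := func() { ran = true }

	// Step 1: Add() reads hasRun outside the lock and sees false.
	sawRun := c.hasRun.Load()

	// Steps 2-3: before Add() gets the lock, run() takes it, sets
	// hasRun, executes everything registered so far, and unlocks.
	c.mu.Lock()
	c.hasRun.Store(true)
	for _, fn := range c.cleanup {
		fn()
	}
	c.mu.Unlock()

	// Step 4: Add() acts on its stale check and appends f.
	if !sawRun {
		c.mu.Lock()
		c.cleanup = append(c.cleanup, f)
		c.mu.Unlock()
	}
	return ran, len(c.cleanup)
}
```

Calling `replayRace()` returns `false, 1`: f is sitting in the slice, but nothing will ever execute it.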

Fix:

  • Move hasRun.Store(true) inside the lock in run()
  • Add double-checked locking in Add()/AddPriority(): re-check hasRun after acquiring the lock and execute f inline if cleanup already ran

Add race-condition tests that reliably reproduce the bug with -race.

/cc @jakubno @dobrac @ValentaTomas Would appreciate your review on this change, thanks!

@gemini-code-assist (Bot) left a comment

Code Review

Executing the cleanup and priority cleanup functions while holding the mutex lock can lead to deadlocks if the functions attempt to acquire the same lock or perform blocking operations. Releasing the lock before executing these functions avoids this risk.

Comment thread packages/orchestrator/pkg/sandbox/cleanup.go
adababys and others added 3 commits June 29, 2026 19:45
Avoid holding the mutex while executing cleanup functions in the
double-check branch. This prevents potential deadlocks if a cleanup
function performs blocking operations or re-enters the lock.
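
A minimal sketch of the pattern this commit message describes, under stated assumptions: `lateCleanup` and `reentrantDemo` are hypothetical names, the cleanup signature is reduced to `func()`, and the unlocked fast-path check is omitted so that the locked branch is the one exercised. Because `sync.Mutex` is not reentrant, a late function that itself calls `Add` would deadlock if it were executed while the lock was still held.

```go
package main

import "sync"

// lateCleanup is a hypothetical stand-in for the PR's Cleanup type.
type lateCleanup struct {
	mu      sync.Mutex
	hasRun  bool
	cleanup []func()
}

// Add registers f, or runs it immediately if cleanup already ran.
// In the already-ran branch the mutex is released before f is called.
func (c *lateCleanup) Add(f func()) {
	c.mu.Lock()
	if c.hasRun {
		c.mu.Unlock()
		f() // runs without the lock held, so f may call Add itself
		return
	}
	c.cleanup = append(c.cleanup, f)
	c.mu.Unlock()
}

// Run marks cleanup as done and executes the registered functions.
func (c *lateCleanup) Run() {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.hasRun {
		return
	}
	c.hasRun = true
	for _, f := range c.cleanup {
		f()
	}
}

// reentrantDemo registers a late function that registers another one
// and returns the order in which they executed.
func reentrantDemo() []string {
	var c lateCleanup
	var order []string

	c.Run()
	c.Add(func() {
		order = append(order, "outer")
		c.Add(func() { order = append(order, "inner") })
	})
	return order
}
```

Here `reentrantDemo()` returns `[outer inner]`. If `f()` were called before `c.mu.Unlock()` in the already-ran branch, the nested `Add` would block forever on the mutex.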