Skip to content

libexpr: add temp roots during evaluation to prevent GC races#15795

Closed
martinjlowm wants to merge 1 commit into
NixOS:masterfrom
martinjlowm:fix/add-temp-roots-during-eval-upstream
Closed

libexpr: add temp roots during evaluation to prevent GC races#15795
martinjlowm wants to merge 1 commit into
NixOS:masterfrom
martinjlowm:fix/add-temp-roots-during-eval-upstream

Conversation

@martinjlowm
Copy link
Copy Markdown

@martinjlowm martinjlowm commented May 5, 2026

Motivation

Evaluation resolves store paths (reads .drv files, computes closures, extracts
output paths) that may originate from previous sessions without registering temp
roots (reading derivations from disk instead of JIT writing). This creates a
race window where GC can delete .drv files between evaluation resolving them
and the build phase adding its own temp roots, resulting in "No such file or
directory" errors.

Add addTempRoot() calls at the key points where libexpr reads derivations
from the store for paths that have no temp root from the current process:

  • import: before validating and reading an imported .drv file (with
    isDerivation checked first to avoid unnecessary daemon round-trips for
    non-derivation imports)
  • PackageInfo constructor: before reading a derivation for package metadata
    queries (only caller is nix-build for user-supplied .drv paths)

Context

We're experiencing concurrent GC passes that regularly wipe .drv's off of disk
when a concurrent process is halted during the eval phase, this leads to the
following when the process gets to proceed:

error: opening file '/nix/store/...-rust_aws-smithy-protocol-test-0.63.14.drv': No such file or directory

This complements #15469 — without the GC-side fix (re-checking tempRoots
before each deletion), addTempRoot at consumption sites only narrows the race
window without eliminating it.

This happens for us on: nix (Nix) 2.28.3

Now, this is my first contribution to Nix itself, so I'm not aware of any potential performance penalties in extending the set of temproots at eval-time.

@edolstra
Copy link
Copy Markdown
Member

edolstra commented May 5, 2026

I don't think this is the right approach. Roots have to be registered before the path is added to the store. Otherwise you make the time window for a race smaller, but don't eliminate it. E.g. calling addTempRoot() in import is too late since a GC may have already started at that point.

The typical correct pattern is something like:

store->addTempRoot(path);
if (!store->isValidPath(path))
    store->addToStore(...path...);

i.e. register the temp root and then create the path.

Also, adding addTempRoot() calls where they're not needed may have a non-negligible performance impact (especially when using the daemon, since it requires a roundtrip through the daemon). For instance, doing this in the PackageInfo constructor may result in tens of thousands of root registrations when traversing nixpkgs. Ideally though those operations shouldn't read the .drv files at all.

Protect store paths from previous sessions against concurrent GC
by adding addTempRoot at consumption sites in import and the
PackageInfo constructor. Current-session paths already have temp
roots from writeDerivation.
@martinjlowm martinjlowm force-pushed the fix/add-temp-roots-during-eval-upstream branch from 65f4544 to 52f00a3 Compare May 5, 2026 19:12
@martinjlowm
Copy link
Copy Markdown
Author

martinjlowm commented May 5, 2026

I don't think this is the right approach. Roots have to be registered before the path is added to the store. Otherwise you make the time window for a race smaller, but don't eliminate it. E.g. calling addTempRoot() in import is too late since a GC may have already started at that point.

The typical correct pattern is something like:

store->addTempRoot(path);
if (!store->isValidPath(path))
    store->addToStore(...path...);

i.e. register the temp root and then create the path.

Also, adding addTempRoot() calls where they're not needed may have a non-negligible performance impact (especially when using the daemon, since it requires a roundtrip through the daemon). For instance, doing this in the PackageInfo constructor may result in tens of thousands of root registrations when traversing nixpkgs. Ideally though those operations shouldn't read the .drv files at all.

Thanks for the input 🙏. Revised it to add the temp root exactly when the drv is read from disk in the situation where a file already exists instead of only relying on writeDerivation to add a temp root.

@martinjlowm martinjlowm marked this pull request as ready for review May 5, 2026 19:31
@martinjlowm martinjlowm requested a review from edolstra as a code owner May 5, 2026 19:31
@xokdvium
Copy link
Copy Markdown
Contributor

xokdvium commented May 5, 2026

Side note, please don't use 2.28.3 and upgrade to something that's actually supported.

@martinjlowm
Copy link
Copy Markdown
Author

Side note, please don't use 2.28.3 and upgrade to something that's actually supported.

Definitely! - just wanted to point out on what version it was.

Comment thread src/libexpr/get-drvs.cc
Comment on lines +27 to +31
/* Prevent GC from deleting the .drv before we read it. This
constructor is only called from CLI entry points (nix-build)
where the path originates from a previous session, so it
has no temp root from the current process. */
store->addTempRoot(drvPath);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be that it should be the caller of this constructor that ensures that the path is rooted, not the constructor itself. Otherwise the path could disappear

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you think it's smarter to couple the info about a temp root to being as close to derivationFromPath (and readDerivation) as possible? That is, avoiding the possibility of initializing PackageInfo and forgetting about locking it down just before it being parsed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you doing a readDerivation on an unrooted path it's already too late. It must be rooted before (either by writeDerivation or by a permanent root created by a cli command that's passing it to another one). We are not trying to make it les racy, but rather address the root cause in a correct way - not just one that happens to reduce the race window.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not concerned about the input derivation to nix-build abc.drv (https://github.com/NixOS/nix/blob/master/src/nix/nix-build/nix-build.cc#L395-L397) from CLI, but rather transitive derivations referenced from it for a deeply nested derivation: https://github.com/NixOS/nix/blob/master/src/libexpr/get-drvs.cc#L392 - this instantiates PackageInfo - I'd say that is as close to the root as we can get without introducing a separate "root" pass that immediately resolves the full tree of drvs and their paths.

Walking down the call branch of nix build familiarizing myself more with the codebase (thought this would be it), I see, from his contributions, dramforever has been on a spree lately on related files: #15616 - I think perhaps his work solves the one specific problem we're facing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those go through writeDerivation since evaluator roots all derivations, so this is not an issue.

Comment thread src/libexpr/get-drvs.cc
constructor is only called from CLI entry points (nix-build)
where the path originates from a previous session, so it
has no temp root from the current process. */
store->addTempRoot(drvPath);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's already too late to add a GC root here, since the path has already been created. So if a GC was started between then and this point, the path may be deleted.

However I also don't see why it's needed in the first place, since Store::writeDerivation() already has a call to addTempRoot().

Comment thread src/libexpr/primops.cc
/* Add a temp root before checking validity to prevent GC
from deleting the path between our check and the
subsequent readDerivation(). */
state.store->addTempRoot(storePath);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same applies here: the drv was presumably added by a call to writeDerivation(), so it should already be a temp root.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing something, but to my understanding writeDerivation is only called on the initial session when the drv is yet to be realized, stored in the store and for any succeeding calls directly nix eval --impure --expr 'let drv = /nix/store/foo.drv; in (import drv)', for transitive drv paths would call readDerivation if they already exist? At this point there is nothing to write because they should already live in store, ready to be executed for the build phase.

@martinjlowm
Copy link
Copy Markdown
Author

I tried to come up with probabilistic test cases for these two cases and I realized I have completely misinterpreted the flow, function naming and the data structures. I thought the drv (file) traversal was done in get-drvs.cc reading unprotected files from disk via readDerivation.

This work deviates from the problem that we actually faced - will test out the latest version with the latest temp root additions, and have better data to back the claims if we still face problems.

Sorry for wasting your time - I learned a lot though.

@martinjlowm martinjlowm closed this May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants