libexpr: add temp roots during evaluation to prevent GC races#15795
libexpr: add temp roots during evaluation to prevent GC races#15795martinjlowm wants to merge 1 commit into
Conversation
|
I don't think this is the right approach. Roots have to be registered before the path is added to the store. Otherwise you make the time window for a race smaller, but don't eliminate it. E.g. calling The typical correct pattern is something like: i.e. register the temp root and then create the path. Also, adding |
Protect store paths from previous sessions against concurrent GC by adding addTempRoot at consumption sites in import and the PackageInfo constructor. Current-session paths already have temp roots from writeDerivation.
65f4544 to
52f00a3
Compare
Thanks for the input 🙏. Revised it to add the temp root exactly when the drv is read from disk in the situation where a file already exists instead of only relying on |
|
Side note, please don't use 2.28.3 and upgrade to something that's actually supported. |
Definitely! - just wanted to point out on what version it was. |
| /* Prevent GC from deleting the .drv before we read it. This | ||
| constructor is only called from CLI entry points (nix-build) | ||
| where the path originates from a previous session, so it | ||
| has no temp root from the current process. */ | ||
| store->addTempRoot(drvPath); |
There was a problem hiding this comment.
It seems to be that it should be the caller of this constructor that ensures that the path is rooted, not the constructor itself. Otherwise the path could disappear
There was a problem hiding this comment.
Don't you think it's smarter to couple the info about a temp root to being as close to derivationFromPath (and readDerivation) as possible? That is, avoiding the possibility of initializing PackageInfo and forgetting about locking it down just before it being parsed?
There was a problem hiding this comment.
If you doing a readDerivation on an unrooted path it's already too late. It must be rooted before (either by writeDerivation or by a permanent root created by a cli command that's passing it to another one). We are not trying to make it les racy, but rather address the root cause in a correct way - not just one that happens to reduce the race window.
There was a problem hiding this comment.
I'm not concerned about the input derivation to nix-build abc.drv (https://github.com/NixOS/nix/blob/master/src/nix/nix-build/nix-build.cc#L395-L397) from CLI, but rather transitive derivations referenced from it for a deeply nested derivation: https://github.com/NixOS/nix/blob/master/src/libexpr/get-drvs.cc#L392 - this instantiates PackageInfo - I'd say that is as close to the root as we can get without introducing a separate "root" pass that immediately resolves the full tree of drvs and their paths.
Walking down the call branch of nix build familiarizing myself more with the codebase (thought this would be it), I see, from his contributions, dramforever has been on a spree lately on related files: #15616 - I think perhaps his work solves the one specific problem we're facing.
There was a problem hiding this comment.
Those go through writeDerivation since evaluator roots all derivations, so this is not an issue.
| constructor is only called from CLI entry points (nix-build) | ||
| where the path originates from a previous session, so it | ||
| has no temp root from the current process. */ | ||
| store->addTempRoot(drvPath); |
There was a problem hiding this comment.
I think it's already too late to add a GC root here, since the path has already been created. So if a GC was started between then and this point, the path may be deleted.
However I also don't see why it's needed in the first place, since Store::writeDerivation() already has a call to addTempRoot().
| /* Add a temp root before checking validity to prevent GC | ||
| from deleting the path between our check and the | ||
| subsequent readDerivation(). */ | ||
| state.store->addTempRoot(storePath); |
There was a problem hiding this comment.
I think the same applies here: the drv was presumably added by a call to writeDerivation(), so it should already be a temp root.
There was a problem hiding this comment.
Perhaps I'm missing something, but to my understanding writeDerivation is only called on the initial session when the drv is yet to be realized, stored in the store and for any succeeding calls directly nix eval --impure --expr 'let drv = /nix/store/foo.drv; in (import drv)', for transitive drv paths would call readDerivation if they already exist? At this point there is nothing to write because they should already live in store, ready to be executed for the build phase.
|
I tried to come up with probabilistic test cases for these two cases and I realized I have completely misinterpreted the flow, function naming and the data structures. I thought the drv (file) traversal was done in get-drvs.cc reading unprotected files from disk via readDerivation. This work deviates from the problem that we actually faced - will test out the latest version with the latest temp root additions, and have better data to back the claims if we still face problems. Sorry for wasting your time - I learned a lot though. |
Motivation
Evaluation resolves store paths (reads
.drvfiles, computes closures, extractsoutput paths) that may originate from previous sessions without registering temp
roots (reading derivations from disk instead of JIT writing). This creates a
race window where GC can delete
.drvfiles between evaluation resolving themand the build phase adding its own temp roots, resulting in "No such file or
directory" errors.
Add
addTempRoot()calls at the key points wherelibexprreads derivationsfrom the store for paths that have no temp root from the current process:
import: before validating and reading an imported .drv file (withisDerivationchecked first to avoid unnecessary daemon round-trips fornon-derivation imports)
PackageInfoconstructor: before reading a derivation for package metadataqueries (only caller is
nix-buildfor user-supplied.drvpaths)Context
We're experiencing concurrent GC passes that regularly wipe
.drv's off of diskwhen a concurrent process is halted during the eval phase, this leads to the
following when the process gets to proceed:
This complements #15469 — without the GC-side fix (re-checking
tempRootsbefore each deletion),
addTempRootat consumption sites only narrows the racewindow without eliminating it.
This happens for us on: nix (Nix) 2.28.3
Now, this is my first contribution to Nix itself, so I'm not aware of any potential performance penalties in extending the set of temproots at eval-time.