Skip to content

queue-runner: resolve CA derivations after dependency outputs land#1629

Merged
Ericson2314 merged 1 commit intoNixOS:masterfrom
obsidiansystems:ca-resolve-two-phase
May 2, 2026
Merged

queue-runner: resolve CA derivations after dependency outputs land#1629
Ericson2314 merged 1 commit intoNixOS:masterfrom
obsidiansystems:ca-resolve-two-phase

Conversation

@amaanq
Copy link
Copy Markdown
Member

@amaanq amaanq commented Mar 30, 2026

Instead of resolving at dispatch time and carrying two drv identities through the gRPC layer, resolve in succeed_step once outputs are in the DB. The resolved drv becomes a fresh Step that dispatch picks up normally; the builder only ever sees one drv. A new resolvedFromBuild/resolvedFromStep foreign key BuildSteps ties the resolved step back to the dependency that triggered it.

Comment thread subprojects/hydra/sql/hydra.sql Outdated
Copy link
Copy Markdown
Member

@Ericson2314 Ericson2314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point where we try_resolve before, we do this:

  • If result is None or "same derivation":

    just continue with this step, same as we always did

  • If result is Some(drv) and that drv is not the same as the current drv:

    Step finished, status "resolved", fill in the DB pointer to the new step, and the new step now builds, this step is done.


the current "try resolve reverse dependencies" is a possible optimisation --- incrementally resolve reverse dependencies, reminiscent of union find --- but that's a more complex thing we shouldn't worry about at this time.

@amaanq amaanq force-pushed the ca-resolve-two-phase branch 2 times, most recently from ae692ac to 05eef39 Compare March 31, 2026 03:10
Comment thread subprojects/crates/db/src/models.rs Outdated
Comment thread subprojects/hydra/sql/hydra.sql Outdated
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
@Ericson2314 Ericson2314 force-pushed the ca-resolve-two-phase branch from 05eef39 to de9aed8 Compare March 31, 2026 18:45
@artemist artemist force-pushed the ca-resolve-two-phase branch 6 times, most recently from 0768499 to 80430bf Compare April 6, 2026 18:37
@artemist
Copy link
Copy Markdown
Member

artemist commented Apr 6, 2026

I think I have this working, but there are a few things I'm unclear on and would like to figure out:

  • Do the BuildSteps entries we create in the database properly map to the Step objects in the queue-runner, and do they have sufficient information to recover state after a restart and be shown in the UI?
  • How should we handle cached failures, and what is actually needed?
  • Are we doing any double-building, and how can we test that?

I also believe my change needs substantial refactoring, as I'm using only parts of several functions (e.g. create_step)

@artemist artemist force-pushed the ca-resolve-two-phase branch 2 times, most recently from 3c7b586 to 7eb2319 Compare April 7, 2026 18:49
@artemist
Copy link
Copy Markdown
Member

artemist commented Apr 7, 2026

Rebased onto #1642

@artemist artemist force-pushed the ca-resolve-two-phase branch from 7eb2319 to 62abe6f Compare April 7, 2026 20:03
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment on lines +406 to +411
} else {
None
}
};

if let Some(resolved_path) = resolved {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine this with the previous branch? Or at least we could compute a "do we want to resolve" bool, and then have the actual resolving (which much succeed) and this stuff in the same branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doable, though it will take out a lock on the step's derivation for longer than strictly necessary.

Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment thread subprojects/hydra-queue-runner/src/state/mod.rs Outdated
Comment thread subprojects/hydra/sql/hydra.sql Outdated
@artemist artemist force-pushed the ca-resolve-two-phase branch 2 times, most recently from 4be2bbb to 0344518 Compare April 13, 2026 14:31
@Ericson2314
Copy link
Copy Markdown
Member

Unfortunately, once #1667 is merged in, this PR will hang.

@Ericson2314 Ericson2314 force-pushed the ca-resolve-two-phase branch 2 times, most recently from 1f18bcf to f040eaa Compare April 14, 2026 12:15
@artemist artemist force-pushed the ca-resolve-two-phase branch 5 times, most recently from 90af869 to 3e6dae1 Compare April 16, 2026 14:52
@Ericson2314 Ericson2314 force-pushed the ca-resolve-two-phase branch from 3e6dae1 to 9c67fd6 Compare April 24, 2026 02:56
@Ericson2314 Ericson2314 force-pushed the ca-resolve-two-phase branch 4 times, most recently from df3f3e9 to 7c69e41 Compare April 26, 2026 21:17
Instead of resolving at `StepInfo` construction and carrying two drv
identities through the gRPC layer, resolve in
`realise_drv_on_valid_machine` once all deps are built. If resolution
yields a different drv, the original step is marked `Resolved` and a new
`Step` is created and punted back to the scheduler. This grants more
flexibility and can be extended to dynamic derivations, which won't map
1:1 with the original derivations.

The unresolved-to-resolved drv path mapping is kept in an in-memory
`HashMap` on `State`, and `try_resolve` translates dependency drv paths
through this map before querying the database for outputs. This avoids a
schema migration for now, at the cost of losing the mapping on restart.

CA derivations that cannot be fully resolved are treated as build
failures, avoiding builder-dependent inconsistencies.

All dependency outputs of a `CAFloating` derivation must be recorded in
the hydra database for resolution to work. Outputs built or substituted
by hydra are, but outputs only on the local nix store (e.g. part of the
system configuration) are not. Record these upon creating a step, making
it possible to build derivations from `contentAddressedByDefault`
nixpkgs.

Co-Authored-By: Artemis Tosini <artemis.tosini@obsidian.systems>
Co-Authored-By: John Ericson <John.Ericson@Obsidian.Systems>
Co-Authored-By: Amaan Qureshi <git@amaanq.com>
@Ericson2314 Ericson2314 force-pushed the ca-resolve-two-phase branch from 7c69e41 to 3cbcdbe Compare May 2, 2026 19:30
Copy link
Copy Markdown
Member

@Ericson2314 Ericson2314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized we can temporarily do this without a DB changes, allowing it to go in right away.

(Without a DB migration, and without hydra.nixos.org enabling CA derivations, it is functionally a no-op there.)

@Ericson2314 Ericson2314 enabled auto-merge May 2, 2026 19:36
@Ericson2314 Ericson2314 added this pull request to the merge queue May 2, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 2, 2026
@Ericson2314 Ericson2314 added this pull request to the merge queue May 2, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 2, 2026
@Ericson2314 Ericson2314 added this pull request to the merge queue May 2, 2026
Merged via the queue into NixOS:master with commit d0febc0 May 2, 2026
3 of 4 checks passed
@Ericson2314 Ericson2314 deleted the ca-resolve-two-phase branch May 2, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants