CI: serialize precompile workers for python-using groups #1182
Merged
ChrisRackauckas merged 1 commit into SciML:master (Apr 10, 2026)
Conversation
The `OptimizationSciPy` (and occasionally `OptimizationPyCMA`) jobs have
been observed to fail at the precompile stage with
```
InitError: UndefRefError: access to undefined reference
  [1] _pyjl_new(...)
    @ PythonCall.JlWrap.Cjl ~/.julia/packages/PythonCall/.../JlWrap/C.jl:24
  ...
  [11] __init__()
    @ PythonCall.JlWrap ~/.julia/packages/PythonCall/.../JlWrap/JlWrap.jl:70
during initialization of module JlWrap
```
with the *same* PythonCall version (`0.9.31`) on which other CI runs
succeed (`OptimizationSciPy lts` passes on master run 24144818227 but
the same job fails on PR run 23171032225). The intermittency, identical
package versions, and the failure happening inside PythonCall's own
wrapper-type registration during JlWrap.__init__ all point at a parallel
precompile race: when multiple precompile workers spin up wrapper-type
registration concurrently, one of them observes a not-yet-populated
Python type slot and throws `UndefRefError`.
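The race described above can be illustrated with a deterministic toy model in Python. This is not PythonCall's actual code, just the check-then-act pattern it is hypothesized to hit, replayed once under a racy interleaving and once under a serialized one:

```python
class TypeSlotTable:
    """Toy stand-in for a lazily populated wrapper-type table (illustrative only)."""

    def __init__(self):
        self.slot = None  # not yet populated by the registering worker

    def populate(self):
        self.slot = object()  # a "registered" wrapper type

    def read(self):
        if self.slot is None:
            # mirrors the UndefRefError thrown from _pyjl_new
            raise RuntimeError("access to undefined reference")
        return self.slot


table = TypeSlotTable()

# Racy interleaving: a second worker reads the slot before the first
# worker's registration has populated it.
try:
    table.read()
    racy_failed = False
except RuntimeError:
    racy_failed = True

# Serialized interleaving: registration completes before any read,
# which is what JULIA_NUM_PRECOMPILE_TASKS=1 enforces for precompile workers.
table.populate()
serialized_ok = table.read() is not None
```

With real parallel workers the racy path only triggers when the interleaving lines up, which matches the every-N-th-run flakiness seen in CI.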
`Pkg.precompile()` reads `JULIA_NUM_PRECOMPILE_TASKS` (see
base/precompilation.jl:437) to size its parallel-task semaphore. Setting
it to `1` for the python-using jobs serializes precompile workers and
removes the race. The cost on these small subpackages is negligible.
Targeted to OptimizationSciPy and OptimizationPyCMA only — other groups
keep the default parallel precompile.
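The targeted change can be sketched as a conditional workflow step. The job layout and matrix values below are assumptions for illustration, not the repository's actual workflow file:

```yaml
jobs:
  tests:
    strategy:
      matrix:
        # Matrix values are illustrative; only the two python-using
        # groups get the serialized precompile setting.
        group: [OptimizationSciPy, OptimizationPyCMA, OptimizationBase]
    runs-on: ubuntu-latest
    steps:
      - name: Serialize precompile workers for python-using groups
        if: matrix.group == 'OptimizationSciPy' || matrix.group == 'OptimizationPyCMA'
        run: echo "JULIA_NUM_PRECOMPILE_TASKS=1" >> "$GITHUB_ENV"
```

Using a guarded `if:` step rather than a job-wide `env:` keeps the default parallel precompile for every other group.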
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Summary
Replaces #1180 (closed) with a substantive non-retry fix.
The `OptimizationSciPy` (and occasionally `OptimizationPyCMA`) jobs have been observed to fail at the precompile stage with the `InitError: UndefRefError` shown above, on the same PythonCall v0.9.31 with which other CI runs succeed: `OptimizationSciPy lts` passes on master run 24144818227 but the same job fails on PR #1169 run 23171032225.
Root cause
The intermittency, the identical package versions, and the failure happening inside PythonCall's own wrapper-type registration in `JlWrap.__init__` all point at a parallel precompile race. `JlWrap.__init__` (.../src/JlWrap/JlWrap.jl:54-75) registers the Python wrapper types. When multiple precompile workers spin up that registration concurrently, one observes a not-yet-populated Python type slot inside `_pyjl_new` and throws `UndefRefError`. This explains the every-N-th-run flake without any other moving parts.
Fix
`Pkg.precompile()` reads `JULIA_NUM_PRECOMPILE_TASKS` (Julia base/precompilation.jl:437) to size its parallel-task semaphore. Setting it to `1` serializes precompile workers and removes the race. Targeted to OptimizationSciPy and OptimizationPyCMA only; other groups keep their default parallel precompile, so there is no global slowdown. The cost on these small python-wrapping subpackages is a few seconds.
Why not retry
A retry-on-failure approach (#1180) would have masked the race instead of removing it, and subsequent failures from unrelated bugs in those groups would also have been silently retried. Per maintainer feedback, `JULIA_NUM_PRECOMPILE_TASKS=1` is the more honest, targeted fix.
Test plan
- `JULIA_NUM_PRECOMPILE_TASKS=1` confirmed to be read by Julia 1.11 (base/precompilation.jl:437)
- Workflow YAML parses cleanly with `yaml.safe_load`
- `if:` block scoped to the two python-using groups only

🤖 Generated with Claude Code
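The `yaml.safe_load` check from the test plan can be sketched as follows. The workflow fragment embedded here is illustrative, not the repository's actual file; it just shows the shape of the validation (parse the YAML, then assert the env var is set only behind a condition naming the two python-using groups):

```python
# Requires PyYAML (pip install pyyaml).
import yaml

# Hypothetical workflow fragment, standing in for the real CI file.
workflow = """
steps:
  - name: Serialize precompile workers for python-using groups
    if: matrix.group == 'OptimizationSciPy' || matrix.group == 'OptimizationPyCMA'
    run: echo "JULIA_NUM_PRECOMPILE_TASKS=1" >> "$GITHUB_ENV"
"""

doc = yaml.safe_load(workflow)
step = doc["steps"][0]

# The setting exists and is guarded by a condition that mentions
# exactly the two python-using groups.
assert "JULIA_NUM_PRECOMPILE_TASKS=1" in step["run"]
assert "OptimizationSciPy" in step["if"]
assert "OptimizationPyCMA" in step["if"]
```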