Commit f06bb4a

feat(embed): bake the embedded flag into the sysimage
Address review feedback from @cjdoris on #773:

> if you know that you are embedded, then you can find the libptr (by
> calling into the C-API functions, which will be globally available in
> this case) [...] if we can 'bake in' the fact that PythonCall is
> embedded into the sysimg, then we won't need any of these preferences.
> I wonder if we could simply do '@eval PythonCall _is_embedded=true' or
> something when we make the sysimg, so it's baked into PythonCall, then
> test for this variable in 'PythonCall.__init__'?

> [...] could you document the basic steps to actually create a sysimg?
> I don't think you need to go much into the why [...] nor mention
> Main.__PythonCall_libptr (which is internal). Instead, you can pretty
> much just say that you need to set the prefs [...]. But what's not
> totally obvious is how you set up a project and these prefs and use
> PackageCompiler to actually make the sysimg.

Design
------

Add a module-level 'const _is_embedded = Ref(false)' on PythonCall,
flipped at sysimage build time via PackageCompiler's 'script=' keyword
(NOT 'precompile_execution_file=', which runs in a separate child process
whose state is not snapshotted). The mutated value is captured in the
snapshot; at runtime, 'PythonCall.__init__' reads it and takes the
embedded path.

A 'const Ref' is preferred over a rebound non-const global so the C
submodule can 'import' the name once and read it without 'parentmodule'
indirection. Same baked-into-sysimage behaviour as the literal '@eval'
form suggested in review.

libpython is opened from the existing 'lib' preference /
JULIA_PYTHONCALL_LIB (added in 0.9.33). The PR does not introduce new
preferences or environment variables.

The interpreter's executable path is resolved via 'sys.executable' using
PyImport_ImportModule + PyObject_GetAttrString + PyUnicode_AsUTF8AndSize -
stable across all supported CPython versions and platforms.

If '_is_embedded[]' is true but 'Py_IsInitialized()' returns 0 - e.g. the
sysimage is loaded by a 'julia.exe' child of 'Base.compilecache' rather
than by juliacall - init_context resets CTX and downstream module
__init__s short-circuit. PythonCall loads as inactive instead of erroring.

Files
-----

src/PythonCall.jl: declare 'const _is_embedded = Ref(false)'.
src/C/C.jl: import _is_embedded into the C submodule.
src/C/context.jl: rewrite init_context() embedded branch; add
  _embedded_program_path() reading sys.executable.
src/Core/Core.jl, src/Convert/Convert.jl, src/Wrap/Wrap.jl,
  src/JlWrap/JlWrap.jl, src/JlWrap/C.jl, src/Compat/Compat.jl: guard
  __init__ on CTX.is_initialized for the inactive-load case.
docs/src/juliacall.md: rewrite the 'Baking PythonCall into a system
  image' section with a worked example.
CHANGELOG.md: Unreleased entry.
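
The `const Ref` choice described under Design can be illustrated with a minimal, self-contained sketch. `Host` and `Inner` are hypothetical stand-ins for `PythonCall` and its `C` submodule; only the `const _is_embedded = Ref(false)` declaration and the one-time `import` mirror the commit. The binding stays constant while its contents are mutated, so the submodule sees the flipped value through the name it imported.

```julia
module Host  # hypothetical stand-in for PythonCall

# The binding is constant; only the value stored inside the Ref changes.
const _is_embedded = Ref(false)

module Inner  # hypothetical stand-in for the C submodule
# Import the name once; reads go through the same Ref object.
import ..Host: _is_embedded
embedded() = _is_embedded[]
end

end # module Host

before = Host.Inner.embedded()   # false
Host._is_embedded[] = true       # what a `script=` file would do at build time
after = Host.Inner.embedded()    # true
```

Rebinding a non-const global from outside the module would instead need `@eval`, and the submodule would have to look the value up in its parent module on each read, which is the indirection the commit message says it wants to avoid.
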
1 parent 2829ab4 commit f06bb4a

12 files changed

Lines changed: 155 additions & 57 deletions


CHANGELOG.md

Lines changed: 5 additions & 3 deletions
@@ -1,9 +1,11 @@
 # Changelog
 
 ## Unreleased
-* Support baking `PythonCall` into a juliacall system image via the new opt-in
-  `embedded` preference / `JULIA_PYTHONCALL_EMBEDDED` option, removing the
-  `using PythonCall` cost from cold start. No behaviour change unless opted in.
+* Support baking `PythonCall` into a juliacall system image. Set
+  `PythonCall._is_embedded[] = true` in a PackageCompiler `script=` and the
+  resulting sysimg takes the embedded path at startup. Set the `lib`
+  preference / `JULIA_PYTHONCALL_LIB` to libpython's path so the embedded
+  PythonCall can open it.
 * Added option `lib` to JuliaCall. Setting this will skip the discovery subprocess.
 * Bug fixes.

docs/src/juliacall.md

Lines changed: 48 additions & 19 deletions
@@ -115,25 +115,54 @@ systems that may be readonly. Note that the project set in
 `PYTHON_JULIACALL_PROJECT` *must* already have PythonCall.jl installed and it
 *must* match the JuliaCall version, otherwise loading Julia will fail.
 
-### Baking PythonCall into a system image
-
-For the fastest possible startup you can compile `PythonCall` itself (alongside
-your own packages) into a system image with
-[PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl), so that
-the `using PythonCall` performed at startup is a memory-map rather than a load.
-
-When `PythonCall` is baked into the system image its `__init__` runs *during*
-`jl_init_with_image`, before juliacall's bootstrap has defined the
-`Main.__PythonCall_libptr` global it normally uses to detect that it is
-embedded. To support this, set the `embedded` preference (or the
-`JULIA_PYTHONCALL_EMBEDDED=yes` environment variable) together with the `lib`
-preference / `JULIA_PYTHONCALL_LIB` pointing at the running interpreter's
-libpython. With `embedded` set, PythonCall takes the embedded path even without
-the global and opens libpython by path (it is already loaded in the process, so
-this is just a handle). The default is `no`, leaving normal behaviour
-unchanged. Use this together with `PYTHON_JULIACALL_SYSIMAGE` (below), and
-`PYTHON_JULIACALL_EXE` / `PYTHON_JULIACALL_PROJECT` so juliacall resolves the
-baked environment directly.
+### [Baking PythonCall into a system image](@id baking-sysimage)
+
+The first `import juliacall` in a fresh process is slow - typically 10-20
+seconds in a clean container - because Julia starts, deserialises
+`PythonCall` from cache, and JIT-compiles the bridge's hot paths. Long-
+running processes amortise that cost. Short-lived ones - serverless
+functions, queue workers, CI jobs that start, handle one request, and
+exit - pay it on every invocation.
+
+Compiling `PythonCall` into a system image with
+[PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl)
+collapses load+compile into a memory-map at startup, typically cutting
+that cost by an order of magnitude. To bake the resulting image so
+`import juliacall` picks it up automatically, set
+`PythonCall._is_embedded[] = true` inside the sysimage-build process.
+
+PackageCompiler's `precompile_execution_file=` is run in a separate child
+process whose state is not snapshotted, so the flag must be set via the
+`script=` keyword instead.
+
+```julia
+# bake_embedded.jl
+PythonCall._is_embedded[] = true
+```
+
+```julia
+using PackageCompiler
+create_sysimage(["PythonCall"];
+    sysimage_path = "myapp.so",
+    script = "bake_embedded.jl",
+    project = ".",
+)
+```
+
+Pass `precompile_execution_file=` alongside `script=` to also bake your own
+hot code paths into the image.
+
+At runtime, point juliacall at the resulting sysimage via
+[`PYTHON_JULIACALL_SYSIMAGE`](@ref julia-config), and set the
+[`lib`](@ref pythoncall-config) preference / `JULIA_PYTHONCALL_LIB` to the
+path of the host's libpython - the embedded path needs an explicit handle
+to libpython since the bridge does not load the interpreter itself.
+
+#### Subprocess behaviour
+
+If a julia process without a running Python interpreter loads a sysimage
+baked with `_is_embedded[] = true` (for example a `Base.compilecache`
+child), `PythonCall` loads as inactive - no error, no Python state.
 
 ## [Configuration](@id julia-config)

src/C/C.jl

Lines changed: 2 additions & 1 deletion
@@ -19,7 +19,8 @@ if @load_preference("exe", "@CondaPkg") == "@CondaPkg"
 end
 
 import ..PythonCall:
-    python_executable_path, python_library_path, python_library_handle, python_version
+    python_executable_path, python_library_path, python_library_handle, python_version,
+    _is_embedded
 
 include("consts.jl")
 include("pointers.jl")

src/C/context.jl

Lines changed: 84 additions & 33 deletions
@@ -105,46 +105,57 @@ on_main_thread
 
 function init_context()
 
-    # Normally PythonCall is embedded when Python (via juliacall) defines the
-    # global `Main.__PythonCall_libptr`, set by juliacall's bootstrap *after*
-    # `jl_init_with_image`. If PythonCall is baked into a juliacall system
-    # image, its `__init__` runs *during* `jl_init_with_image` — before that
-    # global exists — yet we are still embedded (Python is the running host).
-    # The opt-in `embedded` preference / `JULIA_PYTHONCALL_EMBEDDED` forces the
-    # embedded path in that case; libpython is obtained by path since it is
-    # already loaded in this process. Unset, behaviour is unchanged.
+    # Embedded if juliacall set Main.__PythonCall_libptr or the sysimage baked
+    # `_is_embedded[]` to `true`.
     has_libptr = hasproperty(Base.Main, :__PythonCall_libptr)
-    CTX.is_embedded = has_libptr || Utils.getpref_embedded()
+    CTX.is_embedded = has_libptr || _is_embedded[]
 
     if CTX.is_embedded
+        # Locate libpython.
         if has_libptr
-            # In this case, getting a handle to libpython is easy
             CTX.lib_ptr = Base.Main.__PythonCall_libptr::Ptr{Cvoid}
         else
-            # Baked into a sysimage: open libpython by path (the `lib`
-            # preference / JULIA_PYTHONCALL_LIB). dlopen of an
-            # already-loaded library just returns a handle to it.
-            lib_path = something(Utils.getpref_lib(), Some(nothing))
-            lib_path === nothing && error(
-                "JULIA_PYTHONCALL_EMBEDDED is set but libpython is unknown; " *
-                "set the `lib` preference or JULIA_PYTHONCALL_LIB to its path.",
-            )
-            lib_ptr = dlopen_e(lib_path, CTX.dlopen_flags)
-            lib_ptr == C_NULL &&
-                error("Python library $(repr(lib_path)) could not be opened.")
-            CTX.lib_path = lib_path
-            CTX.lib_ptr = lib_ptr
+            lib_path = Utils.getpref_lib()
+            if lib_path !== nothing
+                lib_ptr = dlopen_e(lib_path, CTX.dlopen_flags)
+                if lib_ptr != C_NULL
+                    CTX.lib_path = lib_path
+                    CTX.lib_ptr = lib_ptr
+                end
+            end
         end
-        init_pointers()
-        # Check Python is initialized
-        Py_IsInitialized() == 0 && error("Python is not already initialized.")
-        CTX.is_initialized = true
-        CTX.which = :embedded
-        exe_path = Utils.getpref_exe()
-        if exe_path != ""
-            CTX.exe_path = exe_path
-            # this ensures PyCall uses the same Python interpreter
-            get!(ENV, "PYTHON", exe_path)
+
+        embedded_ok = false
+        if CTX.lib_ptr != C_NULL
+            init_pointers()
+            embedded_ok = Py_IsInitialized() != 0
+        end
+
+        if embedded_ok
+            CTX.is_initialized = true
+            CTX.which = :embedded
+            exe_pref = Utils.getpref_exe()
+            if exe_pref != ""
+                CTX.exe_path = exe_pref
+                get!(ENV, "PYTHON", exe_pref)
+            else
+                exe_path = _embedded_program_path()
+                if exe_path !== nothing
+                    CTX.exe_path = exe_path
+                    get!(ENV, "PYTHON", exe_path)
+                end
+            end
+        elseif has_libptr
+            error("PythonCall is in embedded mode but no Python interpreter is running in this process.")
+        else
+            # Either the `lib` preference is unset, or Python is not running
+            # in this process (e.g. a julia.exe child of `Base.compilecache`
+            # loaded a sysimage baked for the embedded path). Leave PythonCall
+            # inactive instead of erroring.
+            CTX.is_embedded = false
+            CTX.lib_ptr = C_NULL
+            CTX.lib_path = missing
+            return
         end
     else
         # Find Python executable
@@ -347,6 +358,46 @@ function init_context()
     return
 end
 
+# Return `sys.executable` as a String, or nothing. Requires init_pointers().
+function _embedded_program_path()
+    import_mod = dlsym_e(CTX.lib_ptr, :PyImport_ImportModule)
+    getattr = dlsym_e(CTX.lib_ptr, :PyObject_GetAttrString)
+    asutf8 = dlsym_e(CTX.lib_ptr, :PyUnicode_AsUTF8AndSize)
+    decref = dlsym_e(CTX.lib_ptr, :Py_DecRef)
+    errclear = dlsym_e(CTX.lib_ptr, :PyErr_Clear)
+    (import_mod == C_NULL || getattr == C_NULL || asutf8 == C_NULL ||
+     decref == C_NULL || errclear == C_NULL) && return nothing
+
+    sys_mod = ccall(import_mod, Ptr{Cvoid}, (Ptr{Cchar},), "sys")
+    if sys_mod == C_NULL
+        ccall(errclear, Cvoid, ())
+        return nothing
+    end
+    result = nothing
+    try
+        exec_obj = ccall(getattr, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cchar}), sys_mod, "executable")
+        if exec_obj == C_NULL
+            ccall(errclear, Cvoid, ())
+            return nothing
+        end
+        try
+            size_ref = Ref{Cssize_t}(0)
+            cstr = ccall(asutf8, Ptr{Cchar}, (Ptr{Cvoid}, Ref{Cssize_t}), exec_obj, size_ref)
+            if cstr == C_NULL
+                ccall(errclear, Cvoid, ())
+                return nothing
+            end
+            size_ref[] == 0 && return nothing
+            result = unsafe_string(cstr, size_ref[])
+        finally
+            ccall(decref, Cvoid, (Ptr{Cvoid},), exec_obj)
+        end
+    finally
+        ccall(decref, Cvoid, (Ptr{Cvoid},), sys_mod)
+    end
+    return result
+end
+
 function Base.show(io::IO, ::MIME"text/plain", ctx::Context)
     show(io, typeof(io))
     print(io, ":")
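
The branching in the rewritten embedded path of `init_context()` can be summarised as a small decision function. This is a sketch for reading the diff, not code from the commit: `embedded_outcome` and its keyword names are hypothetical, and it assumes that `Main.__PythonCall_libptr`, when defined, is a non-null handle.

```julia
# Hypothetical model of the embedded branch of init_context(); not part of PythonCall.
#   has_libptr     : juliacall defined Main.__PythonCall_libptr (assumed non-null)
#   baked          : the sysimage was built with _is_embedded[] = true
#   lib_opened     : the `lib` preference was set and dlopen_e returned a handle
#   py_initialized : Py_IsInitialized() returned non-zero
function embedded_outcome(; has_libptr::Bool, baked::Bool,
                          lib_opened::Bool, py_initialized::Bool)
    # Neither signal present: the ordinary non-embedded path runs instead.
    (has_libptr || baked) || return :not_embedded
    # A handle comes from the global if present, otherwise from the `lib` preference.
    have_handle = has_libptr || lib_opened
    # Python is only queried once a handle exists.
    have_handle && py_initialized && return :embedded
    # juliacall said we are embedded but no interpreter is running: hard error.
    has_libptr && return :error
    # Baked flag only: leave PythonCall inactive so downstream __init__s return early.
    return :inactive
end

outcomes = (
    juliacall_host = embedded_outcome(has_libptr = true, baked = false,
                                      lib_opened = false, py_initialized = true),
    baked_in_python = embedded_outcome(has_libptr = false, baked = true,
                                       lib_opened = true, py_initialized = true),
    baked_no_python = embedded_outcome(has_libptr = false, baked = true,
                                       lib_opened = true, py_initialized = false),
    baked_no_lib = embedded_outcome(has_libptr = false, baked = true,
                                    lib_opened = false, py_initialized = false),
)
```

The `:inactive` outcome corresponds to the state in which `CTX.is_initialized` stays `false`, which is what the `C.CTX.is_initialized || return` guards in the other modules' `__init__` functions test for.
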

src/Compat/Compat.jl

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ include("serialization.jl")
 include("tables.jl")
 
 function __init__()
+    C.CTX.is_initialized || return
     init_gui()
     init_pyshow()
 end

src/Convert/Convert.jl

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ include("numpy.jl")
 include("pandas.jl")
 
 function __init__()
+    C.CTX.is_initialized || return
     init_pyconvert()
     init_ctypes()
     init_numpy()

src/Core/Core.jl

Lines changed: 3 additions & 0 deletions
@@ -209,6 +209,9 @@ include("juliacall.jl")
 include("pyconst_macro.jl")
 
 function __init__()
+    # Skip if C bailed out (e.g. a julia.exe child of Base.compilecache
+    # loaded a sysimage baked for the embedded path).
+    C.CTX.is_initialized || return
     init_consts()
     init_datetime()
     init_stdlib()

src/JlWrap/C.jl

Lines changed: 1 addition & 0 deletions
@@ -364,6 +364,7 @@ function init_c()
 end
 
 function __init__()
+    C.CTX.is_initialized || return
     init_c()
 end

src/JlWrap/JlWrap.jl

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ include("set.jl")
 include("callback.jl")
 
 function __init__()
+    C.CTX.is_initialized || return
     init_base()
     init_raw()
     init_any()

src/PythonCall.jl

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@ module PythonCall
 
 const ROOT_DIR = dirname(@__DIR__)
 
+"""
+    PythonCall._is_embedded
+
+Marks the running sysimage as embedded in a Python host. Set to `true` in a
+PackageCompiler `script=` to bake the embedded path into the sysimage.
+"""
+const _is_embedded = Ref(false)
+
 include("API/API.jl")
 include("Utils/Utils.jl")
 include("NumpyDates/NumpyDates.jl")
