Commit afbcecb

pyembed: support loading in-memory extension modules on Windows
There exists a dark arts mechanism for loading Windows PE files from memory. It is implemented in the MemoryModule library at https://github.com/fancycode/MemoryModule. I have published a `memory-module-sys` Rust crate to expose bindings to this library, which enables Rust to load DLLs from memory.

Previously in PyOxidizer, we taught the embedded Python resources data structure to define the contents of a shared library extension module to be imported from memory. This commit combines the two efforts and enables the `pyembed` crate to import Python extension modules which reside in memory.

Getting this working took a fair amount of effort. There were a handful of attempts that did not pan out. Some of the failed attempts appeared to work but were subtly broken, e.g. because the `LazyLoader` importer assumes that the `sys.modules` entry won't be modified. The final implementation emulates CPython's extension module loading mechanism as closely as possible. This was the only way I was able to preserve compatibility with `LazyLoader` (just implementing `exec_module()` without `create_module()` appears impossible, at least without writing our own lazy module implementation).

While this commit produces working results, it is far from feature complete. We still do not handle library dependencies properly. We will likely need to teach the embedded resources data structure about the existence of shared library resources and dependencies from extension modules so that shared libraries can be imported from memory when an extension module is imported.

Because this commit uses some CPython APIs outside the paved road of CPython APIs, we had to contribute support for these symbols to python3-sys (dgrunwald/rust-cpython#210). This is why we now depend on a specific Git commit of the python3-sys and cpython crates. This means we can't release pyembed to crates.io until a new version of these crates is published...

We're likely a ways off from a new release, as I don't want to solidify the new embedded resources format until it has more features. So hopefully this isn't a problem...
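
For readers unfamiliar with the loader protocol referenced in the message, here is a minimal Python sketch of a loader that implements both `create_module()` and `exec_module()` and is wrapped in `importlib.util.LazyLoader`. It is not the `pyembed` implementation (which is written in Rust and loads real PE images). `IN_MEMORY_SOURCES`, `InMemoryLoader`, and `lazy_import` are hypothetical names, and plain Python source stands in for the in-memory shared library.

```python
import importlib.abc
import importlib.util
import sys
import types

# Hypothetical stand-in for embedded resources: module name -> source text.
IN_MEMORY_SOURCES = {"inmem_demo": "VALUE = 42\n"}


class InMemoryLoader(importlib.abc.Loader):
    """Loader that implements both halves of the two-phase loader protocol."""

    def create_module(self, spec):
        # Phase 1: build the module object ourselves instead of returning None.
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Phase 2: populate the module object that create_module() produced.
        exec(IN_MEMORY_SOURCES[module.__name__], module.__dict__)


def lazy_import(name, loader):
    """Register a lazily loaded module in sys.modules and return it."""
    spec = importlib.util.spec_from_loader(name, importlib.util.LazyLoader(loader))
    module = importlib.util.module_from_spec(spec)
    # LazyLoader expects this exact object to still be the sys.modules entry
    # when the deferred exec_module() call finally runs.
    sys.modules[name] = module
    spec.loader.exec_module(module)  # Only arms the lazy module; no code runs yet.
    return module


if __name__ == "__main__":
    mod = lazy_import("inmem_demo", InMemoryLoader())
    print(type(mod).__name__)  # Still the lazy placeholder class.
    print(mod.VALUE)           # First attribute access triggers exec_module().
```

The point of the sketch is the ordering: the module object exists and sits in `sys.modules` before any module code runs, and the deferred `exec_module()` must fill in that same object rather than swap in a new one.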
1 parent 0771233 commit afbcecb

8 files changed

Lines changed: 372 additions & 23 deletions

Cargo.lock

Lines changed: 42 additions & 7 deletions

docs/history.rst

Lines changed: 8 additions & 0 deletions
@@ -45,6 +45,14 @@ Bug Fixes
 New Features
 ^^^^^^^^^^^^
 
+* Windows binaries can now import extension modules defined as shared libraries
+  (e.g. `.pyd` files) from memory. PyOxidizer will detect `.pyd` files during
+  packaging and embed them into the binary as resources. When the module
+  is imported, the extension module/shared library is loaded from memory
+  and initialized. This feature enables PyOxidizer to package pre-built
+  extension modules (e.g. from Windows binary wheels published on PyPI)
+  while still maintaining the property of a (mostly) self-contained
+  executable.
 * Multiple bytecode optimization levels can now be embedded in binaries.
   Previously, it was only possible to embed bytecode for a given module
   at a single optimization level.
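
As a rough illustration of the "detect `.pyd` files during packaging" step described in the new changelog entry, the following Python sketch walks a directory and collects extension module files as in-memory bytes. This is not PyOxidizer's packaging code. `find_extension_modules` and its rule for deriving module names (dropping everything after the first dot in the file name, so that a tag such as `.cp38-win_amd64` is removed) are assumptions made for the example.

```python
from pathlib import Path


def find_extension_modules(root, suffix=".pyd"):
    """Map dotted module names to the raw bytes of extension files under root."""
    found = {}
    for path in sorted(Path(root).rglob("*" + suffix)):
        relative = path.relative_to(root)
        # "_speedups.cp38-win_amd64.pyd" and "_speedups.pyd" both name the
        # module "_speedups"; keep only the part before the first dot.
        stem = relative.name.split(".", 1)[0]
        # Directory components become the package prefix of the module name.
        name = ".".join(relative.parent.parts + (stem,))
        found[name] = path.read_bytes()
    return found


if __name__ == "__main__":
    # Scan the current directory and report what would be embedded.
    for module_name, data in find_extension_modules(".").items():
        print(module_name, len(data), "bytes")
```

Under these assumptions, a real packager would then store each name and byte string pair in the embedded resources data structure so the importer can find it at run time.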

docs/packaging.rst

Lines changed: 2 additions & 1 deletion
@@ -387,7 +387,8 @@ Adding Extension Modules At Run-Time
 ====================================
 
 Normally, Python extension modules are compiled into the binary as part
-of the embedded Python interpreter.
+of the embedded Python interpreter or embedded Python resources data
+structure.
 
 ``PyOxidizer`` also supports providing additional extension modules at run-time.
 This can be useful for larger Rust applications providing extension modules

docs/packaging_pitfalls.rst

Lines changed: 59 additions & 7 deletions
@@ -224,13 +224,65 @@ C is used to implement extension modules.)
 The way this typically works is some build system (often ``distutils`` via a
 ``setup.py`` script) produces a shared library file containing the extension.
 On Linux and macOS, the file extension is typically ``.so``. On Windows, it
-is ``.pyd``. Python's importing mechanism looks for these files in addition
-to normal ``.py`` and ``.pyc`` files when an ``import`` is requested.
-
-PyOxidizer currently has :ref:`limited support <status_extension_modules>` for
-extension modules. Under some circumstances, building extension modules as
-part of regular package build machinery *just works* and the resulting
-extension module can be embedded in the produced binary.
+is ``.pyd``. When an ``import`` is requested, Python's importing mechanism
+looks for these files in addition to normal ``.py`` and ``.pyc`` files. If
+an extension module is found, Python will ``dlopen()`` the file and load the
+shared library into the process. It will then call into an initialization
+function exported by that shared library to obtain a Python module instance.
+
+Python packaging has defined various conventions for distributing pre-compiled
+extension modules in *wheels*. If you see an e.g.
+``<package>-<version>-cp38-cp38-win_amd64.whl``,
+``<package>-<version>-cp38-cp38-manylinux2014_x86_64.whl``, or
+``<package>-<version>-cp38-cp38-macosx_10_9_x86_64.whl`` file, you are
+installing a Python package with a pre-compiled extension module. Inside the
+*wheel* is a shared library providing the extension module. And that shared
+library is configured to work with a Python distribution (typically ``CPython``)
+built in a specific way. e.g. with a ``libpythonXY`` shared library exporting
+Python symbols.
+
+PyOxidizer currently has :ref:`some support <status_extension_modules>` for
+extension modules. The way this works depends on the platform and Python
+distribution.
+
+Dynamically Linked Python Distributions on Windows
+--------------------------------------------------
+
+When using a dynamically linked Python distribution on Windows (e.g.
+via the ``flavor="standalone_dynamic"`` argument to
+:ref:`config_default_python_distribution`, PyOxidizer:
+
+* Supports importing shared library extension modules (e.g. ``.pyd`` files)
+  from memory.
+* Automatically detects and uses ``.pyd`` files from pre-built binary
+  packages installed as part of packaging.
+* Automatically detects and uses ``.pyd`` files produced during package
+  building.
+
+However, there are caveats to this support!
+
+PyOxidizer doesn't currently support resolving additional library
+dependencies from ``.pyd`` extension modules / shared libraries when
+importing from memory. If an extension module depends on another shared
+library (almost certainly a ``.dll``) outside the normal set of libraries
+(namely the C Runtime and other common Windows system DLLs), you will
+need to manually package this library next to the application ``.exe``.
+Failure to do this could result in a failure at ``import`` time.
+
+PyOxidizer does support loading shared library extension modules from
+``.pyd`` files on the filesystem like a typical Python program. So
+if you cannot make in-memory extension module importing work, you
+can fall back to packaging a ``.pyd`` file in a directory registered
+on ``sys.path``, as set through the :ref:`config_python_interpreter_config`
+Starlark primitive.
+
+Extension Modules Everywhere Else
+---------------------------------
+
+If PyOxidizer is not able to easily reuse a Python extension module
+built or distributed in a traditional manner, it will attempt to
+compile the extension module from source in a way that is compatible
+with the PyOxidizer distribution and application configuration.
 
 The way PyOxidizer achieves this is a bit crude, but effective.

docs/status.rst

Lines changed: 2 additions & 2 deletions
@@ -44,8 +44,8 @@ binaries.
 Native Extension Modules
 ------------------------
 
-Building and using compiled extension modules (e.g. C extensions) is
-partially supported.
+Using compiled extension modules (e.g. C extensions) is partially
+supported.
 
 Building C extensions to be embedded in the produced binary works
 for Windows, Linux, and macOS.

pyembed/Cargo.toml

Lines changed: 5 additions & 2 deletions
@@ -14,13 +14,16 @@ links = "pythonXY"
 [dependencies]
 # Update documentation in lib.rs when new dependencies are added.
 byteorder = "1"
-cpython = "0.4"
+cpython = { git = "https://github.com/dgrunwald/rust-cpython", rev = "7fb4dd2e59ccf0fbf6bbe874b602e52b8aa4a8c1" }
 jemalloc-sys = { version = "0.3", optional = true }
 lazy_static = "1.4"
 libc = "0.2"
-python3-sys = "0.4"
+python3-sys = { git = "https://github.com/dgrunwald/rust-cpython", rev = "7fb4dd2e59ccf0fbf6bbe874b602e52b8aa4a8c1" }
 uuid = { version = "0.8", features = ["v4"] }
 
+[target.'cfg(windows)'.dependencies]
+memory-module-sys = "0.1"
+
 [dev-dependencies]
 pyoxidizer = { version = "0.7.0-pre", path = "../pyoxidizer" }
