
Commit 2075f2d

Merge pull request #61 from benoitc/feature/simplify-execution-model
Simplify execution model to worker + owngil modes (v3.0.0)
2 parents c0c00a8 + 45a7fcc commit 2075f2d

51 files changed

Lines changed: 3544 additions & 4055 deletions

CHANGELOG.md

Lines changed: 80 additions & 12 deletions
```diff
@@ -1,23 +1,91 @@
 # Changelog

-## 2.4.0 (Unreleased)
+## 3.0.0 (Unreleased)

-### Added
+### Breaking Changes
+
+- **Simplified execution model** - Only two public execution modes: `worker` and `owngil`
+  - `worker`: Dedicated pthread per context with stable thread affinity (default)
+  - `owngil`: Dedicated pthread + subinterpreter with own GIL (Python 3.14+)
+  - Removed `multi_executor` and `free_threaded` from public API
+  - Internal capability detection still tracks Python features
+
+- **Removed `py:num_executors/0`** - Contexts now use per-context worker threads
+  instead of a shared executor pool. This function is no longer needed.
+
+- **`py:execution_mode/0` returns `worker | owngil`** - Based on the `context_mode`
+  application configuration. Previously returned internal capabilities like
+  `free_threaded`, `subinterp`, or `multi_executor`.

-- **Context thread affinity** - Contexts in MULTI_EXECUTOR mode are now assigned a
-  fixed executor thread at creation. All operations (call, eval, exec) from the same
-  context run on the same OS thread, preventing thread state corruption in libraries
-  like numpy and PyTorch that have thread-local state.
+- **Removed `py:async_stream/3,4`** - Streaming async generators was never
+  implemented behind the API and always returned `{error, stream_not_implemented}`.
+  Use `py:stream_start/3,4` for sync generators; async-generator support may
+  return in a later release.
+
+- **Removed `num_executors` / `num_async_workers` configuration** - Both keys
+  were no-ops after the v3.0 worker rework. Configure context count via
+  `num_contexts` and the rate-limit ceiling via `max_concurrent`.
+
+- **Strict context-mode validation at the NIF boundary** - `py_nif:context_create/1`
+  now returns `{error, {invalid_mode, Atom}}` for anything other than `worker | owngil`.
+  Previously, callers that bypassed `py_context` (notably `py_reactor_context`)
+  silently mapped any unknown atom — including legacy `auto` and `subinterp`
+  to worker mode. Code that relied on that loophole must pass `worker` (or
+  `owngil`) explicitly.
+
+### Fixed
+
+- **`py:async_call/3,4` + `py:async_await/1,2` round-trip** - Previously the
+  await receive matched `{py_response, _, _}` while the event loop sent
+  `{async_result, _, _}`, causing every async call to silently time out.
+  Async calls now go directly through `py_event_loop:create_task` and
+  `py_event_loop:await`.
+
+- **`py:async_gather/1,2` actually executes** - Reimplemented as concurrent
+  `async_call` submission with sequential `async_await`. Returns
+  `{ok, [Result1, ...]}` on success or `{error, {gather_failed, [{Idx, Reason}, ...]}}`
+  if any call fails. The previous implementation returned `gather_not_implemented`.

 ### Changed

-- **`py:execution_mode/0` now returns actual mode** - Returns `worker` (default),
-  `owngil`, `free_threaded`, or `multi_executor` based on actual configuration
-  instead of Python capability. Previously returned `subinterp` even when using
-  worker mode.
+- **Per-context worker threads** - Each context now gets its own dedicated pthread
+  that handles all Python operations. This provides stable thread affinity for
+  numpy/torch/tensorflow compatibility without needing a shared executor pool.
+
+- **Async NIF dispatch** - Context operations use async NIFs with message passing
+  instead of blocking dirty schedulers. This improves concurrency under load.
+
+- **Request queue per context** - Replaced single-slot request pattern with proper
+  request queues that support multiple concurrent callers.
+
+- **No global asyncio policy install on Python 3.14+.** `asyncio.set_event_loop_policy`
+  was deprecated in 3.14 and is removed in 3.16. The Erlang integration's run path
+  already uses `loop_factory=` (`erlang.run/1`, `asyncio.Runner`) so the global
+  policy was only a convenience for bare `asyncio.run()` inside `py:exec`. We now
+  skip the install on 3.14+ to avoid the deprecation warning. On 3.14+ use
+  `erlang.run(main)` or `asyncio.Runner(loop_factory=erlang.new_event_loop)`
+  explicitly. Behavior on Python 3.9–3.13 is unchanged. `erlang.install()` raises
+  `RuntimeError` on 3.14+ (still emits a `DeprecationWarning` and works on 3.12–3.13).
+
+### Removed

-- **Removed obsolete subinterp test references** - Test suites updated to reflect
-  the removal of subinterpreter mode. Tests now use `worker` or `owngil` modes.
+- Multi-executor pool (`g_executors[]`, `multi_executor_start/stop`)
+- `context_dispatch_call/eval/exec` functions (dead code)
+- References to `PY_MODE_MULTI_EXECUTOR` in context operations
+- `py_async_pool` legacy gen_server (unused after async API rewire)
+- **Explicit `py:subinterp_*` handle API removed.** `py:subinterp_create/0`,
+  `subinterp_destroy/1`, `subinterp_call/4,5`, `subinterp_eval/2,3`,
+  `subinterp_exec/2`, `subinterp_cast/4`, `subinterp_async_call/4`,
+  `subinterp_await/1,2`, and `subinterp_pool_*` are all gone. Use
+  `py_context:new(#{mode => owngil})` instead — it gives the same
+  parallelism with OTP supervision and automatic cleanup.
+  `py:subinterp_supported/0` (capability probe) and `py:parallel/1`
+  (which routes through the context API) stay.
+- Internal `py_execution_mode_t` collapsed from 3 values to 2 (`free_threaded`
+  / `gil`); `py_nif:execution_mode/0` returns `free_threaded | gil` instead
+  of the old `free_threaded | subinterp | multi_executor`.
+- `examples/reactor_owngil_example.erl` deleted (called nonexistent
+  `py:subinterp_reactor_*` functions; pre-existing breakage).

 ## 2.3.1 (2026-04-01)

```
README.md

Lines changed: 16 additions & 26 deletions
````diff
@@ -16,10 +16,9 @@ evaluate expressions, and stream from generators - all without blocking Erlang
 schedulers.

 **Parallelism options:**
-- **Worker mode** (default, recommended) - Works with any Python version. With free-threaded Python (3.13t+), provides true parallelism automatically
-- **SHARED_GIL sub-interpreters** (Python 3.12+) - Isolated namespaces, shared GIL (isolation improves in 3.14+)
-- **OWN_GIL sub-interpreters** (Python 3.14+) - Each interpreter has its own GIL, true parallelism
-- **BEAM processes** - Fan out work across lightweight Erlang processes
+- **Worker mode** (default, recommended) - Works with any Python version. With free-threaded Python (3.13t+), provides true parallelism automatically.
+- **OWN_GIL sub-interpreters** (Python 3.14+) - Each interpreter has its own GIL, true parallelism.
+- **BEAM processes** - Fan out work across lightweight Erlang processes.

 Key features:
 - **Process-bound environments** - Each Erlang process gets isolated Python state, enabling OTP-supervised Python actors
````
````diff
@@ -302,14 +301,11 @@ Ref = py:async_call(aiohttp, get, [<<"https://api.example.com/data">>]),
 {ok, Response} = py:async_await(Ref).

 %% Gather multiple async calls concurrently
-{ok, Results} = py:async_gather([
+{ok, [Users, Posts, Comments]} = py:async_gather([
     {aiohttp, get, [<<"https://api.example.com/users">>]},
     {aiohttp, get, [<<"https://api.example.com/posts">>]},
     {aiohttp, get, [<<"https://api.example.com/comments">>]}
 ]).
-
-%% Stream from async generators
-{ok, Chunks} = py:async_stream(mymodule, async_generator, [args]).
 ```

 ## Parallel Execution with Sub-interpreters
````
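The CHANGELOG describes the reworked `py:async_gather/1,2` as concurrent `async_call` submission followed by sequential `async_await`. It returns either `{ok, Results}` or `{error, {gather_failed, [{Idx, Reason}, ...]}}`. The sketch below models that strategy with plain `asyncio`. It is an analogy, not the library's Erlang code: tuples stand in for Erlang terms, `fetch` and `fail` are hypothetical stand-ins for remote calls, and 1-based indices are an assumption, since the CHANGELOG does not say how `Idx` is numbered.

```python
import asyncio


async def gather_like(coros):
    """Submit every coroutine first, then await the tasks in list order."""
    # Concurrent submission: every task is scheduled before any await.
    tasks = [asyncio.create_task(c) for c in coros]
    results, failures = [], []
    # Sequential await: collect results in input order, recording failures.
    for idx, task in enumerate(tasks, start=1):
        try:
            results.append(await task)
        except Exception as exc:
            failures.append((idx, repr(exc)))
    if failures:
        return ("error", ("gather_failed", failures))
    return ("ok", results)


async def fetch(name: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for an aiohttp request.
    await asyncio.sleep(delay)
    return name


async def fail() -> str:
    raise ValueError("boom")


if __name__ == "__main__":
    print(asyncio.run(gather_like([fetch("users"), fetch("posts"), fetch("comments")])))
    print(asyncio.run(gather_like([fetch("users"), fail()])))
```

Because every task starts before the first await, total latency tracks the slowest call rather than the sum of all calls. Awaiting in order keeps results aligned with the input list, which is what the `{ok, [Users, Posts, Comments]}` pattern match in the README relies on.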
````diff
@@ -328,7 +324,7 @@ True parallelism without GIL contention using Python 3.14+ OWN_GIL sub-interpret
 %% Each call runs in its own interpreter with its own GIL
 ```

-For Python 3.12/3.13, use SHARED_GIL sub-interpreters (`mode => subinterp`) for namespace isolation, but note that parallelism is limited by the shared GIL.
+For Python 3.12/3.13 the public modes are `worker` (default) and `owngil` (Python 3.14+ only). Earlier versions run all contexts under the shared main interpreter via dedicated worker threads — namespace isolation between contexts is local-dict based, not via subinterpreters.

 ## Parallel Processing with BEAM Processes

````
````diff
@@ -590,9 +586,9 @@ ok = py:clear_traces().
 %% sys.config
 [
     {erlang_python, [
-        {num_workers, 4}, %% Python worker pool size
-        {max_concurrent, 17}, %% Max concurrent operations (default: schedulers * 2 + 1)
-        {num_executors, 4} %% Executor threads (multi-executor mode)
+        {num_contexts, 8}, %% Number of contexts (default: schedulers)
+        {context_mode, worker}, %% worker | owngil
+        {max_concurrent, 17} %% Max concurrent operations (default: schedulers * 2 + 1)
     ]}
 ].
 ```
````
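The new `sys.config` example accepts only `worker | owngil` for `context_mode`. Its comments also give the defaults: `num_contexts` equals the scheduler count, and `max_concurrent` is `schedulers * 2 + 1`, so 8 schedulers yields 8 and 17. The hypothetical Python helper below makes those rules explicit. Rejecting other modes with an `invalid_mode` error borrows the shape the CHANGELOG documents for `py_nif:context_create/1`. Whether the application env itself is validated this way is an assumption.

```python
VALID_MODES = ("worker", "owngil")


def resolve_config(env: dict, schedulers: int):
    """Hypothetical helper: apply the README defaults and validate context_mode."""
    mode = env.get("context_mode", "worker")  # worker is the default mode
    if mode not in VALID_MODES:
        # Same shape as {error, {invalid_mode, Atom}} at the NIF boundary.
        return ("error", ("invalid_mode", mode))
    return ("ok", {
        "context_mode": mode,
        "num_contexts": env.get("num_contexts", schedulers),
        "max_concurrent": env.get("max_concurrent", schedulers * 2 + 1),
    })


if __name__ == "__main__":
    print(resolve_config({}, schedulers=8))
    print(resolve_config({"context_mode": "subinterp"}, schedulers=8))
```

This helper simply ignores legacy keys such as `num_executors`, consistent with the CHANGELOG note that they had become no-ops.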
````diff
@@ -605,40 +601,34 @@ When creating Python contexts, you can choose the execution mode:

 | Mode | Python Version | Description |
 |------|----------------|-------------|
-| `worker` | Any | Main interpreter, shared namespace (default, recommended) |
-| `subinterp` | 3.12+ | SHARED_GIL sub-interpreter, isolated namespace |
-| `owngil` | 3.14+ | OWN_GIL sub-interpreter, true parallelism |
+| `worker` | Any | Dedicated pthread per context, main interpreter namespace (default) |
+| `owngil` | 3.14+ | Dedicated pthread + subinterpreter with its own GIL, true parallelism |

 ```erlang
 %% Default: worker mode (recommended)
 %% With free-threaded Python (3.13t+), provides true parallelism automatically
 {ok, Ctx} = py_context:new(#{}).

-%% Explicit subinterpreter with shared GIL (Python 3.12+)
-%% Provides namespace isolation but no parallelism
-{ok, Ctx} = py_context:new(#{mode => subinterp}).
-
 %% OWN_GIL mode for true parallelism (Python 3.14+ required)
 %% Each context runs in its own pthread with independent GIL
 {ok, Ctx} = py_context:new(#{mode => owngil}).
 ```

-**Worker mode is recommended** because it works with any Python version and automatically benefits from free-threaded Python (3.13t+) when available.
+**Worker mode is recommended** because it works with any Python version and automatically benefits from free-threaded Python (3.13t+) when available. Each context owns a dedicated pthread, providing stable thread affinity for libraries with thread-local state (numpy, torch, tensorflow).

-**Why OWN_GIL requires Python 3.14+**: Some C extensions (e.g., `_decimal`, `numpy`) have global state bugs in sub-interpreters on Python 3.12/3.13. These are fixed in Python 3.14. SHARED_GIL mode works on 3.12+ but with caveats for C extensions with global state.
+**Why OWN_GIL requires Python 3.14+**: Some C extensions (e.g., `_decimal`, `numpy`) have global state bugs in sub-interpreters on Python 3.12/3.13. These are fixed in Python 3.14.

 ### Runtime Detection

-Check the current execution mode:
+Check the current execution mode (mirrors the `context_mode` application env):
 ```erlang
-py:execution_mode(). %% => free_threaded | subinterp | multi_executor
+py:execution_mode(). %% => worker | owngil
 ```

 | Mode | Python Version | Parallelism |
 |------|----------------|-------------|
-| Free-threaded | 3.13+ (nogil) | True parallel, no GIL |
-| Sub-interpreter | 3.12+ | Per-interpreter GIL |
-| Multi-executor | Any | GIL contention |
+| `worker` (default) | Any | One pthread per context; true parallelism on free-threaded 3.13t+ |
+| `owngil` | 3.14+ | Per-interpreter GIL, true parallelism across contexts |

 ## Error Handling

````
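The `worker` row above promises a dedicated pthread per context and stable thread affinity for libraries with thread-local state. The following Python analogy illustrates that idea; it is not the library's C implementation. Each hypothetical `WorkerContext` owns a single-thread `ThreadPoolExecutor`, whose internal work queue plays the role of the per-context request queue. As a result, every call submitted to one context runs on the same OS thread and sees the same `threading.local` state.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Thread-local state, standing in for per-thread caches in numpy/torch.
_tls = threading.local()


def bump() -> int:
    """Increment a per-thread counter and return its new value."""
    _tls.count = getattr(_tls, "count", 0) + 1
    return _tls.count


class WorkerContext:
    """Hypothetical analogy: one dedicated thread plus a request queue per context."""

    def __init__(self, name: str):
        # max_workers=1: exactly one thread drains this context's queue.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)

    def call(self, fn, *args) -> Future:
        # Non-blocking dispatch: the caller gets a future back, like a reply message.
        return self._pool.submit(fn, *args)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    a = WorkerContext("ctx-a")
    b = WorkerContext("ctx-b")
    print([a.call(bump).result() for _ in range(3)])  # [1, 2, 3]: same thread each time
    print(b.call(bump).result())                       # 1: separate thread, fresh state
    a.close()
    b.close()
```

In this sketch, a context's requests run one at a time on its own thread. Parallelism comes from running several contexts at once, and from those threads not contending for one GIL on free-threaded or `owngil` builds.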
c_src/py_convert.c

Lines changed: 11 additions & 5 deletions
```diff
@@ -95,13 +95,19 @@ static void shared_dict_capsule_destructor(PyObject *capsule) {
   * @return true if obj is a numpy ndarray, false otherwise
   */
  static inline bool is_numpy_ndarray(PyObject *obj) {
-    /* Use cached type for fast isinstance check when available.
-     * The cache is only valid in the main interpreter - subinterpreters
-     * have their own object space, so we fall back to attribute detection. */
-    if (g_numpy_ndarray_type != NULL && g_execution_mode != PY_MODE_SUBINTERP) {
+    /* The cache is populated in the main interpreter. On builds where
+     * subinterpreters can be created (and the runtime isn't free-threaded,
+     * which short-circuits subinterp use) a context may be running inside
+     * a subinterpreter where the cached type is invalid -- fall back to
+     * duck typing in that case. */
+#if defined(HAVE_SUBINTERPRETERS) && !defined(HAVE_FREE_THREADED)
+    /* Build supports subinterpreters and isn't free-threaded:
+     * skip the cached fast path. */
+#else
+    if (g_numpy_ndarray_type != NULL) {
         return PyObject_IsInstance(obj, g_numpy_ndarray_type) == 1;
     }
-
+#endif
     /* Fallback: duck typing via attribute detection.
      * Check for both 'tolist' method and 'ndim' attribute. */
     return PyObject_HasAttrString(obj, "tolist") &&
```
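The new preprocessor guard decides at build time whether the cached `isinstance` fast path is compiled in; the duck-typing check stays as the fallback. The sketch below mirrors both pieces in Python. The hunk is cut off after the `tolist` test, so treating `ndim` as the second condition relies on the comment in the code. The function names `cached_path_compiled` and `is_ndarray_like` are hypothetical.

```python
import array


def cached_path_compiled(have_subinterpreters: bool, have_free_threaded: bool) -> bool:
    """Mirror of the #if guard: the cache is skipped only on builds that
    support subinterpreters and are not free-threaded."""
    return not (have_subinterpreters and not have_free_threaded)


def is_ndarray_like(obj) -> bool:
    """Duck-typing fallback: require a 'tolist' attribute and an 'ndim' attribute."""
    return hasattr(obj, "tolist") and hasattr(obj, "ndim")


if __name__ == "__main__":
    print(is_ndarray_like([1, 2, 3]))              # list: neither attribute
    print(is_ndarray_like(array.array("i", [1])))  # has tolist, lacks ndim
    print(is_ndarray_like(memoryview(b"ab")))      # has both: a false positive
```

The `memoryview` case shows the trade-off of the fallback. It accepts any object that exposes both attributes, so non-numpy buffers can match, whereas the cached `isinstance` path is exact.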
