
Commit d213d9c

Doc
1 parent 28d95d2 commit d213d9c

5 files changed

Lines changed: 52 additions & 32 deletions


README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -73,12 +73,12 @@ Common mappings (`cv2` -> `opencv-python`, `PIL` -> `Pillow`, etc.) are built in
 
 ## Features
 
-- **Automatic dependency detection** -- AST-based, recursive. Untraced helpers, class methods, module-level constants are all captured.
+- **Automatic dependency detection** -- AST-based, recursive. Untraced helpers, class methods, module-level constants, class-level attributes, and class decorators are all captured.
 - **Third-party package auto-install** -- Workers install missing packages via pip before execution.
 - **Async support** -- `async def` functions execute transparently. `await result`, `.arun()`, `.amap()`, and `asyncio.gather` all work out of the box.
 - **Notification-based result delivery** -- Push notifications fan out to many waiters via a single backend listener. No polling.
 - **Heartbeat & stall detection** -- Workers send periodic heartbeats. Clients raise `TaskStalled` when a worker stops responding.
-- **Class methods** -- `self.method()` and `cls.method()` dependencies are detected. Entire class hierarchies (including `super()`) are reconstructed.
+- **Class methods** -- `self.method()` and `cls.method()` dependencies are detected. Entire class hierarchies (including `super()`), class-level attributes, decorators (`@dataclass`, etc.), and metaclass keywords are reconstructed.
 - **Retry and timeout** -- `@trace(timeout=30, retries=3)` with exponential backoff.
 - **Batch submission** -- `func.map([(a1, b1), (a2, b2)])` submits multiple tasks at once.
 - **Pluggable backends** -- Redis (`redis://`) for multi-machine, shared memory (`shm://`) for same-machine IPC.
```
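The heartbeat bullet says clients raise `TaskStalled` when a worker stops responding. `docs/CONTEXT.md` adds that stall detection tracks when heartbeat *values* last changed, measured on the local monotonic clock. Below is a minimal sketch of that idea. `StallTracker`, `observe`, `is_stalled`, and the injectable `clock` parameter are hypothetical names, not pyfuse's API. The caller is assumed to raise `TaskStalled` itself.

```python
import time
from collections.abc import Callable


class StallTracker:
    """Hypothetical client-side stall detector (not pyfuse's actual class).

    Compares heartbeat values rather than worker timestamps, so only the
    local monotonic clock is ever consulted.
    """

    def __init__(self, stall_after: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stall_after = stall_after
        self._clock = clock
        self._last_value: object = None
        self._last_change = clock()

    def observe(self, value: object) -> None:
        # Record the local time whenever the heartbeat value moves.
        if value != self._last_value:
            self._last_value = value
            self._last_change = self._clock()

    def is_stalled(self) -> bool:
        # Stalled if the value has not changed for longer than the threshold.
        return self._clock() - self._last_change > self.stall_after
```

Only value changes and local timestamps are compared. As a result, a worker with a skewed wall clock can neither trigger nor mask a stall.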

docs/CONTEXT.md

Lines changed: 17 additions & 10 deletions
```diff
@@ -4,7 +4,7 @@
 
 pyfuse is a Python library for distributed function execution via automatic source code serialization. A `@trace` decorator captures a function's source, imports, and full dependency tree via AST analysis. Workers reconstruct and execute functions from scratch with zero prior knowledge of the code, installing missing packages automatically.
 
-**Version**: 0.3.0 | **License**: AGPL-3.0 | **Python**: 3.13+ | **Zero runtime dependencies**
+**Version**: 0.4.0 | **License**: AGPL-3.0 | **Python**: 3.13+ | **Zero runtime dependencies**
 
 ## Project structure
 
```
```diff
@@ -16,14 +16,14 @@ pyfuse/
 ├── py.typed # PEP 561 typed package marker
 ├── core/
 │ ├── task.py # Task dataclass: serializable envelope (graph + args + options)
-│ ├── models.py # FunctionNode and ImportInfo dataclasses, content hashing
-│ ├── version.py # _VERSION = "0.3.0"
+│ ├── models.py # FunctionNode and ImportInfo dataclasses, content hashing (incl. class_keywords, class_attrs, class_decorators)
+│ ├── version.py # _VERSION = "0.4.0"
 │ └── errors.py # Error, WorkerError, RemoteError, DependencyError, TaskStalled
 ├── graph/
 │ ├── decorator.py # @trace: marks functions, adds .run()/.map()/.arun()/.amap()
 │ ├── graph.py # Graph class: registration, auto-discovery, serialization
 │ ├── store.py # Content-addressable store: serialize/reconstruct/merge
-│ ├── analyzer.py # AST-based source capture, import extraction, dependency detection
+│ ├── analyzer.py # AST-based source capture, import extraction, dependency detection, class attrs/decorators
 │ └── tracing.py # Runtime call-stack tracing via contextvars (TracingMixin)
 └── worker/
   ├── worker.py # Worker: reconstruct, cache, execute with retry/timeout
```
```diff
@@ -48,7 +48,7 @@ The `Store` is a content-addressable JSON format where each function is identifi
 
 ### 2. Core layer (`core/`)
 
-Data models and error types. `FunctionNode` represents a function in the graph. `ImportInfo` represents a single import binding. `Task` is a frozen dataclass that bundles a serialized graph with function name, arguments, and execution options (timeout, retries).
+Data models and error types. `FunctionNode` represents a function in the graph, including class metadata (`class_keywords`, `class_attrs`, `class_decorators`). `ImportInfo` represents a single import binding. `Task` is a frozen dataclass that bundles a serialized graph with function name, arguments, and execution options (timeout, retries).
 
 ### 3. Worker layer (`worker/`)
 
```
````diff
@@ -86,19 +86,20 @@ Worker.run(task)
 - **Auto-discovery**: When a traced function calls an untraced user-defined function, pyfuse automatically finds and registers it. This is recursive. Class constructors (`MyClass()`), `@staticmethod`, `@classmethod`, and entire class hierarchies (via `super()`) are discovered too.
 - **Cross-module inlining**: Imports like `from utils import helper` where `helper` is a user function get converted from import statements to inline dependency edges, making reconstructed code self-contained.
 - **Module-level variables**: Constants and assignments (`MAX_RETRIES = 5`, `CONFIG = {...}`) referenced by traced functions are captured and emitted in reconstructed source.
-- **Closure handling**: Captured variables are serialized via `repr()` and hoisted as keyword-only parameters with defaults. Traced function references become dependency edges.
+- **Class-level attributes**: Class body statements (assignments, annotated assignments, docstrings) are extracted from AST and emitted in reconstructed class blocks. Class decorators (`@dataclass`, etc.) and metaclass keywords (`metaclass=ABCMeta`) are captured and emitted.
+- **Closure handling**: Multi-tier capture: `repr()` validation, then lambda source extraction, then auto-discovery for non-traced user functions, then constructor expressions for common stdlib types (`defaultdict`, `Counter`, `deque`), then pickle fallback for picklable objects. Traced function references become dependency edges.
 - **Decorator stripping**: `@trace` lines are removed from captured source so reconstructed code doesn't depend on pyfuse.
 - **Backend auto-detection**: `connect()` picks Redis or shared memory based on URL scheme. Falls back to `PYFUSE_BACKEND` env var.
 - **Worker caching**: Keyed by SHA-256 of all reachable content hashes (sorted + joined). Same code from different clients = cache hit.
 - **Async transparency**: Workers detect `async def` functions and run them via `asyncio.run()`. Results are awaitable via `asyncio.Future` fan-out.
 - **Notification fan-out**: `ResultWaiter` singleton per backend runs one listener thread and one heartbeat thread, serving all pending `Result` objects. No per-task polling.
 - **Heartbeat monitoring**: Workers send heartbeats every 1s. Client-side stall detection tracks when heartbeat *values* last changed using local monotonic clock (no cross-machine timestamp comparison).
 
-## Serialization format (v0.3.0)
+## Serialization format (v0.4.0)
 
 ```json
 {
-  "version": "0.3.0",
+  "version": "0.4.0",
   "objects": {
     "<content_hash>": {
       "name": "func_name",
````
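The **Worker caching** bullet keys the cache on a SHA-256 of every content hash reachable from the target, sorted and joined. Below is a minimal sketch of that key under stated assumptions. The `deps` mapping has the shape of the store's `"deps"` field. The `","` separator is a guess, because the docs only say "sorted + joined". `reachable_hashes` and `cache_key` are hypothetical helper names.

```python
import hashlib
from collections import deque


def reachable_hashes(root: str, deps: dict[str, list[str]]) -> set[str]:
    """Collect root plus every hash transitively reachable through `deps` (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        for dep in deps.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def cache_key(root: str, deps: dict[str, list[str]]) -> str:
    # Sorting makes the key independent of traversal and edge order, so two
    # clients shipping the same code get the same key. The "," separator is
    # an assumption.
    joined = ",".join(sorted(reachable_hashes(root, deps)))
    return hashlib.sha256(joined.encode()).hexdigest()
```

Hashes that are not reachable from the target, for example from another function merged into the same store, do not affect the key.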
```diff
@@ -109,7 +110,10 @@ Worker.run(task)
       "closure_vars": {},
       "closure_func_refs": {},
       "module_vars": {},
-      "class_bases": []
+      "class_bases": [],
+      "class_keywords": {},
+      "class_attrs": [],
+      "class_decorators": []
     }
   },
   "deps": {"<hash>": ["<dep_hash>", ...]},
```
````diff
@@ -155,7 +159,7 @@ pytest # run all tests
 pytest tests/test_api.py # specific module
 ```
 
-15 test modules covering: API surface, AST analysis, async features (aresult, await, arun, amap, gather, heartbeat, stall detection, notification-based result delivery), auto-discovery, dependency management, graph operations, integration scenarios, remote execution, runtime tracing, shared memory backend, store operations, stress tests (47 functions across 7 files), task serialization, temp venv management, and worker caching/execution.
+15 test modules covering: API surface, AST analysis, async features (aresult, await, arun, amap, gather, heartbeat, stall detection, notification-based result delivery), auto-discovery (including metaclass keywords, class attributes, class decorators, `__init_subclass__`), dependency management, graph operations, integration scenarios, remote execution, runtime tracing (including closure capture of non-traced functions, lambdas, constructor expressions, pickle fallback), shared memory backend, store operations, stress tests (47 functions across 7 files), task serialization, temp venv management, and worker caching/execution.
 
 ## Development
 
````
```diff
@@ -175,3 +179,6 @@ pytest # test suite
 - Backend implementations must satisfy the `Backend` ABC in `backends/base.py`. New methods (`notify_result`, `subscribe_results`, `get_heartbeats`) are non-abstract with safe defaults -- custom backends don't break.
 - `ResultWaiter` in `result.py` is a per-backend singleton with two daemon threads (listener + heartbeat). It uses `loop.call_soon_threadsafe()` for async fan-out and `threading.Event` for sync fan-out.
 - `install_package_as()` is a no-op at runtime; the AST analyzer in `decorator.py`/`analyzer.py` detects the `with` block pattern and tags `ImportInfo` objects with the package name.
+- `_capture_closure()` in `graph.py` uses a multi-tier strategy: repr validation → traced functions → lambdas (source extraction) → non-traced user functions (auto-registration) → constructor expressions (defaultdict/Counter/deque) → pickle fallback → warning. Returns function objects for auto-registration.
+- `_set_class_metadata()` in `graph.py` captures class-level attributes and decorators from the class source AST. Called from both `_auto_register_class` and `_discover_self_call_deps` to handle both constructor-discovered and directly-traced method classes.
+- `_resolve_class_bases()` now also extracts class definition keywords (e.g., `metaclass=ABCMeta`) and adds necessary imports for keyword values.
```
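`_set_class_metadata()` and `_resolve_class_bases()` are described as pulling decorators, definition keywords, and class-level attributes out of the class source AST. Below is a minimal sketch of that kind of extraction with the standard `ast` module. `extract_class_metadata` is a hypothetical name, not pyfuse's code. The rule for which body statements count as attributes follows the docs' wording: assignments, annotated assignments, and docstrings.

```python
import ast
import textwrap


def extract_class_metadata(class_source: str) -> dict[str, object]:
    """Hypothetical helper: decorators, keywords, bases, and class-level attrs."""
    tree = ast.parse(textwrap.dedent(class_source))
    cls = next(node for node in tree.body if isinstance(node, ast.ClassDef))
    attrs = [
        ast.unparse(stmt)
        for stmt in cls.body
        if isinstance(stmt, (ast.Assign, ast.AnnAssign))
        or (  # docstring-style bare string expressions
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        )
    ]
    return {
        # stored without the "@" prefix, as the data-model table specifies
        "class_decorators": [ast.unparse(dec) for dec in cls.decorator_list],
        # kw.arg is None for **kwargs splats in the header; skip those
        "class_keywords": {kw.arg: ast.unparse(kw.value) for kw in cls.keywords if kw.arg},
        "class_bases": [ast.unparse(base) for base in cls.bases],
        "class_attrs": attrs,
    }
```

Methods (`FunctionDef` nodes) are deliberately left out, since they are registered as separate graph nodes.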

docs/QUICK_START.md

Lines changed: 8 additions & 4 deletions
````diff
@@ -308,7 +308,7 @@ The output is a JSON string containing each function's source, imports, and depe
 
 ```json
 {
-  "version": "0.3.0",
+  "version": "0.4.0",
   "objects": {
     "a1b2...": {"name": "add", "source": "...", "imports": [], ...},
     "c3d4...": {"name": "hypotenuse", "source": "...", "imports": [...], ...}
````
```diff
@@ -383,7 +383,7 @@ store_a = json.loads(serialize(func_a))
 store_b = json.loads(serialize(func_b))
 
 merged = json.dumps({
-    "version": "0.3.0",
+    "version": "0.4.0",
     "objects": {**store_a["objects"], **store_b["objects"]},
     "deps": {**store_a["deps"], **store_b["deps"]},
     "refs": {**store_a["refs"], **store_b["refs"]},
```
```diff
@@ -402,9 +402,14 @@ merged = json.dumps({
 | `obj.method()` with type annotation | Type annotation resolution |
 | `obj.method()` without annotation | Runtime tracing on first call |
 | Module-level constants (`MAX = 5`) | Captured and emitted in reconstructed source |
+| Class-level attributes (`count = 0`) | Extracted from class source AST |
+| Class decorators (`@dataclass`) | Extracted from class source AST |
+| Metaclass keywords (`metaclass=ABCMeta`) | Extracted from class definition keywords |
 | Standard library imports (`json`, `csv`) | Kept as import statements |
 | Third-party imports (`numpy`, `yaml`) | Kept as imports, auto-installed on worker |
-| Closure variables | Captured via `repr()` |
+| Closure variables | Captured via `repr()`, constructor expressions, or pickle |
+| Non-traced functions in closures | Auto-discovered and registered as dependencies |
+| Lambda functions in closures | Source extracted via AST |
 | `__slots__` objects as arguments | Serialized via MRO slot inspection |
 | Generators and async functions | Supported with proxy wrappers |
 
```
```diff
@@ -413,7 +418,6 @@ merged = json.dumps({
 - Functions without source code (builtins, `exec`'d, REPL-defined)
 - Dynamic imports (`__import__()`, `importlib.import_module()` in function bodies)
 - Relative star imports (`from . import *`)
-- Metaclasses and `__init_subclass__` hooks
 - Circular dependencies (raises `CycleError`)
 
 ## Error handling
```

docs/TECHNICAL_OVERVIEW.md

Lines changed: 24 additions & 15 deletions
```diff
@@ -25,13 +25,13 @@ pyfuse/
   __main__.py CLI: python -m pyfuse worker/run/info/serialize/reconstruct
   _venv.py Temporary virtual environment management
   core/
-    models.py FunctionNode, ImportInfo dataclasses (incl. content hashing)
+    models.py FunctionNode, ImportInfo dataclasses (incl. content hashing, class metadata)
     task.py Task: serializable envelope bundling graph + arguments
     version.py Version constant
     errors.py Error, WorkerError, RemoteError, DependencyError, TaskStalled
   graph/
     decorator.py @trace: marks functions for remote execution
-    analyzer.py AST-based source and dependency analysis
+    analyzer.py AST-based source and dependency analysis, class attrs/decorators
     graph.py Dependency graph: registration, auto-discovery, runtime tracing
     store.py Content-addressable store: serialization, reconstruction
     tracing.py Runtime call-stack tracing via contextvars
```
```diff
@@ -153,17 +153,23 @@ Cross-module imports (e.g., `from utils import helper`) are converted from impor
 
 Class constructors (`MyClass()`) are auto-discovered: pyfuse registers all user-defined methods of the class. `@staticmethod` and `@classmethod` descriptors are unwrapped and registered correctly. When a method uses `super()`, base classes and their methods are discovered recursively, and `class Foo(Base):` headers are emitted in reconstructed source.
 
+Class-level attributes (assignments, annotated assignments, docstrings) are extracted from the class source AST and emitted in reconstructed class blocks. Class decorators (e.g., `@dataclass`) are captured and emitted above the class header. Metaclass keywords (e.g., `metaclass=ABCMeta`) and other class keywords are extracted from the class definition and included in the reconstructed header.
+
 Module-level constants and variables referenced by traced functions (e.g., `MAX_RETRIES = 5`) are captured and emitted in reconstructed source.
 
 **Not auto-discovered:** standard library functions, third-party packages (kept as imports).
 
 ### 5. Closure capture
 
-If the function captures variables from an enclosing scope:
-- Values are serialized via `repr()` and validated with `ast.parse()`.
-- Valid reprs become keyword-only parameters with defaults in reconstructed code.
-- Traced function references are recorded as dependency edges.
-- Invalid reprs trigger a warning and are skipped.
+If the function captures variables from an enclosing scope, pyfuse uses a multi-tier capture strategy:
+
+1. **`repr()` validation** -- Values whose `repr()` is valid Python (passes `ast.parse()`) are stored directly. They become keyword-only parameters with defaults in reconstructed code.
+2. **Traced functions** -- References to `@trace`-decorated functions are recorded as dependency edges.
+3. **Lambda functions** -- Source is extracted via `inspect.getsource()` + AST walking, stored as a closure variable expression.
+4. **Non-traced user functions** -- Automatically discovered and registered as dependencies (same as traced functions).
+5. **Constructor expressions** -- Common stdlib types (`defaultdict`, `Counter`, `deque`) whose `repr()` isn't valid Python are captured via self-contained constructor expressions (e.g., `__import__('collections').defaultdict(int, {'a': 1})`).
+6. **Pickle fallback** -- Picklable objects are serialized via `pickle.dumps()` + base64 encoding into a self-contained expression.
+7. **Warning** -- Objects that can't be captured by any method trigger a warning with the variable name and type.
 
 ### 6. Runtime tracing
 
```
````diff
@@ -204,7 +210,7 @@ When arguments contain class instances, a custom JSON encoder serializes them vi
 ```json
 {
   "id": "a1b2c3d4e5f6",
-  "graph": "{\"version\": \"0.3.0\", \"objects\": {...}, ...}",
+  "graph": "{\"version\": \"0.4.0\", \"objects\": {...}, ...}",
   "function": "mymodule.hypotenuse",
   "args": [3.0, 4.0],
   "kwargs": {},
````
````diff
@@ -274,7 +280,7 @@ Functions are stored in a content-addressable JSON format. Each function is iden
 
 ```json
 {
-  "version": "0.3.0",
+  "version": "0.4.0",
   "objects": {
     "a1b2...": {
       "name": "add",
````
````diff
@@ -302,7 +308,7 @@ Functions are stored in a content-addressable JSON format. Each function is iden
 }
 ```
 
-Optional fields (`closure_vars`, `closure_func_refs`, `module_vars`, `class_bases`) are omitted when empty.
+Optional fields (`closure_vars`, `closure_func_refs`, `module_vars`, `class_bases`, `class_keywords`, `class_attrs`, `class_decorators`) are omitted when empty.
 
 ### Content hashing
 
````
```diff
@@ -312,6 +318,7 @@ Optional fields (`closure_vars`, `closure_func_refs`, `module_vars`, `class_base
 | `imports` (sorted), `owner_class` | `deps` (structural, not content) |
 | `closure_vars`, `closure_func_refs` (sorted) | |
 | `module_vars` (sorted), `class_bases` | |
+| `class_keywords` (sorted), `class_attrs`, `class_decorators` | |
 
 Because dependencies are excluded from the hash, adding or removing an edge never changes a node's hash. This enables workers to cache objects by hash and request only missing ones: `missing = incoming.keys() - cached.keys()`.
 
```
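The content-hashing table lists which fields feed a node's hash, with `deps` deliberately excluded. Below is a minimal sketch of why that makes `incoming.keys() - cached.keys()` a valid cache diff. It assumes SHA-256 over canonical JSON, and assumes `name` and `source` are hashed alongside the fields visible in the table. The docs here name neither the node hash function nor the encoding. `content_hash` and `HASHED_FIELDS` are hypothetical names.

```python
import hashlib
import json
from typing import Any

# Fields assumed to feed the hash; "deps" is deliberately absent.
HASHED_FIELDS = (
    "name", "source", "imports", "owner_class",
    "closure_vars", "closure_func_refs", "module_vars", "class_bases",
    "class_keywords", "class_attrs", "class_decorators",
)


def _canonical(value: Any) -> str:
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def content_hash(node: dict[str, Any]) -> str:
    """Hypothetical node hash: SHA-256 over a canonical JSON payload."""
    payload = {key: node[key] for key in HASHED_FIELDS if key in node}
    if "imports" in payload:
        # Import order must not change the hash (works for str or dict entries).
        payload["imports"] = sorted(payload["imports"], key=_canonical)
    # sort_keys also canonicalizes dict-valued fields such as class_keywords.
    return hashlib.sha256(_canonical(payload).encode()).hexdigest()


add = {"name": "add", "source": "def add(a, b):\n    return a + b\n", "imports": []}
h = content_hash(add)
# Adding a dependency edge leaves the hash unchanged...
assert content_hash({**add, "deps": ["<dep_hash>"]}) == h
# ...so a worker can diff hash sets to find what it still needs.
cached = {h: add}
incoming = {h: add, "<new_hash>": {"name": "mul"}}
missing = incoming.keys() - cached.keys()
```

Order-sensitive lists such as `class_bases` and `class_decorators` are left unsorted, because reordering them changes the emitted class.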
```diff
@@ -323,7 +330,7 @@ Given a store and a target function name:
 2. **Walk** -- BFS through `deps` to collect all transitive dependencies.
 3. **Sort** -- Topological sort: dependencies before dependents.
 4. **Deduplicate imports** -- Merge imports across all functions.
-5. **Assemble** -- Emit imports, then module-level variable assignments, then functions in order. Methods are grouped into `class` blocks with proper base classes. Closure variables become keyword-only parameters with defaults.
+5. **Assemble** -- Emit imports, then module-level variable assignments, then functions in order. Methods are grouped into `class` blocks with decorators, base classes, metaclass keywords, class-level attributes, and methods. Closure variables become keyword-only parameters with defaults.
 
 ## Data model
 
```
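Steps 2 (**Walk**) and 3 (**Sort**) of reconstruction can be sketched with the standard library. Below, a BFS over `deps` is followed by `graphlib.TopologicalSorter`, which raises `graphlib.CycleError` on circular dependencies. The docs do not say whether pyfuse's `CycleError` is this class or whether it uses `graphlib` at all. `reconstruction_order` is a hypothetical helper.

```python
from collections import deque
from graphlib import TopologicalSorter


def reconstruction_order(target: str, deps: dict[str, list[str]]) -> list[str]:
    """BFS from `target` through `deps`, then order dependencies first."""
    # Step 2 (Walk): collect the target's transitive dependencies.
    reachable, queue = {target}, deque([target])
    while queue:
        for dep in deps.get(queue.popleft(), []):
            if dep not in reachable:
                reachable.add(dep)
                queue.append(dep)
    # Step 3 (Sort): TopologicalSorter maps each node to its predecessors,
    # so passing each node's deps yields dependencies before dependents.
    # static_order() raises graphlib.CycleError (a ValueError) on cycles.
    sorter = TopologicalSorter({node: deps.get(node, []) for node in reachable})
    return list(sorter.static_order())
```

Because the walk starts at the target, unrelated functions that happen to share the store are never emitted.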
```diff
@@ -356,6 +363,9 @@ One function in the dependency graph:
 | `closure_func_refs` | `dict[str, str]` -- references to traced functions captured in closures |
 | `module_vars` | `dict[str, str]` -- module-level variable assignments (name -> source) |
 | `class_bases` | `list[str]` -- base class names for methods in classes with inheritance |
+| `class_keywords` | `dict[str, str]` -- class definition keywords (e.g., `{"metaclass": "ABCMeta"}`) |
+| `class_attrs` | `list[str]` -- class-level attribute source lines (assignments, docstrings) |
+| `class_decorators` | `list[str]` -- class decorator source strings (without `@` prefix) |
 
 ## Dependency auto-installation
 
```
```diff
@@ -449,16 +459,15 @@ When a module contains `from X import *`:
 - Circular dependencies raise `CycleError` during reconstruction.
 
 ### Closure capture
-- Values whose `repr()` is not valid Python (file handles, sockets, etc.) are skipped with a warning.
-- Non-traced callables captured in closures are skipped.
+- Objects that are neither repr-serializable, picklable, nor user-defined callables are skipped with a warning (e.g., file handles, sockets).
 
 ### Imports
 - Relative star imports (`from . import *`) are not supported.
 - Aliased cross-module imports (`from utils import helper as h`) are skipped to avoid name mismatches.
 
 ### Classes
-- Metaclasses and `__init_subclass__` hooks are not replayed on the worker.
-- Class-level attributes that aren't assignments (e.g., descriptors created by external decorators) may not be captured.
+- Metaclasses and `__init_subclass__` hooks are replayed when the parent class is in the dependency tree (i.e., referenced via `super()` or constructor call).
+- Class-level attributes defined via complex descriptors or external decorators (beyond simple assignments) may not be captured.
 
 ## CLI reference
 
```
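Metaclasses and `__init_subclass__` hooks can be "replayed" simply by emitting a faithful `class` header. When the worker executes that statement, Python itself calls the metaclass and the base class's `__init_subclass__` with the header keywords. Below is a minimal sketch of assembling one class block in the order given under **Assemble**: decorators, then a header with bases and keywords, then class-level attributes, then methods. `emit_class` is a hypothetical helper, and method chunks are assumed to be plain source strings. The usage part demonstrates the `__init_subclass__` half.

```python
import textwrap


def emit_class(name: str, bases: list[str], keywords: dict[str, str],
               decorators: list[str], attrs: list[str], methods: list[str]) -> str:
    """Hypothetical assembler for one reconstructed class block."""
    header = [*bases, *(f"{key}={value}" for key, value in keywords.items())]
    lines = [f"@{dec}" for dec in decorators]  # decorators go above the header
    lines.append(f"class {name}({', '.join(header)}):" if header else f"class {name}:")
    for chunk in [*attrs, *methods] or ["pass"]:  # attributes first, then methods
        lines.append(textwrap.indent(textwrap.dedent(chunk), "    ").rstrip())
    return "\n".join(lines) + "\n"


# Executing the emitted class statement triggers Base.__init_subclass__
# with the class keywords, exactly as in the original program.
seen = []


class Base:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__()
        seen.append((cls.__name__, kwargs))


source = emit_class("Square", ["Base"], {"tag": "'sq'"}, [], ["sides = 4"],
                    ["def area(self, side):\n    return side * side\n"])
namespace = {"Base": Base}
exec(source, namespace)
```

The same mechanism covers `metaclass=...`: the keyword appears in the header, so the metaclass runs when the block is executed. This is also why replay requires the parent class to be part of the dependency tree.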
pyfuse/graph/store.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -377,7 +377,7 @@ def reconstruct(self, function_name: str) -> str:
     # -- Serialization -------------------------------------------------------
 
     def to_dict(self) -> dict[str, Any]:
-        """Export as a dict in v0.3.0 format."""
+        """Export as a dict in v0.4.0 format."""
         qname_to_hash = self._refs
 
         # Build objects with closure_func_refs converted to hashes
```
