fix: resolve merge conflicts with main and update v2 fallback tests
- Take main's version for task_runner.py and async_task_runner.py
(includes transient error handling, 404/405 detection)
- Update test_v2_fallback.py: _v2_available -> _use_update_v2,
501 -> 405, fix e2e test infinite loop (mock batch_poll to return
empty on second call)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
When a worker picks up a task, the Conductor server starts a `responseTimeoutSeconds` timer. If the worker doesn't send an update before the timer expires, the server marks the task as timed out and re-queues it for retry.

For long-running tasks (agent tool calls, LLM inference, data processing, batch jobs), the worker can still be executing when the timer expires, so the server wrongly treats it as dead. **Lease extension** solves this by automatically sending heartbeats that reset the timeout timer.
## How It Works
When `lease_extend_enabled=True`:
1. Worker picks up a task with `responseTimeoutSeconds > 0`
2. SDK starts tracking the task for heartbeats
3. At **80% of `responseTimeoutSeconds`**, SDK sends a heartbeat (`TaskResult.extend_lease=True`)
4. Server resets the task's `updateTime` to now, giving a fresh `responseTimeoutSeconds` window
5. Heartbeats continue until the task completes, fails, or the worker shuts down

The heartbeat fires at 80% of `responseTimeoutSeconds` (matching the Java SDK). This leaves a 20% safety margin: if a heartbeat is slightly delayed, the task still has time before the server times it out.
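
To make the timing concrete, here is a minimal sketch of when heartbeats would fire for a given task. It is not the SDK's scheduler: `heartbeat_times` is a hypothetical helper, and it assumes each heartbeat is sent one interval (80% of `responseTimeoutSeconds`) after the previous one.

```python
from typing import List

HEARTBEAT_FRACTION = 0.8  # from the text: heartbeat at 80% of responseTimeoutSeconds


def heartbeat_times(response_timeout_seconds: float, task_duration_seconds: float) -> List[float]:
    """Return offsets (seconds after pickup) at which heartbeats would fire.

    Assumption: heartbeats are evenly spaced, one interval apart, and stop
    once the task has finished.
    """
    if response_timeout_seconds <= 0:
        return []  # no response timeout configured, nothing to extend
    interval = response_timeout_seconds * HEARTBEAT_FRACTION
    times = []
    k = 1
    while k * interval < task_duration_seconds:
        times.append(k * interval)
        k += 1
    return times


# A 30-second task with responseTimeoutSeconds=10.
schedule = heartbeat_times(response_timeout_seconds=10, task_duration_seconds=30)
safety_margin = 10 - 10 * HEARTBEAT_FRACTION  # seconds left after each heartbeat is due
print(schedule, safety_margin)
```

Under these assumptions, a task that runs three times longer than its response timeout needs three heartbeats, and each one has a two-second window in which it can arrive late.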
## Quick Start
```python
from conductor.client.worker.worker_task import worker_task
| **Worker memory** | Task stays in worker memory | Task is released, re-polled with fresh context |

You can combine both: enable `lease_extend_enabled` for safety while also using `TaskInProgress` for incremental polling.
## Important Constraints
- **`responseTimeoutSeconds`** is the time between updates. This is what heartbeats reset.
- **`timeoutSeconds`** is the overall SLA wall-clock ceiling. **Cannot be extended by heartbeat.** Once exceeded, the task is TIMED_OUT regardless of heartbeats.
- Heartbeats only fire when `responseTimeoutSeconds > 0` and `lease_extend_enabled = True`.
- If the heartbeat interval would be less than 1 second (i.e., `responseTimeoutSeconds < 1.25`), heartbeats are skipped.
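
The constraints above can be sketched as a small decision function. This is an illustration only, not SDK or server code: `heartbeats_enabled` and `predict_outcome` are hypothetical names, and the sketch assumes the worker sends no other updates, that every heartbeat succeeds, and that `timeoutSeconds = 0` means no overall ceiling.

```python
HEARTBEAT_FRACTION = 0.8           # heartbeat at 80% of responseTimeoutSeconds
MIN_HEARTBEAT_INTERVAL_SECONDS = 1.0


def heartbeats_enabled(response_timeout_seconds: float, lease_extend_enabled: bool) -> bool:
    """True when heartbeats would fire under the constraints listed above."""
    if not lease_extend_enabled or response_timeout_seconds <= 0:
        return False
    interval = response_timeout_seconds * HEARTBEAT_FRACTION
    return interval >= MIN_HEARTBEAT_INTERVAL_SECONDS


def predict_outcome(duration_seconds: float,
                    response_timeout_seconds: float,
                    timeout_seconds: float,
                    lease_extend_enabled: bool) -> str:
    """Predict whether a task of the given duration completes or times out."""
    # The overall ceiling applies regardless of heartbeats.
    # Assumption: timeout_seconds == 0 disables the ceiling.
    if timeout_seconds > 0 and duration_seconds > timeout_seconds:
        return "TIMED_OUT"
    # Without heartbeats, the response timeout expires mid-execution.
    extended = heartbeats_enabled(response_timeout_seconds, lease_extend_enabled)
    if (response_timeout_seconds > 0
            and duration_seconds > response_timeout_seconds
            and not extended):
        return "TIMED_OUT"
    return "COMPLETED"


print(predict_outcome(30, 10, 60, lease_extend_enabled=True))
print(predict_outcome(30, 10, 60, lease_extend_enabled=False))
print(predict_outcome(90, 10, 60, lease_extend_enabled=True))
```

The third call shows the point of the second constraint: a 90-second task with `timeoutSeconds = 60` times out even though heartbeats keep resetting the response timer.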
## Retry on Failure
If a heartbeat API call fails, the SDK retries up to 3 times with backoff delays of `1s`, `1.5s`, and `2s`. If all retries fail, the error is logged and the SDK tries again on the next poll loop iteration. If the network is truly partitioned, the server will eventually time out the task, which is the correct behavior.
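
A minimal sketch of this retry pattern follows. It is not the SDK's implementation: `send_with_retries` and its parameters are hypothetical, and it assumes "retries up to 3 times" means one initial attempt followed by up to three retries, with one delay before each retry.

```python
import logging
import time
from typing import Callable, Sequence

logger = logging.getLogger("lease_extension_sketch")

BACKOFF_SECONDS = (1.0, 1.5, 2.0)  # delays from the text, one before each retry


def send_with_retries(send: Callable[[], None],
                      delays: Sequence[float] = BACKOFF_SECONDS,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() once, then retry after each delay until it succeeds.

    Returns True on success. On total failure, logs the last error and
    returns False so the caller can try again on its next loop iteration.
    """
    last_error = None
    for delay in (None, *delays):
        if delay is not None:
            sleep(delay)  # back off before this retry
        try:
            send()
            return True
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    logger.error("heartbeat failed after %d retries: %s", len(delays), last_error)
    return False


# Usage with a stand-in heartbeat call that fails twice, then succeeds.
calls = {"count": 0}
slept = []


def flaky_heartbeat() -> None:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("simulated network failure")


ok = send_with_retries(flaky_heartbeat, sleep=slept.append)
print(ok, slept)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; the default is `time.sleep`.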
## Example
See [examples/lease_extension_example.py](examples/lease_extension_example.py) for a complete runnable example that:

- Defines a long-running worker with `lease_extend_enabled=True`
- Creates a workflow with a short `responseTimeoutSeconds`
- Runs the workflow and proves the task completes despite sleeping longer than the timeout
> **Already in a virtual environment?** Skip the `venv` step and run `pip install conductor-python` directly. On macOS, Windows, or in containers where system Python is not locked down, you can also install globally.
## 60-Second Quickstart
**Step 1: Create a workflow**
@@ -101,7 +106,7 @@ Create a `quickstart.py` with the following:
```python
from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
-from conductor.client.orkes_clients import OrkesClients
+from conductor.client.orkes_clients import OrkesClients  # works with OSS Conductor and Orkes Conductor
from conductor.client.workflow.conductor_workflow import ConductorWorkflow
from conductor.client.worker.worker_task import worker_task
@@ -127,6 +132,8 @@ def main():
    workflow.register(overwrite=True)

    # Start polling for tasks (one worker subprocess per worker function).
    # Note: scan_for_annotated_workers=True only discovers @worker_task functions that have
    # already been imported. If workers are in a separate module, import it first.
    with TaskHandler(configuration=config, scan_for_annotated_workers=True) as task_handler:
        task_handler.start_processes()
@@ -162,7 +169,7 @@ python quickstart.py
> See [Configuration](#configuration) for details.

That's it: you just defined a worker, built a workflow, and executed it. Open the Conductor UI (default:
-[http://localhost:8127](http://localhost:8127)) to see the execution.
+[http://localhost:8080](http://localhost:8080)) to see the execution.
See [examples/task_context_example.py](examples/task_context_example.py) for all patterns (polling, retry-aware logic, async context, input access).
### Lease Extension for Long-Running Tasks
For tasks that run longer than `responseTimeoutSeconds` (e.g., LLM inference, data pipelines, batch jobs), enable automatic lease extension. The SDK sends heartbeats at 80% of `responseTimeoutSeconds`, resetting the server's timeout timer so the task stays alive:
```python
from conductor.client.worker.worker_task import worker_task
Disabled by default. Enable per-worker via decorator, constructor, or environment variable (`conductor_worker_<task>_lease_extend_enabled=true`). See [LEASE_EXTENSION.md](LEASE_EXTENSION.md) for the full guide.
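
For the environment-variable route, the variable name follows the pattern quoted above. The sketch below only builds and sets that name; `lease_extension_env_var`, `enable_lease_extension`, and the task name `process_document` are hypothetical and not part of the SDK. It assumes the variable must be set before the workers start, since when the SDK reads it is not stated here.

```python
import os
from typing import MutableMapping, Optional


def lease_extension_env_var(task_name: str) -> str:
    """Build the per-worker variable name from the documented pattern."""
    return f"conductor_worker_{task_name}_lease_extend_enabled"


def enable_lease_extension(task_name: str,
                           environ: Optional[MutableMapping[str, str]] = None) -> str:
    """Set the variable to "true" in the given mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    name = lease_extension_env_var(task_name)
    env[name] = "true"
    return name


# Demonstrated on a plain dict so the real process environment is untouched.
fake_env = {}
var_name = enable_lease_extension("process_document", fake_env)
print(var_name, fake_env[var_name])
```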
### Monitoring with Metrics
Enable Prometheus metrics with a single setting — the SDK exposes poll counts, execution times, error rates, and HTTP latency:
@@ -403,6 +430,7 @@ See the [Examples Guide](examples/README.md) for the full catalog. Key examples:
| [fastapi_worker_service.py](examples/fastapi_worker_service.py) | FastAPI: expose a workflow as an API (+ workers) |`uvicorn examples.fastapi_worker_service:app --port 8081 --workers 1`|
| [helloworld.py](examples/helloworld/helloworld.py) | Minimal hello world |`python examples/helloworld/helloworld.py`|
@@ -427,6 +455,7 @@ End-to-end examples covering all APIs for each domain: