Commit 0d2a62d
Merge pull request #408 from conductor-oss/docs/execute-running-behavior-and-debug
docs: explain execute() RUNNING-after-timeout behavior and how to debug stuck workflows
2 parents 0927994 + 4e5e0bf commit 0d2a62d

3 files changed

Lines changed: 95 additions & 2 deletions

README.md

Lines changed: 31 additions & 0 deletions
@@ -490,6 +490,37 @@ Yes. Conductor ensures workflows complete reliably even in the face of infrastru
 
 No. While Conductor excels at asynchronous orchestration, it also supports synchronous workflow execution when immediate results are required.
 
+**Why did `execute()` return `status: RUNNING` with no output?**
+
+`execute()` blocks until the workflow finishes **or** `wait_for_seconds` elapses (default: 10 s),
+whichever comes first. If it times out, you get `status='RUNNING'`; that is correct behavior,
+not a bug.
+
+The most common cause: your worker raised an exception. Conductor marks the task FAILED and
+schedules a retry after `retryDelaySeconds` (default: **60 s**). The default 10 s wait expires
+while the retry is pending, so `execute()` returns before the workflow completes.
+
+**To fix**: increase `wait_for_seconds` to outlast the retry cycle:
+
+```python
+# default retryDelaySeconds is 60; wait long enough to cover one retry
+run = executor.execute(name='my_workflow', version=1, workflow_input={...}, wait_for_seconds=70)
+```
+
+**To debug** when a workflow is stuck:
+
+```python
+# Inspect task statuses and failure reasons
+wf = executor.get_workflow(run.workflow_id, include_tasks=True)
+for task in wf.tasks:
+    if task.status in ('FAILED', 'FAILED_WITH_TERMINAL_ERROR'):
+        print(task.reference_task_name, task.reason_for_incompletion)
+```
+
+You can also open the Conductor UI at `<server>/execution/<workflow_id>`, which shows each task's
+status, retry count, and the worker exception message directly. Worker tracebacks are also logged
+at ERROR level by the SDK in the `TaskHandler` process.
+
 **Do I need to use a Conductor-specific framework?**
 
 No. Conductor is language and framework agnostic. Use your preferred language and framework; the [SDKs](https://github.com/conductor-oss/conductor#conductor-sdks) provide native integration for Python, Java, JavaScript, Go, C#, and more.
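
The timeout arithmetic in the README addition (a 60 s retry delay plus some headroom gives 70 s) can be sketched as a small helper. This is an illustration only and is not part of the commit: `recommended_wait_seconds` and its parameters are hypothetical names, and the sketch assumes a fixed delay between retries, with `margin_seconds` standing in for the time the worker itself needs.

```python
def recommended_wait_seconds(retry_delay_seconds: int = 60,
                             retries_to_cover: int = 1,
                             margin_seconds: int = 10) -> int:
    """Hypothetical helper: a value for wait_for_seconds that outlasts retries.

    Assumes every retry waits the same fixed retry_delay_seconds; the margin
    is headroom for the worker's own execution time.
    """
    if min(retry_delay_seconds, retries_to_cover, margin_seconds) < 0:
        raise ValueError("all arguments must be non-negative")
    return retry_delay_seconds * retries_to_cover + margin_seconds


# One retry at the default 60 s delay, plus 10 s of headroom.
print(recommended_wait_seconds())  # 70
```

If the task definition uses a different retry delay or a backoff policy, the figure needs to be recomputed from that task's actual settings.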

docs/WORKFLOW.md

Lines changed: 49 additions & 0 deletions
@@ -61,6 +61,55 @@ workflow_id = workflow_client.execute_workflow(
 )
 ```
 
+> **`wait_for_seconds` and task retries**
+>
+> `execute()` / `execute_workflow()` block for at most `wait_for_seconds` (default: **10 s**).
+> If the workflow is still running when the timer fires, the call returns with
+> `status='RUNNING'` and empty output; this is expected behavior, not an error.
+>
+> The most common trigger: a worker exception. Conductor marks the task FAILED and waits
+> `retryDelaySeconds` (default: **60 s**) before retrying. The default 10 s timeout expires
+> during that wait, so you see `RUNNING`. Set `wait_for_seconds` to a value larger than
+> `retryDelaySeconds` to ensure the call waits through at least one retry cycle:
+>
+> ```python
+> run = executor.execute(
+>     name='my_workflow', version=1, workflow_input={...},
+>     wait_for_seconds=70  # covers one retry at the default 60 s delay
+> )
+> ```
+
+#### Debugging a stuck workflow
+
+When a workflow returns `RUNNING` or never completes, use these steps to find out why.
+
+**1. Check the Conductor UI**
+
+Open `<server>/execution/<workflow_id>`. The timeline view shows each task's status, retry
+count, and the worker exception message, which is usually the fastest way to diagnose a failure.
+
+**2. Inspect task statuses programmatically**
+
+`get_workflow` with `include_tasks=True` returns the full task list. Check failed tasks for
+their `reason_for_incompletion`:
+
+```python
+wf = executor.get_workflow(workflow_id, include_tasks=True)
+for task in wf.tasks:
+    print(task.reference_task_name, task.status, task.reason_for_incompletion)
+```
+
+**3. Read the worker logs**
+
+When a worker function raises an exception, the SDK catches it, logs the traceback at ERROR
+level, and reports the task as FAILED. Worker logs come from the `TaskHandler` process; check
+the terminal output or your process manager's log stream.
+
+**Note on `reason_for_incompletion` on `WorkflowRun`**
+
+`WorkflowRun.reason_for_incompletion` is deprecated. Use `get_workflow(id, include_tasks=True)`
+and read `task.reason_for_incompletion` on the specific failed task instead (see step 2 above).
+
 ### Fetch a workflow execution
 
 #### Exclude tasks
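
When `execute()` has already returned `RUNNING`, one alternative to a longer `wait_for_seconds` is to poll until the workflow reaches a terminal state. The sketch below is not part of the commit. `wait_until_terminal` is a hypothetical helper, the status fetch is passed in as a callable so the loop runs without a server, and the terminal status names are an assumption to verify against your Conductor version.

```python
import time
from typing import Callable

# Assumed set of terminal workflow statuses; verify against your server version.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TIMED_OUT", "TERMINATED"}


def wait_until_terminal(fetch_status: Callable[[], str],
                        timeout_seconds: float = 120.0,
                        poll_interval_seconds: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_status() until it is terminal or the timeout passes.

    Returns the last status seen, which may still be 'RUNNING' on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES or clock() >= deadline:
            return status
        sleep(poll_interval_seconds)


# Stand-in for a real status lookup: two RUNNING polls, then COMPLETED.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_until_terminal(lambda: next(fake_statuses), sleep=lambda _: None))
```

Against a live server, `fetch_status` could be something like `lambda: executor.get_workflow(run.workflow_id, include_tasks=False).status`, assuming the returned workflow object exposes a `status` attribute.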

src/conductor/client/workflow/executor/workflow_executor.py

Lines changed: 15 additions & 2 deletions
@@ -91,8 +91,21 @@ def execute_workflow_with_return_strategy(self, request: StartWorkflowRequest, w
     def execute(self, name: str, version: Optional[int] = None, workflow_input: Any = None,
                 wait_until_task_ref: Optional[str] = None, wait_for_seconds: int = 10,
                 request_id: Optional[str] = None, correlation_id: Optional[str] = None, domain: Optional[str] = None) -> WorkflowRun:
-        """Executes a workflow with StartWorkflowRequest and waits for the completion of the workflow or until a
-        specific task in the workflow """
+        """Execute a workflow synchronously and wait for it to complete.
+
+        Returns when the workflow reaches a terminal state or ``wait_for_seconds`` elapses.
+        If the timeout fires first, returns ``status='RUNNING'`` with empty output, which is not an error.
+
+        **Getting RUNNING with no output after a worker exception?** The default ``wait_for_seconds=10`` is shorter than the
+        default task ``retryDelaySeconds=60``. A failing worker triggers a 60 s retry wait,
+        so the 10 s timeout fires while the retry is pending. Raise ``wait_for_seconds``
+        (e.g. 70) or inspect failed tasks::
+
+            wf = executor.get_workflow(run.workflow_id, include_tasks=True)
+            for task in wf.tasks:
+                if task.status in ('FAILED', 'FAILED_WITH_TERMINAL_ERROR'):
+                    print(task.reference_task_name, task.reason_for_incompletion)
+        """
         workflow_input = workflow_input or {}
         if request_id is None:
             request_id = str(uuid.uuid4())
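
The failed-task loop in the new docstring prints one line per failed attempt. When a task has been retried several times, it can help to group the reasons by task reference name. The sketch below is not part of the commit. `failed_task_reasons` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that carry only the three attributes the docstring reads (`reference_task_name`, `status`, `reason_for_incompletion`).

```python
from types import SimpleNamespace

FAILED_STATUSES = ("FAILED", "FAILED_WITH_TERMINAL_ERROR")


def failed_task_reasons(tasks):
    """Group failure reasons by reference_task_name, one entry per failed attempt."""
    reasons = {}
    for task in tasks:
        if task.status in FAILED_STATUSES:
            reasons.setdefault(task.reference_task_name, []).append(
                task.reason_for_incompletion)
    return reasons


# Stand-in task objects; with the SDK, pass wf.tasks from get_workflow instead.
sample_tasks = [
    SimpleNamespace(reference_task_name="fetch_ref", status="FAILED",
                    reason_for_incompletion="ValueError: bad input"),
    SimpleNamespace(reference_task_name="fetch_ref", status="SCHEDULED",
                    reason_for_incompletion=None),
    SimpleNamespace(reference_task_name="notify_ref", status="COMPLETED",
                    reason_for_incompletion=None),
]
print(failed_task_reasons(sample_tasks))  # {'fetch_ref': ['ValueError: bad input']}
```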
