
Commit e8702f5

Fix pdb / breakpoint() hang in workflow code (#1104)
When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt -- including from a sandboxed workflow run under pytest.

Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to avoid nesting inside the dispatch task's __step() and tripping Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends sandbox checks during each REPL interaction, and overrides q/Ctrl-D to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when breakpoint() is called from a workflow worker thread without debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged.

Adds a README subsection on debugging workflows and five tests at tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes #1104.
1 parent 24badcf commit e8702f5

6 files changed

Lines changed: 572 additions & 47 deletions
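
The first piece of the commit message (inline dispatch through `loop.call_soon`) is not visible in the hunks shown here, so the following is only a stdlib sketch of the general idea, not the worker's actual dispatch code. The helper names `run_inline`, `run_in_pool`, and `compare` are hypothetical. It shows that a callback scheduled with `call_soon` runs on the thread that owns the event loop (as a plain loop callback rather than inside a task step), while `run_in_executor` moves the call to a pool thread, which is where an interactive `input()` would get stuck.

```python
import asyncio
import threading


async def run_inline(func, *args):
    """Run a synchronous callable on the event loop thread via call_soon.

    Hypothetical helper: the callable runs as an ordinary loop callback and
    its outcome is handed back to the awaiting coroutine through a future.
    """
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def callback():
        try:
            fut.set_result(func(*args))
        except Exception as exc:
            fut.set_exception(exc)

    loop.call_soon(callback)
    return await fut


async def run_in_pool(func, *args):
    """Run the same callable on the default thread pool for comparison."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


async def compare():
    """Return (inline_on_loop_thread, pooled_on_loop_thread)."""
    loop_thread = threading.get_ident()
    inline_thread = await run_inline(threading.get_ident)
    pooled_thread = await run_in_pool(threading.get_ident)
    return inline_thread == loop_thread, pooled_thread == loop_thread
```

Calling `asyncio.run(compare())` reports whether each variant ran on the loop's own thread. A synchronous callable that blocks at a debugger prompt will also block the loop in the inline variant, which is presumably acceptable only in a debug mode.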


README.md

Lines changed: 75 additions & 0 deletions
@@ -82,6 +82,7 @@ informal introduction to the features and their implementation.
 - [Customizing the Sandbox](#customizing-the-sandbox)
 - [Passthrough Modules](#passthrough-modules)
 - [Invalid Module Members](#invalid-module-members)
+- [Debugging Workflows with `breakpoint()` / `pdb`](#debugging-workflows-with-breakpoint--pdb)
 - [Known Sandbox Issues](#known-sandbox-issues)
 - [Global Import/Builtins](#global-importbuiltins)
 - [Sandbox is not Secure](#sandbox-is-not-secure)
@@ -1241,6 +1242,80 @@ my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_

 See the API for more details on exact fields and their meaning.

+##### Debugging Workflows with `breakpoint()` / `pdb`
+
+Setting `debug_mode=True` on the `Worker` (or `TEMPORAL_DEBUG=1` in the environment) routes workflow activations
+onto the asyncio main thread instead of a worker thread pool. This lets `breakpoint()` and `pdb.set_trace()`
+inside workflow code open an interactive REPL — without it, pdb hangs because its `input()` call would run on a
+thread that does not own the controlling TTY.
+
+A minimal runnable example:
+
+```python
+import asyncio
+from datetime import timedelta
+
+from temporalio import workflow
+from temporalio.client import Client
+from temporalio.worker import Worker
+
+
+@workflow.defn
+class DebugMeWorkflow:
+    @workflow.run
+    async def run(self) -> str:
+        x = 42
+        breakpoint()  # interactive pdb prompt opens at this line
+        return f"x was {x}"
+
+
+async def main() -> None:
+    client = await Client.connect("localhost:7233")
+    async with Worker(
+        client,
+        task_queue="debug-me",
+        workflows=[DebugMeWorkflow],
+        debug_mode=True,
+    ):
+        result = await client.execute_workflow(
+            DebugMeWorkflow.run,
+            id="debug-me-wf",
+            task_queue="debug-me",
+            task_timeout=timedelta(minutes=10),  # see caveat below
+        )
+        print(result)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+Run with `python debug_me.py`, or under pytest with `pytest -s` (the `-s` flag disables pytest's stdin
+capture). At the `(Pdb)` prompt you'll land at the line where `breakpoint()` was called, with workflow
+locals in scope. Try `p x`, `n`, `c`, `q`.
+
+**Quitting cleanly.** Typing `q` or hitting Ctrl-D continues the workflow rather than raising `BdbQuit`
+(which would fail the workflow task). To genuinely abort, kill the outer process with Ctrl-C.
+
+Two caveats when pausing at a breakpoint inside a workflow:
+
+1. **Workflow task timeout.** Temporal expires a workflow task after ~10 seconds by default. If you sit at the
+   `(Pdb)` prompt longer than that, the server reassigns the task and your workflow replays from the start when
+   you continue — re-hitting the breakpoint. Pass `task_timeout=timedelta(minutes=N)` to `execute_workflow` /
+   `start_workflow` to give yourself debugging headroom:
+
+   ```python
+   await client.execute_workflow(MyWorkflow.run, ..., task_timeout=timedelta(minutes=10))
+   ```
+
+2. **Deterministic replay.** Workflows are deterministic and replay from history; any wall-clock pause violates
+   that contract. For post-mortem debugging without these caveats, use the [Replayer](#replayer) on a recorded
+   history instead of live debugging.
+
+Calling `breakpoint()` from sandboxed workflow code without `debug_mode` raises a sandbox
+`RestrictedWorkflowAccessError` with a message pointing at `debug_mode=True`, so the failure mode is loud
+and the fix is obvious.
+
 ##### Known Sandbox Issues

 Below are known sandbox issues. As the sandbox is developed and matures, some may be resolved.
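
The "Quitting cleanly" behaviour in the README hunk above can be tried without a Temporal server. The following is a stdlib-only sketch, assuming nothing about the worker: `ContinueOnQuitPdb` and `run_with_scripted_debugger` are hypothetical names, and the debugger is driven from an in-memory script instead of a terminal. It shows only the quit-to-continue override, not the sandbox handling.

```python
import io
import pdb
import sys


class ContinueOnQuitPdb(pdb.Pdb):
    """Pdb variant whose quit commands resume the program instead of aborting."""

    def do_quit(self, arg):
        self.message("quit requested; continuing instead")
        return self.do_continue(arg)

    # Route every spelling of "quit", including EOF (Ctrl-D), to the override.
    do_q = do_exit = do_quit
    do_EOF = do_quit


def run_with_scripted_debugger(script):
    """Stop in this function, feed `script` to the debugger, then finish."""
    out = io.StringIO()
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(script),  # scripted commands instead of a TTY
        stdout=out,
        nosigint=True,   # leave the SIGINT handler alone
        readrc=False,    # ignore any local .pdbrc
    )
    x = 42
    debugger.set_trace(sys._getframe())
    result = f"x was {x}"
    return result, out.getvalue()
```

With stock `pdb.Pdb`, the `q` command raises `bdb.BdbQuit` out of the traced frame; with the override, the function runs to completion and the transcript contains the notice printed by `do_quit`.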

temporalio/worker/_debugger.py

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+from __future__ import annotations
+
+import dataclasses
+import sys
+from types import FrameType, TracebackType
+
+import temporalio.workflow
+from temporalio.worker.workflow_sandbox._runner import SandboxedWorkflowRunner
+
+from ._workflow_instance import WorkflowRunner
+
+__all__ = [
+    "_install_workflow_breakpoint_hook",
+    "_relax_sandbox_for_debugger",
+    "_temporal_workflow_breakpoint_hook",
+]
+
+_ORIGINAL_BREAKPOINTHOOK = sys.breakpointhook
+
+
+def _build_workflow_pdb_class() -> type:
+    """Build a Pdb subclass that suspends sandbox restrictions during the REPL.
+
+    pdb's cmdloop touches ``readline.get_completer`` and other
+    sandbox-restricted internals each time it interacts with the user; we
+    bracket each interaction with ``_sandbox_unrestricted.value = True`` and
+    restore the previous value afterwards. Outside the REPL the sandbox
+    stays intact.
+
+    ``pdb`` is imported lazily because it's a debug-only dependency that
+    pulls in ``cmd``/``bdb``/``linecache``; no reason to pay that cost at
+    worker import time.
+    """
+    import pdb
+
+    from temporalio.workflow._sandbox import _sandbox_unrestricted
+
+    class _WorkflowPdb(pdb.Pdb):
+        # The `interaction` signature differs across Python versions: 3.10-3.12
+        # typeshed names the second parameter `traceback: TracebackType | None`,
+        # while 3.13+ renames it `tb_or_exc` and widens the type to include
+        # `BaseException`. No single signature satisfies both stubs, so we
+        # suppress the override check.
+        def interaction(  # type: ignore[override]
+            self,
+            frame: FrameType | None,
+            tb_or_exc: TracebackType | BaseException | None,
+        ) -> None:
+            prev = getattr(_sandbox_unrestricted, "value", False)
+            _sandbox_unrestricted.value = True
+            try:
+                super().interaction(frame, tb_or_exc)  # type: ignore[arg-type]
+            finally:
+                _sandbox_unrestricted.value = prev
+
+        # Override `q`/`quit`/`exit`/EOF (Ctrl-D) to behave like `continue`.
+        # Default pdb raises `BdbQuit`, which propagates as an uncaught
+        # exception out of workflow.run, fails the workflow task, and
+        # triggers a server retry storm during teardown. For a debug
+        # session the user almost always wants "stop debugging and let the
+        # workflow finish" — that's `continue`. Users who truly want to
+        # abort can Ctrl-C the outer shell.
+        def do_quit(self, arg: str) -> bool | None:
+            self.message(
+                "[Temporal] 'q'/Ctrl-D continues the workflow. "
+                "Ctrl-C the outer shell to abort."
+            )
+            return self.do_continue(arg)
+
+        do_q = do_exit = do_quit
+        do_EOF = do_quit
+
+    return _WorkflowPdb
+
+
+def _temporal_workflow_breakpoint_hook(*args: object, **kwargs: object) -> object:
+    """``sys.breakpointhook`` that handles ``breakpoint()`` inside workflows.
+
+    Only installed when ``debug_mode`` is enabled on the Worker. From inside
+    a workflow activation: drops the user into a custom Pdb at the workflow's
+    own frame, with sandbox restrictions suspended during the REPL. From
+    anywhere else (test code, helpers, etc.): delegates to whatever hook was
+    previously installed.
+    """
+    if not temporalio.workflow.in_workflow():
+        # Not inside a workflow activation — let pytest's wrapper, ipdb, or
+        # whatever else is configured handle it.
+        return _ORIGINAL_BREAKPOINTHOOK(*args, **kwargs)
+    # Inside a workflow: drop the user into pdb at the caller's frame (the
+    # workflow's `run` method, where breakpoint() was actually written) rather
+    # than landing inside this hook. Bypassing the configured breakpoint hook
+    # also avoids pytest's pdb wrapper, which assumes a test-code context and
+    # touches sandbox-restricted internals during its terminal-writer setup.
+    # `sandbox_unrestricted()` lifts member checks for the duration of the
+    # REPL so pdb's own initialization (readline, etc.) isn't blocked.
+    # `skip` tells pdb not to stop in our hook frame or the contextlib
+    # plumbing — without it pdb's first step lands at the `with` teardown
+    # instead of the user's next workflow line.
+    caller_frame = sys._getframe(1)
+    with temporalio.workflow.unsafe.sandbox_unrestricted():
+        pdb_cls = _build_workflow_pdb_class()
+        pdb_cls(
+            skip=[
+                "temporalio.worker._debugger",
+                "temporalio.workflow._sandbox",
+                "contextlib",
+            ]
+        ).set_trace(caller_frame)
+    return None
+
+
+def _install_workflow_breakpoint_hook() -> None:
+    """Set ``sys.breakpointhook`` to the workflow hook if it isn't already."""
+    if sys.breakpointhook is not _temporal_workflow_breakpoint_hook:
+        sys.breakpointhook = _temporal_workflow_breakpoint_hook
+
+
+def _relax_sandbox_for_debugger(workflow_runner: WorkflowRunner) -> WorkflowRunner:
+    """Allow ``breakpoint()`` past the sandbox so it can reach the worker hook.
+
+    The sandbox flags ``breakpoint`` as non-deterministic by default; without
+    this relaxation the call raises before our breakpoint hook can run.
+    Once inside the hook, the hook itself enters ``sandbox_unrestricted()``
+    for the duration of the debugger session, so pdb's internals (readline,
+    os.environ, etc.) aren't blocked either — without permanently dropping
+    sandbox checks for the rest of workflow execution.
+    """
+    if not isinstance(workflow_runner, SandboxedWorkflowRunner):
+        return workflow_runner
+
+    restrictions = workflow_runner.restrictions
+    invalid = restrictions.invalid_module_members
+    builtins_matcher = invalid.children.get("__builtins__")
+    if builtins_matcher is None:
+        return workflow_runner
+
+    # `breakpoint` may sit either in `children` (as a leaf matcher with a
+    # custom error message) or in `use` (the legacy flat form). Strip from
+    # whichever shape is present.
+    has_child = "breakpoint" in builtins_matcher.children
+    has_use = "breakpoint" in builtins_matcher.use
+    if not (has_child or has_use):
+        return workflow_runner
+
+    new_children = {
+        k: v for k, v in builtins_matcher.children.items() if k != "breakpoint"
+    }
+    new_use = set(builtins_matcher.use) - {"breakpoint"}
+    new_builtins = dataclasses.replace(
+        builtins_matcher, children=new_children, use=new_use
+    )
+    new_invalid = dataclasses.replace(
+        invalid, children={**invalid.children, "__builtins__": new_builtins}
+    )
+    new_restrictions = dataclasses.replace(
+        restrictions, invalid_module_members=new_invalid
+    )
+    return dataclasses.replace(workflow_runner, restrictions=new_restrictions)
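
The copy-on-write pattern used by `_relax_sandbox_for_debugger` above can be exercised in isolation. This is a minimal sketch, assuming a simplified matcher shape: `Matcher` and `without_member` are hypothetical stand-ins and not the sandbox's real types. It rebuilds only the path from the root to the edited node with `dataclasses.replace` and leaves the original objects untouched.

```python
import dataclasses
from typing import Dict, FrozenSet


@dataclasses.dataclass(frozen=True)
class Matcher:
    """Hypothetical stand-in for a sandbox matcher: named children plus a flat set."""

    children: Dict[str, "Matcher"] = dataclasses.field(default_factory=dict)
    use: FrozenSet[str] = frozenset()


def without_member(root: Matcher, module: str, member: str) -> Matcher:
    """Return a copy of `root` with `member` removed from `module`; never mutates."""
    target = root.children.get(module)
    if target is None:
        return root  # module is not restricted at all
    if member not in target.children and member not in target.use:
        return root  # nothing to strip, reuse the original object
    new_target = dataclasses.replace(
        target,
        children={k: v for k, v in target.children.items() if k != member},
        use=target.use - {member},
    )
    # Rebuild the parent with the edited child; sibling entries are shared.
    return dataclasses.replace(root, children={**root.children, module: new_target})
```

Because every level is rebuilt rather than edited in place, a default restrictions object shared by other workers is unaffected, which matches the commit's claim that nothing changes when `debug_mode` is not set.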
