
Commit 242a890

feat: add workflow stage resume (#747)
* feat: add workflow stage resume
* test: cover workflow resume output processors
* fix: harden workflow resume metadata
* fix: address workflow resume review feedback
* feat: add workflow review gate controls
* fix: validate workflow metadata shape
* fix: validate workflow rerun inputs
1 parent 597ad0b commit 242a890

4 files changed

Lines changed: 945 additions & 15 deletions


fern/versions/latest/pages/concepts/workflow-chaining.mdx

Lines changed: 29 additions & 1 deletion
@@ -102,10 +102,38 @@ workflow.add_stage("cleanup", cleanup)

This is useful for final cleanup, schema transforms, and format-specific export preparation.

## Resume

Workflow names are durable artifact identities. Reusing the same name with `resume=ResumeMode.IF_POSSIBLE` reuses compatible completed stages, resumes a matching partial stage through `DataDesigner.create(..., resume=ResumeMode.ALWAYS)`, and reruns the first changed or missing stage plus its descendants.

```python
from data_designer.interface import ResumeMode

results = workflow.run(resume=ResumeMode.IF_POSSIBLE)
```

Use `ResumeMode.ALWAYS` for strict resume. Until the first checkpoint is recovered, a changed stage or a missing selected output raises an error instead of starting fresh. If a matching partial stage resumes successfully, its descendants are recreated from that stage's current output.
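
The resume rules above can be pictured with a small toy planner. This is a sketch under stated assumptions, not Data Designer's implementation: `Mode`, `Stage`, `plan`, and the fingerprint fields are hypothetical names invented for illustration, and the model covers only linear stages.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the two resume modes named in the text."""

    IF_POSSIBLE = "if_possible"
    ALWAYS = "always"


@dataclass
class Stage:
    name: str
    fingerprint: str           # assumed hash of the stage's current configuration
    stored: str | None = None  # fingerprint saved with the artifact, None if absent
    complete: bool = False     # whether the stored artifact finished


def plan(stages: list[Stage], mode: Mode) -> dict[str, str]:
    """Map each stage of a linear workflow to 'reuse', 'resume', or 'run'."""
    actions: dict[str, str] = {}
    for i, stage in enumerate(stages):
        matches = stage.stored == stage.fingerprint
        if matches and stage.complete:
            actions[stage.name] = "reuse"
            continue
        if not matches and mode is Mode.ALWAYS:
            # Strict mode refuses to start fresh on a changed or missing stage.
            raise RuntimeError(f"cannot resume stage {stage.name!r}")
        # A matching partial stage is resumed; a changed or missing one is rerun.
        actions[stage.name] = "resume" if matches else "run"
        # Every descendant is rebuilt from the new upstream output.
        for later in stages[i + 1:]:
            actions[later.name] = "run"
        break
    return actions


stages = [
    Stage("drafts", fingerprint="a", stored="a", complete=True),
    Stage("expanded", fingerprint="b", stored="b", complete=False),
    Stage("cleanup", fingerprint="c"),
]
print(plan(stages, Mode.IF_POSSIBLE))
# {'drafts': 'reuse', 'expanded': 'resume', 'cleanup': 'run'}
```

In this model both modes give the same plan whenever the first stage that is not reusable is a matching partial one. They differ only when that stage has changed or is missing: `IF_POSSIBLE` reruns it, and `ALWAYS` raises.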

## Review gates

Use `targets` to materialize an intermediate stage without running the rest of the workflow. `export_stage()` writes the selected stage output to a file for review. After review, pass the approved Parquet file through `stage_output_overrides` and resume the downstream target.

```python
draft_results = workflow.run(targets="drafts")
draft_results.export_stage("drafts", "drafts_for_review.parquet")

results = workflow.run(
    targets="expanded",
    resume=ResumeMode.IF_POSSIBLE,
    stage_output_overrides={"drafts": "approved.parquet"},
)
```

If the reviewed data instead replaces a stage's selected output in place, run with `resume=ResumeMode.IF_POSSIBLE` and `rerun_from="expanded"` to rebuild the named stage (`expanded` here) and its descendants from the current boundary output.
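
To make the review-gate controls concrete, here is a toy model of which stages a linear workflow would rebuild for each call. It is a hedged sketch, not the library's logic: `stages_to_execute` is a hypothetical helper, the stage order `drafts`, `expanded`, `cleanup` is assumed, and every stage before the starting boundary is assumed to be reusable.

```python
from __future__ import annotations

from typing import Iterable


def stages_to_execute(
    stages: list[str],
    target: str | None = None,
    overrides: Iterable[str] = (),
    rerun_from: str | None = None,
) -> list[str]:
    """Return, in order, the stages a linear workflow would rebuild (toy model)."""
    # `target` caps the run: nothing after the target stage is executed.
    end = stages.index(target) + 1 if target is not None else len(stages)
    start = 0
    # An overridden stage is not executed because its output comes from the
    # supplied file, so work begins right after the last overridden stage.
    for name in overrides:
        start = max(start, stages.index(name) + 1)
    # `rerun_from` forces a rebuild starting at the named stage.
    if rerun_from is not None:
        start = max(start, stages.index(rerun_from))
    return stages[start:end]


order = ["drafts", "expanded", "cleanup"]  # assumed linear order

print(stages_to_execute(order, target="drafts"))
# ['drafts']
print(stages_to_execute(order, target="expanded", overrides={"drafts"}))
# ['expanded']
print(stages_to_execute(order, rerun_from="expanded"))
# ['expanded', 'cleanup']
```

The three calls mirror the draft run, the override run, and the in-place replacement case, in that order.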

## Current limits

- Stages are linear. DAGs, parallel branches, and joins are planned separately.
- Stage-level resume is not implemented yet. (Removed by this commit.)
- `push_to_hub()` does not support selected processor or callback outputs yet. Use `export()` for the selected workflow output.
- `on_success` callbacks are trusted user code. If a callback returns a path, Data Designer reads that path as the next stage input.
- The artifact layout is intended for inspection, but it is not yet a stable public contract.
