Replies: 1 comment
I've created an issue from this so it's easier to track: #3758. Please follow along there. Closing this discussion.
Summary
When resuming a workflow from a checkpoint after a failure, the failed executor doesn't re-execute. The workflow appears to skip directly to completion, treating the previously-failed executor as if it had succeeded.
Environment
InMemoryCheckpointStorage (custom TrackingCheckpointStorage wrapper)
The Problem
What I'm Trying to Do
I have a sequential order processing workflow with these steps:
Tool-1 - Fetch pricing data
Tool-2 - Execute validation rules
Tool-3 - Convert rules to natural language
Tool-5 - Save results to database
workflow_complete - Terminal node
When an executor fails (e.g., step 3 throws an exception), I want to:
What Actually Happens
When I resume from a checkpoint:
ExecutorStartedEvent, no state updates)
Expected Behavior
When resuming from a checkpoint, I expect:
ExecutorStartedEvent, MessageSentEvent, etc.)
Code Example
Checkpoint Capture
Resume Attempt
Workflow Structure
Observed Logs
First Attempt (Initial Run)
Second Attempt (Resume)
That's it. No executor events, no re-execution.
What I've Tried
Using run_from_checkpoint() instead of run_stream()
Capturing different checkpoints
Creating new workflow instance per retry
Adding a terminal executor
Verifying checkpoint contains state
Questions
Is run_stream(checkpoint_id=X, checkpoint_storage=storage) the correct API for resuming?
Do I need to configure executors differently for checkpoint resume?
Does checkpoint resume require workflow state to be structured differently?
Is there a way to debug which checkpoint version is loaded?
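On the last question, the kind of visibility I have in mind looks roughly like the wrapper below. This is only a sketch: `DictStorage`, `LoggingStorage`, and the `save`/`load` method names are hypothetical stand-ins chosen for illustration, not the framework's actual storage interface, which may be asynchronous and use different method names.

```python
import logging

logger = logging.getLogger("checkpoint_debug")


class DictStorage:
    """Hypothetical in-memory store (NOT the framework's class)."""

    def __init__(self):
        self._items = {}

    def save(self, checkpoint_id, data):
        self._items[checkpoint_id] = data

    def load(self, checkpoint_id):
        return self._items[checkpoint_id]


class LoggingStorage:
    """Wraps a store and records every load, so the checkpoint
    actually used for a resume can be inspected afterwards."""

    def __init__(self, inner):
        self._inner = inner
        self.loaded_ids = []  # checkpoint ids in the order they were loaded

    def save(self, checkpoint_id, data):
        logger.info("save %s", checkpoint_id)
        self._inner.save(checkpoint_id, data)

    def load(self, checkpoint_id):
        data = self._inner.load(checkpoint_id)
        self.loaded_ids.append(checkpoint_id)
        logger.info("load %s keys=%s", checkpoint_id, sorted(data))
        return data
```

After a resume attempt, `loaded_ids` would show whether any checkpoint was read at all and, if so, which one.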
What Would Help
Additional Context
I'm implementing automatic retry with checkpoint resume to avoid redundant work. The workflow makes external API calls that are expensive/slow, so I want to:
The checkpoint system seems to save state correctly, but the resume mechanism doesn't trigger executor re-execution, making checkpoints ineffective for retry scenarios.
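To make the semantics I expect concrete, here is a minimal, framework-independent sketch. Every name in it (`Checkpoint`, `StepFailed`, `run`, the `tool_*` functions) is hypothetical and exists only for this illustration; none of it is the framework's API. The key point is that the checkpoint taken on failure points at the failed step, so a resume starts by re-executing that step.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable

Step = tuple[str, Callable[[dict], None]]  # (name, function that mutates state)


@dataclass
class Checkpoint:
    next_step: int  # index of the first step that has not completed yet
    state: dict = field(default_factory=dict)


class StepFailed(Exception):
    def __init__(self, name: str, checkpoint: Checkpoint):
        super().__init__(f"step {name!r} failed")
        self.checkpoint = checkpoint


def run(steps: list[Step], checkpoint: Checkpoint | None = None,
        log: list[str] | None = None) -> dict:
    """Run steps in order, starting at the checkpoint's next_step if given."""
    start = checkpoint.next_step if checkpoint else 0
    state = copy.deepcopy(checkpoint.state) if checkpoint else {}
    for index in range(start, len(steps)):
        name, fn = steps[index]
        snapshot = copy.deepcopy(state)  # state as of the last successful step
        if log is not None:
            log.append(f"started:{name}")
        try:
            fn(state)
        except Exception as exc:
            # The checkpoint points AT the failed step, so resume re-runs it.
            raise StepFailed(name, Checkpoint(index, snapshot)) from exc
    return state


attempts = {"tool_3": 0}


def tool_1(state): state["pricing"] = "fetched"
def tool_2(state): state["rules"] = "validated"
def tool_5(state): state["saved"] = True


def tool_3(state):
    attempts["tool_3"] += 1
    if attempts["tool_3"] == 1:  # fail only on the first attempt
        raise RuntimeError("transient failure")
    state["text"] = "converted"


STEPS = [("Tool-1", tool_1), ("Tool-2", tool_2),
         ("Tool-3", tool_3), ("Tool-5", tool_5)]


def demo():
    attempts["tool_3"] = 0
    first_log, second_log, final = [], [], None
    try:
        final = run(STEPS, log=first_log)
    except StepFailed as failure:
        final = run(STEPS, checkpoint=failure.checkpoint, log=second_log)
    return first_log, second_log, final
```

In this model the second attempt starts Tool-3 again and then continues to Tool-5, while Tool-1 and Tool-2 are not repeated. That is the behavior I was expecting from the resume.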
Any guidance would be greatly appreciated!