feat(state,engine): run checkpointing + resume (agentctl run --resume) #127
Merged
Persist per-run execution snapshots in SQLite so workflow runs can pause and resume. Adds migration 003, the RunCheckpoint model, SaveCheckpoint, GetLatestCheckpoint, and UpdateRunStatus; checkpoints cascade-delete when trace retention prunes their runs. Closes #105 (state layer). Co-authored-by: Cursor <cursoragent@cursor.com>
Write canonical JSON checkpoints after every completed step. Resume rehydrates the interpolation context from the latest checkpoint. ErrInterrupted signals a clean pause for future approval gates; tests can stub one via InterruptAfterStepIndex. Co-authored-by: Cursor <cursoragent@cursor.com>
Resume reuses the persisted run row and checkpoint, emits run.resumed trace events, and exits 0 on interrupted runs awaiting human action. Co-authored-by: Cursor <cursoragent@cursor.com>
ReviewGate [WARN]
Automated review
Summary
The feature introduces run checkpointing and resume functionality.
Findings
Persist each checkpoint before writing the step's succeeded run_steps row, closing the crash replay window. Pin workflow_spec_hash and environment_name on runs (migration 004) and reject resume on drift. Add a checkpoint payload version, size bounds, and step validation; typed run status constants; a DRY prepareProject helper; CLI Args validation (exit 2); and a happy-path resume integration test. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Adds a run_checkpoints table (migration 003) with SaveCheckpoint, GetLatestCheckpoint, and UpdateRunStatus on RuntimeStore. Checkpoints cascade when trace retention prunes runs.
After every completed step, the engine writes a canonical JSON checkpoint of the interpolation context (${input.*}, ${steps.*}) and accumulated cost. Resume continues from the next step without replaying earlier steps.
agentctl run --resume <run-id> rehydrates and continues an interrupted or crash-recovered run. Interrupted runs exit cleanly (status interrupted, exit code 0).
Closes #105
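A minimal sketch of how a rehydrated context could feed ${input.*} and ${steps.*} interpolation, assuming a flat map keyed by dotted paths. The regular expression, the key layout, and failing on unresolved references are assumptions, not the engine's documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches ${input.<name>} and ${steps.<path>} references.
var placeholder = regexp.MustCompile(`\$\{((?:input|steps)\.[A-Za-z0-9_.]+)\}`)

// interpolate resolves placeholders against a rehydrated context; the first unknown key is an error.
func interpolate(tmpl string, ctx map[string]string) (string, error) {
	var missing error
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		v, ok := ctx[key]
		if !ok && missing == nil {
			missing = fmt.Errorf("unresolved reference ${%s}", key)
		}
		return v
	})
	return out, missing
}
```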
Design notes
context_json is engine-owned and opaque to the CLI surface.
engine.ErrInterrupted plus the optional InterruptAfterStepIndex stub simulate approval-gate pauses until HITL (feat(engine,policy): human-in-the-loop approvals (approve | reject | edit | switch) #106) lands.
New trace events: run.interrupted, run.resumed.
Test plan
make ci (gofmt, vet, go test -race ./...)
DeleteRunsStartedBefore
running checkpoint; reject completed checkpoint
--resume validation paths (missing run, conflicting args)
Made with Cursor