Build resume lifecycle + SKIPPED status UI polish#141
Merged
Conversation
Fix the silent reuse of resume_build_id in sd.build*: the SDK now emits
a BUILD_RESUMED event when an existing build is picked up, so a build
that previously terminated (FAILED/COMPLETED/etc.) flips back to RUNNING
and surfaces "running (resumed)" in the UI. Builds list is now ordered
by Build.last_active_at so a resumed build jumps to the top instead of
staying buried at its original created_at position.
Backwards compat:
- New SDK + old API: POST /builds/{id}/resume 404s, the registry client
swallows the missing-route 404 with a warning (same pattern as
task_skip / dependencies). Build runs to completion locally.
- Old SDK + new API: no-op, behaviour unchanged. Existing rows get
last_active_at backfilled from created_at by the migration.
Also includes UI polish on TaskStatus rendering (drive-by, asked for in
the same PR):
- Add "skipped" to the TaskStatus type and render it in amber across
StatusBadge / TaskNode / TaskExplorerTable so it's visible against
dark-blue table rows (was effectively black-on-dark-blue before).
- Add the missing statuses (suspended, completed-set) to the BuildView
status filter dropdown and to the search-bar status autocomplete on
the API side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds first-class “build resume” semantics end-to-end so sd.build(resume_build_id=...) updates registry/UI state (instead of silently reusing an ID that remains FAILED in the UI), and polishes UI support for the skipped task status.
Changes:
- SDK: emit
BUILD_RESUMEDvia newbuild_resume/build_resume_aioregistry calls (with backwards-compatible 404 swallow for older APIs). - API/DB: add
POST /builds/{id}/resume, deriveis_resumedin build status replay, and addBuild.last_active_atto sort builds by recent activity. - UI: render “running (resumed)” for resumed builds; add
skippedstatus to types, filters, and consistent amber styling.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| lib/stardag/tests/test_registry/test_api_registry.py | Adds tests verifying missing-route 404 handling for build resume calls. |
| lib/stardag/tests/test_build/test_sequential.py | Ensures sequential build path calls build_resume when resume_build_id is provided. |
| lib/stardag/tests/test_build/test_concurrent.py | Ensures concurrent async build path calls build_resume_aio when resume_build_id is provided. |
| lib/stardag/tests/test_build/conftest.py | Extends the recording registry to track build_resume/build_resume_aio invocations. |
| lib/stardag/src/stardag/registry/_base.py | Introduces no-op build_resume/build_resume_aio APIs for registry backends. |
| lib/stardag/src/stardag/registry/_api_registry.py | Implements POST /builds/{id}/resume calls with 404 backwards-compat swallowing + logging. |
| lib/stardag/src/stardag/build/_sequential.py | Emits resume event to registry when resuming builds (sync + async sequential). |
| lib/stardag/src/stardag/build/_concurrent.py | Emits resume event to registry when resuming builds (async concurrent). |
| app/stardag-ui/src/types/task.ts | Adds skipped task status and optional build.is_resumed field. |
| app/stardag-ui/src/components/TaskNode.tsx | Adds amber styling for skipped tasks in DAG nodes. |
| app/stardag-ui/src/components/TaskFilters.tsx | Adds Skipped/Cancelled options to the status filter dropdown. |
| app/stardag-ui/src/components/TaskExplorerTable.tsx | Adds colors for suspended/skipped/cancelled and ensures pending fallback styling. |
| app/stardag-ui/src/components/StatusBadge.tsx | Adds amber styling for skipped status (normal + muted). |
| app/stardag-ui/src/components/BuildView.tsx | Passes is_resumed into BuildStatusBadge in breadcrumbs. |
| app/stardag-ui/src/components/BuildStatusBadge.tsx | Renders “running (resumed)” label + tooltip when isResumed is true. |
| app/stardag-ui/src/components/BuildsList.tsx | Reuses BuildStatusBadge and surfaces resumed state in builds list. |
| app/stardag-api/tests/test_search.py | Pins /tasks/search/values?key=status to return full filterable status set. |
| app/stardag-api/tests/test_builds.py | Adds coverage for resume behavior, is_resumed semantics, and list ordering by activity. |
| app/stardag-api/src/stardag_api/services/status.py | Adds BUILD_RESUMED replay semantics + is_resumed return value in derived build status. |
| app/stardag-api/src/stardag_api/schemas.py | Extends BuildResponse with is_resumed. |
| app/stardag-api/src/stardag_api/routes/search.py | Returns full filterable status value suggestions (excluding unregistered). |
| app/stardag-api/src/stardag_api/routes/builds.py | Adds resume endpoint, touches last_active_at on event creation, and sorts builds by last_active_at. |
| app/stardag-api/src/stardag_api/models/enums.py | Adds EventType.BUILD_RESUMED. |
| app/stardag-api/src/stardag_api/models/build.py | Adds Build.last_active_at column + index for environment/activity sorting. |
| app/stardag-api/migrations/versions/20260507_121615_add_build_resumed_event_type_and_last_.py | Migrates last_active_at with backfill + index. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address review feedback: the previous design touched ``Build.last_active_at``
on every task event, which would serialise concurrent task workers on a
row-level exclusive lock against the same build row and bloat ``builds``
with MVCC versions under the kind of high task concurrency stardag
explicitly supports (Modal/Prefect distributed runners).
Switch to lifecycle-only touching: the column is bumped only by the five
build-level event handlers (BUILD_RESUMED / BUILD_COMPLETED /
BUILD_FAILED / BUILD_CANCELLED / BUILD_EXIT_EARLY) and the INSERT
default. The hot path (task event endpoints, single + bulk register) no
longer issues an UPDATE against ``builds``. The list-builds sort key is
now "most recent lifecycle change", which is the cleaner semantic — a
long-running mid-execution build won't bump position, but its
``status=running`` badge already signals activity, and a resumed build
still pops to the top because BUILD_RESUMED is itself a lifecycle event.
Also pick up two smaller review nits:
- ``list_builds`` now orders by ``(last_active_at desc, id desc)`` —
``Build.id`` is a UUID7 so it's a stable, time-correlated tiebreaker
for builds that share a timestamp. Without it, paginating across a
tie can yield duplicates or skips.
- ``test_list_builds_orders_resumed_first`` adds short ``asyncio.sleep``
gaps between creates so it's robust on coarse-resolution CI clocks,
and ``test_resume_build_not_found`` now pins the resource-level
response body (``{"detail": "Build not found"}``) so the SDK's
missing-route-vs-resource-404 distinction stays explicit on the API
side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Copilot review on PR #141: ``get_build_status`` documented ``is_resumed`` as "latest build-level event is BUILD_RESUMED", but the BUILD_STARTED branch wasn't resetting the flag. The SDK never emits BUILD_STARTED after BUILD_RESUMED in normal flow (it's one-or-the-other at build init), so this is a theoretical bug, not an observable one. But the replay shouldn't rely on event ordering — admin/manual event inserts could produce any sequence — so make BUILD_STARTED clear the flag like the terminal branches already do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
andhus
added a commit
that referenced
this pull request
May 8, 2026
Patch release covering the build-resume lifecycle fix + SKIPPED status UI polish merged in #141. No breaking changes — pip install -U stardag is sufficient. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a UX bug where
sd.build(resume_build_id=...)silently reused the build_id without notifying the registry — so a previously-failed build kept showing as failed in the UI even while the SDK was actively running tasks under it. Resumed builds now flip back to RUNNING with a "running (resumed)" badge and jump to the top of the Home list.What's in the PR
BUILD_RESUMEDevent whenresume_build_idis set (inbuild,build_aio,build_sequential,build_sequential_aio).POST /builds/{id}/resumeendpoint, newEventType.BUILD_RESUMED, status replay treats it likeBUILD_STARTED(flips status toRUNNING, clearscompleted_at, setsis_resumed: trueuntil the next terminal event).Build.last_active_atcolumn, touched only on build-level lifecycle events (BUILD_RESUMED/BUILD_COMPLETED/BUILD_FAILED/BUILD_CANCELLED/BUILD_EXIT_EARLY). Build list sorts by(last_active_at desc, id desc)— UUID7 tiebreaker keeps pagination stable across timestamp ties. Migration backfills existing rows fromcreated_at.BuildStatusBadgeshowsrunning (resumed)when applicable.last_active_atwrite semanticsTouching is intentionally confined to lifecycle events, not every task event. With the high task concurrency stardag supports (Modal/Prefect distributed runners can fan out hundreds of concurrent task events against one build), an UPDATE per task event would serialise workers on a row-level exclusive lock against the build row and bloat
buildswith MVCC versions. The trade-off is that a long-running mid-execution build won't bump its position from intermediate task events — but itsstatus=runningbadge already signals activity, and "most recent lifecycle change" is the cleaner sort key. The bug-fix goal (resumed build pops to the top) is preserved becauseBUILD_RESUMEDis itself a lifecycle event.A read-time JOIN against
MAX(events.created_at)was also considered but rejected: it would add an O(builds_in_env) cost to every list query (≈100ms for a 1k-build environment, scaling poorly with workspace size), versus the column-based design's sub-millisecond list latency at any scale.Backwards compatibility
POST /resume404s; the registry client swallows the missing-route 404 with a warning (same_is_route_not_foundpattern astask_skip/dependencies). Build still runs to completion; only the registry-side status flip is missing until the API is upgraded.last_active_atis set on insert by DEFAULT and gets bumped by the new build-level handlers, so list ordering is correct without SDK cooperation.Drive-by UI polish (same PR by request)
TaskStatustype now includes"skipped"(was missing →StatusBadgerendered black on default for skipped, near-invisible on dark rows).skippednow uses amber (warmer thanpending's yellow, distinct fromcancelled's grey) consistently acrossStatusBadge,TaskNode(DAG), andTaskExplorerTable.TaskFiltersdropdown: added missingSkippedandCancelledoptions./tasks/search/values?key=statusautocomplete: was hardcoded to{pending, running, completed, failed}; now returns the full filterable set (still excludes the internalunregisteredphantom marker).Test plan
pytest app/stardag-api/tests/) — 257 passed including 4 new resume tests + 1 status-values pinpytest lib/stardag/tests/ --ignore=tests/test_integration) — 525 passed including 2 new build_resume tests + 4 new 404-fallback testssd.build(resume_build_id=<failed-build-id>)and verify the UI badge flips to "running (resumed)" and the build moves to the top of Home🤖 Generated with Claude Code