ci: cap the E2E job with timeout-minutes #145
Open
viniciusdc wants to merge 1 commit into
…runner

The E2E job had no timeout, so when a setup step hangs (a foundational-services install recently wedged on an unbounded readiness wait, well before the operator is even deployed) the job sat in-progress until GitHub's 6h default, holding a runner the whole time.

Add a 30-minute job cap -- a healthy run finishes well under that, so this only bites genuine hangs, failing them in minutes.

This is a standalone resilience fix. Replacing the hand-rolled foundational setup (dev/scripts/services/*) with the nebari-sandbox action is tracked separately.
Summary
Add a `timeout-minutes: 30` cap to the `test-e2e` job. It had no timeout, so a hung setup step ran until GitHub's 6-hour default, holding a runner the entire time.

Why
On a recent PR the `Install foundational services` step (`dev/scripts/services/install.sh`) wedged, apparently on an unbounded readiness wait, and the job sat in-progress for 40+ minutes with no end in sight (it would have run to 6h). That step runs before the operator is even deployed, so it's an environment/setup hang, not a code failure. A job-level cap is the robust backstop regardless of which inner command stalls.

A healthy E2E run finishes well under 30 minutes (the foundational step normally completes in ~2 min), so this only ever trips on a genuine hang, failing it in minutes instead of squatting a runner.
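For reviewers who want to see where the cap sits, here is a minimal sketch of the job after this change. Only the job id (`test-e2e`), the step name, the script path, and `timeout-minutes: 30` come from this PR; the `runs-on` label, the checkout step, and the exact `run` invocation are assumptions for illustration and may not match the real workflow.

```yaml
# Hypothetical excerpt of the E2E workflow, not a copy of the real file.
jobs:
  test-e2e:
    runs-on: ubuntu-latest        # assumed runner label
    # Job-level cap: applies to the whole job, so it triggers no matter
    # which step stalls. Without it GitHub's default is 360 minutes (6 h).
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4 # assumed checkout step
      - name: Install foundational services
        # Assumed invocation; this is the step that wedged on a readiness wait.
        run: dev/scripts/services/install.sh
```

Placing the key on the job rather than on an individual step is what makes it a backstop: a per-step `timeout-minutes` would only cover the step it is attached to.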
Scope
Deliberately minimal and standalone. The strategic fix, replacing the hand-rolled `dev/scripts/services/*` setup with the `nebari-dev/action-nebari-sandbox` `platform` profile, is a larger, multi-repo effort tracked separately. This just stops the bleeding in the meantime.

Test plan
jobs.test-e2e.timeout-minutes == 30.