Prevent race leading to stale instances and runners

We observed a gradual buildup of orphaned runners and instances that ultimately caused our CI to grind to a halt.

The problem seems to be that the `start` job is not marked as `always()`. This opens a race condition where an EC2 instance is started, but the `start` job gets cancelled before it reports back. In that case, the `stop` job can't terminate the instance because it has not yet received its name. Similarly, a runner can be left orphaned.

It seems that the `start` job in an `ec2-github-runner`-based workflow _must_ be marked `always()` so that it cannot be cancelled and the above race cannot happen.
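A rough sketch of what this could look like, assuming a workflow layout similar to the one in the project's README. The job names, the action version tag, the secret name, and the AMI, subnet, and security group values are all placeholders, and the AWS credentials steps are omitted for brevity:

```yaml
jobs:
  start-runner:
    # Proposed change: with always(), the job's `if` condition still
    # evaluates to true on cancellation, so the job should run to
    # completion and publish its outputs.
    if: ${{ always() }}
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    steps:
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2  # version tag is an assumption
        with:
          mode: start
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}  # placeholder secret name
          ec2-image-id: ami-123            # placeholder
          ec2-instance-type: t3.nano       # placeholder
          subnet-id: subnet-123            # placeholder
          security-group-id: sg-123        # placeholder

  stop-runner:
    needs: start-runner
    # The stop job is already expected to run unconditionally.
    if: ${{ always() }}
    runs-on: ubuntu-latest
    steps:
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2  # version tag is an assumption
        with:
          mode: stop
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}  # placeholder secret name
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
```

In a real workflow, `stop-runner` would also list the job that does the actual work in its `needs`, so the instance is only terminated after that job finishes.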

Note that cancellation of `start` jobs is common if the workflow is part of a concurrency group. For example, if the workflow is triggered by updates to a given PR, occasional fast back-to-back updates to that PR can trigger the race and lead to the buildup of orphaned runners and instances.
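For illustration, a concurrency setup along these lines is the kind that would cancel an in-flight `start` job when a newer push to the same PR arrives. The group expression is only an example, not our exact configuration:

```yaml
on:
  pull_request:

concurrency:
  # One group per workflow and ref: a new run for the same PR
  # cancels the run that is still in progress.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```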

**Suggestion:** Change the `README.md` to mark the `start` job as `always()`, and document why this is important.


Prevent race leading to stale instances and runners #206
