Skip to content

Trap EXIT to deregister runner when runner exits by itself#582

Open
jonathan528i wants to merge 6 commits intomyoung34:masterfrom
jonathan528i:users/jonathan528i/4-8/exit-fix
Open

Trap EXIT to deregister runner when runner exits by itself#582
jonathan528i wants to merge 6 commits intomyoung34:masterfrom
jonathan528i:users/jonathan528i/4-8/exit-fix

Conversation

@jonathan528i
Copy link
Copy Markdown

@jonathan528i jonathan528i commented Apr 8, 2026

Currently, if the runner application crashes or exits natively with an error (e.g., exit code 1), the deregister_runner function in entrypoint.sh is never called. This leaves orphaned runner instances in github. Closes #581.

The current trap code only looks for external signals to kill/stop the runner process, but not when the runner process exits by itself. By also trapping EXIT, we can ensure deregister_runner is called whenever the runner exits.

Changes

  • entrypoint.sh: Added EXIT to the trap_with_arg list so the script guarantees deregistration regardless of how the container's main process terminates.
  • goss_trap_exit.yaml: Added a new test to validate if exit code 1 is received, that deregister happens correctly.
  • .github/workflows/test.yml: Integrated the new test into the CI pipeline for both Ubuntu and Debian builds.

Testing

Locally ran the new test without the change to trap and verified it failed. Re-ran the test with the change and verified it passes. Also ran goss_full.yaml and goss_reusage_fail to ensure the change didn't break existing tests.

@jonathan528i jonathan528i changed the title Trap EXIT Trap EXIT to deregister runner when runner exits by itself Apr 8, 2026
@myoung34
Copy link
Copy Markdown
Owner

The new test looks like theres a race condition to pass

@jonathan528i
Copy link
Copy Markdown
Author

jonathan528i commented Apr 10, 2026

The new test looks like theres a race condition to pass

My best guess is due to the exit call in deregister_runner function. I'm not sure if the exit call in different bash environments will either return the error code that triggered the trap, or the last executed command before the exit call in the function (which would be 0, therefore causing the test to fail since it's expecting 1)?

I've updated the code but we'll have to see if that fixes the tests once the pipeline runs again.

@myoung34
Copy link
Copy Markdown
Owner

Seems to always fail for arm64

@myoung34
Copy link
Copy Markdown
Owner

My concern is how flakey these tests are; arm64 fails consistently

Comment thread entrypoint.sh
}

deregister_runner() {
COMMAND_EXIT_CODE=$?
Copy link
Copy Markdown
Owner

@myoung34 myoung34 Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue here is 2 things:

  • that EXIT doesnt trap signals
  • qemu is doing something differently with how EXIT works here (why arm64 only, cant confirm what though)
Suggested change
COMMAND_EXIT_CODE=$?
local COMMAND_EXIT_CODE="${1:-EXIT}"
echo "Caught ${COMMAND_EXIT_CODE} - Deregistering runner"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deregistration doesn't happen when runner process exits or crashes

2 participants