Commit 78f98d1

fix: revert non-root runner bootstrap, keep the rest of Phase 4 (#19)
Phase 4 (#18) landed three independent hardenings in one PR:

- New configurable runner-version input (no runtime impact)
- Ephemeral + checksum + set -euo pipefail (additive safety)
- Root to non-root runner user via sudo-heredoc (behavioral change)

The dogfood rotation on terraform-provider-namecheap#182 failed: 'Start self-hosted EC2 runner' timed out at 6m15s waiting for runner registration. The EC2 instance booted fine, but whatever the user-data did inside the instance, it didn't end at './run.sh' polling GitHub. We can't post-mortem directly because the instance is ephemeral and already terminated.

Fix-forward strategy: revert ONLY the non-root transition (highest-probability culprit among the three axes), keep everything else from Phase 4. If the Phase 4 dogfood rotation passes after this revert, the root-to-runner sudo-heredoc is the breaker and can be investigated as an isolated follow-up (likely candidates: sudoers config on the hardened AMI, SELinux context, config.sh writing outside its own directory, or my heredoc quoting). Landing the safer pieces now unblocks Phases 5/6/7.

Kept:

- runner-version input (Phase 4's main feature)
- set -euo pipefail
- --ephemeral + --unattended + --disableupdate on config.sh
- SHA-256 verification of the runner tarball
- Clearer bash syntax ('case "$(uname -m)"', double-quoted vars)

Reverted:

- useradd + sudo -u runner -H bash <<'RUNNER_BOOTSTRAP' heredoc
- RUNNER_ALLOW_RUNASROOT=1 restored (runner executes as root again)

The non-root goal isn't lost; a follow-up issue will propose it again, this time with better instrumentation so we can see what failed inside the instance.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
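The "better instrumentation" planned for the follow-up is not specified in this commit. One possible shape, as a sketch only, is to prepend lines to the user-data script that mirror all output to a log file and the system console, and that report the failing line when `set -e` aborts the script. The helper name `withBootstrapLogging`, the log path, and the `user-data` logger tag are hypothetical and not part of this repository.

```javascript
// Hypothetical helper: not part of this commit or of the action's code.
// Takes the user-data script as an array of lines (the shape used in
// startEc2Instance) and returns a new array with logging lines inserted
// right after the leading shebang and 'set ...' lines, if present.
function withBootstrapLogging(lines, logPath = '/var/log/user-data.log') {
  const preamble = [
    // Mirror stdout/stderr to a file, syslog, and the system console.
    `exec > >(tee ${logPath} | logger -t user-data -s 2>/dev/console) 2>&1`,
    // With 'set -e' active, report where the script died before it exits.
    `trap 'echo "bootstrap failed at line $LINENO (exit $?)"' ERR`,
  ];
  let head = 0;
  if (lines[head] && lines[head].startsWith('#!')) head += 1;
  if (lines[head] && lines[head].startsWith('set ')) head += 1;
  return [...lines.slice(0, head), ...preamble, ...lines.slice(head)];
}

const script = withBootstrapLogging([
  '#!/bin/bash',
  'set -euo pipefail',
  'mkdir actions-runner && cd actions-runner',
]);
console.log(script.join('\n'));
```

With output routed to the console device, a failed bootstrap could be inspected through the instance's console output rather than by logging in to the instance. Whether that output remains retrievable long enough after an ephemeral instance terminates would need to be checked.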
1 parent 7b949a3 commit 78f98d1

2 files changed

Lines changed: 36 additions & 60 deletions

dist/index.js

Lines changed: 18 additions & 30 deletions
@@ -87903,42 +87903,32 @@ async function resolveImageId(client) {
 async function startEc2Instance(label, githubRegistrationToken) {
   const client = ec2Client();
 
-  // User-data runs as root. We install dependencies + create a dedicated
-  // 'runner' user, then drop to that user for every subsequent step via
-  // a sudo-heredoc. The runner never needs root and never gets it; the
-  // old RUNNER_ALLOW_RUNASROOT=1 escape hatch is gone.
+  // User-data runs as root. Phase 4's original attempt to drop to a
+  // dedicated 'runner' user via sudo-heredoc broke dogfood in
+  // terraform-provider-namecheap#182 — the EC2 instance came up but the
+  // runner never registered within the 5 min action timeout. Reverted
+  // here to the root-execution path the pre-Phase-4 bootstrap used,
+  // isolating the non-root move for a separate investigation.
   //
-  // Runner version is read from config so consumers can override without
-  // waiting for an action release (see #10 for the motivation chain).
-  //
-  // The tarball is SHA-256 verified against actions/runner's published
-  // checksum before extraction — same defense-in-depth pattern the
-  // provider repo uses for its Go / Terraform downloads.
-  //
-  // --ephemeral tells GitHub to auto-deregister the runner after it
-  // completes a single job; the stop-runner step's explicit removeRunner()
-  // call becomes belt-and-braces rather than the primary deregister path.
+  // Kept from the Phase 4 work (all verified independently of the
+  // root/non-root axis):
+  //   - set -euo pipefail — fail fast on any bootstrap error.
+  //   - --ephemeral + --unattended + --disableupdate on config.sh —
+  //     one-job runner, no interactive prompts, no runner auto-update.
+  //   - SHA-256 verification of the runner tarball against the
+  //     published .sha256 sidecar before extraction.
+  //   - Parameterized runner-version via config.input.runnerVersion.
   const runnerVersion = config.input.runnerVersion;
   const owner = config.githubContext.owner;
   const repo = config.githubContext.repo;
-  const userDataScript = [
+  const userData = [
     '#!/bin/bash',
     'set -euo pipefail',
     '',
-    '# Root-required setup.',
     'mount -o remount,size=1G /tmp',
-    'yum install -y libicu make sudo',
+    'yum install -y libicu make',
     '',
-    '# Create the non-root runner user.',
-    'if ! id runner >/dev/null 2>&1; then',
-    '  useradd -m -s /bin/bash runner',
-    'fi',
-    '',
-    '# Drop to the runner user for download + configure + run.',
-    "sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'",
-    'set -euo pipefail',
-    'cd "$HOME"',
-    'mkdir -p actions-runner && cd actions-runner',
+    'mkdir actions-runner && cd actions-runner',
     '',
     'case "$(uname -m)" in',
     '  aarch64) RUNNER_ARCH="arm64" ;;',
@@ -87956,13 +87946,11 @@ async function startEc2Instance(label, githubRegistrationToken) {
     '',
     'tar xzf "$TARBALL"',
     '',
+    'export RUNNER_ALLOW_RUNASROOT=1',
     'export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1',
     `./config.sh --url "https://github.com/${owner}/${repo}" --token "${githubRegistrationToken}" --labels "${label}" --ephemeral --unattended --disableupdate`,
     './run.sh',
-    'RUNNER_BOOTSTRAP',
-    '',
   ];
-  const userData = userDataScript;
 
   config.input.ec2ImageId = await resolveImageId(client);

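The `userData` array in this diff is a list of script lines. The EC2 RunInstances API takes user data as base64-encoded text, so the array has to be joined and encoded at some point before launch. The diff does not show where or how the action does this, so the following is a sketch under that assumption. The function name `encodeUserData` and the owner/repo values are hypothetical.

```javascript
// Hypothetical helper, for illustration only: joins script lines with
// newlines and base64-encodes the result, the form EC2 expects for UserData.
function encodeUserData(lines) {
  return Buffer.from(lines.join('\n'), 'utf8').toString('base64');
}

const owner = 'example-org'; // placeholder values, not from the commit
const repo = 'example-repo';
const userData = [
  '#!/bin/bash',
  'set -euo pipefail',
  // ${...} in a template literal is expanded by Node, before launch.
  `echo "registering against https://github.com/${owner}/${repo}"`,
  // Plain $VAR references pass through and are expanded on the instance.
  'echo "running as $(id -un) in $PWD"',
];

const encoded = encodeUserData(userData);
// Decoding gives back the exact script text the instance would run.
console.log(Buffer.from(encoded, 'base64').toString('utf8'));
```

Only `${...}` sequences inside backtick strings are substituted by Node. References such as `"$TARBALL"` in the single-quoted lines of the diff reach the instance unchanged and are expanded by bash there.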
src/aws.js

Lines changed: 18 additions & 30 deletions
@@ -57,42 +57,32 @@ async function resolveImageId(client) {
 async function startEc2Instance(label, githubRegistrationToken) {
   const client = ec2Client();
 
-  // User-data runs as root. We install dependencies + create a dedicated
-  // 'runner' user, then drop to that user for every subsequent step via
-  // a sudo-heredoc. The runner never needs root and never gets it; the
-  // old RUNNER_ALLOW_RUNASROOT=1 escape hatch is gone.
+  // User-data runs as root. Phase 4's original attempt to drop to a
+  // dedicated 'runner' user via sudo-heredoc broke dogfood in
+  // terraform-provider-namecheap#182 — the EC2 instance came up but the
+  // runner never registered within the 5 min action timeout. Reverted
+  // here to the root-execution path the pre-Phase-4 bootstrap used,
+  // isolating the non-root move for a separate investigation.
   //
-  // Runner version is read from config so consumers can override without
-  // waiting for an action release (see #10 for the motivation chain).
-  //
-  // The tarball is SHA-256 verified against actions/runner's published
-  // checksum before extraction — same defense-in-depth pattern the
-  // provider repo uses for its Go / Terraform downloads.
-  //
-  // --ephemeral tells GitHub to auto-deregister the runner after it
-  // completes a single job; the stop-runner step's explicit removeRunner()
-  // call becomes belt-and-braces rather than the primary deregister path.
+  // Kept from the Phase 4 work (all verified independently of the
+  // root/non-root axis):
+  //   - set -euo pipefail — fail fast on any bootstrap error.
+  //   - --ephemeral + --unattended + --disableupdate on config.sh —
+  //     one-job runner, no interactive prompts, no runner auto-update.
+  //   - SHA-256 verification of the runner tarball against the
+  //     published .sha256 sidecar before extraction.
+  //   - Parameterized runner-version via config.input.runnerVersion.
   const runnerVersion = config.input.runnerVersion;
   const owner = config.githubContext.owner;
   const repo = config.githubContext.repo;
-  const userDataScript = [
+  const userData = [
     '#!/bin/bash',
     'set -euo pipefail',
     '',
-    '# Root-required setup.',
     'mount -o remount,size=1G /tmp',
-    'yum install -y libicu make sudo',
+    'yum install -y libicu make',
     '',
-    '# Create the non-root runner user.',
-    'if ! id runner >/dev/null 2>&1; then',
-    '  useradd -m -s /bin/bash runner',
-    'fi',
-    '',
-    '# Drop to the runner user for download + configure + run.',
-    "sudo -u runner -H bash <<'RUNNER_BOOTSTRAP'",
-    'set -euo pipefail',
-    'cd "$HOME"',
-    'mkdir -p actions-runner && cd actions-runner',
+    'mkdir actions-runner && cd actions-runner',
     '',
     'case "$(uname -m)" in',
     '  aarch64) RUNNER_ARCH="arm64" ;;',
@@ -110,13 +100,11 @@ async function startEc2Instance(label, githubRegistrationToken) {
     '',
     'tar xzf "$TARBALL"',
     '',
+    'export RUNNER_ALLOW_RUNASROOT=1',
     'export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1',
     `./config.sh --url "https://github.com/${owner}/${repo}" --token "${githubRegistrationToken}" --labels "${label}" --ephemeral --unattended --disableupdate`,
     './run.sh',
-    'RUNNER_BOOTSTRAP',
-    '',
   ];
-  const userData = userDataScript;
 
   config.input.ec2ImageId = await resolveImageId(client);

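The bootstrap's SHA-256 check sits in the part of the script that falls between the two hunks, so its exact shell commands are not visible in this diff. As an illustration of the same idea in Node (not the bootstrap's actual code), the sketch below hashes a payload and compares it with the digest field of a `sha256sum`-style sidecar line. The helper name `matchesSha256` and the sample file name are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: returns true when the SHA-256 of `data` equals the
// first whitespace-separated field of a sha256sum-style sidecar line
// ("<hex digest>  <filename>").
function matchesSha256(data, sidecarText) {
  const expected = sidecarText.trim().split(/\s+/)[0].toLowerCase();
  const actual = crypto.createHash('sha256').update(data).digest('hex');
  return actual === expected;
}

// Stand-in for the downloaded tarball and its published checksum line.
const payload = Buffer.from('hello');
const sidecar =
  '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  hello.txt\n';

console.log(matchesSha256(payload, sidecar)); // digest matches
console.log(matchesSha256(Buffer.from('tampered'), sidecar)); // digest differs
```

In the shell bootstrap, a mismatch would make the verification command exit non-zero, and `set -euo pipefail` would then stop the script before `tar xzf "$TARBALL"` runs.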