stale-runner: identity guard; options: symmetric provider wiring#73
Merged
stale-runner: identity guard; options: symmetric provider wiring#73
Conversation
Base-name comparison alone collides when two unrelated binaries share a name — anywhere /usr/bin/foo and /opt/tools/foo coexist, a recycled PID landing on the unrelated foo would pass the check. When the caller passes an absolute path (the common case — the spawn pipeline resolves runnerPath to absolute) compare against the full resolved /proc/PID/exe path instead. Retain a base-name fallback when the caller passes a bare name so the API remains useful in contexts where the caller only knows the binary's base name. Also strip the " (deleted)" suffix the kernel appends when the underlying binary has been unlinked post-exec, so that runners still running from a since-cleaned extract directory are correctly identified. Updated the two colliding tests that previously relied on the loose semantics: one covers the base-name fallback path, one covers the strengthened absolute-path mismatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a PIDStartTime field alongside PID so terminateStaleRunner — and future logic on platforms where /proc/PID/exe comparison is unavailable — can disambiguate a recycled PID from the original runner. Populated with time.Now().UTC() at the same site that writes the PID. The JSON tag uses omitempty so state files written before this field existed still load cleanly with a zero-time default. Tests cover both directions: a round-trip at sub-second precision, and a legacy-format state file missing the field that must load with IsZero() == true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before signalling a PID loaded from the state file, verify the
PID still belongs to the expected runner binary. Between a
crashed runner and the next Run(), the kernel may recycle the
PID onto an unrelated process; sending SIGTERM/SIGKILL to
-PID (the whole process group) at that point would kill the
wrong group.
Adds a cfg.processIsExpected hook alongside the existing
processAlive and killProcess hooks. Default calls
procutil.IsExpectedProcess with the runner binary's base name
(strong match via /proc/PID/exe on Linux, liveness-only on
macOS). Tests that previously exercised the kill path now opt
into cfg.processIsExpected = func(int) bool { return true }
to match their fake-PID scaffolding.
New regression test: PID alive but processIsExpected says false
— must skip termination entirely and not call killProcess.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, a network provider was auto-created only when an egress policy was set. A caller who supplied only WithFirewallRules or WithFirewallDefaultAction(Deny) without a provider got the default runner-side path, which has no filter — the caller's deny-default silently degraded to allow-all. Extend the auto-wire trigger: any non-default firewall configuration (egress policy, non-empty rules, or a non-Allow default) now creates hosted.NewProvider() when the caller did not set one explicitly. Extract the wiring into wireDefaultProvider so the four cases can be unit-tested directly on a config instead of indirectly through Run(). Test introduces a sentinelProvider for the explicit-provider case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 5 of the hardening series. Two related correctness changes
plus the scaffolding to support them.
Stronger process identity check.
procutil.IsExpectedProcesspreviously compared only basenames of
/proc/PID/exeagainst the expected binary. Twounrelated binaries sharing a name on the same host could
therefore collide. When the caller passes an absolute path
(the common case — the spawn pipeline resolves
runnerPathto absolute), the comparison is now against the full
resolved exe path. A base-name fallback remains for callers
that pass bare names. Also strips the
(deleted)suffix thekernel appends when the underlying binary has been unlinked
post-exec.
Record PIDStartTime in persisted state. A new field on
state.StatealongsidePID, populated at spawn time withtime.Now().UTC(). Backward-compatible: missing field onlegacy files unmarshals to zero time. Intended for future
PID-disambiguation on platforms without
/proc/PID/exe(tracked as Phase 8).
Guard
terminateStaleRunneragainst recycled PIDs.Before sending SIGTERM/SIGKILL to
-PID(the whole processgroup), verify via a new
cfg.processIsExpectedhook thatthe PID still belongs to the runner binary. Default calls
procutil.IsExpectedProcesswith the runner base name.Without this, a PID recycled onto an unrelated session
leader would have its entire group signalled.
Auto-wire a network provider for firewall-only configs.
Previously, a hosted provider was auto-created only when
EgressPolicywas set. Callers who supplied onlyWithFirewallRulesorWithFirewallDefaultAction(Deny)got the default runner-side path, which has no filter —
the caller's deny-default silently degraded to allow-all.
Extend the trigger: any non-default firewall configuration
now creates
hosted.NewProvider()when no provider is setexplicitly.
Commits are bisectable; each lands with its own tests.
Test plan
task verifygreen at every commitTestIsExpectedProcess_AbsolutePathMismatch{,Fails}— absolutepath with differing dir no longer matches even when base names align.
TestIsExpectedProcess_SelfBaseName{Fallback}— bare namestill matches via the fallback path.
TestManager_PIDStartTime_RoundTrip— sub-second fidelitypreserved through Save/Load.
TestManager_Load_MissingPIDStartTime_IsZero— legacy statefiles without the field load as zero time.
TestTerminateStaleRunner_RecycledPID_Skipped— PID alivebut not expected →
killProcessnot called.TestWireDefaultProvider/*— four table entries covering no-firewall, egress policy, firewall-rules-only, deny-default-only,
and explicit-provider-preserved.
TestTerminateStaleRunner_*updated to set thenew
cfg.processIsExpectedhook where they exercise thekill path.
brood-boxlocalreplace: bbox rebuilds,VM boots, SSH + hooks + workspace round-trip clean.
🤖 Generated with Claude Code