
Commit 7d3bb7a

heavygee and cursoragent committed
feat(runner): persist disableVersionHandoff opt-out in settings.json
The HAPI_DISABLE_VERSION_HANDOFF=1 env-var contract introduced by feat/runner-skip-version-handoff-flag relies on every CLI invocation inheriting the flag. That works for the systemd unit (a drop-in sets it once) but leaks badly: any terminal that runs `hapi runner start-sync` without exporting the var sees mtime drift, kills the live runner, and is then SIGTERM'd itself, leaving the machine offline. Today's 22:40 BST incident reproduced exactly this pattern: an operator-launched start-sync (PID 65341), whose env dump shows no HAPI_DISABLE_VERSION_HANDOFF, fell through to the mtime block and killed the live runner (PID 24935).

Add a persisted fallback: settings.runnerDisableVersionHandoff: true in ~/.hapi/settings.json has the same effect as the env var. Once written, it is honored by every CLI invocation from any context, without relying on environment inheritance. The env var still works on its own (operator override); the two sources are OR-ed, so either one disables the handoff. Default off (npm consumers see no behavior change).

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent c1708e3 commit 7d3bb7a
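The message says the opt-out only has to be written once. A minimal sketch of doing that by hand, assuming settings.json holds a plain JSON object and lives at the default ~/.hapi location named above. `persistHandoffOptOut` is a hypothetical helper, not part of the hapi CLI, and the CLI's own persistence layer may do locking or migration that this sketch omits.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical helper (not part of the hapi CLI): sets
// runnerDisableVersionHandoff to true in a settings file while
// preserving every other key already stored there.
function persistHandoffOptOut(settingsPath: string): Record<string, unknown> {
  let current: Record<string, unknown> = {};
  if (existsSync(settingsPath)) {
    const parsed: unknown = JSON.parse(readFileSync(settingsPath, "utf8"));
    // Assumption: the file holds a JSON object; any other JSON value is replaced.
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      current = parsed as Record<string, unknown>;
    }
  }
  const next: Record<string, unknown> = { ...current, runnerDisableVersionHandoff: true };

  mkdirSync(dirname(settingsPath), { recursive: true });
  // Write a sibling temp file and rename it into place so that, on POSIX
  // filesystems, a runner starting at the same moment never reads a
  // half-written settings.json.
  const tmpPath = `${settingsPath}.tmp-${process.pid}`;
  writeFileSync(tmpPath, JSON.stringify(next, null, 2) + "\n");
  renameSync(tmpPath, settingsPath);
  return next;
}
```

Usage would be something like `persistHandoffOptOut(join(homedir(), ".hapi", "settings.json"))` with `join` from `node:path` and `homedir` from `node:os`. Merging instead of overwriting matters because the same file also carries fields such as `machineId` and `apiUrl`.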

3 files changed

Lines changed: 34 additions & 11 deletions


cli/src/persistence.ts

Lines changed: 9 additions & 0 deletions
@@ -21,6 +21,15 @@ interface Settings {
     apiUrl?: string
     // Legacy field name (for migration, read-only)
     serverUrl?: string
+    /**
+     * Persisted equivalent of HAPI_DISABLE_VERSION_HANDOFF=1. When true, the
+     * mtime-driven self-restart paths (heartbeat watcher AND fresh-invocation
+     * version check) are skipped regardless of how the runner was launched.
+     * Use this when an environment variable is too fragile (e.g. operators
+     * regularly invoke the CLI from terminals without exporting the flag).
+     * The env var still wins when set; this is the fallback.
+     */
+    runnerDisableVersionHandoff?: boolean
 }
 
 const defaultSettings: Settings = {}
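Both call sites compare the new field with `=== true`, so a hand-edited value such as the string `"true"` is silently treated as off. A small sketch of a validator that applies the same rule but reports a malformed value instead of ignoring it; `checkHandoffSetting` and `HandoffSettingCheck` are hypothetical names, not part of the codebase.

```typescript
// Result of inspecting the raw (untyped) settings.json contents.
interface HandoffSettingCheck {
  // true only for the JSON literal `true`, matching `=== true` in the diff
  disabled: boolean;
  // set when the value is present but would be silently ignored
  warning?: string;
}

// Hypothetical validator, not part of the hapi codebase.
function checkHandoffSetting(raw: unknown): HandoffSettingCheck {
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    return { disabled: false, warning: "settings.json does not contain a JSON object" };
  }
  const value = (raw as Record<string, unknown>)["runnerDisableVersionHandoff"];
  if (value === undefined) {
    // Key absent: default off, nothing to warn about.
    return { disabled: false };
  }
  if (typeof value !== "boolean") {
    return {
      disabled: false,
      warning: `runnerDisableVersionHandoff should be a boolean, got ${JSON.stringify(value)}`,
    };
  }
  return { disabled: value };
}
```

A check like this could run in a doctor-style command, leaving the hot path's strict `=== true` comparison unchanged.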

cli/src/runner/controlClient.ts

Lines changed: 11 additions & 6 deletions
@@ -185,12 +185,17 @@ export async function isRunnerRunningCurrentlyInstalledHappyVersion(): Promise<b
     const currentMachineId = settings.machineId;
 
     try {
-        // When HAPI_DISABLE_VERSION_HANDOFF=1 is set on the live runner (operator
-        // owns supervision via systemd/tmux/soup), a fresh CLI invocation must NOT
-        // treat the running runner as stale just because source mtimes shifted.
-        // Otherwise `hapi runner start` would kill the live runner mid-rebuild.
-        if (process.env.HAPI_DISABLE_VERSION_HANDOFF === '1') {
-            logger.debug('[RUNNER CONTROL] HAPI_DISABLE_VERSION_HANDOFF=1 set, skipping mtime/version drift check');
+        // When the version-handoff opt-out is in effect (env var OR persisted
+        // setting), a fresh CLI invocation must NOT treat the running runner as
+        // stale just because source mtimes shifted. Otherwise `hapi runner start`
+        // would kill the live runner mid-rebuild. The env var is the operator
+        // contract; the persisted setting is the safety net for terminals or
+        // scripts that forget to export it. See 2026-05-31 22:40 incident
+        // retrospective in docs/plans/2026-05-31-runner-self-restart-bluedeploy-fix.md.
+        const handoffDisabledByEnv = process.env.HAPI_DISABLE_VERSION_HANDOFF === '1';
+        const handoffDisabledBySetting = settings.runnerDisableVersionHandoff === true;
+        if (handoffDisabledByEnv || handoffDisabledBySetting) {
+            logger.debug(`[RUNNER CONTROL] Version-handoff disabled (env=${handoffDisabledByEnv}, setting=${handoffDisabledBySetting}); skipping mtime/version drift check`);
         } else {
             const currentCliMtimeMs = getInstalledCliMtimeMs();
             if (typeof currentCliMtimeMs === 'number' && typeof state.startedWithCliMtimeMs === 'number') {
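The same two-source check is now inlined in both controlClient.ts and run.ts. A sketch of factoring it into one pure function so it can be unit-tested against an explicit env object rather than `process.env`; `resolveHandoffOptOut` and `HandoffOptOut` are hypothetical names, and the comparisons are copied from the diff.

```typescript
// Hypothetical shared helper; the commit inlines this logic at both call sites.
interface HandoffOptOut {
  disabled: boolean;
  byEnv: boolean;
  bySetting: boolean;
}

function resolveHandoffOptOut(
  env: Record<string, string | undefined>,
  settings: { runnerDisableVersionHandoff?: boolean },
): HandoffOptOut {
  // Same comparisons as the diff: only the exact string "1" and the boolean
  // literal true count as an opt-out.
  const byEnv = env["HAPI_DISABLE_VERSION_HANDOFF"] === "1";
  const bySetting = settings.runnerDisableVersionHandoff === true;
  return { disabled: byEnv || bySetting, byEnv, bySetting };
}
```

At a call site this would read `resolveHandoffOptOut(process.env, settings)`, and `byEnv`/`bySetting` feed the existing debug log. Because the result is an OR, exporting HAPI_DISABLE_VERSION_HANDOFF=0 does not re-enable the handoff while the persisted setting is true; only removing or clearing the setting does.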

cli/src/runner/run.ts

Lines changed: 14 additions & 5 deletions
@@ -11,7 +11,7 @@ import { configuration } from '@/configuration';
 import packageJson from '../../package.json';
 import { getEnvironmentInfo } from '@/ui/doctor';
 import { spawnHappyCLI } from '@/utils/spawnHappyCLI';
-import { writeRunnerState, RunnerLocallyPersistedState, readRunnerState, acquireRunnerLock, releaseRunnerLock } from '@/persistence';
+import { writeRunnerState, RunnerLocallyPersistedState, readRunnerState, readSettings, acquireRunnerLock, releaseRunnerLock } from '@/persistence';
 import { isProcessAlive, isWindows, killProcess, killProcessByChildProcess } from '@/utils/process';
 import { PERMISSION_MODES } from '@hapi/protocol/modes';
 import { withRetry } from '@/utils/time';
@@ -808,11 +808,20 @@ export async function startRunner(options: { workspaceRoots?: string[] } = {}):
         // Check if runner needs update.
         // Skip entirely when the operator owns process supervision (systemd, tmux,
         // soup rebuilds, etc.) and source mtimes change for reasons unrelated to
-        // an actual npm upgrade. HAPI_DISABLE_VERSION_HANDOFF=1 keeps the rest of
-        // the heartbeat (session pruning, state file persistence) intact.
-        if (process.env.HAPI_DISABLE_VERSION_HANDOFF === '1') {
+        // an actual npm upgrade. The opt-out is honored via either:
+        // * HAPI_DISABLE_VERSION_HANDOFF=1 in the runner's environment, OR
+        // * runnerDisableVersionHandoff:true in ~/.hapi/settings.json
+        // The persisted setting exists so terminals/scripts that forget to
+        // export the env var still inherit the opt-out (2026-05-31 22:40 BST
+        // incident retrospective; see docs/plans/2026-05-31-runner-self-restart-bluedeploy-fix.md).
+        // Either path keeps the rest of the heartbeat (session pruning, state
+        // file persistence) intact.
+        const handoffDisabledByEnv = process.env.HAPI_DISABLE_VERSION_HANDOFF === '1';
+        const heartbeatSettings = await readSettings();
+        const handoffDisabledBySetting = heartbeatSettings.runnerDisableVersionHandoff === true;
+        if (handoffDisabledByEnv || handoffDisabledBySetting) {
             if (process.env.DEBUG) {
-                logger.debug('[RUNNER RUN] HAPI_DISABLE_VERSION_HANDOFF=1 set, skipping mtime/version drift self-restart');
+                logger.debug(`[RUNNER RUN] Version-handoff disabled (env=${handoffDisabledByEnv}, setting=${handoffDisabledBySetting}); skipping mtime/version drift self-restart`);
             }
         } else {
             const installedCliMtimeMs = getInstalledCliMtimeMs();
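In run.ts the setting appears to be re-read on every heartbeat (the value is fetched with `await readSettings()` into `heartbeatSettings` right at the check), so writing the flag should take effect on the next tick without restarting the runner. A sketch of that tick with injected dependencies: every name here is a hypothetical stand-in for runner internals, and the drift rule (any mtime difference from `startedWithCliMtimeMs` triggers a restart) is an assumption, since the diff does not show the body of the mtime branch.

```typescript
interface RunnerSettings {
  runnerDisableVersionHandoff?: boolean;
}

// Injected dependencies; all names are hypothetical stand-ins so the
// sketch runs without the hapi runtime.
interface HeartbeatDeps {
  env: Record<string, string | undefined>;
  readSettings: () => Promise<RunnerSettings>;
  installedCliMtimeMs: () => number | undefined;
  startedWithCliMtimeMs: number;
  selfRestart: () => void;
}

type TickOutcome = "skipped" | "unchanged" | "restarted";

// One heartbeat tick. Settings are re-read on every call, so flipping the flag
// in settings.json changes the outcome of the next tick.
async function versionHandoffTick(deps: HeartbeatDeps): Promise<TickOutcome> {
  const byEnv = deps.env["HAPI_DISABLE_VERSION_HANDOFF"] === "1";
  const settings = await deps.readSettings();
  const bySetting = settings.runnerDisableVersionHandoff === true;
  if (byEnv || bySetting) {
    return "skipped"; // the rest of the real heartbeat would still run
  }
  // Assumption: any mtime difference counts as drift.
  const mtime = deps.installedCliMtimeMs();
  if (typeof mtime === "number" && mtime !== deps.startedWithCliMtimeMs) {
    deps.selfRestart();
    return "restarted";
  }
  return "unchanged";
}
```

Injecting `readSettings` and `selfRestart` keeps the decision testable without touching the filesystem or killing processes; the real code reads both from module scope.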
