fix(cli): make the deploy lock all-or-nothing across hosts and expand its metadata (#3163)
* fix(cli): make the deploy lock all-or-nothing across hosts and expand its metadata
DEP-1 (#2957): lock acquire/release went through $dispatchAny ->
SshPool.onAny, which is first-success-wins and swallows per-host
failures. On any multi-host fleet, contention on host 1 raised, onAny
caught it, acquired a fresh lock on host 2, and the concurrent deploy
proceeded — mutual exclusion held only for single-host configs. Release
was also onAny, so the released host could differ from the acquired
host, stranding stale locks.
Now the deploy lock is acquired on EVERY (deduped) host, sequentially
in config order with raise=true; on a partial failure the
already-acquired locks are rolled back (never the contended host's —
that lock belongs to the other deploy) and the per-host error surfaces
as Wheels.Deploy.LockAcquireFailed naming the failing host. Release
fans out to the exact hosts the lock was acquired on. The manual
'wheels deploy lock acquire/release/status' verbs follow the same
fleet-wide semantics. SshPool.onAny itself is unchanged — the proxy
boot check still legitimately uses it (Wave 2 scope).
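The all-or-nothing acquire described above can be sketched roughly as follows. This is an illustrative Python model, not the CFML in the PR: `acquire_all`, the `acquire`/`release` callables, and the Python `LockAcquireFailed` class are hypothetical stand-ins for the real dispatch through the SSH pool and for `Wheels.Deploy.LockAcquireFailed`.

```python
class LockAcquireFailed(Exception):
    """Hypothetical stand-in for Wheels.Deploy.LockAcquireFailed."""

    def __init__(self, host, cause):
        super().__init__(f"deploy lock acquire failed on {host}: {cause}")
        self.host = host


def acquire_all(hosts, acquire, release):
    """Acquire the lock on every host, or on none of them.

    `acquire(host)` and `release(host)` are assumed callables that raise
    on failure (the raise=true behaviour described in the commit message).
    """
    ordered = list(dict.fromkeys(hosts))  # dedupe, keeping config order
    acquired = []
    for host in ordered:
        try:
            acquire(host)
        except Exception as exc:
            # Roll back only the locks this deploy took. The failing host
            # is not in `acquired`, so its lock (owned by the other
            # deploy) is never touched.
            for held in acquired:
                try:
                    release(held)
                except Exception:
                    pass  # host-granular: one dead host must not stop the rest
            raise LockAcquireFailed(host, exc) from exc
        acquired.append(host)
    return acquired  # release later fans out to exactly these hosts
```

Returning the acquired list is one way to make release target exactly the hosts that were locked, rather than whichever host a first-success-wins dispatch happens to reach.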
DEP-10 (#2957): LockCommands.acquire wrapped the whole symlink target
in single quotes while claiming $(hostname)/$(date) were resolved by
the remote shell — single quotes suppress command substitution, so lock
metadata recorded the literal text. The target is now three
concatenated shell words: shellEscape(user) + a double-quoted
substitution segment + shellEscape(message), so the metadata expands
while hostile metacharacters in user/message stay inert.
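A minimal sketch of the three-word concatenation, using Python's `shlex.quote` as a stand-in for the project's `shellEscape`. The function name `lock_target` and the exact layout of the metadata string (separators, field order) are assumptions for illustration; only the quoting structure follows the description above.

```python
import shlex


def lock_target(user, message):
    """Build a symlink target as three adjacent shell words.

    Adjacent quoted segments with no whitespace between them are joined
    by the shell into a single word.
    """
    # Double quotes: the remote shell still performs $(...) substitution here.
    substitution = '"@$(hostname) $(date --iso-8601=seconds) "'
    # shlex.quote single-quotes anything unsafe, so metacharacters in
    # user/message are never interpreted by the remote shell.
    return shlex.quote(user) + substitution + shlex.quote(message)


if __name__ == "__main__":
    target = lock_target("bob'; rm -rf /", "hotfix $(whoami)")
    print(target)
    # shlex.split only models word splitting and quote removal, not
    # command substitution; it confirms the target is a single word.
    print(len(shlex.split(target)))
```

By contrast, wrapping the entire target in one pair of single quotes (the previous behaviour) would also leave `$(hostname)` and `$(date ...)` uninterpreted, which is why the literal text ended up in the lock metadata.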
Refs #2957 (Wave 1: DEP-1a, DEP-1b, DEP-10)
Signed-off-by: Peter Amiri <peter@alurium.com>
* fix(cli): per-host best-effort lock release so a dead host cannot strand the fleet
Review findings on the all-or-nothing lock PR: allowFail only mapped to
{raise: false} inside SshClient.run, which suppresses nonzero exit codes
but not transport failures.
- DeployLockCli release/status fanned out via SshPool.onEach, which
pre-resolves a connection for every host before running anything, so
one unreachable host aborted the whole verb with zero commands
executed — turning the prescribed stale-lock recovery path into a
dead-end exactly when a host died mid-deploy. Both verbs now dispatch
per host with a per-host try/catch and report skipped hosts in the
summary instead of throwing.
- DeployMainCli's finally-block release claimed it could never shadow
the original deploy exception, but a transport failure (dead cached
connection, startSession throws inside the onEach task, rethrown from
future.get) propagated out of the finally and replaced the in-flight
deploy error. The release is now per-host best-effort; skipped hosts
are logged, never thrown, and every healthy host is still released.
- $rollbackAcquiredLocks in both CFCs is host-granular for the same
reason: one dead host no longer stops the rollback from clearing the
locks on the remaining healthy hosts.
- FakeSshPool now mirrors the real pool's transport semantics so specs
can exercise these paths: failConnection(host) models the eager
connect throw (onEach aborts wholesale, sequential fails lazily,
onAny skips to the next host) and a scripted transportError result
throws from run() regardless of raise=false. Verified the new specs
fail against the previous production code.
Real SSH remains unverifiable in the harness — FakeSshPool mirrors the
verified real-pool semantics (SshPool.onEach pre-resolve, SshClient.run
transport throws) and the full CLI suite passes.
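The per-host best-effort release can be modelled along these lines. This is a hedged Python sketch rather than the CFML from the PR; `release_best_effort`, `deploy`, and the callables passed in are hypothetical names, and a raised `ConnectionError` stands in for a transport failure.

```python
def release_best_effort(hosts, release):
    """Release the lock on each host independently.

    Returns (released, skipped): `skipped` maps host -> reason for hosts
    whose release raised (for example an unreachable host). Nothing is
    re-raised, so every healthy host is still released.
    """
    released, skipped = [], {}
    for host in hosts:
        try:
            release(host)
        except Exception as exc:
            skipped[host] = str(exc)
        else:
            released.append(host)
    return released, skipped


def deploy(locked_hosts, run_deploy, release, log=print):
    """Run a deploy and always release its locks afterwards.

    Because the release never raises, an exception from run_deploy()
    propagates unchanged instead of being replaced in the finally block.
    """
    try:
        run_deploy()
    finally:
        _, skipped = release_best_effort(locked_hosts, release)
        for host, reason in skipped.items():
            log(f"lock release skipped on {host}: {reason}")
```

Wrapping each host in its own try/except is the point of the design: a single try around the whole fan-out would reproduce the original problem, where one dead host aborts the release for the entire fleet.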
Signed-off-by: Peter Amiri <peter@alurium.com>
---------
Signed-off-by: Peter Amiri <peter@alurium.com>
Co-authored-by: Peter Amiri <petera@pai.com>
- `wheels deploy` lock acquisition is now all-or-nothing across the fleet: the lock is acquired on every (deduped) host sequentially in config order with failures surfaced, already-acquired locks are rolled back on a partial failure (`Wheels.Deploy.LockAcquireFailed` names the contended host; the contended host's own lock is never touched), and release fans out to every acquired host. Previously the first-success-wins `onAny` dispatch swallowed contention on one host and silently re-acquired on another, so concurrent deploys were only mutually excluded on single-host configs, and release could target a different host than acquire, stranding stale locks. The manual `wheels deploy lock acquire/release/status` verbs follow the same fleet-wide semantics (#2957)
- Deploy lock metadata now actually expands `$(hostname)` and `$(date --iso-8601=seconds)` on the remote: the symlink target double-quotes the substitution segment while keeping the user and message inert via `shellEscape` single-quoting — previously the whole target was single-quoted, which suppressed command substitution and recorded the literal `$(hostname)` text (#2957)