fix(cli): surface deploy remote output, write the audit log, seed flat-alias ssh config (#2957)#3166
Conversation
…t-alias ssh config (#2957) Wave 3 of the #2957 deploy-orchestration campaign (observability + ssh aliases). DEP-6a: the dispatch closures dropped every ssh.run() result, so read verbs (audit, details, app logs/details/containers/images, lock status) returned only host-count summaries in live mode. Add a sequential collecting dispatcher that surfaces remote stdout host-prefixed ([host] line), with stderr fallback for tolerated failures (lock status on a missing lock). DEP-6b: AuditorCommands.record() had zero call sites, so 'wheels deploy audit' tailed /tmp/kamal-audit.log — a file this tool never wrote. Deploy now brackets the work with started/completed records (after lock acquire, before lock release), rollback records the target version, and setup records each accessory boot. All records dispatch allowFail so a read-only-fs or similar can never fail the deploy. DEP-7: the flat 'bootstrap'/'exec' aliases — the only CLI-reachable form (#2677 picocli shadowing) — constructed a bare 'new SshPool()' that ignored deploy.yml's ssh: block (user/port/keys -> root@host:22 + ssh-agent). They now route through $deployBuildSshPool(opts.configPath) like every other verb; a source-level guard spec bans bare SshPool construction in Module.cfc. TDD red-first: 12 new specs (FakeSshPool + source inspection), all red before the fix, green after. CLI suite in the lucee7 docker harness: 1018 pass / 0 fail / 2 known docker-env errors (SshClientSpec/SshPoolSpec docker-not-found), up from the 1006-pass baseline. Signed-off-by: Peter Amiri <peter@alurium.com>
There was a problem hiding this comment.
Wheels Bot — Reviewer
TL;DR: Wave 3 of the #2957 deploy campaign — read verbs now surface host-prefixed remote output (DEP-6a), the audit log is actually written (DEP-6b), and the flat bootstrap/exec aliases seed their SSH pool from deploy.yml (DEP-7). I verified every claim against the source and found no blocking issues: the closure-collection pattern is correct, the audit bracketing matches the stated ordering, Module.cfc is clean of bare pools, and the 12 new specs exercise both happy and failure paths. Verdict: comment — two non-blocking suggestions below.
Verified (evidence)
- Closure safety. All three collect helpers (
DeployMainCli.cfc:339,DeployAppCli.cfc:139,DeployLockCli.cfc:99) capturecmd/doRaiseinto locals and accumulate via a shared struct (var ctx = {lines: []}) — the exact pattern CLAUDE.md anti-pattern #10 prescribes, mirroring the existing$dispatchhelpers. sequentialis genuinely required, not just stylistic.SshPool.onEach(SshPool.cfc:57) submits to a thread-pool executor, so appending to a shared CFML array from its callbacks would race;SshPool.sequential(SshPool.cfc:119) is a plain serial loop. The choice is sound and the doc comments say why.- Audit bracketing matches the claim.
startedrecord dispatched after lock acquire / before pull (DeployMainCli.cfc:110),completedafter the role loop / before thefinallylock release (DeployMainCli.cfc:139), bothallowFail=true. Rollback (DeployMainCli.cfc:209) and setup accessory boots likewise. The ordering spec inDeployMainCliSpec.cfcpins acquire < started < pull and run < completed < release. - The "failing audit never fails deploy" spec is meaningful.
FakeSshPool.runhonors theopts.raisecontract (FakeSshPool.cfc:99-101), so theexitCode: 1expectation would throw if the call site weren'tallowFail. - DEP-7 ban check holds. Post-fix,
Module.cfccontains zeronew modules.wheels.services.deploy.lib.SshPool(occurrences and 11$deployBuildSshPool(opts.configPath)call sites; the flatcase "bootstrap":/case "exec":labels precede the nestedserverbranch, soDeployAliasSshPoolSpec.$caseBody's first-occurrence slicing targets the right bodies. Source-level Module.cfc inspection has 7 prior-art specs (e.g.McpHiddenToolsSpec,ConsoleCommandSpec). - Dry-run behavior preserved. Each
$dispatchCollectdry-run branch appends the same[host] cmdlines todryRunBufferas the$dispatchit replaces (including$dispatchAnyCollectwriting onlyhosts[1], matching$dispatchAny's any-one-host semantics atDeployLockCli.cfc:80-85).
Suggestions (non-blocking)
auditverb still hard-fails on a host that has never deployed.DeployMainCli.cfc:266—$dispatchCollect(hosts, cmd, dryRun)leavesallowFailat itsfalsedefault, sotailon a missing/tmp/kamal-audit.lograisesWheels.Deploy.RemoteExecutionFailedand aborts mid-fleet. This is pre-existing behavior (the old$dispatchraised too), but this PR gavelock statusthe nicer treatment —allowFail=true+ stderr fallback (DeployLockCli.cfc:54) — and the same would suitaudit: a fresh host would showtail: cannot open '/tmp/kamal-audit.log'as advisory output instead of an exception. Fine as a follow-up.- Blank lines in remote output are dropped.
listToArray(text, chr(10))skips empty list elements, so blank lines in e.g.app logsoutput silently disappear (DeployMainCli.cfc:361, mirrored in the other two helpers). Cosmetic fordocker ps-style output; if log fidelity ever matters,listToArray(text, chr(10), true)preserves them.
Tests
12 new specs, red-first per the PR evidence table (1006 → 1018 pass, 0 fail). Coverage spans happy path (stdout surfaced host-prefixed), failure path (stderr fallback on nonzero exit, failing audit record tolerated), ordering (bracket records vs. lock/pull/run), and a structural regression guard for DEP-7. All extend the established wheels.wheelstest.system.BaseSpec deploy-spec pattern with FakeSshPool.
Docs & Commits
- Changelog fragment
changelog.d/deploy-w3-observability.fixed.mdpresent with validfixedtype — no directCHANGELOG.mdedit. ✓ .ai/wheels/deploy.mdstays accurate (itswheels deploy auditline — "tail /tmp/kamal-audit.log on each server" — is now more true, since the file finally gets written).- Single commit, header
fix(cli): …at 95 chars, type/scope valid percommitlint.config.js, DCOSigned-off-bypresent, body explains the why. ✓
Wave 3 of the #2957 deploy-orchestration campaign: observability (DEP-6a/6b) + flat-alias ssh config (DEP-7). Refs #2957.
Closes these items from the 2026-06-12 re-scope checklist on #2957:
What changed
DEP-6a — read verbs surface remote output. New
$dispatchCollect(DeployMainCli + DeployAppCli, mirrored;$dispatchAnyCollecton DeployLockCli) runs read commands viaSshPool.sequential/onAny, collectsresult.stdouthost-prefixed ([host] line), and appends it to the live-mode summary. Stderr is the fallback when a tolerated (allowFail) command produced no stdout — solock statuson an un-held lock shows readlink's "No such file" diagnostic instead of nothing. Covered verbs:audit,details(app + proxy + accessories),app logs/details/containers/images,lock status. Sequential execution keeps output order stable and the collection single-threaded (no shared-array races on the real pool's executor). Dry-run behavior unchanged.DEP-6b — the audit log is actually written.
$deploy()now brackets the work withauditor.record("started deploy of version X")(after lock acquire, before pull) andrecord("completed deploy of version X")(after the role loop, before lock release);rollback()recordsrolled back to version X;setup()recordsbooted accessory <name>per accessory on its hosts. All audit dispatches passallowFail=true— observability can never fail a deploy (spec pins a read-only-fs record failure not aborting the deploy).DEP-7 — flat aliases honor
ssh:config.case "bootstrap"/case "exec"inModule.cfcnow constructDeployServerCliwith$deployBuildSshPool(opts.configPath)(SshPoolFactory seeds user/port/first key from deploy.yml) instead of a barenew SshPool()(root@22 + ssh-agent). NewDeployAliasSshPoolSpecis a source-level guard, following the established Module.cfc spec pattern (Module extendsmodules.BaseModule, only resolvable at LuCLI runtime; the pool has no accessor through DeployServerCli): it bans any barenew modules.wheels.services.deploy.lib.SshPool(in Module.cfc and asserts both alias case-bodies call the factory. The factory's config-seeding behavior itself is pinned by the existingSshPoolFactorySpec.Evidence
TDD red-first: 12 new specs, all failing before the fix, all green after:
DeployMainCliSpec: audit/details surface host-prefixed stdout; deploy brackets work with started/completed records (ordering pinned: acquire < started < pull, run < completed < release); rollback + setup-accessory records; failing audit record never fails the deploy.DeployAppCliSpec:logs/containerssurface host-prefixed stdout.DeployLockCliSpec:statussurfaces readlink stdout; stderr fallback when no lock is held.DeployAliasSshPoolSpec(new): no bare SshPool in Module.cfc; both flat alias bodies call$deployBuildSshPool(opts.configPath).CLI suite in the lucee7 docker harness (
/wheels/cli/tests?format=json):The 2 errors are the known docker-env artifacts (
SshClientSpec/SshPoolSpecdocker-not-found inside the harness container) — present on the baseline too.Real SSH / docker-remote behavior is unverifiable in this harness — FakeSshPool specs + the dry-run flows are the verification bar here, per the campaign convention. The env-gated
E2EDeploySpec(DEPLOY_E2E=1) exercises the same$deploy()path over real sshd and is unaffected (audit records are plainecho >> /tmp/kamal-audit.logappends through the shim-logged shell).🤖 Generated with Claude Code