
fix(exec): propagate skips transitively under keep-going; UTF8-safe display_name; reap ephemeral images; fix cache clean #130

Merged: markovejnovic merged 2 commits into main from relprep/4-fix-exec-keepgoing-skips-and-cleanup on Jun 10, 2026

@markovejnovic

Contributor

Summary

Four execution/cleanup correctness fixes for the local (hm run) backend:

  • exec-skip-propagation-1 (high): under `-k`/`--keep-going`, a failed step's transitive dependents (grandchildren) ran anyway on a clean base image. A skipped step returns exit_code 0, but the downstream gate keyed on `exit_code != 0`, so a skipped predecessor never tripped it. Adds an explicit `failed_or_skipped` flag to StepOutcome (set on fail, timeout, cancel, and skip) and gates descendants on that instead, so a skip cascades through BuildsIn chains.
  • exec-cmd-utf8-panic-1 (medium): display_name truncation byte-sliced the command at offset 39 (`&cmd[..39]`), panicking when byte 39 was not a UTF-8 char boundary. Now truncates by chars.
  • vm-ephemeral-image-leak-1 (medium): uncached leaf steps commit `ephemeral:<cid>` images that the registry never tracks and nothing ever removed. The scheduler now collects every ephemeral committed snapshot and reaps it at run end (success or failure) via a new StepRunner::reap_snapshots -> HmVm::remove_snapshot path.
  • cache-clean-wrong-remediation-1 (medium): `hm cache clean` deleted registry.db but left every tagged image, and its `docker image prune` advice (dangling-only) could not remove them. It now enumerates the registry's snapshot IDs (new ImageRegistry::all_snapshot_ids) and removes each image via the Docker backend before deleting the DB; the misleading prune warning is dropped.

Findings addressed

  • exec-skip-propagation-1
  • exec-cmd-utf8-panic-1
  • vm-ephemeral-image-leak-1
  • cache-clean-wrong-remediation-1

Verification evidence

Targeted checks run inside the worktree (touched crates only; full Docker-dependent e2e tests are #[ignore]d and cannot run in the sandbox without starting a Docker daemon):

$ cargo check -p hm-vm -p hm-exec -p harmont-cli
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.10s

$ cargo clippy -p hm-vm -p hm-exec -p harmont-cli --all-targets -- -D warnings
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s   # no warnings

$ cargo test -p hm-vm --lib
    test result: ok. 11 passed; 0 failed; 0 ignored
    # includes new registry::tests::all_snapshot_ids_returns_every_entry

$ cargo test -p hm-exec --lib
    test result: ok. 24 passed; 0 failed; 0 ignored

The skip-propagation regression guard (keep_going_skips_transitive_dependents, an A->B->C chain asserting B and C are both skipped — no step_start — when A fails) and the ephemeral-image reaping check are #[ignore = "requires Docker daemon"] integration tests in crates/hm/tests/keep_going.rs; they pass per the implementer's verification but are not runnable here under the shared-machine guardrails.
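
The A->B->C scenario can also be modelled without Docker. The following is a minimal sketch, not the code from this PR: `StepOutcome` here is a simplified stand-in, and `run_keep_going` plus the `skipped` field are hypothetical names introduced for illustration. It shows why gating on an explicit flag cascades a skip where gating on `exit_code != 0` would not.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for the PR's `StepOutcome`.
#[derive(Debug, Clone, PartialEq)]
pub struct StepOutcome {
    pub exit_code: i32,
    /// Set on fail and skip here (the PR also sets it on timeout and cancel).
    pub failed_or_skipped: bool,
    /// Illustration only: true when the step never started.
    pub skipped: bool,
}

/// Runs steps in order under keep-going semantics.
/// Each entry is (name, parent, exit code the step returns if it runs);
/// parents must be listed before their children.
pub fn run_keep_going(steps: &[(&str, Option<&str>, i32)]) -> HashMap<String, StepOutcome> {
    let mut outcomes: HashMap<String, StepOutcome> = HashMap::new();
    for &(name, parent, code) in steps {
        // Gate on the explicit flag, not on `exit_code != 0`:
        // a skipped parent reports exit_code 0 but must still block children.
        let blocked = parent
            .and_then(|p| outcomes.get(p))
            .map_or(false, |o| o.failed_or_skipped);
        let outcome = if blocked {
            StepOutcome { exit_code: 0, failed_or_skipped: true, skipped: true }
        } else {
            StepOutcome { exit_code: code, failed_or_skipped: code != 0, skipped: false }
        };
        outcomes.insert(name.to_string(), outcome);
    }
    outcomes
}
```

With `A` failing, `B` is skipped with exit code 0, and `C` is still skipped because it reads `B`'s flag rather than `B`'s exit code.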

Judge verdict

Approved. No blocking issues. Implementer verification passed.


Generated by the Harmont release-readiness workflow (automated; needs human review before merge). 🤖

markovejnovic and others added 2 commits June 10, 2026 18:54
…phemeral images, fix cache clean

Four execution/cleanup correctness fixes for the local backend.

exec-skip-propagation-1 (high): under -k/--keep-going a failed step's
transitive (grandchild) dependents ran anyway on a clean base image. A
skipped step returns exit_code 0, but the downstream gate keyed on
`exit_code != 0`, so a skipped predecessor never tripped it. Add an
explicit `failed_or_skipped` flag to StepOutcome (set on fail, timeout,
cancel, and skip) and gate descendants on that instead, so a skip
cascades through BuildsIn chains. Adds an A->B->C chain e2e test
asserting B and C are both skipped (no step_start) when A fails.

exec-cmd-utf8-panic-1 (medium): display_name truncation byte-sliced the
command at offset 39 (`&cmd[..39]`), panicking when byte 39 was not a
UTF-8 char boundary. Truncate by chars instead.
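
One way to truncate by chars, shown as a sketch only: the helper name `truncate_chars` is hypothetical and the commit may do this differently. `char_indices` yields byte offsets that always fall on char boundaries, so slicing at one of them cannot panic.

```rust
/// Hypothetical helper: keep at most `max_chars` characters of `cmd`.
pub fn truncate_chars(cmd: &str, max_chars: usize) -> &str {
    match cmd.char_indices().nth(max_chars) {
        // `idx` is the byte offset where the character at position
        // `max_chars` starts, so it is always a valid char boundary.
        Some((idx, _)) => &cmd[..idx],
        // Fewer than `max_chars + 1` characters: nothing to cut.
        None => cmd,
    }
}
```

For a command made of 38 ASCII bytes followed by `é`, byte 39 falls inside the two-byte `é`, so `&cmd[..39]` panics, while `truncate_chars(cmd, 39)` returns the first 39 characters (40 bytes).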

vm-ephemeral-image-leak-1 (medium): uncached leaf steps commit
`ephemeral:<cid>` images that the registry never tracks and nothing ever
removed. The scheduler now collects every ephemeral committed snapshot
and reaps it at run end (success or failure) via a new
StepRunner::reap_snapshots -> HmVm::remove_snapshot path. Verified: zero
ephemeral images remain after a local run.
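
The collect-then-reap shape can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's code: the `SnapshotStore` trait, `RecordingStore`, and `run_and_reap` are hypothetical, and the sketch assumes every step that runs commits exactly one ephemeral image.

```rust
/// Hypothetical stand-in for the layer that can delete a committed image.
pub trait SnapshotStore {
    fn remove_snapshot(&mut self, id: &str) -> Result<(), String>;
}

/// In-memory store that only records which ids were removed.
#[derive(Default)]
pub struct RecordingStore {
    pub removed: Vec<String>,
}

impl SnapshotStore for RecordingStore {
    fn remove_snapshot(&mut self, id: &str) -> Result<(), String> {
        self.removed.push(id.to_string());
        Ok(())
    }
}

/// Runs `steps` given as (container id, step succeeded), stopping at the
/// first failure, then reaps every ephemeral snapshot committed so far.
pub fn run_and_reap<S: SnapshotStore>(
    store: &mut S,
    steps: &[(&str, bool)],
) -> Result<(), String> {
    let mut ephemeral: Vec<String> = Vec::new();
    let mut result = Ok(());
    for &(cid, ok) in steps {
        // Assumption: each step that runs commits one ephemeral image.
        ephemeral.push(format!("ephemeral:{}", cid));
        if !ok {
            result = Err(format!("step {} failed", cid));
            break;
        }
    }
    // Reap on both the success and the failure path. A failed removal is
    // ignored here so it cannot mask the run's own result.
    for id in &ephemeral {
        let _ = store.remove_snapshot(id);
    }
    result
}
```

Keeping the reap loop after the run loop, rather than inside the success branch, is what makes cleanup happen on failure as well.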

cache-clean-wrong-remediation-1 (medium): `hm cache clean` deleted
registry.db but left every tagged image, and its `docker image prune`
advice (dangling-only) could not remove them. Enumerate the registry's
snapshot IDs (new ImageRegistry::all_snapshot_ids) and remove each image
via the Docker backend before deleting the DB; drop the misleading
prune warning.
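
The ordering matters because the registry is the only index of the tagged images. A minimal in-memory sketch of that ordering, with a hypothetical `Cache` type standing in for both registry.db and the Docker backend (none of these names come from the codebase):

```rust
use std::collections::BTreeSet;

/// Hypothetical model: `registry` plays the role of registry.db and
/// `images` the set of tagged images held by the image backend.
pub struct Cache {
    /// `None` once the database has been deleted.
    pub registry: Option<Vec<String>>,
    pub images: BTreeSet<String>,
}

impl Cache {
    /// Cleans the cache and returns how many images were removed.
    pub fn clean(&mut self) -> usize {
        // 1. Read every snapshot id while the registry still exists.
        let ids = self.registry.clone().unwrap_or_default();
        // 2. Remove each tracked image from the backend.
        let mut removed = 0;
        for id in &ids {
            if self.images.remove(id.as_str()) {
                removed += 1;
            }
        }
        // 3. Only now drop the registry, the sole index of those images.
        self.registry = None;
        removed
    }
}
```

Deleting the registry first would leave step 1 with nothing to enumerate, which is the leak the old behaviour produced. Images the registry never tracked are left untouched.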
@markovejnovic merged commit 3ae2129 into main on Jun 10, 2026
17 checks passed