Skip to content

fix(install): skip cache for unpinned virtual deps when on-disk hash matches lockfile (closes #1548)#1553

Merged
danielmeppiel merged 3 commits into
mainfrom
danielmeppiel/fix-1548-content-hash-mismatch
May 30, 2026
Merged

fix(install): skip cache for unpinned virtual deps when on-disk hash matches lockfile (closes #1548)#1553
danielmeppiel merged 3 commits into
mainfrom
danielmeppiel/fix-1548-content-hash-mismatch

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Summary

Closes #1548. apm install was failing on the second run with a false-positive Content hash mismatch ... supply-chain attack error for unpinned git/virtual-file dependencies. This PR adds the missing cache-skip branch so the redundant re-download (and the false alert it triggered) goes away, without weakening supply-chain detection.

Problem

When a git or virtual-file dep was recorded in apm.lock.yaml without a resolved_commit -- typically because the underlying transport fell back from partial-clone to full bare-clone (observed on Azure DevOps repos whose server does not support partial-clone filter v2) and APM had no SHA to pin against the floating ref -- the cache-skip logic in _resolve_download_strategy (src/apm_cli/install/phases/integrate.py:113-160) had no branch covering the case. The two existing branches required either:

  1. locked_dep.resolved_commit to be set and not "cached", or
  2. locked_dep.source == "registry".

Unpinned git deps fell through both, lockfile_match stayed False, and the package was re-downloaded on every install. A re-clone of a floating ref can produce a fresh hash that does not byte-match the lockfile-recorded content_hash, so the supply-chain check at src/apm_cli/install/sources.py:766-787 then sys.exit(1) with "supply-chain attack" wording -- the false positive reported in #1548.

Fix

Add cache-skip parity for the unpinned class. In _resolve_download_strategy, after the existing registry branch, add an elif for (locked_dep, locked_dep.content_hash, not ref_changed, not update_refs). When _should_skip_redownload confirms the on-disk content still hashes to the lockfile-recorded value, set lockfile_match = True (with the _via_content_hash_only flag for the existing self-heal pipeline). The package is intact, so re-downloading it is wasted work and the source of the false positive.

This matches Option 1 from the triage brief. Option 2 (record a resolved_commit even on partial-clone-fallback paths) is more architecturally correct but ripples through the lockfile schema and self-heal chain; this PR keeps the change isolated and reversible.

Supply-chain boundary

The supply-chain check is preserved. The new branch only fires when verify_package_hash(install_path, locked_dep.content_hash) returns True -- i.e. on-disk bytes already hash to the lockfile-recorded value. We never suppress a fresh download that would otherwise have produced a different hash than the lockfile; we only suppress a redundant download whose result we have already verified against the lockfile via the on-disk content. Real divergence (file actually tampered) still falls through to the fresh-download path where sources.py runs the existing mismatch check.

Tests

tests/unit/install/test_integrate_phase3w5.py -- three new cases on TestResolveDownloadStrategy:

  • test_unpinned_git_dep_skips_when_on_disk_hash_matches -- positive: with resolved_commit=None, content_hash="deadbeef", source="git" and verify_package_hash -> True, asserts skip_download is True. Fails at HEAD without the fix; passes after. Verified the mutation-break gate by git stash of the integrate.py change: this test fails, confirming the fix is the cause of the green.
  • test_unpinned_git_dep_redownloads_when_on_disk_hash_differs -- negative gate: same setup, verify_package_hash -> False, asserts skip_download is False. Proves a real on-disk mismatch is NOT masked by the new branch.
  • test_unpinned_git_dep_no_skip_when_lockfile_lacks_content_hash -- guard: when neither resolved_commit nor content_hash are present there is no integrity anchor, so the branch must not fire.

Full suite: 1638 passed across tests/unit/install/, tests/unit/test_install_command.py, tests/unit/test_drift_registry.py, tests/unit/test_drift_detection.py.

How to test locally

  1. Construct an apm.yml with a git dep on a repo whose server cannot serve a partial-clone (or any virtual-file dep that lands without a resolved_commit).
  2. apm install once -- this succeeds and writes a lockfile entry with content_hash set and resolved_commit empty.
  3. apm install a second time. Before this PR: [x] Content hash mismatch ... supply-chain attack. After this PR: install succeeds, the cached package is reused, no re-download, no false alert.

Validation evidence

  • uv run --extra dev ruff check src/ tests/ -- silent.
  • uv run --extra dev ruff format --check src/ tests/ -- silent.
  • uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/ -- 10.00/10.
  • bash scripts/lint-auth-signals.sh -- clean.
  • Targeted suite: 22 / 22 in test_integrate_phase3w5.py; broader: 1638 / 1638 across install + drift modules.

ASCII-only.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…matches lockfile

When a git/virtual-file dependency was recorded in the lockfile without a
resolved_commit (e.g. ADO partial-clone fallback could not pin a SHA, or
a virtual-file dep was carved out without a commit anchor), the second
install bypassed the content-hash cache-skip and re-downloaded the
package every time. Re-clones of floating refs can yield a slightly
different fresh hash, which then trips the supply-chain mismatch check
in sources.py with a false-positive 'supply-chain attack' alert.

Add cache-skip parity with the existing resolved_commit and registry
branches: when the lockfile recorded a content_hash for an unpinned dep
and the on-disk content still hashes to that value, skip re-download.
Supply-chain detection is preserved -- the new branch only fires when
on-disk content is verified unchanged; any real divergence falls
through to the fresh-download flow where the existing mismatch check
still runs.

closes #1548

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 29, 2026 21:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a false-positive supply-chain Content hash mismatch failure on the second apm install run for unpinned git/virtual-file dependencies by allowing cache reuse when the on-disk content still matches the lockfile-recorded content_hash.

Changes:

  • Extend _resolve_download_strategy cache-skip logic to handle lockfile entries that have a content_hash but no resolved_commit (and no ref drift / update mode).
  • Add unit tests covering the new skip/no-skip behavior for unpinned git dependencies based on verify_package_hash.
  • Add a changelog entry describing the fix.
Show a summary per file
File Description
src/apm_cli/install/phases/integrate.py Adds a cache-skip branch for unpinned deps when on-disk content matches the lockfile content_hash.
tests/unit/install/test_integrate_phase3w5.py Adds regression tests for the new unpinned-dependency cache-skip behavior (positive + negative + guard).
CHANGELOG.md Documents the fix under Unreleased.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread CHANGELOG.md Outdated

### Fixed

- `apm install` no longer fails the second run with a false-positive `Content hash mismatch ... supply-chain attack` error for unpinned git/virtual-file dependencies whose lockfile entry could not record a `resolved_commit` (e.g. ADO partial-clone fallback paths). When the on-disk content still hashes to the lockfile-recorded value, the redundant re-download is now skipped, mirroring the existing cache-skip parity with pinned and registry deps. Real content divergence still falls through to the supply-chain verification path. (closes #1548) -- by @OskarKlintrot
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

danielmeppiel commented May 30, 2026

APM Review Panel: ship_now

The follow-up pass is clean: PR #1553 now has the fixture-backed roundtrip test, avoids redundant tree hashing after verified content-hash reuse, and keeps the content-hash-only trust invariant explicit.

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

The panel sees the original false-positive supply-chain failure addressed without weakening integrity checks. The new pre-download content-hash-only branch verifies the on-disk tree before reuse, records that verification in InstallContext, and lets the sequential integrate phase reuse that proof instead of hashing the same tree again. Real divergence still reaches the fresh-download supply-chain mismatch path. CI is green on 3922f9e0.

Panel summary

Persona B R N Takeaway
Python Architect 0 0 0 The state handoff through InstallContext.content_hash_verified_deps keeps the optimization explicit and localized.
CLI Logging Expert 0 0 0 No user-facing output changes; existing mismatch messaging remains intact.
DevX UX Expert 0 0 0 Reinstall behavior now matches package-manager expectations: unchanged locked content is reused quietly.
Supply Chain Security Expert 0 0 0 The content hash remains the only trust signal for no-commit lockfile entries, and mismatches still fail closed.
OSS Growth Hacker 0 0 0 The changelog now leads with the user-visible false-positive error.
Doc Writer 0 0 0 Changelog coverage is sufficient; no Starlight page drift found for this internal install-path fix.
Test Coverage Expert 0 0 0 Added integration roundtrip coverage plus mutation-break evidence for the no-resolved_commit content-hash replay path.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Recommendation

Ship when the maintainer is ready. All foldable panel follow-ups were addressed in the pushed commit, Copilot has no inline comments, and all PR checks are successful.


Full per-persona findings

Python Architect

No findings.

CLI Logging Expert

No findings.

DevX UX Expert

No findings.

Supply Chain Security Expert

No findings.

OSS Growth Hacker

No findings.

Auth Expert -- inactive

No authentication or credential-resolution behavior changed.

Doc Writer

No findings.

Test Coverage Expert

No findings.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

danielmeppiel and others added 2 commits May 30, 2026 17:54
Fold shepherd follow-ups for PR #1553 by adding an integration regression test for content-hash-only lockfile replay, teaching pre-download to honor verified content_hash entries with no resolved_commit, and avoiding a second tree hash after a verified skip-redownload decision.

Addresses panel follow-ups for issue #1548.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 5b6bc83 into main May 30, 2026
12 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/fix-1548-content-hash-mismatch branch May 30, 2026 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] apm install fail second time with Content hash mismatch

3 participants