
✨ feat(resolver): share candidate caches across pip install resolvers #13989

Open
gaborbernat wants to merge 1 commit into pypa:main from gaborbernat:pip-tools-share-candidate-cache

Conversation

@gaborbernat

@gaborbernat gaborbernat commented May 8, 2026

A pip install invocation runs more than one resolver. The user's request goes through one main Resolver, and every in-process build environment that has to install build dependencies for an sdist or editable spawns its own. ⚡ Build deps overlap heavily across builds (setuptools, wheel, hatchling, and friends), and many of them also overlap with the user's runtime requirements. Today every resolver pays the cost of building every LinkCandidate from scratch because the cache that amortises construction lives on the per-invocation Factory and dies with it. Downstream tools that drive even more resolutions in one process (pip-tools's lock pipeline fans out tens of resolvers per command) feel this much harder.

Both new cache parameters on Factory (passed through by Resolver) default to None, so every existing caller keeps the current per-invocation ownership. Pass a dict and Factory writes into the caller's dict instead of its own; identity, not equality, is what makes the optimisation composable across Resolver instances. The cache types remain the existing Cache[LinkCandidate] / Cache[EditableCandidate] aliases the module already uses internally.
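A minimal sketch of the sharing-by-identity idea, using stand-in classes rather than pip's real ones. The `link_candidate_cache` keyword name, the string-keyed `Cache` alias, and the `make_candidate` helper are assumptions for illustration only; the real Factory keys and builds candidates differently.

```python
from __future__ import annotations

from typing import Dict, Optional


class LinkCandidate:
    """Stand-in for pip's LinkCandidate; counts how often it is constructed."""

    constructed = 0

    def __init__(self, link: str) -> None:
        LinkCandidate.constructed += 1
        self.link = link


# Hypothetical simplification of the Cache[LinkCandidate] alias.
Cache = Dict[str, LinkCandidate]


class Factory:
    """Stand-in for the resolver Factory, reduced to the cache wiring."""

    def __init__(self, link_candidate_cache: Optional[Cache] = None) -> None:
        # Test against None instead of writing `cache or {}`: an empty dict is
        # falsy, so `or` would swap the caller's (still empty) dict for a
        # private one and silently break sharing.
        if link_candidate_cache is None:
            link_candidate_cache = {}
        self._link_candidate_cache = link_candidate_cache

    def make_candidate(self, link: str) -> LinkCandidate:
        try:
            return self._link_candidate_cache[link]
        except KeyError:
            candidate = self._link_candidate_cache[link] = LinkCandidate(link)
            return candidate


# Two factories that were handed the same dict object share candidates.
shared: Cache = {}
first = Factory(link_candidate_cache=shared)
second = Factory(link_candidate_cache=shared)
url = "https://example.invalid/hatchling.tar.gz"
a = first.make_candidate(url)
b = second.make_candidate(url)

# A factory created without a cache keeps its own private dict.
private = Factory()
c = private.make_candidate(url)
```

Here `a` and `b` are the same object because both factories write into `shared`, while `c` is built separately, which mirrors the default per-instance ownership.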

InprocessBuildEnvironmentInstaller uses the new API end-to-end. It owns one cache pair as instance state and threads it into every _make_resolver() call, so a pip install foo bar that builds two sdists with overlapping build deps amortises candidate construction across the two build envs. The BuildEnvironmentInstaller Protocol exposes the pair as link_candidate_cache / editable_candidate_cache attributes (typed Cache[…] | None); the main resolver picks them up off preparer.build_env_installer and joins the same cache, so a package that's both a runtime dep of the user's request and a build dep of an sdist they're installing is constructed exactly once for the whole pip install. SubprocessBuildEnvironmentInstaller reports None because its builds run in a separate pip process and the dicts cannot cross the boundary, so the main resolver falls back to per-instance ownership in that path.
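The wiring described above could look roughly like the following sketch. Only the two attribute names come from this PR; the class names `InprocessInstaller` and `SubprocessInstaller`, the simplified `Resolver`, the `Dict[str, object]` cache type, and the `make_resolver(installer)` signature are illustrative assumptions (the real code reads the installer off `preparer.build_env_installer`).

```python
from __future__ import annotations

from typing import Dict, Optional, Protocol

# Hypothetical stand-in for the Cache[...] aliases.
Cache = Dict[str, object]


class BuildEnvironmentInstaller(Protocol):
    """Reduced Protocol: only the two cache attributes are modelled."""

    link_candidate_cache: Optional[Cache]
    editable_candidate_cache: Optional[Cache]


class InprocessInstaller:
    """Owns one cache pair as instance state and hands it to every resolver."""

    def __init__(self) -> None:
        self.link_candidate_cache: Optional[Cache] = {}
        self.editable_candidate_cache: Optional[Cache] = {}


class SubprocessInstaller:
    """Builds run in another process, so there is nothing to share."""

    link_candidate_cache: Optional[Cache] = None
    editable_candidate_cache: Optional[Cache] = None


class Resolver:
    """Stand-in resolver: falls back to private dicts when given None."""

    def __init__(
        self,
        link_candidate_cache: Optional[Cache] = None,
        editable_candidate_cache: Optional[Cache] = None,
    ) -> None:
        self.link_cache: Cache = (
            {} if link_candidate_cache is None else link_candidate_cache
        )
        self.editable_cache: Cache = (
            {} if editable_candidate_cache is None else editable_candidate_cache
        )


def make_resolver(installer: BuildEnvironmentInstaller) -> Resolver:
    # Forward whatever the installer exposes; None means per-instance caches.
    return Resolver(
        link_candidate_cache=installer.link_candidate_cache,
        editable_candidate_cache=installer.editable_candidate_cache,
    )


inprocess = InprocessInstaller()
main_resolver = make_resolver(inprocess)
build_resolver = make_resolver(inprocess)

subprocess_installer = SubprocessInstaller()
isolated_one = make_resolver(subprocess_installer)
isolated_two = make_resolver(subprocess_installer)
```

With the in-process installer, the main resolver and the build resolver end up holding the very same dicts; with the subprocess installer, each resolver gets fresh private ones.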

Behaviour for direct-URL resolution, build failures (which stay per-Factory deliberately), the installed-candidate cache, and the extras-candidate cache is untouched. The Protocol is internal to pip; downstream BuildEnvironmentInstaller implementations need to add two attributes returning None to satisfy the typing.

Benchmark on pip install --use-feature inprocess-build-deps --no-binary :all: --no-deps against six hatchling-built sdists (h11, anyio, sniffio, httpcore, httpx, idna — overlapping build-system.requires), n=4 paired alternating runs:

| mode | baseline real | patched real | baseline user | patched user | speedup (user) |
| --- | --- | --- | --- | --- | --- |
| cold HTTP cache (--no-cache-dir) | 8.65 s | 8.30 s | 5.29 s | 5.18 s | +2.3% |
| hot HTTP cache, no wheel cache | 7.86 s | 7.65 s | 5.20 s | 5.01 s | +3.8% |

The hot scenario clears pip's wheel cache between iterations so build envs actually run; HTTP metadata stays cached, which matches typical CI / fresh-checkout state with prior pip cache. Cold-cache iterations re-fetch every metadata document from the index. Patched wins 4/4 paired iterations in both modes; the user-CPU number is the cleaner signal because real time mixes in network/build subprocess waits the cache cannot affect.

The win scales with the number of resolvers per command. With six sdists the main resolver plus six build envs share one cache pair; the more sdists or the more they overlap, the larger the amortisation. Downstream consumer at the other extreme: pip-tools profiled at 9.3 s of CPU in _iter_built across 15 k calls on --all-extras --all-groups --jobs 1 of datamodel-code-generator (~22 resolution passes per cohort × 3 cohorts). The duplicated LinkCandidate.__init__ work between passes is the main remaining lever after the bytes/Distribution caches pip-tools already shares via monkey-patches; this PR is what lets that tool drop the patch and feed the cache through a stable API instead.

@gaborbernat force-pushed the pip-tools-share-candidate-cache branch from 9fd87cf to 1294f08 on May 8, 2026 21:14
@gaborbernat force-pushed the pip-tools-share-candidate-cache branch from 1294f08 to ec4127a on May 8, 2026 21:58
@pfmoore
Member

pfmoore commented May 8, 2026

Can I check if I'm understanding the motivation here? Is this only of use to people calling pip's internal API directly, or is there a benefit to users of the pip CLI that I've not understood?

@gaborbernat
Author

This is purely so that PIP-Lock can use a shared cache across multiple resolvers.

Comment thread news/13989.feature.rst Outdated
@gaborbernat force-pushed the pip-tools-share-candidate-cache branch from ec4127a to 9ccbe71 on May 8, 2026 23:38
@gaborbernat
Author

Good call — switched to news/13989.trivial.rst so the change doesn't surface in NEWS. Force-pushed.

@notatallshaw
Member

notatallshaw commented May 8, 2026

I think this PR can be extended to be useful to pip itself: it can be trivially used in InprocessBuildEnvironmentInstaller, which can call multiple resolvers per pip install (if there are multiple builds per install).

A slightly more complicated solution could share the main resolver's cache with the in process build's cache.

If it's not too difficult, I would prefer that new code be exercised by pip itself, so we are less likely to break it in the future.

@gaborbernat force-pushed the pip-tools-share-candidate-cache branch from 9ccbe71 to 975c634 on May 9, 2026 19:35
Tools that drive multiple resolutions in one process — pip-tools' lock
pipeline fans out across cohorts × extras × groups, dispatching tens of
resolver invocations against the same package set — pay the
``LinkCandidate.__init__`` cost on every pass because the cache that
amortises it is owned by the per-invocation ``Factory`` and gets thrown
away with it. ``pip install`` itself runs one resolution per invocation
and therefore sees no benefit, but it also sees no cost: the dicts are
already constructed and threaded through the same code paths.

Surface the existing ``_link_candidate_cache`` and
``_editable_candidate_cache`` as opt-in constructor parameters on
``Factory`` (and pass-through on ``Resolver``). Default to ``None``,
which preserves the current per-instance ownership for every existing
caller; pass a dict, and the Factory writes into the caller's dict
instead of its own. Identity, not equality, is what makes the
optimisation safe to compose across resolver invocations.

The change is a pure constructor surface widening: no semantics shift,
no behaviour change for ``pip install``, no new public types — the
``Cache`` alias was already module-public via the file's ``__all__``-
free namespace. Downstream tools that want the optimisation get a
two-line wiring; nobody else is affected.
@gaborbernat force-pushed the pip-tools-share-candidate-cache branch from 975c634 to 4239403 on May 9, 2026 19:42
@gaborbernat changed the title from "✨ feat(resolver): expose candidate caches as constructor parameters" to "✨ feat: share candidate caches across pip install's resolvers" on May 9, 2026
@gaborbernat
Author

Done — wired the more ambitious version too. The BuildEnvironmentInstaller Protocol now exposes link_candidate_cache / editable_candidate_cache attributes (typed Cache[LinkCandidate] | None / Cache[EditableCandidate] | None). InprocessBuildEnvironmentInstaller returns its own cache dicts; SubprocessBuildEnvironmentInstaller returns None because the dicts cannot cross the subprocess boundary. make_resolver reads the caches off preparer.build_env_installer and forwards them, so a package that's both a runtime dep of the user's request and a build dep of an sdist they're installing is constructed exactly once for the whole pip install.

News fragment promoted from .trivial.rst to .feature.rst since the change is now user-visible (pip install foo bar with overlapping deps will run faster). Force-pushed.

@gaborbernat changed the title from "✨ feat: share candidate caches across pip install's resolvers" to "✨ feat(resolver): share candidate caches across pip install resolvers" on May 9, 2026
@gaborbernat requested a review from notatallshaw on May 9, 2026 19:44

3 participants