✨ feat(resolver): share candidate caches across pip install resolvers #13989
gaborbernat wants to merge 1 commit into …
9fd87cf to 1294f08
1294f08 to ec4127a
Can I check if I'm understanding the motivation here? Is this only of use to people calling pip's internal API directly, or is there a benefit to users of the pip CLI that I've not understood?
This is purely so that PIP-Lock can use a shared cache across multiple resolvers.
ec4127a to 9ccbe71
Good call, switched to …
I think this PR can be extended to be useful to pip itself: it could trivially be used in … A slightly more complicated solution could share the main resolver's cache with the in-process build's cache. If it's not too difficult, I would prefer that new code have functionality within pip, so we are less likely to break it in the future.
9ccbe71 to 975c634
Tools that drive multiple resolutions in one process (pip-tools' lock pipeline fans out across cohorts × extras × groups, dispatching tens of resolver invocations against the same package set) pay the ``LinkCandidate.__init__`` cost on every pass, because the cache that amortises it is owned by the per-invocation ``Factory`` and is thrown away with it. ``pip install`` itself runs one resolution per invocation and therefore sees no benefit, but it also pays no cost: the dicts are already constructed and threaded through the same code paths.
Surface the existing ``_link_candidate_cache`` and ``_editable_candidate_cache`` as opt-in constructor parameters on ``Factory`` (passed through by ``Resolver``). Both default to ``None``, which preserves the current per-instance ownership for every existing caller; pass a dict, and the ``Factory`` writes into the caller's dict instead of its own. Identity, not equality, is what makes the optimisation safe to compose across resolver invocations.
The change only widens the constructor surface: no semantic shift, no behaviour change for ``pip install``, and no new public types (the ``Cache`` alias was already module-public, since the file defines no ``__all__``). Downstream tools that want the optimisation get a two-line wiring; nobody else is affected.
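A minimal sketch of the opt-in pattern described above, assuming heavily simplified stand-ins: the ``LinkCandidate``, ``Factory``, and ``Resolver`` classes below are hypothetical reductions, not pip's real implementations, and the cache is keyed by a URL string rather than whatever key pip actually uses. It shows the ``None`` default giving each ``Factory`` a private dict, and a caller-supplied dict being shared by identity across instances.

```python
from typing import Dict, Optional, TypeVar

# Hypothetical stand-ins for illustration only; pip's real Factory,
# Resolver, and candidate classes take many more arguments.
C = TypeVar("C")
Cache = Dict[str, C]  # assumption: keyed by URL string in this sketch


class LinkCandidate:
    """Stand-in for a candidate whose construction is expensive."""

    constructed = 0  # class-wide count of how often __init__ ran

    def __init__(self, url: str) -> None:
        LinkCandidate.constructed += 1
        self.url = url


class Factory:
    def __init__(
        self, link_candidate_cache: Optional[Cache[LinkCandidate]] = None
    ) -> None:
        # None keeps the per-instance behaviour (a private dict per Factory);
        # a caller-supplied dict is used as-is, so every Factory handed the
        # same object reads and writes one shared cache.
        self._link_candidate_cache: Cache[LinkCandidate] = (
            link_candidate_cache if link_candidate_cache is not None else {}
        )

    def make_link_candidate(self, url: str) -> LinkCandidate:
        # Construct at most once per URL for this cache.
        if url not in self._link_candidate_cache:
            self._link_candidate_cache[url] = LinkCandidate(url)
        return self._link_candidate_cache[url]


class Resolver:
    """Pass-through: the resolver only forwards the cache to its Factory."""

    def __init__(
        self, link_candidate_cache: Optional[Cache[LinkCandidate]] = None
    ) -> None:
        self.factory = Factory(link_candidate_cache=link_candidate_cache)


if __name__ == "__main__":
    url = "https://example.invalid/pkg-1.0.tar.gz"  # placeholder URL
    shared: Cache[LinkCandidate] = {}
    first = Resolver(shared).factory.make_link_candidate(url)
    second = Resolver(shared).factory.make_link_candidate(url)
    print(first is second, LinkCandidate.constructed)  # True 1
    Resolver().factory.make_link_candidate(url)  # default: private dict
    print(LinkCandidate.constructed)  # 2
```

Note the ``is not None`` test in this sketch: an empty dict is falsy, so writing ``cache or {}`` would quietly swap a caller's fresh, still-empty shared dict for a private one and silently break sharing, which is the identity-versus-equality trap the description points at.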
975c634 to 4239403
Done, wired up the more ambitious version too. The news fragment was promoted from …
A ``pip install`` invocation runs more than one resolver. The user's request goes through one main ``Resolver``, and every in-process build environment that has to install build dependencies for an sdist or editable spawns its own. ⚡ Build deps overlap heavily across builds (``setuptools``, ``wheel``, ``hatchling``, and friends), and many of them also overlap with the user's runtime requirements. Today every resolver pays the cost of building every ``LinkCandidate`` from scratch, because the cache that amortises construction lives on the per-invocation ``Factory`` and dies with it. Downstream tools that drive even more resolutions in one process (pip-tools's lock pipeline fans out tens of resolvers per command) feel this much harder.
Both new parameters default to ``None``, so every existing caller keeps the current per-invocation ownership. Pass a dict and ``Factory`` writes into the caller's dict instead of its own; identity, not equality, is what makes the optimisation composable across ``Resolver`` instances. The cache types stay the existing ``Cache[LinkCandidate]`` / ``Cache[EditableCandidate]`` aliases the module already uses internally.
``InprocessBuildEnvironmentInstaller`` uses the new API end-to-end. It owns one cache pair as instance state and threads it into every ``_make_resolver()`` call, so a ``pip install foo bar`` that builds two sdists with overlapping build deps amortises candidate construction across the two build envs. The ``BuildEnvironmentInstaller`` Protocol exposes the pair as ``link_candidate_cache`` / ``editable_candidate_cache`` attributes (typed ``Cache[…] | None``); the main resolver picks them up from ``preparer.build_env_installer`` and joins the same cache, so a package that is both a runtime dep of the user's request and a build dep of an sdist they're installing is constructed exactly once for the whole ``pip install``. ``SubprocessBuildEnvironmentInstaller`` reports ``None`` because its builds run in a separate pip process and the dicts cannot cross the process boundary, so the main resolver falls back to per-instance ownership on that path.
Behaviour for direct-URL resolution, build failures (which deliberately stay per-``Factory``), the installed-candidate cache, and the extras-candidate cache is untouched. The Protocol is internal to pip; downstream ``BuildEnvironmentInstaller`` implementations only need to add the two attributes, set to ``None``, to satisfy the typing.
Benchmark on ``pip install --use-feature inprocess-build-deps --no-binary :all: --no-deps`` against six hatchling-built sdists (``h11``, ``anyio``, ``sniffio``, ``httpcore``, ``httpx``, ``idna``, which have overlapping ``build-system.requires``), n=4 paired alternating runs:
[Table not preserved: real time and user CPU, unpatched vs. patched, for a hot-cache scenario and a cold-cache scenario (``--no-cache-dir``).]
The hot scenario clears pip's wheel cache between iterations so the build envs actually run; HTTP metadata stays cached, which matches the typical CI or fresh-checkout state with a prior pip cache. Cold-cache iterations re-fetch every metadata document from the index. The patched build wins 4/4 paired iterations in both modes; the user-CPU number is the cleaner signal, because real time also includes network and build-subprocess waits that the cache cannot affect.
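The installer wiring described above (an in-process installer that owns one cache pair and threads it into every resolver it builds, a subprocess installer that reports ``None``, and a main resolver that joins whatever pair the installer exposes) can be sketched with ``typing.Protocol``. The names ``InprocessInstaller``, ``SubprocessInstaller``, and ``make_main_resolver`` are hypothetical simplifications that mirror the roles of ``InprocessBuildEnvironmentInstaller``, ``SubprocessBuildEnvironmentInstaller``, and the main-resolver setup; this is not pip's code, and the single untyped ``Cache`` alias is an assumption made for brevity.

```python
from typing import Dict, Optional, Protocol

# Hypothetical simplification: one untyped alias instead of pip's
# Cache[LinkCandidate] / Cache[EditableCandidate] pair.
Cache = Dict[str, object]


class BuildEnvironmentInstaller(Protocol):
    # The shareable pair, or None when there is nothing to share.
    link_candidate_cache: Optional[Cache]
    editable_candidate_cache: Optional[Cache]


class Resolver:
    def __init__(
        self,
        link_candidate_cache: Optional[Cache] = None,
        editable_candidate_cache: Optional[Cache] = None,
    ) -> None:
        # None falls back to private, per-resolver dicts.
        self.link_cache = (
            link_candidate_cache if link_candidate_cache is not None else {}
        )
        self.editable_cache = (
            editable_candidate_cache if editable_candidate_cache is not None else {}
        )


class InprocessInstaller:
    """Owns one cache pair and hands it to every build-env resolver."""

    def __init__(self) -> None:
        self.link_candidate_cache: Optional[Cache] = {}
        self.editable_candidate_cache: Optional[Cache] = {}

    def _make_resolver(self) -> Resolver:
        return Resolver(self.link_candidate_cache, self.editable_candidate_cache)


class SubprocessInstaller:
    """Builds run in a separate process, so the dicts cannot be shared."""

    def __init__(self) -> None:
        self.link_candidate_cache: Optional[Cache] = None
        self.editable_candidate_cache: Optional[Cache] = None


def make_main_resolver(installer: BuildEnvironmentInstaller) -> Resolver:
    # Join whatever pair the installer exposes; None means per-instance dicts.
    return Resolver(installer.link_candidate_cache, installer.editable_candidate_cache)
```

With the in-process installer, the main resolver and every build-env resolver end up holding the same two dict objects; with the subprocess installer, each resolver gets its own empty dicts, matching the fallback the description calls out.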
The win scales with the number of resolvers per command. With six sdists, the main resolver plus six build envs share one cache pair; the more sdists, or the more they overlap, the larger the amortisation. At the other extreme is a downstream consumer: ``pip-tools`` profiled at 9.3 s of CPU in ``_iter_built`` across 15 k calls on ``--all-extras --all-groups --jobs 1`` of ``datamodel-code-generator`` (~22 resolution passes per cohort × 3 cohorts). The duplicated ``LinkCandidate.__init__`` work between passes is the main remaining lever after the bytes/``Distribution`` caches that ``pip-tools`` already shares via monkey-patches; this PR is what lets that tool drop the patch and feed the cache through a stable API instead.
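A back-of-the-envelope model of why the win scales with the number of resolvers: if each resolution touches a set of links, private caches build one candidate per (resolver, link) pair, while a single shared cache builds one per distinct link. The sketch assumes each resolution is just a fixed list of link names, which ignores backtracking and request ordering; the function ``count_constructions`` and the package names are illustrative placeholders, not the benchmark's data.

```python
from typing import Dict, List


def count_constructions(resolutions: List[List[str]], shared: bool) -> int:
    """Count candidate constructions across several resolutions.

    Simplified model (an assumption): each resolution is the list of links it
    asks the factory for, and a construction happens on every cache miss.
    """
    shared_cache: Dict[str, object] = {}
    built = 0
    for links in resolutions:
        # A private cache starts empty for each resolver; a shared one persists.
        cache = shared_cache if shared else {}
        for link in links:
            if link not in cache:
                cache[link] = object()  # stand-in for an expensive candidate
                built += 1
    return built


# Illustrative: one main resolution plus six build envs with identical build deps.
main = ["pkg-a", "pkg-b", "hatchling"]
build_env = ["hatchling", "dep-x", "dep-y"]
resolutions = [main] + [build_env] * 6

print(count_constructions(resolutions, shared=False))  # 21 = 3 + 6 * 3
print(count_constructions(resolutions, shared=True))  # 5 distinct links
```

In this model the private-cache count grows linearly with the number of resolvers, while the shared count is bounded by the number of distinct links, which is the "more sdists, or more overlap, means more amortisation" point made above.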