
⚡ perf(link): skip fragment regex when fragment marker absent #13987

Open
gaborbernat wants to merge 1 commit into pypa:main from gaborbernat:pip-tools-egg-fragment-fast-path

@gaborbernat

Every `Link` runs `_egg_fragment_re.search` and `_subdirectory_fragment_re.search` against `self._url` to find the matching fragment, even though Warehouse-served URLs (the bulk of links a Simple-API response carries) never include either fragment. Each regex search walks the full URL, which adds up across the ~65000 links a moderately sized cross-platform lock iterates per resolver pass.

The patch guards each search with a literal `"egg=" not in self._url` (resp. `"subdirectory=" not in self._url`) pre-check. Python's substring search is C-implemented and runs roughly 8x faster than the regex when the marker is absent; when the marker is present the existing regex still runs, so behaviour is preserved byte-for-byte for VCS / direct-URL / `find-links` style URLs that carry it.
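For readers who want the shape of the guard without opening the diff, here is a minimal sketch. `LinkSketch` is a hypothetical stand-in, not pip's `Link`, and the two patterns are assumed from the fragment names in this description; the real accessors may do more (for example, validating the egg name).

```python
import re
from typing import Optional

# Assumed patterns for illustration; the real ones are defined on pip's Link.
_egg_fragment_re = re.compile(r"[#&]egg=([^&]*)")
_subdirectory_fragment_re = re.compile(r"[#&]subdirectory=([^&]*)")


class LinkSketch:
    """Hypothetical stand-in for pip's Link, reduced to the two accessors."""

    def __init__(self, url: str) -> None:
        self._url = url

    @property
    def egg_fragment(self) -> Optional[str]:
        # Fast path: if the literal marker is absent, the regex cannot match.
        if "egg=" not in self._url:
            return None
        # Slow path: unchanged regex search for URLs that carry the marker.
        match = _egg_fragment_re.search(self._url)
        return match.group(1) if match else None

    @property
    def subdirectory_fragment(self) -> Optional[str]:
        # Same guard, applied to the subdirectory marker.
        if "subdirectory=" not in self._url:
            return None
        match = _subdirectory_fragment_re.search(self._url)
        return match.group(1) if match else None
```

Because both assumed patterns contain the marker as a literal, a URL without the marker can never match, so the early `return None` gives the same result as running the regex.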

Function-level micro-benchmark against `https://files.pythonhosted.org/packages/12/34/foo-1.0-py3-none-any.whl#sha256=…` (no `egg=`): the egg accessor drops from 535 ns/call to 69 ns/call, a 7.8x speedup. End-to-end, against a cross-platform lock pipeline iterating ~65000 links across 8 resolver passes (n=8 paired runs alternating `upstream/main` with `upstream/main` plus this patch), mean user-CPU time falls 2.8% (median reduction 55 ms), and 7 of 8 paired runs are faster.
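A comparison along these lines can be reproduced with `timeit`. The sketch below is not the harness used for the numbers above: the pattern is assumed, the digest is a placeholder of zeros, and `regex_only`, `guarded` and `ns_per_call` are hypothetical helper names. Absolute timings will differ by machine and Python version, and include the overhead of the wrapping lambda.

```python
import re
import timeit

# Assumed pattern for illustration; the real one is defined on pip's Link.
_egg_fragment_re = re.compile(r"[#&]egg=([^&]*)")

# Placeholder digest (64 zeros), not a real hash.
URL = (
    "https://files.pythonhosted.org/packages/12/34/"
    "foo-1.0-py3-none-any.whl#sha256=" + "0" * 64
)


def regex_only(url):
    """Baseline: always run the regex search."""
    match = _egg_fragment_re.search(url)
    return match.group(1) if match else None


def guarded(url):
    """Patched shape: substring pre-check, then the same regex."""
    if "egg=" not in url:
        return None
    match = _egg_fragment_re.search(url)
    return match.group(1) if match else None


def ns_per_call(func, url, number=200_000, repeat=5):
    """Best-of-`repeat` wall time per call, in nanoseconds."""
    best = min(timeit.repeat(lambda: func(url), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for func in (regex_only, guarded):
        print(f"{func.__name__}: {ns_per_call(func, URL):.0f} ns/call")
```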

Every `Link` runs `_egg_fragment_re.search` and `_subdirectory_fragment_re.search`
against `self._url` to find the matching fragment, even though Warehouse-served
URLs (the bulk of links a Simple-API response carries) never include either
fragment. Each regex search walks the full URL, which adds up across the
~65000 links a moderately sized cross-platform lock iterates per resolver pass.

Guard each search with a literal `"egg=" not in self._url` (resp.
`"subdirectory=" not in self._url`) pre-check. Python's substring search is
C-implemented and runs roughly 8x faster than the regex when the marker is
absent; when the marker is present the existing regex still runs, so behaviour
is preserved for VCS / direct-URL / `find-links` style URLs that carry it.

A new parametrized test, `test_fragments_absent_for_typical_links`, asserts that
both accessors return `None` for a representative spread of Simple-API URLs. The
existing `test_fragments` and `test_invalid_egg_fragments` cases all carry
explicit `#egg=…` / `&subdirectory=…` fragments and continue to exercise the
slow path.