Commit 8ba2dbe

⚡ perf(link): skip fragment regex when fragment marker absent
Every `Link` runs `_egg_fragment_re.search` and `_subdirectory_fragment_re.search` against `self._url` to look for the corresponding fragment, even though Warehouse-served URLs (the bulk of the links a Simple-API response carries) never include either one. Each search goes through the regex engine and scans the full URL, which adds up across the ~65000 links a moderately sized cross-platform lock iterates per resolver pass.

Guard each search with a literal pre-check: `"egg=" not in self._url` for the egg fragment and `"subdirectory=" not in self._url` for the subdirectory fragment. Python's substring search is implemented in C and runs an order of magnitude faster than the regex when the marker is absent. When the marker is present the existing regex still runs, so behaviour is preserved for VCS, direct-URL, and `find-links` style URLs that carry it.

A new parametrized test, `test_fragments_absent_for_typical_links`, asserts that both accessors return `None` for a representative spread of Simple-API URLs. The existing `test_fragments` and `test_invalid_egg_fragments` cases all carry explicit `#egg=…` / `&subdirectory=…` fragments and continue to exercise the slow path.
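The guard described above can be sketched outside pip as follows. This is a minimal illustration, not the code in `link.py`: the pattern `EGG_RE` and both helper functions are hypothetical stand-ins, and the real `_egg_fragment` does more than return the captured group. The point it shows is that when the regex requires the literal `egg=`, a failed substring check already implies the regex cannot match, so the guarded and unguarded versions agree on every input.

```python
import re
from typing import Optional

# Hypothetical stand-in pattern, assumed for illustration only.
# It requires the literal text "egg=" preceded by "#" or "&".
EGG_RE = re.compile(r"[#&]egg=([^&]*)")


def egg_fragment_unguarded(url: str) -> Optional[str]:
    """Always run the regex, as the code did before this commit."""
    match = EGG_RE.search(url)
    return match.group(1) if match else None


def egg_fragment_guarded(url: str) -> Optional[str]:
    """Skip the regex when the literal marker is absent."""
    # If "egg=" is not in the URL, EGG_RE cannot match, so return early.
    if "egg=" not in url:
        return None
    match = EGG_RE.search(url)
    return match.group(1) if match else None


SAMPLE_URLS = [
    "https://files.pythonhosted.org/packages/12/34/foo-1.0-py3-none-any.whl",
    "https://example.com/path/to/file.tar.gz?build=1",
    "git+https://example.com/repo.git#egg=foo&subdirectory=sub",
    # Contains "egg=" but not after "#" or "&": falls through to the regex.
    "https://example.com/file.tar.gz?megg=1",
]

for sample in SAMPLE_URLS:
    assert egg_fragment_guarded(sample) == egg_fragment_unguarded(sample)
```

The last sample URL is the case worth keeping in mind: the substring check is only a necessary condition, so the regex must still run whenever the marker appears anywhere in the URL.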
1 parent f7bfe28 commit 8ba2dbe

3 files changed

Lines changed: 39 additions & 0 deletions

news/13987.feature.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Speed up ``Link`` construction by skipping the ``egg=`` and
+``subdirectory=`` fragment regex searches when the corresponding marker is
+absent from the URL, which is the common case for links returned by the
+Simple-API JSON response.

src/pip/_internal/models/link.py

Lines changed: 8 additions & 0 deletions
@@ -475,6 +475,11 @@ def url_without_fragment(self) -> str:
         )
 
     def _egg_fragment(self) -> str | None:
+        # Cheap pre-check: the regex below cannot match unless ``egg=`` appears
+        # somewhere in the URL, which it never does for the bulk of links a
+        # Simple-API response carries.
+        if "egg=" not in self._url:
+            return None
         match = self._egg_fragment_re.search(self._url)
         if not match:
             return None
@@ -491,6 +496,9 @@ def _egg_fragment(self) -> str | None:
 
     @property
     def subdirectory_fragment(self) -> str | None:
+        # Cheap pre-check: same shape as ``_egg_fragment`` above.
+        if "subdirectory=" not in self._url:
+            return None
         match = self._subdirectory_fragment_re.search(self._url)
         if not match:
             return None
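One way to check the speed claim locally is a small `timeit` comparison. The sketch below is an assumption-laden micro-benchmark, not part of the commit: `PATTERN` is a hypothetical stand-in for the subdirectory regex, the URL is one of the typical marker-free links, and the numbers it prints will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical stand-in for the subdirectory regex; assumed for illustration.
PATTERN = re.compile(r"[#&]subdirectory=([^&]*)")

# A typical Simple-API style URL that carries no "subdirectory=" marker.
URL = (
    "https://files.pythonhosted.org/packages/12/34/"
    "foo-1.0-py3-none-any.whl#sha256=abc"
)


def regex_only(url: str = URL):
    """Run the regex unconditionally."""
    return PATTERN.search(url)


def guarded(url: str = URL):
    """Check for the literal marker first and skip the regex if absent."""
    if "subdirectory=" not in url:
        return None
    return PATTERN.search(url)


def measure(number: int = 100_000) -> dict:
    """Return total seconds spent on `number` calls of each variant."""
    return {
        "regex_only": timeit.timeit(regex_only, number=number),
        "guarded": timeit.timeit(guarded, number=number),
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

Both variants return `None` for this URL, so the comparison isolates the cost of the lookup itself. Function-call overhead is included in both timings, which narrows the measured gap relative to the raw operations.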

tests/unit/test_link.py

Lines changed: 27 additions & 0 deletions
@@ -81,6 +81,33 @@ def test_fragments(self) -> None:
         assert "eggname" == Link(url).egg_fragment
         assert "subdir" == Link(url).subdirectory_fragment
 
+    @pytest.mark.parametrize(
+        "url",
+        [
+            pytest.param(
+                "https://files.pythonhosted.org/packages/12/34/foo-1.0-py3-none-any.whl",
+                id="pypi-wheel",
+            ),
+            pytest.param(
+                "https://files.pythonhosted.org/packages/12/34/foo-1.0-py3-none-any.whl#sha256=abc",
+                id="pypi-wheel-with-hash",
+            ),
+            pytest.param("https://example.com/path/to/file.tar.gz", id="archive"),
+            pytest.param(
+                "https://example.com/path/to/file.tar.gz?build=1",
+                id="archive-with-query",
+            ),
+        ],
+    )
+    def test_fragments_absent_for_typical_links(self, url: str) -> None:
+        """Links served by a Simple-API response have no ``egg=`` or
+        ``subdirectory=`` fragment. The accessors MUST return ``None`` for
+        them (the implementation may take a fast path here).
+        """
+        link = Link(url)
+        assert link.egg_fragment is None
+        assert link.subdirectory_fragment is None
+
     @pytest.mark.parametrize(
         "fragment",
         [
