Commit fa05ad2

Merge pull request #1173 from tiran/session-auth
feat(auth): add URL-dispatched session auth for GitHub and GitLab
2 parents 0ccad33 + cae6a49 commit fa05ad2

7 files changed

Lines changed: 356 additions & 68 deletions

README.md

Lines changed: 5 additions & 29 deletions
@@ -29,35 +29,11 @@ Fromager can also build wheels in collections, rather than individually. Managin
 
 This approach makes Fromager especially useful in Python-heavy domains like AI, where reproducibility and compatibility across complex dependency trees are essential.
 
-## Using private registries
-
-Fromager uses the [requests](https://requests.readthedocs.io) library and `pip`
-at different points for talking to package registries. Both support
-authenticating to remote servers in various ways. The simplest way to integrate
-the authentication with fromager is to have a
-[netrc](https://docs.python.org/3/library/netrc.html) file with a valid entry
-for the host. The file will be read from `~/.netrc` by default. Another location
-can be specified by setting the `NETRC` environment variable.
-
-For example, to use a gitlab package registry, use a [personal
-access
-token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-personal-access-token)
-as documented in [this
-issue](https://gitlab.com/gitlab-org/gitlab/-/issues/350582):
-
-```plaintext
-machine gitlab.com login oauth2 password $token
-```
-
-## Determining versions via GitHub tags
-
-In some cases, the builder might have to use tags on GitHub to determine the version of a project instead of looking at
-pypi.org. To avoid rate limit or to access private GitHub repository, a [personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) can be passed to fromager by setting
-the following environment variable:
-
-```shell
-GITHUB_TOKEN=<access_token>
-```
+## Authentication
+
+Fromager automatically authenticates to GitHub and GitLab APIs using
+credentials from netrc or environment variables. See the
+[authentication guide](docs/how-tos/authentication.md) for details.
 
 ## Additional docs
 
docs/how-tos/authentication.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# Authentication
+
+Fromager automatically authenticates to GitHub and GitLab APIs using
+credentials from netrc or environment variables. Credentials are
+resolved lazily on the first request to each host.
+
+Authentication is recommended to avoid [API rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api) (especially
+for GitHub) and required to access private repositories or registries.
+
+## Credential lookup order
+
+For each host, fromager checks the following sources in order and uses
+the first match:
+
+**GitHub** (`GITHUB_API_URL`, default `https://api.github.com`):
+
+1. [netrc](https://docs.python.org/3/library/netrc.html) entry for
+   the host -- the password is used as the token
+2. `GITHUB_TOKEN` environment variable
+
+**GitLab** (`CI_SERVER_URL`, default `https://gitlab.com`):
+
+1. netrc entry for the host -- if the login is `gitlab-ci-token` a
+   CI job token header is used, otherwise a private token header
+2. `CI_JOB_TOKEN` environment variable
+3. `GITLAB_PRIVATE_TOKEN` environment variable
+
+## netrc
+
+The [requests](https://requests.readthedocs.io) library, `pip`, and
+`git` all read credentials from `~/.netrc`. Another location can be
+specified by setting the `NETRC` environment variable. Note that
+`git` uses libcurl for HTTPS transport and libcurl only supports the
+`NETRC` variable since [8.16.0](https://curl.se/ch/8.16.0.html)
+(2025-09-10). Older versions only read `$HOME/.netrc`.
+
+For example, to authenticate to a GitLab package registry with a
+[personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-personal-access-token):
+
+```text
+machine gitlab.com login pat password $token
+```
+
+To authenticate to the GitHub API with a
+[personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens):
+
+```text
+machine api.github.com login pat password $token
+```
+
+## Environment variables
+
+To authenticate via environment variables instead of netrc:
+
+```shell
+# GitHub personal access token (avoids API rate limits)
+export GITHUB_TOKEN=<access_token>
+
+# GitLab CI job token (set automatically in CI pipelines)
+export CI_JOB_TOKEN=<job_token>
+
+# GitLab personal/project access token
+export GITLAB_PRIVATE_TOKEN=<access_token>
+```
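
The GitLab lookup order documented in `docs/how-tos/authentication.md` can be sketched as a pure function. This is an illustrative sketch, not fromager's code: `pick_gitlab_header` is a hypothetical helper that takes an already-parsed netrc entry and an environment mapping, so the precedence can be checked without touching `~/.netrc` or the real environment.

```python
from __future__ import annotations

from collections.abc import Mapping

# Login name that marks a netrc entry as a CI job token (taken from the guide).
JOB_TOKEN_LOGIN = "gitlab-ci-token"


def pick_gitlab_header(
    netrc_entry: tuple[str, str] | None,
    env: Mapping[str, str],
) -> dict[str, str]:
    """Return the GitLab auth header chosen by the documented lookup order.

    Hypothetical helper: ``netrc_entry`` is a ``(login, password)`` pair, or
    ``None`` when no netrc entry exists for the host.
    """
    # 1. A netrc entry always wins; the login decides which header is used.
    if netrc_entry is not None:
        login, password = netrc_entry
        header = "JOB-TOKEN" if login == JOB_TOKEN_LOGIN else "PRIVATE-TOKEN"
        return {header: password}
    # 2. CI job token from the environment.
    if env.get("CI_JOB_TOKEN"):
        return {"JOB-TOKEN": env["CI_JOB_TOKEN"]}
    # 3. Personal or project access token from the environment.
    if env.get("GITLAB_PRIVATE_TOKEN"):
        return {"PRIVATE-TOKEN": env["GITLAB_PRIVATE_TOKEN"]}
    # No credentials found: the request goes out unauthenticated.
    return {}
```

Note that a netrc entry takes precedence even inside a CI pipeline where `CI_JOB_TOKEN` is set, which is worth keeping in mind when a build image ships its own netrc file.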

docs/how-tos/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Essential guides for initial setup and first builds.
 .. toctree::
    :maxdepth: 1
 
+   authentication
    containers
    bootstrap-constraints
 

docs/http-retry.md

Lines changed: 19 additions & 27 deletions
@@ -20,7 +20,7 @@ The retry system provides:
 
 - **GitHub API rate limit handling** with proper reset time detection
 
-- **GitHub authentication** automatically applied for GitHub API requests via `GITHUB_TOKEN` environment variable
+- **Automatic authentication** for GitHub and GitLab APIs (see {doc}`how-tos/authentication`)
 
 - **Temporary file handling** to prevent partial downloads
 
@@ -37,11 +37,11 @@ export FROMAGER_HTTP_BACKOFF_FACTOR=2.0
 
 # Request timeout in seconds (default: 120)
 export FROMAGER_HTTP_TIMEOUT=180
-
-# Token for GitHub API authentication (prevents rate limiting)
-export GITHUB_TOKEN=your_github_token_here
 ```
 
+Authentication credentials (`GITHUB_TOKEN`, `GITLAB_PRIVATE_TOKEN`,
+etc.) are documented in {doc}`how-tos/authentication`.
+
 ## Error Types Handled
 
 The retry mechanism specifically handles these error conditions:
@@ -73,35 +73,27 @@ The retry functionality is automatically enabled for all HTTP operations in From
 
 ### For Plugin Developers
 
-If you're writing plugins that need HTTP functionality, you can use the retry session:
+If you're writing plugins that need HTTP functionality, use the
+shared session from `request_session`. It includes retry handling
+and automatic authentication for GitHub and GitLab:
 
 ```python
-from fromager.http_retry import get_retry_session
-
-# Get a session with retry capabilities
-session = get_retry_session()
+from fromager.request_session import session
 
 # Use it like a normal requests session
-response = session.get("https://example.com/api/data")
+response = session.get("https://pkg.test/api/data")
 response.raise_for_status()
 ```
 
-For more advanced retry configuration:
+To register authentication for additional hosts:
 
 ```python
-from fromager.http_retry import create_retry_session
-
-# Custom retry configuration
-retry_config = {
-    "total": 5,
-    "backoff_factor": 2.0,
-    "status_forcelist": [429, 502, 503, 504],
-}
-
-session = create_retry_session(
-    retry_config=retry_config,
-    timeout=60.0
-)
+from fromager.request_session import session_auth
+
+def _resolve_my_auth(scheme: str, hostname: str) -> dict[str, str]:
+    return {"Authorization": "Bearer my-token"}
+
+session_auth.add("https://my-registry.test", _resolve_my_auth)
 ```
 
 ### Decorating Functions with Retry Logic
@@ -128,7 +120,7 @@ The retry system logs important events:
 
 - **WARNING**: When retries are attempted with backoff times
 - **ERROR**: When all retry attempts are exhausted
-- **DEBUG**: Detailed retry configuration and GitHub token status
+- **DEBUG**: Detailed retry configuration and authentication resolution
 
 Example log output:
 
@@ -151,13 +143,13 @@ INFO saved /path/to/package.tar.gz
 
 If you're seeing many retries, consider:
 
-- Setting `GITHUB_TOKEN` for GitHub API calls (automatically applied to GitHub requests)
+- Configuring authentication credentials (see {doc}`how-tos/authentication`) to avoid API rate limits
 - Increasing timeout values for slow connections
 - Checking network connectivity and DNS resolution
 
 ### API Rate Limiting
 
-- Use `GITHUB_TOKEN` for GitHub repositories
+- Configure credentials via netrc or environment variables (see {doc}`how-tos/authentication`)
 - Consider using a local package mirror for PyPI
 - Monitor API usage if using private registries
 
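
The plugin example in `docs/http-retry.md` returns a hard-coded token. A resolver with the same `(scheme, hostname) -> dict[str, str]` signature could instead read the token from the environment. The sketch below is an assumption-laden illustration: the variable name `MY_REGISTRY_TOKEN`, the function name, and the `Bearer` scheme are all hypothetical and depend on the registry in question.

```python
import os


def resolve_registry_auth(scheme: str, hostname: str) -> dict[str, str]:
    """Hypothetical resolver: build an auth header from an environment variable."""
    token = os.environ.get("MY_REGISTRY_TOKEN")  # assumed variable name
    # Never attach the token to plain-HTTP requests, and send no header
    # at all when the variable is unset or empty.
    if scheme != "https" or not token:
        return {}
    return {"Authorization": f"Bearer {token}"}


# Registration would follow the documented pattern, for example:
#   from fromager.request_session import session_auth
#   session_auth.add("https://my-registry.test", resolve_registry_auth)
```

Because the result is cached after the first request to the host, the variable has to be set before that request; later changes to the environment are not picked up unless the callback is registered again.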

src/fromager/request_session.py

Lines changed: 136 additions & 6 deletions
@@ -1,6 +1,16 @@
+from __future__ import annotations
+
+import logging
 import os
+from typing import TYPE_CHECKING
+from urllib.parse import urlparse
+
+import requests.auth
+from requests.utils import get_netrc_auth
 
-from .http_retry import create_retry_session
+from .http_retry import RetryHTTPAdapter
+
+logger = logging.getLogger(__name__)
 
 # Enhanced retry configuration for fromager
 FROMAGER_RETRY_CONFIG = {
@@ -11,8 +21,128 @@
     "raise_on_status": False,
 }
 
-# Create a session with enhanced retry capabilities
-session = create_retry_session(
-    retry_config=FROMAGER_RETRY_CONFIG,
-    timeout=float(os.environ.get("FROMAGER_HTTP_TIMEOUT", "120.0")),
-)
+GITHUB_API_URL = os.environ.get("GITHUB_API_URL", "https://api.github.com")
+
+GITLAB_CI_SERVER_URL = os.environ.get("CI_SERVER_URL", "https://gitlab.com")
+GITLAB_JOB_TOKEN_NAME = "gitlab-ci-token"
+
+
+if TYPE_CHECKING:
+    from collections.abc import Callable
+
+    _AuthCallback = Callable[[str, str], dict[str, str]]
+
+
+class SessionAuth(requests.auth.AuthBase):
+    """Authentication handler that dispatches by ``(scheme, hostname)``.
+
+    The requests library only supports a single ``session.auth`` handler
+    and does not provide per-host authentication on mounted adapters.
+    This class fills that gap by mapping ``(scheme, hostname)`` keys to
+    auth resolver callbacks. On the first request to a given host the
+    callback is invoked and the result is cached.
+    """
+
+    def __init__(self) -> None:
+        self._callbacks: dict[tuple[str, str], _AuthCallback] = {}
+        self._cache: dict[tuple[str, str], dict[str, str]] = {}
+
+    def add(self, url: str, callback: _AuthCallback) -> None:
+        """Register a resolver *callback* for the scheme and hostname of *url*."""
+        parsed = urlparse(url)
+        scheme = parsed.scheme
+        hostname = parsed.hostname or ""
+        if scheme not in {"http", "https"}:
+            raise ValueError(f"Unsupported scheme {scheme!r} in URL {url!r}")
+        if not hostname:
+            raise ValueError(f"Missing hostname in URL {url!r}")
+        key = (scheme, hostname)
+        self._cache.pop(key, None)
+        self._callbacks[key] = callback
+
+    def get(self, url: str) -> dict[str, str]:
+        """Resolve and return the auth headers for *url*.
+
+        Invokes the registered callback on first access and caches the
+        result. Returns an empty dict when no callback is registered.
+        """
+        parsed = urlparse(url)
+        key = (parsed.scheme, parsed.hostname or "")
+        if key not in self._cache:
+            callback = self._callbacks.get(key)
+            self._cache[key] = callback(*key) if callback else {}
+        return dict(self._cache[key])
+
+    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
+        auth_header = self.get(r.url or "")
+        if auth_header:
+            r.headers.update(auth_header)
+        return r
+
+
+def _resolve_github_auth(scheme: str, hostname: str) -> dict[str, str]:
+    """Resolve GitHub auth header from netrc or environment."""
+    url = f"{scheme}://{hostname}"
+    netrc_auth = get_netrc_auth(url)
+    if netrc_auth is not None:
+        _login, password = netrc_auth
+        logger.debug("GitHub auth: using netrc credentials for %s", url)
+        return {"Authorization": f"token {password}"}
+
+    token = os.environ.get("GITHUB_TOKEN")
+    if token:
+        logger.debug("GitHub auth: using GITHUB_TOKEN environment variable")
+        return {"Authorization": f"token {token}"}
+    return {}
+
+
+def _resolve_gitlab_auth(scheme: str, hostname: str) -> dict[str, str]:
+    """Resolve GitLab auth header from netrc or environment."""
+    url = f"{scheme}://{hostname}"
+    netrc_auth = get_netrc_auth(url)
+    if netrc_auth is not None:
+        login, password = netrc_auth
+        header = "JOB-TOKEN" if login == GITLAB_JOB_TOKEN_NAME else "PRIVATE-TOKEN"
+        logger.debug("GitLab auth: using netrc credentials for %s (%s)", url, header)
+        return {header: password}
+
+    token = os.environ.get("CI_JOB_TOKEN")
+    if token:
+        logger.debug("GitLab auth: using CI_JOB_TOKEN environment variable")
+        return {"JOB-TOKEN": token}
+
+    token = os.environ.get("GITLAB_PRIVATE_TOKEN")
+    if token:
+        logger.debug("GitLab auth: using GITLAB_PRIVATE_TOKEN environment variable")
+        return {"PRIVATE-TOKEN": token}
+    return {}
+
+
+def create_session() -> tuple[requests.Session, SessionAuth]:
+    """Create a requests session with retry and authentication.
+
+    Mounts a `RetryHTTPAdapter` on ``http://`` and ``https://``.
+    Registers lazy auth callbacks for GitHub and GitLab on a
+    `SessionAuth` handler keyed by ``(scheme, hostname)``.
+
+    Returns the session and its `SessionAuth` so callers can
+    register additional auth callbacks via ``auth.add()``.
+    """
+    adapter = RetryHTTPAdapter(
+        retry_config=FROMAGER_RETRY_CONFIG,
+        timeout=float(os.environ.get("FROMAGER_HTTP_TIMEOUT", "120.0")),
+    )
+
+    s = requests.Session()
+    s.mount("http://", adapter)
+    s.mount("https://", adapter)
+
+    auth = SessionAuth()
+    auth.add(GITHUB_API_URL, _resolve_github_auth)
+    auth.add(GITLAB_CI_SERVER_URL, _resolve_gitlab_auth)
+    s.auth = auth
+
+    return s, auth
+
+
+session, session_auth = create_session()
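
To illustrate the lazy, per-host caching that `SessionAuth` implements, here is a stripped-down sketch that does not depend on `requests`. `HostDispatch`, `Resolver`, and `counting_resolver` are hypothetical names used only for this illustration; the class in the commit additionally validates the scheme and hostname in `add()` and plugs into `session.auth`.

```python
from __future__ import annotations

from collections.abc import Callable
from urllib.parse import urlparse

Resolver = Callable[[str, str], dict[str, str]]


class HostDispatch:
    """Map ``(scheme, hostname)`` to a resolver and cache its first result."""

    def __init__(self) -> None:
        self._callbacks: dict[tuple[str, str], Resolver] = {}
        self._cache: dict[tuple[str, str], dict[str, str]] = {}

    def add(self, url: str, callback: Resolver) -> None:
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.hostname or "")
        # Re-registering a host drops any stale cached headers.
        self._cache.pop(key, None)
        self._callbacks[key] = callback

    def get(self, url: str) -> dict[str, str]:
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.hostname or "")
        if key not in self._cache:
            callback = self._callbacks.get(key)
            # Unknown hosts cache an empty dict, so they are looked up once.
            self._cache[key] = callback(*key) if callback else {}
        # Return a copy so callers cannot mutate the cached headers.
        return dict(self._cache[key])


calls: list[tuple[str, str]] = []


def counting_resolver(scheme: str, hostname: str) -> dict[str, str]:
    """Record each invocation so the caching behaviour is observable."""
    calls.append((scheme, hostname))
    return {"Authorization": "token example"}


dispatch = HostDispatch()
dispatch.add("https://api.github.com", counting_resolver)
first = dispatch.get("https://api.github.com/repos/a/b/tags")
second = dispatch.get("https://api.github.com/rate_limit")
other = dispatch.get("https://example.com/")
```

The resolver runs once for `api.github.com` even though two URLs on that host are requested, because only the scheme and hostname form the key and the path is ignored. One consequence of this keying is that `http://` and `https://` URLs for the same host are distinct entries, and a port in the URL does not affect the lookup.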

src/fromager/resolver.py

Lines changed: 0 additions & 6 deletions
@@ -1134,12 +1134,6 @@ def _find_tags(
         identifier: str,
     ) -> Iterable[Candidate]:
         headers = {"accept": "application/vnd.github+json"}
-
-        # Add GitHub authentication if available
-        github_token = os.environ.get("GITHUB_TOKEN")
-        if github_token:
-            headers["Authorization"] = f"token {github_token}"
-
         nexturl = self.api_url.format(self=self)
         while nexturl:
             resp = session.get(nexturl, headers=headers)
