34 changes: 5 additions & 29 deletions README.md
@@ -29,35 +29,11 @@ Fromager can also build wheels in collections, rather than individually. Managin

This approach makes Fromager especially useful in Python-heavy domains like AI, where reproducibility and compatibility across complex dependency trees are essential.

## Using private registries

Fromager uses the [requests](https://requests.readthedocs.io) library and `pip`
at different points for talking to package registries. Both support
authenticating to remote servers in various ways. The simplest way to integrate
the authentication with fromager is to have a
[netrc](https://docs.python.org/3/library/netrc.html) file with a valid entry
for the host. The file will be read from `~/.netrc` by default. Another location
can be specified by setting the `NETRC` environment variable.

For example, to use a gitlab package registry, use a [personal
access
token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-personal-access-token)
as documented in [this
issue](https://gitlab.com/gitlab-org/gitlab/-/issues/350582):

```plaintext
machine gitlab.com login oauth2 password $token
```

## Determining versions via GitHub tags

In some cases, the builder might have to use tags on GitHub to determine the version of a project instead of looking at
pypi.org. To avoid rate limit or to access private GitHub repository, a [personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) can be passed to fromager by setting
the following environment variable:

```shell
GITHUB_TOKEN=<access_token>
```
## Authentication

Fromager automatically authenticates to GitHub and GitLab APIs using
credentials from netrc or environment variables. See the
[authentication guide](docs/how-tos/authentication.md) for details.

## Additional docs

64 changes: 64 additions & 0 deletions docs/how-tos/authentication.md
@@ -0,0 +1,64 @@
# Authentication

Fromager automatically authenticates to GitHub and GitLab APIs using
credentials from netrc or environment variables. Credentials are
resolved lazily on the first request to each host and cached for
later requests to the same host.

Authentication is recommended to avoid [API rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api) (especially
for GitHub) and required to access private repositories or registries.

## Credential lookup order

For each host, fromager checks the following sources in order and uses
the first match:

**GitHub** (`GITHUB_API_URL`, default `https://api.github.com`):

1. [netrc](https://docs.python.org/3/library/netrc.html) entry for
the host -- the password is used as the token
2. `GITHUB_TOKEN` environment variable

**GitLab** (`CI_SERVER_URL`, default `https://gitlab.com`):

1. netrc entry for the host -- if the login is `gitlab-ci-token`, the
password is sent in the `JOB-TOKEN` header (CI job token); otherwise
it is sent in the `PRIVATE-TOKEN` header
2. `CI_JOB_TOKEN` environment variable
3. `GITLAB_PRIVATE_TOKEN` environment variable

## netrc

The [requests](https://requests.readthedocs.io) library, `pip`, and
`git` all read credentials from `~/.netrc`. Another location can be
specified by setting the `NETRC` environment variable. Note that
`git` uses libcurl for HTTPS transport, and libcurl has honored the
`NETRC` variable only since [8.16.0](https://curl.se/ch/8.16.0.html)
(2025-09-10). Older versions read only `$HOME/.netrc`.

For example, to authenticate to a GitLab package registry with a
[personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-personal-access-token):

```text
machine gitlab.com login pat password $token
```

To authenticate to the GitHub API with a
[personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens):

```text
machine api.github.com login pat password $token
```
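
To check that an entry like the ones above parses as expected, Python's standard `netrc` module can read the file directly. The following is a minimal sketch and not part of fromager: the helper name `lookup_netrc_token` is hypothetical, and the path rule (use `NETRC` if set, otherwise `~/.netrc`) is an assumption based on the description in this section.

```python
from __future__ import annotations

import netrc
import os
import tempfile
from pathlib import Path


def lookup_netrc_token(hostname: str, netrc_path: str | None = None) -> str | None:
    """Return the password stored for *hostname*, or None if there is no entry.

    Hypothetical helper for illustration; fromager itself relies on
    ``requests.utils.get_netrc_auth``.
    """
    # Assumed lookup rule: explicit path, then NETRC, then ~/.netrc.
    path = netrc_path or os.environ.get("NETRC") or str(Path.home() / ".netrc")
    try:
        entry = netrc.netrc(path).authenticators(hostname)
    except (FileNotFoundError, netrc.NetrcParseError):
        # A missing or malformed file is treated as "no credentials".
        return None
    if entry is None:
        return None
    _login, _account, password = entry
    return password


# Demonstration with a throwaway file instead of the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "netrc"
    demo.write_text("machine api.github.com login pat password example-token\n")
    print(lookup_netrc_token("api.github.com", str(demo)))  # example-token
    print(lookup_netrc_token("gitlab.com", str(demo)))  # None
```

A `None` result for a host means that fromager would fall through to the environment variables listed in the lookup order.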

## Environment variables

To authenticate via environment variables instead of netrc:

```shell
# GitHub personal access token (avoids API rate limits)
export GITHUB_TOKEN=<access_token>

# GitLab CI job token (set automatically in CI pipelines)
export CI_JOB_TOKEN=<job_token>

# GitLab personal/project access token
export GITLAB_PRIVATE_TOKEN=<access_token>
```
1 change: 1 addition & 0 deletions docs/how-tos/index.rst
@@ -21,6 +21,7 @@ Essential guides for initial setup and first builds.
.. toctree::
   :maxdepth: 1

   authentication
   containers
   bootstrap-constraints

46 changes: 19 additions & 27 deletions docs/http-retry.md
@@ -20,7 +20,7 @@ The retry system provides:

- **GitHub API rate limit handling** with proper reset time detection

- **GitHub authentication** automatically applied for GitHub API requests via `GITHUB_TOKEN` environment variable
- **Automatic authentication** for GitHub and GitLab APIs (see {doc}`how-tos/authentication`)

- **Temporary file handling** to prevent partial downloads

@@ -37,11 +37,11 @@ export FROMAGER_HTTP_BACKOFF_FACTOR=2.0

# Request timeout in seconds (default: 120)
export FROMAGER_HTTP_TIMEOUT=180

# Token for GitHub API authentication (prevents rate limiting)
export GITHUB_TOKEN=your_github_token_here
```

Authentication credentials (`GITHUB_TOKEN`, `GITLAB_PRIVATE_TOKEN`,
etc.) are documented in {doc}`how-tos/authentication`.

## Error Types Handled

The retry mechanism specifically handles these error conditions:
@@ -73,35 +73,27 @@ The retry functionality is automatically enabled for all HTTP operations in From

### For Plugin Developers

If you're writing plugins that need HTTP functionality, you can use the retry session:
If you're writing plugins that need HTTP functionality, use the
shared session from `request_session`. It includes retry handling
and automatic authentication for GitHub and GitLab:

```python
from fromager.http_retry import get_retry_session

# Get a session with retry capabilities
session = get_retry_session()
from fromager.request_session import session

# Use it like a normal requests session
response = session.get("https://example.com/api/data")
response = session.get("https://pkg.test/api/data")
response.raise_for_status()
```

For more advanced retry configuration:
To register authentication for additional hosts:

```python
from fromager.http_retry import create_retry_session

# Custom retry configuration
retry_config = {
    "total": 5,
    "backoff_factor": 2.0,
    "status_forcelist": [429, 502, 503, 504],
}

session = create_retry_session(
    retry_config=retry_config,
    timeout=60.0
)
from fromager.request_session import session_auth

def _resolve_my_auth(scheme: str, hostname: str) -> dict[str, str]:
    return {"Authorization": "Bearer my-token"}

session_auth.add("https://my-registry.test", _resolve_my_auth)
```
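
The resolver in the example above returns a hard-coded token. In practice a plugin would more likely read the secret from the environment. The sketch below shows one way to do that while keeping the `(scheme, hostname) -> dict[str, str]` callback signature; the variable name `MY_REGISTRY_TOKEN`, the function name, and the HTTPS-only check are assumptions made for illustration and are not part of fromager.

```python
from __future__ import annotations

import os

# Hypothetical variable name, chosen for this example only.
TOKEN_ENV_VAR = "MY_REGISTRY_TOKEN"


def resolve_my_registry_auth(scheme: str, hostname: str) -> dict[str, str]:
    """Return auth headers for the host, or an empty dict when no token is set."""
    if scheme != "https":
        # Assumption: never send a bearer token over plain HTTP.
        return {}
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        # An empty dict means "send the request without authentication".
        return {}
    return {"Authorization": f"Bearer {token}"}


# In a plugin, registration would follow the pattern shown above:
# from fromager.request_session import session_auth
# session_auth.add("https://my-registry.test", resolve_my_registry_auth)
```

Because the result is cached after the first request to the host, a change to the environment variable later in the same process would not be picked up unless the callback is registered again.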

### Decorating Functions with Retry Logic
@@ -128,7 +120,7 @@ The retry system logs important events:

- **WARNING**: When retries are attempted with backoff times
- **ERROR**: When all retry attempts are exhausted
- **DEBUG**: Detailed retry configuration and GitHub token status
- **DEBUG**: Detailed retry configuration and authentication resolution

Example log output:

@@ -151,13 +143,13 @@ INFO saved /path/to/package.tar.gz

If you're seeing many retries, consider:

- Setting `GITHUB_TOKEN` for GitHub API calls (automatically applied to GitHub requests)
- Configuring authentication credentials (see {doc}`how-tos/authentication`) to avoid API rate limits
- Increasing timeout values for slow connections
- Checking network connectivity and DNS resolution

### API Rate Limiting

- Use `GITHUB_TOKEN` for GitHub repositories
- Configure credentials via netrc or environment variables (see {doc}`how-tos/authentication`)
- Consider using a local package mirror for PyPI
- Monitor API usage if using private registries

142 changes: 136 additions & 6 deletions src/fromager/request_session.py
@@ -1,6 +1,16 @@
from __future__ import annotations

import logging
import os
from typing import TYPE_CHECKING
from urllib.parse import urlparse

import requests.auth
from requests.utils import get_netrc_auth

from .http_retry import create_retry_session
from .http_retry import RetryHTTPAdapter

logger = logging.getLogger(__name__)

# Enhanced retry configuration for fromager
FROMAGER_RETRY_CONFIG = {
@@ -11,8 +21,128 @@
    "raise_on_status": False,
}

# Create a session with enhanced retry capabilities
session = create_retry_session(
    retry_config=FROMAGER_RETRY_CONFIG,
    timeout=float(os.environ.get("FROMAGER_HTTP_TIMEOUT", "120.0")),
)
GITHUB_API_URL = os.environ.get("GITHUB_API_URL", "https://api.github.com")

GITLAB_CI_SERVER_URL = os.environ.get("CI_SERVER_URL", "https://gitlab.com")
GITLAB_JOB_TOKEN_NAME = "gitlab-ci-token"


if TYPE_CHECKING:
    from collections.abc import Callable

    _AuthCallback = Callable[[str, str], dict[str, str]]


class SessionAuth(requests.auth.AuthBase):
    """Authentication handler that dispatches by ``(scheme, hostname)``.

    The requests library only supports a single ``session.auth`` handler
    and does not provide per-host authentication on mounted adapters.
    This class fills that gap by mapping ``(scheme, hostname)`` keys to
    auth resolver callbacks. On the first request to a given host the
    callback is invoked and the result is cached.
    """

    def __init__(self) -> None:
        self._callbacks: dict[tuple[str, str], _AuthCallback] = {}
        self._cache: dict[tuple[str, str], dict[str, str]] = {}

    def add(self, url: str, callback: _AuthCallback) -> None:
        """Register a resolver *callback* for the scheme and hostname of *url*."""
        parsed = urlparse(url)
        scheme = parsed.scheme
        hostname = parsed.hostname or ""
        if scheme not in {"http", "https"}:
            raise ValueError(f"Unsupported scheme {scheme!r} in URL {url!r}")
        if not hostname:
            raise ValueError(f"Missing hostname in URL {url!r}")
        key = (scheme, hostname)
        self._cache.pop(key, None)
        self._callbacks[key] = callback

    def get(self, url: str) -> dict[str, str]:
        """Resolve and return the auth headers for *url*.

        Invokes the registered callback on first access and caches the
        result. Returns an empty dict when no callback is registered.
        """
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.hostname or "")
        if key not in self._cache:
            callback = self._callbacks.get(key)
            self._cache[key] = callback(*key) if callback else {}
        return dict(self._cache[key])

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        auth_header = self.get(r.url or "")
        if auth_header:
            r.headers.update(auth_header)
        return r


def _resolve_github_auth(scheme: str, hostname: str) -> dict[str, str]:
    """Resolve GitHub auth header from netrc or environment."""
    url = f"{scheme}://{hostname}"
    netrc_auth = get_netrc_auth(url)
    if netrc_auth is not None:
        _login, password = netrc_auth
        logger.debug("GitHub auth: using netrc credentials for %s", url)
        return {"Authorization": f"token {password}"}

    token = os.environ.get("GITHUB_TOKEN")
    if token:
        logger.debug("GitHub auth: using GITHUB_TOKEN environment variable")
        return {"Authorization": f"token {token}"}
    return {}


def _resolve_gitlab_auth(scheme: str, hostname: str) -> dict[str, str]:
    """Resolve GitLab auth header from netrc or environment."""
    url = f"{scheme}://{hostname}"
    netrc_auth = get_netrc_auth(url)
    if netrc_auth is not None:
        login, password = netrc_auth
        header = "JOB-TOKEN" if login == GITLAB_JOB_TOKEN_NAME else "PRIVATE-TOKEN"
        logger.debug("GitLab auth: using netrc credentials for %s (%s)", url, header)
        return {header: password}

    token = os.environ.get("CI_JOB_TOKEN")
    if token:
        logger.debug("GitLab auth: using CI_JOB_TOKEN environment variable")
        return {"JOB-TOKEN": token}

    token = os.environ.get("GITLAB_PRIVATE_TOKEN")
    if token:
        logger.debug("GitLab auth: using GITLAB_PRIVATE_TOKEN environment variable")
        return {"PRIVATE-TOKEN": token}
    return {}


def create_session() -> tuple[requests.Session, SessionAuth]:
    """Create a requests session with retry and authentication.

    Mounts a `RetryHTTPAdapter` on ``http://`` and ``https://``.
    Registers lazy auth callbacks for GitHub and GitLab on a
    `SessionAuth` handler keyed by ``(scheme, hostname)``.

    Returns the session and its `SessionAuth` so callers can
    register additional auth callbacks via ``auth.add()``.
    """
    adapter = RetryHTTPAdapter(
        retry_config=FROMAGER_RETRY_CONFIG,
        timeout=float(os.environ.get("FROMAGER_HTTP_TIMEOUT", "120.0")),
    )

    s = requests.Session()
    s.mount("http://", adapter)
    s.mount("https://", adapter)

    auth = SessionAuth()
    auth.add(GITHUB_API_URL, _resolve_github_auth)
    auth.add(GITLAB_CI_SERVER_URL, _resolve_gitlab_auth)
    s.auth = auth

    return s, auth


session, session_auth = create_session()
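
The dispatch-and-cache behavior of `SessionAuth.get` can be illustrated without `requests`. The following is a simplified, standard-library-only model written for this note, not the code in the PR: the class name `HostAuthCache`, the `demo_resolver` callback, and the `calls` list are hypothetical, and the validation done in `SessionAuth.add` is omitted.

```python
from __future__ import annotations

from collections.abc import Callable
from urllib.parse import urlparse

AuthCallback = Callable[[str, str], dict[str, str]]


class HostAuthCache:
    """Simplified stand-in for SessionAuth: resolve once per (scheme, hostname)."""

    def __init__(self) -> None:
        self._callbacks: dict[tuple[str, str], AuthCallback] = {}
        self._cache: dict[tuple[str, str], dict[str, str]] = {}

    def add(self, url: str, callback: AuthCallback) -> None:
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.hostname or "")
        # Re-registering a host drops any previously cached headers.
        self._cache.pop(key, None)
        self._callbacks[key] = callback

    def get(self, url: str) -> dict[str, str]:
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.hostname or "")
        if key not in self._cache:
            callback = self._callbacks.get(key)
            self._cache[key] = callback(*key) if callback else {}
        # Return a copy so callers cannot mutate the cached headers.
        return dict(self._cache[key])


calls: list[str] = []


def demo_resolver(scheme: str, hostname: str) -> dict[str, str]:
    calls.append(hostname)  # record each invocation for the demonstration
    return {"Authorization": "token example"}


auth = HostAuthCache()
auth.add("https://api.github.com", demo_resolver)
auth.get("https://api.github.com/repos/a/b/tags")
auth.get("https://api.github.com/repos/c/d/tags")
print(calls)  # ['api.github.com']
print(auth.get("https://example.test/x"))  # {}
```

Since the key is built from `urlparse(url).hostname`, the port and path of a request URL do not affect which callback is selected, and the resolver runs at most once per host until it is registered again.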
6 changes: 0 additions & 6 deletions src/fromager/resolver.py
@@ -1134,12 +1134,6 @@ def _find_tags(
        identifier: str,
    ) -> Iterable[Candidate]:
        headers = {"accept": "application/vnd.github+json"}

        # Add GitHub authentication if available
        github_token = os.environ.get("GITHUB_TOKEN")
        if github_token:
            headers["Authorization"] = f"token {github_token}"

        nexturl = self.api_url.format(self=self)
        while nexturl:
            resp = session.get(nexturl, headers=headers)