feat(extensions): authenticate GitHub-hosted catalog and download requests with GITHUB_TOKEN/GH_TOKEN#2087
Conversation
…uests with GITHUB_TOKEN/GH_TOKEN
There was a problem hiding this comment.
Pull request overview
Adds GitHub-token authentication to extension catalog fetching and extension ZIP downloads so catalogs/assets hosted in private GitHub repos work when GITHUB_TOKEN/GH_TOKEN is set, while aiming to avoid leaking credentials to non-GitHub hosts.
Changes:
- Introduces
ExtensionCatalog._make_request(url)to attach anAuthorizationheader for GitHub-hosted URLs when a token is available. - Updates all
urllib.request.urlopen(...)call sites inExtensionCatalogto use the new request builder. - Adds unit/integration tests for the request/header behavior and updates user docs to reflect token usage for both catalogs and downloads.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/extensions.py |
Adds _make_request and routes catalog/download urlopen calls through it to support authenticated GitHub fetches. |
tests/test_extensions.py |
Adds tests validating auth header behavior and that urlopen receives a Request containing the header. |
extensions/EXTENSION-USER-GUIDE.md |
Updates env var documentation and adds an example for private GitHub-hosted catalogs. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 2
|
|
||
| headers: Dict[str, str] = {} | ||
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | ||
| if token and any( | ||
| host in url | ||
| for host in ("raw.githubusercontent.com", "github.com", "api.github.com") | ||
| ): |
There was a problem hiding this comment.
The GitHub-hosted URL check uses substring matching (host in url), which can incorrectly attach the token to non-GitHub hosts (e.g., https://github.com.evil.com/... or https://internal.example.com/path/github.com/...) and violates the stated goal of preventing credential leakage. Parse the URL and compare urlparse(url).hostname (lowercased) against an allowlist of exact hostnames instead of scanning the full URL string.
| headers: Dict[str, str] = {} | |
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | |
| if token and any( | |
| host in url | |
| for host in ("raw.githubusercontent.com", "github.com", "api.github.com") | |
| ): | |
| from urllib.parse import urlparse | |
| headers: Dict[str, str] = {} | |
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | |
| hostname = (urlparse(url).hostname or "").lower() | |
| github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | |
| if token and hostname in github_hosts: |
| catalog = self._make_catalog(temp_dir) | ||
| req = catalog._make_request("https://internal.example.com/catalog.json") | ||
| assert "Authorization" not in req.headers | ||
|
|
There was a problem hiding this comment.
Current tests cover a generic non-GitHub domain, but they don't cover common spoofing cases that would slip through the current substring-based domain check (e.g., https://github.com.evil.com/... or a non-GitHub host whose path/query contains github.com). Add negative tests for these URL shapes to ensure the auth header is never attached outside the intended allowlist.
| def test_make_request_token_not_added_for_github_lookalike_host(self, temp_dir, monkeypatch): | |
| """Auth header is not attached to non-GitHub hosts that only contain github.com in the hostname.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://github.com.evil.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers | |
| def test_make_request_token_not_added_for_non_github_host_with_github_in_path(self, temp_dir, monkeypatch): | |
| """Auth header is not attached when a non-GitHub host includes github.com only in the URL path.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://evil.example.com/github.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers | |
| def test_make_request_token_not_added_for_non_github_host_with_github_in_query(self, temp_dir, monkeypatch): | |
| """Auth header is not attached when a non-GitHub host includes github.com only in the query string.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://evil.example.com/download?source=https://github.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers |
Description
Fixes #2037. Closes the authentication gap introduced when multi-catalog support landed in #1707.
Before this change, all network requests in
ExtensionCatalogused bareurllib.request.urlopen(url)with no headers. Any catalog or extension ZIP hosted in a private GitHub repository would silently fail with HTTP 404, regardless of whetherGITHUB_TOKENorGH_TOKENwas set in the environment.This PR adds a
_make_request(url)helper onExtensionCatalogthat attaches anAuthorization: token <value>header when:GITHUB_TOKENorGH_TOKENis present in the environment, andraw.githubusercontent.com,github.com, orapi.github.com)Non-GitHub URLs are always fetched without credentials to prevent token leakage to third-party hosts.
The three affected call sites are:
_fetch_single_catalog— fetches catalog JSON from a configured catalog URLfetch_catalog— legacy single-catalog path used whenSPECKIT_CATALOG_URLis setdownload_extension— downloads extension ZIP from a release asset URLNo behavior change for users without a token set — the code path is identical to before.
Documentation in
EXTENSION-USER-GUIDE.mdhas been updated: the existingGH_TOKEN/GITHUB_TOKENtable entry (which described the token as "for downloads" only) now accurately reflects that it covers catalog fetches as well, and a private-catalog usage example has been added.Testing
uv run specify --help— CLI loads correctly, all commands presentmainbefore this change:TestManifestPathTraversal::test_record_file_rejects_absolute_pathTestCommandRegistrar::test_codex_skill_registration_uses_fallback_script_variant_without_init_optionsTestExtensionCatalogintests/test_extensions.py:_make_request: no-token path,GITHUB_TOKEN,GH_TOKENfallback, precedence when both are set, non-GitHub URL never gets header (security),api.github.comdomainurlopenand assert the capturedRequestobject carries the auth header — one for_fetch_single_catalog, one fordownload_extensionAI Disclosure
This PR was implemented with AI assistance via Claude Code.