feat(extensions): authenticate GitHub-hosted catalog and download requests with GITHUB_TOKEN/GH_TOKEN #2087

Open
anasseth wants to merge 1 commit into github:main from anasseth:feat/github-token-auth-private-catalogs

Conversation

@anasseth anasseth commented Apr 4, 2026

Description

Fixes #2037. Closes the authentication gap introduced when multi-catalog support landed in #1707.

Before this change, all network requests in ExtensionCatalog used bare urllib.request.urlopen(url) with no headers. Any catalog or extension ZIP hosted in a private GitHub repository would silently fail with HTTP 404, regardless of whether GITHUB_TOKEN or GH_TOKEN was set in the environment.

This PR adds a _make_request(url) helper on ExtensionCatalog that attaches an Authorization: token <value> header when:

  • GITHUB_TOKEN or GH_TOKEN is present in the environment, and
  • the target URL is a GitHub-hosted domain (raw.githubusercontent.com,
    github.com, or api.github.com)

Non-GitHub URLs are always fetched without credentials to prevent token leakage to third-party hosts.
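
The rule described above can be sketched as a standalone function. This is an illustration only, not the PR's code: the name `make_request`, the `env` parameter, and the exact-hostname comparison are assumptions made here, and the real `_make_request` is a method on `ExtensionCatalog` that may differ in detail.

```python
import os
import urllib.request
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse

# Hosts named in the PR description.
GITHUB_HOSTS = frozenset({"raw.githubusercontent.com", "github.com", "api.github.com"})


def make_request(url: str, env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a Request, adding a token header only for GitHub-hosted URLs.

    Hypothetical stand-in for ExtensionCatalog._make_request; `env` is an
    assumption added so the function can be exercised without touching os.environ.
    """
    env = os.environ if env is None else env
    headers: Dict[str, str] = {}
    # GITHUB_TOKEN wins when both are set; an empty value falls through to GH_TOKEN.
    token = env.get("GITHUB_TOKEN") or env.get("GH_TOKEN")
    hostname = (urlparse(url).hostname or "").lower()
    if token and hostname in GITHUB_HOSTS:
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)
```

With no token in `env`, the returned `Request` carries no extra headers, which matches the "no behavior change" claim for users without a token.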

The three affected call sites are:

  • _fetch_single_catalog — fetches catalog JSON from a configured catalog URL
  • fetch_catalog — legacy single-catalog path used when SPECKIT_CATALOG_URL is set
  • download_extension — downloads extension ZIP from a release asset URL
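
The change at each call site amounts to passing a `Request` object to `urlopen` instead of a bare URL string. A rough sketch of that pattern follows; `fetch_catalog_json`, its parameters, and the 10-second timeout are hypothetical names and values chosen for illustration, not taken from the PR.

```python
import json
import urllib.request
from typing import Any, Callable


def fetch_catalog_json(
    url: str,
    make_request: Callable[[str], urllib.request.Request],
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,  # assumed value, not from the PR
) -> Any:
    """Fetch and decode catalog JSON, routing the URL through a request builder."""
    # Before the change the call site would pass `url` straight to the opener,
    # so no headers could be attached.
    request = make_request(url)
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Taking the request builder and opener as parameters is a choice made for this sketch so the function can be exercised without network access.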

No behavior change for users without a token set — the code path is identical to before.

Documentation in EXTENSION-USER-GUIDE.md has been updated: the existing GH_TOKEN/GITHUB_TOKEN table entry (which described the token as "for downloads" only) now accurately reflects that it covers catalog fetches as well, and a private-catalog usage example has been added.

Testing

  • Ran uv run specify --help — CLI loads correctly, all commands present
  • Ran full test suite: 1131 passed, 5 skipped, 2 failed
    • Both failures also occur on main without this change:
      • TestManifestPathTraversal::test_record_file_rejects_absolute_path
      • TestCommandRegistrar::test_codex_skill_registration_uses_fallback_script_variant_without_init_options
  • 8 new tests added to TestExtensionCatalog in tests/test_extensions.py:
    • 6 unit tests for _make_request: no-token path, GITHUB_TOKEN, GH_TOKEN fallback, precedence when both are set, non-GitHub URL never gets header (security), api.github.com domain
    • 2 integration tests that mock urlopen and assert the captured Request object carries the auth header — one for _fetch_single_catalog, one for download_extension
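
The integration-test approach described above (mock `urlopen`, then inspect the `Request` it received) might look roughly like the following. The `download` function is a simplified stand-in written for this sketch, with a deliberately naive prefix check; it is not the PR's `download_extension`, and `run_capture_check` is a hypothetical helper name.

```python
import os
import urllib.request
from unittest.mock import MagicMock, patch


def download(url: str) -> bytes:
    """Stand-in call site (hypothetical): build a request, then open it."""
    headers = {}
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    # Simplified host check for the sketch only.
    if token and url.startswith("https://github.com/"):
        headers["Authorization"] = f"token {token}"
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        return resp.read()


def run_capture_check() -> urllib.request.Request:
    """Patch urlopen, call the code under test, and return the captured Request."""
    fake_response = MagicMock()
    # MagicMock supports the context-manager protocol, so `with ... as resp` works.
    fake_response.__enter__.return_value.read.return_value = b"zip-bytes"
    url = "https://github.com/org/repo/releases/download/v1/ext.zip"
    with patch.dict(os.environ, {"GITHUB_TOKEN": "ghp_testtoken"}), \
            patch("urllib.request.urlopen", return_value=fake_response) as mock_open:
        assert download(url) == b"zip-bytes"
    # The first positional argument to urlopen is the Request object.
    return mock_open.call_args.args[0]
```

Asserting on the captured `Request` rather than on the response checks that the header reaches `urlopen`, which a unit test of the request builder alone would not show.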

AI Disclosure

  • I did use AI assistance (describe below)
    This PR was implemented with AI assistance via Claude Code.

Contributor

Copilot AI left a comment

Pull request overview

Adds GitHub-token authentication to extension catalog fetching and extension ZIP downloads so catalogs/assets hosted in private GitHub repos work when GITHUB_TOKEN/GH_TOKEN is set, while aiming to avoid leaking credentials to non-GitHub hosts.

Changes:

  • Introduces ExtensionCatalog._make_request(url) to attach an Authorization header for GitHub-hosted URLs when a token is available.
  • Updates all urllib.request.urlopen(...) call sites in ExtensionCatalog to use the new request builder.
  • Adds unit/integration tests for the request/header behavior and updates user docs to reflect token usage for both catalogs and downloads.
Show a summary per file

| File | Description |
| --- | --- |
| src/specify_cli/extensions.py | Adds _make_request and routes catalog/download urlopen calls through it to support authenticated GitHub fetches. |
| tests/test_extensions.py | Adds tests validating auth header behavior and that urlopen receives a Request containing the header. |
| extensions/EXTENSION-USER-GUIDE.md | Updates env var documentation and adds an example for private GitHub-hosted catalogs. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment on lines +1425 to +1431

        headers: Dict[str, str] = {}
        token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
        if token and any(
            host in url
            for host in ("raw.githubusercontent.com", "github.com", "api.github.com")
        ):

Copilot AI Apr 6, 2026

The GitHub-hosted URL check uses substring matching (host in url), which can incorrectly attach the token to non-GitHub hosts (e.g., https://github.com.evil.com/... or https://internal.example.com/path/github.com/...) and violates the stated goal of preventing credential leakage. Parse the URL and compare urlparse(url).hostname (lowercased) against an allowlist of exact hostnames instead of scanning the full URL string.

Suggested change
-         headers: Dict[str, str] = {}
-         token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
-         if token and any(
-             host in url
-             for host in ("raw.githubusercontent.com", "github.com", "api.github.com")
-         ):
+         from urllib.parse import urlparse
+         headers: Dict[str, str] = {}
+         token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
+         hostname = (urlparse(url).hostname or "").lower()
+         github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"}
+         if token and hostname in github_hosts:
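
To make the difference concrete, the two checks can be compared side by side on the URL shapes mentioned in this comment. The function names `matches_by_substring` and `matches_by_hostname` are hypothetical and exist only for this comparison.

```python
from urllib.parse import urlparse

GITHUB_HOSTS = {"raw.githubusercontent.com", "github.com", "api.github.com"}


def matches_by_substring(url: str) -> bool:
    """Substring check: true if any allowed host appears anywhere in the URL."""
    return any(host in url for host in GITHUB_HOSTS)


def matches_by_hostname(url: str) -> bool:
    """Exact check: true only if the parsed hostname is one of the allowed hosts."""
    return (urlparse(url).hostname or "").lower() in GITHUB_HOSTS


for url in (
    "https://github.com/org/repo",                     # genuine GitHub URL
    "https://github.com.evil.com/org/repo",            # lookalike hostname
    "https://internal.example.com/path/github.com/x",  # github.com only in the path
):
    print(url, matches_by_substring(url), matches_by_hostname(url))
```

The substring check returns True for all three URLs, while the hostname check returns True only for the first. Parsing also handles the userinfo form `https://github.com@evil.com/`, whose hostname is `evil.com`.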

        catalog = self._make_catalog(temp_dir)
        req = catalog._make_request("https://internal.example.com/catalog.json")
        assert "Authorization" not in req.headers

Copilot AI Apr 6, 2026

Current tests cover a generic non-GitHub domain, but they don't cover common spoofing cases that would slip through the current substring-based domain check (e.g., https://github.com.evil.com/... or a non-GitHub host whose path/query contains github.com). Add negative tests for these URL shapes to ensure the auth header is never attached outside the intended allowlist.

Suggested change
+     def test_make_request_token_not_added_for_github_lookalike_host(self, temp_dir, monkeypatch):
+         """Auth header is not attached to non-GitHub hosts that only contain github.com in the hostname."""
+         monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
+         catalog = self._make_catalog(temp_dir)
+         req = catalog._make_request("https://github.com.evil.com/org/repo/releases/download/v1/ext.zip")
+         assert "Authorization" not in req.headers
+
+     def test_make_request_token_not_added_for_non_github_host_with_github_in_path(self, temp_dir, monkeypatch):
+         """Auth header is not attached when a non-GitHub host includes github.com only in the URL path."""
+         monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
+         catalog = self._make_catalog(temp_dir)
+         req = catalog._make_request("https://evil.example.com/github.com/org/repo/releases/download/v1/ext.zip")
+         assert "Authorization" not in req.headers
+
+     def test_make_request_token_not_added_for_non_github_host_with_github_in_query(self, temp_dir, monkeypatch):
+         """Auth header is not attached when a non-GitHub host includes github.com only in the query string."""
+         monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken")
+         catalog = self._make_catalog(temp_dir)
+         req = catalog._make_request("https://evil.example.com/download?source=https://github.com/org/repo/releases/download/v1/ext.zip")
+         assert "Authorization" not in req.headers

Development

Successfully merging this pull request may close these issues.

[Feature]: Support GITHUB_TOKEN authentication for private catalog and extension download URLs

4 participants