Skip to content

tsp-client: fetch spec files from GitHub before falling back to git#15607

Open
jeremymeng wants to merge 2 commits into
Azure:mainfrom
jeremymeng:tspclient/try-avoiding-git-clone
Open

tsp-client: fetch spec files from GitHub before falling back to git#15607
jeremymeng wants to merge 2 commits into
Azure:mainfrom
jeremymeng:tspclient/try-avoiding-git-clone

Conversation

@jeremymeng
Copy link
Copy Markdown
Member

Summary

Adds a GitHub-first download path to tsp-client so spec files are pulled directly from azure-rest-api-specs/-pr instead of always doing a sparse git clone. The existing git path stays in place as a silent fallback.

Wired into both call sites in src/commands.ts:

  • initCommand URL branch — fetches just tspconfig.yaml ahead of the rest of init
  • syncCommand clone branch — fetches the spec directory plus every additionalDirectories entry into the same cloneDir/<directory> layout that downstream code expects

Strategy order

Decided once per call:

  1. gh CLI when installed (uses existing gh auth so private repos and higher rate limits work)
  2. fetch against the GitHub REST API and raw.githubusercontent.com, sending Authorization: Bearer <token> when GITHUB_TOKEN (preferred) or GH_TOKEN is set
  3. Existing cloneRepo + sparseCheckout + addSpecFiles + checkoutCommit fallback

Any error in 1 or 2 is logged at debug level and we silently fall back to the git path. There is no opt-out flag.

Implementation notes

  • New module src/githubFetch.ts exposes tryFetchSpecFromGitHub and tryFetchFileFromGitHub returning boolean so callers can fall back transparently.
  • Recursively traverses the directory tree via the GitHub Trees API (recursive=1), transparently handling truncated: true by recursing into immediate subtrees.
  • Downloads use raw.githubusercontent.com (smaller than the contents API and works for files of any size).
  • Bounded retry on 429/5xx with exponential backoff; concurrency cap of 8 for blob downloads.
  • Pinned to commit SHAs (already required by resolveTspConfigUrl), so caching at GitHub's edge works well and there's no race with branch updates.

Validation

  • npm run build — clean
  • npm test — 81 passed / 9 pre-existing skips, including 15 new tests in test/githubFetch.spec.ts covering token preference, URL builders, segment-walking listTree, truncation handling, non-2xx → fallback, end-to-end directory download, and Authorization: Bearer header propagation.

Changelog

Unreleased entry added in tools/tsp-client/CHANGELOG.md.

Add a GitHub-first download path used by `init` (for `tspconfig.yaml`)
and by `sync` (for the spec directory plus any `additionalDirectories`).

Strategies are tried in order:
1. `gh` CLI when installed (uses existing `gh auth` for private repos / rate limits)
2. `fetch` against the GitHub REST API and `raw.githubusercontent.com`,
   sending `Authorization: Bearer` when `GITHUB_TOKEN` (preferred) or `GH_TOKEN` is set
3. existing `cloneRepo` + `sparseCheckout` + `addSpecFiles` + `checkoutCommit` fallback

Failures are logged at debug level and silently fall back; downstream behavior
(`cp` into `srcDir`, `additionalDirectories` handling, error reporting) is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 12, 2026 23:43
@jeremymeng
Copy link
Copy Markdown
Member Author

I found that when running tsp-client update locally it takes > 20 seconds to do the sparse checkout of the specs repo. Trying fetch first is much faster.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates tools/tsp-client to prefer downloading TypeSpec spec content directly from GitHub (via gh or the GitHub REST API + raw.githubusercontent.com) instead of always doing a sparse git clone, while keeping the existing git-based path as a fallback.

Changes:

  • Added a new githubFetch module that can list/download pinned spec trees and files from GitHub with retries and bounded concurrency.
  • Updated initCommand and syncCommand to attempt GitHub-based fetches first, then fall back to the existing sparse-clone workflow.
  • Added Vitest coverage for URL building, tree walking/truncation handling, auth header propagation, and end-to-end directory/file download behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tools/tsp-client/src/githubFetch.ts Implements GitHub-first listing/downloading (gh CLI + REST/raw) and exposes best-effort fetch APIs for callers.
tools/tsp-client/src/commands.ts Wires GitHub-first fetch into init (single file) and sync (directory + additionalDirectories) with git fallback.
tools/tsp-client/test/githubFetch.spec.ts Adds unit/integration-style tests covering token selection, URL builders, tree traversal (incl. truncation), and downloads.
tools/tsp-client/CHANGELOG.md Documents the new default GitHub-first behavior and auth env vars used.
Comments suppressed due to low confidence (1)

tools/tsp-client/src/githubFetch.ts:487

  • Similar to tryFetchSpecFromGitHub: if downloadFileFromGitHub fails after creating parent directories, returning false can leave a non-empty cloneDir and break the subsequent git clone fallback. Ensure failure leaves the filesystem in a clean/empty state for the fallback path (e.g., write to a temp file then rename, and/or remove created directories on error).
  try {
    const opts = await resolveGitHubFetchOptions();
    Logger.debug(
      `Attempting GitHub-based file fetch: ${args.repo}@${args.commit}:${args.path} via ${describeStrategy(opts)}`,
    );
    await downloadFileFromGitHub({ ...args, opts });
    return true;
  } catch (err) {
    Logger.debug(
      `GitHub-based file fetch failed; falling back to git sparse clone: ${(err as Error).message}`,
    );
    return false;
  }

Comment thread tools/tsp-client/src/githubFetch.ts Outdated
Comment on lines +434 to +462
try {
const opts = await resolveGitHubFetchOptions();
Logger.debug(
`Attempting GitHub-based spec fetch: ${args.repo}@${args.commit} via ${describeStrategy(opts)}`,
);
await downloadDirectoryFromGitHub({
repo: args.repo,
commit: args.commit,
directory: args.directory,
destRoot: args.destRoot,
opts,
});
for (const dir of args.additionalDirectories ?? []) {
Logger.debug(`Fetching additional directory from GitHub: ${dir}`);
await downloadDirectoryFromGitHub({
repo: args.repo,
commit: args.commit,
directory: dir,
destRoot: args.destRoot,
opts,
});
}
return true;
} catch (err) {
Logger.debug(
`GitHub-based spec fetch failed; falling back to git sparse clone: ${(err as Error).message}`,
);
return false;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 88f098e. tryFetchSpecFromGitHub and tryFetchFileFromGitHub now empty their destination directory contents before returning false, so the subsequent git clone fallback writes into a clean directory. tryFetchFileFromGitHub gained an optional destRoot parameter (wired from initCommand) for this purpose. New tests empties destRoot when a spec fetch fails partway through and empties destRoot when a single-file fetch fails verify the cleanup.

Comment on lines +365 to +372
export async function resolveGitHubFetchOptions(): Promise<GitHubFetchOptions> {
const useGhCli = await isGhCliAvailable();
let token = getGitHubToken();
if (!token && useGhCli) {
token = await readGhAuthToken();
}
return { useGhCli, token };
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 88f098e. The strategy order is now gh -> REST -> git: when the gh CLI attempt fails (gh installed but unauthenticated, transient gh api failure, etc.), we cleanup partial writes and retry once via the REST API path (preserving any token from GITHUB_TOKEN / GH_TOKEN / gh auth token) before falling back to git. Centralized in a new runWithFallback helper. New test retries via REST when the gh CLI strategy fails covers this via a stubbed gh subprocess runner.

1. On any GitHub-fetch failure, empty destRoot before returning false so the
   subsequent `git clone` fallback isn't blocked by partial writes.

2. When the gh CLI strategy fails (e.g. gh installed but unauthenticated),
   transparently retry once via the REST API path before giving up. The
   strategy order is now gh -> REST -> git instead of gh -> git.

Implementation: extract a `runWithFallback` helper that owns strategy
selection, retry, and cleanup. Add a `destRoot` cleanup hook to
`tryFetchFileFromGitHub` and wire it from `initCommand`. Expose a
`_setGhRunnerForTests` seam so the gh subprocess can be stubbed; add 3 new
tests covering spec-fetch cleanup, file-fetch cleanup, and gh -> REST retry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants