jobs/index: Add SquashIndexViaApi job #13517
Conversation
Store the GitHub API base URL in a `Url` field instead of hardcoding `https://api.github.com` inside `_request`, and route all requests through `base_url.join(...)`. `new()` still defaults to the production URL; a private `with_base_url` constructor is introduced so upcoming mockito-based tests can point the client at a local mock server. No behavior change for production callers.
Add `mockito` as a dev-dependency and introduce a `tests` module that points `RealGitHubClient` at a local mock server via `with_base_url`. A first test covers `get_user`, exercising the bearer-auth path and verifying that requests hit the configured base URL. This paves the way for covering additional endpoints without reaching out to the real GitHub API.
These let us read a repository's current ref state and commit/tree SHAs over the GitHub REST API instead of via a local git clone. They take no authentication since the crates.io index repos are public and the 60/hour unauthenticated limit is more than sufficient for this use case. A new `examples/fetch_ref.rs` binary exercises both methods end-to-end against `api.github.com`.
Adds `create_commit`, `create_ref`, and `update_ref` to `GitHubClient`, all authenticated via an `&AccessToken`. `update_ref` strips a leading `refs/` when building the URL; `create_ref` sends the fully qualified ref in the body. Writes share a new private `_mutate` helper that mirrors the `apply_auth` closure shape of `_request`.
Adds a `github: Arc<dyn GitHubClient>` field to `Environment` and plumbs it through the background worker binary and the test harness. `App.github` changes from `Box` to `Arc` so the same client instance can flow into both `App` and `Environment` in tests without cloning the underlying mock. No consumer yet; `SquashIndexViaApi` will use it in the next commit.
Collapses the index via GitHub's REST Git Database API (`get_ref` -> `get_commit` -> `create_commit` -> `create_ref` -> drift re-check -> `update_ref`) instead of shelling out to `git push`. Avoids the pack generation that has been OOMing the production worker, since the server already has the objects. The existing `SquashIndex` job stays in place as a fallback; both share the `repository` queue so index-writing jobs serialize even though this one does not touch the local bare repo. Two integration tests cover the happy path and the drift-check bail.
Mirrors the existing `SquashIndex` subcommand so operators can manually enqueue the API-driven squash via `crates-admin enqueue-job squash_index_via_api`.
As mentioned in https://rust-lang.zulipchat.com/#narrow/channel/318791-t-crates-io/topic/index.20squashing/near/590951284, I've successfully tested this PR on our staging environment. The staging index is significantly smaller than the production index, but I'm fairly confident that the same logic will also work on production. If not, we can still fall back to the original background job for now.
There's quite a lot here. I think I understand how it all fits together, but I don't think I'd want to have to pass an exam that deep dived into it.
If there's one thing that just worries me a tad here, it's the possibility of GitHub itself drifting in terms of API behaviour. The test coverage here is excellent, but relies on mocks, and this feels like a part of the GitHub API that is lower level than most users ever touch. I don't practically know how we'd test it without significant effort — we'd presumably need a smoke test type thing with a real repo in a known state that we could manipulate, and it's probably not worth it, but I like to write down my vague worries just in case.
Anyway, since this has been tested on staging already, I'm happy for us to give it a go.
```rust
let squash_start = Instant::now();
let input = CreateCommit {
    message: &message,
    tree: &tree_sha,
    parents: &[],
};
let new_commit = github.create_commit(&owner, &repo, &input, &auth).await?;
```
It took me a few read-throughs to understand exactly why this works — the fact tree objects are logically separate from commits is one of those things that you (well, I, at least) don't tend to think about much as an explicit part of the data model that underpins Git.
Perhaps we could have a brief comment somewhere in here explaining the theory for the benefit of future crates.io maintainers? (Or, more likely, me in six months.)
I've added 097eeba which should hopefully make this clearer :)
Co-authored-by: Adam Harvey <adam@adamharvey.name>
Move the explanatory doc comment onto the struct itself and expand it to spell out how blobs, trees and commits relate in Git's object model, so a future reader can see at a glance why reusing the existing tree SHA on a fresh parentless commit avoids any blob or pack work.
The existing `SquashIndex` job has been OOMing the production worker because `git push` regenerates the full pack on every squash. This adds an alternative job that drives the same squash through GitHub's API (`get_ref` -> `get_commit` -> `create_commit` -> `create_ref` -> drift re-check -> `update_ref`), so the server can reuse objects it already has instead of receiving a fresh pack.

The original `SquashIndex` job stays in place as a fallback. Both share the `repository` queue so index-writing jobs continue to serialize, even though the new job does not write to the local bare repo.

Along the way the `crates_io_github` crate gains:

- a `parse_github_slug()` helper
- `RealGitHubClient` plus a mockito-based test harness
- `get_ref()` / `get_commit()` trait methods (unauthenticated, sufficient for the public index repos)
- `create_commit()` / `create_ref()` / `update_ref()` write methods authenticated via `AccessToken`

`Environment` now carries an `Arc<dyn GitHubClient>` so jobs can talk to GitHub directly, and `crates-admin enqueue-job` exposes the new job for manual operator use.

Related: `ArchiveIndexBranch` background job #13472