Run fork PR unit tests offline from a pre-warmed Go module cache#5803
Conversation
Fork PRs cannot mint OIDC tokens, so they cannot authenticate to JFrog to resolve Go modules. This workflow downloads all modules through JFrog on a trusted runner and saves them as an immutable, timestamped GitHub Actions cache. It runs daily (GitHub evicts caches after 7 days without access), on pushes to main that change go.mod/go.sum, and on manual dispatch with an optional pr_number input that overlays only go.mod/go.sum from a fork PR -- never fork source code. Co-authored-by: Isaac
The tests job previously failed for fork PRs at the JFrog OIDC setup step, because GitHub never issues OIDC tokens to fork PR runs. Add an is-fork input to the setup-go-build-environment action: fork runs skip JFrog entirely, restore the module cache saved by the Warm Go Cache workflow (restore-only; the warmer is the sole cache writer), and resolve modules offline through Go's file:// proxy. Same-repo runs keep the existing JFrog behavior unchanged, and the required check context stays 'tests'. The coverage upload is skipped for fork PRs since they cannot access CODECOV_TOKEN. Co-authored-by: Isaac
Explain in the PR auto-comment and CONTRIBUTING.md that fork PR unit tests run offline from a pre-warmed dependency cache, and that PRs changing go.mod/go.sum need a maintainer to dispatch the Warm Go Cache workflow with the PR number before the tests check can pass. Co-authored-by: Isaac
Pass pr_number to the fork-overlay step through an env var instead of interpolating it into the script body, and require it to be numeric. Also validate the fork repo name and ref taken from the GitHub API response, since both are attacker-controlled, and reject refs starting with "-" so git fetch cannot parse them as options. Co-authored-by: Isaac
…y in the cache warmer Co-authored-by: Isaac
| shell: bash | ||
| run: | | ||
| if [ "${{ github.event.pull_request.head.repo.full_name }}" != "" ] && \ | ||
| [ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]; then |
There was a problem hiding this comment.
FYI, I vaguely remember that this was not "good enough" for a 100% foolproof solution. E.g., some entity may not be using a fork and still not have access to secrets/proxy.
But it should cover most of the cases.
There was a problem hiding this comment.
I remember: some bots (e.g., Dependabot ) don't use a fork, but uses a scoped token which doesn't have access to secrets.
So such PRs will not work with this approach. Maybe we can do a similar approach that we did for tests. Try to read the token: if it's not there for any reason, default to the cache.
There was a problem hiding this comment.
SG, I'll take a stab at this.
| - .github/workflows/warm-go-cache.yml | ||
| schedule: | ||
| - cron: "0 6 * * *" # Daily; GitHub evicts caches not accessed for 7 days | ||
| workflow_dispatch: |
There was a problem hiding this comment.
Maybe I am being over cautious, but workflow executions are public. Someone could open a benign PR, monitor the workflow execution and, when this workflow is triggered, quickly push a malicious update to the PR. Should we use a commit hash instead of PR number?
I think this may be ok. The reason is that malicious versions should not be in the proxy to begin with, so they would not be able to be downloaded into the cache even with such "switcharoo". We can use the hash if we want, but I am fine with using the PR number too.
There was a problem hiding this comment.
I'm ok with pr for now too
Replace the fork-name check that routed unit tests to the pre-warmed Go module cache with a capability check on ACTIONS_ID_TOKEN_REQUEST_URL. The runner sets that variable only when id-token:write is effectively granted, which GitHub denies to both fork and Dependabot PRs. The previous check matched forks only, so Dependabot PRs (branched within the repo) were routed to the JFrog proxy path and failed for lack of OIDC access. Co-authored-by: Isaac
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
Changes
Adds a "Warm Go Cache" workflow and a fork-aware path to the
testsjob so that PRs from forks can pass the required build/unit-test check without JFrog credentials.Why
Problem
Fork PRs currently fail the required
testscheck unconditionally: the build authenticates to the JFrog Go module proxy via GitHub OIDC, and GitHub does not issue OIDC tokens to fork PR workflow runs, so the very first setup step fails (Unable to get ACTIONS_ID_TOKEN_REQUEST_URL) before any code is compiled. This blocks all external contributions and any Databricks employee contributing from a personal fork.Solution
This ports the cache-warming SOP
warm-go-cache.yml(new) — the sole writer of thego-modules-*Actions cache. Runs on the hardened runner group with JFrog OIDC, resolves modules with the exact command CI uses (jf go mod vendor), verifies the cache resolves offline, and saves it under a timestamped key. Triggers: daily cron (beats 7-day cache eviction), pushes tomaintouchinggo.mod/go.sum, and manual dispatch.testsjob fork routing — fork PRs skip JFrog entirely: restore the warmed cache (restore-only), resolve modules offline through Go'sfile://proxy, runmake testas usual. The check context staystests. Same-repo runs are unchanged.Maintainer flow for fork PRs that change dependencies
If a fork PR modifies
go.mod/go.sum, itstestscheck fails on cache miss. After reviewing the dependency change, dispatch Actions → Warm Go Cache → Run workflow → pr_number = N: the warmer fetches onlygo.mod/go.sumfrom the fork (never source code; the PR number and head ref are validated before use), rebuilds the cache, and the contributor re-runs the failed check. The fork-PR auto-comment and CONTRIBUTING.md document this.Security properties
workflow_dispatchinputs / PR head fields are validated against injection.Tests
actionlintand the repo's conftest policies pass on all changed workflow/action files.main— please dispatch "Warm Go Cache" once right after merge.This pull request and its description were written by Isaac.