Apply Databricks Labs Repository Lockdown policy#19
Open
mjohns-databricks wants to merge 5 commits into master
Implements the three-script workflow from the Databricks Labs Repository Lockdown policy: list-external-actions -> resolve-action-ref -> pin-gh-actions.

- list-external-actions: emits every third-party action referenced under .github/ (requires yq by Mike Farah).
- resolve-action-ref: for each action, finds the most recent release tag published before the cutoff (2026-03-10T00:00:00Z) and resolves it to a commit SHA. Handles both mono-repo conventions: subpath-prefixed tags (databrickslabs/sandbox/acceptance -> acceptance/v0.4.4) and top-level shared tags (github/codeql-action/analyze -> v4.32.6, where the subpath is just a directory inside a repo using a unified tag series).
- pin-gh-actions: consumes resolve-action-ref output, rewrites every matching `uses:` under .github/ with the SHA form + tag comment, and stages (but does not commit) the result. Skips databricks/databrickslabs actions per policy.

Deviates from the blueprint reference in one way: does not auto-create or switch branches, because GeoBrix manages branches manually. README documents the typical flow and the 2026-03-10 cutoff.

Co-authored-by: Isaac
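The "most recent release tag published before the cutoff" rule can be illustrated with a small sketch. This is not the PR's resolve-action-ref script: the function name `latest_before_cutoff` and the "<published_at> <tag>" input format are assumptions made for illustration, and the sketch ignores the mono-repo tag conventions described above.

```sh
#!/bin/sh
set -eu

# Hypothetical helper, not the PR's resolve-action-ref.
# Reads "<published_at> <tag>" lines on stdin, where published_at is an
# ISO 8601 UTC timestamp such as 2026-03-10T00:00:00Z, and prints the tag
# of the newest release published strictly before the cutoff in $1.
latest_before_cutoff() {
  local cutoff="$1"
  # Fixed-width UTC timestamps order correctly as plain strings; appending ""
  # forces awk to compare them as strings rather than as numbers.
  awk -v cutoff="$cutoff" '($1 "") < (cutoff "") { print $1, $2 }' \
    | LC_ALL=C sort \
    | tail -n 1 \
    | awk '{ print $2 }'
}
```

Called with the policy cutoff, for example `latest_before_cutoff 2026-03-10T00:00:00Z < releases.txt`, it prints nothing when no release predates the cutoff, which a caller would need to treat as an error.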
Every third-party `uses:` under .github/workflows/ and .github/actions/ is
now pinned to the commit SHA of the most recent release published before
2026-03-10T00:00:00Z, with the release tag preserved as an inline comment
for cross-reference (the comment is informational only — reviewers must
re-verify the SHA against the upstream release). Generated by running:
./scripts/security/list-external-actions \
| xargs ./scripts/security/resolve-action-ref \
| ./scripts/security/pin-gh-actions
Resolutions (all 15 external refs, ordered; every ref was on a mutable
tag prior to this change):
actions/cache@v4, v5 -> cdf6c1fa... # v5.0.3
actions/checkout@v5 -> de0fac2e... # v6.0.2 (major bump)
actions/deploy-pages@v4 -> d6db9016... # v4.0.5
actions/download-artifact@v5 -> 70fc10c6... # v8.0.0 (major bump)
actions/setup-java@v5 -> be666c2f... # v5.2.0
actions/setup-node@v4 -> 53b83947... # v6.3.0 (major bump)
actions/setup-python@v5 -> a309ff8b... # v6.2.0 (major bump)
actions/upload-artifact@v5 -> bbbca2dd... # v7.0.0 (major bump)
actions/upload-pages-artifact@v3 -> 7b1f4a76... # v4.0.0 (major bump)
codecov/codecov-action@v5 -> 671740ac... # v5.5.2
github/codeql-action/*@v4 -> 0d579ffd... # v4.32.6
pypa/gh-action-pypi-publish@... -> ed0c5393... # v1.13.0
Major-version jumps are consistent with the policy ("latest release before
the cutoff") but carry breaking-change risk — reviewers should validate
each bump against the action's CHANGELOG before merge. In particular,
upload-artifact v4+ and download-artifact v4+ changed artifact immutability
semantics; the new versions may interact with the existing upload_artifacts
composite action in ways worth exercising under CI before unblocking.
Local composite action refs (./.github/actions/*) are unaffected —
they're first-party.
Co-authored-by: Isaac
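For readers unfamiliar with the pinned form, the rewrite from a mutable tag to `@<sha> # <tag>` can be sketched with sed. This is an illustration under stated assumptions, not the PR's pin-gh-actions script: `pin_uses` is a hypothetical name, the SHA in the usage line is a placeholder rather than a real commit, and the sketch assumes the tag contains no `|` or `&` characters.

```sh
#!/bin/sh
set -eu

# Hypothetical helper, not the PR's pin-gh-actions.
# Reads workflow YAML on stdin and rewrites every "uses: <action>@<ref>"
# for the given action to "uses: <action>@<sha> # <tag>".
# Local refs such as ./.github/actions/foo carry no "@" and are left alone.
pin_uses() {
  local action="$1" sha="$2" tag="$3" pattern
  # Escape dots so an action name containing "." is matched literally.
  pattern=$(printf '%s' "$action" | sed 's/\./\\./g')
  # Replace the old ref and any existing trailing comment with the SHA form.
  sed -E "s|(uses:[[:space:]]*${pattern})@[^[:space:]]+([[:space:]]*#.*)?[[:space:]]*\$|\1@${sha} # ${tag}|"
}
```

Usage, with a placeholder SHA: `pin_uses actions/checkout 0123456789abcdef0123456789abcdef01234567 v6.0.2 < workflow.yml`. Because any previous comment is replaced, running it again with a newer SHA and tag updates a line that was already pinned.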
…kflows

Databricks Labs Repository Lockdown policy requires any workflow using a non-exempt secret (anything other than GITHUB_TOKEN or CODECOV_TOKEN) to run inside a single protected GitHub Environment. GeoBrix uses REPO_ACCESS_TOKEN (PAT fallback for private-repo checkout) across most workflows, so every job that calls actions/checkout with that token now sets `environment: runtime`.

Changes:
- Added `permissions: contents: read` at top level where missing (codeql-analysis, publish-maven, release) and removed stray top-level `id-token: write` from build_main / build_python / build_scala / build_scala_by_package / codecov-scala-parallel / codecov-upload (none of those jobs request OIDC tokens).
- deploy-docs: moved `pages: write` and `id-token: write` from top level down to the deploy job only (least privilege). The build job keeps `environment: runtime` for its REPO_ACCESS_TOKEN checkout; the deploy job keeps its existing `environment: github-pages`.
- doc-tests: added `environment: runtime` on all three (currently disabled) jobs that perform REPO_ACCESS_TOKEN checkouts, so they are compliant when re-enabled.
- release.yml: changed `environment: release` -> `environment: runtime` to converge on the single protected env the policy expects.
- release.yml + publish-maven.yml: DISABLED via `if: false` on their publish jobs with a banner comment explaining the policy context and how to re-enable. GeoBrix is not publishing to PyPI or GitHub Packages from Actions today; we will coordinate with Labs before re-enabling.

Exempt secrets per policy (GITHUB_TOKEN, CODECOV_TOKEN) are untouched and do not require the protected environment.

Co-authored-by: Isaac
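A coarse way to spot-check the environment requirement is to list workflow files that reference REPO_ACCESS_TOKEN but never declare `environment: runtime`. The sketch below is an assumption-laden illustration, not part of the PR: `files_missing_runtime_env` is a hypothetical name, and the check works per file rather than per job, so a file with one compliant job and one non-compliant job would pass unnoticed.

```sh
#!/bin/sh
set -eu

# Hypothetical file-level check, not part of the PR's tooling.
# Prints each workflow file in the given directory that mentions
# secrets.REPO_ACCESS_TOKEN but has no "environment: runtime" line.
files_missing_runtime_env() {
  local dir="$1" f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    # Skip unmatched glob patterns and anything that is not a regular file.
    [ -f "$f" ] || continue
    if grep -q 'secrets\.REPO_ACCESS_TOKEN' "$f" \
       && ! grep -Eq '^[[:space:]]*environment:[[:space:]]*runtime[[:space:]]*$' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `files_missing_runtime_env .github/workflows` and getting no output is a necessary signal, not a sufficient one; a per-job check would need a YAML-aware tool such as yq.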
Labs Repository Lockdown policy: every Dependabot ecosystem in the repo must apply a cooldown so we are not the first adopters of a just-released (possibly compromised) version. Applied `cooldown.default-days: 7` to both maven and pip ecosystems.

The policy also excludes `github-actions` from Dependabot entirely: action SHAs are refreshed manually via scripts/security/pin-gh-actions so bumps are reviewed as part of the security workflow rather than as auto-opened PRs. Added a comment documenting the intentional absence.

Co-authored-by: Isaac
Databricks Labs Repository Lockdown policy requires all build-time binary fetches to be integrity-verified and all base images to be pinned by digest so a compromised registry/mirror cannot silently swap bytes.

Dockerfile changes:
- Pinned `FROM ubuntu:24.04` to the multi-arch manifest-list digest `sha256:c4a8d5503dfb2a3eb8ab5f807da5bc69a85730fb49b5cfca2330194ebcc41c7b` (kept `# ubuntu:24.04` comment for human readability).
- Hadoop 3.4.0 tarball: replaced `wget | tar` stream with download -> sha512sum -c -> extract, using the official HADOOP_SHA512 from downloads.apache.org/.sha512.
- GDAL 3.11.4 tarball: same pattern with a locally-computed SHA-256. OSGeo only publishes MD5; we MD5-verified the upstream download (9f4fa4b3be48fb60d5dd76fecb11a5f6) then computed and pinned SHA-256.
- Apache Maven 3.9.9: replaced the dynamic `.sha512` fetch (which reads the checksum from the same origin as the tarball and therefore provides no protection against origin compromise) with an in-Dockerfile pinned MAVEN_SHA512 ARG, cross-checked against archive.apache.org.

scripts/util/install_hadoop.sh:
- Not referenced by the build; kept as a manual mirror of the Dockerfile flow. Rewrote with `set -euo pipefail`, a pinned HADOOP_SHA512, and `sha512sum -c` verification. Made executable.

Each checksum has a matching comment documenting the authoritative source and the requirement to bump it in lockstep with the underlying version.

Co-authored-by: Isaac
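The download -> sha512sum -c -> extract pattern can be sketched as a small function. This is a minimal illustration, not the Dockerfile or install_hadoop.sh from this PR: `verify_and_extract` is a hypothetical name, the archive is assumed to be already downloaded, and GNU coreutils `sha512sum` is assumed to be available.

```sh
#!/bin/sh
set -eu

# Hypothetical helper illustrating verify-before-extract.
# $1: path to an already-downloaded .tar.gz
# $2: expected SHA-512, pinned in the script rather than fetched at build time
# $3: destination directory
verify_and_extract() {
  local archive="$1" expected_sha512="$2" dest="$3"
  # sha512sum -c expects "<hash>  <file>" (two spaces); a mismatch returns
  # non-zero, so nothing is extracted from an unverified archive.
  printf '%s  %s\n' "$expected_sha512" "$archive" | sha512sum -c - >&2 || return 1
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
}
```

The point of pinning the expected hash in the script is the one the commit message makes about Maven: a checksum fetched from the same origin as the tarball gives no protection if that origin is compromised.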
Summary
Applies the Databricks Labs Repository Lockdown policy to GeoBrix ahead of the 2026-03-10 SHA-pinning cutoff. Scope is lockdown items 1 and 3–6 (item 2, Hatch→uv, is N/A: GeoBrix is Scala/Maven + setuptools Python with no Hatch).

Five commits on top of `master`:
- 3ff670a: Add `scripts/security/` action-pinning tooling (`list-external-actions`, `resolve-action-ref`, `pin-gh-actions`, README).
- 514871b: Pin external GitHub Actions to commit SHAs (cutoff 2026-03-10). Every `uses: org/repo@<tag>` in `.github/workflows/` and `.github/actions/` is rewritten to `@<sha> # <tag>`. Local first-party `uses: ./.github/actions/*` refs are intentionally unchanged. Tooling is rerunnable.
- 0fab757: Permissions + `environment: runtime` hardening. Top-level `contents: read` added where missing; stray top-level `id-token: write` removed from jobs that never request OIDC. Every job using `REPO_ACCESS_TOKEN` (the only non-exempt secret in use) now runs in the single protected environment `runtime`. `deploy-docs` drops `pages: write` / `id-token: write` from top level; both moved to the `deploy` job only. `release.yml`'s `environment: release` renamed → `runtime`. `release.yml` and `publish-maven.yml` disabled via `if: false` with banner comments and re-enable instructions (we are not publishing to PyPI / GitHub Packages from Actions today).
- 7076d47: Dependabot: `cooldown.default-days: 7` on `maven` and `pip` ecosystems; `github-actions` ecosystem intentionally absent (SHAs are refreshed manually via `scripts/security/pin-gh-actions`), documented in a comment.
- 6bd5a0b: Dockerfile + install_hadoop.sh hardening. `FROM ubuntu:24.04` pinned by multi-arch manifest-list digest `sha256:c4a8d5503dfb…41c7b`. Tarball fetches are now checksum-verified: Hadoop 3.4.0 (pinned SHA-512 from `downloads.apache.org`), GDAL 3.11.4 (pinned SHA-256; upstream only ships MD5, so we MD5-verified the tarball then computed SHA-256 locally), and Maven 3.9.9 (pinned SHA-512; previously did a dynamic `.sha512` fetch from the same origin as the tarball → no protection against origin compromise). `scripts/util/install_hadoop.sh` (unreferenced manual helper) hardened with `set -euo pipefail` + matching SHA-512 verification.

Policy items: coverage map

Reviewer notes: breaking-ish changes to double-check
Some Actions were pinned at a newer major than the tag the repo was previously using (commit 514871b):
- `actions/checkout` v5 → v6 (SHA `de0fac2e…`)
- `actions/upload-artifact` v5 → v7
- `actions/download-artifact` v5 → v8
- `actions/setup-node` v4 → v6
- `actions/setup-python` v5 → v6
- `actions/upload-pages-artifact` v3 → v4

The repo's workflows still accept the `node20` runtime and the public API shapes are unchanged, but please confirm with a green CI run.

Operational prerequisites on the repo
Before merging:
- Create the `runtime` environment (Settings → Environments → New environment). No reviewers/wait-timer required initially; the environment binding itself is the gate for `REPO_ACCESS_TOKEN` scoping.
- Move `REPO_ACCESS_TOKEN` from repo-level secrets to the `runtime` environment's secrets so it can only be read by jobs that bind to it.
- `CODECOV_TOKEN` stays at the repo/org level (exempt secret, no environment needed).

Test plan
- `runtime` environment exists and `REPO_ACCESS_TOKEN` is scoped to it
- `build main` run (PR trigger path hits `update-doc-inventory` + `build`, both gated by `environment: runtime`)
- `deploy-docs` preview run still builds (doesn't deploy on PRs)
- `gbx:test:scala` + `gbx:test:python` pass in Docker (no behavior change expected, but Dockerfile was rewritten around the Hadoop/GDAL/Maven fetch sections)
- `scripts/security/list-external-actions` returns an empty problem list (every external ref is a SHA with a tag comment)

This pull request and its description were written by Isaac.
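The last test-plan item (every external ref is a SHA) can also be approximated without the PR's tooling. The sketch below is a grep-based assumption, not a substitute for `list-external-actions`: `unpinned_uses` is a hypothetical name, it does not parse YAML, and it does not apply the policy's exemption for databricks/databrickslabs actions.

```sh
#!/bin/sh
set -eu

# Hypothetical check, not the PR's list-external-actions.
# Reads workflow YAML on stdin and prints every external "uses:" line whose
# ref is not a full 40-character lowercase hex commit SHA.
# Local refs (starting with "./") are skipped by the first pattern.
unpinned_uses() {
  grep -E '^[[:space:]-]*uses:[[:space:]]*[^.[:space:]][^@[:space:]]*@' \
    | grep -Ev '@[0-9a-f]{40}([[:space:]]|$)' \
    || true  # grep exits 1 when nothing is left; treat "all pinned" as success
}
```

For example, `cat .github/workflows/*.yml | unpinned_uses` printing nothing suggests that all external refs in those files are pinned.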