Route integration-tests trigger through emu-access runner #5609

Merged
mihaimitrea-db merged 1 commit into main from mihaimitrea-db/unblock-trigger-tests-emu-access on Apr 21, 2026

@mihaimitrea-db

Contributor

Why

Between 2026-04-17 and 2026-04-20, the databricks-eng org tightened its IP allow list. Since then, every pull_request-triggered Integration Tests run on this repo has failed silently:

  • The trigger-tests job in .github/workflows/integration-tests.yml runs on databricks-deco-testing-runner-group (label ubuntu-latest-deco).
  • That job calls actions/create-github-app-token with owner: ${{ secrets.ORG_NAME }} (= databricks-eng), which resolves the installation via /repos/databricks-eng/.../installation.
  • The deco runner pool's egress IPs are no longer on the databricks-eng allow list, so that lookup returns 403.
  • The downstream gh workflow run terraform-isolated-pr.yml -R databricks-eng/eng-dev-ecosystem never dispatches.
  • Merges still land only because the merge_group auto-approve job rubber-stamps the check without running tests.

The databricks-release-runner-group-emu-access pool's egress IPs are on the databricks-eng allow list, so moving the cross-org dispatch job to that pool unblocks the lookup.
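
For orientation, the failing path described above has roughly the following shape in a GitHub Actions workflow. This is a sketch under stated assumptions, not the repo's actual file: the step names, the step `id`, the `@v1` action version, the `APP_ID` / `APP_PRIVATE_KEY` secret names, and the `repositories` value are hypothetical. The runner group and label, the `owner` input, the `gh workflow run` target, and the two dispatch inputs come from the description.

```yaml
# Sketch only: step names, ids, action version, and the APP_ID /
# APP_PRIVATE_KEY secret names are hypothetical; the real workflow may differ.
jobs:
  trigger-tests:
    runs-on:
      group: databricks-deco-testing-runner-group   # egress IPs no longer allow-listed
      labels: ubuntu-latest-deco
    steps:
      - name: Generate GitHub App token
        id: generate-token
        uses: actions/create-github-app-token@v1
        with:
          app-id: ${{ secrets.APP_ID }}                # hypothetical secret name
          private-key: ${{ secrets.APP_PRIVATE_KEY }}  # hypothetical secret name
          owner: ${{ secrets.ORG_NAME }}               # resolves to databricks-eng
          repositories: eng-dev-ecosystem              # assumed value
        # The installation lookup behind this step is what returns 403.
      - name: Dispatch private workflow
        env:
          GH_TOKEN: ${{ steps.generate-token.outputs.token }}
        run: |
          gh workflow run terraform-isolated-pr.yml \
            -R databricks-eng/eng-dev-ecosystem \
            -f pull_request_number=${{ github.event.pull_request.number }} \
            -f commit_sha=${{ github.event.pull_request.head.sha }}
```

Because the token step fails first, the dispatch step never runs, which matches the "never dispatches" symptom.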

What changed

Minimal 2-line runner swap on the trigger-tests job only:

| Before | After |
|--------|-------|
| `group: databricks-deco-testing-runner-group` | `group: databricks-release-runner-group-emu-access` |
| `labels: ubuntu-latest-deco` | `labels: linux-ubuntu-latest-emu-access` |
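
In workflow terms, the swap amounts to the `runs-on` change below. This is a sketch that assumes the job uses the `group` / `labels` mapping form of `runs-on`; all other keys of the job are omitted.

```yaml
jobs:
  trigger-tests:
    runs-on:
      # before:
      #   group: databricks-deco-testing-runner-group
      #   labels: ubuntu-latest-deco
      group: databricks-release-runner-group-emu-access
      labels: linux-ubuntu-latest-emu-access
```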

Not changed

  • check-token job — runs a shell script checking secret presence; no external calls, stays on deco.
  • auto-approve job — creates a same-org check via context.repo, unaffected by the cross-org allow list, stays on deco.
  • Any other workflow (tagging.yml uses the same app-token action but for same-org release work; unaffected).
  • Private-side eng-dev-ecosystem workflows — no changes required; the private workflow already accepts pull_request_number + commit_sha.

Why a single runner swap (vs. Go SDK's job split)

The Go SDK fix (databricks/databricks-sdk-go#1638) split trigger-tests into two jobs: a create-check job (stays on deco, creates a same-org check run) and a trigger-tests job (moves to emu-access, does the cross-org dispatch). That split was needed only because the Go SDK workflow creates a check_run on the public repo and passes check_run_id into the private workflow.

This repo's workflow does not create a check run: trigger-tests calls gh workflow run terraform-isolated-pr.yml and nothing else, so no check_run_id is produced or passed. This is the same shape as the Python SDK and Java SDK workflows, so the minimal single-runner swap is the right fix, with no job splitting, no new dependencies, and no check_run_id plumbing.
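
For contrast, a two-job split of the kind described for the Go SDK might look roughly like this. It is an illustrative sketch only, not the Go SDK's actual workflow: the step names, the check name, the use of `actions/github-script`, the private workflow file name `example-isolated-pr.yml`, and the omitted token step are all assumptions. Only the job names, the runner pools, and the `check_run_id` hand-off come from the description above.

```yaml
# Hypothetical sketch of a create-check + trigger-tests split.
jobs:
  create-check:
    runs-on:
      group: databricks-deco-testing-runner-group   # same-org call, stays on deco
      labels: ubuntu-latest-deco
    permissions:
      checks: write
    outputs:
      check_run_id: ${{ steps.create.outputs.check_run_id }}
    steps:
      - name: Create check run on the public repo
        id: create
        uses: actions/github-script@v7
        with:
          script: |
            const { data } = await github.rest.checks.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              name: 'Integration Tests',            // hypothetical check name
              head_sha: context.payload.pull_request.head.sha,
              status: 'in_progress',
            });
            core.setOutput('check_run_id', data.id);

  trigger-tests:
    needs: create-check
    runs-on:
      group: databricks-release-runner-group-emu-access   # cross-org dispatch
      labels: linux-ubuntu-latest-emu-access
    steps:
      # App-token generation omitted; GH_TOKEN is assumed to be set from it.
      - name: Dispatch private workflow with the check run id
        run: |
          gh workflow run example-isolated-pr.yml \
            -R databricks-eng/eng-dev-ecosystem \
            -f check_run_id=${{ needs.create-check.outputs.check_run_id }}
```

The extra job exists only to carry `check_run_id` across runner pools. With no check run to create, this repo has nothing to hand off, so a single job suffices.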

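Reference PRs (same pattern, already merged)

  • Python SDK: databricks/databricks-sdk-py#1396
  • Java SDK: databricks/databricks-sdk-java#769
  • Go SDK (different shape, split into two jobs; listed for contrast, not to mirror): databricks/databricks-sdk-go#1638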

Test plan

The PR's own Integration Tests run is the test. Expected outcome:

  • `trigger-tests` runs on `linux-ubuntu-latest-emu-access` and `create-github-app-token` succeeds (no 403).
  • A `terraform-isolated-pr` `workflow_dispatch` event appears on `databricks-eng/eng-dev-ecosystem`.
  • The `Integration Tests` check on this PR transitions to `success` / `failure` based on the dispatched run.
  • Existing `merge_group` `auto-approve` path still works unchanged (not touched by this PR).

NO_CHANGELOG=true

@mihaimitrea-db mihaimitrea-db requested a review from a team as a code owner April 20, 2026 15:29
@github-actions

Contributor

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/terraform

Inputs:

  • PR number: 5609
  • Commit SHA: 1305fb62072f86b893edc51b90c397b0830241c6

Checks will be approved automatically on success.

@mihaimitrea-db mihaimitrea-db added this pull request to the merge queue Apr 21, 2026
Merged via the queue into main with commit 795a828 Apr 21, 2026
12 of 13 checks passed
@mihaimitrea-db mihaimitrea-db deleted the mihaimitrea-db/unblock-trigger-tests-emu-access branch April 21, 2026 07:54
tanmay-db pushed a commit that referenced this pull request Apr 29, 2026