acc: add rejecting proxy for local tests #5483

Merged: denik merged 8 commits into main from denik/internet-sandbo on Jun 11, 2026

@denik denik commented Jun 9, 2026

Contributor

Summary

Installs a per-test blocking HTTP proxy for all Local=true acceptance tests. Unintentional outbound internet access now fails fast with a clear diagnostic (method, host, User-Agent) instead of silently hanging or timing out.

How it works:

  • A per-test proxy server binds to 127.0.0.1:0 and returns HTTP 400 Bad Request to every request
  • HTTP 400 (not TCP RST) means the Databricks SDK treats the failure as an HTTP error, not a retriable IO error — avoiding the SDK's 5-minute ECONNREFUSED retry loop
  • t.Errorf is called for real external hosts, so the test is immediately marked as failed with a useful message
  • t.Logf (no failure) is used for RFC 2606 reserved TLDs (.test, .example, .invalid) and loopback IPs — these are intentional unreachable fixtures or the Terraform provider talking to the local test server
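The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `startBlockingProxy`, `reporter`, `recorder`, and `probe` are hypothetical names, and the `exempt` predicate is passed in so the sketch stays independent of how reserved TLDs and loopback addresses are actually detected.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"sync"
)

// reporter is the part of *testing.T the proxy needs; *testing.T satisfies it.
type reporter interface {
	Errorf(format string, args ...any)
	Logf(format string, args ...any)
}

// startBlockingProxy (hypothetical name) serves HTTP 400 for every request on a
// random loopback port. exempt reports whether a host should only be logged.
func startBlockingProxy(r reporter, exempt func(host string) bool) (proxyURL string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		// For HTTPS traffic the client sends "CONNECT host:443", so req.Host is the target.
		msg := fmt.Sprintf("blocked %s %s (User-Agent: %q)", req.Method, req.Host, req.UserAgent())
		if exempt(req.Host) {
			r.Logf("%s", msg) // logged only, the test keeps passing
		} else {
			r.Errorf("%s", msg) // marks the test as failed
		}
		http.Error(w, "outbound access is blocked in local tests", http.StatusBadRequest)
	})}
	go srv.Serve(ln) // returns http.ErrServerClosed once stop() is called
	return "http://" + ln.Addr().String(), func() { srv.Close() }, nil
}

// recorder stands in for *testing.T so the sketch can run outside "go test".
type recorder struct {
	mu           sync.Mutex
	errors, logs []string
}

func (r *recorder) Errorf(format string, args ...any) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.errors = append(r.errors, fmt.Sprintf(format, args...))
}

func (r *recorder) Logf(format string, args ...any) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.logs = append(r.logs, fmt.Sprintf(format, args...))
}

// counts returns how many failures and log lines were recorded so far.
func (r *recorder) counts() (failures, logged int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.errors), len(r.logs)
}

// probe sends one request to the proxy with the given Host header and
// returns the HTTP status code of the response.
func probe(proxyURL, host string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, proxyURL, nil)
	if err != nil {
		return 0, err
	}
	req.Host = host
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}
```

In a real test the `reporter` would be the `*testing.T` of the test that owns the proxy, so a blocked request to a real host fails that test while the client still receives an ordinary HTTP response.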

Proxy settings injected for local runs:

  • HTTPS_PROXY=<blocking proxy> — only HTTPS, because the local test server is plain HTTP
  • NO_PROXY=127.0.0.1,localhost — Python's urllib (used by helper scripts) doesn't auto-bypass loopback
  • CHECKPOINT_DISABLE=1 — stops Terraform from phoning home to checkpoint-api.hashicorp.com
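As a rough sketch of how these three variables might be added to a child process environment: `sandboxEnv` is a hypothetical helper name, and only the variable names and values are taken from the list above.

```go
package main

// sandboxEnv (hypothetical helper) returns a copy of base with the proxy
// settings for local runs appended.
func sandboxEnv(base []string, proxyURL string) []string {
	env := append([]string{}, base...) // copy so the caller's slice is not mutated
	return append(env,
		"HTTPS_PROXY="+proxyURL,        // only HTTPS: the local test server is plain HTTP
		"NO_PROXY=127.0.0.1,localhost", // explicit bypass for clients that do not skip loopback
		"CHECKPOINT_DISABLE=1",         // stops Terraform's checkpoint call
	)
}
```

A caller would typically use it as `cmd.Env = sandboxEnv(os.Environ(), proxyURL)` before starting the command.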

Test fixture fixes (tests that were accidentally relying on real internet access):

  • Six auth test scripts: replace real Databricks hostnames (accounts.cloud.databricks.com, unified.databricks.com, non.existing.subdomain.databricks.com) with .test TLD equivalents per the RFC 2606 repo convention
  • bundle/deploy/mlops-stacks: set Local=false — it clones from github.com/databricks/mlops-stacks
  • bundle/templates-machinery/wrong-url: switch from .com to .invalid TLD

Test plan

  • go test ./acceptance -count=1 passes locally (pre-existing flaky tests in parallel runs are unrelated — they also fail without this change)
  • Real internet access from local tests now produces an immediate, attributed failure instead of a timeout

This pull request and its description were written by Isaac.

eng-dev-ecosystem-bot commented Jun 9, 2026

Collaborator

Commit: a26f59d

Run: 27332349637

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 💚 aws linux | | 7 | 15 | 264 | 968 | 7:13 |
| 🟨 aws windows | 7 | | 15 | 266 | 966 | 14:05 |
| 💚 aws-ucws linux | | 7 | 15 | 360 | 882 | 7:27 |
| 💚 aws-ucws windows | | 7 | 15 | 362 | 880 | 12:20 |
| 💚 azure linux | | 1 | 17 | 267 | 966 | 6:19 |
| 💚 azure windows | | 1 | 17 | 269 | 964 | 10:48 |
| 💚 azure-ucws linux | | 1 | 17 | 365 | 878 | 8:00 |
| 💚 azure-ucws windows | | 1 | 17 | 367 | 876 | 12:07 |
| 💚 gcp linux | | 1 | 17 | 263 | 969 | 7:09 |
| 💚 gcp windows | | 1 | 17 | 265 | 967 | 11:08 |
22 interesting tests: 15 SKIP, 7 KNOWN
| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 💚R | 🟨K | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚R | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚R | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
Top 29 slowest tests (at least 2 minutes):
duration env testname
6:13 aws-ucws windows TestAccept
6:07 azure-ucws windows TestAccept
5:43 azure windows TestAccept
5:36 gcp windows TestAccept
4:47 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:16 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:10 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:08 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:43 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:29 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:15 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:09 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:09 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:05 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:03 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:58 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:55 aws linux TestAccept
2:52 gcp linux TestAccept
2:52 azure linux TestAccept
2:51 azure-ucws linux TestAccept
2:48 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:48 aws-ucws linux TestAccept
2:45 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:44 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:38 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:37 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:32 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:31 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:21 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

@denik changed the title from "acceptance: add internet sandbox for local tests" to "acceptance: add rejecting proxy for local tests" on Jun 10, 2026
@denik changed the title from "acceptance: add rejecting proxy for local tests" to "acc: add rejecting proxy for local tests" on Jun 10, 2026
@denik denik enabled auto-merge June 10, 2026 10:14
denik added 7 commits June 11, 2026 09:49
Installs a per-test blocking HTTP proxy for all Local=true acceptance
tests. Unintentional outbound internet access now fails fast with a clear
diagnostic instead of silently hanging or timing out.

The proxy listens on a random loopback port and responds with HTTP 400
(not a TCP RST) so the Databricks SDK's httpclient doesn't treat the
failure as a retriable IO error (ECONNREFUSED triggers a 5-minute retry
loop). For each blocked request it calls t.Errorf with method, host, and
User-Agent so the failure is attributed to the right test.

Exemptions (t.Logf only, no test failure):
- RFC 2606 §2 reserved TLDs (.test, .example, .invalid) — intentional
  unreachable fixtures used in negative test cases
- Loopback IPs (127.x.x.x, ::1) — the Terraform provider routes its
  HTTP requests to the local test server through HTTPS_PROXY even though
  Go's standard library skips the proxy for loopback destinations
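The exemption rules listed here could be expressed along these lines. This is only a sketch under assumptions: `isExempt` and `reservedTLDs` are hypothetical names, and the actual check in blocking_proxy.go may differ.

```go
package main

import (
	"net"
	"strings"
)

// reservedTLDs are the RFC 2606 names the commit message lists as exempt.
var reservedTLDs = []string{".test", ".example", ".invalid"}

// isExempt (hypothetical name) reports whether a proxied host should only be
// logged: reserved TLD fixtures and loopback IP addresses.
func isExempt(hostport string) bool {
	host := hostport
	// CONNECT targets look like "host:443"; keep the input as-is if there is no port.
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		host = h
	}
	host = strings.Trim(host, "[]") // bare IPv6 literal such as "[::1]"
	if ip := net.ParseIP(host); ip != nil {
		return ip.IsLoopback() // 127.0.0.0/8 and ::1
	}
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, tld := range reservedTLDs {
		if strings.HasSuffix(host, tld) {
			return true
		}
	}
	return false
}
```

Matching on the suffix with its leading dot keeps a name like `contest` from being mistaken for a `.test` fixture.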

Other changes in this commit:
- HTTPS_PROXY only (not HTTP_PROXY): the local test server is plain HTTP,
  so setting HTTP_PROXY would intercept its traffic
- NO_PROXY=127.0.0.1,localhost: Python's urllib doesn't auto-bypass the
  proxy for loopback addresses, so the helper scripts need this exemption
- CHECKPOINT_DISABLE=1: stops Terraform from phoning home to
  checkpoint-api.hashicorp.com on every terraform-engine test
- Replace real Databricks hostnames with .test TLD equivalents in six
  auth test fixtures that were accidentally relying on internet access
- bundle/deploy/mlops-stacks: Local=false (clones from github.com)
- bundle/templates-machinery/wrong-url: use .invalid TLD

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
- wrong-url/test.toml: update replacement rules for the .invalid TLD
  (old rules referenced the old .com URL); normalize proxy error to a
  stable "(redacted)" form instead of relying on system-specific paths
- cmd/auth/profiles/script: sort hostmetadata warning lines before
  writing to stdout so the golden file is stable across concurrent
  profile loading (warnings appeared in non-deterministic order because
  profiles are loaded in parallel goroutines)
- blocking_proxy.go: use fmt.Fprintf instead of []byte(fmt.Sprintf)
  (modernize linter fix)
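For the last item, the two forms produce the same bytes; `fmt.Fprintf` simply formats straight into the writer instead of building an intermediate string and converting it. The function names and the message text below are illustrative assumptions, not the code from blocking_proxy.go.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeOld formats into a string, converts it to bytes, then writes.
func writeOld(w io.Writer, host string) {
	w.Write([]byte(fmt.Sprintf("blocked request to %s\n", host)))
}

// writeNew formats directly into the writer, which is what the linter suggests.
func writeNew(w io.Writer, host string) {
	fmt.Fprintf(w, "blocked request to %s\n", host)
}

// render runs one of the writers against an in-memory buffer and returns the output.
func render(write func(io.Writer, string), host string) string {
	var buf bytes.Buffer
	write(&buf, host)
	return buf.String()
}
```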

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
Previously a new TCP listener was started per test. Now a single shared
listener is created lazily via sync.OnceValue; errors are attributed to
the top-level TestAccept t. Pass -debugsandbox to get a per-test proxy
that attributes failures to the exact subtest.
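The lazy shared listener described in this commit can be illustrated with `sync.OnceValue` (Go 1.21 or later). The sketch below is an assumption about the general pattern, not the PR's code: `sharedListener` and `proxyAddr` are hypothetical names, and the boolean parameter stands in for the -debugsandbox flag.

```go
package main

import (
	"net"
	"sync"
)

// sharedListener opens one loopback listener on first use and returns the
// same listener on every later call.
var sharedListener = sync.OnceValue(func() net.Listener {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err) // a sketch; real code would report this through the test
	}
	return ln
})

// proxyAddr (hypothetical) returns the shared address by default, or a fresh
// per-test listener when debugSandbox is set.
func proxyAddr(debugSandbox bool) (addr string, closeFn func()) {
	if debugSandbox {
		ln, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			panic(err)
		}
		return ln.Addr().String(), func() { ln.Close() }
	}
	// The shared listener lives for the whole run, so there is nothing to close.
	return sharedListener().Addr().String(), func() {}
}
```

The trade-off is the one the commit message states: a shared listener avoids one TCP listener per test, but a blocked request can then only be attributed to the top-level test unless the per-test mode is requested.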

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
Replace SetupSharedProxy + StartBlockingProxy with a single
StartBlockingProxy(t, hint). The caller decides scope: testAccept
passes the top-level t (shared for the run) and a hint suggesting
-debugsandbox; runTest passes its own subtest t when -debugsandbox
is set. This removes the sync.OnceValue / atomic.Pointer machinery.

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik force-pushed the denik/internet-sandbo branch from c213591 to a26f59d Compare June 11, 2026 07:53
Comment on lines -20 to +28
$CLI auth profiles --skip-validate --output json
# auth profiles loads profiles concurrently so hostmetadata warnings appear in
# non-deterministic order; sort them for a stable golden file.
$CLI auth profiles --skip-validate --output json 2>&1 | python3 -c "
import sys
lines = sys.stdin.readlines()
meta = sorted(l for l in lines if '[hostmetadata]' in l)
rest = [l for l in lines if '[hostmetadata]' not in l]
sys.stdout.writelines(meta + rest)
"

Contributor

This seems a redundant change. I thought --skip-validate doesn't make network calls? Also there's no hostmetadata warning in the output.txt

Contributor Author

good point #5549

@denik denik added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit 9bf4857 Jun 11, 2026
26 checks passed
@denik denik deleted the denik/internet-sandbo branch June 11, 2026 11:03
}
// Only block HTTPS: the local test server is plain HTTP (http://127.0.0.1:PORT)
// so HTTP_PROXY would intercept its traffic. All real external calls use HTTPS.
cmd.Env = append(cmd.Env, "HTTPS_PROXY="+proxyURL)

Contributor

I think we should set HTTP_PROXY as well. NO_PROXY below will avoid intercepting testserver calls.

Contributor Author

good point, #5596

4 participants