acc: simulate dashboard eventual consistency and harden dashboard tests #5724

Open
denik wants to merge 13 commits into main from denik/eventual-consistency-part2

Conversation

@denik denik commented Jun 25, 2026

Contributor

Follow-up to #5694.

We've seen the dashboard API be eventually consistent: a GET right after a create
can 404 before the write propagates. The dashboard acceptance tests run on cloud
(Cloud = true) and read a dashboard immediately after deploy, so they are exposed
to this.

  • Add a testserver EC mode for dashboards: under the eventual-consistency token
    (opt-in via TESTS_STALE_ONCE=1, direct engine only) the first GET of a dashboard
    after it is created returns 404. This reproduces the cloud window deterministically.
  • Enable it for the dashboards directory and wrap the first lakeview get after each
    create with retry.py, so reads right after deploy are retried rather than assumed
    to succeed.

Test-only: no engine changes. Making the direct engine itself resilient to the same
404 (plan/refresh reads) is a separate follow-up.

This pull request and its description were written by Isaac.

@denik denik temporarily deployed to test-trigger-is June 25, 2026 14:58 — with GitHub Actions Inactive
github-actions Bot commented Jun 25, 2026

Contributor

Approval status: pending

/acceptance/bundle/ - needs approval

29 files changed
Suggested: @pietern
Also eligible: @janniklasrose, @andrewnester, @shreyas-goenka, @anton-107, @lennartkats-db

General files (require maintainer)

7 files changed
Based on git history:

  • @pietern -- recent work in libs/testserver/, acceptance/, acceptance/internal/

Any maintainer (@andrewnester, @anton-107, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:12 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:18 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:27 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:28 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 15:35 — with GitHub Actions Inactive
eng-dev-ecosystem-bot commented Jun 25, 2026

Collaborator

Integration test report

Commit: 9745574

Run: 28605117939

| Env | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|
| 💚 aws linux | 10 | 13 | 230 | 1041 | 4:12 |
| 💚 aws windows | 10 | 13 | 232 | 1039 | 4:04 |
| 💚 aws-ucws linux | 10 | 13 | 314 | 959 | 5:06 |
| 💚 aws-ucws windows | 10 | 13 | 316 | 957 | 4:07 |
| 💚 azure linux | 4 | 15 | 230 | 1040 | 4:47 |
| 💚 azure windows | 4 | 15 | 232 | 1038 | 3:36 |
| 💚 azure-ucws linux | 4 | 15 | 316 | 956 | 5:41 |
| 💚 azure-ucws windows | 4 | 15 | 318 | 954 | 4:11 |
| 💚 gcp linux | 4 | 15 | 229 | 1042 | 3:54 |
| 💚 gcp windows | 4 | 15 | 231 | 1040 | 3:51 |

23 interesting tests: 13 SKIP, 10 RECOVERED

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 💚 TestAccept | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestFetchRepositoryInfoAPI_FromRepo | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 💚 TestFetchRepositoryInfoAPI_FromRepo/root | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 💚 TestFetchRepositoryInfoAPI_FromRepo/subdir | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |

Top 5 slowest tests (at least 2 minutes):

| duration | env | testname |
|---|---|---|
| 3:03 | gcp windows | TestAccept |
| 2:59 | azure-ucws windows | TestAccept |
| 2:55 | aws windows | TestAccept |
| 2:55 | aws-ucws windows | TestAccept |
| 2:43 | azure windows | TestAccept |

@denik denik temporarily deployed to test-trigger-is June 25, 2026 18:55 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 25, 2026 19:17 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 26, 2026 14:04 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 29, 2026 09:25 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is July 1, 2026 09:29 — with GitHub Actions Inactive
denik added 4 commits July 2, 2026 16:26
Add a testserver eventual-consistency mode for dashboards: under the EC token
(opt-in via TESTS_STALE_ONCE=1, direct engine only) the first GET of a dashboard
after it is created returns 404, reproducing deterministically the cloud
propagation window these tests run against (Cloud = true).

Enable it for the dashboards directory and wrap the first lakeview get after each
create with retry.py, so reads right after deploy are retried rather than assumed
to succeed. This is a follow-up to #5694 and de-flakes the same class of cloud
reads. Test-only: no engine changes.

Co-authored-by: Isaac
Several dashboard tests read the same dashboard twice right after deploy (once
to capture the etag for a replacement, once to display it). Capture the response
once and derive both from it, removing the redundant API call.

Also clarify the DashboardTrash comment: like DashboardUpdate, it applies
immediately; only DashboardCreate stages the eventual-consistency 404.

Co-authored-by: Isaac
denik added 8 commits July 2, 2026 16:27
It's not about retrying: the terraform provider reads the dashboard back after
create and fails on the 404.

Co-authored-by: Isaac
retry.py is a shebang script; under MSYS_NO_PATHCONV=1 git-bash passes its own
path to the native python unconverted, so python can't find it. The dir-wide
MSYS_NO_PATHCONV in dashboards/test.toml was redundant (only detect-change needs
it, and it sets its own), so drop it; detect-change keeps it and runs retry.py
via `env -u MSYS_NO_PATHCONV`.

Co-authored-by: Isaac
Move the eventual-consistency read logic off the call sites and into an
EventualMap[K,V] that owns the map and the EC flag. Handlers now do
s.Dashboards.Read / ReadStrong / Write / Put instead of fetching the
EventualValue and branching on s.EventualConsistency.

Co-authored-by: Isaac
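
A minimal sketch of what a map with this interface could look like, for readers unfamiliar with the pattern. Only the names `EventualMap[K,V]`, `Read`, `ReadStrong`, `Write`, and `Put` come from the commit message; the field layout, the constructor, and the exact semantics below are assumptions made for illustration (staged writes become visible after one stale read, `Put` applies immediately), and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// entry holds the state readers currently see plus an optional staged write.
type entry[V any] struct {
	visible    V
	hasVisible bool
	staged     *V
}

// EventualMap is a hypothetical sketch, not the testserver's actual type.
type EventualMap[K comparable, V any] struct {
	Eventual bool // when false, Write behaves exactly like Put
	items    map[K]*entry[V]
}

func NewEventualMap[K comparable, V any](eventual bool) *EventualMap[K, V] {
	return &EventualMap[K, V]{Eventual: eventual, items: map[K]*entry[V]{}}
}

// Put makes v visible to every subsequent read.
func (m *EventualMap[K, V]) Put(k K, v V) {
	m.items[k] = &entry[V]{visible: v, hasVisible: true}
}

// Write stages v in eventual mode: the next Read still observes the old state.
func (m *EventualMap[K, V]) Write(k K, v V) {
	if !m.Eventual {
		m.Put(k, v)
		return
	}
	e, ok := m.items[k]
	if !ok {
		e = &entry[V]{}
		m.items[k] = e
	}
	e.staged = &v
}

// Read returns the currently visible state, then promotes any staged write.
func (m *EventualMap[K, V]) Read(k K) (V, bool) {
	var zero V
	e, ok := m.items[k]
	if !ok {
		return zero, false
	}
	v, found := e.visible, e.hasVisible
	if e.staged != nil {
		e.visible, e.hasVisible, e.staged = *e.staged, true, nil
	}
	return v, found
}

// ReadStrong returns the latest written value without consuming the stale read.
func (m *EventualMap[K, V]) ReadStrong(k K) (V, bool) {
	var zero V
	e, ok := m.items[k]
	if !ok {
		return zero, false
	}
	if e.staged != nil {
		return *e.staged, true
	}
	return e.visible, e.hasVisible
}

func main() {
	m := NewEventualMap[string, string](true)
	m.Write("d1", "v1")
	_, ok := m.Read("d1")
	fmt.Println(ok) // false: the create is not visible yet
	v, ok := m.Read("d1")
	fmt.Println(v, ok) // v1 true
}
```

Keeping the flag and the staging logic inside the map means a handler only has to choose between a staged and an immediate write, rather than branching on the mode itself.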
On Windows git-bash passes retry.py's own path to the native Python unconverted
under MSYS_NO_PATHCONV, so Python can't find the script (same problem the
envsubst() wrapper already guards against). Add a retry() wrapper mirroring
envsubst() and route all dashboard retry.py calls through it. Restore the
dir-wide MSYS_NO_PATHCONV that an earlier commit removed.

Co-authored-by: Isaac
Document that immediate-visibility on update is a deliberate simplification:
staging a stale value would produce a successful-but-stale 200 that breaks
plan-time reads the engine doesn't yet tolerate. Only create stages a 404.

Co-authored-by: Isaac
Use Write (not Put) on update so the first read after an update returns the
stale pre-update value, like a real backend. Tests that read right after an
update now wait for the new value with a content-aware retry (retry --until /
--until-not). detect-change opts out: its out-of-band-modification check reads
inside bundle plan, an engine-level read no test retry can fix.

Co-authored-by: Isaac
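
The idea of a content-aware retry (keep reading until the response satisfies a condition, rather than until the call merely succeeds) can be sketched in Go as follows. This is not retry.py and makes no claim about how its `--until` / `--until-not` flags are implemented; `retryUntil` and its parameters are hypothetical names for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntil calls fetch until it succeeds with a result that accept approves,
// or until attempts are exhausted. It sleeps for delay between attempts.
func retryUntil[T any](attempts int, delay time.Duration,
	fetch func() (T, error), accept func(T) bool) (T, error) {
	var last T
	var lastErr error
	for i := 0; i < attempts; i++ {
		last, lastErr = fetch()
		if lastErr == nil && accept(last) {
			return last, nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	if lastErr == nil {
		lastErr = errors.New("condition not met")
	}
	return last, fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Simulated backend: two stale reads, then the updated value.
	responses := []string{"old", "old", "new"}
	calls := 0
	fetch := func() (string, error) {
		r := responses[calls]
		calls++
		return r, nil
	}
	got, err := retryUntil(5, time.Millisecond, fetch,
		func(s string) bool { return s == "new" })
	fmt.Println(got, err, calls) // new <nil> 3
}
```

A plain retry-on-error would stop at the first stale 200 here, which is why the predicate checks the content rather than only the error.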
@denik denik force-pushed the denik/eventual-consistency-part2 branch from f55a64e to ed61463 Compare July 2, 2026 15:15
@denik denik temporarily deployed to test-trigger-is July 2, 2026 15:15 — with GitHub Actions Inactive
Capturing $DASHBOARD in the function but reading it at the top level was
confusing. Register the DASHBOARD_ID and ETAG replacements where the value is
read, inside deploy_dashboard.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is July 2, 2026 16:19 — with GitHub Actions Inactive
@denik denik enabled auto-merge July 2, 2026 16:24
	if isTruePtr(config.IsServicePrincipal) {
		token = testserver.ServicePrincipalTokenPrefix + tokenSuffix
		testUser = testserver.TestUserSP
	} else if staleOnceEnabled(testEnv) {

Contributor

Having an environment variable for this in the test server, plus a special token, seems unnecessarily complex. Why not add an option to the acc test framework and just propagate that when creating the test server?

This test server is shared between multiple tests. Are we sure that the current test is isolated to the scope of the test that set the env var?

Contributor Author

The special token is needed so that the shared testserver can switch its behaviour per client.

Regarding env var vs. config field: both would work, but an env var can be composed with EnvMatrix. I think it would be useful to run some tests under both modes, to assert that they are resilient to eventual consistency without overfitting to it.
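
To illustrate the per-client switch discussed in this thread, here is a rough sketch of how a token prefix could carry the mode from a test to a shared server. Only `TESTS_STALE_ONCE=1` is taken from the PR description; the prefix values and the `tokenFor` / `eventualFor` helpers are hypothetical and do not reflect the actual testserver constants.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical token prefixes; the real testserver values are not shown in this PR page.
const (
	eventualTokenPrefix = "ec-"
	defaultTokenPrefix  = "std-"
)

// tokenFor picks the token a test sends, based on that test's own environment.
func tokenFor(env map[string]string, suffix string) string {
	if env["TESTS_STALE_ONCE"] == "1" {
		return eventualTokenPrefix + suffix
	}
	return defaultTokenPrefix + suffix
}

// eventualFor is what a shared server could run per request to choose the mode.
func eventualFor(authHeader string) bool {
	token := strings.TrimPrefix(authHeader, "Bearer ")
	return strings.HasPrefix(token, eventualTokenPrefix)
}

func main() {
	stale := tokenFor(map[string]string{"TESTS_STALE_ONCE": "1"}, "test-a")
	plain := tokenFor(map[string]string{}, "test-b")
	fmt.Println(eventualFor("Bearer " + stale)) // true
	fmt.Println(eventualFor("Bearer " + plain)) // false
}
```

Because the mode travels with each request, two tests sharing one server process can observe different behaviour without any server-wide setting.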

@denik denik requested a review from shreyas-goenka July 3, 2026 12:21
3 participants