
Commit 813b754

direct: retry 504 errors (#5349)
## Changes

Retry resource methods that return an error with http_code 504 that the SDK has not already retried. This affects all DoRead, DoUpdate(WithID), and DoDelete implementations, as well as DoCreate for grants and permissions.

Note that retry is opt-in for DoCreate: implementations must wrap the error with retrySafe(). For all other methods, retry is always enabled.

## Why

We've seen reports where deploy fails with:

> Error: cannot create resources.pipelines.<pipeline>.permissions: The service at /api/2.0/permissions/pipelines/<pipeline_id> is taking too long to process your request. (504 TEMPORARILY_UNAVAILABLE)

We also saw that Terraform does custom retries for GET requests that return 504: databricks/terraform-provider-databricks#4355

Note that the two cases differ: the first one is "cannot create", so it refers to PUT.

## Tests

New testserver feature that allows injecting expiring faults into a given endpoint. See fault.py.

New acceptance tests use fault.py to check failures in plan/create/update for permissions.
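To make the split between always-on and opt-in retries concrete, here is a minimal Python model of the decision. It is a sketch under assumptions: Op, APIError, RetrySafe and should_retry are hypothetical names standing in for the Go resource methods and the retrySafe() wrapper, and the check for errors the SDK has already retried is left out.

```python
"""Hypothetical model of the 504 retry policy described in the commit message.

Op, APIError, RetrySafe and should_retry are illustrative names only; the real
logic lives in the CLI's Go code, which this commit view does not show.
"""
from enum import Enum


class Op(Enum):
    CREATE = "DoCreate"
    READ = "DoRead"
    UPDATE = "DoUpdate"  # also stands in for DoUpdateWithID
    DELETE = "DoDelete"


class APIError(Exception):
    """Minimal stand-in for an API error carrying an HTTP status code."""

    def __init__(self, http_code: int, message: str = "") -> None:
        super().__init__(message)
        self.http_code = http_code


class RetrySafe(Exception):
    """Stand-in for retrySafe(): a DoCreate implementation wraps its error
    in this to declare that repeating the request is safe."""

    def __init__(self, err: APIError) -> None:
        super().__init__(str(err))
        self.err = err


def should_retry(op: Op, err: Exception) -> bool:
    """Decide whether a resource method should be retried after `err`."""
    opted_in = isinstance(err, RetrySafe)
    inner = err.err if opted_in else err
    if not isinstance(inner, APIError) or inner.http_code != 504:
        return False  # only 504 Gateway Timeout is covered by this change
    if op is Op.CREATE:
        return opted_in  # create retries only when explicitly marked safe
    return True  # read, update and delete always retry a 504
```

Under this model, a grants or permissions DoCreate that wraps its error behaves like DoRead, while any other create fails on the first 504, presumably because repeating an arbitrary create is not known to be safe.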
1 parent d62dcbb commit 813b754

28 files changed

Lines changed: 558 additions & 16 deletions

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,5 +8,6 @@
 * `experimental open` now opens every DABs resource type that has a workspace URL, picking up `catalogs`, `schemas`, `volumes`, `database_instances`, `database_catalogs`, `synced_database_tables`, `postgres_catalogs`, `postgres_synced_tables`, `quality_monitors`, `vector_search_endpoints`, and `vector_search_indexes` ([#5346](https://github.com/databricks/cli/pull/5346)).

 ### Bundles
+* Retry transient HTTP 504 Gateway Timeout errors in direct deployment engine ([#5349](https://github.com/databricks/cli/pull/5349)).

 ### Dependency updates

acceptance/bin/fault.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""Set up a fault rule on the testserver for the current test token.

Usage: fault.py PATTERN STATUS_CODE OFFSET TIMES

  PATTERN      HTTP method and path, supports trailing * wildcard,
               e.g. "PUT /api/2.0/permissions/pipelines/*"
  STATUS_CODE  HTTP status code to return, e.g. 504
  OFFSET       number of requests to let through before fault starts
  TIMES        number of times to return the fault response

The rule is scoped to the current DATABRICKS_TOKEN so it only affects
the test that registers it, even when tests share a server.
"""

import json
import os
import sys
import urllib.request

host = os.environ.get("DATABRICKS_HOST", "")
token = os.environ.get("DATABRICKS_TOKEN", "")

if not host:
    print("DATABRICKS_HOST not set", file=sys.stderr)
    sys.exit(1)

if len(sys.argv) != 5:
    print(f"usage: {sys.argv[0]} PATTERN STATUS_CODE OFFSET TIMES", file=sys.stderr)
    sys.exit(1)

pattern, status_code, offset, times = sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), int(sys.argv[4])
body = '{"error_code":"INJECTED","message":"Fault injected by test."}'

data = json.dumps(
    {
        "pattern": pattern,
        "status_code": status_code,
        "body": body,
        "offset": offset,
        "times": times,
    }
).encode()

req = urllib.request.Request(
    f"{host}/__testserver/fault",
    data=data,
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    method="POST",
)
urllib.request.urlopen(req)
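The PATTERN, OFFSET and TIMES arguments suggest a simple per-rule counter on the server side. The sketch below models one plausible reading of the docstring; the testserver implementation is not part of this view, so FaultRule and apply are hypothetical names, and the per-token scoping is omitted.

```python
"""Hypothetical model of how a testserver fault rule could be applied.

FaultRule and apply() are illustrative only; token scoping is left out.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultRule:
    pattern: str  # e.g. "PUT /api/2.0/permissions/pipelines/*"
    status_code: int
    offset: int  # matching requests to let through before faulting
    times: int  # faulted responses to return before the rule expires
    seen: int = 0  # matching requests observed so far

    def matches(self, method: str, path: str) -> bool:
        request = f"{method} {path}"
        if self.pattern.endswith("*"):
            # Trailing * wildcard: prefix match on "METHOD /path".
            return request.startswith(self.pattern[:-1])
        return request == self.pattern

    def apply(self, method: str, path: str) -> Optional[int]:
        """Return the injected status code, or None to pass the request through."""
        if not self.matches(method, path):
            return None
        index = self.seen
        self.seen += 1
        if self.offset <= index < self.offset + self.times:
            return self.status_code
        return None
```

With the create test's arguments (`504 0 1`), this model faults the first matching PUT and passes every later one, which lines up with the two PUTs recorded in that test's output.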
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
bundle:
  name: test-bundle

resources:
  pipelines:
    foo:
      name: foo
      permissions:
        - level: CAN_VIEW
          user_name: viewer@example.com

acceptance/bundle/resources/permissions/pipelines/504/create/out.test.toml

Lines changed: 4 additions & 0 deletions
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Deploying resources...
Warn: deploying resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from PUT /api/2.0/permissions/pipelines/[UUID]
Updating deployment state...
Deployment complete!

>>> print_requests.py //api/2.0/permissions/pipelines
"PUT /api/2.0/permissions/pipelines/[UUID]"
"PUT /api/2.0/permissions/pipelines/[UUID]"
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# Inject a single 504 on the first permissions PUT to simulate a transient error.
# Permissions Set is idempotent, so DoCreate opts in via retrySafe and the deploy succeeds.
fault.py "PUT /api/2.0/permissions/pipelines/*" 504 0 1

$CLI bundle deploy

# Two PUT requests should appear: the initial 504 and the successful retry.
trace print_requests.py //api/2.0/permissions/pipelines | jq '.method + " " + .path'
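The script's comment hinges on idempotency: a 504 from a gateway does not tell the client whether the backend applied the request. The toy model below uses hypothetical in-memory stores rather than any real API to show why repeating a replace-all Set is harmless while repeating an ID-allocating create is not.

```python
"""Toy model of why retrying a permissions PUT is safe.

Hypothetical in-memory stores: set_permissions mimics a replace-all Set,
create_pipeline mimics a create that allocates a fresh ID on every call.
"""
import itertools

acl_store: dict = {}
pipelines: dict = {}
_next_id = itertools.count(1)


def set_permissions(object_id: str, acl: list) -> None:
    # Replaces the whole ACL, so applying the same request twice is harmless.
    acl_store[object_id] = list(acl)


def create_pipeline(name: str) -> int:
    # Each call allocates a new ID, so a blind retry can create a duplicate.
    pipeline_id = next(_next_id)
    pipelines[pipeline_id] = name
    return pipeline_id


# Simulate a 504 where the gateway timed out but the backend still applied
# the request, followed by a retry of the same request.
acl = [{"level": "CAN_VIEW", "user_name": "viewer@example.com"}]
set_permissions("pipelines/abc", acl)
set_permissions("pipelines/abc", acl)  # retry
create_pipeline("foo")
create_pipeline("foo")  # retry

print(len(acl_store["pipelines/abc"]), len(pipelines))  # prints: 1 2
```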
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
RecordRequests = true
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
bundle:
  name: test-bundle

resources:
  pipelines:
    foo:
      name: foo
      permissions:
        - level: CAN_VIEW
          user_name: viewer@example.com

acceptance/bundle/resources/permissions/pipelines/504/plan/out.test.toml

Lines changed: 4 additions & 0 deletions
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/test-bundle/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

>>> [CLI] bundle plan
Warn: planning resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from GET /api/2.0/permissions/pipelines/[UUID]
Plan: 0 to add, 0 to change, 0 to delete, 2 unchanged

>>> print_requests.py //api/2.0/permissions/pipelines --get --oneline
2 {"method": "GET", "path": "/api/2.0/permissions/pipelines/[UUID]"}

>>> [CLI] bundle plan
Warn: planning resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from GET /api/2.0/permissions/pipelines/[UUID]
Warn: planning resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from GET /api/2.0/permissions/pipelines/[UUID]
Plan: 0 to add, 0 to change, 0 to delete, 2 unchanged
3 {"method": "GET", "path": "/api/2.0/permissions/pipelines/[UUID]"}

>>> musterr [CLI] bundle plan
Warn: planning resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from GET /api/2.0/permissions/pipelines/[UUID]
Warn: planning resources.pipelines.foo.permissions: retrying after 504 Gateway Timeout from GET /api/2.0/permissions/pipelines/[UUID]
Error: cannot plan resources.pipelines.foo.permissions: reading id="/pipelines/[UUID]": Fault injected by test. (504 INJECTED)

Endpoint: GET [DATABRICKS_URL]/api/2.0/permissions/pipelines/[UUID]?
HTTP Status: 504 Gateway Timeout
API error_code: INJECTED
API message: Fault injected by test.

Error: planning failed

3 {"method": "GET", "path": "/api/2.0/permissions/pipelines/[UUID]"}
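The three plan runs above are consistent with a fixed attempt budget for reads. The sketch reproduces the recorded warning and request counts under two assumptions that this view does not confirm: the test script (not shown) injects one, two and three faults before the respective runs, and a read is attempted at most three times. MAX_ATTEMPTS and read_with_retry are hypothetical names.

```python
"""Reproduce the warning/request counts of the three recorded plan runs.

Assumption inferred from the output, not from the CLI source: a read that keeps
returning 504 is attempted at most MAX_ATTEMPTS times before the error surfaces.
"""
from typing import Tuple

MAX_ATTEMPTS = 3  # hypothetical: 3 faults -> 2 warnings, 3 GETs, then an error


def read_with_retry(faults: int) -> Tuple[int, int, bool]:
    """Return (warnings, requests, succeeded) for a GET that first sees `faults` 504s."""
    warnings = requests = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        requests += 1
        if faults == 0:
            return warnings, requests, True  # the GET succeeded
        faults -= 1  # this GET returned the injected 504
        if attempt < MAX_ATTEMPTS:
            warnings += 1  # "Warn: ... retrying after 504 Gateway Timeout ..."
    return warnings, requests, False  # retries exhausted: "Error: cannot plan ..."


for injected in (1, 2, 3):
    print(injected, read_with_retry(injected))
```

The model yields one warning and two GETs for a single fault, two warnings and three GETs for two faults, and two warnings, three GETs and a failure for three faults, matching the three runs recorded above.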
