Skip to content

Include more details from SDK/API errors in the diagnostics#3379

Merged
denik merged 6 commits into
mainfrom
denik/better-errors
Aug 18, 2025
Merged

Include more details from SDK/API errors in the diagnostics#3379
denik merged 6 commits into
mainfrom
denik/better-errors

Conversation

@denik

@denik denik commented Aug 7, 2025

Copy link
Copy Markdown
Contributor

Changes

  • Any API error passed to diag.FromErr or logdiag.LogError will automatically be formatted to include HTTP status/message, HTTP method + full url, API code and message.
  • Custom error from direct deployment is removed, it's no longer needed.

Why

SDK errors by default only contain "message" from payload which sometimes looks like this "Request 123 failed".
DABs/CLI do not add a lot of context to errors, so that message is all you see.
This makes debugging very hard. I've been fixing this in direct deployment by

  • adding context to errors everywhere
  • tagging errors with endpoint

This PR makes tagging unnecessary and adds better error handling to all of CLI - as long as you use logdiag.LogError, it'll add those critical details, see create-error test for example of the output.

Requires databricks/databricks-sdk-go#1261

Tests

Existing tests.

alerts The alerts API can be used to perform CRUD operations on alerts.
alerts-legacy The alerts API can be used to perform CRUD operations on alerts.
alerts-v2 New version of SQL Alerts.
dashboards In general, there is little need to modify dashboards using the API.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these commands removed?

@denik denik Aug 7, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not regenerate SDK-based code, just removed things that did not compile. These will be cleaned up once we upgrade SDK.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can rebase to the latest main on CLI, which has SDK 0.79.0 already, so everything should compile there

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks! done - these are gone now.

@denik denik requested a review from andrewnester August 7, 2025 14:07
Comment thread libs/diag/sdk_error.go
return e.Error() + extra
}

func FormatAPIErrorDetails(e error) string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a unit test for this to check the output when there is or no ResponseWrapper/Request/etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, added here 5123233

@denik denik force-pushed the denik/better-errors branch from b52cf0a to c2fb868 Compare August 7, 2025 14:24
@denik denik temporarily deployed to test-trigger-is August 7, 2025 14:24 — with GitHub Actions Inactive
Comment thread libs/diag/sdk_error.go
if len(httpStatus) == 0 {
httpStatus = strconv.Itoa(apiErr.StatusCode)
}
return fmt.Sprintf("Endpoint: %s\nHTTP Status: %s\nAPI error_code: %s\nAPI message: %s", endpoint, httpStatus, apiErr.ErrorCode, apiErr.Message)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the API message always present or can it be empty?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_code -> error code?

They are not guaranteed to be set. I recall that the SDK does some work to populate them with HTTP error codes if there is no actual response, but double check...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, SDK tries very hard to populate those (e.g. it parses text and html) and falls back to setting them to UNKNOWN / full body.

So, if backend returns html or text, "API error_code" and "API message" labels are not strictly correct or rather they are represent SDK's view of error_code and message.

However, 99% of cases would get JSON error so "API error_code/message" does refer to error_code/message in the response. This is something that is mentioned in the docs, unlike "error code" which is vague.

BTW, this message is just a reasonably simple starting point. I think this message can be further improved in follow up PRs:

  • We can distinguish JSON vs html/text errors (could be useful for debugging).
  • We can mention useful headers (trace id?).
  • We can include full request/response if user asked for more verbosity.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Aug 7, 2025

Copy link
Copy Markdown
Collaborator

Run: 17041289921

Env ✅‌pass 🔄‌flaky 🙈‌skip
🔄‌ aws linux 306 4 479
🔄‌ aws windows 305 6 478
✅‌ aws-ucws linux 423 378
✅‌ aws-ucws windows 424 377
✅‌ azure linux 310 478
✅‌ azure windows 311 477
✅‌ azure-ucws linux 425 375
✅‌ azure-ucws windows 426 374
✅‌ gcp linux 309 480
✅‌ gcp windows 310 479
9 failing tests:
Test Name aws linux aws windows
TestAccept/bundle/deploy/pipeline/allow-duplicate-names/DATABRICKS_CLI_DEPLOYMENT=direct-exp 🔄‌flaky ✅‌pass
TestAccept/bundle/deploy/pipeline/allow-duplicate-names/DATABRICKS_CLI_DEPLOYMENT=terraform 🔄‌flaky ✅‌pass
TestAccept/bundle/deploy/pipeline/auto-approve ✅‌pass 🔄‌flaky
TestAccept/bundle/deploy/pipeline/auto-approve/DATABRICKS_CLI_DEPLOYMENT=terraform ✅‌pass 🔄‌flaky
TestAccept/bundle/destroy/jobs-and-pipeline ✅‌pass 🔄‌flaky
TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_CLI_DEPLOYMENT=terraform ✅‌pass 🔄‌flaky
TestAccept/bundle/templates/default-python/combinations/classic 🔄‌flaky 🔄‌flaky
TestAccept/bundle/templates/default-python/combinations/classic/DATABRICKS_CLI_DEPLOYMENT=terraform/DLT=yes/NBOOK=no/PY=yes 🔄‌flaky ✅‌pass
TestAccept/bundle/templates/default-python/combinations/classic/DATABRICKS_CLI_DEPLOYMENT=terraform/DLT=yes/NBOOK=yes/PY=no ✅‌pass 🔄‌flaky

@pietern pietern left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check failure highlights that the API error can no longer be marshaled.

Exporting the whole response wrapper in the error feels very broad. To make this more focused and fix the check, you could consider adding 2 accessor functions on the API error to retrieve the corresponding request and response and keep the field to fetch them from unexported. That keeps the marshaled copy of the error identical and fix the check, but enables callers to get more information about errors.

Comment thread libs/diag/sdk_error.go Outdated
}
e = inner
}
return nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to errors.As. Can you use that instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! good idea a514525

Comment thread libs/diag/sdk_error.go Outdated
func FormatAPIErrorSummary(e error) string {
extra := ""
apiErr := findApiErr(e)
if apiErr != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend inverting this: return early if it isn't an API error. Function below does the same.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, done.

Comment thread libs/diag/sdk_error.go
if len(httpStatus) == 0 {
httpStatus = strconv.Itoa(apiErr.StatusCode)
}
return fmt.Sprintf("Endpoint: %s\nHTTP Status: %s\nAPI error_code: %s\nAPI message: %s", endpoint, httpStatus, apiErr.ErrorCode, apiErr.Message)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_code -> error code?

They are not guaranteed to be set. I recall that the SDK does some work to populate them with HTTP error codes if there is no actual response, but double check...

@denik denik force-pushed the denik/better-errors branch from c2fb868 to fe79d41 Compare August 8, 2025 09:24
@denik denik temporarily deployed to test-trigger-is August 8, 2025 09:24 — with GitHub Actions Inactive
@denik denik force-pushed the denik/better-errors branch from fe79d41 to 3e19d66 Compare August 8, 2025 09:25
@denik denik temporarily deployed to test-trigger-is August 8, 2025 09:25 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is August 8, 2025 09:34 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is August 8, 2025 09:39 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is August 8, 2025 10:12 — with GitHub Actions Inactive
@denik denik requested a review from pietern August 8, 2025 10:14
@denik denik force-pushed the denik/better-errors branch from 5123233 to cf5d91e Compare August 14, 2025 09:38
@denik denik temporarily deployed to test-trigger-is August 14, 2025 09:38 — with GitHub Actions Inactive
@denik denik changed the title [Preview, needs SDK release] Include mode details from SDK/API errors in the diagnostics Include mode details from SDK/API errors in the diagnostics Aug 14, 2025
@denik denik changed the title Include mode details from SDK/API errors in the diagnostics Include more details from SDK/API errors in the diagnostics Aug 14, 2025
@denik denik changed the title Include more details from SDK/API errors in the diagnostics [Needs SDK 0.80.0+] Include more details from SDK/API errors in the diagnostics Aug 14, 2025
@denik denik force-pushed the denik/better-errors branch from cf5d91e to a798ebb Compare August 18, 2025 09:16
@denik denik temporarily deployed to test-trigger-is August 18, 2025 09:16 — with GitHub Actions Inactive
@denik denik force-pushed the denik/better-errors branch from a798ebb to 7decdae Compare August 18, 2025 09:22
@denik denik temporarily deployed to test-trigger-is August 18, 2025 09:23 — with GitHub Actions Inactive
@denik denik force-pushed the denik/better-errors branch from 7decdae to 7b9c3f6 Compare August 18, 2025 12:53
@denik denik temporarily deployed to test-trigger-is August 18, 2025 12:53 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is August 18, 2025 12:57 — with GitHub Actions Inactive
@denik denik enabled auto-merge August 18, 2025 13:06
@denik denik added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit 4a0a8ab Aug 18, 2025
13 checks passed
@denik denik deleted the denik/better-errors branch August 18, 2025 13:39
@pietern pietern changed the title [Needs SDK 0.80.0+] Include more details from SDK/API errors in the diagnostics Include more details from SDK/API errors in the diagnostics Aug 20, 2025
deco-sdk-tagging Bot added a commit that referenced this pull request Aug 21, 2025
## Release v0.265.0

### CLI
* Fix "cache: token not found" for auth token command ([#3447](#3447))

### Dependency updates
* Upgrade TF provider to 1.87.0 ([#3430](#3430))
* Upgrade Go SDK to 0.81.0 ([#3449](#3449))

### Bundles
* Add support for Lakebase database instances in DABs ([#3283](#3283))
* Add support for Lakebase database catalogs in DABs ([#3436](#3436))
* Improve error message for SDK/API errors ([#3379](#3379))
* Separate generated classes between jobs and pipelines in Python support ([#3428](#3428))
denik added a commit that referenced this pull request May 20, 2026
…iagnostics (#3379)

## Changes
- Any API error passed to diag.FromErr or logdiag.LogError will
automatically be formatted to include HTTP status/message, HTTP method +
full url, API code and message.
- Custom error from direct deployment is removed, it's no longer needed.

## Why
SDK errors by default only contain "message" from payload which
sometimes looks like this "Request 123 failed".
DABs/CLI do not add a lot of context to errors, so that message is all
you see.
This makes debugging very hard. I've been fixing this in direct
deployment by
- adding context to errors everywhere
- tagging errors with endpoint

This PR makes tagging unnecessary and adds better error handling to all
of CLI - as long as you use logdiag.LogError, it'll add those critical
details, see create-error test for example of the output.

Requires databricks/databricks-sdk-go#1261

## Tests
Existing tests.
denik pushed a commit that referenced this pull request May 20, 2026
## Release v0.265.0

### CLI
* Fix "cache: token not found" for auth token command ([#3447](#3447))

### Dependency updates
* Upgrade TF provider to 1.87.0 ([#3430](#3430))
* Upgrade Go SDK to 0.81.0 ([#3449](#3449))

### Bundles
* Add support for Lakebase database instances in DABs ([#3283](#3283))
* Add support for Lakebase database catalogs in DABs ([#3436](#3436))
* Improve error message for SDK/API errors ([#3379](#3379))
* Separate generated classes between jobs and pipelines in Python support ([#3428](#3428))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants