fix(databricks_zerobus sink): defer Unity Catalog schema fetch out of build()#25408
fix(databricks_zerobus sink): defer Unity Catalog schema fetch out of build()#25408flaviofcruz wants to merge 8 commits into
Conversation
30bffb2 to
83ebd94
Compare
petere-datadog
left a comment
There was a problem hiding this comment.
Tested, works as expected, code looks good to me
3e6c015 to
bdd9e09
Compare
|
please add |
It is added here: https://github.com/flaviofcruz/vector/blob/bdd9e0910af6bd3fd2c2459222f402d823b99858/.github/workflows/semantic.yml But I can merge a separate PR if that makes it easier. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2cea4a861
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let schema = service.ensure_schema().await?; | ||
| let records = Self::encode_records(schema, &request.events)?; | ||
| let stream = service.get_or_create_stream(schema).await?; |
There was a problem hiding this comment.
Preserve rejection status for non-retriable sink failures
call now propagates schema/encoding/stream-init failures as Err via ?, but the generic driver treats every Err as EventStatus::Errored (transient) regardless of retryability. That means permanent failures introduced here (for example Unity Catalog 404/401 mapped to SchemaError { retryable: false }, or non-retriable EncodingError) no longer become rejected batches and can be NACKed/retried indefinitely by ack-enabled sources instead of being dropped as permanent failures.
Useful? React with 👍 / 👎.
| /// retryable statuses for HTTP-based sinks lives at | ||
| /// [`crate::sinks::util::http::RetryStrategy::Default`]. | ||
| fn status_is_retryable(status: StatusCode) -> bool { | ||
| status.is_server_error() |
There was a problem hiding this comment.
Exclude HTTP 501 from retryable schema failures
status_is_retryable currently marks all 5xx responses as retryable, but Vector’s default HTTP retry semantics explicitly exclude 501 Not Implemented. This means a permanent 501 returned by Unity Catalog or an intermediate gateway will now be retried as transient, delaying failure and repeatedly replaying the same request instead of rejecting it immediately.
Useful? React with 👍 / 👎.
Summary
databricks_zerobus'sSinkConfig::build()synchronously calls Unity Catalog to fetch the table's protobuf descriptor. If the table doesn't exist or credentials are wrong, the call fails inside build() and Vector exits before the sink even starts — even whenhealthcheck.enabled= false.This aligns the sink with the convention used by AWS S3, Kafka, etc.:
build()does only local setup, the healthcheck does the remote probe, and runtime failures surface per-batch via the existing retry/event-status path.Changes:
Vector configuration
Before: Vector exits at startup with Unity Catalog API returned error 404: ....
After: Vector starts; batches are rejected at ingest time with the same error logged per batch.
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.