feat(sinks): add new databricks_zerobus for Databricks ingestion#24840

Open
flaviofcruz wants to merge 16 commits intovectordotdev:masterfrom
flaviofcruz:upstream-databricks-zerobus

Conversation

@flaviofcruz
Contributor

@flaviofcruz flaviofcruz commented Mar 3, 2026

Summary

Databricks provides a Zerobus ingest connector [1], a push-based API that writes data directly into Unity Catalog Delta tables. This PR introduces a new Vector sink that integrates with Zerobus, allowing Vector to push data into Databricks. We use the Databricks-provided SDK to implement the sink [2].

Zerobus supports row-level ingestion, and that's what we do here. Zerobus also offers Arrow batch ingestion in experimental mode, but we didn't add support for it. We will switch from row-level ingestion to Arrow batch once it becomes stable, and Arrow batch will then be the default.

For row-based ingestion, we extended BatchSerializerConfig to support a batch serializer that produces vectors of Protocol Buffers-encoded bytes. This makes it the second option for batch serialization, alongside Arrow batch.
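
As a rough illustration of what row-based batch serialization means, the sketch below encodes each event into its own Protocol Buffers byte buffer and returns one buffer per event. It is not the code from this PR: `LogEvent`, `encode_row`, and `encode_rows` are hypothetical names, and the single-string-field message layout is an assumption chosen to keep the example free of external crates.

```rust
/// Hypothetical event type for illustration: one log line with a message.
pub struct LogEvent {
    pub message: String,
}

/// Append `value` to `buf` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        // Low 7 bits with the continuation bit set.
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Encode one event as a protobuf message with a single string field
/// (field number 1, an assumption made for this sketch).
pub fn encode_row(event: &LogEvent) -> Vec<u8> {
    let mut buf = Vec::with_capacity(event.message.len() + 4);
    // Tag = (field_number << 3) | wire_type; wire type 2 is length-delimited.
    put_varint(&mut buf, (1 << 3) | 2);
    put_varint(&mut buf, event.message.len() as u64);
    buf.extend_from_slice(event.message.as_bytes());
    buf
}

/// Row-based batch serialization: one independent byte buffer per event,
/// in contrast to an Arrow batch, which is a single columnar payload.
pub fn encode_rows(events: &[LogEvent]) -> Vec<Vec<u8>> {
    events.iter().map(encode_row).collect()
}
```

Calling `encode_rows` on a slice of two events yields a `Vec` of two buffers, each of which can be handed to a row-level ingestion call on its own.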

Users do not have to specify the schema at all: we fetch it for them from Unity Catalog and then use it with the API. Users who want to change the schema should update their table as needed. Support for dynamic schema changes is limited at the moment.

Vector configuration

[sinks.databricks_zerobus]
type = "databricks_zerobus"
inputs = ["logs"]
ingestion_endpoint = "https://91041497925470.zerobus.us-west-2.cloud.databricks.com"
table_name = "main.default.zerobus_table"
unity_catalog_endpoint = "https://logfood-us-west-2-mt.cloud.databricks.com/"
[sinks.databricks_zerobus.auth]
strategy = "oauth"
client_id = "<client id>"
client_secret = "<secret>"

How did you test this PR?

Unit tests, running small toy examples and using it in production for actual traffic.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

[1] https://docs.databricks.com/aws/en/ingestion/zerobus-overview
[2] https://github.com/databricks/zerobus-sdk

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions Bot added domain: sinks Anything related to the Vector's sinks domain: external docs Anything related to Vector's external, public documentation labels Mar 3, 2026
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for ingesting da… feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion Mar 3, 2026
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch 3 times, most recently from 2368e4a to 42bf043 Compare March 12, 2026 17:04
@flaviofcruz flaviofcruz marked this pull request as ready for review March 12, 2026 17:05
@flaviofcruz flaviofcruz requested review from a team as code owners March 12, 2026 17:05
@flaviofcruz flaviofcruz changed the title feat(databricks zerobus): add new databricks_zerobus for Databricks ingestion feat(sinks): add new databricks_zerobus for Databricks ingestion Mar 12, 2026
@github-actions github-actions Bot added the domain: ci Anything related to Vector's CI environment label Mar 12, 2026
@drichards-87 drichards-87 self-assigned this Mar 12, 2026
@drichards-87 drichards-87 removed their assignment Mar 12, 2026
@pront
Member

pront commented Apr 3, 2026

Thanks @flaviofcruz for this new integration! Apologies for the slow review on this one.

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 436d0da4bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@pront pront added the meta: awaiting author Pull requests that are awaiting their author. label Apr 3, 2026
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 436d0da to 9dcb5d1 Compare April 10, 2026 17:10
@github-actions github-actions Bot removed the meta: awaiting author Pull requests that are awaiting their author. label Apr 10, 2026
@github-actions
Contributor

github-actions Bot commented Apr 10, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9dcb5d1e71

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@pront
Member

pront commented Apr 10, 2026

FYI I am waiting for @hsuanyi to sign the CLA (see comment) before reviewing this further. Also, there is an unresolved review comment.

@pront
Member

pront commented Apr 13, 2026

FYI I am waiting for @hsuanyi to sign the CLA (see comment) before reviewing this further. Also, there is an unresolved review comment.

@flaviofcruz in case you missed the above, we will require all profiles who contributed to this PR to sign the CLA. Happy to review once that is done.

@hsuanyi

hsuanyi commented Apr 13, 2026

I have read the CLA Document and I hereby sign the CLA

@pront
Member

pront commented Apr 13, 2026

recheck


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 050409defd

Comment thread src/sinks/databricks_zerobus/config.rs Outdated
Comment thread lib/codecs/src/encoding/format/proto_batch.rs
@flaviofcruz
Contributor Author

@pront I really appreciate your work on the review. However, I was looking at the zerobus SDK license, and it could be problematic (https://github.com/databricks/zerobus-sdk/blob/main/LICENSE). Do you know if this could be a blocker? Let me know if it is.

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from 050409d to 36a74ef Compare April 13, 2026 22:47

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36a74ef530

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d7b620bfb

Comment thread src/sinks/databricks_zerobus/service.rs Outdated
Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from f643aaf to 31bcdcb Compare May 4, 2026 14:00
@flaviofcruz
Contributor Author

LGTM, thank you!

There are a few minor issues remaining (please followup and resolve any open threads), but approving to unblock this effort.

Reminder to rebase on #25340 once that is merged.

Thanks, I have rebased and force pushed. I will follow up with the VRL fixes which should address codex comment above. Will also follow up with arrow batch support once I get the greenlight from the zerobus team.

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch 2 times, most recently from c1e2cc3 to 31bcdcb Compare May 4, 2026 16:13
@pront
Member

pront commented May 4, 2026

Hi @flaviofcruz, there are a few open threads and it's unclear if they are resolved or not. Ping me when ready for a final review.

@flaviofcruz
Contributor Author

Hi @flaviofcruz, there are a few open threads and it's unclear if they are resolved or not. Ping me when ready for a final review.

Threads should be closed now I think.

@pront pront enabled auto-merge May 4, 2026 18:15
@pront pront added this pull request to the merge queue May 4, 2026
@pront pront removed this pull request from the merge queue due to a manual request May 4, 2026
Member

@pront pront left a comment

Left a few review comments from the local pass.

Comment thread src/sinks/databricks_zerobus/unity_catalog_schema.rs Outdated
Comment thread src/sinks/databricks_zerobus/service.rs Outdated
Comment thread Cargo.lock Outdated
}

/// Encode a batch of events into a `BatchOutput`.
pub fn encode_batch(&self, events: &[Event]) -> Result<BatchOutput, Error> {
Member

Just noting down here that the 10MB limit enforced in config.rs might not hold in all cases. For example, a numeric-heavy schema right at the 10MB boundary could realistically encode over 10MB and fail at the SDK call.
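
One way to guard against this, sketched here under assumptions, is to measure rows after encoding and split them so that no chunk exceeds the limit. The function name `split_by_encoded_size` is hypothetical, and whether the SDK limit applies per request or per record is not stated in this thread; the sketch treats `max_bytes` as a cap on the summed encoded size of each chunk and sets aside any single row that exceeds it.

```rust
/// Split already-encoded rows into chunks whose summed encoded size stays
/// within `max_bytes`. Rows that exceed `max_bytes` on their own are
/// returned separately so the caller can reject or log them.
///
/// Hypothetical helper for illustration; not part of the PR.
pub fn split_by_encoded_size(
    rows: Vec<Vec<u8>>,
    max_bytes: usize,
) -> (Vec<Vec<Vec<u8>>>, Vec<Vec<u8>>) {
    let mut chunks = Vec::new();
    let mut oversized = Vec::new();
    let mut current: Vec<Vec<u8>> = Vec::new();
    let mut current_bytes = 0usize;

    for row in rows {
        if row.len() > max_bytes {
            // This row can never fit in any chunk.
            oversized.push(row);
            continue;
        }
        if current_bytes + row.len() > max_bytes && !current.is_empty() {
            // Adding this row would cross the limit: close the current chunk.
            chunks.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += row.len();
        current.push(row);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    (chunks, oversized)
}
```

Because the check runs on encoded bytes rather than on the in-memory event size, it does not depend on how the event size estimate compares to the encoded size for a given schema.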

Contributor Author

Should we just add a comment to the sink documentation? Or do you have some other proposal?

Member

A short mention in the docs is a good idea. But

@flaviofcruz flaviofcruz force-pushed the upstream-databricks-zerobus branch from dd346b6 to d564d98 Compare May 4, 2026 19:50
@github-actions github-actions Bot added the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label May 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d564d98f70

Comment thread src/sinks/databricks_zerobus/service.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 141f680972

Comment on lines +149 to +151
When disabled (the default), events are marked as delivered as soon as the
ingestion call completes without error, without waiting for an explicit
server acknowledgement.

P2: Correct ack-disabled behavior description

This paragraph says that with acknowledgements disabled the sink marks events delivered without waiting for server acknowledgement, but the implementation always waits on wait_for_offset after every ingest and even errors on missing offsets (MissingAckOffset) regardless of sink ack config (src/sinks/databricks_zerobus/service.rs:390-393). Users who disable acknowledgements to avoid server-ack latency/failures will get different runtime behavior than documented, so this guidance should be updated to match the actual sink semantics.
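
The mismatch can be modeled in a few lines. The sketch below is a simplified stand-in, not the sink's code: `Delivery`, `documented`, and `as_reviewed` are hypothetical names, and the `Option<u64>` offset stands in for the result of waiting on the server. Only `MissingAckOffset` is a name taken from the review comment above.

```rust
#[derive(Debug, PartialEq)]
pub enum DeliveryError {
    /// The server returned no offset to acknowledge.
    MissingAckOffset,
}

#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// Marked delivered as soon as the ingestion call returned without error.
    OnIngest,
    /// Marked delivered only after the server acknowledged this offset.
    OnServerAck(u64),
}

/// Behavior as the docs paragraph describes it: the server acknowledgement
/// is only awaited when acknowledgements are enabled.
pub fn documented(ack_enabled: bool, offset: Option<u64>) -> Result<Delivery, DeliveryError> {
    if !ack_enabled {
        return Ok(Delivery::OnIngest);
    }
    offset
        .map(Delivery::OnServerAck)
        .ok_or(DeliveryError::MissingAckOffset)
}

/// Behavior as the review comment describes the implementation: the offset
/// is always awaited, so the acknowledgement setting has no effect here.
pub fn as_reviewed(_ack_enabled: bool, offset: Option<u64>) -> Result<Delivery, DeliveryError> {
    offset
        .map(Delivery::OnServerAck)
        .ok_or(DeliveryError::MissingAckOffset)
}
```

With acknowledgements disabled and no offset returned, `documented` reports success while `as_reviewed` reports `MissingAckOffset`, which is the user-visible difference the comment is pointing at.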

Labels

docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. domain: ci Anything related to Vector's CI environment domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks work in progress

5 participants