Add Vespa provider#63988

Merged
potiuk merged 1 commit into apache:main from radu-gheorghe:vespa-provider-pr
Apr 8, 2026

@radu-gheorghe
Contributor

@radu-gheorghe radu-gheorghe commented Mar 20, 2026

Add Vespa.ai provider for Apache Airflow. Includes:

  • A Hook that allows running queries and other requests against Vespa
  • A (deferrable) Operator + trigger to facilitate loading data into Vespa

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: a mix of tools, mostly Cursor, Codex, Claude, Grok. I wouldn't say "generated by", because I made sure I was on top of things. But assisted for sure.


I will follow up with a PROPOSAL message on the dev mailing list.

@boring-cyborg

boring-cyborg Bot commented Mar 20, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@potiuk
Member

potiuk commented Mar 22, 2026

@radu-gheorghe This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria.

Issues found:

  • Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.
  • mypy (type checking): Failing: CI image checks / MyPy checks (mypy-providers). Run prek --stage manual mypy-providers --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.
  • Provider tests: Failing: provider distributions tests / Compat 2.11.1:P3.10:, provider distributions tests / Compat 3.0.6:P3.10:, provider distributions tests / Compat 3.1.8:P3.10:. Run provider tests with breeze run pytest <provider-test-path> -xvs. See Provider tests docs.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates.
  • Maintainers will then proceed with a normal review.

There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@radu-gheorghe radu-gheorghe marked this pull request as draft March 23, 2026 07:24
@radu-gheorghe radu-gheorghe marked this pull request as ready for review March 23, 2026 11:34
@radu-gheorghe
Contributor Author

Sorry for the missing pieces. I'm setting as "Ready for review" for now to see if the CI agrees that it's indeed ready :)

@potiuk potiuk force-pushed the vespa-provider-pr branch from e91b923 to 330a45d Compare April 1, 2026 15:57
@potiuk potiuk force-pushed the vespa-provider-pr branch 2 times, most recently from 8b3b20d to b263e8b Compare April 1, 2026 19:59
@kaxil kaxil requested a review from Copilot April 2, 2026 00:44
Contributor

Copilot AI left a comment

Pull request overview

Adds a new vespa provider package to Apache Airflow (hook + deferrable ingest operator/trigger) and updates build/install tooling to better handle transient uv build failures.

Changes:

  • Introduces apache-airflow-providers-vespa with VespaHook, VespaIngestOperator, and VespaFeedTrigger, plus docs and tests.
  • Wires the new provider into the monorepo workspace/extras/CI compose mounts.
  • Adds retry wrappers for uv installs/syncs to mitigate transient cargo/maturin build failures.

Reviewed changes

Copilot reviewed 58 out of 69 changed files in this pull request and generated no comments.

Show a summary per file
File Description
scripts/in_container/install_airflow_and_providers.py Adds a retry wrapper around uv pip install during in-container installs.
scripts/docker/install_airflow_when_building_images.sh Wraps uv commands with cargo/maturin retry logic during image builds.
scripts/ci/docker-compose/tests-sources.yml Mounts Vespa provider tests into the CI test container.
scripts/ci/docker-compose/remove-sources.yml Ensures Vespa provider sources are removed/masked when needed.
pyproject.toml Registers vespa extra, adds workspace member/dependency, and updates mypy paths.
providers/vespa/src/airflow/providers/vespa/hooks/vespa.py Implements VespaHook with connection parsing & feed helper.
providers/vespa/src/airflow/providers/vespa/operators/vespa_ingest.py Adds deferrable ingest operator delegating work to a trigger.
providers/vespa/src/airflow/providers/vespa/triggers/vespa_feed_trigger.py Adds trigger to run feed/update/delete off the worker process.
providers/vespa/src/airflow/providers/vespa/get_provider_info.py Declares provider metadata (hook/operator/trigger/connection fields).
providers/vespa/src/airflow/providers/vespa/__init__.py Defines provider version and Airflow minimum version guard.
providers/vespa/src/airflow/providers/vespa/hooks/__init__.py Initializes hooks package.
providers/vespa/src/airflow/providers/vespa/operators/__init__.py Initializes operators package.
providers/vespa/src/airflow/providers/vespa/triggers/__init__.py Initializes triggers package.
providers/vespa/src/airflow/__init__.py Namespace package marker for provider build layout.
providers/vespa/src/airflow/providers/__init__.py Namespace package marker for provider build layout.
providers/vespa/pyproject.toml Defines the provider distribution and dependencies (incl. pyvespa).
providers/vespa/provider.yaml Adds provider manifest incl. connection schema & UI behavior.
providers/vespa/README.rst Adds generated provider README content.
providers/vespa/docs/index.rst Adds provider documentation index & generated dependency table.
providers/vespa/docs/connections.rst Documents Vespa connection fields and defaults.
providers/vespa/docs/operators/vespa.rst Documents VespaIngestOperator usage and example include.
providers/vespa/docs/security.rst Includes standard provider security documentation.
providers/vespa/docs/installing-providers-from-sources.rst Includes standard “install from sources” docs.
providers/vespa/docs/changelog.rst Adds initial provider changelog stub.
providers/vespa/docs/commits.rst Adds commits stub for release-time generation.
providers/vespa/docs/conf.py Adds Sphinx config for provider doc build.
providers/vespa/tests/conftest.py Registers common pytest plugin for provider tests.
providers/vespa/tests/system/vespa/example_dag_vespa.py Adds system test example DAG for Vespa ingest usage.
providers/vespa/tests/system/__init__.py Namespace package marker for system tests.
providers/vespa/tests/system/vespa/__init__.py Initializes Vespa system tests package.
providers/vespa/tests/unit/__init__.py Namespace package marker for unit tests.
providers/vespa/tests/unit/vespa/__init__.py Initializes unit test package.
providers/vespa/tests/unit/vespa/hooks/__init__.py Initializes hook unit tests package.
providers/vespa/tests/unit/vespa/operators/__init__.py Initializes operator unit tests package.
providers/vespa/tests/unit/vespa/operators/test_vespa_ingest.py Adds unit tests for VespaIngestOperator.
providers/vespa/tests/unit/vespa/triggers/__init__.py Initializes trigger unit tests package.
providers/vespa/tests/unit/vespa/triggers/test_vespa_feed_trigger.py Adds unit tests for VespaFeedTrigger.
providers/vespa/LICENSE Adds Apache 2.0 license file for provider package.
providers/vespa/NOTICE Adds provider notice file.
providers/vespa/.gitignore Adds basic provider-local gitignore.
docs/spelling_wordlist.txt Adds “Vespa/vespa” and “pyvespa” to spelling whitelist.
airflow-core/docs/extra-packages-ref.rst Documents new apache-airflow[vespa] extra.
Dockerfile.ci Inlines the same uv cargo retry wrapper for CI image build install steps.
Dockerfile Inlines the same uv cargo retry wrapper for production image build install steps.
dev/breeze/doc/images/output_workflow-run_publish-docs.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_sbom_generate-providers-requirements.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_sbom_generate-providers-requirements.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_publish-docs.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_publish-docs.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_prepare-provider-documentation.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_prepare-provider-documentation.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_prepare-provider-distributions.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_prepare-provider-distributions.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_generate-providers-metadata.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_generate-providers-metadata.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_generate-issue-content-providers.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_generate-issue-content-providers.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_release-management_add-back-references.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_release-management_add-back-references.svg Updates Breeze docs output to include vespa in provider lists.
dev/breeze/doc/images/output_pr_auto-triage.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_pr_auto-triage.svg Updates Breeze docs output to include provider:vespa label.
dev/breeze/doc/images/output_build-docs.txt Updates generated Breeze docs artifact checksum.
dev/breeze/doc/images/output_build-docs.svg Updates Breeze docs output to include vespa in provider lists.
.github/boring-cyborg.yml Adds provider:vespa label mapping for PR auto-labeling.
.github/ISSUE_TEMPLATE/3-airflow_providers_bug_report.yml Adds vespa as a selectable provider in bug reports.
.github/CODEOWNERS Adds codeowners entry for /providers/vespa/.
Comments suppressed due to low confidence (5)

providers/vespa/src/airflow/providers/vespa/hooks/vespa.py:1

  • url is initially rstrip("/"), but that normalization is undone when host includes a scheme (url = host). If a user configures host="https://vespa.example/" and also sets port, this can produce an invalid URL like https://vespa.example/:8080. Consider normalizing (strip trailing /) after the scheme check, and preferably using urllib.parse.urlparse to safely add/override the port only when the URL does not already specify one.

providers/vespa/src/airflow/providers/vespa/hooks/vespa.py:1

  • The _normalise() docstring says missing id on feed operations is auto-generated, but for Vespa-native bodies containing "fields" you currently append as-is even when "id" is missing. This likely violates Vespa’s requirement for an id and contradicts the docstring. Please either (a) auto-generate an id for operation_type=="feed" in this branch, or (b) raise a clear ValueError for missing id so behavior is consistent across both accepted input formats.

scripts/in_container/install_airflow_and_providers.py:1

  • subprocess.CompletedProcess.stdout can be either bytes or str depending on how run_command() invokes subprocess (text mode vs binary). Unconditionally calling .decode() will throw AttributeError if stdout is already a str. Consider handling both cases (e.g., decode only when isinstance(stdout, (bytes, bytearray))) to keep the retry helper robust.

scripts/docker/install_airflow_when_building_images.sh:1

  • This retry function is duplicated in multiple places (this script and the inlined copies in Dockerfile and Dockerfile.ci). That duplication increases the risk of the copies drifting (bugfixes/parameter tweaks applied in one place but not the others). Consider centralizing the function in a single sourced script (e.g., scripts/docker/common.sh) and having the Dockerfiles embed/source that single canonical definition, or generating the Dockerfile here-doc content from the same source to keep behavior consistent.

providers/vespa/src/airflow/providers/vespa/operators/vespa_ingest.py:1

  • The failure message uses len(event["errors"]) as “document(s) failed”, but event["errors"] is an error list (and can also be a single trigger-level exception entry), so the count may not reflect failed documents. Consider rewording to “error(s)” (or including both event.get("sent") and the error count) and formatting details to avoid dumping large structures directly into the exception string.
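
The URL normalization suggested in the first suppressed comment could be sketched as follows. This is a hypothetical helper and not the hook's actual code: the name `build_endpoint`, its signature, and the default `http` scheme are assumptions made for illustration. It uses `urllib.parse.urlsplit` so that the port is appended to the network location rather than to the path, and only when the URL does not already carry one.

```python
from urllib.parse import urlsplit, urlunsplit


def build_endpoint(host, port=None, default_scheme="http"):
    """Combine host and an optional port into a URL without a trailing slash.

    Hypothetical helper: the name, signature, and default scheme are assumptions.
    """
    raw = host.strip()
    if "://" not in raw:
        # Bare host names need a scheme before urlsplit can find the netloc.
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    netloc = parts.netloc
    # Add the configured port only when the URL does not already specify one.
    if port is not None and parts.port is None:
        netloc = f"{netloc}:{port}"
    # Strip the trailing slash after the scheme check, so it applies to every input form.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(build_endpoint("https://vespa.example/", port=8080))
```

With the input from the review comment, the port lands after the host instead of after the slash, and a port already present in the URL is left untouched.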

@choo121600 choo121600 marked this pull request as draft April 3, 2026 12:43
@choo121600
Member

@radu-gheorghe This PR has been converted to draft because it does not yet meet our Pull Request quality criteria.

Issues found:

  • Merge conflicts: This PR has merge conflicts with the main branch. Your branch is 32 commits behind main. Please rebase your branch (git fetch origin && git rebase origin/main), resolve the conflicts, and push again. See contributing quick start.
  • Other failing CI checks: Failing: CI image checks / Publish documentation and validate versions. Run prek run --from-ref main locally to reproduce. See static checks docs.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates.
  • Maintainers will then proceed with a normal review.

Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.

@potiuk
Member

potiuk commented Apr 4, 2026

FYI @choo121600: I am working with @radu-gheorghe on this, as adding the Vespa provider triggers an obscure rustup bug with parallel installations (#64588) :)

@potiuk potiuk force-pushed the vespa-provider-pr branch from b263e8b to 03ba1bc Compare April 4, 2026 18:05
@radu-gheorghe radu-gheorghe marked this pull request as ready for review April 7, 2026 13:32
@potiuk potiuk merged commit bc26d6b into apache:main Apr 8, 2026
293 checks passed
@radu-gheorghe radu-gheorghe deleted the vespa-provider-pr branch April 9, 2026 05:31
Comment thread providers/sftp/pyproject.toml
Comment thread providers/ssh/pyproject.toml
6 participants