Add Vespa provider #63988
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
@radu-gheorghe This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria. Issues found:
What to do next:
There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
Sorry for the missing pieces. I'm setting it as "Ready for review" for now to see if the CI agrees that it's indeed ready :)
aec4169 to 2f34e23
7336a13 to fce62d8
ac4828d to 2d98a79
594f1c4 to f45f68f
f45f68f to a82bec5
a82bec5 to e91b923
e91b923 to 330a45d
8b3b20d to b263e8b
Pull request overview
Adds a new vespa provider package to Apache Airflow (hook + deferrable ingest operator/trigger) and updates build/install tooling to better handle transient uv build failures.
Changes:
- Introduces `apache-airflow-providers-vespa` with `VespaHook`, `VespaIngestOperator`, and `VespaFeedTrigger`, plus docs and tests.
- Wires the new provider into the monorepo workspace/extras/CI compose mounts.
- Adds retry wrappers for `uv` installs/syncs to mitigate transient cargo/maturin build failures.
Reviewed changes
Copilot reviewed 58 out of 69 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| scripts/in_container/install_airflow_and_providers.py | Adds a retry wrapper around uv pip install during in-container installs. |
| scripts/docker/install_airflow_when_building_images.sh | Wraps uv commands with cargo/maturin retry logic during image builds. |
| scripts/ci/docker-compose/tests-sources.yml | Mounts Vespa provider tests into the CI test container. |
| scripts/ci/docker-compose/remove-sources.yml | Ensures Vespa provider sources are removed/masked when needed. |
| pyproject.toml | Registers vespa extra, adds workspace member/dependency, and updates mypy paths. |
| providers/vespa/src/airflow/providers/vespa/hooks/vespa.py | Implements VespaHook with connection parsing & feed helper. |
| providers/vespa/src/airflow/providers/vespa/operators/vespa_ingest.py | Adds deferrable ingest operator delegating work to a trigger. |
| providers/vespa/src/airflow/providers/vespa/triggers/vespa_feed_trigger.py | Adds trigger to run feed/update/delete off the worker process. |
| providers/vespa/src/airflow/providers/vespa/get_provider_info.py | Declares provider metadata (hook/operator/trigger/connection fields). |
| providers/vespa/src/airflow/providers/vespa/__init__.py | Defines provider version and Airflow minimum version guard. |
| providers/vespa/src/airflow/providers/vespa/hooks/__init__.py | Initializes hooks package. |
| providers/vespa/src/airflow/providers/vespa/operators/__init__.py | Initializes operators package. |
| providers/vespa/src/airflow/providers/vespa/triggers/__init__.py | Initializes triggers package. |
| providers/vespa/src/airflow/__init__.py | Namespace package marker for provider build layout. |
| providers/vespa/src/airflow/providers/__init__.py | Namespace package marker for provider build layout. |
| providers/vespa/pyproject.toml | Defines the provider distribution and dependencies (incl. pyvespa). |
| providers/vespa/provider.yaml | Adds provider manifest incl. connection schema & UI behavior. |
| providers/vespa/README.rst | Adds generated provider README content. |
| providers/vespa/docs/index.rst | Adds provider documentation index & generated dependency table. |
| providers/vespa/docs/connections.rst | Documents Vespa connection fields and defaults. |
| providers/vespa/docs/operators/vespa.rst | Documents VespaIngestOperator usage and example include. |
| providers/vespa/docs/security.rst | Includes standard provider security documentation. |
| providers/vespa/docs/installing-providers-from-sources.rst | Includes standard “install from sources” docs. |
| providers/vespa/docs/changelog.rst | Adds initial provider changelog stub. |
| providers/vespa/docs/commits.rst | Adds commits stub for release-time generation. |
| providers/vespa/docs/conf.py | Adds Sphinx config for provider doc build. |
| providers/vespa/tests/conftest.py | Registers common pytest plugin for provider tests. |
| providers/vespa/tests/system/vespa/example_dag_vespa.py | Adds system test example DAG for Vespa ingest usage. |
| providers/vespa/tests/system/__init__.py | Namespace package marker for system tests. |
| providers/vespa/tests/system/vespa/__init__.py | Initializes Vespa system tests package. |
| providers/vespa/tests/unit/__init__.py | Namespace package marker for unit tests. |
| providers/vespa/tests/unit/vespa/__init__.py | Initializes unit test package. |
| providers/vespa/tests/unit/vespa/hooks/__init__.py | Initializes hook unit tests package. |
| providers/vespa/tests/unit/vespa/operators/__init__.py | Initializes operator unit tests package. |
| providers/vespa/tests/unit/vespa/operators/test_vespa_ingest.py | Adds unit tests for VespaIngestOperator. |
| providers/vespa/tests/unit/vespa/triggers/__init__.py | Initializes trigger unit tests package. |
| providers/vespa/tests/unit/vespa/triggers/test_vespa_feed_trigger.py | Adds unit tests for VespaFeedTrigger. |
| providers/vespa/LICENSE | Adds Apache 2.0 license file for provider package. |
| providers/vespa/NOTICE | Adds provider notice file. |
| providers/vespa/.gitignore | Adds basic provider-local gitignore. |
| docs/spelling_wordlist.txt | Adds “Vespa/vespa” and “pyvespa” to spelling whitelist. |
| airflow-core/docs/extra-packages-ref.rst | Documents new apache-airflow[vespa] extra. |
| Dockerfile.ci | Inlines the same uv cargo retry wrapper for CI image build install steps. |
| Dockerfile | Inlines the same uv cargo retry wrapper for production image build install steps. |
| dev/breeze/doc/images/output_workflow-run_publish-docs.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_sbom_generate-providers-requirements.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_sbom_generate-providers-requirements.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_publish-docs.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_publish-docs.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_prepare-provider-documentation.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_prepare-provider-documentation.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_prepare-provider-distributions.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_prepare-provider-distributions.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_generate-providers-metadata.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_generate-providers-metadata.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_generate-issue-content-providers.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_generate-issue-content-providers.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_release-management_add-back-references.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_release-management_add-back-references.svg | Updates Breeze docs output to include vespa in provider lists. |
| dev/breeze/doc/images/output_pr_auto-triage.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_pr_auto-triage.svg | Updates Breeze docs output to include provider:vespa label. |
| dev/breeze/doc/images/output_build-docs.txt | Updates generated Breeze docs artifact checksum. |
| dev/breeze/doc/images/output_build-docs.svg | Updates Breeze docs output to include vespa in provider lists. |
| .github/boring-cyborg.yml | Adds provider:vespa label mapping for PR auto-labeling. |
| .github/ISSUE_TEMPLATE/3-airflow_providers_bug_report.yml | Adds vespa as a selectable provider in bug reports. |
| .github/CODEOWNERS | Adds codeowners entry for /providers/vespa/. |
Comments suppressed due to low confidence (5)
providers/vespa/src/airflow/providers/vespa/hooks/vespa.py:1
`url` is initially `rstrip("/")`, but that normalization is undone when `host` includes a scheme (`url = host`). If a user configures `host="https://vespa.example/"` and also sets `port`, this can produce an invalid URL like `https://vespa.example/:8080`. Consider normalizing (strip trailing `/`) after the scheme check, and preferably using `urllib.parse.urlparse` to safely add/override the port only when the URL does not already specify one.
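For illustration, the normalization suggested in this comment could look roughly like the sketch below. `build_endpoint`, its parameters, and the `http` default scheme are hypothetical and are not taken from the hook's actual code; the sketch uses `urllib.parse.urlsplit`, a close relative of `urlparse`, and only adds the port when the URL does not already carry one.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def build_endpoint(host: str, port: int | None = None, default_scheme: str = "http") -> str:
    """Hypothetical helper: combine a connection host and port into a base URL."""
    raw = host.strip()
    # Assumption: a host without a scheme gets a default one.
    if "://" not in raw:
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    netloc = parts.netloc
    # Append the port only when the URL does not already specify one.
    if port is not None and parts.port is None:
        netloc = f"{netloc}:{port}"
    # Strip the trailing slash after the scheme handling, not before it.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(build_endpoint("https://vespa.example/", 8080))
```

With this shape, `host="https://vespa.example/"` plus `port=8080` yields `https://vespa.example:8080` rather than `https://vespa.example/:8080`.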
providers/vespa/src/airflow/providers/vespa/hooks/vespa.py:1
The `_normalise()` docstring says missing `id` on `feed` operations is auto-generated, but for Vespa-native bodies containing `"fields"` you currently append as-is even when `"id"` is missing. This likely violates Vespa’s requirement for an `id` and contradicts the docstring. Please either (a) auto-generate an `id` for `operation_type=="feed"` in this branch, or (b) raise a clear `ValueError` for missing `id` so behavior is consistent across both accepted input formats.
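A minimal sketch of how one rule for missing `id` values could cover both input formats. `normalise_docs` is a hypothetical stand-in for `_normalise()`; the shape of the "flat" input format and the choice to combine options (a) and (b) (auto-generate for `feed`, raise for anything else) are assumptions made for the example, not the PR's behavior.

```python
from __future__ import annotations

import uuid
from typing import Any


def normalise_docs(docs: list[dict[str, Any]], operation_type: str = "feed") -> list[dict[str, Any]]:
    """Hypothetical normaliser: return bodies shaped like {"id": ..., "fields": {...}}."""
    normalised = []
    for doc in docs:
        if "fields" in doc:
            # Vespa-native body: copy it so the caller's dict is not mutated.
            body = dict(doc)
        else:
            # Assumed flat format: every key except "id" becomes a field.
            body = {"fields": {k: v for k, v in doc.items() if k != "id"}}
            if "id" in doc:
                body["id"] = doc["id"]
        # One rule for both branches, applied after the format handling.
        if "id" not in body:
            if operation_type == "feed":
                body["id"] = uuid.uuid4().hex  # option (a): auto-generate
            else:
                # option (b): fail loudly when there is no id to target
                raise ValueError(f"Document is missing 'id' for operation '{operation_type}'")
        normalised.append(body)
    return normalised
```

Applying the check after both branches have produced a body is what keeps the two formats consistent with each other.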
scripts/in_container/install_airflow_and_providers.py:1
`subprocess.CompletedProcess.stdout` can be either `bytes` or `str` depending on how `run_command()` invokes subprocess (text mode vs binary). Unconditionally calling `.decode()` will throw `AttributeError` if `stdout` is already a `str`. Consider handling both cases (e.g., decode only when `isinstance(stdout, (bytes, bytearray))`) to keep the retry helper robust.
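The defensive decode described in this comment might be sketched as follows. `stdout_as_text` is a hypothetical helper name, and treating a missing `stdout` as an empty string is an assumption for the example.

```python
from __future__ import annotations


def stdout_as_text(stdout: bytes | bytearray | str | None) -> str:
    """Hypothetical helper: return captured stdout as str in text or binary mode."""
    if stdout is None:
        # Assumption: nothing captured is treated as empty output.
        return ""
    if isinstance(stdout, (bytes, bytearray)):
        # Binary mode: decode, replacing undecodable bytes instead of raising.
        return stdout.decode("utf-8", errors="replace")
    # Text mode: already a str, so there is nothing to decode.
    return stdout
```

A retry helper could then search the returned text for its failure markers without caring how the subprocess was invoked.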
scripts/docker/install_airflow_when_building_images.sh:1
This retry function is duplicated in multiple places (this script and the inlined copies in `Dockerfile` and `Dockerfile.ci`). That duplication increases the risk of the copies drifting (bugfixes/parameter tweaks applied in one place but not the others). Consider centralizing the function in a single sourced script (e.g., `scripts/docker/common.sh`) and having the Dockerfiles embed/source that single canonical definition, or generating the Dockerfile here-doc content from the same source to keep behavior consistent.
providers/vespa/src/airflow/providers/vespa/operators/vespa_ingest.py:1
The failure message uses `len(event["errors"])` as “document(s) failed”, but `event["errors"]` is an error list (and can also be a single trigger-level exception entry), so the count may not reflect failed documents. Consider rewording to “error(s)” (or including both `event.get("sent")` and the error count) and formatting details to avoid dumping large structures directly into the exception string.
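One possible shape for such a message, as a sketch only. `format_failure`, the truncation limits, and the exact wording are hypothetical; the only things taken from the comment are the `errors` and `sent` keys on the trigger event.

```python
from __future__ import annotations

from typing import Any


def format_failure(event: dict[str, Any], max_shown: int = 3, max_chars: int = 200) -> str:
    """Hypothetical formatter: summarise a trigger event without dumping it whole."""
    errors = event.get("errors") or []
    sent = event.get("sent")
    # Count errors, not documents: one entry may be a trigger-level exception.
    message = f"Vespa ingest reported {len(errors)} error(s)"
    if sent is not None:
        message += f" (sent={sent})"
    # Show only the first few errors, each truncated to a bounded length.
    shown = [str(error)[:max_chars] for error in errors[:max_shown]]
    if shown:
        message += ": " + "; ".join(shown)
    remaining = len(errors) - len(shown)
    if remaining > 0:
        message += f" (+{remaining} more)"
    return message
```

Bounding both the number of errors shown and the length of each keeps the exception string readable in task logs even when a large batch fails.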
@radu-gheorghe This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack. |
FYI @choo121600: I am working with @radu-gheorghe on this, as adding the Vespa provider triggers an obscure rustup bug with parallel installations (#64588) :)
b263e8b to 03ba1bc
03ba1bc to 594927b
382450c to 5506c8d
e1b645c to f7caef3
Add Vespa.ai provider for Apache Airflow. Includes:
Was generative AI tooling used to co-author this PR?
Generated-by: a mix of tools, mostly Cursor, Codex, Claude, Grok. I wouldn't say "generated by", because I made sure I was on top of things. But assisted for sure.
I will follow up with a PROPOSAL message on the dev mailing list.