feat: add DoclingServeConverter integration#3173

Merged
julian-risch merged 4 commits into
deepset-ai:mainfrom
SyedShahmeerAli12:feat/docling-serve-integration
May 8, 2026
@SyedShahmeerAli12 SyedShahmeerAli12 commented Apr 16, 2026

Summary

Adds a DoclingServeConverter component that converts documents using a running docling-serve HTTP server, without any heavy ML dependencies (no PyTorch required).

  • Accepts URLs, local file paths, and ByteStream sources
  • Supports MARKDOWN, TEXT, and JSON export formats
  • Optional API key authentication via Haystack Secret
  • Both synchronous (run) and asynchronous (arun) execution
  • 27 unit tests, all passing

Part of #2960

Test plan

  • 27 unit tests passing
  • Lint clean (ruff check, ruff format)
  • Integration test requires a running docling-serve instance (pytest -m integration)

Adds a new `docling-serve-haystack` integration with a `DoclingServeConverter`
component that converts documents via a remote DoclingServe HTTP server instead
of loading heavy ML dependencies locally (no PyTorch required).

- Supports URLs, local file paths, and ByteStream sources
- Export formats: Markdown (default), plain text, JSON
- Both sync `run()` and async `arun()` methods
- Configurable conversion options, timeout, and optional API key auth
- Full unit test suite (mocked httpx) + integration test markers
- CI workflow, labeler, coverage comment, and root README table entry

Closes deepset-ai#2960

Adds a new DoclingServeConverter component that converts documents
by sending them to a running docling-serve HTTP server. Supports
local files, URLs, and ByteStreams; markdown, text, and JSON export
formats; optional API key authentication; and both sync (run) and
async (arun) execution.

Closes deepset-ai#2960
@SyedShahmeerAli12 SyedShahmeerAli12 requested a review from a team as a code owner April 16, 2026 20:12
@SyedShahmeerAli12 SyedShahmeerAli12 requested review from julian-risch and removed request for a team April 16, 2026 20:12
@github-actions github-actions Bot added the topic:CI and type:documentation labels Apr 16, 2026
SyedShahmeerAli12 commented Apr 16, 2026

Hi @julian-risch, this implements the DoclingServeConverter as described in #2960.

Key design decisions:

  • Used httpx instead of requests for native async support (arun())
  • api_key uses Haystack Secret class for secure serialization
  • convert_options is a single dict instead of individual params, which is cleaner and forward-compatible with new
    docling-serve options
  • Sources are base64-encoded and sent as JSON to /v1/convert/source (avoids multipart complexity)
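
The last two decisions can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `build_source` and `build_payload` are hypothetical, raw `bytes` stand in for a Haystack ByteStream, and the JSON field names (`kind`, `base64_string`, `filename`, `url`, `to_formats`) are assumptions about the docling-serve request schema that should be checked against the server's API documentation.

```python
import base64
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

# Assumed default: the PR states Markdown is the default export format.
DEFAULT_OPTIONS: Dict[str, Any] = {"to_formats": ["md"]}


def build_source(source: Union[str, Path, bytes], filename: str = "document") -> Dict[str, Any]:
    """Describe one source as a JSON-serializable dict (hypothetical helper)."""
    # URLs are passed through so the server can fetch them itself.
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return {"kind": "http", "url": source}
    if isinstance(source, bytes):
        data, name = source, filename
    else:
        path = Path(source)
        data, name = path.read_bytes(), path.name
    # Binary content is base64-encoded so it fits in a plain JSON body,
    # which avoids building a multipart/form-data request.
    encoded = base64.b64encode(data).decode("ascii")
    return {"kind": "file", "base64_string": encoded, "filename": name}


def build_payload(
    sources: List[Union[str, Path, bytes]],
    convert_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge user options over the defaults and attach the encoded sources."""
    # A single dict means unknown or newly added server options pass through untouched.
    options = {**DEFAULT_OPTIONS, **(convert_options or {})}
    return {"options": options, "sources": [build_source(s) for s in sources]}


payload = build_payload(
    [b"%PDF-1.4 example bytes", "https://example.com/report.pdf"],
    convert_options={"do_ocr": False},  # option name is illustrative only
)
body = json.dumps(payload)  # this string would be the POST body for /v1/convert/source
```

Because the options are merged as a plain dict, a new server-side option needs no change to the component's signature; the trade-off is that misspelled option names are not caught on the client side.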

SyedShahmeerAli12 commented May 6, 2026

Merge conflicts are resolved and the branch is now up to date with main. Ready for review.

- Add `ghcr.io/docling-project/docling-serve` service to the workflow
- Add `Run integration tests` and `Store combined coverage` steps
- Add push-to-main trigger so integration tests run on merge
- Add `integration-cov-append-retry` hatch script to pyproject.toml
- Update README combined coverage badge from N/A to actual badge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julian-risch julian-risch left a comment


Thank you for opening this pull request, @SyedShahmeerAli12. I have extended it by adding integration tests to the CI workflow, and I can confirm that the integration tests also pass locally for me.

@julian-risch julian-risch merged commit 7e78b98 into deepset-ai:main May 8, 2026
13 checks passed