
feat: add vLLM Chat Generator#3099

Merged
anakin87 merged 26 commits into main from vllm-chat-generator on Apr 10, 2026

Conversation

@anakin87 anakin87 commented Apr 3, 2026

Related Issues

Proposed Changes:

  • Add vllm-haystack integration scaffolding
  • Implement a vLLM Chat Generator: similar to OpenAIChatGenerator, but with dedicated handling for reasoning content

How did you test it?

CI: new unit tests and integration tests (using Qwen/Qwen3-0.6B)

Notes for the reviewer

I initially inherited from OpenAIChatGenerator, but then realized I was overriding most methods. Plus, vLLM is a bit simpler: no tools_strict handling, no separate endpoint for structured generation... So I ended up building a simple standalone component.
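As a rough illustration of why a standalone component can stay small here: vLLM serves the standard OpenAI-compatible chat-completions endpoint, so the core of such a generator is mostly assembling the request kwargs. The sketch below uses hypothetical helper and parameter names, not the PR's actual code:

```python
from typing import Any, Optional


def build_chat_kwargs(
    model: str,
    messages: list[dict[str, str]],
    generation_kwargs: Optional[dict[str, Any]] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble kwargs for an OpenAI-compatible chat.completions.create call.

    Since vLLM exposes the standard /v1/chat/completions endpoint, there is
    no tools_strict handling and no separate endpoint for structured
    generation to special-case.
    """
    kwargs: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if generation_kwargs:
        kwargs.update(generation_kwargs)
    return kwargs


payload = build_chat_kwargs(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    generation_kwargs={"temperature": 0.2},
)
# With a running vLLM server, this payload would be passed to something like
# OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY").chat.completions.create(**payload)
```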

Checklist

@github-actions github-actions bot added the topic:CI and type:documentation labels Apr 3, 2026
github-actions bot commented Apr 3, 2026

Coverage report (vllm)

File: integrations/vllm/src/haystack_integrations/components/generators/vllm/chat/chat_generator.py
Lines missing coverage: 55, 262, 269, 271, 335, 344, 434-436

This report was generated by python-coverage-comment-action

@anakin87 anakin87 changed the title from "Vllm chat generator" to "feat: add vLLM Chat Generator" Apr 3, 2026
@anakin87 anakin87 marked this pull request as ready for review April 3, 2026 10:24
@anakin87 anakin87 requested a review from a team as a code owner April 3, 2026 10:24
@anakin87 anakin87 requested review from davidsbatista and removed request for a team April 3, 2026 10:24
@anakin87 anakin87 requested review from sjrl and removed request for davidsbatista April 9, 2026 07:09
sjrl commented Apr 9, 2026

Oh interesting, vLLM also supports the Responses API (docs). Obviously too much for this PR, but it could be good to add as a follow-up issue.

streaming_chunk = StreamingChunk(
    content="",
    reasoning=ReasoningContent(reasoning_text=reasoning_text),
    index=0,
sjrl (Contributor):
Just to double-check: is this always set to 0 because, when there is reasoning, it is always the first content block returned by the API?

anakin87 (Member, Author):

Yes, but because vLLM does not use separate content blocks: reasoning and content are fields on the same delta. See https://docs.vllm.ai/en/latest/features/reasoning_outputs/#streaming-chat-completions

sjrl (Contributor):

Hmm, I see the docs you are pointing to, but I can't find where it says definitively that reasoning and content chunks are sent with the same index number. Is this what you've observed in practice?

anakin87 (Member, Author):

An example:

--- Chunk 0 ---
  choice[0].index:         0
  choice[0].delta.role:     'assistant'
  choice[0].delta.content:  ''
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 1 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: '\n'
  choice[0].finish_reason:  None

--- Chunk 2 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: 'Okay'
  choice[0].finish_reason:  None

--- Chunk 3 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ','
  choice[0].finish_reason:  None

--- Chunk 4 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' the'
  choice[0].finish_reason:  None

--- Chunk 5 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' user'
  choice[0].finish_reason:  None

--- Chunk 6 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' is'
  choice[0].finish_reason:  None

--- Chunk 7 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' asking'
  choice[0].finish_reason:  None

--- Chunk 8 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' for'
  choice[0].finish_reason:  None

--- Chunk 9 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' the'
  choice[0].finish_reason:  None

...

--- Chunk 94 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: '.\n'
  choice[0].finish_reason:  None

--- Chunk 95 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  '\n\n'
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 96 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  'Paris'
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 97 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  'stop'
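For illustration, the per-delta handling described in the trace above can be sketched as a small pure function. The dataclasses below are hypothetical stand-ins for Haystack's StreamingChunk/ReasoningContent, not the PR's actual code; the reasoning field is printed as `delta.reasoning` in the trace, though some vLLM versions expose it as `reasoning_content`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningContent:
    reasoning_text: str


@dataclass
class StreamingChunk:
    content: str
    index: int = 0
    reasoning: Optional[ReasoningContent] = None


def delta_to_chunk(delta: dict) -> Optional[StreamingChunk]:
    """Map one vLLM streaming delta to a StreamingChunk.

    vLLM puts reasoning and content on the same delta object of a single
    choice (index 0), so reasoning never arrives as a separate content
    block the way it can with other APIs.
    """
    reasoning_text = delta.get("reasoning")  # field name as printed in the trace above
    content = delta.get("content")
    if reasoning_text is not None:
        return StreamingChunk(content="", index=0, reasoning=ReasoningContent(reasoning_text))
    if content is not None:
        return StreamingChunk(content=content, index=0)
    return None  # role-only or finish deltas carry no text


# Replaying a few deltas from the trace above:
deltas = [
    {"role": "assistant", "content": ""},
    {"reasoning": "Okay"},
    {"reasoning": ","},
    {"content": "Paris"},
]
chunks = [c for d in deltas if (c := delta_to_chunk(d)) is not None]
```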

@anakin87 anakin87 marked this pull request as draft April 9, 2026 15:50

anakin87 commented Apr 9, 2026

There are issues running the model on CPU. Converting this into a draft; I may revert to the previous model... I'll investigate.

See #3099 (comment)

anakin87 (Member, Author):

@sjrl this should be in a better shape now

@anakin87 anakin87 marked this pull request as ready for review April 10, 2026 09:28
anakin87 (Member, Author):

In 8ca2088 I added better streaming handling (hopefully)

@sjrl sjrl left a comment


Thanks, looks good!

@anakin87 anakin87 merged commit 0172236 into main Apr 10, 2026
11 checks passed
@anakin87 anakin87 deleted the vllm-chat-generator branch April 10, 2026 10:34

Labels

topic:CI, type:documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add vLLM Chat Generator integration to support vLLM specific features like reasoning_content

2 participants