
feat: add vLLM Chat Generator#3099

Merged
anakin87 merged 26 commits into main from vllm-chat-generator on Apr 10, 2026

Conversation

@anakin87 anakin87 commented Apr 3, 2026

Related Issues

Proposed Changes:

  • Add vllm-haystack integration scaffolding
  • Implement a vLLM Chat Generator: similar to OpenAIChatGenerator, but with dedicated handling for reasoning content

How did you test it?

CI: new unit tests and integration tests (using Qwen/Qwen3-0.6B)

Notes for the reviewer

I initially inherited from OpenAIChatGenerator, but then realized I was overriding most methods. Plus, vLLM is a bit simpler: no tools_strict handling, no separate endpoint for structured generation... So I ended up building a simple standalone component.
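As a rough illustration of why a standalone component can stay small here: vLLM serves the standard OpenAI-compatible chat-completions endpoint, so the core of such a generator is mostly assembling the request kwargs. The sketch below uses hypothetical helper and parameter names, not the PR's actual code:

```python
from typing import Any, Optional


def build_chat_kwargs(
    model: str,
    messages: list[dict[str, str]],
    generation_kwargs: Optional[dict[str, Any]] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble kwargs for an OpenAI-compatible chat.completions.create call.

    Since vLLM exposes the standard /v1/chat/completions endpoint, there is
    no tools_strict handling and no separate endpoint for structured
    generation to special-case.
    """
    kwargs: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if generation_kwargs:
        kwargs.update(generation_kwargs)
    return kwargs


payload = build_chat_kwargs(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    generation_kwargs={"temperature": 0.2},
)
# With a running vLLM server, this payload would be passed to something like
# OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY").chat.completions.create(**payload)
```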

Checklist

@github-actions github-actions bot added the topic:CI and type:documentation labels Apr 3, 2026
github-actions bot commented Apr 3, 2026

Coverage report (vllm)

File: integrations/vllm/src/haystack_integrations/components/generators/vllm/chat/chat_generator.py
Lines missing coverage: 55, 262, 269, 271, 335, 344, 434-436

This report was generated by python-coverage-comment-action

@anakin87 anakin87 changed the title from "Vllm chat generator" to "feat: add vLLM Chat Generator" Apr 3, 2026
@anakin87 anakin87 marked this pull request as ready for review April 3, 2026 10:24
@anakin87 anakin87 requested a review from a team as a code owner April 3, 2026 10:24
@anakin87 anakin87 requested review from davidsbatista and removed request for a team April 3, 2026 10:24
@anakin87 anakin87 requested review from sjrl and removed request for davidsbatista April 9, 2026 07:09
sjrl commented Apr 9, 2026

Oh interesting, vLLM also supports the Responses API (docs). Obviously too much for this PR, but it could be good to add as a follow-up issue.

streaming_chunk = StreamingChunk(
    content="",
    reasoning=ReasoningContent(reasoning_text=reasoning_text),
    index=0,
sjrl (Contributor):
Just to double-check: is this always set to 0 because, when there is reasoning, it is always the first content block returned by the API?

anakin87 (Member, Author):

Yes, but because vLLM does not use separate content blocks: reasoning and content are fields on the same delta. See https://docs.vllm.ai/en/latest/features/reasoning_outputs/#streaming-chat-completions

sjrl (Contributor):

Hmm, I see the docs you are pointing to, but I can't find where it says definitively that reasoning and content chunks are sent with the same index number. Is this what you've observed in practice?

anakin87 (Member, Author):

An example:

--- Chunk 0 ---
  choice[0].index:         0
  choice[0].delta.role:     'assistant'
  choice[0].delta.content:  ''
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 1 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: '\n'
  choice[0].finish_reason:  None

--- Chunk 2 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: 'Okay'
  choice[0].finish_reason:  None

--- Chunk 3 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ','
  choice[0].finish_reason:  None

--- Chunk 4 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' the'
  choice[0].finish_reason:  None

--- Chunk 5 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' user'
  choice[0].finish_reason:  None

--- Chunk 6 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' is'
  choice[0].finish_reason:  None

--- Chunk 7 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' asking'
  choice[0].finish_reason:  None

--- Chunk 8 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' for'
  choice[0].finish_reason:  None

--- Chunk 9 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: ' the'
  choice[0].finish_reason:  None

...

--- Chunk 94 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: '.\n'
  choice[0].finish_reason:  None

--- Chunk 95 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  '\n\n'
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 96 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  'Paris'
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  None

--- Chunk 97 ---
  choice[0].index:         0
  choice[0].delta.role:     None
  choice[0].delta.content:  None
  choice[0].delta.reasoning: 'N/A'
  choice[0].finish_reason:  'stop'
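For illustration, the per-delta handling described in the trace above can be sketched as a small pure function. The dataclasses below are hypothetical stand-ins for Haystack's StreamingChunk/ReasoningContent, not the PR's actual code; the reasoning field is printed as `delta.reasoning` in the trace, though some vLLM versions expose it as `reasoning_content`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningContent:
    reasoning_text: str


@dataclass
class StreamingChunk:
    content: str
    index: int = 0
    reasoning: Optional[ReasoningContent] = None


def delta_to_chunk(delta: dict) -> Optional[StreamingChunk]:
    """Map one vLLM streaming delta to a StreamingChunk.

    vLLM puts reasoning and content on the same delta object of a single
    choice (index 0), so reasoning never arrives as a separate content
    block the way it can with other APIs.
    """
    reasoning_text = delta.get("reasoning")  # field name as printed in the trace above
    content = delta.get("content")
    if reasoning_text is not None:
        return StreamingChunk(content="", index=0, reasoning=ReasoningContent(reasoning_text))
    if content is not None:
        return StreamingChunk(content=content, index=0)
    return None  # role-only or finish deltas carry no text


# Replaying a few deltas from the trace above:
deltas = [
    {"role": "assistant", "content": ""},
    {"reasoning": "Okay"},
    {"reasoning": ","},
    {"content": "Paris"},
]
chunks = [c for d in deltas if (c := delta_to_chunk(d)) is not None]
```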

@anakin87 anakin87 marked this pull request as draft April 9, 2026 15:50

anakin87 commented Apr 9, 2026

There are issues running the model on CPU. Converting this into a draft; I may revert to the previous model... I'll investigate.

See #3099 (comment)

anakin87 (Member, Author):

@sjrl this should be in a better shape now

@anakin87 anakin87 marked this pull request as ready for review April 10, 2026 09:28
anakin87 (Member, Author):

In 8ca2088 I added better streaming handling (hopefully)

@sjrl sjrl left a comment


Thanks, looks good!

@anakin87 anakin87 merged commit 0172236 into main Apr 10, 2026
11 checks passed
@anakin87 anakin87 deleted the vllm-chat-generator branch April 10, 2026 10:34

Labels

topic:CI, type:documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add vLLM Chat Generator integration to support vLLM specific features like reasoning_content

2 participants