Conversation
integrations/vllm/src/haystack_integrations/components/generators/vllm/chat/chat_generator.py
integrations/vllm/src/haystack_integrations/components/generators/vllm/chat/chat_generator.py
Oh interesting, vLLM also supports the Responses API (docs). Obviously too much for this PR, but it could be a good follow-up issue.
```python
streaming_chunk = StreamingChunk(
    content="",
    reasoning=ReasoningContent(reasoning_text=reasoning_text),
    index=0,
```
Just to double-check: is this always set to 0 because, when there is reasoning, it is always the first content block returned by the API?
Yes, but note that vLLM does not use separate content blocks: reasoning and content are fields on the same delta. See https://docs.vllm.ai/en/latest/features/reasoning_outputs/#streaming-chat-completions
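A minimal sketch of this mapping, assuming the delta shape described in the vLLM docs linked above. `Delta` and the returned dict here are stand-ins for illustration, not the real client or Haystack types:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    # Stand-in for the OpenAI-compatible delta that vLLM streams:
    # reasoning and content live on the same object, not in
    # separate content blocks.
    content: Optional[str] = None
    reasoning: Optional[str] = None


def to_streaming_chunk(delta: Delta) -> dict:
    # Map a delta to a simplified StreamingChunk-like dict.
    # Because reasoning and content share one delta stream,
    # every chunk can use index=0.
    if delta.reasoning is not None:
        return {"content": "", "reasoning": delta.reasoning, "index": 0}
    return {"content": delta.content or "", "reasoning": None, "index": 0}


chunks = [Delta(reasoning="Okay"), Delta(reasoning=","), Delta(content="Paris")]
mapped = [to_streaming_chunk(d) for d in chunks]
```

Each emitted chunk carries either reasoning or content, never both, and all share the same index.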
Hmm I see the docs you are pointing to, but I can't see where it definitely says that it will send reasoning and content chunks with the same index number. Is this what you've observed in practice?
An example:
```
--- Chunk 0 ---
choice[0].index: 0
choice[0].delta.role: 'assistant'
choice[0].delta.content: ''
choice[0].delta.reasoning: 'N/A'
choice[0].finish_reason: None
--- Chunk 1 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: '\n'
choice[0].finish_reason: None
--- Chunk 2 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: 'Okay'
choice[0].finish_reason: None
--- Chunk 3 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ','
choice[0].finish_reason: None
--- Chunk 4 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' the'
choice[0].finish_reason: None
--- Chunk 5 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' user'
choice[0].finish_reason: None
--- Chunk 6 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' is'
choice[0].finish_reason: None
--- Chunk 7 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' asking'
choice[0].finish_reason: None
--- Chunk 8 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' for'
choice[0].finish_reason: None
--- Chunk 9 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: ' the'
choice[0].finish_reason: None
...
--- Chunk 94 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: '.\n'
choice[0].finish_reason: None
--- Chunk 95 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: '\n\n'
choice[0].delta.reasoning: 'N/A'
choice[0].finish_reason: None
--- Chunk 96 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: 'Paris'
choice[0].delta.reasoning: 'N/A'
choice[0].finish_reason: None
--- Chunk 97 ---
choice[0].index: 0
choice[0].delta.role: None
choice[0].delta.content: None
choice[0].delta.reasoning: 'N/A'
choice[0].finish_reason: 'stop'
```
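The trace above can be replayed with a small accumulator: reasoning deltas arrive first, then content deltas, all on the same choice index. This is a sketch with stand-in delta dicts (abbreviated from the trace), not the real client objects:

```python
# Stand-in deltas mirroring the trace: reasoning tokens stream
# first, then content tokens, on the same choice.
deltas = [
    {"content": "", "reasoning": None},   # role-only opening chunk
    {"content": None, "reasoning": "Okay"},
    {"content": None, "reasoning": ","},
    {"content": None, "reasoning": " the user is asking"},
    {"content": "\n\n", "reasoning": None},
    {"content": "Paris", "reasoning": None},
]

# Accumulate the two streams separately.
reasoning_text, content = "", ""
for d in deltas:
    if d["reasoning"] is not None:
        reasoning_text += d["reasoning"]
    if d["content"] is not None:
        content += d["content"]
```

After the loop, `reasoning_text` holds the full reasoning trace and `content` holds the final answer text.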
See #3099 (comment)
@sjrl this should be in better shape now.
integrations/vllm/src/haystack_integrations/components/generators/vllm/chat/chat_generator.py
integrations/vllm/src/haystack_integrations/components/generators/vllm/__init__.py
In 8ca2088 I added better streaming handling (hopefully) |
Related Issues

- reasoning_content #1958

Proposed Changes:

- Like `OpenAIChatGenerator`, but specifically handles reasoning

How did you test it?

CI: new unit tests and integration tests (using `Qwen/Qwen3-0.6B`)

Notes for the reviewer

I initially inherited from `OpenAIChatGenerator`, then I realized that I was overriding most methods. Plus, vLLM is a bit simpler: no `tools_strict` handling, no different endpoint for structured generation... So I ended up building a simple standalone component.

Checklist

- The PR title uses one of the conventional commit prefixes: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.