fix(openai): Handle empty tool call chunks in streaming #9565
Like0x wants to merge 2 commits into deepset-ai:main
Conversation
### PR Description

**What this PR does / why we need it:**

This pull request addresses a `ValueError` that occurs in `OpenAIChatGenerator` when processing a streaming response that includes tool calls. Some OpenAI-compatible LLM providers, such as qwen-plus, can occasionally send a `tool_calls` delta chunk where the `function` object has both `name` and `arguments` set to `None`.

Such an empty delta causes a `ValueError` during the instantiation of the `ToolCallDelta` dataclass, because its `__post_init__` validation requires at least one of `tool_name` or `arguments` to be present. The application crashes with the following error:

```
ValueError: At least one of tool_name or arguments must be provided.
```

This change makes the streaming logic more robust by filtering out these empty, non-functional tool call chunks before they are processed.

**How it was implemented:**

The fix is implemented within the `_convert_chat_completion_chunk_to_streaming_chunk` function in `haystack/components/generators/chat/openai.py`.

A conditional check now iterates through `choice.delta.tool_calls` and skips any `tool_call` whose `function` is `None`, or whose `function.name` and `function.arguments` are both `None`, continuing with the next item. This prevents the invalid data from being passed to the `ToolCallDelta` constructor, avoiding the exception. The logic also ensures that if all tool call deltas in a chunk are filtered out, the resulting empty list is handled correctly and does not produce a `StreamingChunk`.

**How to test it:**

You can use the simplest Agent Tools sample to test. Set the `api_base_url` of `OpenAIChatGenerator` to `https://dashscope.aliyuncs.com/compatible-mode/v1`, DashScope's OpenAI-compatible endpoint.
```python
from haystack import component
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack.tools import ComponentTool
from haystack.utils import Secret


@component
class rephraser:
    def __init__(self):
        self.result = ""

    @component.output_types(result=str)
    def run(self, query: str) -> dict:
        """
        Rephrase the query.

        Args:
            query: The query to rephrase.

        Returns:
            The rephrased query.
        """
        try:
            result = "Json had lunch?"
            return {"result": result}
        except Exception as e:
            return {"error": str(e)}


# Tool definition
rephraser_tool = ComponentTool(
    component=rephraser(),
    name="rephraser",
    description="Rephrase the query.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query to rephrase"}
        },
        "required": ["query"],
    },
)

# Agent setup
agent = Agent(
    chat_generator=OpenAIChatGenerator(
        api_base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        api_key=Secret.from_token("xxxxxxxxxx"),
        model="qwen-plus-latest",
        streaming_callback=print_streaming_chunk,
    ),
    system_prompt=(
        "You are a helpful assistant. You can chat with the user and use tools to "
        "answer questions. Always start by using the 'rephraser' tool to rephrase "
        "the user's question. You can use the 'rephraser' tool to rephrase the "
        "user's question as many times as you want."
    ),
    tools=[rephraser_tool],
)

# Run the agent
agent.warm_up()
response = agent.run(messages=[ChatMessage.from_user("He had lunch?")])

# Output
print(response["messages"])
```
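The filtering described under "How it was implemented" can be sketched roughly as follows. This is a simplified stand-in, not the actual patch: the `FakeFunction` and `FakeToolCall` dataclasses merely mimic the shape of the openai library's `ChoiceDeltaToolCallFunction`/`ChoiceDeltaToolCall`, and `filter_empty_tool_calls` is a hypothetical helper name; the real check runs inline over `choice.delta.tool_calls`.

```python
from dataclasses import dataclass
from typing import List, Optional


# Plain stand-ins mimicking openai's ChoiceDeltaToolCallFunction / ChoiceDeltaToolCall.
@dataclass
class FakeFunction:
    name: Optional[str] = None
    arguments: Optional[str] = None


@dataclass
class FakeToolCall:
    index: int
    function: Optional[FakeFunction] = None


def filter_empty_tool_calls(tool_calls: List[FakeToolCall]) -> List[FakeToolCall]:
    """Drop deltas whose function is missing or has neither a name nor arguments."""
    kept = []
    for tc in tool_calls:
        if tc.function is None or (tc.function.name is None and tc.function.arguments is None):
            continue  # skip the empty, non-functional delta
        kept.append(tc)
    return kept


# An empty delta (as occasionally sent by qwen-plus) is dropped; real ones survive.
chunks = [
    FakeToolCall(index=0, function=FakeFunction(name="rephraser")),
    FakeToolCall(index=0, function=FakeFunction()),  # name and arguments both None
    FakeToolCall(index=0, function=FakeFunction(arguments='{"query": "He had lunch?"}')),
]
print(len(filter_empty_tool_calls(chunks)))  # 2
```

If every delta in a chunk is filtered out this way, the resulting empty list means no `StreamingChunk` should be built for it, matching the behavior the PR description outlines.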
Hey @Like0x thanks for looking into this! Before we give this an in-depth review, could you provide an example list of deltas from qwen-plus where `arguments` and `name` are both `None`? It'd be nice to better understand whether this is potentially a common occurrence with other providers as well.
Is this the information you require? @sjrl
Hey @Like0x yes that is helpful! If it's not too much trouble, would it also be possible for you to provide the full ...? I'm just trying to understand why the provider sends a chunk with an empty `ChoiceDeltaToolCall`. E.g. is this the chunk that contains the ...?
Hey @sjrl, I've been a bit busy these days and didn't notice the message. I'll run the follow-up tests. Thank you.
Checklist
- The PR title follows the conventional commit format, starting with one of `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`, with `!` added in case the PR includes breaking changes.
The failing validation is in `haystack/haystack/dataclasses/streaming_chunk.py`, lines 33 to 37 at commit `c54a68a`.
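The referenced lines implement the check that raises. A rough reconstruction, inferred from the error message rather than copied from the source (the real dataclass may carry additional fields, e.g. an `id`), looks like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallDelta:
    # Illustrative reconstruction of haystack's ToolCallDelta, not the actual source.
    index: int
    tool_name: Optional[str] = None
    arguments: Optional[str] = None

    def __post_init__(self):
        # This is the validation that the empty qwen-plus deltas trip over.
        if self.tool_name is None and self.arguments is None:
            raise ValueError("At least one of tool_name or arguments must be provided.")


# A delta with neither field set raises; one with a tool name passes.
try:
    ToolCallDelta(index=0)
except ValueError as e:
    print(e)
```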
Running this sample, you will most likely see the error message above. When you switch to deepseek or other more capable models, the problem is unlikely to occur.
Since this PR modifies core code of `OpenAIChatGenerator`, please review it carefully.