
fix(openai): Handle empty tool call chunks in streaming#9565

Closed
Like0x wants to merge 2 commits into deepset-ai:main from Like0x:main

Conversation

@Like0x

@Like0x Like0x commented Jun 29, 2025

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

PR Description

What this PR does / why we need it:

This pull request addresses a ValueError that occurs in OpenAIChatGenerator when processing a streaming response that includes tool calls. The issue arises because some LLM providers, like qwen-plus, can occasionally send a tool_calls delta chunk where the function object has both name and arguments as None.

This empty delta causes a ValueError during the instantiation of the ToolCallDelta dataclass, as its __post_init__ validation requires at least tool_name or arguments to be present. The application crashes with the following error:

```python
def __post_init__(self):
    # NOTE: We allow for name and arguments to both be present because some providers like Mistral provide the
    # name and full arguments in one chunk
    if self.tool_name is None and self.arguments is None:
        raise ValueError("At least one of tool_name or arguments must be provided.")
```

```
ValueError: At least one of tool_name or arguments must be provided.
```

This change makes the streaming logic more robust by filtering out these empty, non-functional tool call chunks before they are processed.

How it was implemented:

The fix is implemented within the _convert_chat_completion_chunk_to_streaming_chunk function in haystack/components/generators/chat/openai.py.

A conditional check has been added to iterate through choice.delta.tool_calls. It now verifies if tool_call.function is None, or if both tool_call.function.name and tool_call.function.arguments are None. If an empty tool call is found, it is skipped, and the loop continues to the next item. This prevents the invalid data from being passed to the ToolCallDelta constructor, thus avoiding the exception.

The logic also ensures that if all tool call deltas in a chunk are filtered out, the empty list is handled correctly and doesn't proceed to create a StreamingChunk.
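The diff itself is not shown in this conversation; as a minimal sketch of the check described above (using hypothetical stand-in dataclasses in place of the openai SDK's ChoiceDeltaToolCall types, since the real fix lives in haystack/components/generators/chat/openai.py), the filtering could look like:

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the openai SDK's streaming delta types,
# for illustration only.
@dataclass
class Function:
    name: Optional[str] = None
    arguments: Optional[str] = None


@dataclass
class ToolCallChunk:
    index: int
    function: Optional[Function] = None


def filter_empty_tool_calls(tool_calls: List[ToolCallChunk]) -> List[ToolCallChunk]:
    """Drop chunks whose function is missing or has neither name nor arguments."""
    kept = []
    for tc in tool_calls:
        if tc.function is None or (
            tc.function.name is None and tc.function.arguments is None
        ):
            continue  # skip the empty, non-functional chunk
        kept.append(tc)
    return kept


chunks = [
    ToolCallChunk(0, Function(name="rephraser", arguments='{"query": "')),
    ToolCallChunk(0, Function(arguments='..."}')),
    ToolCallChunk(0, Function()),  # both name and arguments are None
]
print(len(filter_empty_tool_calls(chunks)))  # 2: the empty chunk is removed
```

If all deltas in a chunk are filtered out, the resulting empty list is what prevents an invalid StreamingChunk from being created.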

How did you test it?

You can use the simplest Agent-with-tools sample to reproduce it.

Set the api_base_url of OpenAIChatGenerator to https://dashscope.aliyuncs.com/compatible-mode/v1, which is DashScope's OpenAI-compatible endpoint.

```python
from haystack import component
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack.tools import ComponentTool
from haystack.utils import Secret


@component
class Rephraser:
    """Toy component that returns a canned rephrasing of the query."""

    @component.output_types(result=str)
    def run(self, query: str) -> dict:
        """
        Rephrase the query.

        Args:
            query: The query to rephrase.

        Returns:
            The rephrased query.
        """
        return {"result": "Json had lunch?"}


# Tool Definition
rephraser_tool = ComponentTool(
    component=Rephraser(),
    name="rephraser",
    description="Rephrase the query.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query to rephrase"}
        },
        "required": ["query"],
    },
)

# Agent Setup
agent = Agent(
    chat_generator=OpenAIChatGenerator(
        api_base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        api_key=Secret.from_token("xxxxxxxxxx"),
        model="qwen-plus-latest",
        streaming_callback=print_streaming_chunk,
    ),
    system_prompt="You are a helpful assistant. You can chat with the user and use tools to answer questions. Always start by using the 'rephraser' tool to rephrase the user's question. You can use the 'rephraser' tool to rephrase the user's question as many times as you want.",
    tools=[rephraser_tool],
)

# Run the Agent
agent.warm_up()
response = agent.run(messages=[ChatMessage.from_user("He had lunch?")])

# Output
print(response["messages"])
```

With this setup you will most likely see the error below. When you switch to DeepSeek or other advanced models, the problem is unlikely to occur.

Since this PR modifies core code of OpenAIChatGenerator, please review it carefully.

```
haystack.core.errors.PipelineRuntimeError: The following component failed to run:
Component name: 'chat_generator'
Component type: 'OpenAIChatGenerator'
Error: At least one of tool_name or arguments must be provided.
```

@Like0x Like0x requested a review from a team as a code owner June 29, 2025 17:22
@Like0x Like0x requested review from sjrl and removed request for a team June 29, 2025 17:22
@CLAassistant

CLAassistant commented Jun 29, 2025

CLA assistant check
All committers have signed the CLA.

@sjrl
Contributor

sjrl commented Jun 30, 2025

Hey @Like0x thanks for looking into this! Before we give this an in-depth review, could you provide an example list of deltas from qwen-plus where arguments and name are both None?

It'd be nice to better understand whether this is potentially a common occurrence with other providers as well.

@Like0x
Author

Like0x commented Jun 30, 2025

Is this the information you require? @sjrl

```
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='call_920cf7601d7743cc9d29ec', function=ChoiceDeltaToolCallFunction(arguments='{"query": "xxxxxxx', name='query_rephraser'), type='function')]
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='', function=ChoiceDeltaToolCallFunction(arguments='xxxxxxxxxx"}', name=''), type='function')]
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='', function=ChoiceDeltaToolCallFunction(arguments=None, name=None), type='function')]
2025-06-30 00:44:20 - app.services.rag_generator - ERROR - [request_8d2c1edc1aae4dee8845cd5a27f4e9a5] Agent pipeline failed: The following component failed to run:
Component name: 'chat_generator'
Component type: 'OpenAIReasoningChatGenerator'
Error: At least one of tool_name or arguments must be provided.
```

@sjrl
Contributor

sjrl commented Jul 2, 2025

Hey @Like0x yes that is helpful! If it's not too much trouble would it also be possible for you to provide the full ChatCompletionChunk?

I'm just trying to understand why the provider sends a chunk with an empty ChoiceDeltaToolCall. E.g. Is this the chunk that contains the finish_reason? Or does it really contain no useful information?

@sjrl
Contributor

sjrl commented Jul 2, 2025

Hey @Like0x I went ahead and made a PR with a different solution to the problem here #9582

Instead of skipping the problematic chunks, I've opted to remove the ValueError from ToolCallDelta. Could you try out the changes in the above PR and tell me if that fixes your problem?
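The alternative approach (relaxing the validation instead of filtering chunks) could be sketched roughly as follows; this is a hypothetical illustration of the idea, not the actual diff from that PR, and the field list is assumed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallDelta:
    """A streaming delta of a tool call; every field may be absent in a given chunk."""

    index: int = 0
    id: Optional[str] = None
    tool_name: Optional[str] = None
    arguments: Optional[str] = None
    # No __post_init__ ValueError: an all-None delta is now representable,
    # so providers that emit empty tool-call chunks no longer crash the stream.


# An empty delta such as the one qwen-plus sometimes emits can now be constructed.
empty = ToolCallDelta(index=0)
print(empty.tool_name, empty.arguments)  # None None
```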

@sjrl sjrl self-assigned this Jul 3, 2025
@sjrl sjrl added the information-needed label Jul 3, 2025
@sjrl
Contributor

sjrl commented Jul 4, 2025

Hey @Like0x we went ahead and merged the different fix for this issue in this PR #9581 Please let us know if you have any more issues!

@sjrl sjrl closed this Jul 4, 2025
@Like0x
Author

Like0x commented Jul 6, 2025

Hey @sjrl, I've been a bit busy these days and didn't notice the message. I will run the follow-up tests. Thank you.


Labels

information-needed: Information needed from the user
