
fix(openai): Handle empty tool call chunks in streaming#9565

Closed
Like0x wants to merge 2 commits into deepset-ai:main from Like0x:main

Conversation

@Like0x

@Like0x Like0x commented Jun 29, 2025

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

PR Description

What this PR does / why we need it:

This pull request addresses a ValueError that occurs in OpenAIChatGenerator when processing a streaming response that includes tool calls. The issue arises because some LLM providers, like qwen-plus, can occasionally send a tool_calls delta chunk where the function object has both name and arguments as None.

This empty delta causes a ValueError during the instantiation of the ToolCallDelta dataclass, as its __post_init__ validation requires at least tool_name or arguments to be present. The application crashes with the following error:

```python
def __post_init__(self):
    # NOTE: We allow for name and arguments to both be present because some providers like Mistral provide the
    # name and full arguments in one chunk
    if self.tool_name is None and self.arguments is None:
        raise ValueError("At least one of tool_name or arguments must be provided.")
```

```
ValueError: At least one of tool_name or arguments must be provided.
```

This change makes the streaming logic more robust by filtering out these empty, non-functional tool call chunks before they are processed.

How it was implemented:

The fix is implemented within the _convert_chat_completion_chunk_to_streaming_chunk function in haystack/components/generators/chat/openai.py.

A conditional check has been added to iterate through choice.delta.tool_calls. It now verifies if tool_call.function is None, or if both tool_call.function.name and tool_call.function.arguments are None. If an empty tool call is found, it is skipped, and the loop continues to the next item. This prevents the invalid data from being passed to the ToolCallDelta constructor, thus avoiding the exception.

The logic also ensures that if all tool call deltas in a chunk are filtered out, the empty list is handled correctly and doesn't proceed to create a StreamingChunk.
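The diff itself is not shown in this conversation; as a minimal sketch of the check described above (using hypothetical stand-in dataclasses in place of the openai SDK's ChoiceDeltaToolCall types, since the real fix lives in haystack/components/generators/chat/openai.py), the filtering could look like:

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the openai SDK's streaming delta types,
# for illustration only.
@dataclass
class Function:
    name: Optional[str] = None
    arguments: Optional[str] = None


@dataclass
class ToolCallChunk:
    index: int
    function: Optional[Function] = None


def filter_empty_tool_calls(tool_calls: List[ToolCallChunk]) -> List[ToolCallChunk]:
    """Drop chunks whose function is missing or has neither name nor arguments."""
    kept = []
    for tc in tool_calls:
        if tc.function is None or (
            tc.function.name is None and tc.function.arguments is None
        ):
            continue  # skip the empty, non-functional chunk
        kept.append(tc)
    return kept


chunks = [
    ToolCallChunk(0, Function(name="rephraser", arguments='{"query": "')),
    ToolCallChunk(0, Function(arguments='..."}')),
    ToolCallChunk(0, Function()),  # both name and arguments are None
]
print(len(filter_empty_tool_calls(chunks)))  # 2: the empty chunk is removed
```

If all deltas in a chunk are filtered out, the resulting empty list is what prevents an invalid StreamingChunk from being created.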

How did you test it?

You can use the simplest Agent-with-tools sample to reproduce it.

Set the api_base_url of OpenAIChatGenerator to https://dashscope.aliyuncs.com/compatible-mode/v1, which is DashScope's OpenAI-compatible endpoint.

```python
from haystack import component
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack.tools import ComponentTool
from haystack.utils import Secret


@component
class Rephraser:
    """Toy component that returns a canned rephrasing of the query."""

    @component.output_types(result=str)
    def run(self, query: str) -> dict:
        """
        Rephrase the query.

        Args:
            query: The query to rephrase.

        Returns:
            The rephrased query.
        """
        return {"result": "Json had lunch?"}


# Tool Definition
rephraser_tool = ComponentTool(
    component=Rephraser(),
    name="rephraser",
    description="Rephrase the query.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query to rephrase"}
        },
        "required": ["query"],
    },
)

# Agent Setup
agent = Agent(
    chat_generator=OpenAIChatGenerator(
        api_base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        api_key=Secret.from_token("xxxxxxxxxx"),
        model="qwen-plus-latest",
        streaming_callback=print_streaming_chunk,
    ),
    system_prompt="You are a helpful assistant. You can chat with the user and use tools to answer questions. Always start by using the 'rephraser' tool to rephrase the user's question. You can use the 'rephraser' tool to rephrase the user's question as many times as you want.",
    tools=[rephraser_tool],
)

# Run the Agent
agent.warm_up()
response = agent.run(messages=[ChatMessage.from_user("He had lunch?")])

# Output
print(response["messages"])
```

With this setup you will most likely see the error below. When you switch to DeepSeek or other advanced models, the problem is unlikely to occur.

Since this PR modifies core code of OpenAIChatGenerator, please review it carefully.

```
haystack.core.errors.PipelineRuntimeError: The following component failed to run:
Component name: 'chat_generator'
Component type: 'OpenAIChatGenerator'
Error: At least one of tool_name or arguments must be provided.
```

@Like0x Like0x requested a review from a team as a code owner June 29, 2025 17:22
@Like0x Like0x requested review from sjrl and removed request for a team June 29, 2025 17:22
@CLAassistant

CLAassistant commented Jun 29, 2025

CLA assistant check
All committers have signed the CLA.

@sjrl
Contributor

sjrl commented Jun 30, 2025

Hey @Like0x thanks for looking into this! Before we give this an in-depth review, could you provide an example list of deltas from qwen-plus where arguments and name are both None?

It'd be nice to better understand whether this is potentially a common occurrence with other providers as well.

@Like0x
Author

Like0x commented Jun 30, 2025

Is this the information you require? @sjrl

```
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='call_920cf7601d7743cc9d29ec', function=ChoiceDeltaToolCallFunction(arguments='{"query": "xxxxxxx', name='query_rephraser'), type='function')]
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='', function=ChoiceDeltaToolCallFunction(arguments='xxxxxxxxxx"}', name=''), type='function')]
2025-06-30 00:44:20 - haystack.components.generators.chat.openai - INFO - choice.delta.tool_calls: [ChoiceDeltaToolCall(index=0, id='', function=ChoiceDeltaToolCallFunction(arguments=None, name=None), type='function')]
2025-06-30 00:44:20 - app.services.rag_generator - ERROR - [request_8d2c1edc1aae4dee8845cd5a27f4e9a5] Agent pipeline failed: The following component failed to run:
Component name: 'chat_generator'
Component type: 'OpenAIReasoningChatGenerator'
Error: At least one of tool_name or arguments must be provided.
```

@sjrl
Contributor

sjrl commented Jul 2, 2025

Hey @Like0x yes that is helpful! If it's not too much trouble would it also be possible for you to provide the full ChatCompletionChunk?

I'm just trying to understand why the provider sends a chunk with an empty ChoiceDeltaToolCall. E.g. Is this the chunk that contains the finish_reason? Or does it really contain no useful information?

@sjrl
Contributor

sjrl commented Jul 2, 2025

Hey @Like0x I went ahead and made a PR with a different solution to the problem here #9582

Instead of skipping the problematic chunks, I've opted to remove the ValueError from ToolCallDelta. Could you try out the changes in the above PR and tell me if that fixes your problem?
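The alternative approach (relaxing the validation instead of filtering chunks) could be sketched roughly as follows; this is a hypothetical illustration of the idea, not the actual diff from that PR, and the field list is assumed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallDelta:
    """A streaming delta of a tool call; every field may be absent in a given chunk."""

    index: int = 0
    id: Optional[str] = None
    tool_name: Optional[str] = None
    arguments: Optional[str] = None
    # No __post_init__ ValueError: an all-None delta is now representable,
    # so providers that emit empty tool-call chunks no longer crash the stream.


# An empty delta such as the one qwen-plus sometimes emits can now be constructed.
empty = ToolCallDelta(index=0)
print(empty.tool_name, empty.arguments)  # None None
```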

@sjrl sjrl self-assigned this Jul 3, 2025
@sjrl sjrl added the information-needed label Jul 3, 2025
@sjrl
Contributor

sjrl commented Jul 4, 2025

Hey @Like0x we went ahead and merged the different fix for this issue in this PR #9581 Please let us know if you have any more issues!

@sjrl sjrl closed this Jul 4, 2025
@Like0x
Author

Like0x commented Jul 6, 2025

Hey @sjrl, I've been a bit busy these days and didn't notice the message. I will run the follow-up tests. Thank you.


Labels

information-needed: Information needed from the user
