📝 Summary & Motivation
The FinishReason literal introduced in StreamingChunk restricts allowed values to a predefined set. This has introduced inconsistencies between the finish_reason values in streaming vs. non-streaming modes across multiple integrations.
These inconsistencies fall into two main categories, explained below.
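As a minimal sketch of what "restricted to a predefined set" means in practice, the snippet below defines a local stand-in for the literal and checks whether a value belongs to it. The exact value set and the `is_allowed_finish_reason` helper are assumptions for illustration, not taken from the library source.

```python
from typing import Literal, get_args

# Local stand-in for the FinishReason literal; the value set is an assumption.
FinishReason = Literal["stop", "length", "tool_calls", "content_filter", "tool_call_results"]

ALLOWED_FINISH_REASONS = set(get_args(FinishReason))


def is_allowed_finish_reason(value: str) -> bool:
    """Hypothetical helper: True if `value` is one of the literal's members."""
    return value in ALLOWED_FINISH_REASONS


# A provider-specific string such as "end_turn" is outside the set,
# which is why integrations have to map it before building a chunk.
provider_value_ok = is_allowed_finish_reason("end_turn")
mapped_value_ok = is_allowed_finish_reason("stop")
```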
Case 1: Inconsistency Between StreamingChunks and Final ChatMessage
In both streaming and non-streaming modes, the final ChatMessage typically carries the original finish_reason from the model API. However, the streamed StreamingChunks carry a mapped FinishReason literal value instead.
This creates a mismatch between what is streamed and what appears in the final message.
Additionally, this leads to inefficient workarounds like:
- We first map the API’s finish_reason to a FinishReason literal value (to comply with StreamingChunk requirements),
- And then map it back to the original string for the final ChatMessage.
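The round trip described above might look roughly like the following sketch. The mapping table, the helper names, and the local `FinishReason` stand-in are all hypothetical and only illustrate the pattern; they are not the integrations' actual code.

```python
from typing import Dict, Literal

# Local stand-in for the FinishReason literal; the value set is an assumption.
FinishReason = Literal["stop", "length", "tool_calls", "content_filter", "tool_call_results"]

# Hypothetical provider-to-literal mapping (provider strings are illustrative).
API_TO_FINISH_REASON: Dict[str, FinishReason] = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}
# Reverse table needed only to undo the first mapping.
FINISH_REASON_TO_API: Dict[str, str] = {v: k for k, v in API_TO_FINISH_REASON.items()}


def chunk_finish_reason(api_reason: str) -> FinishReason:
    """Step 1: map the provider string so it fits the restricted literal."""
    return API_TO_FINISH_REASON.get(api_reason, "stop")


def message_finish_reason(mapped: FinishReason) -> str:
    """Step 2: map back to the provider string for the final ChatMessage."""
    return FINISH_REASON_TO_API.get(mapped, mapped)


streamed = chunk_finish_reason("end_turn")   # value seen in the StreamingChunk
final = message_finish_reason(streamed)      # value seen in the final ChatMessage
```

Here the chunk and the final message report different strings for the same stop event, and the reverse table exists solely to undo the first mapping.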
Affected Integrations:
AnthropicChatGenerator
OllamaChatGenerator
Case 2: Inconsistency Between Streaming and Non-Streaming Modes
In some integrations, the finish_reason differs depending on whether streaming is enabled or not, even within the same component.
For example, in GoogleGenAIChatGenerator:
Streaming Mode:
The finish_reason is mapped to a FinishReason literal value. This mapped value is used in the streaming chunks and stored in the ChatMessage.
Reference
Non-Streaming Mode:
The model's original finish_reason string is passed directly into the final ChatMessage without mapping.
This results in inconsistencies in the ChatMessage format.
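A minimal sketch of how the two code paths can diverge inside one component is shown below. The mapping table and the `final_message_meta` function are hypothetical stand-ins, not the component's real implementation.

```python
from typing import Any, Dict

# Hypothetical provider-to-literal mapping (illustrative values only).
PROVIDER_TO_FINISH_REASON = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
}


def final_message_meta(api_reason: str, streaming: bool) -> Dict[str, Any]:
    """Build the meta of the final ChatMessage for either mode."""
    if streaming:
        # Streaming path: the mapped value is reused for the final message.
        return {"finish_reason": PROVIDER_TO_FINISH_REASON.get(api_reason, "stop")}
    # Non-streaming path: the provider string is passed through unchanged.
    return {"finish_reason": api_reason}


streaming_meta = final_message_meta("MAX_TOKENS", streaming=True)
non_streaming_meta = final_message_meta("MAX_TOKENS", streaming=False)
```

Downstream code that inspects `finish_reason` would therefore need to handle both `"length"` and `"MAX_TOKENS"` for the same underlying event.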
Affected Integrations:
GoogleGenAIChatGenerator
HuggingFaceAPIChatGenerator
Checklist
Tasks
main branch (Code + Docstrings)