feat: use reasoning field in StreamingChunk for Google GenAI #2900
anakin87 merged 4 commits into deepset-ai:main
Populate StreamingChunk.reasoning with ReasoningContent instead of storing reasoning deltas as dicts in meta. Update aggregation to read from chunk.reasoning instead of chunk.meta["reasoning_deltas"].
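The change described above can be sketched roughly as follows. This is a minimal illustration that uses simplified stand-in dataclasses, not Haystack's real StreamingChunk and ReasoningContent (whose fields and validation may differ). The helper name aggregate_reasoning and the shape of the old reasoning_deltas dicts are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ReasoningContent:
    """Simplified stand-in for Haystack's ReasoningContent."""

    reasoning_text: str
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class StreamingChunk:
    """Simplified stand-in for Haystack's StreamingChunk."""

    content: str = ""
    reasoning: Optional[ReasoningContent] = None
    meta: dict[str, Any] = field(default_factory=dict)


def aggregate_reasoning(chunks: list[StreamingChunk]) -> str:
    """Join reasoning deltas by reading chunk.reasoning, not chunk.meta."""
    return "".join(
        c.reasoning.reasoning_text for c in chunks if c.reasoning is not None
    )


# Before: reasoning deltas lived as dicts under meta["reasoning_deltas"].
# (The dict layout here is assumed for illustration.)
old_chunk = StreamingChunk(
    meta={"reasoning_deltas": [{"type": "reasoning", "content": "Let me think."}]}
)

# After: each delta travels in the dedicated reasoning field.
new_chunks = [
    StreamingChunk(reasoning=ReasoningContent(reasoning_text="Let me ")),
    StreamingChunk(reasoning=ReasoningContent(reasoning_text="think.")),
    StreamingChunk(content="The answer is 4."),
]

print(aggregate_reasoning(new_chunks))  # prints "Let me think."
```

Reading from a typed field means the aggregation no longer needs to know a provider-specific dict layout inside meta.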
@Br1an67 thank you for the contribution! Please sign the Contributor License Agreement, then I'll review this PR.
Hi @anakin87, thanks for the reminder! The CLA has been signed: the CLAassistant bot confirmed "All committers have signed the CLA" above. Ready for your review whenever you have a chance. 🙏
anakin87 left a comment:
Nice work.
I left a few small comments
    # Add thought signature deltas to meta if available (for multi-turn context)
    if thought_signature_deltas:
        meta["thought_signature_deltas"] = thought_signature_deltas
My impression is that we can use the extra field of ReasoningContent (code) to store thought_signature_deltas.
In my opinion, we did something similar in #2849 for Anthropic redacted thinking and thinking signature.
Since all this info is related to reasoning, I'd like to have it grouped into ReasoningContent.
But I haven't tried, and I don't know if this is simple to implement or if it would make the code much more complex. So please try and let me know.
Thanks for the suggestion! I've moved thought_signature_deltas into ReasoningContent.extra for reasoning chunks, consistent with the Anthropic approach in #2849.
One caveat: StreamingChunk enforces mutual exclusivity between content and reasoning (raises ValueError in __post_init__), so for text/tool-call chunks that also carry thought signatures, the signatures still go in meta. The aggregation logic reads from both sources. This keeps all reasoning-related info grouped into ReasoningContent where possible without breaking the constraint.
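The constraint mentioned in this comment can be illustrated with a small sketch. The Chunk class below is a hypothetical stand-in that only imitates the described behavior (a ValueError raised from __post_init__ when more than one of content, tool_calls, or reasoning is set). It is not Haystack's actual implementation, and the signature dict layout is assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for StreamingChunk with an exclusivity check."""

    content: str = ""
    tool_calls: Optional[list] = None
    reasoning: Optional[str] = None
    meta: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Count how many of the mutually exclusive fields are populated.
        populated = sum(
            bool(value) for value in (self.content, self.tool_calls, self.reasoning)
        )
        if populated > 1:
            raise ValueError(
                "Only one of content, tool_calls, or reasoning may be set."
            )


# A text chunk that also carries a thought signature: the signature has to go
# in meta, because setting reasoning as well would violate the constraint.
text_chunk = Chunk(
    content="Hello",
    meta={"thought_signature_deltas": [{"signature": "abc"}]},
)

try:
    Chunk(content="Hello", reasoning="thinking")
    conflict_raised = False
except ValueError:
    conflict_raised = True

print(conflict_raised)  # prints "True"
```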
Seeing the code, I now realize that Thought Signatures can be included in non-reasoning response parts.
For this reason, I recommend going back to the previous version of your code where thought_signature_deltas are always stored in meta. This would be simpler and consistent.
I'd just add a comment on top of the code line explaining that this data can be part of non-reasoning content parts.
Makes sense?
Makes sense! Reverted in bd3d479 — thought_signature_deltas are now always stored in meta with a comment explaining that thought signatures can appear in both reasoning and non-reasoning response parts.
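A rough sketch of why a single storage location keeps the aggregation simple: if every chunk stores its signatures under the same meta key, the collector does not need to branch on chunk type. The function name collect_thought_signatures and the dict layout of each signature delta are assumptions for illustration, not the code merged in this PR.

```python
from typing import Any


def collect_thought_signatures(chunk_metas: list[dict[str, Any]]) -> list[Any]:
    """Gather thought signature deltas from every chunk's meta, in stream order."""
    signatures: list[Any] = []
    for meta in chunk_metas:
        # Thought signatures can accompany reasoning, text, or tool-call parts,
        # so the lookup does not depend on what kind of chunk this is.
        signatures.extend(meta.get("thought_signature_deltas", []))
    return signatures


metas = [
    {"thought_signature_deltas": [{"type": "thought", "signature": "sig-1"}]},
    {},  # a plain text chunk without any signature
    {"thought_signature_deltas": [{"type": "text", "signature": "sig-2"}]},
]

collected = collect_thought_signatures(metas)
print([s["signature"] for s in collected])  # prints "['sig-1', 'sig-2']"
```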
        assert streaming_chunk.tool_calls[5].id is None
        assert streaming_chunk.tool_calls[5].index == 5

    def test_convert_google_chunk_to_streaming_chunk_with_thought(self, monkeypatch):
Could you try using actual objects from Google API in this test?
See test_convert_google_chunk_to_streaming_chunk_real_example for an example
Good call — updated the test to use actual types.Part, types.Content, types.Candidate, and types.GenerateContentResponse objects, following the pattern in test_convert_google_chunk_to_streaming_chunk_real_example.
….extra Store thought_signature_deltas in ReasoningContent.extra instead of StreamingChunk.meta when reasoning content is present, grouping all reasoning-related info into ReasoningContent. For text/tool-call chunks (where StreamingChunk mutual exclusivity prevents setting both content and reasoning), signatures remain in meta. The aggregation logic reads from both sources. Consistent with the Anthropic approach in PR deepset-ai#2849. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Mock objects with actual types.Part, types.Content, types.Candidate and types.GenerateContentResponse in the test_convert_google_chunk_to_streaming_chunk_with_thought test, following the pattern established in the existing real_example test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thought signatures can appear in both reasoning and non-reasoning response parts, so storing them consistently in meta is simpler than splitting between ReasoningContent.extra and meta.
Related Issues
- reasoning field in StreamingChunk haystack#10478

Proposed Changes:
Populate StreamingChunk.reasoning with ReasoningContent instead of storing reasoning deltas as dicts in meta["reasoning_deltas"]. This aligns the Google GenAI integration with the standard StreamingChunk.reasoning field, consistent with other integrations (e.g., Ollama in #2850).

Changes:
- _convert_google_chunk_to_streaming_chunk(): Collect reasoning text from thought parts into a ReasoningContent object and pass it via the reasoning kwarg instead of meta["reasoning_deltas"]
- _aggregate_streaming_chunks_with_reasoning(): Read from chunk.reasoning.reasoning_text instead of chunk.meta["reasoning_deltas"]
- StreamingChunk mutual exclusivity constraint is respected (only one of content/tool_calls/reasoning set per chunk)

How did you test it?
- test_convert_google_chunk_to_streaming_chunk_with_thought to verify thought parts populate StreamingChunk.reasoning instead of meta
- test_aggregate_streaming_chunks_with_reasoning to use chunk.reasoning instead of meta["reasoning_deltas"]

Notes for the reviewer
The thought_signature_deltas remain in meta since they are Google-specific metadata for multi-turn context preservation, not standard reasoning content.

Checklist
- fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.