fix: add stream parameter to disable streaming for custom LLMs #701

Merged

MervinPraison merged 1 commit into main from claude/issue-615-20250627_112613 on Jun 27, 2025

Conversation

@MervinPraison MervinPraison commented Jun 27, 2025

Summary

  • Add configurable stream parameter to Agent and LLM classes
  • Replace hardcoded stream=True values with configurable parameter
  • Maintains backward compatibility with default stream=True

Test plan

  • Test agent creation with stream=False
  • Verify backward compatibility (default behavior unchanged)
  • Test with LLM providers that don't support streaming

Fixes #615
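
As a quick reference for the test plan above, here is a minimal usage sketch. The stream parameter is the one added by this PR; the other constructor arguments (instructions, llm) are illustrative and may differ from the exact praisonaiagents API.

```python
from praisonaiagents import Agent

# Hypothetical example: disable streaming for a custom LLM provider that
# does not support streamed responses. Only `stream` is the new parameter;
# `instructions` and `llm` are illustrative values.
agent = Agent(
    instructions="You are a helpful assistant.",
    llm="openai/gpt-4o-mini",
    stream=False,  # receive the response in one piece instead of a stream
)

# Backward compatibility: omitting `stream` keeps the previous behaviour (streaming on).
default_agent = Agent(instructions="You are a helpful assistant.")
```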

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added an option to enable or disable streaming of language model responses in chat and response generation features. Users can now choose whether to receive responses as a stream or all at once.

- Add stream parameter to Agent class with default True
- Add stream parameter to LLM get_response and get_response_async methods
- Replace hardcoded stream=True values with configurable stream parameter
- Maintains backward compatibility while allowing stream disabling

Fixes #615

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
coderabbitai bot commented Jun 27, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

A new stream boolean parameter has been added to the Agent class and relevant LLM methods, allowing users to enable or disable streaming of language model responses. The parameter is consistently propagated through constructors and method calls, replacing previously hardcoded streaming behavior.

Changes

| File(s) | Change Summary |
|---|---|
| src/praisonai-agents/praisonaiagents/agent/agent.py | Added stream parameter to Agent constructor and propagated it through methods for streaming control. |
| src/praisonai-agents/praisonaiagents/llm/llm.py | Added stream parameter to get_response and get_response_async; replaced hardcoded streaming flags. |
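
The table above lists the changes per file; the sketch below shows the general shape of the propagation with simplified, hypothetical signatures (not the repository's exact ones): the flag stored on the Agent is forwarded to the LLM layer instead of a hardcoded stream=True.

```python
class LLM:
    def get_response(self, prompt: str, stream: bool = True, **kwargs):
        # Previously the underlying completion call hardcoded stream=True;
        # now the caller's flag is passed through (sketch only, not the real body).
        return self._completion(prompt, stream=stream, **kwargs)


class Agent:
    def __init__(self, llm: LLM, stream: bool = True, **kwargs):
        self.llm = llm
        self.stream = stream  # stored once, reused by chat-related methods

    def chat(self, prompt: str, **kwargs):
        # The agent-level setting is forwarded to the LLM layer.
        return self.llm.get_response(prompt, stream=self.stream, **kwargs)
```

Whether the per-call value or the instance-level value should take precedence is exactly what the review comments further down discuss.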

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant LLM

    User->>Agent: Instantiate Agent(stream=False)
    User->>Agent: Call chat(prompt)
    Agent->>LLM: get_response(..., stream=False)
    LLM-->>Agent: Response (non-streamed)
    Agent-->>User: Final response
```

Assessment against linked issues

| Objective (Issue #) | Addressed | Explanation |
|---|---|---|
| Allow disabling stream response in Agent and LLM (#615) | ✅ | |

Possibly related PRs

  • Develop #448: Adds a stream parameter to the chat method and propagates it to _chat_completion, related to streaming control in the Agent class.

Poem

A toggle for streaming, now in your hand,
With stream=False, your models will stand.
Responses arrive all at once, not in a stream,
For agents and LLMs, it's now a dream.
🐇✨
Code hops forward, as smooth as can be,
Streaming or not, the choice is now free!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5884ee9 and 5743127.

📒 Files selected for processing (2)
  • src/praisonai-agents/praisonaiagents/agent/agent.py (7 hunks)
  • src/praisonai-agents/praisonaiagents/llm/llm.py (17 hunks)

@MervinPraison MervinPraison merged commit 6237dc5 into main Jun 27, 2025
9 of 10 checks passed
@MervinPraison MervinPraison deleted the claude/issue-615-20250627_112613 branch June 27, 2025 11:50

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @MervinPraison, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the praisonai-agents framework by introducing a configurable stream parameter. This allows users to control whether responses from Large Language Models are streamed or returned as a complete block, which is particularly beneficial for compatibility with diverse LLM providers and for scenarios where streaming is not desired. The changes are implemented to ensure seamless backward compatibility.

Highlights

  • Configurable Streaming: Introduced a new stream boolean parameter to the Agent class constructor and the LLM's get_response and get_response_async methods. This parameter allows users to explicitly enable or disable streaming responses from the underlying Large Language Models.
  • Backward Compatibility: The newly added stream parameter defaults to True in all relevant constructors and method signatures, ensuring that existing codebases continue to function without requiring changes and maintaining the default streaming behavior.
  • Enhanced LLM Integration Flexibility: Replaced hardcoded stream=True values with the configurable stream parameter across various internal calls within the Agent's _chat_completion and chat methods, and the LLM's get_response and get_response_async methods. This provides greater flexibility, especially for integrating with custom LLM providers that may not support streaming.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a configurable stream parameter to the Agent and LLM classes, which is a great addition for handling LLMs that do not support streaming. The implementation is mostly solid, particularly in llm.py.

My main feedback points are in agent.py. There's a high-severity backward compatibility issue in the Agent.chat method where the method-level stream parameter is now ignored, which could cause unexpected behavior for existing users. Additionally, there's an opportunity to refactor a redundant if/else block in _chat_completion to improve code maintainability. Addressing these issues will make this a very strong contribution.
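
The redundant if/else is not quoted in this comment; as a generic illustration only (not the repository's actual _chat_completion body), a branch that differs only in the hardcoded flag can usually be collapsed by passing the flag straight through:

```python
# Before: two branches that differ only in the literal stream flag.
if stream:
    response = client.chat.completions.create(model=model, messages=messages, stream=True)
else:
    response = client.chat.completions.create(model=model, messages=messages, stream=False)

# After: pass the flag through directly.
response = client.chat.completions.create(model=model, messages=messages, stream=stream)
```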

```diff
      )
-     response = self._chat_completion(messages, temperature=temperature, tools=tools if tools else None, reasoning_steps=reasoning_steps, stream=stream)
+     response = self._chat_completion(messages, temperature=temperature, tools=tools if tools else None, reasoning_steps=reasoning_steps, stream=self.stream)
```

Severity: high

By changing this to stream=self.stream, the stream parameter in the chat method's signature (line 1127) is now ignored. This breaks backward compatibility for any user who was calling agent.chat(..., stream=False), as their setting would be overridden by the agent's self.stream attribute.

To fix this and maintain the ability to override the stream setting per-call, you should use the stream parameter from the method. If the intent is for self.stream to be a default, the chat method's signature and logic should be updated to reflect that (e.g., by setting the default to None and then choosing self.stream if it's None).
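
A minimal sketch of that suggestion, assuming a simplified, hypothetical chat signature: stream=None means "not specified for this call", so the agent-wide default applies, while an explicit True or False from the caller still wins.

```python
def chat(self, prompt, temperature=0.2, tools=None, stream=None, **kwargs):
    # Hypothetical, simplified signature (the real method has more parameters).
    # None means the caller did not specify a value, so fall back to self.stream;
    # an explicit True/False per call overrides the agent-wide default.
    effective_stream = self.stream if stream is None else stream
    messages = [{"role": "user", "content": prompt}]
    return self._chat_completion(
        messages, temperature=temperature, tools=tools, stream=effective_stream
    )
```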

```diff
      logging.debug(f"{self.name} reflection count {reflection_count + 1}, continuing reflection process")
      messages.append({"role": "user", "content": "Now regenerate your response using the reflection you made"})
-     response = self._chat_completion(messages, temperature=temperature, tools=None, stream=stream)
+     response = self._chat_completion(messages, temperature=temperature, tools=None, stream=self.stream)
```

Severity: high

Similar to the previous call in this method, using self.stream here ignores the stream parameter passed to the chat method. This is inconsistent with the method's signature and breaks backward compatibility for per-call overrides.


@cursor cursor bot left a comment


Bug: Agent Chat Stream Parameter Ignored

The Agent.chat method's stream parameter is ignored when using the standard OpenAI client path, as the internal _chat_completion call incorrectly uses self.stream instead of the method's stream argument. This prevents overriding streaming behavior on a per-call basis. This also creates inconsistent behavior, as custom LLM instances correctly respect the stream parameter.

src/praisonai-agents/praisonaiagents/agent/agent.py#L1282-L1378

```python
response = self._chat_completion(messages, temperature=temperature, tools=tools if tools else None, reasoning_steps=reasoning_steps, stream=self.stream)
if not response:
    return None

response_text = response.choices[0].message.content.strip()

# Handle output_json or output_pydantic if specified
if output_json or output_pydantic:
    # Add to chat history and return raw response
    self.chat_history.append({"role": "user", "content": original_prompt})
    self.chat_history.append({"role": "assistant", "content": response_text})
    if self.verbose:
        display_interaction(original_prompt, response_text, markdown=self.markdown,
                            generation_time=time.time() - start_time, console=self.console)
    return response_text

if not self.self_reflect:
    self.chat_history.append({"role": "user", "content": original_prompt})
    self.chat_history.append({"role": "assistant", "content": response_text})
    if self.verbose:
        logging.debug(f"Agent {self.name} final response: {response_text}")
    display_interaction(original_prompt, response_text, markdown=self.markdown, generation_time=time.time() - start_time, console=self.console)
    # Return only reasoning content if reasoning_steps is True
    if reasoning_steps and hasattr(response.choices[0].message, 'reasoning_content'):
        # Apply guardrail to reasoning content
        try:
            validated_reasoning = self._apply_guardrail_with_retry(response.choices[0].message.reasoning_content, original_prompt, temperature, tools)
            return validated_reasoning
        except Exception as e:
            logging.error(f"Agent {self.name}: Guardrail validation failed for reasoning content: {e}")
            return None
    # Apply guardrail to regular response
    try:
        validated_response = self._apply_guardrail_with_retry(response_text, original_prompt, temperature, tools)
        return validated_response
    except Exception as e:
        logging.error(f"Agent {self.name}: Guardrail validation failed: {e}")
        return None

reflection_prompt = f"""
Reflect on your previous response: '{response_text}'.
{self.reflect_prompt if self.reflect_prompt else "Identify any flaws, improvements, or actions."}
Provide a "satisfactory" status ('yes' or 'no').
Output MUST be JSON with 'reflection' and 'satisfactory'.
"""
logging.debug(f"{self.name} reflection attempt {reflection_count+1}, sending prompt: {reflection_prompt}")
messages.append({"role": "user", "content": reflection_prompt})

try:
    reflection_response = client.beta.chat.completions.parse(
        model=self.reflect_llm if self.reflect_llm else self.llm,
        messages=messages,
        temperature=temperature,
        response_format=ReflectionOutput
    )

    reflection_output = reflection_response.choices[0].message.parsed

    if self.verbose:
        display_self_reflection(f"Agent {self.name} self reflection (using {self.reflect_llm if self.reflect_llm else self.llm}): reflection='{reflection_output.reflection}' satisfactory='{reflection_output.satisfactory}'", console=self.console)

    messages.append({"role": "assistant", "content": f"Self Reflection: {reflection_output.reflection} Satisfactory?: {reflection_output.satisfactory}"})

    # Only consider satisfactory after minimum reflections
    if reflection_output.satisfactory == "yes" and reflection_count >= self.min_reflect - 1:
        if self.verbose:
            display_self_reflection("Agent marked the response as satisfactory after meeting minimum reflections", console=self.console)
        self.chat_history.append({"role": "user", "content": prompt})
        self.chat_history.append({"role": "assistant", "content": response_text})
        display_interaction(prompt, response_text, markdown=self.markdown, generation_time=time.time() - start_time, console=self.console)
        # Apply guardrail validation after satisfactory reflection
        try:
            validated_response = self._apply_guardrail_with_retry(response_text, prompt, temperature, tools)
            return validated_response
        except Exception as e:
            logging.error(f"Agent {self.name}: Guardrail validation failed after reflection: {e}")
            return None

    # Check if we've hit max reflections
    if reflection_count >= self.max_reflect - 1:
        if self.verbose:
            display_self_reflection("Maximum reflection count reached, returning current response", console=self.console)
        self.chat_history.append({"role": "user", "content": prompt})
        self.chat_history.append({"role": "assistant", "content": response_text})
        display_interaction(prompt, response_text, markdown=self.markdown, generation_time=time.time() - start_time, console=self.console)
        # Apply guardrail validation after max reflections
        try:
            validated_response = self._apply_guardrail_with_retry(response_text, prompt, temperature, tools)
            return validated_response
        except Exception as e:
            logging.error(f"Agent {self.name}: Guardrail validation failed after max reflections: {e}")
            return None

    logging.debug(f"{self.name} reflection count {reflection_count + 1}, continuing reflection process")
    messages.append({"role": "user", "content": "Now regenerate your response using the reflection you made"})
    response = self._chat_completion(messages, temperature=temperature, tools=None, stream=self.stream)
```




Development

Successfully merging this pull request may close these issues.

Disable Stream response

1 participant