Upstream bug fix MervinPraison/PraisonAI#1538 (closes #1536) — merged 2026-04-24 — changes user-observable behavior in the streaming path. The current docs at docs/features/streaming.mdx do not describe this behavior at all, so users will encounter new in-stream error strings and retry delays with no documentation explaining them.
This is a content update (not a new page). The primary file to update is docs/features/streaming.mdx. No docs/concepts/ changes and no docs.json navigation changes are required.
What Changed in the SDK
File: praisonaiagents/llm/llm.py (PraisonAI repo — src/praisonai-agents/praisonaiagents/llm/llm.py). PR diff summary: 12 additions, 3 deletions, 1 file.
Line ~3404 — follow-up response after tool execution in streaming
Before: bare litellm.completion(...) — no retry; errors were silently logged via logging.error(...) and swallowed, and the stream generator simply ended. Users saw tools run but never received the final synthesised answer.
After: self._completion_with_retry(...) — the same retry-wrapped path used by the initial LLM call (exponential backoff, rate-limit handling via _rate_limiter.acquire() / wait_for_retry()).
Line ~3421 — error surfacing
Before: logging.error(f"Follow-up response failed: {e}") only.
After: structured log with a generated error_ref (followup-<ms-timestamp>) plus a user-visible message yielded into the stream:
[Error: Failed to generate final response after tool execution (ref: followup-1713957912345). Please retry. If it continues, try reducing prompt size.]
Line ~3440 — non-streaming fallback inside get_response_stream
Before: bare litellm.completion(...).
After: self._completion_with_retry(...) — same retry parity as the initial call.
Net effect for users:
Streaming + tool-calling flows now survive transient 429s, 503s, and brief network blips on the follow-up call (previously: silent drop of the final answer).
On persistent failure, an explicit error sentence now appears at the end of the stream with a ref: ID — users/ops will see this and should be told what it means and what to do.
Why Docs Need Updating
The user-facing behavioral contract of agent.start(..., stream=True) and agent.iter_stream(...) has changed in two ways that a reader of the current streaming.mdx would not anticipate:
The page never explains that streaming + tools is a two-phase flow (initial stream → tool execution → follow-up stream). Users who see retry delays or the new error sentence have no reference to understand it.
The page's "Handle errors in callbacks" accordion currently says "The emitter catches callback exceptions silently to avoid breaking the stream." — this is now only half the story: the LLM call itself no longer fails silently, and may yield a visible [Error: ...] sentence into the stream on persistent failure.
Requested Changes — docs/features/streaming.mdx
Please apply the following agent-centric, user-friendly updates. Keep the existing tone (concise, active voice, no forbidden phrases per AGENTS.md). Use the standard Mermaid colour scheme (#8B0000, #189AB4, #10B981, #F59E0B, #6366F1 with white text).
1. Add a new section: "Streaming with Tools"
Place it after the existing "Common Patterns" section and before "StreamEvent Protocol". The section should:
Open with a one-sentence intro (e.g. "When your agent uses tools, streaming happens in two phases: the initial response that decides to call tools, and a follow-up response that synthesises the tool results.").
Include a Mermaid sequence diagram showing: User → Agent → LLM (phase 1, streamed) → Tool(s) → LLM (phase 2 follow-up, streamed) → User. Use the standard colour palette.
Include a minimal, copy-paste-runnable agent-centric code example — an Agent with one simple tool (e.g. get_weather) and agent.start(..., stream=True); see the sketch after this list. Imports must be from praisonaiagents import Agent (friendly, no deep submodule imports per AGENTS.md §6.1).
Add a short prose note (one-two sentences) clarifying that both phases go through the same retry-wrapped LLM path, so transient rate-limit / network errors are retried automatically without the caller doing anything.
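A minimal sketch of what that example could look like — assuming agent.start(..., stream=True) yields text chunks and that plain Python functions can be passed via tools=; the get_weather tool is illustrative, not part of the SDK:

```python
from praisonaiagents import Agent

def get_weather(city: str) -> str:
    """Illustrative tool: return a canned weather report for a city."""
    return f"It is sunny and 22°C in {city}."

agent = Agent(
    instructions="You are a helpful weather assistant",
    tools=[get_weather],
)

# Phase 1 streams the tool-call decision; after get_weather runs,
# phase 2 streams the synthesised answer through the same retry path.
for chunk in agent.start("What is the weather in Paris?", stream=True):
    print(chunk, end="", flush=True)
```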
2. Add a new section: "Error Handling in the Stream"
Place it after "Streaming with Tools" and before "StreamEvent Protocol".
One-sentence intro (e.g. "If the LLM call fails after retries, the stream ends with a visible error sentence instead of silently dropping.").
Show the exact sentinel string the user may receive, verbatim:
[Error: Failed to generate final response after tool execution (ref: followup-1713957912345). Please retry. If it continues, try reducing prompt size.]
Explain each piece in a short table:
| Part | Meaning |
| --- | --- |
| ref: followup-<timestamp> | Correlation ID logged server-side — share this when reporting issues |
| Please retry | Retries already ran internally; another attempt may succeed if the root cause was transient |
| reducing prompt size | Common root cause is context-length or provider capacity errors |
Show a minimal consumer-side pattern that detects the sentinel in iter_stream(...):
```python
from praisonaiagents import Agent

agent = Agent(instructions="You are a helpful assistant", tools=[...])

full = ""
for chunk in agent.iter_stream("Research and summarise X"):
    full += chunk
    print(chunk, end="", flush=True)

if "[Error:" in full and "ref:" in full:
    # Surface ref to your logs / retry externally
    ...
```
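The example could optionally go one step further and extract the correlation ID — a hypothetical extension; the regex assumes the sentinel format shown above:

```python
import re

# Hypothetical follow-up: pull the correlation ID out of the sentinel
# so it can be attached to logs or bug reports.
match = re.search(r"\(ref: (followup-\d+)\)", full)
if match:
    print(f"\nReport this ref when filing an issue: {match.group(1)}")
```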
End with a short <Note> explaining that the initial LLM call and the follow-up LLM call (after tool execution) now share the same retry and rate-limiting behavior — users no longer need to add their own retry wrapper around streaming + tools.
3. Update the existing accordion: "Handle errors in callbacks"
Current text is misleading given the new behavior. Rewrite it to something like:
Two layers of error handling. Callback exceptions are still caught by the emitter to avoid breaking the stream — log them inside your callback. LLM call failures, however, are now retried automatically and, on persistent failure, surface as a visible [Error: ... (ref: ...)] sentence at the end of the stream — check for this sentinel when consuming iter_stream().
4. Update the "Troubleshooting" section
Add one new entry:
"Stream ends with [Error: Failed to generate final response after tool execution (ref: followup-...)]"
The follow-up LLM call (the one that synthesises tool results into a final answer) failed after the built-in retries. Common causes:
Persistent rate limit — pair streaming with a Rate Limiter at higher RPM, or back off the caller.
Context-length overflow — reduce conversation history or tool-result size.
Provider outage — include the ref: ID when reporting. The internal log line (ref=..., model=..., error=...) makes it searchable.
5. Update the "Related" CardGroup at the bottom
Add a third card linking to Rate Limiter, since retry + rate limiting are now explicitly coupled in the follow-up path:
```jsx
<Card title="Rate Limiter" icon="gauge" href="/docs/features/rate-limiter">
  Control request rates across initial and follow-up LLM calls
</Card>
```
Keep cols at 2 or bump to 3 — your call based on layout.
Optional: Light Cross-Link in docs/features/rate-limiter.mdx
At the end of the "Overview" section of docs/features/rate-limiter.mdx, add one sentence:
The rate limiter is shared by both the initial LLM call and the follow-up call that runs after tool execution in streaming mode — you don't need to configure them separately.
No other changes to that page.
Files to Touch
| File | Change type |
| --- | --- |
| docs/features/streaming.mdx | Update — add two new sections, edit one accordion, add one troubleshooting entry, extend Related cards |
| docs/features/rate-limiter.mdx | Optional update — one-sentence cross-link |
| docs.json | No change — pages already registered |
| docs/concepts/* | No change — per AGENTS.md §1.8, AI agents must not edit docs/concepts/ |
SDK Source of Truth (for the doc-writing agent)
Before editing, the implementing agent should read the merged change to verify exact behavior:
get_response_stream(...) — the streaming entry point (follow-up call at ~line 3404, non-streaming fallback at ~line 3440)
_completion_with_retry(...) — the retry wrapper (around line 796)
_call_with_retry(...) — invokes self._rate_limiter.acquire() and exponential backoff via wait_for_retry(); a generic sketch of this pattern follows below
The sync'd mirror inside this repo at praisonaiagents/llm/llm.py can be used as the reference (daily update_repos.sh / .github/workflows/update-repos.yml).
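For orientation, a generic sketch of the retry-wrapper pattern described above. This is not the SDK's actual implementation — the function name, signature, and delays are assumptions for illustration:

```python
import random
import time

def completion_with_retry(call, rate_limiter, max_retries=3, base_delay=1.0):
    """Run call() behind a rate limiter with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        rate_limiter.acquire()   # wait for a request slot (rate limiting)
        try:
            return call()        # e.g. the underlying litellm.completion(...)
        except Exception:        # transient 429 / 503 / network error
            if attempt == max_retries:
                raise            # persistent failure: the caller yields the
                                 # [Error: ... (ref: followup-...)] sentinel
            # exponential backoff with jitter: ~1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```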
Quality Checklist (per AGENTS.md §9)
The implementing agent must confirm:
All new Mermaid diagrams use the standard palette (#8B0000 / #189AB4 / #10B981 / #F59E0B / #6366F1) with color:#fff and stroke:#7C90A0
Code examples use from praisonaiagents import Agent (no deep submodule paths)
Every example runs copy-paste without modification (no placeholder your-key-here)
No forbidden phrases ("In this section...", "As you can see...", "Let's take a look at...")
The exact error sentinel string is reproduced verbatim, including (ref: followup-<timestamp>)
No files created/edited under docs/concepts/
docs.json remains valid JSON (only touch if navigation actually changes)
Summary
This is a targeted content update, not a new page:
Primary: extend docs/features/streaming.mdx with two new sections (Streaming with Tools, Error Handling in the Stream), one rewritten accordion, one troubleshooting entry, one extra Related card.
Optional: one-sentence cross-link in docs/features/rate-limiter.mdx.
Goal: users who hit the new [Error: ... ref: followup-...] sentinel or notice retry delays on streaming-with-tools can find an authoritative explanation in the docs.