Upstream bug fix MervinPraison/PraisonAI#1538 (closes #1536) — merged 2026-04-24 — changes user-observable behavior in the streaming path. The current docs at docs/features/streaming.mdx do not describe this behavior at all, so users will encounter new in-stream error strings and retry delays with no documentation explaining them.
This is a content update (not a new page). The primary file to update is docs/features/streaming.mdx. No docs/concepts/ changes and no docs.json navigation changes are required.
What Changed in the SDK
File: praisonaiagents/llm/llm.py (PraisonAI repo — src/praisonai-agents/praisonaiagents/llm/llm.py). PR diff summary: 12 additions, 3 deletions, 1 file.
Line ~3404 — follow-up response after tool execution in streaming
Before: bare litellm.completion(...) — no retry; errors were silently logged via logging.error(...) and swallowed, and the stream generator simply ended. Users saw tools run but never received the final synthesised answer.
After: self._completion_with_retry(...) — the same retry-wrapped path used by the initial LLM call (exponential backoff, rate-limit handling via _rate_limiter.acquire() / wait_for_retry()).
Line ~3421 — error surfacing
Before: logging.error(f"Follow-up response failed: {e}") only.
After: structured log with a generated error_ref (followup-<ms-timestamp>) plus a user-visible message yielded into the stream:
[Error: Failed to generate final response after tool execution (ref: followup-1713957912345). Please retry. If it continues, try reducing prompt size.]
Line ~3440 — non-streaming fallback inside get_response_stream
Before: bare litellm.completion(...).
After: self._completion_with_retry(...) — same retry parity as the initial call.
Net effect for users:
Streaming + tool-calling flows now survive transient 429s, 503s, and brief network blips on the follow-up call (previously: silent drop of the final answer).
On persistent failure, an explicit error sentence now appears at the end of the stream with a ref: ID — users/ops will see this and should be told what it means and what to do.
Why Docs Need Updating
The user-facing behavioral contract of agent.start(..., stream=True) and agent.iter_stream(...) has changed in two ways that a reader of the current streaming.mdx would not anticipate:
The page never explains that streaming + tools is a two-phase flow (initial stream → tool execution → follow-up stream). Users who see retry delays or the new error sentence have no reference to understand it.
The page's "Handle errors in callbacks" accordion currently says "The emitter catches callback exceptions silently to avoid breaking the stream." — this is now only half the story: the LLM call itself no longer fails silently, and may yield a visible [Error: ...] sentence into the stream on persistent failure.
Requested Changes — docs/features/streaming.mdx
Please apply the following agent-centric, user-friendly updates. Keep the existing tone (concise, active voice, no forbidden phrases per AGENTS.md). Use the standard Mermaid colour scheme (#8B0000, #189AB4, #10B981, #F59E0B, #6366F1 with white text).
1. Add a new section: "Streaming with Tools"
Place it after the existing "Common Patterns" section and before "StreamEvent Protocol". The section should:
Open with a one-sentence intro (e.g. "When your agent uses tools, streaming happens in two phases: the initial response that decides to call tools, and a follow-up response that synthesises the tool results.").
Include a Mermaid sequence diagram showing: User → Agent → LLM (phase 1, streamed) → Tool(s) → LLM (phase 2 follow-up, streamed) → User. Use the standard colour palette.
Include a minimal, copy-paste-runnable agent-centric code example — an Agent with one simple tool (e.g. get_weather) and agent.start(..., stream=True); see the sketch after this list. Imports must be from praisonaiagents import Agent (friendly, no deep submodule imports per AGENTS.md §6.1).
Add a short prose note (one-two sentences) clarifying that both phases go through the same retry-wrapped LLM path, so transient rate-limit / network errors are retried automatically without the caller doing anything.
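A minimal sketch of what that example could look like — assuming agent.start(..., stream=True) yields text chunks and that plain Python functions can be passed via tools=; the get_weather tool is illustrative, not part of the SDK:

```python
from praisonaiagents import Agent

def get_weather(city: str) -> str:
    """Illustrative tool: return a canned weather report for a city."""
    return f"It is sunny and 22°C in {city}."

agent = Agent(
    instructions="You are a helpful weather assistant",
    tools=[get_weather],
)

# Phase 1 streams the tool-call decision; after get_weather runs,
# phase 2 streams the synthesised answer through the same retry path.
for chunk in agent.start("What is the weather in Paris?", stream=True):
    print(chunk, end="", flush=True)
```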
2. Add a new section: "Error Handling in the Stream"
Place it after "Streaming with Tools" and before "StreamEvent Protocol".
One-sentence intro (e.g. "If the LLM call fails after retries, the stream ends with a visible error sentence instead of silently dropping.").
Show the exact sentinel string the user may receive, verbatim:
[Error: Failed to generate final response after tool execution (ref: followup-1713957912345). Please retry. If it continues, try reducing prompt size.]
Explain each piece in a short table:
| Part | Meaning |
| --- | --- |
| ref: followup-<timestamp> | Correlation ID logged server-side — share this when reporting issues |
| Please retry | Retries already ran internally; another attempt may succeed if the root cause was transient |
| reducing prompt size | Common root cause is context-length or provider capacity errors |
Show a minimal consumer-side pattern that detects the sentinel in iter_stream(...):
```python
from praisonaiagents import Agent

agent = Agent(instructions="You are a helpful assistant", tools=[...])

full = ""
for chunk in agent.iter_stream("Research and summarise X"):
    full += chunk
    print(chunk, end="", flush=True)

if "[Error:" in full and "ref:" in full:
    # Surface ref to your logs / retry externally
    ...
```
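The example could optionally go one step further and extract the correlation ID — a hypothetical extension; the regex assumes the sentinel format shown above:

```python
import re

# Hypothetical follow-up: pull the correlation ID out of the sentinel
# so it can be attached to logs or bug reports.
match = re.search(r"\(ref: (followup-\d+)\)", full)
if match:
    print(f"\nReport this ref when filing an issue: {match.group(1)}")
```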
End with a short <Note> explaining that the initial LLM call and the follow-up LLM call (after tool execution) now share the same retry and rate-limiting behavior — users no longer need to add their own retry wrapper around streaming + tools.
3. Update the existing accordion: "Handle errors in callbacks"
Current text is misleading given the new behavior. Rewrite it to something like:
Two layers of error handling. Callback exceptions are still caught by the emitter to avoid breaking the stream — log them inside your callback. LLM call failures, however, are now retried automatically and, on persistent failure, surface as a visible [Error: ... (ref: ...)] sentence at the end of the stream — check for this sentinel when consuming iter_stream().
4. Update the "Troubleshooting" section
Add one new entry:
"Stream ends with [Error: Failed to generate final response after tool execution (ref: followup-...)]"
The follow-up LLM call (the one that synthesises tool results into a final answer) failed after the built-in retries. Common causes:
Persistent rate limit — pair streaming with a Rate Limiter at higher RPM, or back off the caller.
Context-length overflow — reduce conversation history or tool-result size.
Provider outage — include the ref: ID when reporting. The internal log line (ref=..., model=..., error=...) makes it searchable.
5. Update the "Related" CardGroup at the bottom
Add a third card linking to Rate Limiter, since retry + rate limiting are now explicitly coupled in the follow-up path:
```jsx
<Card title="Rate Limiter" icon="gauge" href="/docs/features/rate-limiter">
  Control request rates across initial and follow-up LLM calls
</Card>
```
Keep cols at 2 or bump to 3 — your call based on layout.
Optional: Light Cross-Link in docs/features/rate-limiter.mdx
At the end of the "Overview" section of docs/features/rate-limiter.mdx, add one sentence:
The rate limiter is shared by both the initial LLM call and the follow-up call that runs after tool execution in streaming mode — you don't need to configure them separately.
No other changes to that page.
Files to Touch
| File | Change type |
| --- | --- |
| docs/features/streaming.mdx | Update — add two new sections, edit one accordion, add one troubleshooting entry, extend Related cards |
| docs/features/rate-limiter.mdx | Optional update — one-sentence cross-link |
| docs.json | No change — pages already registered |
| docs/concepts/* | No change — per AGENTS.md §1.8, AI agents must not edit docs/concepts/ |
SDK Source of Truth (for the doc-writing agent)
Before editing, the implementing agent should read the merged change to verify exact behavior:
get_response_stream(...) — the streaming entry point (follow-up call at ~line 3404, non-streaming fallback at ~line 3440)
_completion_with_retry(...) — the retry wrapper (around line 796)
_call_with_retry(...) — invokes self._rate_limiter.acquire() and exponential backoff via wait_for_retry(); a generic sketch of this pattern follows below
The sync'd mirror inside this repo at praisonaiagents/llm/llm.py can be used as the reference (daily update_repos.sh / .github/workflows/update-repos.yml).
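For orientation, a generic sketch of the retry-wrapper pattern described above. This is not the SDK's actual implementation — the function name, signature, and delays are assumptions for illustration:

```python
import random
import time

def completion_with_retry(call, rate_limiter, max_retries=3, base_delay=1.0):
    """Run call() behind a rate limiter with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        rate_limiter.acquire()   # wait for a request slot (rate limiting)
        try:
            return call()        # e.g. the underlying litellm.completion(...)
        except Exception:        # transient 429 / 503 / network error
            if attempt == max_retries:
                raise            # persistent failure: the caller yields the
                                 # [Error: ... (ref: followup-...)] sentinel
            # exponential backoff with jitter: ~1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```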
Quality Checklist (per AGENTS.md §9)
The implementing agent must confirm:
All new Mermaid diagrams use the standard palette (#8B0000 / #189AB4 / #10B981 / #F59E0B / #6366F1) with color:#fff and stroke:#7C90A0
Code examples use from praisonaiagents import Agent (no deep submodule paths)
Every example runs copy-paste without modification (no placeholder your-key-here)
No forbidden phrases ("In this section...", "As you can see...", "Let's take a look at...")
The exact error sentinel string is reproduced verbatim, including (ref: followup-<timestamp>)
No files created/edited under docs/concepts/
docs.json remains valid JSON (only touch if navigation actually changes)
Summary
This is a targeted content update, not a new page:
Primary: extend docs/features/streaming.mdx with two new sections (Streaming with Tools, Error Handling in the Stream), one rewritten accordion, one troubleshooting entry, one extra Related card.
Optional: one-sentence cross-link in docs/features/rate-limiter.mdx.
Goal: users who hit the new [Error: ... ref: followup-...] sentinel or notice retry delays on streaming-with-tools can find an authoritative explanation in the docs.