docs: Document built-in Tool Circuit Breaker and update Model Failover for PR #1539 wiring #256

Merged
MervinPraison merged 1 commit into main from claude/issue-250-20260424-1015 on Apr 28, 2026
Conversation

@MervinPraison
Owner

Fixes #250

This PR implements comprehensive documentation for the Tool Circuit Breaker and Model Failover features that were wired in PR #1539.

Changes

New Documentation

  • docs/features/tool-circuit-breaker.mdx - Complete documentation for the built-in tool circuit breaker
    • Agent-centric Quick Start showing zero-config operation
    • Hero Mermaid diagram with standard color palette
    • State diagram for CLOSED/OPEN/HALF_OPEN transitions
    • Configuration options table matching actual SDK defaults
    • Decision tree diagram for tuning guidance
    • Common patterns with Tabs component
    • Best practices with AccordionGroup component

Updated Documentation

  • docs/features/model-failover.mdx - Fixed and enhanced existing failover docs
    • FIXED: Replaced incorrect Agent(failover=manager) with proper LLM(failover_manager=manager) pattern
    • ADDED: New section "How failover activates during retries" with sequence diagram
    • CORRECTED: AuthProfile and FailoverConfig fields to match actual SDK implementation
    • UPDATED: Related CardGroup to include new Tool Circuit Breaker page

Cross-Links Added

  • Added Note boxes to 4 best-practices pages linking to built-in circuit breaker
  • Updated circuit breaker config examples in llm-config.mdx and tool-config.mdx

Navigation

  • Added docs/features/tool-circuit-breaker to Features > Integration & Infrastructure group in docs.json
  • Validated JSON syntax

Generated with Claude Code

…r for PR #1539 wiring (fixes #250)

- Create comprehensive tool-circuit-breaker.mdx following AGENTS.md template
- Fix incorrect Quick Start example in model-failover.mdx (Agent doesn't have failover param)
- Add new section on failover activation during retries with sequence diagram
- Correct AuthProfile and FailoverConfig fields to match actual SDK
- Add cross-links to built-in circuit breaker from 4 best-practices pages
- Update circuit breaker config examples in llm-config.mdx and tool-config.mdx
- Add tool-circuit-breaker to Features > Integration & Infrastructure group in docs.json

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 10:22
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@MervinPraison has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 51 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 51 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13e0cd9e-f9a8-4699-8976-db93545ff267

📥 Commits

Reviewing files that changed from the base of the PR and between 9480b78 and a78ed5c.

📒 Files selected for processing (9)
  • docs.json
  • docs/best-practices/agent-retry-strategies.mdx
  • docs/best-practices/error-handling.mdx
  • docs/best-practices/graceful-degradation.mdx
  • docs/best-practices/task-orchestration.mdx
  • docs/configuration/llm-config.mdx
  • docs/configuration/tool-config.mdx
  • docs/features/model-failover.mdx
  • docs/features/tool-circuit-breaker.mdx


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces comprehensive documentation for the new 'Tool Circuit Breaker' feature and significantly updates the 'Model Failover' documentation. The changes include a new feature page, updated configuration guides, and sequence diagrams explaining the failover and circuit breaker logic. However, the review feedback highlights several critical discrepancies where the documentation describes automatic behaviors (such as automatic failover integration and tool circuit breaker wiring) or configuration options (like 'rotate_on_success') that are reportedly missing or inconsistent in the underlying SDK implementation. Additionally, there are logic mismatches between the documented state transitions and the actual code behavior for the circuit breaker's half-open state and graceful degradation settings.

Comment on lines +135 to +139
- On every LLM call, the system first gets the current profile via `get_next_profile()` and applies its `api_key`, `base_url`, and `model` settings
- On success, `mark_success(profile)` is called to track the working provider
- On failure, `mark_failure(profile, error, is_rate_limit=...)` marks the provider as failed, then `get_next_profile()` fetches the next available provider
- Profile switching **overrides** non-retryable classification—one extra attempt is always granted after switching providers
- The LLM automatically updates request parameters (api_key, base_url, model) when switching between profiles

high

The documentation describes an automatic failover mechanism integrated into every LLM call, including the claim that profile switching overrides non-retryable classifications. However, the provided LLM class implementation in praisonaiagents/llm/llm.py does not appear to use the FailoverManager or implement this logic in its request methods (e.g., get_response, _call_with_retry). This results in a significant discrepancy between the documented features and the actual SDK implementation.
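
For reference, the flow the excerpt describes could look roughly like the sketch below. It is a toy model, not the SDK: `Profile`, `ToyFailoverManager`, `call_with_failover`, and `fake_send` are hypothetical names, and only the method names `get_next_profile`, `mark_success`, and `mark_failure` are taken from the quoted documentation lines. The rate-limit check on the error string is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical stand-in for an auth profile; fields follow the doc excerpt.
    name: str
    api_key: str
    base_url: str
    model: str
    failed: bool = False


@dataclass
class ToyFailoverManager:
    # Illustrative only; this is not the SDK's FailoverManager.
    profiles: list
    log: list = field(default_factory=list)

    def get_next_profile(self):
        # First profile, in priority order, that has not been marked failed.
        return next((p for p in self.profiles if not p.failed), None)

    def mark_success(self, profile):
        self.log.append(("success", profile.name))

    def mark_failure(self, profile, error, is_rate_limit=False):
        profile.failed = True
        self.log.append(("failure", profile.name))


def call_with_failover(manager, send):
    """Try each available profile in turn until one succeeds."""
    last_error = None
    while (profile := manager.get_next_profile()) is not None:
        try:
            result = send(profile)
        except Exception as exc:
            last_error = exc
            manager.mark_failure(profile, exc, is_rate_limit="429" in str(exc))
            continue
        manager.mark_success(profile)
        return result
    raise RuntimeError("all profiles failed") from last_error


def fake_send(profile):
    # Simulated provider call: the primary is rate limited, the backup works.
    if profile.name == "primary":
        raise RuntimeError("429 rate limit")
    return f"ok from {profile.model}"


manager = ToyFailoverManager([
    Profile("primary", "key-a", "https://primary.invalid", "model-a"),
    Profile("backup", "key-b", "https://backup.invalid", "model-b"),
])
result = call_with_failover(manager, fake_send)
```

A request path wired this way would make the documented behavior observable: the failure on the first profile is recorded, and the call is retried once against the next profile.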

Comment on lines +40 to +49
Circuit breaker protection is automatically enabled for every tool call with zero configuration needed.

```python
from praisonaiagents import Agent

agent = Agent(
    name="Researcher",
    instructions="Research the topic",
    tools=[my_tool],  # Circuit breaker protects my_tool automatically
)
```

high

The documentation states that circuit breaker protection is "automatically enabled for every tool call with zero configuration needed." However, the provided implementation of tool execution in praisonaiagents/agent/tool_execution.py and praisonaiagents/llm/llm.py does not show any integration with the CircuitBreaker class. If this wiring is intended to be automatic, it should be implemented within the framework's tool execution flow (e.g., in ToolCallExecutor or by wrapping the tool execution function).
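
One way to picture "wrapping the tool execution function" is a decorator applied to each tool before it is handed to the executor. The sketch below is only an illustration under that assumption: `with_circuit_breaker`, `CircuitOpenError`, and `flaky_tool` are hypothetical names, and the breaker is deliberately reduced to a consecutive-failure counter with no recovery path.

```python
import functools


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


def with_circuit_breaker(failure_threshold=5):
    # Hypothetical decorator; the name and parameter are not SDK API.
    def decorator(tool):
        state = {"failures": 0}

        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            # Reject immediately once the threshold has been reached.
            if state["failures"] >= failure_threshold:
                raise CircuitOpenError(f"{tool.__name__} circuit is open")
            try:
                result = tool(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                raise
            state["failures"] = 0  # a success resets the consecutive-failure count
            return result

        return wrapper

    return decorator


@with_circuit_breaker(failure_threshold=2)
def flaky_tool(query):
    # Simulated tool that always fails.
    raise ConnectionError("upstream unavailable")


outcomes = []
for _ in range(3):
    try:
        flaky_tool("q")
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")
```

With a threshold of 2, the first two calls reach the tool and fail, and the third is rejected without calling it. A real integration would also need the OPEN to HALF_OPEN recovery that this sketch omits.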

| `failover_on_timeout` | `bool` | `True` | Failover on timeout |
| `cooldown_on_rate_limit` | `float` | `60.0` | Rate limit cooldown (seconds) |
| `cooldown_on_error` | `float` | `30.0` | Error cooldown (seconds) |
| `rotate_on_success` | `bool` | `False` | Rotate profiles on success |

medium

The rotate_on_success configuration option is documented here, but it is not currently implemented in the FailoverManager class in praisonaiagents/llm/failover.py. The manager always returns the first available profile in priority order without considering rotation upon successful calls. Please ensure the implementation is updated to support this feature.
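
To make the difference concrete, here is a toy comparison of "always return the first available profile" against one possible reading of `rotate_on_success` (round-robin after each successful call). `ToyRotatingManager` and `run` are hypothetical, and the rotation semantics are an assumption, since the documented table gives only a one-line description of the option.

```python
class ToyRotatingManager:
    # Hypothetical; illustrates one possible meaning of rotate_on_success.
    def __init__(self, profiles, rotate_on_success=False):
        self.profiles = list(profiles)
        self.rotate_on_success = rotate_on_success
        self._start = 0

    def get_next_profile(self):
        return self.profiles[self._start]

    def mark_success(self, profile):
        if self.rotate_on_success:
            # Advance so the next call starts from the following profile.
            self._start = (self.profiles.index(profile) + 1) % len(self.profiles)


def run(manager, calls):
    """Simulate `calls` successful requests and record which profile served each."""
    used = []
    for _ in range(calls):
        profile = manager.get_next_profile()
        used.append(profile)
        manager.mark_success(profile)
    return used


sticky = run(ToyRotatingManager(["a", "b", "c"]), 4)
rotating = run(ToyRotatingManager(["a", "b", "c"], rotate_on_success=True), 4)
```

Without rotation every call is served by profile `a`; with rotation the calls cycle through `a`, `b`, `c`, and back to `a`.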

CLOSED --> OPEN : 5 failures
OPEN --> HALF_OPEN : 60s elapsed
HALF_OPEN --> CLOSED : 2 successes
HALF_OPEN --> OPEN : failure

medium

The state diagram indicates that a single failure in the HALF_OPEN state trips the circuit back to OPEN. However, the implementation in praisonaiagents/tools/circuit_breaker.py (lines 323-325) requires the failure_threshold to be met even in the HALF_OPEN state. This means if the monitor_window has expired, the circuit might incorrectly remain in HALF_OPEN after a failure. Please ensure the implementation matches the standard circuit breaker behavior documented here.
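
The transitions in the diagram can be written down as a small state machine, which makes the HALF_OPEN behavior easy to check. This is a sketch of the documented diagram, not of `circuit_breaker.py`: `ToyBreaker`, `recovery_timeout`, and `success_threshold` are hypothetical names, the clock is passed in explicitly, and `monitor_window` is ignored (failures are counted consecutively).

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class ToyBreaker:
    # Illustrative state machine; thresholds mirror the diagram, not the SDK source.
    def __init__(self, failure_threshold=5, recovery_timeout=60.0, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def _open(self, now):
        self.state = State.OPEN
        self.opened_at = now
        self.failures = 0
        self.successes = 0

    def tick(self, now):
        # OPEN -> HALF_OPEN once the recovery timeout has elapsed.
        if self.state is State.OPEN and now - self.opened_at >= self.recovery_timeout:
            self.state = State.HALF_OPEN

    def record_failure(self, now):
        if self.state is State.HALF_OPEN:
            # A single failure while probing reopens the circuit.
            self._open(now)
        elif self.state is State.CLOSED:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._open(now)

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED
                self.failures = 0
                self.successes = 0
        elif self.state is State.CLOSED:
            self.failures = 0


breaker = ToyBreaker()
trace = []
for _ in range(5):
    breaker.record_failure(now=0.0)
trace.append(breaker.state)   # after 5 failures
breaker.tick(now=60.0)
trace.append(breaker.state)   # after the recovery timeout
breaker.record_failure(now=61.0)
trace.append(breaker.state)   # after one failure while half-open
breaker.tick(now=121.0)
breaker.record_success()
breaker.record_success()
trace.append(breaker.state)   # after two successes while half-open
```

The key line for this comment is the HALF_OPEN branch of `record_failure`: it reopens the circuit unconditionally instead of consulting `failure_threshold`.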

| `monitor_window` | `float` | `300.0` | Failure-rate window |
| `enable_health_check` | `bool` | `True` | Periodic health checks |
| `health_check_interval` | `float` | `30.0` | Health-check interval |
| `graceful_degradation` | `bool` | `True` | Return error dict instead of raising |

medium

The description for graceful_degradation states it will "Return error dict instead of raising", but the implementation in CircuitBreaker.call (in praisonaiagents/tools/circuit_breaker.py) still raises a CircuitBreakerException if no fallback function is provided, even if graceful_degradation is set to True. Consider updating the implementation to return a standardized error dictionary when no fallback is available and this option is enabled, matching the examples provided earlier in the document.
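
A sketch of the suggested behavior, assuming the order of precedence is fallback first, then error dictionary, then exception. `guarded_call` and `CircuitOpenError` are hypothetical names, and the keys of the returned dictionary are illustrative rather than the SDK's schema.

```python
class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and nothing else can be returned."""


def guarded_call(circuit_open, func, fallback=None, graceful_degradation=True):
    # Hypothetical helper; the error-dict keys are illustrative only.
    if not circuit_open:
        return func()
    if fallback is not None:
        # A caller-supplied fallback takes priority over the error dict.
        return fallback()
    if graceful_degradation:
        return {"success": False, "error": "circuit_open"}
    raise CircuitOpenError("circuit is open and no fallback was provided")


live = guarded_call(False, lambda: "live")
cached = guarded_call(True, lambda: "live", fallback=lambda: "cached")
degraded = guarded_call(True, lambda: "live")
```

Only the last branch raises, so callers that leave `graceful_degradation` enabled never see an exception from an open circuit.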


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@MervinPraison MervinPraison merged commit ecbbb0a into main Apr 28, 2026
30 checks passed
@MervinPraison MervinPraison deleted the claude/issue-250-20260424-1015 branch April 28, 2026 07:03