docs: Document built-in Tool Circuit Breaker and update Model Failover for PR #1539 wiring #256
…r for PR #1539 wiring (fixes #250)

- Create comprehensive tool-circuit-breaker.mdx following AGENTS.md template
- Fix incorrect Quick Start example in model-failover.mdx (Agent doesn't have failover param)
- Add new section on failover activation during retries with sequence diagram
- Correct AuthProfile and FailoverConfig fields to match actual SDK
- Add cross-links to built-in circuit breaker from 4 best-practices pages
- Update circuit breaker config examples in llm-config.mdx and tool-config.mdx
- Add tool-circuit-breaker to Features > Integration & Infrastructure group in docs.json

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Code Review
This pull request introduces comprehensive documentation for the new 'Tool Circuit Breaker' feature and significantly updates the 'Model Failover' documentation. The changes include a new feature page, updated configuration guides, and sequence diagrams explaining the failover and circuit breaker logic. However, the review feedback highlights several critical discrepancies where the documentation describes automatic behaviors (such as automatic failover integration and tool circuit breaker wiring) or configuration options (like 'rotate_on_success') that are reportedly missing or inconsistent in the underlying SDK implementation. Additionally, there are logic mismatches between the documented state transitions and the actual code behavior for the circuit breaker's half-open state and graceful degradation settings.
- On every LLM call, the system first gets the current profile via `get_next_profile()` and applies its `api_key`, `base_url`, and `model` settings
- On success, `mark_success(profile)` is called to track the working provider
- On failure, `mark_failure(profile, error, is_rate_limit=...)` marks the provider as failed, then `get_next_profile()` fetches the next available provider
- Profile switching **overrides** non-retryable classification: one extra attempt is always granted after switching providers
- The LLM automatically updates request parameters (`api_key`, `base_url`, `model`) when switching between profiles
The documentation describes an automatic failover mechanism integrated into every LLM call, including the claim that profile switching overrides non-retryable classifications. However, the provided LLM class implementation in praisonaiagents/llm/llm.py does not appear to use the FailoverManager or implement this logic in its request methods (e.g., get_response, _call_with_retry). This results in a significant discrepancy between the documented features and the actual SDK implementation.
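For reference while resolving this comment, here is a minimal sketch of the loop the documentation describes: pick a profile with `get_next_profile()`, call the provider, then report the outcome with `mark_success()` or `mark_failure()`. It is not the `praisonaiagents` code. `SketchFailoverManager`, `call_with_failover`, the `cooldown_until` field and the substring check for rate limits are hypothetical and exist only for illustration. Only the three method names and the `api_key`/`base_url`/`model` fields come from the quoted doc text.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AuthProfile:
    # Hypothetical stand-in; fields mirror those named in the doc text
    name: str
    api_key: str
    base_url: Optional[str] = None
    model: Optional[str] = None
    priority: int = 0
    cooldown_until: float = float("-inf")  # hypothetical: available until marked failed


class SketchFailoverManager:
    """Hypothetical manager exposing the three methods the doc describes."""

    def __init__(self, profiles: List[AuthProfile],
                 cooldown_on_error: float = 30.0,
                 cooldown_on_rate_limit: float = 60.0):
        self.profiles = sorted(profiles, key=lambda p: p.priority)
        self.cooldown_on_error = cooldown_on_error
        self.cooldown_on_rate_limit = cooldown_on_rate_limit
        self.last_success: Optional[str] = None

    def get_next_profile(self) -> Optional[AuthProfile]:
        # First profile in priority order that is not cooling down
        now = time.monotonic()
        for profile in self.profiles:
            if now >= profile.cooldown_until:
                return profile
        return None

    def mark_success(self, profile: AuthProfile) -> None:
        self.last_success = profile.name

    def mark_failure(self, profile: AuthProfile, error: Exception,
                     is_rate_limit: bool = False) -> None:
        # Put the failed profile on cooldown so the next lookup skips it
        cooldown = self.cooldown_on_rate_limit if is_rate_limit else self.cooldown_on_error
        profile.cooldown_until = time.monotonic() + cooldown


def call_with_failover(manager: SketchFailoverManager,
                       call: Callable[[AuthProfile], str]) -> str:
    """Try profiles in order; `call` applies api_key/base_url/model itself."""
    last_error: Optional[Exception] = None
    while True:
        profile = manager.get_next_profile()
        if profile is None:
            raise RuntimeError("no provider profile available") from last_error
        try:
            result = call(profile)
        except Exception as exc:  # in this sketch, every error triggers a switch
            last_error = exc
            manager.mark_failure(profile, exc,
                                 is_rate_limit="rate limit" in str(exc).lower())
            continue
        manager.mark_success(profile)
        return result
```

Every exception switches providers here, which roughly matches "profile switching overrides non-retryable classification". A real integration would more likely classify errors first and thread this loop through the LLM's existing retry path. The loop ends because each failed profile goes on cooldown. With a zero cooldown it could spin.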
Circuit breaker protection is automatically enabled for every tool call with zero configuration needed.

```python
from praisonaiagents import Agent


def my_tool(query: str) -> str:
    """Example tool; placeholder body for illustration."""
    return f"Results for {query}"


agent = Agent(
    name="Researcher",
    instructions="Research the topic",
    tools=[my_tool],  # Circuit breaker protects my_tool automatically
)
```
The documentation states that circuit breaker protection is "automatically enabled for every tool call with zero configuration needed." However, the provided implementation of tool execution in praisonaiagents/agent/tool_execution.py and praisonaiagents/llm/llm.py does not show any integration with the CircuitBreaker class. If this wiring is intended to be automatic, it should be implemented within the framework's tool execution flow (e.g., in ToolCallExecutor or by wrapping the tool execution function).
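The comment suggests one option: wrap the tool execution function. Here is a minimal sketch of that approach. It is a hypothetical decorator, not part of `praisonaiagents`. `with_circuit_breaker`, `ToolCircuitOpen` and the injectable `clock` are illustrative names. The sketch keeps one failure counter per wrapped tool, short-circuits calls while the circuit is open, and allows a trial call once `recovery_timeout` has elapsed.

```python
import time
from functools import wraps


class ToolCircuitOpen(Exception):
    """Hypothetical error raised while a tool's circuit is open."""


def with_circuit_breaker(failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
    """Hypothetical decorator: each wrapped tool gets its own breaker state."""
    def decorator(tool):
        state = {"failures": 0, "opened_at": None}

        @wraps(tool)  # keeps __name__/__doc__ so tool introspection still works
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if clock() - state["opened_at"] < recovery_timeout:
                    raise ToolCircuitOpen(f"{tool.__name__} is temporarily disabled")
                state["opened_at"] = None  # timeout elapsed: allow one trial call
            try:
                result = tool(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                # The counter is not reset on timeout, so a failed trial reopens at once
                if state["failures"] >= failure_threshold:
                    state["opened_at"] = clock()
                raise
            state["failures"] = 0  # any success closes the circuit again
            return result
        return wrapper
    return decorator
```

If the framework applied something like this inside its tool executor, the zero-configuration claim would hold for every tool. In the sketch you would have to wrap each tool yourself before passing it in `tools=[...]`.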
| Option | Type | Default | Description |
|---|---|---|---|
| `failover_on_timeout` | `bool` | `True` | Failover on timeout |
| `cooldown_on_rate_limit` | `float` | `60.0` | Rate limit cooldown (seconds) |
| `cooldown_on_error` | `float` | `30.0` | Error cooldown (seconds) |
| `rotate_on_success` | `bool` | `False` | Rotate profiles on success |
The rotate_on_success configuration option is documented here, but it is not currently implemented in the FailoverManager class in praisonaiagents/llm/failover.py. The manager always returns the first available profile in priority order without considering rotation upon successful calls. Please ensure the implementation is updated to support this feature.
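This is a minimal sketch of the behavior `rotate_on_success` might have. The comment says the current manager always returns the first available profile. Here, with rotation enabled, each lookup starts just after the profile that last succeeded. `ProfileSelector`, `Profile` and the round-robin interpretation are assumptions. The SDK may intend a different rotation policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    # Hypothetical minimal profile for illustration
    name: str
    priority: int = 0
    available: bool = True


class ProfileSelector:
    """Hypothetical: priority order by default, round-robin when rotating."""

    def __init__(self, profiles: List[Profile], rotate_on_success: bool = False):
        self.profiles = sorted(profiles, key=lambda p: p.priority)
        self.rotate_on_success = rotate_on_success
        self._start = 0  # index where the next search begins

    def get_next_profile(self) -> Optional[Profile]:
        n = len(self.profiles)
        for offset in range(n):
            profile = self.profiles[(self._start + offset) % n]
            if profile.available:
                return profile
        return None

    def mark_success(self, profile: Profile) -> None:
        if self.rotate_on_success:
            # Begin the next search just after the profile that succeeded
            self._start = (self.profiles.index(profile) + 1) % len(self.profiles)
```

With the default `False`, `_start` never moves, so selection stays "first available in priority order". That matches what the comment says the implementation does today.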
- `CLOSED --> OPEN : 5 failures`
- `OPEN --> HALF_OPEN : 60s elapsed`
- `HALF_OPEN --> CLOSED : 2 successes`
- `HALF_OPEN --> OPEN : failure`
The state diagram indicates that a single failure in the HALF_OPEN state trips the circuit back to OPEN. However, the implementation in praisonaiagents/tools/circuit_breaker.py (lines 323-325) requires the failure_threshold to be met even in the HALF_OPEN state. This means if the monitor_window has expired, the circuit might incorrectly remain in HALF_OPEN after a failure. Please ensure the implementation matches the standard circuit breaker behavior documented here.
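The sketch below implements the four transitions in the diagram. In particular, a single failure in `HALF_OPEN` trips the circuit back to `OPEN` without consulting the failure threshold. This is an illustration, not the code in `circuit_breaker.py`. `Breaker`, `State` and the injectable `clock` are hypothetical. Failures are counted consecutively, so `monitor_window` is ignored.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Breaker:
    """Hypothetical breaker following the documented state diagram."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0,
                 success_threshold=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.recovery_timeout:
            self.state = State.HALF_OPEN  # OPEN --> HALF_OPEN : timeout elapsed
            self.successes = 0
        return self.state is not State.OPEN

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED  # HALF_OPEN --> CLOSED : enough successes
                self.failures = 0
        else:
            self.failures = 0  # consecutive-failure counting in CLOSED

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._trip()  # HALF_OPEN --> OPEN : any single failure
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()  # CLOSED --> OPEN : threshold reached

    def _trip(self) -> None:
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.successes = 0
```

Because the half-open check happens before any threshold logic, a failed probe reopens the circuit whether or not a monitoring window has expired.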
| Option | Type | Default | Description |
|---|---|---|---|
| `monitor_window` | `float` | `300.0` | Failure-rate window |
| `enable_health_check` | `bool` | `True` | Periodic health checks |
| `health_check_interval` | `float` | `30.0` | Health-check interval |
| `graceful_degradation` | `bool` | `True` | Return error dict instead of raising |
The description for graceful_degradation states it will "Return error dict instead of raising", but the implementation in CircuitBreaker.call (in praisonaiagents/tools/circuit_breaker.py) still raises a CircuitBreakerException if no fallback function is provided, even if graceful_degradation is set to True. Consider updating the implementation to return a standardized error dictionary when no fallback is available and this option is enabled, matching the examples provided earlier in the document.
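This minimal sketch shows the decision order the reviewer proposes for an open circuit: use the fallback if one exists, otherwise return an error dict when `graceful_degradation` is on, otherwise raise. `call_guarded` and `CircuitOpenError` are hypothetical; the latter stands in for `CircuitBreakerException`. The dict keys are illustrative only and may differ from the examples in the actual docs page. The breaker's open or closed state is passed in as a boolean to isolate this logic.

```python
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Hypothetical stand-in for the SDK's CircuitBreakerException."""


def call_guarded(func: Callable[..., Any], *args: Any,
                 circuit_open: bool,
                 fallback: Optional[Callable[..., Any]] = None,
                 graceful_degradation: bool = True,
                 **kwargs: Any) -> Any:
    """Hypothetical: what an open circuit returns under each setting."""
    if not circuit_open:
        return func(*args, **kwargs)
    if fallback is not None:
        return fallback(*args, **kwargs)  # a fallback always wins
    if graceful_degradation:
        # Illustrative error payload returned instead of raising
        return {"error": f"Circuit open for {func.__name__}", "circuit_open": True}
    raise CircuitOpenError(f"Circuit open for {func.__name__}")
```

Returning a dict lets the agent pass the failure back to the model as an ordinary tool result instead of aborting the run. That appears to be the intent of the documented default.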
Fixes #250
This PR implements comprehensive documentation for the Tool Circuit Breaker and Model Failover features that were wired in PR #1539.
Changes
New Documentation
Updated Documentation
Cross-Links Added
Navigation
Generated with Claude Code