
Expose timeout and max_retries in ChatGenerators to help work around rate limits #9309

@mathislucka

Description


Is your feature request related to a problem? Please describe.
At the hackathon today we ran into quite a few rate limit issues with the OpenAI and Anthropic APIs. The main problem is that requests exceed the provider's limit on input tokens per minute: because an agent may make many tool calls per minute, the number of input tokens accumulates quickly.

Describe the solution you'd like
We subclassed the AnthropicChatGenerator and overrode the run method so that the call to Anthropic would be retried on rate limit errors after a 60 second waiting time.

This worked, but I could imagine more sophisticated approaches where users specify rate limits for the Agent, which would then wait whenever a request is about to hit the limit. At a minimum, the chat generators would benefit from simple retry mechanisms.
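The subclass-and-retry workaround described above can be sketched with a small stdlib-only helper. `RateLimitError` here is a stand-in for the SDK's exception (e.g. `anthropic.RateLimitError`), and `run_with_retry` is a hypothetical helper name, not Haystack code:

```python
import time


class RateLimitError(Exception):
    """Stand-in for the provider SDK's rate-limit exception."""


def run_with_retry(run, max_retries=3, wait_seconds=60, sleep=time.sleep):
    """Call `run`, retrying on RateLimitError after a fixed wait.

    A subclassed ChatGenerator's run() would wrap its super().run(...)
    call in a helper like this. `sleep` is injectable for testing.
    """
    for attempt in range(max_retries + 1):
        try:
            return run()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            sleep(wait_seconds)
```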

Update:
We've opted to use the built-in retry mechanisms of the underlying SDKs from the LLM providers.
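The built-in retry in these SDKs typically retries a handful of times with exponential backoff. A simplified schedule (assumed behavior for illustration; the real SDKs also add random jitter) might look like:

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Delay in seconds before retry number `attempt` (0-indexed).

    Doubles the base delay on each attempt, capped at `cap`. Real SDK
    implementations add jitter on top; omitted here for determinism.
    """
    return min(cap, base * (2 ** attempt))
```

Exposing `max_retries` on the generator lets users simply hand this behavior off to the SDK client instead of reimplementing it.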

Remaining ChatGenerators missing timeout and/or max_retries options:

  • CohereChatGenerator
  • GoogleGenAIChatGenerator
  • MetaLlamaChatGenerator
  • OllamaChatGenerator
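For the generators above, exposing the options could look roughly like the following. This is a hypothetical sketch; the class name, parameters, and `_client_kwargs` helper are assumptions, not the actual Haystack integration code:

```python
from typing import Optional


class ExampleChatGenerator:
    """Hypothetical generator exposing timeout and max_retries."""

    def __init__(
        self,
        model: str,
        timeout: Optional[float] = None,    # per-request timeout in seconds
        max_retries: Optional[int] = None,  # retry count handed to the SDK client
    ):
        self.model = model
        self.timeout = timeout
        self.max_retries = max_retries

    def _client_kwargs(self) -> dict:
        """Forward only the options the user actually set, so the SDK's
        own defaults apply when they are left as None."""
        kwargs = {}
        if self.timeout is not None:
            kwargs["timeout"] = self.timeout
        if self.max_retries is not None:
            kwargs["max_retries"] = self.max_retries
        return kwargs
```

The keyword names mirror what the OpenAI and Anthropic Python SDK clients already accept, which keeps the generator interfaces consistent across providers.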

Metadata


    Labels

    Contributions wanted! (Looking for external contributions), P2 (Medium priority, add to the next sprint if no P1 available)


    Status

    In Progress
