Is your feature request related to a problem? Please describe.
At the hackathon today we ran into quite a few rate limit issues with the OpenAI and Anthropic APIs. The main problem is that the number of input tokens exceeds the provider's input-tokens-per-minute rate limit. Because these agents may make many tool calls per minute, and each call resends the growing conversation history, the input token count accumulates quickly.
Describe the solution you'd like
We subclassed the AnthropicChatGenerator and overrode the run method so that the call to Anthropic is retried on rate limit errors after a 60-second wait.
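For reference, a minimal sketch of that workaround, assuming the component surfaces the SDK's `anthropic.RateLimitError` from `run` (the retry cap of 5 is my addition so a persistent error can't loop forever):

```python
import time

import anthropic
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator


class RetryingAnthropicChatGenerator(AnthropicChatGenerator):
    """Retries the Anthropic call on rate limit errors after a fixed wait."""

    def run(self, *args, **kwargs):
        for _ in range(5):  # hypothetical cap on retries
            try:
                return super().run(*args, **kwargs)
            except anthropic.RateLimitError:
                # The per-minute token budget is a rolling window, so waiting
                # a full 60 seconds before retrying frees up the whole budget.
                time.sleep(60)
        # Final attempt: let a persistent rate limit error propagate.
        return super().run(*args, **kwargs)
```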
This worked, but I could imagine more sophisticated approaches where users specify rate limits for the Agent, which would then wait whenever a request would hit the limit. The chat generators would also benefit from simple retry mechanisms.
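As an illustration of what that could look like, here is a hypothetical client-side limiter; the class name, the `acquire` API, and the fixed 60-second window are my invention for this sketch, not an existing Haystack API:

```python
import threading
import time
from collections import deque


class InputTokenRateLimiter:
    """Hypothetical limiter: blocks until a request's input tokens
    fit within a rolling per-minute token budget."""

    def __init__(self, tokens_per_minute: int):
        self.tokens_per_minute = tokens_per_minute
        self._events = deque()  # (timestamp, token_count) pairs
        self._lock = threading.Lock()

    def acquire(self, input_tokens: int) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop events that have left the rolling 60-second window.
                while self._events and now - self._events[0][0] > 60:
                    self._events.popleft()
                used = sum(count for _, count in self._events)
                if used + input_tokens <= self.tokens_per_minute or not self._events:
                    # Admit the request (an oversized single request is
                    # admitted alone rather than deadlocking).
                    self._events.append((now, input_tokens))
                    return
                # Sleep until the oldest event expires and frees budget.
                wait = 60 - (now - self._events[0][0])
            time.sleep(max(wait, 0.1))
```

The Agent would call `acquire(estimated_input_tokens)` before each LLM request, trading a local wait for a provider-side 429.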
Update:
We've opted to use the built-in retry mechanisms of the underlying SDKs from the LLM providers.
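Both official Python SDKs expose this directly on the client constructor; to my knowledge they retry 429s and transient connection errors with exponential backoff, defaulting to 2 retries:

```python
from anthropic import Anthropic
from openai import OpenAI

# Both clients retry rate limit (429) and transient connection errors
# automatically with exponential backoff; max_retries defaults to 2.
openai_client = OpenAI(max_retries=5, timeout=60.0)
anthropic_client = Anthropic(max_retries=5, timeout=60.0)
```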
Remaining ChatGenerators missing `timeout` and/or `max_retries` options: