models - add missing latency metrics #821
Conversation
@@ -3,6 +3,7 @@
import json
SageMaker requires OpenAI-compatible payloads, hence there is no latency field.
@@ -7,6 +7,7 @@
import json
import logging
Note that the underlying clients for the following model providers do not report a latency metric in their usage payloads:
- Anthropic: https://docs.anthropic.com/en/api/messages#response-usage
- OpenAI: https://platform.openai.com/docs/api-reference/assistants (LiteLLM by extension)
- Writer: https://dev.writer.com/api-reference/completion-api/chat-completion#response-usage
- LlamaAPI: https://llama.developer.meta.com/docs/api/chat
Consequently, we calculate the latency ourselves using `time`.
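For illustration, a minimal sketch of that approach, not the code in this PR: time the provider call on the client side and report the elapsed milliseconds under `latencyMs`. The names `timed_call` and `fake_request` are hypothetical, and the use of `time.perf_counter()` is an assumption, since the comment above only says `time`.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_call(request: Callable[[], Dict[str, Any]]) -> Tuple[Dict[str, Any], Dict[str, int]]:
    """Run a provider request and return (response, metrics).

    The provider's usage payload carries no latency value, so the elapsed
    time is measured on the client side instead. (Hypothetical helper.)
    """
    start = time.perf_counter()  # assumption: a monotonic clock is used
    response = request()
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return response, {"latencyMs": elapsed_ms}


def fake_request() -> Dict[str, Any]:
    """Stand-in for a real provider call; sleeps to simulate network time."""
    time.sleep(0.05)
    return {"usage": {"inputTokens": 3, "outputTokens": 5}}


response, metrics = timed_call(fake_request)
print(metrics)
```

A value measured this way includes client-side overhead such as network time, so it is not directly comparable to a latency figure reported by a provider itself.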
Can we ensure we add proper documentation for those providers?
Let's also ensure that there are comments in the code explaining this "why".
- I could create an Overview section under the Model Providers user guide. I think that would be a good place for outlining what usage metrics are available (among other things).
- I'll add some comments.
Description
Calculate missing latencyMs metrics across model providers.
Related Issues
N/A
Documentation PR
N/A
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare: Updated unit tests
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.