When streaming, OpenAI can optionally send a final chunk containing the token usage for the entire request (enabled by setting `stream_options` to `{"include_usage": true}` on the request): https://developers.openai.com/api/reference/resources/chat#(resource)%20chat.completions%20%3E%20(model)%20chat_completion_chunk%20%3E%20(schema)%20%3E%20(property)%20usage
It would be nice to use this when available, with a fallback to the current implementation (estimating the number of tokens used).
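
A minimal sketch of the fallback logic, to make the proposal concrete. It assumes chunks are available as plain dicts shaped like OpenAI's `chat.completion.chunk` objects; `estimate_tokens` and `completion_tokens_from_stream` are hypothetical names, and the estimator is only a placeholder for whatever the current implementation does.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def estimate_tokens(text: str) -> int:
    # Hypothetical stand-in for the project's current estimator:
    # a crude "about 4 characters per token" heuristic.
    return len(text) // 4


def completion_tokens_from_stream(
    chunks: Iterable[dict],
    estimate: Callable[[str], int] = estimate_tokens,
) -> Tuple[int, bool]:
    """Return (completion_tokens, reported_by_api) for a streamed response."""
    text_parts: List[str] = []
    usage: Optional[dict] = None

    for chunk in chunks:
        # When usage reporting is enabled, the final chunk carries a
        # populated `usage` object and an empty `choices` list; on all
        # other chunks `usage` is null.
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Accumulate streamed text so we can still estimate if needed.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)

    if usage is not None and "completion_tokens" in usage:
        # Prefer the exact count reported by the API.
        return usage["completion_tokens"], True
    # Fallback: no usage chunk was received, so estimate from the text.
    return estimate("".join(text_parts)), False
```

The second element of the returned tuple tells the caller whether the count is exact or estimated. The fallback matters because the usage chunk only arrives when it was requested, and OpenAI-compatible backends may not send it at all.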
I will propose a PR to address this issue.