feat: support token usage extraction for llama.cpp (#7358)
Conversation
Hey - I've found 1 issue, and left some high level feedback:
- The hard-coded `stream_options={"include_usage": True}` in `_query_stream` may not be supported by all OpenAI-compatible backends using this source; consider making this optional or feature-detected so providers that reject `stream_options` don’t fail the whole call.
- With the new handling of chunks that have an empty `choices` list, it might be safer to consistently guard all later `choice` and `delta` accesses in `_query_stream` (including parts of the loop not shown in this diff) to avoid unexpected `None` attribute errors when providers return non-standard chunks.
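A minimal sketch of the guarded access pattern described above (the `handle_chunk` helper and the chunk shapes are illustrative, not part of the PR):

```python
# Sketch of defensively reading a streamed chunk whose `choices` list may
# be empty (e.g. a usage-only final chunk). Names here are illustrative.
def handle_chunk(chunk):
    choice = chunk.choices[0] if chunk.choices else None
    delta = choice.delta if choice else None
    usage = getattr(chunk, "usage", None)
    # With no delta there is no text to append, but usage may still be present.
    if delta is not None:
        text = getattr(delta, "content", "") or ""
    else:
        text = ""
    return usage, text
```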
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The hard-coded `stream_options={"include_usage": True}` in `_query_stream` may not be supported by all OpenAI-compatible backends using this source; consider making this optional or feature-detected so providers that reject `stream_options` don’t fail the whole call.
- With the new handling of chunks that have an empty `choices` list, it might be safer to consistently guard all later `choice` and `delta` accesses in `_query_stream` (including parts of the loop not shown in this diff) to avoid unexpected `None` attribute errors when providers return non-standard chunks.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="543" />
<code_context>
- continue
- choice = chunk.choices[0]
- delta = choice.delta
+ logger.debug(f"Received chunk: {chunk}")
+ choice = chunk.choices[0] if chunk.choices else None
+ delta = choice.delta if choice else None
</code_context>
<issue_to_address>
**suggestion (performance):** The debug log for every chunk may be too verbose and also eagerly formats the string.
For high-throughput streaming this can add overhead and clutter. Consider reducing how often/what you log (e.g., sampling or truncating the chunk) and use lazy formatting like `logger.debug("Received chunk: %s", chunk)` so the string isn’t built when debug is off. Also verify that chunk contents aren’t sensitive before logging them in full.
Suggested implementation:
```python
async for chunk in stream:
# Truncate the chunk representation to avoid overly verbose logs while still aiding debugging
logger.debug("Received chunk: %.200s", chunk)
choice = chunk.choices[0] if chunk.choices else None
```
1. Verify that `chunk` does not contain sensitive data before keeping this debug log in production. If it can, consider redacting or logging only non-sensitive fields instead of the full object.
2. If this loop is expected to handle extremely high-throughput streams, you may additionally want to introduce sampling (e.g., log only every Nth chunk) using a counter in `ChatCompletionStreamState` or a local counter variable.
</issue_to_address>
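The sampling idea from point 2 can be sketched as follows (`SAMPLE_EVERY` and `log_chunk` are hypothetical names, not from the PR):

```python
import logging

logger = logging.getLogger(__name__)
SAMPLE_EVERY = 50  # illustrative sampling interval, not from the PR

def log_chunk(chunk, counter: int) -> int:
    """Log every SAMPLE_EVERY-th chunk, lazily formatted and truncated."""
    if counter % SAMPLE_EVERY == 0:
        # %-style args are only formatted when DEBUG is enabled,
        # and %.200s caps the chunk repr at 200 characters.
        logger.debug("Received chunk #%d: %.200s", counter, chunk)
    return counter + 1
```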
Code Review
This pull request updates the OpenAI provider to include usage information in streaming responses and improves the handling of chunks with empty choices. A new test case has been added to ensure usage is correctly extracted from the final stream chunks. Review feedback suggests merging `stream_options` into the existing `payloads` to avoid potential keyword argument conflicts and using lazy formatting for debug logging to optimize performance.
```python
    **payloads,
    stream=True,
    extra_body=extra_body,
    stream_options={"include_usage": True},
)
```
Passing `stream_options` as a keyword argument while also unpacking `**payloads` can lead to a `TypeError: create() got multiple values for keyword argument 'stream_options'` if `payloads` already contains a `stream_options` key (e.g., from a custom configuration). It is safer to merge the options into `payloads` before the call or use a dictionary merge during unpacking.
```python
**{**payloads, "stream_options": {**(payloads.get("stream_options") or {}), "include_usage": True}},
stream=True,
extra_body=extra_body,
)
```

```diff
-    continue
-    choice = chunk.choices[0]
-    delta = choice.delta
+    choice = chunk.choices[0] if chunk.choices else None
```
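The dictionary merge suggested above can be exercised standalone (a sketch; `build_stream_kwargs` is a hypothetical helper, not part of the PR):

```python
def build_stream_kwargs(payloads: dict) -> dict:
    """Merge include_usage into any caller-supplied stream_options so the
    later create(**kwargs, ...) call cannot receive a duplicate
    stream_options keyword argument."""
    merged = {**(payloads.get("stream_options") or {}), "include_usage": True}
    return {**payloads, "stream_options": merged}
```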
Logging the entire chunk in every iteration of the stream loop can be very noisy and impact performance, especially since f-strings are evaluated even if the log level is higher than DEBUG. Consider using lazy formatting to avoid unnecessary string construction.
```python
choice = chunk.choices[0] if chunk.choices else None
logger.debug("Received chunk: %s", chunk)
```
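The difference can be demonstrated directly: with %-style logging the argument's repr is never built when DEBUG is off, while an f-string formats unconditionally (a standalone sketch; `CountingRepr` is illustrative):

```python
import logging

class CountingRepr:
    """Counts how many times its repr is built."""
    def __init__(self):
        self.calls = 0
    def __repr__(self):
        self.calls += 1
        return "<chunk>"

logger = logging.getLogger("lazy-demo")
logger.setLevel(logging.INFO)  # DEBUG records are dropped

obj = CountingRepr()
logger.debug("Received chunk: %s", obj)  # lazy: repr never evaluated
lazy_calls = obj.calls                   # 0
logger.debug(f"Received chunk: {obj}")   # eager: repr built before the call
eager_calls = obj.calls                  # 1
```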
Modifications
Screenshots or Test Results
Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Handle streaming chat completions that emit a final empty choices chunk while still extracting token usage metadata from the stream.
New Features:
Enhancements:
Tests: