
feat: support token usage extraction for llama.cpp #7358

Merged

Soulter merged 2 commits into master from feat/llamacpp on Apr 4, 2026

Conversation

@Soulter Soulter (Member) commented Apr 4, 2026

Modifications

  • This is NOT a breaking change.

Screenshots or Test Results

Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues, emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Handle streaming chat completions that emit a final empty choices chunk while still extracting token usage metadata from the stream.

New Features:

  • Support extracting token usage from usage-only streaming chunks returned by OpenAI-compatible backends.

Enhancements:

  • Relax streaming parsing logic to safely handle chunks without choices or delta data while still processing tool calls and usage information.

Tests:

  • Add an async streaming test to verify usage extraction from an empty-choices chunk that only contains usage data.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 4, 2026
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 4, 2026
@sourcery-ai sourcery-ai bot (Contributor) left a comment
Hey - I've found 1 issue, and left some high level feedback:

  • The hard-coded stream_options={"include_usage": True} in _query_stream may not be supported by all OpenAI-compatible backends using this source; consider making this optional or feature-detected so providers that reject stream_options don’t fail the whole call.
  • With the new handling of chunks that have an empty choices list, it might be safer to consistently guard all later choice and delta accesses in _query_stream (including parts of the loop not shown in this diff) to avoid unexpected None attribute errors when providers return non-standard chunks.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The hard-coded `stream_options={"include_usage": True}` in `_query_stream` may not be supported by all OpenAI-compatible backends using this source; consider making this optional or feature-detected so providers that reject `stream_options` don’t fail the whole call.
- With the new handling of chunks that have an empty `choices` list, it might be safer to consistently guard all later `choice` and `delta` accesses in `_query_stream` (including parts of the loop not shown in this diff) to avoid unexpected `None` attribute errors when providers return non-standard chunks.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="543" />
<code_context>
-                continue
-            choice = chunk.choices[0]
-            delta = choice.delta
+            logger.debug(f"Received chunk: {chunk}")
+            choice = chunk.choices[0] if chunk.choices else None
+            delta = choice.delta if choice else None
</code_context>
<issue_to_address>
**suggestion (performance):** The debug log for every chunk may be too verbose and also eagerly formats the string.

For high-throughput streaming this can add overhead and clutter. Consider reducing how often/what you log (e.g., sampling or truncating the chunk) and use lazy formatting like `logger.debug("Received chunk: %s", chunk)` so the string isn’t built when debug is off. Also verify that chunk contents aren’t sensitive before logging them in full.

Suggested implementation:

```python
        async for chunk in stream:
            # Truncate the chunk representation to avoid overly verbose logs while still aiding debugging
            logger.debug("Received chunk: %.200s", chunk)
            choice = chunk.choices[0] if chunk.choices else None

```

1. Verify that `chunk` does not contain sensitive data before keeping this debug log in production. If it can, consider redacting or logging only non-sensitive fields instead of the full object.
2. If this loop is expected to handle extremely high-throughput streams, you may additionally want to introduce sampling (e.g., log only every Nth chunk) using a counter in `ChatCompletionStreamState` or a local counter variable.
</issue_to_address>



@Soulter Soulter merged commit dc9c17c into master Apr 4, 2026
6 checks passed
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the OpenAI provider to include usage information in streaming responses and improves the handling of chunks with empty choices. A new test case has been added to ensure usage is correctly extracted from the final stream chunks. Review feedback suggests merging stream_options into the existing payloads to avoid potential keyword argument conflicts and using lazy formatting for debug logging to optimize performance.

Comment on lines 532 to 536

```python
**payloads,
stream=True,
extra_body=extra_body,
stream_options={"include_usage": True},
)
```
high

Passing stream_options as a keyword argument while also unpacking **payloads can lead to a TypeError: create() got multiple values for keyword argument 'stream_options' if payloads already contains a stream_options key (e.g., from a custom configuration). It is safer to merge the options into payloads before the call or use a dictionary merge during unpacking.

```python
**{**payloads, "stream_options": {**(payloads.get("stream_options") or {}), "include_usage": True}},
stream=True,
extra_body=extra_body,
)
```
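The failure mode this comment describes is easy to reproduce in isolation. Here `create` is a hypothetical stand-in for the SDK's `chat.completions.create`, not the real client; the point is purely about Python's keyword-argument rules:

```python
# A duplicate keyword arriving both via **payloads and explicitly raises
# TypeError at call time; merging into payloads first avoids the clash.
def create(**kwargs):  # stand-in for the SDK call
    return kwargs

payloads = {"model": "m", "stream_options": {"include_usage": False}}

try:
    create(**payloads, stream_options={"include_usage": True})
    duplicate_kwarg = False
except TypeError:
    # "create() got multiple values for keyword argument 'stream_options'"
    duplicate_kwarg = True

# The suggested fix: merge into payloads before unpacking, preserving any
# caller-supplied stream_options keys while forcing include_usage on.
merged = {
    **payloads,
    "stream_options": {**(payloads.get("stream_options") or {}), "include_usage": True},
}
result = create(**merged)
```

The inner `{**(payloads.get("stream_options") or {}), ...}` merge also means a user-configured `stream_options` is extended rather than silently replaced.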

```diff
-            continue
-            choice = chunk.choices[0]
-            delta = choice.delta
+            choice = chunk.choices[0] if chunk.choices else None
```

medium

Logging the entire chunk in every iteration of the stream loop can be very noisy and impact performance, especially since f-strings are evaluated even if the log level is higher than DEBUG. Consider using lazy formatting to avoid unnecessary string construction.

Suggested change

```python
choice = chunk.choices[0] if chunk.choices else None
logger.debug("Received chunk: %s", chunk)
```
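Both review suggestions — lazy %-style formatting and logging only a sample of chunks — can be combined in one small sketch. `log_chunks`, the logger name, and the sampling interval are illustrative, not part of the PR:

```python
import logging

logger = logging.getLogger("stream_sketch")

def log_chunks(chunks, every=100):
    """Log every Nth chunk with lazy %-formatting: the message string
    (including str(chunk)) is only built if a handler at DEBUG level is
    active, and %.200s truncates long chunk representations."""
    for i, chunk in enumerate(chunks):
        if i % every == 0:
            logger.debug("Received chunk %d: %.200s", i, chunk)
```

With `every=100`, a 250-chunk stream produces three log records (chunks 0, 100, 200) instead of 250, and none of the format strings are built at all when DEBUG is disabled.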


Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.
