Skip to content

perf(ltm): pre-caption images outside session lock to unblock subsequent messages#7228

Open
Reisenbug wants to merge 4 commits intoAstrBotDevs:masterfrom
Reisenbug:fix/image-recognition-unblock-session
Open

perf(ltm): pre-caption images outside session lock to unblock subsequent messages#7228
Reisenbug wants to merge 4 commits intoAstrBotDevs:masterfrom
Reisenbug:fix/image-recognition-unblock-session

Conversation

@Reisenbug
Copy link
Copy Markdown
Contributor

@Reisenbug Reisenbug commented Mar 31, 2026

开启图片转述功能后,若用户发送图片后紧接着发送文字消息 ,文字消息会被阻塞,必须等待图片转述完全 结束后才能得到回复。
小模型调用的时候延迟尤其明显。
根本原因:图片转述的 LLM 调用在 session_lock 内执行,占用锁期间同 session 的后续消息无法进入处理流程。

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Modifications / 改动点

  • astrbot/core/astr_main_agent.py:新增 pre_caption_images() 函数,在获取 session lock 之前完成图片下载、压缩、转述,结果写入 event extra;_ensure_img_caption() 优先读取预处理结果,避免在 lock 内重复调用小模型。

  • astrbot/core/pipeline/process_stage/method/agent_sub_ stages/internal.py:在 session_lock_manager.acquire_lock() 之前调用 pre_caption_images(),根据配置决定是否等待。

  • astrbot/core/config/default.py:新增配置项 provider_s ettings.image_caption_wait_for_context_order(默认
    True),位于”默认图片转述模型“下方。

  • 开启(默认):后续消息等待图片转述完成后再处理,上 下文顺序正确(图片描述在后续消息之前)。也就是现在的行为。

  • 关闭:后续消息立即响应,图片转述并发执行,但上下文 中图片描述可能出现在后续消息之后。

  • dashboard/src/i18n/:新增 zh-CN 和 en-US 翻译。

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

改后并关闭:
$+ΛʀƊ∪$τ: 03-31 16:54:48
[图片]

$+ΛʀƊ∪$τ: 03-31 16:54:53

Baka!: 03-31 16:54:58
什么黑?我的蝴蝶结是蓝的!

Baka!: 03-31 16:55:02
这个帽子比我的翅膀还大!他这样能看到路吗?


Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Allow image captioning to run before acquiring the session lock so that slow caption LLMs no longer block subsequent messages in the same session.

New Features:

  • Add a configurable option to choose whether later messages wait for image captioning to preserve context order or are processed immediately with captioning running concurrently.

Bug Fixes:

  • Prevent text messages from being blocked behind slow image captioning when image transcription is enabled in a session.

Enhancements:

  • Pre-process and cache image captions outside the session lock and reuse them during request handling to avoid redundant caption LLM calls.
  • Expose the new image caption ordering option in the dashboard configuration metadata with localized labels.

@auto-assign auto-assign bot requested review from advent259141 and anka-afk March 31, 2026 10:40
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Mar 31, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In pre_caption_images the config is fetched again even though process() has already retrieved provider_settings; consider passing the relevant config or provider id into pre_caption_images to avoid duplicate lookups and keep configuration handling in one place.
  • When catching the broad Exception in pre_caption_images, log with exc_info=True (or similar) so that stack traces are preserved and diagnosing caption failures is easier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `pre_caption_images` the config is fetched again even though `process()` has already retrieved `provider_settings`; consider passing the relevant config or provider id into `pre_caption_images` to avoid duplicate lookups and keep configuration handling in one place.
- When catching the broad `Exception` in `pre_caption_images`, log with `exc_info=True` (or similar) so that stack traces are preserved and diagnosing caption failures is easier.

## Individual Comments

### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="530-541" />
<code_context>
     plugin_context: Context,
     image_caption_provider: str,
 ) -> None:
+    pre_caption = event.get_extra(_PRE_CAPTION_RESULT_KEY)
+    if pre_caption is not None:
+        if pre_caption:
+            req.extra_user_content_parts.append(
+                TextPart(text=f"<image_caption>{pre_caption}</image_caption>")
+            )
+        req.image_urls = []
+        return
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Clarify behavior when pre-caption result is an empty string vs not set vs None.

Right now, an empty-string pre-caption is treated as a successful caption: it suppresses normal captioning and clears `image_urls`, so images are dropped with no caption. If the intent is to short-circuit only on real captions, consider gating both appending the text and clearing `image_urls` behind `if pre_caption:` and leaving `image_urls` unchanged when `pre_caption` is falsy (including `""`).

```suggestion
    plugin_context: Context,
    image_caption_provider: str,
 ) -> None:
+    # If pre-captioning produced a non-empty caption, use it and skip normal captioning.
+    # For None or empty-string results, fall through to the normal captioning path.
+    pre_caption = event.get_extra(_PRE_CAPTION_RESULT_KEY)
+    if pre_caption:
+        req.extra_user_content_parts.append(
+            TextPart(text=f"<image_caption>{pre_caption}</image_caption>")
+        )
+        # We already have a valid caption; skip further image captioning.
+        req.image_urls = []
+        return
+
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@Reisenbug Reisenbug changed the title Fix/image recognition unblock session perf(ltm): pre-caption images outside session lock to unblock subsequent messages Mar 31, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a pre-captioning mechanism for images to avoid blocking the session lock during slow LLM calls. It adds a new configuration setting, image_caption_wait_for_context_order, allowing users to choose between strict context ordering and faster response times. Feedback was provided regarding the non-blocking mode, noting that the current implementation still falls back to synchronous captioning within the session lock when the setting is disabled; a code suggestion was offered to skip captioning in this scenario to truly achieve non-blocking behavior.

@RC-CHN
Copy link
Copy Markdown
Member

RC-CHN commented Apr 2, 2026

这个或许算预期行为?是否需要开放这一配置项我个人认为有待商榷,但仍然感谢您做出的贡献

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants