
[Bug] mimo-v2.5 STT API request is missing the system prompt and user text, causing audio transcription to return 400 #9113

Description

@NoFizz

What happened

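Problem description
When the MiMo STT API is used for speech-to-text, sending a voice message on the QQ platform (OneBot v11 / NapCat) returns an HTTP 400 error. Investigation found the following three independent problems: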

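Problem 1: The request is missing a text instruction
Format required by the official MiMo API:
According to the MiMo audio understanding documentation (https://mimo.mi.com/docs/zh-CN/quick-start/usage-guide/multimodal-understanding/audio-understanding), a correct request must include a system prompt and a text field in the user message: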

{
    "model": "mimo-v2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are a speech transcription assistant. Transcribe the spoken content from the audio exactly and return only the transcription text."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "data:audio/wav;base64,..."
                    }
                },
                {
                    "type": "text",
                    "text": "Please transcribe the content of the audio and return only the transcription text."
                }
            ]
        }
    ],
    "max_completion_tokens": 1024
}

Format actually sent by the original AstrBot code:
In astrbot/core/provider/sources/mimo_stt_api_source.py, the get_text method builds the following payload:

payload = {
    "model": self.model_name,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data_url,
                    },
                },
            ],
        },
    ],
    "max_completion_tokens": 1024,
}

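The system message is missing, and the content array of the user message contains only input_audio, with no text field. The MiMo API cannot infer the intent of bare audio input and returns a 400 error.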
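
A minimal sketch of a payload that matches the documented format, assuming the fix is simply to add the two missing parts. The helper name build_stt_payload and the two prompt constants are hypothetical and not existing AstrBot code; the prompt strings are copied from the request example in the MiMo documentation quoted above.

```python
# Hypothetical constants; the wording is taken from the documented example.
STT_SYSTEM_PROMPT = (
    "You are a speech transcription assistant. Transcribe the spoken content "
    "from the audio exactly and return only the transcription text."
)
STT_USER_PROMPT = (
    "Please transcribe the content of the audio and return only the "
    "transcription text."
)


def build_stt_payload(model_name: str, audio_data_url: str) -> dict:
    """Build a chat-completions payload with a system prompt and a text part."""
    return {
        "model": model_name,
        "messages": [
            {"role": "system", "content": STT_SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    # The audio itself, as a base64 data URL.
                    {"type": "input_audio", "input_audio": {"data": audio_data_url}},
                    # The text instruction that the original payload lacks.
                    {"type": "text", "text": STT_USER_PROMPT},
                ],
            },
        ],
        "max_completion_tokens": 1024,
    }
```

In get_text, the existing dict literal could then be replaced by a call such as build_stt_payload(self.model_name, audio_data_url).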

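Problem 2: The default model is deprecated
astrbot/core/provider/sources/mimo_api_common.py defines:
DEFAULT_MIMO_STT_MODEL = "mimo-v2-omni"
According to the official MiMo announcement (https://mimo.mi.com/docs/updates/deprecate):

The MiMo-V2 series models were officially taken offline at 00:00 on 2026.6.30 and the original model names are no longer valid. Please check and complete the switch to the V2.5 series promptly.
mimo-v2-omni was taken offline on June 30, 2026, so requests that use this model name fail. The only model that currently supports audio understanding is mimo-v2.5.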
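
One possible way to handle this, sketched under assumptions: change the default to mimo-v2.5 and map the retired name onto it so that existing configurations keep working. RETIRED_MIMO_STT_MODELS and resolve_stt_model are hypothetical names and are not part of AstrBot.

```python
from typing import Optional

# Proposed new default; the current code uses "mimo-v2-omni".
DEFAULT_MIMO_STT_MODEL = "mimo-v2.5"

# Hypothetical set of model names that the announcement says are offline.
RETIRED_MIMO_STT_MODELS = frozenset({"mimo-v2-omni"})


def resolve_stt_model(configured: Optional[str]) -> str:
    """Return the model name to send, replacing empty or retired names."""
    name = (configured or "").strip()
    if not name or name in RETIRED_MIMO_STT_MODELS:
        return DEFAULT_MIMO_STT_MODEL
    return name
```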


Problem 3: The QQ voice SILK conversion chain is unreliable
Specifics of QQ voice messages:
Voice messages sent on the QQ platform (OneBot v11 / NapCat) use the SILK format (magic bytes #!SILK_V3 or \x02#!SILK_V3), which the MiMo API does not support (it accepts only MP3/WAV/FLAC/M4A/OGG).
Conversion chain in the official code:
Record.convert_to_file_path()
→ MediaResolver.to_path(target_format="wav")
→ ensure_wav()
→ _get_audio_magic_type() detects the format
→ tencent_silk_to_wav() converts using pysilk
prepare_audio_input then calls the chain again:
MediaResolver.to_base64_data(strict=True, target_format="wav")
→ _resolve_path(target_format="wav")
→ ensure_wav() detects and converts again
Problems:

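  1. The double conversion chain is redundant and can fail: convert_to_file_path() has already converted the audio to WAV, yet prepare_audio_input() runs the converted file through the MediaResolver conversion flow again. If the first step fails silently (for example, because of a pysilk import problem or an async context limitation), the file is still in SILK format, and the second step reads it as WAV and base64-encodes it.

  2. No format validation: after conversion, nothing verifies that the output file is actually valid WAV. The raw SILK bytes are sent to the API as WAV, the API cannot recognize the format, and it returns:
    "invalid audio format, only mp3/flac/m4a/wav/ogg are supported"

  3. Inconsistent SILK detection: _get_audio_magic_type() detects SILK correctly, but the SILK handling in ensure_wav() depends on tencent_silk_to_wav(), which runs pysilk in an async executor and may have thread-safety or import-ordering problems.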
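
The missing validation step from point 2 could look roughly like the sketch below, assuming the converted file's bytes are available before base64 encoding. sniff_audio_format and wav_bytes_to_data_url are hypothetical helpers, not existing AstrBot functions. The SILK magic bytes are the ones listed above, and the WAV check relies on the standard RIFF/WAVE header layout.

```python
import base64

# SILK magic bytes as described above (with and without the leading \x02).
SILK_MAGICS = (b"#!SILK_V3", b"\x02#!SILK_V3")


def sniff_audio_format(data: bytes) -> str:
    """Return 'silk', 'wav', or 'unknown' based on the leading magic bytes."""
    if data.startswith(SILK_MAGICS):
        return "silk"
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"


def wav_bytes_to_data_url(data: bytes) -> str:
    """Encode WAV bytes as a data URL, refusing anything that is not WAV."""
    detected = sniff_audio_format(data)
    if detected != "wav":
        # Fail locally with a clear message instead of sending bad audio.
        raise ValueError(f"expected WAV audio after conversion, got {detected!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:audio/wav;base64,{encoded}"
```

With a check like this, a silent SILK conversion failure would surface as a local error naming the detected format, rather than as an HTTP 400 from the API.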

Reproduce

  1. Configure the MiMo STT API service provider
  2. Send a voice message to the bot in QQ (LLBot / OneBot v11)
  3. Check the error reported in the AstrBot log

AstrBot version, deployment method (e.g., Windows Docker Desktop deployment), provider used, and messaging platform used.

  • AstrBot version: v4.26.3
  • Deployment method: AstrBot Launcher (Windows)
  • STT provider: MiMo STT API (mimo-v2.5 model)
  • Messaging platform: QQ (LLBot / OneBot v11)

OS

Windows

Logs

Log 1

[2026-07-02 16:31:01.303] [Core]
[ERRO]
[v4.26.3] [preprocess_stage.stage:201]: Traceback (most recent call last):
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\provider\sources\mimo_stt_api_source.py", line 63, in get_text
    response.raise_for_status()
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\venv\Lib\site-packages\httpx\_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.xiaomimimo.com/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\pipeline\preprocess_stage\stage.py", line 189, in _stt_record
    result = await stt_provider.get_text(audio_url=path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\provider\sources\mimo_stt_api_source.py", line 66, in get_text
    raise MiMoAPIError(
astrbot.core.provider.sources.mimo_api_common.MiMoAPIError: MiMo STT API request failed: HTTP 400, response: {"error":{"code":"400","message":"Param Incorrect","param":"invalid audio format, only mp3/flac/m4a/wav/ogg are supported","type":""}}
[2026-07-02 16:31:01.304] [Core]
[ERRO]
[v4.26.3] [preprocess_stage.stage:203]: 语音转文本失败: MiMo STT API request failed: HTTP 400, response: {"error":{"code":"400","message":"Param Incorrect","param":"invalid audio format, only mp3/flac/m4a/wav/ogg are supported","type":""}} 

Log 2

[2026-07-02 17:02:58.479] [Core]
[ERRO]
[v4.26.3] [preprocess_stage.stage:201]: Traceback (most recent call last):
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\provider\sources\mimo_stt_api_source.py", line 210, in get_text
    response.raise_for_status()
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\venv\Lib\site-packages\httpx\_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.xiaomimimo.com/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\pipeline\preprocess_stage\stage.py", line 189, in _stt_record
    result = await stt_provider.get_text(audio_url=path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\NoFizz\.astrbot_launcher\instances\0550b500-44b4-4771-94bb-2fbaf82952ff\core\astrbot\core\provider\sources\mimo_stt_api_source.py", line 213, in get_text
    raise MiMoAPIError(
astrbot.core.provider.sources.mimo_api_common.MiMoAPIError: MiMo STT API request failed: HTTP 400, response: {"error":{"code":"400","message":"Param Incorrect","param":"invalid audio format, only mp3/flac/m4a/wav/ogg are supported","type":""}}
[2026-07-02 17:02:58.480] [Core]
[ERRO]
[v4.26.3] [preprocess_stage.stage:203]: 语音转文本失败: MiMo STT API request failed: HTTP 400, response: {"error":{"code":"400","message":"Param Incorrect","param":"invalid audio format, only mp3/flac/m4a/wav/ogg are supported","type":""}} 

Are you willing to submit a PR?

  • Yes!

Labels: area:provider, bug
