feat(mimo-tts): support voiceclone model with reference audio#9106

Open
lingyun14beta wants to merge 1 commit into AstrBotDevs:master from lingyun14beta:mimo

Conversation

@lingyun14beta lingyun14beta commented Jul 1, 2026

Contributor

fix #9105

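Modifications

  • astrbot/core/config/default.py: adds the config option mimo-tts-voiceclone-audio (reference audio as a path, URL, or base64) and updates the hint for mimo-tts-voice to state that it is ignored under the voiceclone model
  • astrbot/core/provider/sources/mimo_api_common.py: prepare_audio_input() gains the optional parameters target_format/preserve_mp3 (the defaults keep the original behavior, so this is backward compatible)
  • astrbot/core/provider/sources/mimo_tts_api_source.py:
    • Adds _is_voiceclone_model() to determine whether the current model is a voiceclone model
    • Adds _resolve_voiceclone_voice(), which converts the reference audio to a data URL and caches it by source, using asyncio.Lock to keep concurrent requests from transcoding the same audio repeatedly
    • Raises a clear MiMoAPIError when no reference audio is configured
    • _build_payload() supports overriding the voice field via voice_value, so the converted data URL is written into the payload
    • Transcoding prefers to keep the original mp3 (preserve_mp3=True), so that converting to wav does not inflate the file beyond the official 10MB limit
    • terminate() additionally cleans up the temporary files produced by transcoding
  • dashboard/src/i18n/locales/{en-US,ru-RU,zh-CN}/features/config-metadata.json: adds the localized strings for the new field
  • tests/test_mimo_api_sources.py: adds 7 test cases covering model detection, the error raised when the reference audio is not configured, the data URL being written into the payload, cache hits and refresh on source change, mp3 preservation, and concurrent calls transcoding only once
  • This is NOT a breaking change.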
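
The caching behavior described for _resolve_voiceclone_voice() can be sketched roughly as follows. This is an illustration only, not the PR's code: `ReferenceAudioCache`, `resolve`, and the `convert` callable are hypothetical names. Only the idea comes from the description above: cache by source, and hold an asyncio.Lock so that concurrent callers trigger one conversion.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ReferenceAudioCache:
    """Hypothetical helper: caches one converted data URL per audio source."""

    def __init__(self, convert: Callable[[str], Awaitable[str]]) -> None:
        self._convert = convert      # assumed async converter: source -> data URL
        self._lock = asyncio.Lock()  # serializes conversion across concurrent callers
        self._source: Optional[str] = None
        self._data_url: Optional[str] = None

    async def resolve(self, source: str) -> str:
        async with self._lock:
            # Cache hit: same source as the last successful conversion.
            if self._data_url is not None and self._source == source:
                return self._data_url
            # Cache miss or source changed: convert, then store the result.
            data_url = await self._convert(source)
            self._source, self._data_url = source, data_url
            return data_url


async def demo() -> int:
    """Run five concurrent lookups and return how many conversions happened."""
    calls = 0

    async def fake_convert(source: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for real transcoding work
        return f"data:audio/mpeg;base64,{source}"

    cache = ReferenceAudioCache(fake_convert)
    await asyncio.gather(*(cache.resolve("ref.mp3") for _ in range(5)))
    return calls
```

Because the cache check happens inside the lock, callers that arrive while a conversion is in flight wait for it and then read the cached value instead of starting their own conversion.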

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add support for MiMo TTS voiceclone model using a configurable reference audio source and integrate it into the existing TTS provider pipeline.

New Features:

  • Introduce the mimo-tts-voiceclone-audio configuration option to supply a reference audio sample for MiMo voiceclone models.
  • Support passing a voice value (e.g., data URL from reference audio) into the TTS payload, overriding the standard voice setting when needed.
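
As a rough sketch of what "a data URL from reference audio" overriding the voice setting could look like: the helper names, the payload keys, and the choice to check the raw byte size are assumptions made for illustration. They are not the MiMo API schema or the PR's _build_payload(). Only the 10MB figure and the voice_value override come from the PR description.

```python
import base64
from typing import Optional

MAX_REFERENCE_BYTES = 10 * 1024 * 1024  # 10MB limit mentioned in the PR description


def audio_bytes_to_data_url(audio: bytes, mime_type: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a base64 data URL (hypothetical helper)."""
    # Assumption: the limit is checked against the raw size, not the encoded size.
    if len(audio) > MAX_REFERENCE_BYTES:
        raise ValueError("reference audio exceeds the 10MB limit")
    encoded = base64.b64encode(audio).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(
    text: str, model: str, voice: str, voice_value: Optional[str] = None
) -> dict:
    """Illustrative payload shape only; the real request schema may differ."""
    return {
        "model": model,
        "input": text,
        # voice_value, when given, takes precedence over the configured voice.
        "voice": voice_value if voice_value is not None else voice,
    }
```

With this shape, a voiceclone request would pass the data URL as voice_value, while other models would leave it as None and fall back to the configured voice.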

Enhancements:

  • Cache converted voiceclone reference audio with concurrency control to avoid redundant conversions and temporary file leaks.
  • Extend audio input preparation to accept optional target format and mp3-preservation flags for more flexible handling of input audio.
  • Ensure cleanup of temporary files generated during voiceclone audio preparation when the provider is terminated.

Documentation:

  • Update configuration metadata and dashboard i18n strings to document the new voiceclone reference audio option and clarify when mimo-tts-voice is ignored.

Tests:

  • Add tests covering voiceclone model detection, error handling when reference audio is missing, payload voice field behavior, caching semantics, mp3 preservation, and concurrent conversion behavior.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jul 1, 2026

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've found 2 issues, and left some high level feedback:

  • The _is_voiceclone_model check relies on a substring match in model_name; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable.
  • The error message in _resolve_voiceclone_voice hardcodes mimo-v2.5-tts-voiceclone even though the check is generic; consider interpolating self.model_name instead so the message stays accurate if other voiceclone models are used.
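
One way to read both suggestions together is sketched below. It assumes an explicit allow-list and uses a stand-in exception class; `VOICECLONE_MODELS`, `is_voiceclone_model`, `require_reference_audio`, and `MissingReferenceAudioError` are hypothetical names, not identifiers from the PR.

```python
from typing import Optional

# Explicit allow-list instead of a substring match; the single entry is the
# model name quoted in the review comment.
VOICECLONE_MODELS = frozenset({"mimo-v2.5-tts-voiceclone"})


class MissingReferenceAudioError(Exception):
    """Stand-in for the project's MiMoAPIError in this sketch."""


def is_voiceclone_model(model_name: str) -> bool:
    # Exact match after normalization, so similarly named models are not caught.
    return model_name.strip().lower() in VOICECLONE_MODELS


def require_reference_audio(model_name: str, audio_source: Optional[str]) -> str:
    """Return the cleaned audio source, or raise with the actual model name."""
    if not audio_source or not audio_source.strip():
        raise MissingReferenceAudioError(
            f"Model {model_name!r} requires a reference audio; "
            "set 'mimo-tts-voiceclone-audio'."
        )
    return audio_source.strip()
```

The trade-off of an allow-list is that each new voiceclone model has to be added by hand, whereas a substring or regex check picks up new names automatically at the risk of false positives.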


Comment on lines +102 to +111

```python
        async with self._voiceclone_lock:
            if (
                self._voiceclone_cache_data_url is not None
                and self._voiceclone_cache_source == self.voiceclone_audio_source
            ):
                return self._voiceclone_cache_data_url

            try:
                data_url, cleanup_paths = await prepare_audio_input(
                    self.voiceclone_audio_source,
```

suggestion (bug_risk): Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.

terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).

Suggested implementation:

```python
        async with self._voiceclone_lock:
            cleanup_files(self._voiceclone_cleanup_paths)
```

To safely reuse _voiceclone_lock in terminate():

  1. Ensure terminate() is an async def so that async with self._voiceclone_lock: is valid. If terminate() must remain synchronous, instead move the cleanup into an async helper (e.g. _async_terminate_cleanup) that uses the lock, and have terminate() schedule/await that helper where appropriate.
  2. Verify that all other accesses and mutations of self._voiceclone_cleanup_paths (if any) are also guarded by _voiceclone_lock to fully prevent races.

```python
        with pytest.raises(MiMoAPIError, match="mimo-tts-voiceclone-audio"):
            await provider.get_audio("hello")
    finally:
        await provider.terminate()
```


suggestion (testing): Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate

The current tests for _resolve_voiceclone_voice and preserve_mp3 cover caching/conversion, but don’t exercise the new self._voiceclone_cleanup_paths tracking or the cleanup_files calls on refresh/terminate(). Please add a test that monkeypatches cleanup_files in mimo_tts_api_source, performs a voiceclone conversion to populate _voiceclone_cleanup_paths, calls terminate(), and asserts that cleanup_files is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.

Suggested implementation:

```python
    captured_kwargs: dict = {}


@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
    """Temporary voiceclone files should be cleaned up on terminate()."""
    provider = _make_tts_provider(
        {
            "model": "mimo-v2.5-tts-voiceclone",
            "mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
            "mimo-tts-seed-text": "",
        }
    )

    # Monkeypatch cleanup_files so that its calls can be observed
    cleanup_calls = []

    # Note: the import path of mimo_tts_api_source may need adjusting to the project layout
    import mimo_tts_api_source  # type: ignore

    async def fake_cleanup_files(paths):
        # Record the paths that were requested for cleanup
        cleanup_calls.append(list(paths))

    monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)

    # Trigger one voiceclone conversion to populate _voiceclone_cleanup_paths
    provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
    await provider._resolve_voiceclone_voice()

    # Confirm that temporary files pending cleanup were recorded
    assert getattr(provider, "_voiceclone_cleanup_paths", []), (
        "_voiceclone_cleanup_paths should contain temp file paths after conversion"
    )

    # Remember the pending paths for the assertions below
    paths_to_cleanup = list(provider._voiceclone_cleanup_paths)

    # terminate() should call cleanup_files and clear _voiceclone_cleanup_paths
    await provider.terminate()

    # cleanup_files was called with exactly the recorded paths
    assert cleanup_calls == [paths_to_cleanup]

    # The cleanup list has been cleared
    assert provider._voiceclone_cleanup_paths == []
```

  1. If mimo_tts_api_source is already imported in the test file (for example from src.mimo_tts_api_source import cleanup_files or similar), delete the local import mimo_tts_api_source in this test and use the correct module path instead, for example:
    • import src.mimo_tts_api_source as mimo_tts_api_source, or
    • from src import mimo_tts_api_source
  2. Make sure the provider instance really has the private attribute self._voiceclone_cleanup_paths, and that terminate() calls cleanup_files(self._voiceclone_cleanup_paths) and clears the list afterwards. If the implementation differs slightly (for example in the attribute name or the clearing logic), adjust the attribute access and assertions in the test accordingly.

@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@lingyun14beta

Contributor Author

/gemini review

@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@Soulter

Soulter commented Jul 5, 2026

Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces support for the MiMo TTS voice cloning model (mimo-v2.5-tts-voiceclone). It adds a new configuration option mimo-tts-voiceclone-audio for the reference audio sample, along with localization updates. The implementation includes a caching mechanism and concurrency locking to prevent redundant audio conversions and temporary file leaks. Additionally, comprehensive unit tests have been added to verify the new functionality. There are no review comments, and I have no feedback to provide.




Development

Successfully merging this pull request may close these issues.

[Feature] Support the MiMo voice cloning model

2 participants