feat(mimo-tts): support voiceclone model with reference audio#9106
feat(mimo-tts): support voiceclone model with reference audio#9106lingyun14beta wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
_is_voiceclone_modelcheck relies on a substring match inmodel_name; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable. - The error message in
_resolve_voiceclone_voicehardcodesmimo-v2.5-tts-voicecloneeven though the check is generic; consider interpolatingself.model_nameinstead so the message stays accurate if other voiceclone models are used.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_is_voiceclone_model` check relies on a substring match in `model_name`; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable.
- The error message in `_resolve_voiceclone_voice` hardcodes `mimo-v2.5-tts-voiceclone` even though the check is generic; consider interpolating `self.model_name` instead so the message stays accurate if other voiceclone models are used.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/mimo_tts_api_source.py" line_range="102-111" />
<code_context>
+ async with self._voiceclone_lock:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.
terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).
Suggested implementation:
```python
async with self._voiceclone_lock:
cleanup_files(self._voiceclone_cleanup_paths)
```
To safely reuse `_voiceclone_lock` in `terminate()`:
1. Ensure `terminate()` is an `async def` so that `async with self._voiceclone_lock:` is valid. If `terminate()` must remain synchronous, instead move the cleanup into an `async` helper (e.g. `_async_terminate_cleanup`) that uses the lock, and have `terminate()` schedule/await that helper where appropriate.
2. Verify that all other accesses and mutations of `self._voiceclone_cleanup_paths` (if any) are also guarded by `_voiceclone_lock` to fully prevent races.
</issue_to_address>
### Comment 2
<location path="tests/test_mimo_api_sources.py" line_range="321-224" />
<code_context>
+async def test_mimo_tts_voiceclone_preserves_mp3_instead_of_forcing_wav(monkeypatch):
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate
The current tests for `_resolve_voiceclone_voice` and `preserve_mp3` cover caching/conversion, but don’t exercise the new `self._voiceclone_cleanup_paths` tracking or the `cleanup_files` calls on refresh/`terminate()`. Please add a test that monkeypatches `cleanup_files` in `mimo_tts_api_source`, performs a voiceclone conversion to populate `_voiceclone_cleanup_paths`, calls `terminate()`, and asserts that `cleanup_files` is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.
Suggested implementation:
```python
captured_kwargs: dict = {}
@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
"""voiceclone 的临时文件应在 terminate() 时被清理"""
provider = _make_tts_provider(
{
"model": "mimo-v2.5-tts-voiceclone",
"mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
"mimo-tts-seed-text": "",
}
)
# monkeypatch cleanup_files 以便观察调用情况
cleanup_calls = []
# 注意:mimo_tts_api_source 的导入路径可能需要根据项目结构调整
import mimo_tts_api_source # type: ignore
async def fake_cleanup_files(paths):
# 记录被请求清理的路径
cleanup_calls.append(list(paths))
monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)
# 触发一次 voiceclone 转换以填充 _voiceclone_cleanup_paths
provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
await provider._resolve_voiceclone_voice()
# 确认有待清理的临时文件被记录
assert getattr(provider, "_voiceclone_cleanup_paths", []), "_voiceclone_cleanup_paths 应在转换后包含临时文件路径"
# 记录当前待清理路径,用于后续断言
paths_to_cleanup = list(provider._voiceclone_cleanup_paths)
# 调用 terminate,应触发 cleanup_files 并清空 _voiceclone_cleanup_paths
await provider.terminate()
# 验证 cleanup_files 被调用且传入的路径与记录的一致
assert cleanup_calls == [paths_to_cleanup]
# 验证清理列表已被清空
assert provider._voiceclone_cleanup_paths == []
```
1. 如果 `mimo_tts_api_source` 在测试文件中已有导入(例如 `from src.mimo_tts_api_source import cleanup_files` 或类似),请删除该测试中的局部 `import mimo_tts_api_source` 并改为使用正确的模块引用路径,例如:
- `import src.mimo_tts_api_source as mimo_tts_api_source`,或
- `from src import mimo_tts_api_source`。
2. 确保 `provider` 实例在项目中确实存在私有属性 `self._voiceclone_cleanup_paths`,且 `terminate()` 会调用 `cleanup_files(self._voiceclone_cleanup_paths)` 并在完成后清空该列表。如果实现略有不同(例如属性名或清空逻辑不一致),请相应调整测试中的属性访问和断言。
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async with self._voiceclone_lock: | ||
| if ( | ||
| self._voiceclone_cache_data_url is not None | ||
| and self._voiceclone_cache_source == self.voiceclone_audio_source | ||
| ): | ||
| return self._voiceclone_cache_data_url | ||
|
|
||
| try: | ||
| data_url, cleanup_paths = await prepare_audio_input( | ||
| self.voiceclone_audio_source, |
There was a problem hiding this comment.
suggestion (bug_risk): Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.
terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).
Suggested implementation:
async with self._voiceclone_lock:
cleanup_files(self._voiceclone_cleanup_paths)To safely reuse _voiceclone_lock in terminate():
- Ensure
terminate()is anasync defso thatasync with self._voiceclone_lock:is valid. Ifterminate()must remain synchronous, instead move the cleanup into anasynchelper (e.g._async_terminate_cleanup) that uses the lock, and haveterminate()schedule/await that helper where appropriate. - Verify that all other accesses and mutations of
self._voiceclone_cleanup_paths(if any) are also guarded by_voiceclone_lockto fully prevent races.
| with pytest.raises(MiMoAPIError, match="mimo-tts-voiceclone-audio"): | ||
| await provider.get_audio("hello") | ||
| finally: | ||
| await provider.terminate() |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate
The current tests for _resolve_voiceclone_voice and preserve_mp3 cover caching/conversion, but don’t exercise the new self._voiceclone_cleanup_paths tracking or the cleanup_files calls on refresh/terminate(). Please add a test that monkeypatches cleanup_files in mimo_tts_api_source, performs a voiceclone conversion to populate _voiceclone_cleanup_paths, calls terminate(), and asserts that cleanup_files is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.
Suggested implementation:
captured_kwargs: dict = {}
@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
"""voiceclone 的临时文件应在 terminate() 时被清理"""
provider = _make_tts_provider(
{
"model": "mimo-v2.5-tts-voiceclone",
"mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
"mimo-tts-seed-text": "",
}
)
# monkeypatch cleanup_files 以便观察调用情况
cleanup_calls = []
# 注意:mimo_tts_api_source 的导入路径可能需要根据项目结构调整
import mimo_tts_api_source # type: ignore
async def fake_cleanup_files(paths):
# 记录被请求清理的路径
cleanup_calls.append(list(paths))
monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)
# 触发一次 voiceclone 转换以填充 _voiceclone_cleanup_paths
provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
await provider._resolve_voiceclone_voice()
# 确认有待清理的临时文件被记录
assert getattr(provider, "_voiceclone_cleanup_paths", []), "_voiceclone_cleanup_paths 应在转换后包含临时文件路径"
# 记录当前待清理路径,用于后续断言
paths_to_cleanup = list(provider._voiceclone_cleanup_paths)
# 调用 terminate,应触发 cleanup_files 并清空 _voiceclone_cleanup_paths
await provider.terminate()
# 验证 cleanup_files 被调用且传入的路径与记录的一致
assert cleanup_calls == [paths_to_cleanup]
# 验证清理列表已被清空
assert provider._voiceclone_cleanup_paths == []- 如果
mimo_tts_api_source在测试文件中已有导入(例如from src.mimo_tts_api_source import cleanup_files或类似),请删除该测试中的局部import mimo_tts_api_source并改为使用正确的模块引用路径,例如:import src.mimo_tts_api_source as mimo_tts_api_source,或from src import mimo_tts_api_source。
- 确保
provider实例在项目中确实存在私有属性self._voiceclone_cleanup_paths,且terminate()会调用cleanup_files(self._voiceclone_cleanup_paths)并在完成后清空该列表。如果实现略有不同(例如属性名或清空逻辑不一致),请相应调整测试中的属性访问和断言。
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
/gemini review |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for the MiMo TTS voice cloning model (mimo-v2.5-tts-voiceclone). It adds a new configuration option mimo-tts-voiceclone-audio for the reference audio sample, along with localization updates. The implementation includes a caching mechanism and concurrency locking to prevent redundant audio conversions and temporary file leaks. Additionally, comprehensive unit tests have been added to verify the new functionality. There are no review comments, and I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
fix #9105
Modifications / 改动点
astrbot/core/config/default.py:新增配置项mimo-tts-voiceclone-audio(参考音频路径/URL/base64),并更新mimo-tts-voice的 hint,说明其在 voiceclone 模型下会被忽略astrbot/core/provider/sources/mimo_api_common.py:prepare_audio_input()增加target_format/preserve_mp3可选参数(默认值保持原行为,向后兼容)astrbot/core/provider/sources/mimo_tts_api_source.py:_is_voiceclone_model()判断当前模型是否为 voiceclone_resolve_voiceclone_voice(),将参考音频转换为 data URL 并按来源缓存,使用asyncio.Lock防止并发请求重复转码MiMoAPIError_build_payload()支持通过voice_value覆盖音色字段,写入转换后的 data URLpreserve_mp3=True),避免转 wav 后体积膨胀超过官方 10MB 限制terminate()中补充清理转码产生的临时文件dashboard/src/i18n/locales/{en-US,ru-RU,zh-CN}/features/config-metadata.json:同步补充新字段的多语言文案tests/test_mimo_api_sources.py:新增 7 个测试用例,覆盖模型判定、未配置报错、data URL 写入 payload、缓存生效/换源刷新、mp3 保留、并发调用只转码一次Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Add support for MiMo TTS voiceclone model using a configurable reference audio source and integrate it into the existing TTS provider pipeline.
New Features:
Enhancements:
Documentation:
Tests: