Add kesha-voice-kit - local voice toolkit OpenClaw skill #3

Open
drakulavich wants to merge 1 commit into blessonism:main from drakulavich:add-kesha-voice-kit

@drakulavich drakulavich commented May 17, 2026

kesha-voice-kit is an open-source local-first voice toolkit with a built-in OpenClaw skill:

  • STT: 25 languages, ~19x faster than Whisper on Apple Silicon (CoreML)
  • TTS: Kokoro + Vosk-TTS + 180 macOS system voices, SSML
  • VAD + language detection (107 langs)
  • Rust engine, no cloud required, MIT licensed

https://github.com/drakulavich/kesha-voice-kit

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added kesha-voice-kit, a local speech-to-text and text-to-speech solution built on a Rust engine with no cloud service dependencies, to the OpenClaw Skills list.

coderabbitai Bot commented May 17, 2026

Walkthrough

This pull request adds a single line to the OpenClaw Skills list in README.md, introducing the kesha-voice-kit skill entry with its description of local speech-to-text and text-to-speech functionality built on a Rust engine, requiring no cloud services.

Changes

OpenClaw Skills Documentation

| Layer / File(s) | Summary |
|---|---|
| Add kesha-voice-kit to skills list (`README.md`) | README skill list is extended with kesha-voice-kit entry, describing its local STT+TTS capabilities, Rust engine implementation, and lack of cloud service requirements. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A voice kit hops in with Rust so grand,
No clouds in sight, just local command,
STT and TTS, all bundled tight,
kesha-voice-kit shines so bright! 🎤✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding kesha-voice-kit as an OpenClaw skill to the README. It is clear, concise, and directly reflects the changeset content. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

⚠️ This pull request might be slop. It has been flagged by CodeRabbit slop detection and should be reviewed carefully.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 17: The table row for "kesha-voice-kit" is in English while the rest of
the README is Chinese; update the description string for the kesha-voice-kit
entry (the text after the repository link/name) to Chinese while keeping the
repo name/link and the "Rust" column unchanged; replace the English description
with a concise Chinese translation such as: "本地语音工具包:语音转文本(25 种语言,借助 CoreML 在
Apple Silicon 上比 Whisper 快 ~19 倍,ONNX 回退)、文本转语音(Kokoro + Vosk-TTS + 180 个 macOS
系统声音,支持 SSML)、语音活动检测、语言检测(107 种语言)。Rust 引擎,无需云服务。"
- Line 17: Update the performance claim for kesha-voice-kit in the README:
locate the table row containing the project name "kesha-voice-kit" and replace
the "~19x faster" STT performance text with "~15x faster" so it matches upstream
documentation (keep the rest of the row unchanged, including "25 languages" and
"107 langs" references).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf161a1b-e1e2-4917-874b-e46c9121010d

📥 Commits

Reviewing files that changed from the base of the PR and between 9704c7c and 06ff1d0.

📒 Files selected for processing (1)
  • README.md

Comment thread README.md
| **[github-explorer](./github-explorer/)** | GitHub 项目深度分析。多源采集 + 结构化研判报告 | search-layer, content-extract |
| **[dependency-tracker](./dependency-tracker/)** | 依赖健康检查。扫描 skills/npm/pip/CLI 版本漂移,生成报告 | `requests` |
| **[gitclaw-backup](./gitclaw-backup/)** | GitHub 备份。将 OpenClaw 工作区同步到 GitHub 仓库 | git |
| **[kesha-voice-kit](https://github.com/drakulavich/kesha-voice-kit)** | Local STT+TTS voice toolkit. Speech-to-text (25 languages, ~19x faster than Whisper on Apple Silicon via CoreML, ONNX fallback), text-to-speech (Kokoro + Vosk-TTS + 180 macOS system voices, SSML), VAD, language detection (107 langs). Rust engine. No cloud required. | Rust |

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Translate entry to Chinese for consistency.

The entire README is written in Chinese, but this new entry is in English. For consistency and readability, please translate the description to match the language of the rest of the document.

🌐 Suggested translation (example)
-| **[kesha-voice-kit](https://github.com/drakulavich/kesha-voice-kit)** | Local STT+TTS voice toolkit. Speech-to-text (25 languages, ~19x faster than Whisper on Apple Silicon via CoreML, ONNX fallback), text-to-speech (Kokoro + Vosk-TTS + 180 macOS system voices, SSML), VAD, language detection (107 langs). Rust engine. No cloud required. | Rust |
+| **[kesha-voice-kit](https://github.com/drakulavich/kesha-voice-kit)** | 本地 STT+TTS 语音工具包。语音转文字(支持 25 语言,Apple Silicon 上通过 CoreML 比 Whisper 快约 19 倍,ONNX 降级)、文字转语音(Kokoro + Vosk-TTS + 180 个 macOS 系统语音,支持 SSML)、VAD、语言检测(107 语言)。Rust 引擎。无需云服务。 | Rust |
🧰 Tools
🪛 LanguageTool

[style] ~17-~17: This phrase is redundant (‘OS’ stands for ‘operating system’). Use simply “macOS”.
Context: ...text-to-speech (Kokoro + Vosk-TTS + 180 macOS system voices, SSML), VAD, language detection ...

(ACRONYM_TAUTOLOGY)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Fetch README from kesha-voice-kit repo to verify technical claims

# Fetch the upstream README
curl -s https://raw.githubusercontent.com/drakulavich/kesha-voice-kit/main/README.md > /tmp/kesha_readme.md

# Display relevant sections mentioning the claimed features
echo "=== Checking for language support claims ==="
rg -i "25 language|language.*25|speech.*language" /tmp/kesha_readme.md || echo "No exact match for 25 languages"

echo -e "\n=== Checking for performance claims ==="
rg -i "19.*faster|whisper.*faster|faster.*whisper" /tmp/kesha_readme.md || echo "No exact match for 19x faster claim"

echo -e "\n=== Checking for TTS voice count ==="
rg -i "180.*voice|voice.*180|macos.*voice" /tmp/kesha_readme.md || echo "No exact match for 180 voices"

echo -e "\n=== Checking for language detection ==="
rg -i "107.*lang|language.*detection.*107" /tmp/kesha_readme.md || echo "No exact match for 107 language detection"

echo -e "\n=== Full README context ==="
cat /tmp/kesha_readme.md

Repository: blessonism/openclaw-skills

Length of output: 12329


Correct the performance claim: it should be ~15x faster, not ~19x.

The upstream README states the STT performance as "~15x faster than Whisper on Apple Silicon, ~2.5x on CPU" (against Whisper large-v3-turbo), so the "~19x faster" figure in this entry does not match the documented benchmark and should be updated. The other claims (25 languages for STT, 107 for language detection) match the upstream documentation.
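
One way the correction could be applied is a single scoped substitution. The following is a minimal sketch, not part of the PR: it works on a scratch file (an assumption made for illustration, standing in for README.md) that holds two rows copied from the comment thread above, and it limits the `sed` substitution to the line mentioning kesha-voice-kit so the other rows stay untouched.

```shell
#!/bin/bash
set -euo pipefail

# Scratch file standing in for README.md (illustrative only); the two rows
# are copied from the comment thread in this review.
readme="$(mktemp)"
cat > "$readme" <<'EOF'
| **[gitclaw-backup](./gitclaw-backup/)** | GitHub 备份。将 OpenClaw 工作区同步到 GitHub 仓库 | git |
| **[kesha-voice-kit](https://github.com/drakulavich/kesha-voice-kit)** | Local STT+TTS voice toolkit. Speech-to-text (25 languages, ~19x faster than Whisper on Apple Silicon via CoreML, ONNX fallback), text-to-speech (Kokoro + Vosk-TTS + 180 macOS system voices, SSML), VAD, language detection (107 langs). Rust engine. No cloud required. | Rust |
EOF

# Apply the substitution only on lines that mention kesha-voice-kit.
# Writing to a new file and moving it avoids GNU/BSD differences in `sed -i`.
sed '/kesha-voice-kit/ s/~19x faster/~15x faster/' "$readme" > "$readme.new"
mv "$readme.new" "$readme"

# Show the corrected row so the change can be checked by eye.
fixed_row="$(grep 'kesha-voice-kit' "$readme")"
echo "$fixed_row"
```

Against the real repository the same `sed` expression would be pointed at README.md, and `git diff` should then show exactly one changed line.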
