
Commit 949cd48

Add Voice Input and Latest Release notes feed
1 parent b6676a7 commit 949cd48

10 files changed: +290 -1 lines changed

content/docs/features/meta.json

Lines changed: 2 additions & 1 deletion
@@ -12,6 +12,7 @@
   "katex",
   "model-selector",
   "system-prompts",
-  "providers"
+  "providers",
+  "voice-input"
 ]
 }
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
---
title: Voice Input
description: Adds voice-to-text transcription to the chat UI via a microphone button or ALT+D keyboard shortcut.
---

The [voice](https://github.com/ServiceStack/llms/tree/main/llms/extensions/voice) extension supports three transcription modes tried in order: `voxtype`, `transcribe`, and `voxtral-mini-latest`, using the first one that's available.

To remove modes or change their priority, override the list with the `LLMS_VOICE` environment variable, e.g.:

```bash
export LLMS_VOICE="transcribe,voxtral-mini-latest"
```
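The selection logic can be pictured with a short sketch (the helper names below are illustrative, not from the extension's actual source):

```python
# Default priority order documented above
DEFAULT_MODES = ["voxtype", "transcribe", "voxtral-mini-latest"]

def voice_modes(env):
    """Parse the LLMS_VOICE override, falling back to the default order."""
    raw = env.get("LLMS_VOICE")
    if raw is None:
        return list(DEFAULT_MODES)
    # An empty value yields an empty list, i.e. all modes disabled
    return [m.strip() for m in raw.split(",") if m.strip()]

def pick_mode(modes, is_available):
    """Return the first mode whose availability check passes, else None."""
    for mode in modes:
        if is_available(mode):
            return mode
    return None
```

With no override the full default list is tried; with `LLMS_VOICE=""` no mode is ever selected.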
## Usage

### 🎤 Microphone Button

Click the microphone icon in the chat input area to start recording. Click again to stop and transcribe.

If the **voice** extension is enabled, the microphone button will appear in the chat input area and the `ALT+D` keyboard shortcut will be available for voice input.

### Keyboard Shortcut

**Alt+D** toggles voice recording with two modes:

- **Tap (< 500ms):** Toggle mode - starts recording, press again to stop
- **Hold (≥ 500ms):** Push-to-talk - records **while held**, stops when released
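The tap/hold distinction comes down to a single duration threshold; a minimal sketch (the function name is illustrative, not from the source):

```python
def classify_press(duration_ms, threshold_ms=500):
    """Classify an Alt+D press: short taps toggle recording on/off,
    longer holds act as push-to-talk for the duration of the hold."""
    return "toggle" if duration_ms < threshold_ms else "push-to-talk"
```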
<Screenshot src="/img/features/voice-input.webp" />

The transcribed text is appended to the current message input.

<Screenshot src="/img/features/voice-recording.webp" />

Voice input can be disabled by [disabling the voice extension](/docs/configuration#disable-extensions) or by setting `LLMS_VOICE=""` to disable all modes.

## Available Modes

Voice Input will use the first available mode.

## voxtype

Uses the [voxtype.io](https://voxtype.io) CLI tool for local transcription.

**Requirements:**
- `voxtype` must be installed and on your `$PATH`
- `ffmpeg` must be installed for audio format conversion

#### Installation

Voxtype works on GNOME, KDE, Sway, Hyprland and River (Wayland or X11), with [native packages](https://voxtype.io/#install) for **Arch Linux**, **Debian**, **Ubuntu** and **Fedora**, and support for macOS via their source builds.

## transcribe

Use your preferred speech-to-text tool by creating a custom `transcribe` script or executable.

**Requirements:**
- A `transcribe` executable on your `$PATH` that accepts a WAV audio file and outputs text to stdout
- `ffmpeg` must be installed for audio format conversion

**Interface:**

```bash
transcribe recording.wav > transcript.txt
```

See [Creating a transcribe Script](#creating-a-transcribe-script) for implementation examples.

## voxtral-mini-latest

Uses [Mistral's Voxtral model](https://docs.mistral.ai/models/voxtral-mini-transcribe-26-02) for cloud-based transcription. A good option if you want to avoid downloading a large model and using local CPU resources.

**Requirements:**
- Mistral provider must be enabled in your configuration
- `MISTRAL_API_KEY` environment variable must be set

**Pricing:** ~$0.003/minute
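At the quoted rate, cost scales linearly with audio length; a quick back-of-the-envelope helper (illustrative only):

```python
def transcription_cost(audio_seconds, rate_per_minute=0.003):
    """Estimate Voxtral transcription cost at the documented ~$0.003/minute."""
    return audio_seconds / 60 * rate_per_minute
```

A 10-minute recording comes to roughly $0.03.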
79+
80+
---
81+
82+
## Creating a transcribe Script
83+
84+
Make the script executable and add it to your `$PATH`:
85+
86+
```bash
87+
chmod +x ./transcribe
88+
sudo ln -s $(pwd)/transcribe /usr/local/bin/transcribe
89+
```
90+
91+
### Using OpenAI Whisper
92+
93+
Create a script using [uvx](https://github.com/astral-sh/uv) and [openai-whisper](https://github.com/openai/whisper):
94+
95+
`./transcribe`
96+
97+
```bash
98+
#!/usr/bin/env bash
99+
uvx --from openai-whisper whisper "$1" --model base.en --output_format txt --output_dir /tmp >/dev/null 2>&1
100+
101+
BASENAME=$(basename "${1%.*}")
102+
cat "/tmp/${BASENAME}.txt"
103+
rm -f "/tmp/${BASENAME}.txt"
104+
```
105+
106+
### Using Whisper.cpp
107+
108+
[whisper.cpp](https://github.com/ggml-org/whisper.cpp) provides a faster, dependency-free C++ implementation.
109+
110+
**Setup:**
111+
112+
```bash
113+
git clone https://github.com/ggml-org/whisper.cpp.git
114+
cd whisper.cpp
115+
116+
# Download a model
117+
sh ./models/download-ggml-model.sh base.en
118+
119+
# Build
120+
cmake -B build
121+
cmake --build build -j --config Release
122+
123+
# Test
124+
./build/bin/whisper-cli -f samples/jfk.wav
125+
```
126+
127+
**Create the transcribe script:**
128+
129+
`./transcribe`
130+
131+
```bash
132+
#!/usr/bin/env bash
133+
SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
134+
MODEL="$SCRIPT_DIR/models/ggml-base.en.bin"
135+
CLI="$SCRIPT_DIR/build/bin/whisper-cli"
136+
TMPFILE=$(mktemp /tmp/whisper-XXXXXX)
137+
138+
trap 'rm -f "$TMPFILE" "${TMPFILE}.txt"' EXIT
139+
140+
"$CLI" -m "$MODEL" -otxt -f "$1" -of "$TMPFILE" >/dev/null 2>&1
141+
142+
cat "${TMPFILE}.txt"
143+
```
144+

content/docs/getting-started/index.mdx

Lines changed: 10 additions & 0 deletions
@@ -60,3 +60,13 @@ To update to the latest version:
 For Docker:

 <ShellCommand>docker pull ghcr.io/servicestack/llms:latest</ShellCommand>
+
+### Reset to latest configuration
+
+New versions sometimes include changes to the `llms.json` config which aren't automatically applied.
+
+To reset to the latest configuration, just delete your `llms.json` and it will be recreated with the latest defaults on the next run:
+
+```bash
+rm ~/.llms/llms.json
+```

content/docs/latest.mdx

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
---
title: Latest Features
description: Latest features and updates in llms.py
---

## Feb 8, 2026

### Support for Voice Input

Added [Voice Input](/docs/features/voice-input) extension with speech-to-text transcription via a microphone button or `ALT+D` shortcut, supporting three modes: local transcription with **voxtype**, a custom **transcribe** executable, and cloud-based **voxtral-mini-latest** via Mistral.

<Screenshot src="/img/features/voice-recording.webp" />

- Added **tok/s** metrics in the Chat UI on a per-message and per-thread basis

## Feb 5, 2026

### Voxtral Audio Models

Added support for Mistral's [Voxtral audio transcription models](https://mistral.ai/news/voxtral-transcribe-2) - use the **audio** input filter in the model selector to find them.

<Screenshot src="/img/models/voxtral-models.webp" />

Both the **Chat Completion** and dedicated **Audio Transcription** APIs deliver impressive speed, with the dedicated transcription endpoint returning results near-instantly.

<ScreenshotsGallery className="mb-8" gridClass="grid grid-cols-1 md:grid-cols-2 gap-4" images={{
  'Voxtral Chat': '/img/models/voxtrals-chat.webp',
  'Voxtral Audio Transcription': '/img/models/voxtrals-audio-transcription.webp',
}} />

### Compact Threads

Added [Compact Threads feature](/docs/features/chat-ui#compact-feature) for managing long conversations - it summarizes the current thread into a new, condensed thread targeting **30%** of the original context size. The compact button appears when a conversation exceeds **10 messages** or uses more than **40%** of the model's context limit.

<ScreenshotsGallery className="mb-8" gridClass="grid grid-cols-1 md:grid-cols-2 gap-4" images={{
  'Compact Button': '/img/compact-button.webp',
  'Compact Button Intensity': '/img/compact-intensity.webp',
}} />

The compaction model and prompts are fully customizable in `~/.llms/llms.json`.
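The documented trigger can be summarized as a simple predicate (a sketch of the stated thresholds, not the actual implementation):

```python
def should_offer_compact(message_count, used_tokens, context_limit,
                         max_messages=10, max_usage=0.40):
    """Show the compact button when a thread exceeds 10 messages
    or uses more than 40% of the model's context limit."""
    return message_count > max_messages or used_tokens / context_limit > max_usage
```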
- Fix **OpenRouter** provider after [models.dev](https://models.dev) switched to use `@openrouter/ai-sdk-provider`. Remove `llms.json` to reset to the default configuration:

<ShellCommand>rm ~/.llms/llms.json</ShellCommand>

## Feb 3, 2026

- Removed duplicate filesystem tools from [Core Tools](/docs/features/core-tools); they're now only included in [File System Tools](/docs/features/core-tools#file-system-tools)

- Add `sort_by` and `max_result` options to `search_files` and make `path` an optional parameter to improve utility and reduce tool use error rates. `path` now defaults to the first allowed directory (project dir).

- Add support for overridable **ClientTimeout** limits in `~/.llms/llms.json`:

```json
{
  "limits": {
    "client_timeout": 120
  }
}
```

- Show **proceed** button for assistant messages without content but with reasoning

## Feb 2, 2026

### Multi User Skills

When Auth is enabled, each user [manages their own skill collection](/docs/extensions/skills#multi-user-skills) at `~/.llms/user/<user>/skills` and can enable or disable skills independently. Shared global & project-level skills remain accessible but read-only.

## Jan 31, 2026

- Refactor [GitHub Auth](/docs/deployment/github-oauth) out into a builtin [github_auth](https://github.com/ServiceStack/llms/tree/main/llms/extensions/github_auth) extension

## Jan 30, 2026

- Support for **tool calling** for models returned by local **Ollama** instances

- New `openai-local` provider for custom OpenAI-compatible endpoints

- Fix computer tool issues in Docker by only loading the computer tool when run in an environment with a display

## Jan 29, 2026

### Skills Management

Added a full [Skills Management UI](/docs/extensions/skills) for creating, editing, and deleting skills directly from the browser.

Skills package domain-specific instructions, scripts, references & assets that enhance your AI agent.

<Screenshot src="/img/skills/skills-edit-page.webp" />

### Browse & Install Skills

Added a [Skill Browser](/docs/extensions/skills#browsing-and-installing-skills) with access to the top 5,000 community skills from [skills.sh](http://skills.sh). Search, browse, and install pre-built skills directly into your personal collection.

<ScreenshotsGallery className="mb-8" gridClass="grid grid-cols-1 md:grid-cols-2 gap-4" images={{
  'Browse Skills': '/img/skills/skills-browse.webp',
  'Installing Skill': '/img/skills/skills-installing.webp',
}} />

## Jan 28, 2026

- Use a barebones fallback markdown renderer when [markdown renderers like KaTeX](/docs/features/katex) fail

- Use `sanitizeHtml` to avoid breaking layout when displaying rendered HTML

## Jan 26, 2026

- Add copy button to **TextViewer** popover menu

- Add **proceed** and **retry** buttons at the bottom of Threads to continue the agent loop

- Add [filesystem tools](/docs/features/core-tools#file-system-tools) in the [computer](/docs/extensions/computer_use) extension

- Add a simple `sendUserMessage` API in the UI to simulate a new user message on the thread

- Implement `TextViewer` component for displaying Tool Args, Tool Output + SystemPrompt

## Jan 24, 2026

- Auto collapse long tool args content and add ability to min/maximize text content

## Jan 23, 2026

- Add built-in [computer_use extension](/docs/extensions/computer_use)

---

## v3 Released

See [v3 release notes](/docs/v3) for details on the major new features and improvements in v3.

content/docs/meta.json

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
   "title": "Documentation",
   "pages": [
     "index",
+    "latest",
     "v3",
     "getting-started",
     "features",
