Commit edd9c47

Add support for Cursor transformation provider in Cursor Whisper extension
- Introduced Cursor as a new transformation provider in package.json and updated related configuration options.
- Enhanced documentation to include setup instructions for the Cursor provider and its models.
- Updated validation logic to ensure proper configuration of Cursor API keys and models.
- Added tests to cover the new Cursor provider functionality and ensure correct behavior across the extension.
- Improved user interface elements to facilitate selection and configuration of the Cursor model.
1 parent 644f484 commit edd9c47

22 files changed

Lines changed: 1001 additions & 16 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,7 +16,7 @@ A professional VSCode/Cursor extension that captures audio from your microphone,
 1. **Install** the extension (VSIX or Marketplace when available)
 2. **Run Setup Wizard** — Command Palette → `Cursor Whisper: Setup Wizard`
 3. **Configure OpenAI API key** — Required for Whisper voice-to-text
-4. **Optionally choose optimization provider** — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, or OpenRouter
+4. **Optionally choose optimization provider** — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor
 5. **Press `Cmd+Alt+V`** and speak

 See the full [Quick Start Guide](docs/quickstart.md).
@@ -90,7 +90,7 @@ Developers often have complex architectural ideas, detailed requirements, or int
 ### Coming Soon

 - 🔄 **Prompt Transformation** - AI-powered optimization of transcribed text
-- 🔄 **Multiple AI Providers** - OpenAI, Anthropic, Google Gemini, Azure OpenAI, Ollama, OpenCode, and OpenRouter for prompt transformation
+- 🔄 **Multiple AI Providers** - OpenAI, Anthropic, Google Gemini, Azure OpenAI, Ollama, OpenCode, OpenRouter, and Cursor for prompt transformation
 - 🔄 **Chat Integration** - Direct insertion into Cursor chat input
 - 🔄 **Real-time Streaming** - See transcription as you speak
 - 🔄 **Multi-language Support** - Auto-detect or manually configure language
@@ -220,6 +220,7 @@ Prompt optimization converts transcribed speech into structured prompts. Choose
 | `ollamaBaseUrl` / `ollamaModel` | Local Ollama server settings |
 | `openCodeBaseUrl` / `openCodeModel` | Local OpenCode proxy settings |
 | `openRouterModel` | OpenRouter model (when provider is `openrouter`) |
+| `cursorModel` | Cursor model (when provider is `cursor`) |

 Use **Cursor Whisper: Configure Prompt Optimization Provider** to set up interactively. See [`docs/configuration/`](docs/configuration/) for provider setup.

@@ -229,7 +230,7 @@ Use **Cursor Whisper: Configure Prompt Optimization Provider** to set up interac
 |---------|------|---------|-------------|
 | `transcriptionLanguage` | string | `"en"` | Language for transcription (`en`, `es`, `fr`, `de`, `auto`) |
 | `enablePromptTransformation` | boolean | `true` | Transform transcription into optimized prompts |
-| `transformationProvider` | string | `"openai"` | LLM provider for transformation (`openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`) |
+| `transformationProvider` | string | `"openai"` | LLM provider for transformation (`openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`, `cursor`) |
 | `transformationModel` | string | `"gpt-4o"` | OpenAI model for transformation |
 | `audioQuality` | string | `"high"` | Audio recording quality (`low`, `medium`, `high`) |
 | `maxRecordingDuration` | number | `120` | Maximum recording duration in seconds |
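
The README settings tables imply a lookup from provider id to the model setting that provider reads. Here is a minimal TypeScript sketch of that lookup, not the extension's actual code. It assumes the `cursorWhisper.` prefix applies to every key. `PROVIDERS`, `MODEL_SETTING`, `isProvider`, and `modelSettingKey` are hypothetical names. The Anthropic, Google, and Azure setting names are omitted because this diff does not show them.

```typescript
// Provider ids as listed for `transformationProvider` in the README table.
const PROVIDERS = [
  "openai", "anthropic", "google", "azure",
  "ollama", "opencode", "openrouter", "cursor",
] as const;
type Provider = (typeof PROVIDERS)[number];

// Only setting names visible in this diff are mapped (assumption: the rest exist elsewhere).
const MODEL_SETTING: Partial<Record<Provider, string>> = {
  openai: "transformationModel",
  ollama: "ollamaModel",
  opencode: "openCodeModel",
  openrouter: "openRouterModel",
  cursor: "cursorModel",
};

// Type guard so arbitrary strings from user settings can be checked safely.
function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: fully qualified settings key, or undefined when unknown.
function modelSettingKey(provider: string): string | undefined {
  if (!isProvider(provider)) return undefined;
  const name = MODEL_SETTING[provider];
  return name ? "cursorWhisper." + name : undefined;
}
```

With this shape, adding a provider means touching the id list and the map, which mirrors the two README rows this commit changes.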

cursor-whisper-0.1.0.vsix

9.45 KB
Binary file not shown.

docs/adr/0014-multiple-transformation-providers.md

Lines changed: 3 additions & 1 deletion
@@ -36,6 +36,7 @@ Requirements:
 | Ollama | Local/offline inference |
 | OpenCode | Local multi-provider gateway via opencode-llm-proxy |
 | OpenRouter | Cloud gateway to 200+ models with one API key |
+| Cursor | Native Cursor AI models via Cursor SDK |

 Key aspects:

@@ -80,7 +81,7 @@ Key aspects:

 ### Negative

-- Increased codebase complexity (7 provider adapters)
+- Increased codebase complexity (8 provider adapters)
 - More configuration surface area for users
 - Quality and latency vary by provider
 - Additional SDK dependencies to maintain
@@ -99,6 +100,7 @@ Key aspects:
 - Value object: `src/domain/value-objects/TransformationProvider.ts`
 - Config: `cursorWhisper.transformationProvider` and provider-specific model settings
 - OpenCode and OpenRouter use the OpenAI SDK with custom `baseURL` (OpenAI-compatible APIs)
+- Cursor uses `@cursor/sdk` with `Agent.prompt()` for one-shot transformations
 - Commands: `cursor-whisper.configureTransformationProvider`, `cursor-whisper.testTransformation`

 ---
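
The ADR notes describe the Cursor adapter as a one-shot prompt call. The sketch below shows how such an adapter could be shaped so it can be tested without the SDK. It is an illustration under stated assumptions, not the repository's implementation. `OneShotPrompt`, `TransformationAdapter`, and `CursorAdapterSketch` are hypothetical names. The real `@cursor/sdk` call signature does not appear in this diff, so the call is injected as a plain function. The instruction text is also made up.

```typescript
// Hypothetical port for a one-shot prompt call. In the extension this role
// would be filled by `@cursor/sdk`; its real signature is not shown in this diff.
type OneShotPrompt = (request: { model: string; prompt: string }) => Promise<string>;

// Hypothetical common shape shared by every provider adapter.
interface TransformationAdapter {
  readonly provider: string;
  transform(transcript: string): Promise<string>;
}

class CursorAdapterSketch implements TransformationAdapter {
  readonly provider = "cursor";

  constructor(
    private readonly model: string,
    private readonly promptOnce: OneShotPrompt,
  ) {}

  async transform(transcript: string): Promise<string> {
    const text = transcript.trim();
    if (text.length === 0) {
      // Reject early instead of spending credits on an empty request.
      throw new Error("Nothing to transform: the transcript is empty");
    }
    // Instruction wording is illustrative only.
    const prompt = "Rewrite this transcribed speech as a structured prompt:\n\n" + text;
    const result = await this.promptOnce({ model: this.model, prompt });
    return result.trim();
  }
}
```

Injecting the prompt function keeps the adapter independent of network access and credentials, which is one way the new provider tests mentioned in the commit message could stay offline.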

docs/configuration/README.md

Lines changed: 20 additions & 0 deletions
@@ -62,6 +62,7 @@ Or run **Cursor Whisper: Configure Prompt Optimization Provider**.
 | Ollama | Free | Medium | Local | Good | Privacy-first, offline |
 | OpenCode | Free | Medium | Local | High | Reuse OpenCode multi-provider setup |
 | OpenRouter | Varies | Fast | Cloud | High | 200+ models with one API key |
+| Cursor | ~$0.01 | Fast | Cloud | High | Cursor Composer and frontier models with one API key |

 \*Plus Whisper transcription cost (~$0.006/min, always OpenAI)

@@ -197,6 +198,25 @@ API keys are stored per provider (`cursor-whisper.apiKey.{provider}`). Switching

 ---

+### Option H: Cursor (SDK)
+
+```json
+{
+"cursorWhisper.transformationProvider": "cursor",
+"cursorWhisper.cursorModel": "composer-2.5"
+}
+```
+
+**Setup:** Configure OpenAI for Whisper first. Get a Cursor API key from [Cursor Dashboard → Integrations](https://cursor.com/dashboard/integrations). Run **Configure Prompt Optimization Provider**, select Cursor, enter your key, and choose a model.
+
+**Recommended models:** `composer-2.5` (default), `composer-2.5-fast`, `claude-4.5-sonnet`, `gpt-5.1`, `gpt-5.2-codex`
+
+**Notes:** Works in any editor (VSCode, Cursor, VSCodium, etc.). Uses the `@cursor/sdk` package to connect to Cursor's agent API. No Cursor IDE installation required — only a Cursor API key and internet access.
+
+**Pitfalls:** Cursor only handles optimization — Whisper still needs OpenAI. Ensure your Cursor account has sufficient credits.
+
+---
+
 ## Step 3: Verify Configuration

 Run **Cursor Whisper: Test Configuration**
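
The commit message mentions validation of Cursor API keys and models. A check of that kind might look like the following sketch. It is an assumption about shape, not the extension's actual validator. The model list is copied from the `cursorWhisper.cursorModel` enum and the secret key pattern from `cursor-whisper.apiKey.{provider}`. `validateCursorConfig`, `secretKeyFor`, and `CursorConfigInput` are hypothetical names.

```typescript
// Model ids copied from the `cursorWhisper.cursorModel` enum in package.json.
const CURSOR_MODELS = [
  "composer-2.5",
  "composer-2.5-fast",
  "claude-4.5-sonnet",
  "gpt-5.1",
  "gpt-5.2-codex",
] as const;
const DEFAULT_CURSOR_MODEL = "composer-2.5";

// Hypothetical input shape: values as read from secret storage and settings.
interface CursorConfigInput {
  apiKey: string | undefined;
  model: string | undefined;
}

// Per-provider secret name, following the documented `cursor-whisper.apiKey.{provider}` pattern.
function secretKeyFor(provider: string): string {
  return "cursor-whisper.apiKey." + provider;
}

// Returns a list of human-readable problems; an empty list means the config looks usable.
function validateCursorConfig(input: CursorConfigInput): string[] {
  const problems: string[] = [];
  if (!input.apiKey || input.apiKey.trim().length === 0) {
    problems.push("Missing Cursor API key (expected under " + secretKeyFor("cursor") + ")");
  }
  const model = input.model ?? DEFAULT_CURSOR_MODEL;
  if ((CURSOR_MODELS as readonly string[]).indexOf(model) === -1) {
    problems.push("Unsupported Cursor model: " + model);
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a setup wizard show the user both a missing key and a bad model in a single pass.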

docs/quickstart.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ Get Cursor Whisper running in a few minutes.
 Cursor Whisper has **two separate services**:

 1. **Voice-to-text (required)** — Always uses **OpenAI Whisper**. Requires an **OpenAI API key**.
-2. **Prompt optimization (optional)** — Converts transcribed speech into structured prompts. You choose the provider (OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, or OpenRouter) and supply credentials when required.
+2. **Prompt optimization (optional)** — Converts transcribed speech into structured prompts. You choose the provider (OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor) and supply credentials when required.

 ```mermaid
 graph LR
@@ -53,7 +53,7 @@ On first launch, Cursor Whisper opens the **Setup Wizard**. You can also run it
 - Get a key: https://platform.openai.com/api-keys
 3. **Test OpenAI connection** — Verifies your key works
 4. **Enable optimization?** — Choose yes or transcription-only mode
-5. **Select provider** (if enabled) — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, or OpenRouter
+5. **Select provider** (if enabled) — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor
 6. **Provider credentials** — Enter API key or endpoint when required
 7. **Select model** — Pick the model for optimization
 8. **Test optimization** — Optional validation before finishing

package.json

Lines changed: 17 additions & 2 deletions
@@ -144,7 +144,8 @@
 "azure",
 "ollama",
 "opencode",
-"openrouter"
+"openrouter",
+"cursor"
 ],
 "enumDescriptions": [
 "OpenAI GPT models (requires OpenAI API key; can reuse Whisper key)",
"enumDescriptions": [
150151
"OpenAI GPT models (requires OpenAI API key; can reuse Whisper key)",
@@ -153,7 +154,8 @@
 "Azure OpenAI deployments (requires Azure API key and endpoint)",
 "Local Ollama models (no API key required)",
 "Local OpenCode multi-provider proxy via opencode-llm-proxy (no API key required)",
-"OpenRouter unified gateway (requires OpenRouter API key)"
+"OpenRouter unified gateway (requires OpenRouter API key)",
+"Native Cursor AI via Cursor SDK (works in any editor - requires Cursor API key)"
 ],
 "markdownDescription": "**Prompt optimization provider** (separate from Whisper transcription).\n\nWhisper always uses OpenAI. This setting chooses which AI service optimizes transcribed speech into structured prompts. [Configuration guide](https://github.com/vypdev/cursor-whisper/blob/master/docs/configuration/README.md)"
 },
@@ -233,6 +235,18 @@
 "default": "openai/gpt-4o",
 "markdownDescription": "**OpenRouter model identifier** for prompt optimization when `transformationProvider` is `openrouter`. Requires an OpenRouter API key."
 },
+"cursorWhisper.cursorModel": {
+"type": "string",
+"default": "composer-2.5",
+"enum": [
+"composer-2.5",
+"composer-2.5-fast",
+"claude-4.5-sonnet",
+"gpt-5.1",
+"gpt-5.2-codex"
+],
+"markdownDescription": "**Cursor model** for prompt optimization when `transformationProvider` is `cursor`. Works in any editor (VSCode, Cursor, etc.). Requires a Cursor API key from [Cursor Dashboard](https://cursor.com/dashboard/integrations)."
+},
 "cursorWhisper.audioQuality": {
 "type": "string",
 "enum": [
@@ -283,6 +297,7 @@
 },
 "dependencies": {
 "@anthropic-ai/sdk": "^0.30.1",
+"@cursor/sdk": "^1.0.13",
 "@google/generative-ai": "^0.21.0",
 "@kstonekuan/audio-capture": "^0.0.3",
 "@vscode/webview-ui-toolkit": "1.4.0",
