Is your feature request related to a problem?
Command Mode is great for the built-in Mac actions it already knows how to take, but the set of things it can do is fixed. A lot of the workflows I'd most like to drive by voice live outside what FluidVoice itself can see: my browser, my inbox, my smart home, my local filesystem, etc. Today the only way to reach those is to fall back to manual input (switch apps, type, click), which defeats a lot of the appeal of hands-free voice control.
Describe the solution you'd like
Add Model Context Protocol client support to Command Mode, so users can attach any MCP server (stdio or HTTP/SSE) and have its tools surfaced to the model alongside the built-in command tools. In practice this would look like:
- A Settings pane where users add MCP servers (command + args for stdio, or URL + auth header for HTTP).
- A per-server on/off toggle and a tool-level allow list, so a noisy server doesn't flood the model's tool list or burn context.
- Tool calls rendered in the existing tool-call UI in Command Mode (same ToolCall / toolOutputView path already used for built-in tools).
- The existing "Apple Intelligence disabled for Command Mode (no tool support)" guard already tells me the plumbing for tool-using providers is in place — this would extend that surface.
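To make the settings shape above concrete, here is a rough sketch of a per-server config record plus the toggle and allow-list filtering. It is written in Python purely for illustration; every name in it (McpServerConfig, exposed_tools, the server names, the command and URL) is hypothetical and not taken from FluidVoice or any MCP SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class McpServerConfig:
    """One user-added MCP server (hypothetical shape, for illustration only)."""
    name: str
    transport: str                              # "stdio" or "http"
    command: Optional[str] = None               # stdio: executable to launch
    args: List[str] = field(default_factory=list)
    url: Optional[str] = None                   # http: server endpoint
    auth_header: Optional[str] = None           # http: optional auth header value
    enabled: bool = True                        # per-server on/off toggle
    allowed_tools: Optional[List[str]] = None   # None means "allow every tool"


def exposed_tools(config: McpServerConfig, discovered: List[str]) -> List[str]:
    """Return the tool names that should be surfaced to the model."""
    if not config.enabled:
        return []
    if config.allowed_tools is None:
        return list(discovered)
    allowed = set(config.allowed_tools)
    # Keep the server's own ordering, drop anything not on the allow list.
    return [tool for tool in discovered if tool in allowed]


# Placeholder servers; the command, package name and URL are made up.
browser = McpServerConfig(
    name="browser",
    transport="stdio",
    command="npx",
    args=["example-browser-mcp"],
    enabled=False,
)
home = McpServerConfig(
    name="home",
    transport="http",
    url="https://example.invalid/mcp",
    allowed_tools=["turn_off", "get_state"],
)

home_tools = exposed_tools(home, ["turn_on", "turn_off", "get_state"])
browser_tools = exposed_tools(browser, ["click", "navigate"])
```

With this shape, a disabled server contributes nothing to the model's tool list, and an allow list trims a noisy server down to the handful of tools the user actually wants.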
Concrete workflows this unlocks
These are all things I'd genuinely use if it shipped:
- Browser control (Chrome DevTools MCP / Playwright MCP). "Open the three tabs I had pinned yesterday and log in to the second one." "Find the checkout button on this page and click it." "Summarise what's on screen and email it to me." Voice is a much better input modality than a trackpad for this kind of multi-step web task.
- Email triage (Gmail MCP). "What emails came in from my bank this week?" "Draft a reply to the last message from Alice saying I'll get back to her on Monday." "Archive everything from newsletters@ that's older than 30 days." Being able to triage the inbox while making coffee is the dream.
- Calendar + contacts. "Do I have anything free tomorrow afternoon?" "Schedule a 30-min call with Bob next Tuesday." "What's Bob's phone number?" These are small, frequent questions where opening Calendar.app is friction.
- Smart home (Home Assistant MCP). "Turn the office lights off and start the robot vacuum." "Is the front door locked?" Obvious voice use case, and unlike Siri it can chain multiple actions in one utterance.
- Local shell / filesystem. "Find the most recent screenshot on my desktop and open it." "Kill whatever's holding port 3000." "Show me the last 10 commits on this branch." Voice is surprisingly nice for short, well-scoped shell-ish tasks.
- Project-specific tools. Anyone who already runs a personal MCP server (notes, task tracker, Linear, Jira, internal APIs…) gets those same tools available by voice for free.
The unifying theme: Command Mode today is a voice UI for FluidVoice's tools; with MCP it becomes a voice UI for anything the user has plugged in. That's a step-change in what the feature is useful for.
Context efficiency: tool search and sandboxed execution
One thing worth designing in from day one: MCP servers can expose hundreds of tools each, and loading every schema into every turn is both slow and expensive (most of the context window ends up as tool JSON the model never uses). Two patterns that Claude Code already uses well and that would fit Command Mode naturally:
- Tool search / lazy loading. Instead of dumping all MCP tool schemas up front, expose a single tool_search meta-tool that lets the model query for tools by name or keyword and only load the schemas it actually needs. For a user with Gmail + Calendar + Drive + Home Assistant + Chrome MCP servers attached, this is the difference between ~300 tools in context per turn and ~5. Huge win for latency, cost, and accuracy (smaller tool lists → better tool selection).
- Sandboxed code execution as an MCP bridge. A single execute_code tool that can invoke other MCP tools inside a sandboxed JS/Python runtime is enormously more efficient than round-tripping every intermediate value through the model. Example: "take a photo from my desktop, run it through the barcode scanner, save the decoded URL to a note". With a sandbox that's one tool call; without it, the raw image bytes pass through context as base64 (slow, expensive, often blows the window). It also lets the model loop/filter/map over tool results without each iteration hitting the model.
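The data flow of the execute_code bridge can be sketched in a few lines. This is a toy under loud assumptions: the three tools are hypothetical in-process stand-ins for MCP tools, and plain exec with emptied builtins is NOT a real sandbox (a real design would need an isolated process or runtime). The only point is that the large intermediate value never leaves the runtime.

```python
from typing import Any, Callable, Dict

PHOTO_SIZE = 50_000  # pretend image payload size in bytes


# Hypothetical stand-ins; a real bridge would forward these calls to MCP servers.
def take_photo(path: str) -> bytes:
    return b"\x00" * PHOTO_SIZE


def scan_barcode(image: bytes) -> str:
    return "https://example.com/item/42"


def save_note(text: str) -> str:
    return f"saved:{text}"


TOOLS: Dict[str, Callable[..., Any]] = {
    "take_photo": take_photo,
    "scan_barcode": scan_barcode,
    "save_note": save_note,
}


def execute_code(source: str, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run model-written code that may call tools; return only `result`."""

    def call_tool(name: str, **kwargs: Any) -> Any:
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        return tools[name](**kwargs)

    # WARNING: emptied builtins are not a security boundary. Illustration only.
    env: Dict[str, Any] = {"__builtins__": {}, "call_tool": call_tool}
    exec(source, env)
    # Only the value bound to `result` goes back to the model.
    return env.get("result")


# What the model might write for the photo -> barcode -> note example.
SCRIPT = """
image = call_tool("take_photo", path="~/Desktop/photo.png")
url = call_tool("scan_barcode", image=image)
result = call_tool("save_note", text=url)
"""

output = execute_code(SCRIPT, TOOLS)
```

Here the 50 KB image stays inside the runtime and the model only ever sees the short string bound to result, which is the saving the bullet above is describing.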
Both techniques are additive to the basic MCP client — they can land in a v2 — but if the initial design assumes "every tool in context every turn" it'll be hard to retrofit.
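For the tool-search side, a minimal sketch of what a tool_search meta-tool could do internally is below. It assumes a naive keyword match over tool names and descriptions; the ToolSchema type and the catalog entries are made up for illustration, and a real implementation would more likely rank with something better than substring counting.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolSchema:
    """Trimmed-down tool record (hypothetical; real schemas carry JSON input specs)."""
    server: str
    name: str
    description: str


def tool_search(query: str, catalog: List[ToolSchema], limit: int = 5) -> List[ToolSchema]:
    """Return up to `limit` tools whose name or description mentions the query terms."""
    terms = query.lower().split()
    scored = []
    for tool in catalog:
        haystack = f"{tool.name} {tool.description}".lower()
        score = sum(1 for term in terms if term in haystack)
        if score > 0:
            scored.append((score, tool))
    # Highest score first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: -pair[0])
    return [tool for _, tool in scored[:limit]]


# Made-up catalog standing in for everything the attached servers advertise.
CATALOG = [
    ToolSchema("gmail", "search_messages", "Search email messages by sender or date"),
    ToolSchema("gmail", "create_draft", "Draft a reply to an email thread"),
    ToolSchema("home", "turn_off_light", "Turn off a light in a room"),
    ToolSchema("calendar", "list_events", "List calendar events in a time range"),
]

matches = tool_search("email draft", CATALOG)
match_names = [tool.name for tool in matches]
```

Only the schemas returned by the search would then be loaded into the next turn, so the per-turn tool list scales with the request rather than with the number of attached servers.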
Describe alternatives you've considered
- Hard-coding each integration. Doesn't scale — every new integration is a PR against FluidVoice, and users can't add private/internal tools.
- Piping commands out to Claude Code / a separate agent. Works, but adds latency, a second UI, and a second place to configure tools. Keeping it in Command Mode is a much tighter UX.
- AppleScript / Shortcuts bridges. Covers some of these cases on macOS specifically, but it's Mac-only, brittle, and doesn't match the ecosystem momentum MCP now has (Claude, Cursor, Zed, VS Code, Raycast, etc. all speak it).
Additional context
- Anthropic's SDK (which FluidVoice already uses for Claude-backed enhancement) has first-class MCP support, so the lift on the model side is small — the main work is config UI + process lifecycle management for stdio servers.