Name	Name	Last commit message	Last commit date
parent directory ..
src/main/java	src/main/java
README.md	README.md
application.properties.sample	application.properties.sample
pom.xml	pom.xml

Java – MCP Quickstart

For common setup instructions, troubleshooting, and detailed information, see the Java Samples README

This sample demonstrates MCP (Model Context Protocol) server integration with Voice Live using the Azure AI Voice Live SDK for Java.

Like the Model Quickstart, this sample connects directly to a model (e.g. gpt-realtime) — but additionally configures remote MCP servers as tools, enabling the assistant to call external services (DeepWiki, Azure Docs) during the conversation. It implements a voice-based approval flow where the assistant verbally asks the user for permission before using tools that require consent.

What Makes This Sample Unique

MCP Server Integration: Configure remote MCP servers using MCPServer in the session tools list
Voice-Based Approval: Instead of blocking on Scanner input, the assistant verbally asks "Do you approve?" and interprets the user's spoken yes or no
Context-Aware Repeat Approvals: When the model needs additional searches, the prompt changes to "I need one more search. Should I continue?"
MCP Tool Announcements: For auto-approved tools, the assistant says a brief acknowledgement while the call runs
Barge-In Handling: Interrupting during an MCP call prompts the assistant to acknowledge and reassure the user
MCP Stall Detection: If a tool call takes >15 seconds, the assistant proactively tells the user it's still waiting

Voice UX Considerations for MCP Integration

Integrating MCP servers into a voice assistant introduces unique UX challenges that don't exist in text-based or console-based MCP clients.

MCP servers can be configured with different approval policies:

require_approval: "never" — tool calls proceed automatically (e.g., DeepWiki in this sample)
require_approval: "always" — every tool call requires explicit user consent before execution (e.g., Azure Docs in this sample)

In a text-based client, approval is typically a simple y/n console prompt. In a voice UX, this needs to be handled conversationally — and several additional challenges arise around latency, silence, and repeated calls. This quickstart demonstrates patterns to address them:

Tool Approval Must Be Voice-Native

Console-based MCP samples typically use blocking Scanner.nextLine() for approval — fine for a terminal demo, but it freezes the audio pipeline and breaks the voice experience. In a voice UX, approvals should be handled conversationally:

Inject a system message instructing the model to verbally ask for permission
Parse the user's spoken response for clear intent (yes, no, stop, cancel)
Allow barge-in — the user should be able to say "yes" without waiting for the full approval prompt to finish

This quickstart uses word-boundary regex (\byes\b, \b(no|stop|cancel)\b) to avoid false positives from words like "yesterday" or "nobody".

System Instructions Must Teach the Model About Approval

The model needs explicit instructions about the approval flow. Without them, it may paraphrase the permission request into a generic "Let me look that up" — skipping the actual question. This quickstart includes in the system prompt:

"Some tools require user approval. When you receive a system message asking you to request permission, you MUST clearly ask the user for their explicit approval. Never skip the approval question or assume permission is granted."

The per-request system messages use "Say exactly:" phrasing to prevent the model from rewording the question.

Repeated Tool Calls Need Contextual Messaging

MCP servers may require multiple searches to gather complete information. Each search triggers a separate approval if require_approval="always". Rather than asking the identical question each time, this quickstart tracks the call count per server:

First call: "I'd like to search the azure_doc service. Do you approve?"
Subsequent calls: "I need one more search for complete information. Should I continue?"
After 3 approved calls: Auto-denied to prevent infinite loops — the model responds with what it has

The counter resets when results are fully delivered or the user denies a request.

Silence During Tool Calls Must Be Filled

MCP tool calls can take 3–60+ seconds. Without feedback, the user thinks the assistant is broken. This quickstart uses two complementary layers to keep the user informed throughout:

Tool announcements (immediate, client-side): For auto-approved servers, the assistant says "Let me look that up" when the call starts. Skipped for approval-required servers since the approval prompt already communicates. This covers the first few seconds.
Stall detection (client-side, repeating timer): If a tool call runs longer than expected, the assistant proactively tells the user it's still waiting. This quickstart uses a 10-second interval with a maximum of 3 notifications — tune these values based on your expected MCP server latency:
- Fast servers (< 5s): Stall timer rarely fires. Consider increasing the interval or reducing max notifications.
- Medium servers (5–15s): The default 10s/3-max works well. The first notification arrives before most users lose patience.
- Slow servers (15–60s+): Consider shorter intervals (e.g. 8s) or more notifications (e.g. 5) to keep the user engaged. However, note that MCP calls cannot be cancelled — the notifications are status updates, not actionable options.

Together, these two layers ensure continuous feedback: the announcement handles seconds 0–5, and the stall timer covers 10s+ with periodic reassurance.

Batched Response After Tool Completion

When the model makes multiple MCP calls in a single turn (common with search-heavy servers), this quickstart waits for all calls to complete before generating a response. This prevents partial results from being spoken prematurely and avoids the model making additional tool calls based on incomplete data.

For approval-required servers, once the user approves the first call, subsequent calls to the same server within the same turn are auto-approved — avoiding repeated voice prompts for what is logically a single task.

Barge-In During MCP Calls

Users will naturally try to interrupt or ask "Are you still there?" during long tool calls. Rather than ignoring this, the quickstart injects a system message so the model can acknowledge the user and respond to what they said. If the original MCP call completes later, its result is introduced as a late result (e.g. "By the way, those results from earlier just came in..."). Note: since MCP calls cannot be cancelled, the call continues running in the background regardless of what the user says.

Response Collision Handling

MCP flows generate rapid event sequences where response.create calls can collide with active responses. This quickstart defers collisions to the next RESPONSE_DONE event via a flag, ensuring tool results and approval prompts are never silently dropped.

MCP Server Selection: Latency Matters

Not all MCP servers are well-suited for voice UX. Servers that respond quickly (< 5 seconds) provide a seamless experience, while slow servers (10–60+ seconds) create awkward silence even with stall notifications. When choosing MCP servers for a voice assistant:

Prefer low-latency servers — search APIs, simple lookups, and cached data sources work best
Avoid servers that perform heavy computation — large repo analysis, complex document retrieval, or multi-step workflows can take 30–60+ seconds, degrading the voice experience
MCP calls cannot be cancelled — once a call starts, it runs until the server responds or times out. There is no client-side or API-level cancellation mechanism
Late results arrive out of context — if the user moves on during a slow MCP call, the result arrives asynchronously and must be introduced as a late result, which can feel disjointed
Consider whether async results are acceptable for your use case. If users expect real-time answers, long-running MCP servers will frustrate them. If they expect a research-assistant style interaction where results trickle in, it may be acceptable

Prerequisites

Java 11 or later
Maven 3.6+
AI Foundry resource
API key or Azure CLI for authentication
Audio input/output devices (microphone and speakers)
See Java Samples README for common prerequisites

Quick Start

Configure credentials via environment variables or application.properties:

azure.voicelive.endpoint=https://your-endpoint.services.ai.azure.com/
azure.voicelive.api-key=your-api-key-here

Build and run:

mvn clean install
mvn exec:java

# Or with Azure authentication
mvn exec:java -Dexec.args="--use-token-credential"

Command Line Options

Flag	Description
`--api-key`	Azure VoiceLive API key (overrides `application.properties`)
`--endpoint`	Azure VoiceLive endpoint URL (overrides `application.properties`)
`--model`	VoiceLive model to use (default: `gpt-realtime`)
`--voice`	Voice for the assistant (default: `en-US-Ava:DragonHDLatestNeural`)
`--use-token-credential`	Use Azure authentication instead of API key

Sample Trigger Phrases

Say this	MCP Server	Approval	What happens
"What is the GitHub repo fastapi about?"	DeepWiki	Auto (`never`)	Assistant announces lookup, calls tools, speaks results
"Search the Azure documentation for Voice Live API"	Azure Docs	Voice prompt (`always`)	Assistant asks "Do you approve?", waits for your yes or no

How It Works

MCP Server Definitions: MCPServer instances added to the session tools list
Session Configuration: session.update with model, voice, VAD, and MCP tools
Tool Discovery: Voice Live connects to each MCP server and discovers available tools
Tool Announcements: Auto-approved tool calls trigger a brief spoken acknowledgement
Voice Approval: For require_approval="always" servers, a system message is injected prompting the model to ask verbally. The user's spoken response is parsed for yes/no using word-boundary regex
Result Delivery: After MCP call completion, response.create kicks the model to speak the results

Troubleshooting

Symptom	Resolution
`❌ No compatible microphone found`	Verify microphone is connected and available to Java audio.
Missing endpoint/authentication error	Set environment variables or update `application.properties`.
MCP tool discovery failed	Check that MCP server URLs are reachable from your network.
Repeated approval prompts	Expected — the model may need multiple searches. Say "no" or "stop" to deny.
Session hit maximum duration	VoiceLive sessions have a 30-minute limit. Restart the sample.
Maven build failures	Ensure Java 11+ and Maven 3.6+ are installed.

For more troubleshooting guidance, see the Java Samples README.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

Java – MCP Quickstart

What Makes This Sample Unique

Voice UX Considerations for MCP Integration

Tool Approval Must Be Voice-Native

System Instructions Must Teach the Model About Approval

Repeated Tool Calls Need Contextual Messaging

Silence During Tool Calls Must Be Filled

Batched Response After Tool Completion

Barge-In During MCP Calls

Response Collision Handling

MCP Server Selection: Latency Matters

Prerequisites

Quick Start

Command Line Options

Sample Trigger Phrases

How It Works

Troubleshooting

Additional Resources

Uh oh!

FilesExpand file tree

MCPQuickstart

Directory actions

More options

Directory actions

More options

Latest commit

History

MCPQuickstart

Folders and files

parent directory

README.md

Java – MCP Quickstart

What Makes This Sample Unique

Voice UX Considerations for MCP Integration

Tool Approval Must Be Voice-Native

System Instructions Must Teach the Model About Approval

Repeated Tool Calls Need Contextual Messaging

Silence During Tool Calls Must Be Filled

Batched Response After Tool Completion

Barge-In During MCP Calls

Response Collision Handling

MCP Server Selection: Latency Matters

Prerequisites

Quick Start

Command Line Options

Sample Trigger Phrases

How It Works

Troubleshooting

Additional Resources