---
title: Sampling
author: jeffhandley
description: How servers request LLM completions from the client using the sampling feature.
uid: sampling
---

# Sampling

MCP sampling allows servers to request LLM completions from the client. This enables agentic behaviors where a server-side tool delegates reasoning back to the client's language model — for example, summarizing content, generating text, or making decisions.

## How sampling works

  1. The server calls xref:ModelContextProtocol.Server.McpServer.SampleAsync* (or uses the xref:ModelContextProtocol.Server.McpServer.AsSamplingChatClient* adapter) during tool execution.
  2. The request is sent to the connected client over MCP.
  3. The client's xref:ModelContextProtocol.Client.McpClientHandlers.SamplingHandler processes the request — typically by forwarding it to an LLM.
  4. The client returns the LLM response to the server, which continues tool execution.
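The four steps above can be condensed into a minimal sketch showing both halves of the round trip. This uses the same APIs covered in the sections that follow; the Ollama endpoint and model name are illustrative, and `AskTool` is a hypothetical tool class:

```csharp
// Client side (step 3): registering a SamplingHandler advertises the
// sampling capability and forwards requests to a local LLM.
IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3");
McpClientOptions clientOptions = new()
{
    Handlers = new() { SamplingHandler = chatClient.CreateSamplingHandler() }
};

// Server side (steps 1, 2, and 4): a tool that delegates reasoning
// back to the connected client's model.
public static class AskTool
{
    [McpServerTool, Description("Answers a question using the client's model")]
    public static async Task<string> Ask(
        McpServer server, string question, CancellationToken cancellationToken) =>
        (await server.AsSamplingChatClient()
            .GetResponseAsync(question, cancellationToken: cancellationToken)).Text;
}
```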

## Server: requesting a completion

Inject xref:ModelContextProtocol.Server.McpServer into a tool method and use the xref:ModelContextProtocol.Server.McpServer.AsSamplingChatClient* extension method to get an xref:Microsoft.Extensions.AI.IChatClient that sends requests through the connected client:

```csharp
[McpServerTool(Name = "SummarizeContent"), Description("Summarizes the given text")]
public static async Task<string> Summarize(
    McpServer server,
    [Description("The text to summarize")] string text,
    CancellationToken cancellationToken)
{
    ChatMessage[] messages =
    [
        new(ChatRole.User, "Briefly summarize the following content:"),
        new(ChatRole.User, text),
    ];

    ChatOptions options = new()
    {
        MaxOutputTokens = 256,
        Temperature = 0.3f,
    };

    return $"Summary: {await server.AsSamplingChatClient().GetResponseAsync(messages, options, cancellationToken)}";
}
```

Alternatively, use xref:ModelContextProtocol.Server.McpServer.SampleAsync* directly for lower-level control:

```csharp
CreateMessageResult result = await server.SampleAsync(
    new CreateMessageRequestParams
    {
        Messages =
        [
            new SamplingMessage
            {
                Role = Role.User,
                Content = [new TextContentBlock { Text = "What is 2 + 2?" }]
            }
        ],
        MaxTokens = 100,
    },
    cancellationToken);

string response = result.Content.OfType<TextContentBlock>().FirstOrDefault()?.Text ?? string.Empty;
```

## Client: handling sampling requests

Set xref:ModelContextProtocol.Client.McpClientHandlers.SamplingHandler when creating the client. This handler is called when a server sends a `sampling/createMessage` request.

### Using an IChatClient

The simplest approach is to use xref:ModelContextProtocol.AIContentExtensions.CreateSamplingHandler* with any xref:Microsoft.Extensions.AI.IChatClient implementation:

```csharp
IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3");

McpClientOptions options = new()
{
    Handlers = new()
    {
        SamplingHandler = chatClient.CreateSamplingHandler()
    }
};

await using var client = await McpClient.CreateAsync(transport, options);
```

### Custom handler

For full control, provide a custom delegate:

```csharp
McpClientOptions options = new()
{
    Handlers = new()
    {
        SamplingHandler = async (request, progress, cancellationToken) =>
        {
            // Forward to your LLM, apply content filtering, etc.
            string prompt = request?.Messages?.LastOrDefault()?.Content
                .OfType<TextContentBlock>().FirstOrDefault()?.Text ?? string.Empty;

            return new CreateMessageResult
            {
                Model = "my-model",
                Role = Role.Assistant,
                Content = [new TextContentBlock { Text = $"Response to: {prompt}" }]
            };
        }
    }
};
```

## Capability negotiation

Sampling requires the client to advertise the sampling capability. This is handled automatically — when a xref:ModelContextProtocol.Client.McpClientHandlers.SamplingHandler is set, the client includes the sampling capability during initialization. The server can check whether the client supports sampling before calling xref:ModelContextProtocol.Server.McpServer.SampleAsync*; if sampling is not supported, the method throws xref:System.InvalidOperationException.
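A tool can probe the negotiated capabilities before sampling rather than catching the exception. A hedged sketch, assuming the server exposes what the client advertised during initialization via a `ClientCapabilities` property (with a `Sampling` member that is non-null only when the client registered a sampling handler):

```csharp
[McpServerTool, Description("Summarizes text, degrading gracefully without sampling")]
public static async Task<string> SummarizeIfSupported(
    McpServer server, string text, CancellationToken cancellationToken)
{
    // ClientCapabilities reflects the client's initialize request;
    // skip sampling entirely if the capability was never advertised.
    if (server.ClientCapabilities?.Sampling is null)
    {
        return "Client does not support sampling; returning the original text.";
    }

    return (await server.AsSamplingChatClient().GetResponseAsync(
        $"Briefly summarize: {text}", cancellationToken: cancellationToken)).Text;
}
```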

## Multi Round-Trip Requests (MRTR)

When both the client and server opt in to the experimental MRTR protocol, sampling requests are carried over the incomplete-result/retry flow instead of a direct JSON-RPC request. This is transparent: the existing `SampleAsync` and `AsSamplingChatClient` APIs work identically whether or not MRTR is active.

### High-level API

No code changes are needed. `SampleAsync` and `AsSamplingChatClient` automatically use MRTR when both sides have opted in, and fall back to legacy JSON-RPC requests otherwise:

```csharp
// This code works the same with or without MRTR — the SDK handles it transparently.
var result = await server.SampleAsync(
    new CreateMessageRequestParams
    {
        Messages =
        [
            new SamplingMessage
            {
                Role = Role.User,
                Content = [new TextContentBlock { Text = "Summarize the data" }]
            }
        ],
        MaxTokens = 256,
    },
    cancellationToken);
```

### Low-level API

For stateless servers or scenarios requiring manual control, throw xref:ModelContextProtocol.Protocol.IncompleteResultException with a sampling input request. On retry, read the client's response from xref:ModelContextProtocol.Protocol.RequestParams.InputResponses:

```csharp
[McpServerTool, Description("Tool that samples via low-level MRTR")]
public static string SampleWithMrtr(
    McpServer server,
    RequestContext<CallToolRequestParams> context)
{
    // On retry, process the client's sampling response
    if (context.Params!.InputResponses?.TryGetValue("llm_call", out var response) is true)
    {
        var text = response.SamplingResult?.Content
            .OfType<TextContentBlock>().FirstOrDefault()?.Text;
        return $"LLM said: {text}";
    }

    if (!server.IsMrtrSupported)
    {
        return "This tool requires MRTR support.";
    }

    // First call — request LLM completion from the client
    throw new IncompleteResultException(
        inputRequests: new Dictionary<string, InputRequest>
        {
            ["llm_call"] = InputRequest.ForSampling(new CreateMessageRequestParams
            {
                Messages =
                [
                    new SamplingMessage
                    {
                        Role = Role.User,
                        Content = [new TextContentBlock { Text = "Summarize the data" }]
                    }
                ],
                MaxTokens = 256
            })
        },
        requestState: "awaiting-sample");
}
```

> [!TIP]
> See Multi Round-Trip Requests (MRTR) for the full protocol details, including load shedding, multiple round trips, and the compatibility matrix.