Skip to content

Commit ec72d41

Browse files
MorabbinCopilot
andauthored
Pass structured tool results via RPC instead of stringifying (#970)
* Pass structured tool results via RPC instead of stringifying In both the Node and Go SDKs, _executeToolAndRespond / executeToolAndRespond was converting structured ToolResultObject values into plain JSON strings before sending them over RPC. This destroyed the object shape, causing toolTelemetry (and other structured fields like resultType) to be silently lost on the server side. Node SDK: detect ToolResultObject by checking for textResultForLlm + resultType properties and pass it directly to handlePendingToolCall, which already accepts the union type (string | object) in its RPC schema. Go SDK: send a ResultUnion with ResultResult populated (preserving TextResultForLlm, ResultType, Error, and ToolTelemetry) instead of extracting only the text and sending ResultUnion with String. Fixes the SDK side of github/copilot-agent-runtime#5574. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback: restore Go fallback, add test, fix snapshot - Go: restore fallback to fmt.Sprintf when TextResultForLLM is empty, preserving prior behavior for handlers that don't set it. - Node: add e2e test verifying toolTelemetry is not leaked into LLM content and that the tool.execution_complete event fires. - Update failure-resulttype snapshot to expect just textResultForLlm content instead of the old stringified JSON. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Strengthen toolTelemetry test assertions Assert tool.execution_complete event has success=true and that toolTelemetry, when present, is non-empty (not the {} that results from the old stringification bug). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add tool_results e2e tests for Python, Go, and .NET Mirror the Node e2e tests from tool_results.test.ts in all three remaining SDK languages, reusing the shared YAML snapshots. Each suite covers: - structured ToolResultObject with success resultType - failure resultType - toolTelemetry preservation (verifies LLM content has no stringified JSON and no toolTelemetry/resultType leakage) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Python ruff formatting and .NET JSON serialization - Python: run ruff format on test_tool_results.py - .NET: add JsonSerializerContext with ToolResultAIContent and pass serializerOptions to AIFunctionFactory.Create, matching the pattern used in ToolsTests for complex types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Python line length and .NET partial class for source gen - Python: split long string on line 54 to stay under 100 char limit - .NET: mark ToolResultsTests as partial so the nested ToolResultsJsonContext source generator can emit code Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Python SDK to always send structured tool results Remove the failure special-case that sent only the error string for result_type='failure'. Now the Python SDK always sends the full ResultResult struct (including error, resultType, toolTelemetry), consistent with Node, Go, and .NET SDKs. This fixes the e2e test snapshot mismatch: the shared YAML snapshots expect the CLI to receive a structured result (which it formats as 'Failed to execute ... due to error: Error: Tool execution failed'), but the old Python path sent only the error string, producing a different message format that the replay proxy couldn't match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Distinguish exception-originated failures from deliberate ones The previous commit broke test_handles_tool_calling_errors because define_tool's exception handler wraps exceptions as ToolResult(failure), which was then sent as a structured result instead of a top-level error. Fix: introduce TOOL_EXCEPTION_TEXT constant shared between define_tool's exception handler and _execute_tool_and_respond. When the failure's textResultForLlm matches the known exception wrapper message, send via the top-level error param (CLI formats as 'Failed to execute...'); otherwise send the full structured result to preserve metadata. This preserves backward compatibility for thrown exceptions while allowing user-returned ToolResult(failure) to carry toolTelemetry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address code review: tighter guards, default ResultType, session cleanup - Node: validate resultType against allowed values in isToolResultObject - Go: default empty ResultType to 'success' (or 'failure' when error set) - Python: use _from_exception attribute instead of sentinel string match - Python/Go: disconnect sessions in e2e tests to avoid leaking state Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add 'timeout' to ToolResultType in Node and Python The agent runtime recently added 'timeout' as a fifth valid resultType. Update the type guard and type definitions to match. * Fix prettier formatting in session.ts * Make _from_exception a typed dataclass field; revert accidental snapshot changes - Python: replace dynamic attribute with proper dataclass field (field(default=False, repr=False)) so type-checkers can see it - Revert accidentally committed snapshot modifications for blob_attachments and message_attachments tests --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 8fb154e commit ec72d41

File tree

11 files changed

+559
-17
lines changed

11 files changed

+559
-17
lines changed

dotnet/test/ToolResultsTests.cs

Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
/*---------------------------------------------------------------------------------------------
2+
* Copyright (c) Microsoft Corporation. All rights reserved.
3+
*--------------------------------------------------------------------------------------------*/
4+
5+
using GitHub.Copilot.SDK.Test.Harness;
6+
using Microsoft.Extensions.AI;
7+
using System.ComponentModel;
8+
using System.Text.Json;
9+
using System.Text.Json.Serialization;
10+
using Xunit;
11+
using Xunit.Abstractions;
12+
13+
namespace GitHub.Copilot.SDK.Test;
14+
15+
public partial class ToolResultsTests(E2ETestFixture fixture, ITestOutputHelper output) : E2ETestBase(fixture, "tool_results", output)
16+
{
17+
[JsonSourceGenerationOptions(JsonSerializerDefaults.Web)]
18+
[JsonSerializable(typeof(ToolResultAIContent))]
19+
[JsonSerializable(typeof(ToolResultObject))]
20+
[JsonSerializable(typeof(JsonElement))]
21+
private partial class ToolResultsJsonContext : JsonSerializerContext;
22+
23+
[Fact]
24+
public async Task Should_Handle_Structured_ToolResultObject_From_Custom_Tool()
25+
{
26+
var session = await CreateSessionAsync(new SessionConfig
27+
{
28+
Tools = [AIFunctionFactory.Create(GetWeather, "get_weather", serializerOptions: ToolResultsJsonContext.Default.Options)],
29+
OnPermissionRequest = PermissionHandler.ApproveAll,
30+
});
31+
32+
await session.SendAsync(new MessageOptions
33+
{
34+
Prompt = "What's the weather in Paris?"
35+
});
36+
37+
var assistantMessage = await TestHelper.GetFinalAssistantMessageAsync(session);
38+
Assert.NotNull(assistantMessage);
39+
Assert.Matches("(?i)sunny|72", assistantMessage!.Data.Content ?? string.Empty);
40+
41+
[Description("Gets weather for a city")]
42+
static ToolResultAIContent GetWeather([Description("City name")] string city)
43+
=> new(new()
44+
{
45+
TextResultForLlm = $"The weather in {city} is sunny and 72°F",
46+
ResultType = "success",
47+
});
48+
}
49+
50+
[Fact]
51+
public async Task Should_Handle_Tool_Result_With_Failure_ResultType()
52+
{
53+
var session = await CreateSessionAsync(new SessionConfig
54+
{
55+
Tools = [AIFunctionFactory.Create(CheckStatus, "check_status", serializerOptions: ToolResultsJsonContext.Default.Options)],
56+
OnPermissionRequest = PermissionHandler.ApproveAll,
57+
});
58+
59+
await session.SendAsync(new MessageOptions
60+
{
61+
Prompt = "Check the status of the service using check_status. If it fails, say 'service is down'."
62+
});
63+
64+
var assistantMessage = await TestHelper.GetFinalAssistantMessageAsync(session);
65+
Assert.NotNull(assistantMessage);
66+
Assert.Contains("service is down", assistantMessage!.Data.Content?.ToLowerInvariant() ?? string.Empty);
67+
68+
[Description("Checks the status of a service")]
69+
static ToolResultAIContent CheckStatus()
70+
=> new(new()
71+
{
72+
TextResultForLlm = "Service unavailable",
73+
ResultType = "failure",
74+
Error = "API timeout",
75+
});
76+
}
77+
78+
[Fact]
79+
public async Task Should_Preserve_ToolTelemetry_And_Not_Stringify_Structured_Results_For_LLM()
80+
{
81+
var session = await CreateSessionAsync(new SessionConfig
82+
{
83+
Tools = [AIFunctionFactory.Create(AnalyzeCode, "analyze_code", serializerOptions: ToolResultsJsonContext.Default.Options)],
84+
OnPermissionRequest = PermissionHandler.ApproveAll,
85+
});
86+
87+
await session.SendAsync(new MessageOptions
88+
{
89+
Prompt = "Analyze the file main.ts for issues."
90+
});
91+
92+
var assistantMessage = await TestHelper.GetFinalAssistantMessageAsync(session);
93+
Assert.NotNull(assistantMessage);
94+
Assert.Contains("no issues", assistantMessage!.Data.Content?.ToLowerInvariant() ?? string.Empty);
95+
96+
// Verify the LLM received just textResultForLlm, not stringified JSON
97+
var traffic = await Ctx.GetExchangesAsync();
98+
var lastConversation = traffic[^1];
99+
100+
var toolResults = lastConversation.Request.Messages
101+
.Where(m => m.Role == "tool")
102+
.ToList();
103+
104+
Assert.Single(toolResults);
105+
Assert.DoesNotContain("toolTelemetry", toolResults[0].Content);
106+
Assert.DoesNotContain("resultType", toolResults[0].Content);
107+
108+
[Description("Analyzes code for issues")]
109+
static ToolResultAIContent AnalyzeCode([Description("File to analyze")] string file)
110+
=> new(new()
111+
{
112+
TextResultForLlm = $"Analysis of {file}: no issues found",
113+
ResultType = "success",
114+
ToolTelemetry = new Dictionary<string, object>
115+
{
116+
["metrics"] = new Dictionary<string, object> { ["analysisTimeMs"] = 150 },
117+
["properties"] = new Dictionary<string, object> { ["analyzer"] = "eslint" },
118+
},
119+
});
120+
}
121+
}
Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
package e2e
2+
3+
import (
4+
"strings"
5+
"testing"
6+
7+
copilot "github.com/github/copilot-sdk/go"
8+
"github.com/github/copilot-sdk/go/internal/e2e/testharness"
9+
)
10+
11+
func TestToolResults(t *testing.T) {
12+
ctx := testharness.NewTestContext(t)
13+
client := ctx.NewClient()
14+
t.Cleanup(func() { client.ForceStop() })
15+
16+
t.Run("should handle structured toolresultobject from custom tool", func(t *testing.T) {
17+
ctx.ConfigureForTest(t)
18+
19+
type WeatherParams struct {
20+
City string `json:"city" jsonschema:"City name"`
21+
}
22+
23+
session, err := client.CreateSession(t.Context(), &copilot.SessionConfig{
24+
OnPermissionRequest: copilot.PermissionHandler.ApproveAll,
25+
Tools: []copilot.Tool{
26+
copilot.DefineTool("get_weather", "Gets weather for a city",
27+
func(params WeatherParams, inv copilot.ToolInvocation) (copilot.ToolResult, error) {
28+
return copilot.ToolResult{
29+
TextResultForLLM: "The weather in " + params.City + " is sunny and 72°F",
30+
ResultType: "success",
31+
}, nil
32+
}),
33+
},
34+
})
35+
if err != nil {
36+
t.Fatalf("Failed to create session: %v", err)
37+
}
38+
39+
_, err = session.Send(t.Context(), copilot.MessageOptions{Prompt: "What's the weather in Paris?"})
40+
if err != nil {
41+
t.Fatalf("Failed to send message: %v", err)
42+
}
43+
44+
answer, err := testharness.GetFinalAssistantMessage(t.Context(), session)
45+
if err != nil {
46+
t.Fatalf("Failed to get assistant message: %v", err)
47+
}
48+
49+
content := ""
50+
if answer.Data.Content != nil {
51+
content = *answer.Data.Content
52+
}
53+
if !strings.Contains(strings.ToLower(content), "sunny") && !strings.Contains(content, "72") {
54+
t.Errorf("Expected answer to mention sunny or 72, got %q", content)
55+
}
56+
57+
if err := session.Disconnect(); err != nil {
58+
t.Errorf("Failed to disconnect session: %v", err)
59+
}
60+
})
61+
62+
t.Run("should handle tool result with failure resulttype", func(t *testing.T) {
63+
ctx.ConfigureForTest(t)
64+
65+
session, err := client.CreateSession(t.Context(), &copilot.SessionConfig{
66+
OnPermissionRequest: copilot.PermissionHandler.ApproveAll,
67+
Tools: []copilot.Tool{
68+
{
69+
Name: "check_status",
70+
Description: "Checks the status of a service",
71+
Handler: func(inv copilot.ToolInvocation) (copilot.ToolResult, error) {
72+
return copilot.ToolResult{
73+
TextResultForLLM: "Service unavailable",
74+
ResultType: "failure",
75+
Error: "API timeout",
76+
}, nil
77+
},
78+
},
79+
},
80+
})
81+
if err != nil {
82+
t.Fatalf("Failed to create session: %v", err)
83+
}
84+
85+
_, err = session.Send(t.Context(), copilot.MessageOptions{
86+
Prompt: "Check the status of the service using check_status. If it fails, say 'service is down'.",
87+
})
88+
if err != nil {
89+
t.Fatalf("Failed to send message: %v", err)
90+
}
91+
92+
answer, err := testharness.GetFinalAssistantMessage(t.Context(), session)
93+
if err != nil {
94+
t.Fatalf("Failed to get assistant message: %v", err)
95+
}
96+
97+
content := ""
98+
if answer.Data.Content != nil {
99+
content = *answer.Data.Content
100+
}
101+
if !strings.Contains(strings.ToLower(content), "service is down") {
102+
t.Errorf("Expected 'service is down', got %q", content)
103+
}
104+
105+
if err := session.Disconnect(); err != nil {
106+
t.Errorf("Failed to disconnect session: %v", err)
107+
}
108+
})
109+
110+
t.Run("should preserve tooltelemetry and not stringify structured results for llm", func(t *testing.T) {
111+
ctx.ConfigureForTest(t)
112+
113+
type AnalyzeParams struct {
114+
File string `json:"file" jsonschema:"File to analyze"`
115+
}
116+
117+
session, err := client.CreateSession(t.Context(), &copilot.SessionConfig{
118+
OnPermissionRequest: copilot.PermissionHandler.ApproveAll,
119+
Tools: []copilot.Tool{
120+
copilot.DefineTool("analyze_code", "Analyzes code for issues",
121+
func(params AnalyzeParams, inv copilot.ToolInvocation) (copilot.ToolResult, error) {
122+
return copilot.ToolResult{
123+
TextResultForLLM: "Analysis of " + params.File + ": no issues found",
124+
ResultType: "success",
125+
ToolTelemetry: map[string]any{
126+
"metrics": map[string]any{"analysisTimeMs": 150},
127+
"properties": map[string]any{"analyzer": "eslint"},
128+
},
129+
}, nil
130+
}),
131+
},
132+
})
133+
if err != nil {
134+
t.Fatalf("Failed to create session: %v", err)
135+
}
136+
137+
_, err = session.Send(t.Context(), copilot.MessageOptions{Prompt: "Analyze the file main.ts for issues."})
138+
if err != nil {
139+
t.Fatalf("Failed to send message: %v", err)
140+
}
141+
142+
answer, err := testharness.GetFinalAssistantMessage(t.Context(), session)
143+
if err != nil {
144+
t.Fatalf("Failed to get assistant message: %v", err)
145+
}
146+
147+
content := ""
148+
if answer.Data.Content != nil {
149+
content = *answer.Data.Content
150+
}
151+
if !strings.Contains(strings.ToLower(content), "no issues") {
152+
t.Errorf("Expected 'no issues', got %q", content)
153+
}
154+
155+
// Verify the LLM received just textResultForLlm, not stringified JSON
156+
traffic, err := ctx.GetExchanges()
157+
if err != nil {
158+
t.Fatalf("Failed to get exchanges: %v", err)
159+
}
160+
161+
lastConversation := traffic[len(traffic)-1]
162+
var toolResults []testharness.ChatCompletionMessage
163+
for _, msg := range lastConversation.Request.Messages {
164+
if msg.Role == "tool" {
165+
toolResults = append(toolResults, msg)
166+
}
167+
}
168+
169+
if len(toolResults) != 1 {
170+
t.Fatalf("Expected 1 tool result, got %d", len(toolResults))
171+
}
172+
if strings.Contains(toolResults[0].Content, "toolTelemetry") {
173+
t.Error("Tool result content should not contain 'toolTelemetry'")
174+
}
175+
if strings.Contains(toolResults[0].Content, "resultType") {
176+
t.Error("Tool result content should not contain 'resultType'")
177+
}
178+
179+
if err := session.Disconnect(); err != nil {
180+
t.Errorf("Failed to disconnect session: %v", err)
181+
}
182+
})
183+
}

go/session.go

Lines changed: 25 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -620,13 +620,34 @@ func (s *Session) executeToolAndRespond(requestID, toolName, toolCallID string,
620620
return
621621
}
622622

623-
resultStr := result.TextResultForLLM
624-
if resultStr == "" {
625-
resultStr = fmt.Sprintf("%v", result)
623+
textResultForLLM := result.TextResultForLLM
624+
if textResultForLLM == "" {
625+
textResultForLLM = fmt.Sprintf("%v", result)
626+
}
627+
628+
// Default ResultType to "success" when unset, or "failure" when there's an error.
629+
effectiveResultType := result.ResultType
630+
if effectiveResultType == "" {
631+
if result.Error != "" {
632+
effectiveResultType = "failure"
633+
} else {
634+
effectiveResultType = "success"
635+
}
636+
}
637+
638+
rpcResult := rpc.ResultUnion{
639+
ResultResult: &rpc.ResultResult{
640+
TextResultForLlm: textResultForLLM,
641+
ToolTelemetry: result.ToolTelemetry,
642+
ResultType: &effectiveResultType,
643+
},
644+
}
645+
if result.Error != "" {
646+
rpcResult.ResultResult.Error = &result.Error
626647
}
627648
s.RPC.Tools.HandlePendingToolCall(ctx, &rpc.SessionToolsHandlePendingToolCallParams{
628649
RequestID: requestID,
629-
Result: &rpc.ResultUnion{String: &resultStr},
650+
Result: &rpcResult,
630651
})
631652
}
632653

0 commit comments

Comments
 (0)