Commit 485976e

feat: Add TrackDurationOf, TrackMetricsOf, TrackJudgeResult, TrackToolCall (#287)
## Summary

Adds the **compound tracking methods** and **new event types** to `ILdAiConfigTracker`. Callers can now measure operation duration via a callable wrapper, extract metrics from an operation result in one call, record judge evaluation outcomes, and track tool invocations. The legacy `TrackDurationOfTask` and `TrackRequest` methods are preserved but marked `[Obsolete]`.

Five new methods on `ILdAiConfigTracker` (and implementations on `LdAiConfigTracker`):

### `TrackDurationOf<T>`

```csharp
public Task<T> TrackDurationOf<T>(Func<Task<T>> operation);
```

Accepts a **callable** (not a pre-started task) so the tracker controls when execution begins: duration measurement starts at invocation, not at some earlier `Task.Run` call site. Uses `Stopwatch` for wall-clock precision. Duration is recorded even if the operation throws (via `finally`). Emits `$ld:ai:duration:total`.

Replaces `TrackDurationOfTask<T>(Task<T>)`, which is now `[Obsolete("Use TrackDurationOf instead.")]`.

### `TrackMetricsOf<T>`

```csharp
public Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation);
```

All-in-one wrapper that tracks duration, success/error, and optional token usage from a single operation. Flow:

1. Starts a stopwatch, invokes `operation`.
2. On success: stops timer → `TrackDuration` → calls `metricsExtractor(result)` → `TrackSuccess`/`TrackError` based on `AiMetrics.Success` → `TrackTokens` if `AiMetrics.Tokens` is non-null.
3. On exception: `TrackError` (in `catch`) → stops timer → `TrackDuration` (in `finally`) → the exception propagates to the caller.

Replaces `TrackRequest(Task<Response>)`, which is now `[Obsolete("Use TrackMetricsOf instead.")]`.

### `TrackJudgeResult`

```csharp
public void TrackJudgeResult(JudgeResult result);
```

Records a judge evaluation outcome. The event is **silently dropped** when `result.Sampled == false` or `result.Success == false`, which prevents noisy or invalid scores from polluting metrics.

When emitted, the event uses `result.MetricKey` as the track event name and `result.Score` as the metric value. If `result.JudgeConfigKey` is non-null and non-empty, it is merged into the track data alongside the standard `runId`/`configKey`/`variationKey`/`version` fields.

### `TrackToolCall`

```csharp
public void TrackToolCall(string toolKey);
```

Emits a `$ld:ai:tool_call` event with `toolKey` merged into the track data. Unlike most tracker methods, this is **not** at-most-once: it may be called multiple times to record every tool invocation in a run (each call emits a separate event with metric value `1`).

### `TrackToolCalls`

```csharp
public void TrackToolCalls(IEnumerable<string> toolKeys);
```

Convenience batch method that iterates `toolKeys` and calls `TrackToolCall` for each.

### New types

**`AiMetrics`**: immutable record holding `Success` (bool) and optional `Tokens` (`Usage?`) for use with `TrackMetricsOf`.

**`JudgeResult`**: immutable record holding `MetricKey`, `Score`, `Sampled`, `Success`, and optional `JudgeConfigKey` for use with `TrackJudgeResult`.

### `AgentConfigs` event fix

`AgentConfigs` now fires **only** the aggregate `$ld:ai:usage:agent-configs` event. It calls the private `BuildAgentConfig` path internally; it does NOT call the public `AgentConfig()` method and does NOT fire individual `$ld:ai:usage:agent-config` events. Tests updated to assert `Times.Never` on individual events.

## Test plan

- [ ] `dotnet build` succeeds across `netstandard2.0`, `net462`, `net8.0`
- [ ] `dotnet test --framework net8.0` passes
- [ ] `LdAiConfigTrackerTest` covers the new tracker surface:
  - `TrackDurationOf_MeasuresDuration`: verifies wall-clock measurement via a 50 ms delay
  - `TrackMetricsOf_SuccessPath_TracksAllMetrics`: verifies duration + success + tokens all emitted
  - `TrackMetricsOf_ErrorPath_TracksErrorAndRethrows`: verifies duration + error emitted, exception propagated
  - `TrackJudgeResult_SampledFalse_NoEventEmitted`: verifies silent drop
  - `TrackJudgeResult_SuccessFalse_NoEventEmitted`: verifies silent drop
  - `TrackJudgeResult_SuccessPath_EmitsCorrectEvent`: verifies metric key, score, judgeConfigKey in data
  - `TrackToolCall_DataIncludesToolKey`: verifies `$ld:ai:tool_call` event with toolKey in data
  - `TrackToolCall_NoAtMostOnce_EmitsMultipleEvents`: verifies repeated calls emit separate events
  - `DeprecatedShims_StillCallable`: verifies `[Obsolete]` methods remain functional
- [ ] `LdAiClientAgentJudgeTest.AgentConfigs_FiresOnlyAggregateEvent`: verifies NO individual `$ld:ai:usage:agent-config` events
- [ ] `LdAiClientTest.AgentConfigs_OnlyBatchEventFired`: same assertion from the client-level test
- [ ] Reviewer confirms method signatures, event names, and at-most-once semantics match the cross-SDK contract (AITRACK §1.1.4, §1.1.12, §1.1.13, §1.1.15)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes public tracker contracts and telemetry event shapes; behavior is well-tested but callers migrating off obsolete APIs need to adopt the new wrappers.
>
> **Overview**
> Extends **`ILdAiConfigTracker`** with compound wrappers and new event types for the server AI SDK.
>
> **`TrackDurationOf`** times a `Func<Task<T>>` (so measurement starts at invocation), records **`$ld:ai:duration:total`** even on failure, and supersedes **`TrackDurationOfTask`**, which stays but is **`[Obsolete]`**. **`TrackMetricsOf`** runs an operation, records duration, then applies success/error and optional tokens via a new **`AiMetrics`** extractor; **`TrackRequest`** is obsolete in favor of this pattern.
>
> **`TrackJudgeResult`** emits judge scores under a caller metric key when sampled and successful, optionally merging **`judgeConfigKey`** into track data. **`TrackToolCall`** / **`TrackToolCalls`** emit **`$ld:ai:tool_call`** with **`toolKey`** and are **not** at-most-once. Supporting types **`AiMetrics`** and **`JudgeResult`** are added; **`MergeTrackData`** enriches events for judge and tool tracking.
>
> Tests document **`AgentConfigs`** aggregate-only usage events (no per-key **`$ld:ai:usage:agent-config`**) and cover the new tracker APIs plus deprecated shims.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1e44b8d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
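
The reason `TrackDurationOf` takes a callable rather than a pre-started task can be seen in a small self-contained sketch. Everything here is a hypothetical stand-in: `TimeCallable` and `TimeStartedTask` imitate the new and obsolete wrappers, `CallModel` simulates a 200 ms provider call, and `recorded` replaces the real `TrackDuration` event. It assumes .NET 6 or later with top-level statements and does not use the LaunchDarkly SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Stand-in for the TrackDuration event: collects measured durations in ms.
var recorded = new List<double>();

// Imitates TrackDurationOf: the clock starts, then the callable is invoked.
async Task<T> TimeCallable<T>(Func<Task<T>> operation)
{
    var sw = Stopwatch.StartNew();
    try { return await operation(); }
    finally { sw.Stop(); recorded.Add(sw.Elapsed.TotalMilliseconds); }
}

// Imitates the obsolete TrackDurationOfTask: the task may already be running.
async Task<T> TimeStartedTask<T>(Task<T> task)
{
    var sw = Stopwatch.StartNew();
    try { return await task; }
    finally { sw.Stop(); recorded.Add(sw.Elapsed.TotalMilliseconds); }
}

// Hypothetical provider call that takes about 200 ms.
async Task<string> CallModel()
{
    await Task.Delay(200);
    return "ok";
}

// Pre-started: the operation runs for about 150 ms before timing begins,
// so only the tail of the call is measured.
Task<string> started = CallModel();
await Task.Delay(150);
await TimeStartedTask(started);

// Callable: the whole call falls inside the measured window.
await TimeCallable(() => CallModel());

Console.WriteLine($"pre-started: {recorded[0]:F0} ms, callable: {recorded[1]:F0} ms");
```

The pre-started measurement under-reports because the clock only starts when the wrapper is entered; the callable form closes that gap by construction.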
1 parent 34293ca commit 485976e

6 files changed

Lines changed: 435 additions & 1 deletion

pkgs/sdk/server-ai/src/Interfaces/ILdAiConfigTracker.cs

Lines changed: 45 additions & 0 deletions
@@ -1,4 +1,5 @@
 using System;
+using System.Collections.Generic;
 using System.Threading.Tasks;
 using LaunchDarkly.Sdk.Server.Ai.Tracking;

@@ -38,6 +39,15 @@ public interface ILdAiConfigTracker
     /// <param name="durationMs">the duration in milliseconds</param>
     public void TrackDuration(float durationMs);

+    /// <summary>
+    /// Wraps a callable operation, measures its wall-clock duration, and records the result via
+    /// <see cref="TrackDuration"/>. The duration is recorded even if the operation throws.
+    /// </summary>
+    /// <param name="operation">a factory that produces the task to time</param>
+    /// <typeparam name="T">type of the operation's result</typeparam>
+    /// <returns>the operation result</returns>
+    public Task<T> TrackDurationOf<T>(Func<Task<T>> operation);
+
     /// <summary>
     /// Tracks the duration of a task, and returns the result of the task.
     ///
@@ -49,6 +59,7 @@ public interface ILdAiConfigTracker
     /// <param name="task">the task</param>
     /// <typeparam name="T">type of the task's result</typeparam>
     /// <returns>the task</returns>
+    [Obsolete("Use TrackDurationOf instead.")]
     public Task<T> TrackDurationOfTask<T>(Task<T> task);

     /// <summary>
@@ -84,6 +95,19 @@ public interface ILdAiConfigTracker
     /// </remarks>
     public void TrackError();

+    /// <summary>
+    /// Wraps a callable operation, automatically tracking its duration, success/error status,
+    /// and optional token usage. The <paramref name="metricsExtractor"/> is called with the
+    /// operation result to produce an <see cref="AiMetrics"/> value.
+    ///
+    /// If the operation throws, <see cref="TrackError"/> is called and the exception is re-thrown.
+    /// </summary>
+    /// <param name="metricsExtractor">extracts <see cref="AiMetrics"/> from the operation result</param>
+    /// <param name="operation">a factory that produces the task to time and track</param>
+    /// <typeparam name="T">type of the operation's result</typeparam>
+    /// <returns>the operation result</returns>
+    public Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation);
+
     /// <summary>
     /// Tracks a request to a provider. The request is a task that returns a <see cref="Response"/>, which
     /// contains information about the request such as token usage and metrics.
@@ -122,6 +146,7 @@ public interface ILdAiConfigTracker
     /// </remarks>
     /// <param name="request">a task representing the request</param>
     /// <returns>the task</returns>
+    [Obsolete("Use TrackMetricsOf instead.")]
     public Task<Response> TrackRequest(Task<Response> request);

     /// <summary>
@@ -130,4 +155,24 @@ public interface ILdAiConfigTracker
     /// <remarks>Records at most once per Tracker; further calls are ignored.</remarks>
     /// <param name="usage">the token usage</param>
     public void TrackTokens(Usage usage);
+
+    /// <summary>
+    /// Tracks the result of a judge evaluation. The event is silently dropped when
+    /// <see cref="JudgeResult.Sampled"/> or <see cref="JudgeResult.Success"/> is <c>false</c>.
+    /// </summary>
+    /// <param name="result">the judge evaluation result</param>
+    public void TrackJudgeResult(JudgeResult result);
+
+    /// <summary>
+    /// Tracks a single tool invocation. Unlike most track methods, this is not at-most-once;
+    /// it may be called multiple times to record multiple tool calls in the same run.
+    /// </summary>
+    /// <param name="toolKey">the identifier of the tool that was called</param>
+    public void TrackToolCall(string toolKey);
+
+    /// <summary>
+    /// Tracks multiple tool invocations by calling <see cref="TrackToolCall"/> for each key.
+    /// </summary>
+    /// <param name="toolKeys">the identifiers of the tools that were called</param>
+    public void TrackToolCalls(IEnumerable<string> toolKeys);
 }
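
The `TrackMetricsOf` contract documented in the interface above (duration always recorded, error tracked and re-thrown on failure) implies a particular order of events on each path. The following is a minimal sketch of that control flow, not the SDK code: `TrackSketch` is a hypothetical stand-in, a `Func<T, bool>` replaces the `AiMetrics` extractor, and the `events` list replaces the real track calls. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for emitted track events, in the order they would be sent.
var events = new List<string>();

// Mirrors the try/catch/finally shape of a metrics wrapper.
async Task<T> TrackSketch<T>(Func<T, bool> succeeded, Func<Task<T>> operation)
{
    T result;
    try
    {
        result = await operation();
    }
    catch (Exception)
    {
        events.Add("error");     // runs first on the failure path
        throw;                   // re-throw; finally still runs before propagation
    }
    finally
    {
        events.Add("duration");  // runs on both paths
    }

    // Only reached when the operation completed without throwing.
    events.Add(succeeded(result) ? "success" : "error");
    return result;
}

// Success path.
await TrackSketch<int>(r => r > 0, () => Task.FromResult(42));

// Failure path: the operation throws and the exception reaches the caller.
try
{
    await TrackSketch<int>(r => r > 0, () => throw new InvalidOperationException("boom"));
}
catch (InvalidOperationException)
{
    // Expected: the wrapper re-throws.
}

Console.WriteLine(string.Join(", ", events));
// prints "duration, success, error, duration"
```

On the failure path the error marker is added before the duration, because a `catch` block runs before the `finally` block of the same `try` statement.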

pkgs/sdk/server-ai/src/LdAiConfigTracker.cs

Lines changed: 94 additions & 0 deletions
@@ -59,6 +59,7 @@ public class LdAiConfigTracker : ILdAiConfigTracker
     private const string TokenInput = "$ld:ai:tokens:input";
     private const string TokenOutput = "$ld:ai:tokens:output";
     private const string TimeToFirstToken = "$ld:ai:tokens:ttf";
+    private const string ToolCall = "$ld:ai:tool_call";

     /// <summary>
     /// Constructs a tracker from individual fields, ordered as defined by the AI SDK spec.
@@ -144,6 +145,22 @@ public void TrackDuration(float durationMs)


     /// <inheritdoc/>
+    public async Task<T> TrackDurationOf<T>(Func<Task<T>> operation)
+    {
+        var sw = Stopwatch.StartNew();
+        try
+        {
+            return await operation();
+        }
+        finally
+        {
+            sw.Stop();
+            TrackDuration((float)sw.Elapsed.TotalMilliseconds);
+        }
+    }
+
+    /// <inheritdoc/>
+    [Obsolete("Use TrackDurationOf instead.")]
     public async Task<T> TrackDurationOfTask<T>(Task<T> task)
     {
         var sw = Stopwatch.StartNew();
@@ -217,6 +234,45 @@ public void TrackError()
     }

     /// <inheritdoc/>
+    public async Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation)
+    {
+        var sw = Stopwatch.StartNew();
+        T result;
+        try
+        {
+            result = await operation();
+        }
+        catch (Exception)
+        {
+            TrackError();
+            throw;
+        }
+        finally
+        {
+            sw.Stop();
+            TrackDuration((float)sw.Elapsed.TotalMilliseconds);
+        }
+
+        var metrics = metricsExtractor(result);
+        if (metrics.Success)
+        {
+            TrackSuccess();
+        }
+        else
+        {
+            TrackError();
+        }
+
+        if (metrics.Tokens != null)
+        {
+            TrackTokens(metrics.Tokens.Value);
+        }
+
+        return result;
+    }
+
+    /// <inheritdoc/>
+    [Obsolete("Use TrackMetricsOf instead.")]
     public async Task<Response> TrackRequest(Task<Response> request)
     {
         var sw = Stopwatch.StartNew();
@@ -274,6 +330,44 @@ public void TrackTokens(Usage usage)
         }
     }

+    /// <inheritdoc/>
+    public void TrackJudgeResult(JudgeResult result)
+    {
+        if (!result.Sampled || !result.Success)
+        {
+            return;
+        }
+
+        var data = string.IsNullOrEmpty(result.JudgeConfigKey)
+            ? _trackData
+            : MergeTrackData("judgeConfigKey", LdValue.Of(result.JudgeConfigKey));
+
+        _client.Track(result.MetricKey, _context, data, result.Score);
+    }
+
+    /// <inheritdoc/>
+    public void TrackToolCall(string toolKey)
+    {
+        var data = MergeTrackData("toolKey", LdValue.Of(toolKey));
+        _client.Track(ToolCall, _context, data, 1);
+    }
+
+    /// <inheritdoc/>
+    public void TrackToolCalls(IEnumerable<string> toolKeys)
+    {
+        foreach (var key in toolKeys)
+        {
+            TrackToolCall(key);
+        }
+    }
+
+    private LdValue MergeTrackData(string key, LdValue value)
+    {
+        var builder = new Dictionary<string, LdValue>(_trackData.Dictionary);
+        builder[key] = value;
+        return LdValue.ObjectFrom(builder);
+    }
+
     /// <summary>
     /// Reconstructs a tracker from a resumption token. This enables cross-process scenarios
     /// such as deferred feedback, where a tracker's runId needs to be reused in a different
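
`MergeTrackData` in the diff above copies the tracker's base data before adding a key, so repeated `TrackToolCall` invocations never leak one call's `toolKey` into another event. A minimal sketch of that copy-then-set pattern follows. It is an illustration only: plain `string` values stand in for `LdValue`, and `trackData`, `Merge`, and the sample keys are hypothetical. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the tracker's shared base data (string values instead of LdValue).
var trackData = new Dictionary<string, string>
{
    ["runId"] = "run-1",
    ["configKey"] = "my-config",
};

// Copy the base data, then set one extra key on the copy.
// The shared dictionary itself is never mutated.
IReadOnlyDictionary<string, string> Merge(string key, string value)
{
    var copy = new Dictionary<string, string>(trackData);
    copy[key] = value;
    return copy;
}

var first = Merge("toolKey", "search");
var second = Merge("toolKey", "calculator");

Console.WriteLine(trackData.ContainsKey("toolKey")); // prints "False"
Console.WriteLine(first["toolKey"]);                 // prints "search"
Console.WriteLine(second["toolKey"]);                // prints "calculator"
```

Each event gets its own dictionary, at the cost of one small allocation per call.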
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+namespace LaunchDarkly.Sdk.Server.Ai.Tracking;
+
+/// <summary>
+/// Holds the metrics extracted from an AI operation for use with
+/// <c>ILdAiConfigTracker.TrackMetricsOf</c>.
+/// </summary>
+public sealed record AiMetrics
+{
+    /// <summary>
+    /// Whether the operation succeeded.
+    /// </summary>
+    public readonly bool Success;
+
+    /// <summary>
+    /// Optional token usage for the operation.
+    /// </summary>
+    public readonly Usage? Tokens;
+
+    /// <summary>
+    /// Constructs an <see cref="AiMetrics"/> value.
+    /// </summary>
+    /// <param name="success">whether the operation succeeded</param>
+    /// <param name="tokens">optional token usage</param>
+    public AiMetrics(bool success, Usage? tokens = null)
+    {
+        Success = success;
+        Tokens = tokens;
+    }
+}
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+namespace LaunchDarkly.Sdk.Server.Ai.Tracking;
+
+/// <summary>
+/// Represents the result of a judge evaluation for use with
+/// <c>ILdAiConfigTracker.TrackJudgeResult</c>.
+/// </summary>
+public sealed record JudgeResult
+{
+    /// <summary>
+    /// The LaunchDarkly metric key to emit the event under.
+    /// </summary>
+    public readonly string MetricKey;
+
+    /// <summary>
+    /// The numeric score for this evaluation.
+    /// </summary>
+    public readonly double Score;
+
+    /// <summary>
+    /// Whether this result was sampled. When <c>false</c>, the event is silently dropped.
+    /// </summary>
+    public readonly bool Sampled;
+
+    /// <summary>
+    /// Whether the judge evaluation succeeded. When <c>false</c>, the event is silently dropped.
+    /// </summary>
+    public readonly bool Success;
+
+    /// <summary>
+    /// Optional AI Judge Config key to include in the event data.
+    /// </summary>
+    public readonly string JudgeConfigKey;
+
+    /// <summary>
+    /// Constructs a <see cref="JudgeResult"/>.
+    /// </summary>
+    /// <param name="metricKey">the LaunchDarkly metric key</param>
+    /// <param name="score">the numeric score</param>
+    /// <param name="sampled">whether sampled; defaults to <c>true</c></param>
+    /// <param name="success">whether successful; defaults to <c>true</c></param>
+    /// <param name="judgeConfigKey">optional judge config key</param>
+    public JudgeResult(
+        string metricKey,
+        double score,
+        bool sampled = true,
+        bool success = true,
+        string judgeConfigKey = null)
+    {
+        MetricKey = metricKey;
+        Score = score;
+        Sampled = sampled;
+        Success = success;
+        JudgeConfigKey = judgeConfigKey;
+    }
+}
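
The `Sampled` and `Success` flags on `JudgeResult` act as a gate: a score is emitted only when both are `true`. The drop rule can be sketched without the SDK as shown below. `TrackJudge` is a hypothetical stand-in that takes plain parameters instead of a `JudgeResult`, `emitted` replaces the real track call, and the metric key and scores are made-up sample values. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for emitted judge events: (metric key, score) pairs.
var emitted = new List<(string Key, double Score)>();

// Mirrors the gate in TrackJudgeResult; defaults match the JudgeResult constructor.
void TrackJudge(string metricKey, double score, bool sampled = true, bool success = true)
{
    if (!sampled || !success)
    {
        return; // silently dropped: no event, no exception
    }

    emitted.Add((metricKey, score));
}

TrackJudge("relevance", 0.9);                  // emitted
TrackJudge("relevance", 0.2, sampled: false);  // dropped: not sampled
TrackJudge("relevance", 0.0, success: false);  // dropped: judge evaluation failed

Console.WriteLine(emitted.Count); // prints "1"
```

Dropping a failed evaluation matters because its score is typically a placeholder; recording it would skew the metric toward that placeholder value.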

pkgs/sdk/server-ai/test/LdAiClientTest.cs

Lines changed: 4 additions & 1 deletion
@@ -1035,6 +1035,7 @@ public void AgentConfigs_OnlyBatchEventFired()

         client.AgentConfigs(requests, context);

+        // Individual $ld:ai:usage:agent-config must NOT fire — the caller used AgentConfigs, not AgentConfig.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-config",
             context,
@@ -1079,15 +1080,17 @@ public void AgentConfigs_DuplicateKeys_AggregateEventCountsAllRequests()

         var result = client.AgentConfigs(requests, context);

-        // The result dictionary de-duplicates, but the aggregate event should count all 3 requests.
+        // The result dictionary de-duplicates by key.
         Assert.Equal(2, result.Count);

+        // Individual events must NOT fire.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-config",
             context,
             It.IsAny<LdValue>(),
             It.IsAny<double>()), Times.Never);

+        // Aggregate event counts all 3 requests, including the duplicate.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-configs",
             context,
