feat: Add TrackDurationOf, TrackMetricsOf, TrackJudgeResult, TrackToolCall (#287)

mattrmc1 · web-flow · commit 485976e2b90d · 2026-06-11T10:59:43.000-05:00
## Summary

Adds the **compound tracking methods** and **new event types** to `ILdAiConfigTracker`. Callers can now measure operation duration via a callable wrapper, extract metrics from an operation result in one call, record judge evaluation outcomes, and track tool invocations. The legacy `TrackDurationOfTask` and `TrackRequest` methods are preserved but marked `[Obsolete]`.

Five new methods on `ILdAiConfigTracker`, with implementations on `LdAiConfigTracker`:

### `TrackDurationOf<T>`

```csharp
public Task<T> TrackDurationOf<T>(Func<Task<T>> operation);
```

Accepts a **callable** (not a pre-started task) so the tracker controls when execution begins. Duration measurement therefore starts at invocation, not at some earlier `Task.Run` call site. Uses `Stopwatch` to measure elapsed wall-clock time. Duration is recorded even if the operation throws (via `finally`). Emits `$ld:ai:duration:total`.

Replaces `TrackDurationOfTask<T>(Task<T>)`, which is now `[Obsolete("Use TrackDurationOf instead.")]`.

### `TrackMetricsOf<T>`

```csharp
public Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation);
```

All-in-one wrapper that tracks duration, success/error, and optional token usage from a single operation. Flow:

1. Starts a stopwatch and invokes `operation`.
2. On success: stops timer → `TrackDuration` → calls `metricsExtractor(result)` → `TrackSuccess`/`TrackError` based on `AiMetrics.Success` → `TrackTokens` if `AiMetrics.Tokens` is non-null.
3. On exception: `TrackError` → stops timer → `TrackDuration` → re-throws. The `catch` block runs before `finally`, so the error is recorded before the duration.

Replaces `TrackRequest(Task<Response>)`, which is now `[Obsolete("Use TrackMetricsOf instead.")]`.

### `TrackJudgeResult`

```csharp
public void TrackJudgeResult(JudgeResult result);
```

Records a judge evaluation outcome. The event is **silently dropped** when `result.Sampled == false` or `result.Success == false`. This prevents noisy or invalid scores from polluting metrics.
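The guard described above can be sketched in isolation. `JudgeOutcome` and `JudgeGuard` are hypothetical names, not SDK types. Instead of calling `_client.Track`, the sketch returns the event that would be emitted: the name and metric value, following the mapping in the implementation. It also leaves out the `judgeConfigKey` merge.

```csharp
using System;

// Hypothetical stand-in for JudgeResult. Sampled and Success default to true,
// matching the defaults of the real JudgeResult constructor.
public sealed record JudgeOutcome(string MetricKey, double Score, bool Sampled = true, bool Success = true);

public static class JudgeGuard
{
    // Mirrors the early return in TrackJudgeResult. When the result is kept, the
    // event name is the caller's metric key and the metric value is the score.
    public static bool TryBuildEvent(JudgeOutcome result, out string eventName, out double metricValue)
    {
        eventName = null;
        metricValue = 0;
        if (!result.Sampled || !result.Success)
        {
            return false; // silently dropped: nothing would be tracked
        }

        eventName = result.MetricKey;
        metricValue = result.Score;
        return true;
    }
}
```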
When emitted, the event uses `result.MetricKey` as the track event name and `result.Score` as the metric value. If `result.JudgeConfigKey` is non-null and non-empty, it is merged into the track data alongside the standard `runId`/`configKey`/`variationKey`/`version` fields.

### `TrackToolCall`

```csharp
public void TrackToolCall(string toolKey);
```

Emits a `$ld:ai:tool_call` event with `toolKey` merged into the track data. Unlike most tracker methods, this is **not** at-most-once. It may be called multiple times to record every tool invocation in a run, and each call emits a separate event with metric value `1`.

### `TrackToolCalls`

```csharp
public void TrackToolCalls(IEnumerable<string> toolKeys);
```

Convenience batch method: iterates `toolKeys` and calls `TrackToolCall` for each.

### New types

**`AiMetrics`**: immutable record holding `Success` (`bool`) and optional `Tokens` (`Usage?`) for use with `TrackMetricsOf`.

**`JudgeResult`**: immutable record holding `MetricKey`, `Score`, `Sampled`, `Success`, and optional `JudgeConfigKey` for use with `TrackJudgeResult`.

### `AgentConfigs` event fix

`AgentConfigs` now fires **only** the aggregate `$ld:ai:usage:agent-configs` event. It calls the private `BuildAgentConfig` path internally. It does NOT call the public `AgentConfig()` method and does NOT fire individual `$ld:ai:usage:agent-config` events. Tests updated to assert `Times.Never` on individual events.
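The sketch below shows the event split under several assumptions:

- `ConfigClientSketch` is hypothetical, and configs are modeled as plain strings.
- The per-key event's metric value of `1` is assumed.
- The aggregate metric value is assumed to be the request count. The client test only says the aggregate event "counts all 3 requests, including the duplicate."

The point is the structure described above: both entry points share a private builder, so the batch path never fires per-key events.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the AgentConfigs event split; not the SDK's code.
public sealed class ConfigClientSketch
{
    // Event names taken from the PR; recorded in a list instead of being sent.
    public const string SingleEvent = "$ld:ai:usage:agent-config";
    public const string AggregateEvent = "$ld:ai:usage:agent-configs";

    public List<(string Name, double Value)> Tracked { get; } = new List<(string Name, double Value)>();

    // Public single-key entry point: builds the config and fires the per-key event
    // (metric value 1 is an assumption).
    public string AgentConfig(string key)
    {
        Tracked.Add((SingleEvent, 1));
        return BuildAgentConfig(key);
    }

    // Batch entry point: reuses the private builder, so no per-key events fire.
    // Emits one aggregate event whose value (assumed) is the number of requests,
    // duplicates included, even though the result dictionary de-duplicates by key.
    public Dictionary<string, string> AgentConfigs(IReadOnlyList<string> keys)
    {
        var results = new Dictionary<string, string>();
        foreach (var key in keys)
        {
            results[key] = BuildAgentConfig(key); // a duplicate key overwrites the earlier entry
        }
        Tracked.Add((AggregateEvent, keys.Count));
        return results;
    }

    // Shared private path with no tracking side effects.
    private string BuildAgentConfig(string key) => "config-for-" + key;
}
```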
## Test plan

- [ ] `dotnet build` succeeds across `netstandard2.0`, `net462`, `net8.0`
- [ ] `dotnet test --framework net8.0` passes
- [ ] `LdAiConfigTrackerTest` covers the new tracker surface:
  - `TrackDurationOf_MeasuresDuration`: verifies wall-clock measurement via a 50 ms delay
  - `TrackMetricsOf_SuccessPath_TracksAllMetrics`: verifies duration, success, and tokens are all emitted
  - `TrackMetricsOf_ErrorPath_TracksErrorAndRethrows`: verifies duration and error are emitted and the exception propagates
  - `TrackJudgeResult_SampledFalse_NoEventEmitted`: verifies silent drop
  - `TrackJudgeResult_SuccessFalse_NoEventEmitted`: verifies silent drop
  - `TrackJudgeResult_SuccessPath_EmitsCorrectEvent`: verifies metric key, score, and `judgeConfigKey` in data
  - `TrackToolCall_DataIncludesToolKey`: verifies `$ld:ai:tool_call` event with `toolKey` in data
  - `TrackToolCall_NoAtMostOnce_EmitsMultipleEvents`: verifies repeated calls emit separate events
  - `DeprecatedShims_StillCallable`: verifies `[Obsolete]` methods remain functional
- [ ] `LdAiClientAgentJudgeTest.AgentConfigs_FiresOnlyAggregateEvent`: verifies NO individual `$ld:ai:usage:agent-config` events
- [ ] `LdAiClientTest.AgentConfigs_OnlyBatchEventFired`: same assertion from the client-level test
- [ ] Reviewer confirms method signatures, event names, and at-most-once semantics match the cross-SDK contract (AITRACK §1.1.4, §1.1.12, §1.1.13, §1.1.15)

---

> [!NOTE]
> **Medium Risk**
> Changes public tracker contracts and telemetry event shapes. Behavior is well-tested, but callers migrating off obsolete APIs need to adopt the new wrappers.
>
> **Overview**
> Extends **`ILdAiConfigTracker`** with compound wrappers and new event types for the server AI SDK.
>
> **`TrackDurationOf`** times a `Func<Task<T>>` (so measurement starts at invocation), records **`$ld:ai:duration:total`** even on failure, and supersedes **`TrackDurationOfTask`**, which stays but is **`[Obsolete]`**.
>
> **`TrackMetricsOf`** runs an operation, records duration, then applies success/error and optional tokens via a new **`AiMetrics`** extractor. **`TrackRequest`** is obsolete in favor of this pattern.
>
> **`TrackJudgeResult`** emits judge scores under a caller metric key when sampled and successful, optionally merging **`judgeConfigKey`** into track data. **`TrackToolCall`** / **`TrackToolCalls`** emit **`$ld:ai:tool_call`** with **`toolKey`** and are **not** at-most-once. Supporting types **`AiMetrics`** and **`JudgeResult`** are added, and **`MergeTrackData`** enriches events for judge and tool tracking.
>
> Tests document **`AgentConfigs`** aggregate-only usage events (no per-key **`$ld:ai:usage:agent-config`**) and cover the new tracker APIs plus deprecated shims.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1e44b8d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
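
This self-contained sketch is for callers migrating off the obsolete shims. It copies the control flow of `TrackDurationOf` and `TrackMetricsOf` from this diff into a hypothetical `RecordingTracker`, which appends event names to a list instead of calling `_client.Track`. `UsageSketch` and `AiMetricsSketch` are also hypothetical stand-ins for `Usage` and `AiMetrics`. The migration itself is mechanical: pass a `Func<Task<T>>` instead of a task that is already running.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for the SDK's Usage type (only a total is modeled).
public readonly struct UsageSketch
{
    public UsageSketch(int total) { Total = total; }
    public int Total { get; }
}

// Hypothetical stand-in for AiMetrics: success flag plus optional token usage.
public sealed record AiMetricsSketch(bool Success, UsageSketch? Tokens = null);

// Appends event names in emission order; control flow copied from the diff.
public sealed class RecordingTracker
{
    public List<string> Events { get; } = new List<string>();

    private void TrackDuration(float durationMs) => Events.Add("duration");
    private void TrackSuccess() => Events.Add("success");
    private void TrackError() => Events.Add("error");
    private void TrackTokens(UsageSketch usage) => Events.Add("tokens");

    public async Task<T> TrackDurationOf<T>(Func<Task<T>> operation)
    {
        var sw = Stopwatch.StartNew(); // timing starts before the operation is invoked
        try
        {
            return await operation();
        }
        finally
        {
            sw.Stop();
            TrackDuration((float)sw.Elapsed.TotalMilliseconds); // runs even if the operation throws
        }
    }

    public async Task<T> TrackMetricsOf<T>(Func<T, AiMetricsSketch> metricsExtractor, Func<Task<T>> operation)
    {
        var sw = Stopwatch.StartNew();
        T result;
        try
        {
            result = await operation();
        }
        catch (Exception)
        {
            TrackError(); // runs before the finally block
            throw;
        }
        finally
        {
            sw.Stop();
            TrackDuration((float)sw.Elapsed.TotalMilliseconds);
        }

        var metrics = metricsExtractor(result);
        if (metrics.Success) TrackSuccess(); else TrackError();
        if (metrics.Tokens != null) TrackTokens(metrics.Tokens.Value);
        return result;
    }
}

public static class MigrationDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var tracker = new RecordingTracker();

        // Before: tracker.TrackDurationOfTask(Task.Run(...)) timed a task that was
        // already running. After: hand over the factory so timing starts at invocation.
        int answer = await tracker.TrackDurationOf(async () => { await Task.Delay(5); return 42; });

        // Success path: duration, then success, then tokens.
        await tracker.TrackMetricsOf<int>(
            _ => new AiMetricsSketch(true, new UsageSketch(120)),
            () => Task.FromResult(answer));

        return tracker.Events;
    }
}
```

`MigrationDemo.RunAsync` records `duration, duration, success, tokens`. On the error path, the recorder sees `error` before `duration`, because the `catch` block runs before `finally`.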
diff --git a/pkgs/sdk/server-ai/src/Interfaces/ILdAiConfigTracker.cs b/pkgs/sdk/server-ai/src/Interfaces/ILdAiConfigTracker.cs
@@ -1,4 +1,5 @@
 using System;
+using System.Collections.Generic;
 using System.Threading.Tasks;
 using LaunchDarkly.Sdk.Server.Ai.Tracking;
 
@@ -38,6 +39,15 @@ public interface ILdAiConfigTracker
     /// <param name="durationMs">the duration in milliseconds</param>
     public void TrackDuration(float durationMs);
 
+    /// <summary>
+    /// Wraps a callable operation, measures its wall-clock duration, and records the result via
+    /// <see cref="TrackDuration"/>. The duration is recorded even if the operation throws.
+    /// </summary>
+    /// <param name="operation">a factory that produces the task to time</param>
+    /// <typeparam name="T">type of the operation's result</typeparam>
+    /// <returns>the operation result</returns>
+    public Task<T> TrackDurationOf<T>(Func<Task<T>> operation);
+
     /// <summary>
     /// Tracks the duration of a task, and returns the result of the task.
     ///
@@ -49,6 +59,7 @@ public interface ILdAiConfigTracker
     /// <param name="task">the task</param>
     /// <typeparam name="T">type of the task's result</typeparam>
     /// <returns>the task</returns>
+    [Obsolete("Use TrackDurationOf instead.")]
     public Task<T> TrackDurationOfTask<T>(Task<T> task);
 
     /// <summary>
@@ -84,6 +95,19 @@ public interface ILdAiConfigTracker
     /// </remarks>
     public void TrackError();
 
+    /// <summary>
+    /// Wraps a callable operation, automatically tracking its duration, success/error status,
+    /// and optional token usage. The <paramref name="metricsExtractor"/> is called with the
+    /// operation result to produce an <see cref="AiMetrics"/> value.
+    ///
+    /// If the operation throws, <see cref="TrackError"/> is called and the exception is re-thrown.
+    /// </summary>
+    /// <param name="metricsExtractor">extracts <see cref="AiMetrics"/> from the operation result</param>
+    /// <param name="operation">a factory that produces the task to time and track</param>
+    /// <typeparam name="T">type of the operation's result</typeparam>
+    /// <returns>the operation result</returns>
+    public Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation);
+
     /// <summary>
     /// Tracks a request to a provider. The request is a task that returns a <see cref="Response"/>, which
     /// contains information about the request such as token usage and metrics.
@@ -122,6 +146,7 @@ public interface ILdAiConfigTracker
     /// </remarks>
     /// <param name="request">a task representing the request</param>
     /// <returns>the task</returns>
+    [Obsolete("Use TrackMetricsOf instead.")]
     public Task<Response> TrackRequest(Task<Response> request);
 
     /// <summary>
@@ -130,4 +155,24 @@ public interface ILdAiConfigTracker
     /// <remarks>Records at most once per Tracker; further calls are ignored.</remarks>
     /// <param name="usage">the token usage</param>
     public void TrackTokens(Usage usage);
+
+    /// <summary>
+    /// Tracks the result of a judge evaluation. The event is silently dropped when
+    /// <see cref="JudgeResult.Sampled"/> or <see cref="JudgeResult.Success"/> is <c>false</c>.
+    /// </summary>
+    /// <param name="result">the judge evaluation result</param>
+    public void TrackJudgeResult(JudgeResult result);
+
+    /// <summary>
+    /// Tracks a single tool invocation. Unlike most track methods, this is not at-most-once;
+    /// it may be called multiple times to record multiple tool calls in the same run.
+    /// </summary>
+    /// <param name="toolKey">the identifier of the tool that was called</param>
+    public void TrackToolCall(string toolKey);
+
+    /// <summary>
+    /// Tracks multiple tool invocations by calling <see cref="TrackToolCall"/> for each key.
+    /// </summary>
+    /// <param name="toolKeys">the identifiers of the tools that were called</param>
+    public void TrackToolCalls(IEnumerable<string> toolKeys);
 }
diff --git a/pkgs/sdk/server-ai/src/LdAiConfigTracker.cs b/pkgs/sdk/server-ai/src/LdAiConfigTracker.cs
@@ -59,6 +59,7 @@ public class LdAiConfigTracker : ILdAiConfigTracker
     private const string TokenInput = "$ld:ai:tokens:input";
     private const string TokenOutput = "$ld:ai:tokens:output";
     private const string TimeToFirstToken = "$ld:ai:tokens:ttf";
+    private const string ToolCall = "$ld:ai:tool_call";
 
     /// <summary>
     /// Constructs a tracker from individual fields, ordered as defined by the AI SDK spec.
@@ -144,6 +145,22 @@ public void TrackDuration(float durationMs)
 
 
     /// <inheritdoc/>
+    public async Task<T> TrackDurationOf<T>(Func<Task<T>> operation)
+    {
+        var sw = Stopwatch.StartNew();
+        try
+        {
+            return await operation();
+        }
+        finally
+        {
+            sw.Stop();
+            TrackDuration((float)sw.Elapsed.TotalMilliseconds);
+        }
+    }
+
+    /// <inheritdoc/>
+    [Obsolete("Use TrackDurationOf instead.")]
     public async Task<T> TrackDurationOfTask<T>(Task<T> task)
     {
         var sw = Stopwatch.StartNew();
@@ -217,6 +234,45 @@ public void TrackError()
     }
 
     /// <inheritdoc/>
+    public async Task<T> TrackMetricsOf<T>(Func<T, AiMetrics> metricsExtractor, Func<Task<T>> operation)
+    {
+        var sw = Stopwatch.StartNew();
+        T result;
+        try
+        {
+            result = await operation();
+        }
+        catch (Exception)
+        {
+            TrackError();
+            throw;
+        }
+        finally
+        {
+            sw.Stop();
+            TrackDuration((float)sw.Elapsed.TotalMilliseconds);
+        }
+
+        var metrics = metricsExtractor(result);
+        if (metrics.Success)
+        {
+            TrackSuccess();
+        }
+        else
+        {
+            TrackError();
+        }
+
+        if (metrics.Tokens != null)
+        {
+            TrackTokens(metrics.Tokens.Value);
+        }
+
+        return result;
+    }
+
+    /// <inheritdoc/>
+    [Obsolete("Use TrackMetricsOf instead.")]
     public async Task<Response> TrackRequest(Task<Response> request)
     {
         var sw = Stopwatch.StartNew();
@@ -274,6 +330,44 @@ public void TrackTokens(Usage usage)
         }
     }
 
+    /// <inheritdoc/>
+    public void TrackJudgeResult(JudgeResult result)
+    {
+        if (!result.Sampled || !result.Success)
+        {
+            return;
+        }
+
+        var data = string.IsNullOrEmpty(result.JudgeConfigKey)
+            ? _trackData
+            : MergeTrackData("judgeConfigKey", LdValue.Of(result.JudgeConfigKey));
+
+        _client.Track(result.MetricKey, _context, data, result.Score);
+    }
+
+    /// <inheritdoc/>
+    public void TrackToolCall(string toolKey)
+    {
+        var data = MergeTrackData("toolKey", LdValue.Of(toolKey));
+        _client.Track(ToolCall, _context, data, 1);
+    }
+
+    /// <inheritdoc/>
+    public void TrackToolCalls(IEnumerable<string> toolKeys)
+    {
+        foreach (var key in toolKeys)
+        {
+            TrackToolCall(key);
+        }
+    }
+
+    private LdValue MergeTrackData(string key, LdValue value)
+    {
+        var builder = new Dictionary<string, LdValue>(_trackData.Dictionary);
+        builder[key] = value;
+        return LdValue.ObjectFrom(builder);
+    }
+
     /// <summary>
     /// Reconstructs a tracker from a resumption token. This enables cross-process scenarios
     /// such as deferred feedback, where a tracker's runId needs to be reused in a different
diff --git a/pkgs/sdk/server-ai/src/Tracking/AiMetrics.cs b/pkgs/sdk/server-ai/src/Tracking/AiMetrics.cs
@@ -0,0 +1,29 @@
+namespace LaunchDarkly.Sdk.Server.Ai.Tracking;
+
+/// <summary>
+/// Holds the metrics extracted from an AI operation for use with
+/// <c>ILdAiConfigTracker.TrackMetricsOf</c>.
+/// </summary>
+public sealed record AiMetrics
+{
+    /// <summary>
+    /// Whether the operation succeeded.
+    /// </summary>
+    public readonly bool Success;
+
+    /// <summary>
+    /// Optional token usage for the operation.
+    /// </summary>
+    public readonly Usage? Tokens;
+
+    /// <summary>
+    /// Constructs an <see cref="AiMetrics"/> value.
+    /// </summary>
+    /// <param name="success">whether the operation succeeded</param>
+    /// <param name="tokens">optional token usage</param>
+    public AiMetrics(bool success, Usage? tokens = null)
+    {
+        Success = success;
+        Tokens = tokens;
+    }
+}
diff --git a/pkgs/sdk/server-ai/src/Tracking/JudgeResult.cs b/pkgs/sdk/server-ai/src/Tracking/JudgeResult.cs
@@ -0,0 +1,55 @@
+namespace LaunchDarkly.Sdk.Server.Ai.Tracking;
+
+/// <summary>
+/// Represents the result of a judge evaluation for use with
+/// <c>ILdAiConfigTracker.TrackJudgeResult</c>.
+/// </summary>
+public sealed record JudgeResult
+{
+    /// <summary>
+    /// The LaunchDarkly metric key to emit the event under.
+    /// </summary>
+    public readonly string MetricKey;
+
+    /// <summary>
+    /// The numeric score for this evaluation.
+    /// </summary>
+    public readonly double Score;
+
+    /// <summary>
+    /// Whether this result was sampled. When <c>false</c>, the event is silently dropped.
+    /// </summary>
+    public readonly bool Sampled;
+
+    /// <summary>
+    /// Whether the judge evaluation succeeded. When <c>false</c>, the event is silently dropped.
+    /// </summary>
+    public readonly bool Success;
+
+    /// <summary>
+    /// Optional AI Judge Config key to include in the event data.
+    /// </summary>
+    public readonly string JudgeConfigKey;
+
+    /// <summary>
+    /// Constructs a <see cref="JudgeResult"/>.
+    /// </summary>
+    /// <param name="metricKey">the LaunchDarkly metric key</param>
+    /// <param name="score">the numeric score</param>
+    /// <param name="sampled">whether sampled; defaults to <c>true</c></param>
+    /// <param name="success">whether successful; defaults to <c>true</c></param>
+    /// <param name="judgeConfigKey">optional judge config key</param>
+    public JudgeResult(
+        string metricKey,
+        double score,
+        bool sampled = true,
+        bool success = true,
+        string judgeConfigKey = null)
+    {
+        MetricKey = metricKey;
+        Score = score;
+        Sampled = sampled;
+        Success = success;
+        JudgeConfigKey = judgeConfigKey;
+    }
+}
diff --git a/pkgs/sdk/server-ai/test/LdAiClientTest.cs b/pkgs/sdk/server-ai/test/LdAiClientTest.cs
@@ -1035,6 +1035,7 @@ public void AgentConfigs_OnlyBatchEventFired()
 
         client.AgentConfigs(requests, context);
 
+        // Individual $ld:ai:usage:agent-config must NOT fire — the caller used AgentConfigs, not AgentConfig.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-config",
             context,
@@ -1079,15 +1080,17 @@ public void AgentConfigs_DuplicateKeys_AggregateEventCountsAllRequests()
 
         var result = client.AgentConfigs(requests, context);
 
-        // The result dictionary de-duplicates, but the aggregate event should count all 3 requests.
+        // The result dictionary de-duplicates by key.
         Assert.Equal(2, result.Count);
 
+        // Individual events must NOT fire.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-config",
             context,
             It.IsAny<LdValue>(),
             It.IsAny<double>()), Times.Never);
 
+        // Aggregate event counts all 3 requests, including the duplicate.
         mockClient.Verify(c => c.Track(
             "$ld:ai:usage:agent-configs",
             context,
diff --git a/pkgs/sdk/server-ai/test/LdAiConfigTrackerTest.cs b/pkgs/sdk/server-ai/test/LdAiConfigTrackerTest.cs