
Commit 9a9941b

feat: Add AIConfigTracker with at-most-once tracking and resumption tokens (#179)
## Summary

Implements the full `LDAIConfigTracker` interface, which was previously a stub. Callers can now record AI operation metrics (duration, tokens, success/error, feedback, tool calls, judge results) with at-most-once enforcement, extract metrics from runner operations via `trackMetricsOf`, and reconstruct trackers across processes via resumption tokens.

### Tracking methods

```java
void trackDuration(Duration duration);
<T> T trackDurationOf(Callable<T> operation) throws Exception;
```

Records wall-clock duration. A null duration is silently dropped (with a debug log); negative durations are clamped to zero. `trackDurationOf` wraps a `Callable`, measures via `System.nanoTime()`, and records the duration in `finally` even on exception.

```java
<T> T trackMetricsOf(Function<? super T, AIMetrics> metricsExtractor, Callable<T> operation) throws Exception;
```

All-in-one wrapper: starts the timer, invokes the operation, and stops the clock before calling the extractor (so slow extractors don't inflate the duration). On success: prefers runner-reported `durationMs` over wall-clock, then delegates to `trackSuccess`/`trackError`, `trackTokens`, `trackToolCalls`. On exception: records wall-clock duration, calls `trackError`, rethrows. If the extractor itself throws, the operation duration is still recorded before propagating; `trackError` is NOT called, since the AI operation succeeded.

```java
void trackSuccess();
void trackError();
```

Share a single `AtomicReference<Boolean>` guard: only the first to fire wins.

```java
void trackFeedback(FeedbackKind kind);
```

Validates and resolves the event name before claiming the at-most-once guard, so null/invalid input doesn't burn the slot.

```java
void trackTokens(TokenUsage tokens);
```

Emits events for each positive count (total, input, output). All-zero usage does not consume the at-most-once slot.

```java
void trackToolCall(String toolKey);
void trackToolCalls(List<String> toolKeys);
```

Multi-fire (not at-most-once). Each call emits a separate `$ld:ai:tool_call` event.
```java
void trackJudgeResult(JudgeResult result);
```

Silently dropped when not sampled, not successful, or when `metricKey` is blank/null or `score` is null/non-finite. Multi-fire.

```java
void trackTimeToFirstToken(Duration duration);
```

Records time-to-first-token duration. At-most-once.

### Resumption tokens

```java
String getResumptionToken(); // on LDAIConfigTracker
LDAIConfigTracker createTracker(String resumptionToken, LDContext context); // on LDAIClient
```

`getResumptionToken()` returns URL-safe Base64 (no padding) JSON containing `{ runId, configKey, variationKey, version, graphKey }`. `variationKey` and `graphKey` are omitted when null. There is no length cap, so large config keys are supported. An empty `runId` or `configKey` is rejected on decode.

### Tracker factory wiring

`LDAIClientImpl` now creates real `LDAIConfigTrackerImpl` instances. A private `trackerFactory` method captures config identity and returns a `Supplier<LDAIConfigTracker>` producing a fresh tracker with a new `runId` on each call. Default configs also get real trackers. The default version is `1`. `NoOpAIConfigTracker` is deleted because it is no longer needed.

### New types

- **`FeedbackKind`**: enum with `POSITIVE`, `NEGATIVE`.
- **`TokenUsage`**: immutable record with `total`, `input`, `output`.
- **`AIMetrics`**: immutable builder with `success`, optional `tokens`, `durationMs`, `toolCalls`.
- **`JudgeResult`**: immutable builder with `metricKey`, `score`, `sampled`, `success`, optional `judgeConfigKey`, `reasoning`, `errorMessage`.
- **`MetricSummary`**: snapshot of all tracked metrics plus resumption token.
- **`TrackData`**: run identity fields with `toLDValue()`.

### Thread safety

All at-most-once slots use `AtomicReference<T>.compareAndSet(null, value)`: a single atomic guard plus value, with no race window. Tool calls use `CopyOnWriteArrayList`.
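The shared success/error guard described above can be illustrated with a small standalone sketch. This is not the code from `LDAIConfigTrackerImpl`; the class name `AtMostOnceSketch` and the boolean return values are assumptions made for illustration, and the only thing taken from the description is the `AtomicReference<Boolean>` claimed via `compareAndSet(null, value)`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of an at-most-once slot; not the SDK implementation.
final class AtMostOnceSketch {
    // null means "nothing recorded yet"; TRUE means success, FALSE means error.
    private final AtomicReference<Boolean> outcome = new AtomicReference<>();

    // Returns true only if this call claimed the slot.
    boolean trackSuccess() {
        return outcome.compareAndSet(null, Boolean.TRUE);
    }

    // Shares the same slot as trackSuccess, so whichever fires first wins.
    boolean trackError() {
        return outcome.compareAndSet(null, Boolean.FALSE);
    }

    // The recorded outcome, or null if neither method has been called.
    Boolean outcome() {
        return outcome.get();
    }

    public static void main(String[] args) {
        AtMostOnceSketch tracker = new AtMostOnceSketch();
        System.out.println(tracker.trackSuccess()); // true: slot claimed
        System.out.println(tracker.trackError());   // false: slot already taken
        System.out.println(tracker.outcome());      // true: the first call's value is kept
    }
}
```

Because the guard and the stored value are the same reference, there is no window in which a thread can observe "claimed but not yet written".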
## Test plan

- [ ] `./gradlew :lib:sdk:server-ai:test` passes
- [ ] `LDAIConfigTrackerImplTest`: duration (emit, clamp, at-most-once, null), durationOf (success + exception), success/error (emit, shared guard both directions), feedback (emit, at-most-once, null slot preservation), tokens (positive counts, zero skip, slot preservation), tool calls (multi-fire, null), judge result (sampled/success/metricKey/score guards, multi-fire), trackMetricsOf (success path, error path, extractor failure duration tracking, null AIMetrics guard), variationKey/graphKey in payload, concurrency (20-thread contention), constructor null rejection
- [ ] `ResumptionTokensTest`: encode/decode round-trips, large keys, special character escaping, null/malformed rejection, empty runId/configKey rejection

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> New public tracking API and telemetry emission change observability behavior; resumption tokens embed flag-targeting metadata if exposed to clients.
>
> **Overview**
> Replaces the **no-op** `LDAIConfigTracker` stub with a full implementation that emits LaunchDarkly custom metrics for AI runs (duration, time-to-first-token, success/error, feedback, tokens, tool calls, and judge scores).
>
> **`LDAIClientImpl`** now supplies a per-config `Supplier` that creates **`LDAIConfigTrackerImpl`** instances (new UUID `runId` per `createTracker()`), including when falling back to caller defaults. **`NoOpAIConfigTracker`** is removed. **`LDAIClient#createTracker(String, LDContext)`** decodes a resumption token to continue the same run across requests.
>
> The expanded **`LDAIConfigTracker`** API adds **`trackMetricsOf`**, **`getSummary`**, **`getTrackData`**, and **`getResumptionToken`**, with **at-most-once** semantics on most metrics (tool calls and judge results are multi-fire).
> **`LDAITrackingTypes`** holds the new immutable value types; **`ResumptionTokens`** encodes/decodes URL-safe Base64 JSON for run identity (docs warn tokens can expose **variation key** / version and should stay server-side).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 121b140. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent ad2ac08 commit 9a9941b

9 files changed

Lines changed: 2525 additions & 36 deletions


lib/sdk/server-ai/src/main/java/com/launchdarkly/sdk/server/ai/LDAIClient.java

Lines changed: 19 additions & 0 deletions
@@ -81,4 +81,23 @@ AIJudgeConfig judgeConfig(
       LDContext context,
       AIJudgeConfigDefault defaultValue,
       Map<String, Object> variables);
+
+  /**
+   * Reconstructs a tracker from a resumption token, preserving the original run's identity.
+   * <p>
+   * Use this when a multi-turn or streaming AI interaction spans multiple requests. The caller
+   * stores the resumption token from a previous tracker (via
+   * {@link LDAIConfigTracker#getResumptionToken()}) and passes it back here to continue tracking
+   * against the same run.
+   * <p>
+   * <strong>Security note:</strong> resumption tokens embed flag-evaluation details such as the
+   * variation key and config version. Keep tokens server-side and do not round-trip them through
+   * untrusted clients where they could leak flag-targeting information.
+   *
+   * @param resumptionToken the token returned by a previous tracker; must not be {@code null}
+   * @param context the evaluation context for the new request; must not be {@code null}
+   * @return a tracker with the decoded run identity, never {@code null}
+   * @throws IllegalArgumentException if the token is malformed
+   */
+  LDAIConfigTracker createTracker(String resumptionToken, LDContext context);
 }
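To make the token format concrete, here is a minimal sketch of the Base64 layer only, assuming the payload is already a JSON string. The class name `ResumptionTokenSketch` and the sample payload are hypothetical; the actual `ResumptionTokens` class also builds, parses, and validates the JSON, which is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the URL-safe, unpadded Base64 layer; not the SDK's ResumptionTokens class.
final class ResumptionTokenSketch {

    // Encodes a JSON payload as URL-safe Base64 without '=' padding.
    static String encode(String json) {
        return Base64.getUrlEncoder()
            .withoutPadding()
            .encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a token back to JSON; java.util.Base64 throws IllegalArgumentException on invalid input.
    static String decode(String token) {
        byte[] raw = Base64.getUrlDecoder().decode(token);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample payload using the field names listed in the commit message; the values are made up.
        String json = "{\"runId\":\"r-1\",\"configKey\":\"my-config\",\"version\":1}";
        String token = encode(json);
        System.out.println(token);
        System.out.println(decode(token).equals(json)); // true: the round trip is lossless
    }
}
```

The token is only encoded, not encrypted or signed, which is why the Javadoc above recommends keeping it server-side.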

lib/sdk/server-ai/src/main/java/com/launchdarkly/sdk/server/ai/LDAIClientImpl.java

Lines changed: 47 additions & 10 deletions
@@ -8,20 +8,21 @@
 import com.launchdarkly.sdk.LDContext;
 import com.launchdarkly.sdk.LDValue;
 import com.launchdarkly.sdk.LDValueType;
-import com.launchdarkly.sdk.server.ai.datamodel.LDAIConfigTypes.Mode;
 import com.launchdarkly.sdk.server.ai.datamodel.LDAIConfigTypes.Message;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAIConfigTypes.Mode;
 import com.launchdarkly.sdk.server.ai.internal.AIConfigFlagValue;
 import com.launchdarkly.sdk.server.ai.internal.AIConfigParser;
 import com.launchdarkly.sdk.server.ai.internal.AISdkInfo;
 import com.launchdarkly.sdk.server.ai.internal.Interpolator;
-import com.launchdarkly.sdk.server.ai.internal.NoOpAIConfigTracker;
+import com.launchdarkly.sdk.server.ai.internal.LDAIConfigTrackerImpl;
 import com.launchdarkly.sdk.server.interfaces.LDClientInterface;
 
 import java.util.ArrayList;
 import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Objects;
+import java.util.UUID;
 import java.util.function.Supplier;
 
 /**
@@ -51,8 +52,6 @@ public final class LDAIClientImpl implements LDAIClient {
       .anonymous(true)
       .build();
 
-  // Tracking is implemented in a later step; until then every config hands out the no-op tracker.
-  private static final Supplier<LDAIConfigTracker> TRACKER_FACTORY = () -> NoOpAIConfigTracker.INSTANCE;
 
   private final LDClientInterface client;
   private final LDLogger logger;
@@ -187,6 +186,9 @@ private AIConfig buildConfig(
       AIConfigFlagValue parsed,
       LDContext context,
       Map<String, Object> variables) {
+    Supplier<LDAIConfigTracker> factory = trackerFactory(
+        key, parsed.getVariationKey(), parsed.getVersion(),
+        parsed.getModel(), parsed.getProvider(), context);
     switch (mode) {
       case AGENT:
         return new AIAgentConfig(
@@ -197,7 +199,7 @@ private AIConfig buildConfig(
             interpolate(parsed.getInstructions(), variables, context),
             parsed.getJudgeConfiguration(),
             parsed.getTools(),
-            TRACKER_FACTORY);
+            factory);
       case JUDGE:
         return new AIJudgeConfig(
             key,
@@ -206,7 +208,7 @@ private AIConfig buildConfig(
             parsed.getProvider(),
             interpolateMessages(parsed.getMessages(), variables, context),
             parsed.getEvaluationMetricKey(),
-            TRACKER_FACTORY);
+            factory);
       case COMPLETION:
       default:
         return new AICompletionConfig(
@@ -217,7 +219,7 @@ private AIConfig buildConfig(
             interpolateMessages(parsed.getMessages(), variables, context),
             parsed.getJudgeConfiguration(),
             parsed.getTools(),
-            TRACKER_FACTORY);
+            factory);
     }
   }
 
@@ -231,6 +233,9 @@ private AIConfig buildConfigFromDefault(
       AIConfigDefault defaultValue,
       LDContext context,
       Map<String, Object> variables) {
+    // Default configs still get real trackers — the configKey was requested even if no flag was found.
+    // variationKey is null because no flag evaluation occurred.
+    Supplier<LDAIConfigTracker> factory = trackerFactory(key, null, null, null, null, context);
     switch (mode) {
       case AGENT: {
         AIAgentConfigDefault agent = (AIAgentConfigDefault) defaultValue;
@@ -242,7 +247,7 @@ private AIConfig buildConfigFromDefault(
             interpolate(agent.getInstructions(), variables, context),
             agent.getJudgeConfiguration(),
             agent.getTools(),
-            TRACKER_FACTORY);
+            factory);
       }
       case JUDGE: {
         AIJudgeConfigDefault judge = (AIJudgeConfigDefault) defaultValue;
@@ -253,7 +258,7 @@ private AIConfig buildConfigFromDefault(
             judge.getProvider(),
             interpolateMessages(judge.getMessages(), variables, context),
             judge.getEvaluationMetricKey(),
-            TRACKER_FACTORY);
+            factory);
       }
       case COMPLETION:
       default: {
@@ -266,11 +271,43 @@ private AIConfig buildConfigFromDefault(
             interpolateMessages(completion.getMessages(), variables, context),
             completion.getJudgeConfiguration(),
             completion.getTools(),
-            TRACKER_FACTORY);
+            factory);
       }
     }
   }
 
+  /**
+   * Creates a per-evaluation tracker factory. Each call to the returned {@link Supplier} produces
+   * a fresh {@link LDAIConfigTrackerImpl} with a new {@code runId}.
+   */
+  private Supplier<LDAIConfigTracker> trackerFactory(
+      String configKey,
+      String variationKey,
+      Integer version,
+      com.launchdarkly.sdk.server.ai.datamodel.LDAIConfigTypes.Model model,
+      com.launchdarkly.sdk.server.ai.datamodel.LDAIConfigTypes.Provider provider,
+      LDContext context) {
+    String modelName = model != null && model.getName() != null ? model.getName() : "";
+    String providerName = provider != null && provider.getName() != null ? provider.getName() : "";
+    int ver = version != null ? version : 1;
+    return () -> new LDAIConfigTrackerImpl(
+        client,
+        UUID.randomUUID().toString(),
+        configKey,
+        variationKey,
+        ver,
+        modelName,
+        providerName,
+        context,
+        null, // graphKey — set by agentGraph() in Plan 3
+        logger);
+  }
+
+  @Override
+  public LDAIConfigTracker createTracker(String resumptionToken, LDContext context) {
+    return LDAIConfigTrackerImpl.fromResumptionToken(resumptionToken, client, context, logger);
+  }
+
   private List<Message> interpolateMessages(
       List<Message> messages, Map<String, Object> variables, LDContext context) {
     if (messages == null) {
Lines changed: 160 additions & 7 deletions
@@ -1,16 +1,169 @@
 package com.launchdarkly.sdk.server.ai;
 
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.AIMetrics;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.FeedbackKind;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.JudgeResult;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.MetricSummary;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.TokenUsage;
+import com.launchdarkly.sdk.server.ai.datamodel.LDAITrackingTypes.TrackData;
+
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.Callable;
+import java.util.function.Function;
+
 /**
  * Reports events related to a single AI run of an {@link AIConfig}.
  * <p>
- * A tracker is obtained from a retrieved config via {@link AIConfig#createTracker()}. Each tracker
- * corresponds to one AI run and is used to record metrics such as model usage, duration, and
- * feedback against the AI Config it was created from.
+ * A tracker is obtained from a retrieved config via {@link AIConfig#createTracker()}, or
+ * reconstructed from a resumption token via {@link LDAIClient#createTracker(String, com.launchdarkly.sdk.LDContext)}.
+ * Each tracker corresponds to one AI run and is used to record metrics such as model usage,
+ * duration, and feedback against the AI Config it was created from.
+ * <p>
+ * Most tracking methods are at-most-once: a second call to the same method on the same tracker
+ * is silently dropped. {@link #trackToolCall(String)} and {@link #trackJudgeResult(JudgeResult)}
+ * are multi-fire — each call records a distinct event.
  * <p>
- * <strong>This interface is an intentional placeholder.</strong> The metric- and feedback-reporting
- * methods (and resumption-token support) are introduced in a later step of the AI SDK build-out; it
- * is defined here so that the public config types expose a stable {@code createTracker()} surface.
- * The only implementation in this release is an internal no-op.
+ * Implementations are thread-safe.
  */
 public interface LDAIConfigTracker {
+
+  /**
+   * Returns the correlation metadata for this tracker's run.
+   *
+   * @return the track data, never {@code null}
+   */
+  TrackData getTrackData();
+
+  /**
+   * Returns the resumption token for this run.
+   * <p>
+   * The resumption token encodes the run's identity and can be passed to
+   * {@link LDAIClient#createTracker(String, com.launchdarkly.sdk.LDContext)} to reconstruct a
+   * tracker on a subsequent request (for example, in a streaming scenario).
+   * <p>
+   * <strong>Security note:</strong> resumption tokens embed flag-evaluation details such as the
+   * variation key and config version. Keep tokens server-side and do not round-trip them through
+   * untrusted clients where they could leak flag-targeting information.
+   *
+   * @return the resumption token, or {@code null} if not available
+   */
+  String getResumptionToken();
+
+  /**
+   * Records the duration of the AI generation.
+   * <p>
+   * At-most-once: subsequent calls on the same tracker are silently dropped.
+   *
+   * @param duration the duration; ignored if {@code null}
+   */
+  void trackDuration(Duration duration);
+
+  /**
+   * Executes the given operation and records its wall-clock duration.
+   * <p>
+   * The duration is recorded even if the operation throws. Equivalent to wrapping the operation
+   * in a try/finally that calls {@link #trackDuration(Duration)}.
+   *
+   * @param <T> the return type of the operation
+   * @param operation the operation to execute and time; must not be {@code null}
+   * @return the result of the operation
+   * @throws Exception if the operation throws
+   */
+  <T> T trackDurationOf(Callable<T> operation) throws Exception;
+
+  /**
+   * Records the time from request start to receipt of the first token.
+   * <p>
+   * At-most-once: subsequent calls on the same tracker are silently dropped.
+   *
+   * @param duration the time to first token; ignored if {@code null}
+   */
+  void trackTimeToFirstToken(Duration duration);
+
+  /**
+   * Records that the AI generation succeeded.
+   * <p>
+   * At-most-once and mutually exclusive with {@link #trackError()}: whichever is called first wins.
+   */
+  void trackSuccess();
+
+  /**
+   * Records that the AI generation failed.
+   * <p>
+   * At-most-once and mutually exclusive with {@link #trackSuccess()}: whichever is called first wins.
+   */
+  void trackError();
+
+  /**
+   * Records user feedback for this AI generation.
+   * <p>
+   * At-most-once: subsequent calls on the same tracker are silently dropped.
+   *
+   * @param kind the feedback kind; ignored if {@code null}
+   */
+  void trackFeedback(FeedbackKind kind);
+
+  /**
+   * Records token usage for this AI generation.
+   * <p>
+   * At-most-once: subsequent calls on the same tracker are silently dropped. Calls where all
+   * counts are zero do not consume the at-most-once slot.
+   *
+   * @param tokens the token usage; ignored if {@code null}
+   */
+  void trackTokens(TokenUsage tokens);
+
+  /**
+   * Records a single tool call made during this AI generation.
+   * <p>
+   * Multi-fire: every call emits an event.
+   *
+   * @param toolKey the tool key; ignored if {@code null}
+   */
+  void trackToolCall(String toolKey);
+
+  /**
+   * Records multiple tool calls made during this AI generation.
+   * <p>
+   * Equivalent to calling {@link #trackToolCall(String)} for each key.
+   *
+   * @param toolKeys the tool keys; ignored if {@code null}
+   */
+  void trackToolCalls(List<String> toolKeys);
+
+  /**
+   * Records the result of a judge evaluation.
+   * <p>
+   * Multi-fire per judge metric key. The result is silently skipped if it was not sampled, if
+   * the evaluation did not succeed, or if the metric key or score is absent.
+   *
+   * @param result the judge result; ignored if {@code null}
+   */
+  void trackJudgeResult(JudgeResult result);
+
+  /**
+   * Executes the given operation and tracks its metrics using the extracted {@link AIMetrics}.
+   * <p>
+   * Tracks duration (preferring runner-reported duration when present), success or error, tokens,
+   * and tool calls. If the operation throws, {@link #trackError()} is called and the exception
+   * is re-thrown.
+   *
+   * @param <T> the return type of the operation
+   * @param metricsExtractor a function that extracts {@link AIMetrics} from the operation result;
+   *                         exceptions from the extractor propagate to the caller
+   * @param operation the AI operation to execute; must not be {@code null}
+   * @return the result of the operation
+   * @throws Exception if the operation or the metrics extractor throws
+   */
+  <T> T trackMetricsOf(
+      Function<? super T, AIMetrics> metricsExtractor,
+      Callable<T> operation) throws Exception;
+
+  /**
+   * Returns a snapshot of all metrics tracked so far on this tracker.
+   *
+   * @return the metric summary, never {@code null}
+   */
+  MetricSummary getSummary();
 }
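The `trackDurationOf` contract documented above (time the operation, record in `finally`, at-most-once, null ignored, negatives clamped) can be sketched without any SDK types. The class `DurationOfSketch` and its `recorded()` accessor are hypothetical names used only for illustration; the real implementation also emits a LaunchDarkly event, which is omitted here.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the trackDuration / trackDurationOf contract; not the SDK implementation.
final class DurationOfSketch {
    // At-most-once slot: null until the first duration is recorded.
    private final AtomicReference<Duration> duration = new AtomicReference<>();

    // Ignores null, clamps negative values to zero, and keeps only the first recorded value.
    void trackDuration(Duration d) {
        if (d == null) {
            return;
        }
        Duration clamped = d.isNegative() ? Duration.ZERO : d;
        duration.compareAndSet(null, clamped);
    }

    // Times the operation with System.nanoTime() and records the duration even if it throws.
    <T> T trackDurationOf(Callable<T> operation) throws Exception {
        long start = System.nanoTime();
        try {
            return operation.call();
        } finally {
            trackDuration(Duration.ofNanos(System.nanoTime() - start));
        }
    }

    // The recorded duration, or null if nothing has been tracked yet.
    Duration recorded() {
        return duration.get();
    }

    public static void main(String[] args) throws Exception {
        DurationOfSketch tracker = new DurationOfSketch();
        String reply = tracker.trackDurationOf(() -> "model reply");
        System.out.println(reply);
        System.out.println(tracker.recorded() != null); // true: a duration was recorded
    }
}
```

Recording in `finally` means a failed model call still contributes a latency measurement, which matches the "recorded even if the operation throws" wording in the Javadoc.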
