feat: simplify evaluation schema to flat score/reasoning shape #1286
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete EvaluationSchemaBuilder.ts and define EVALUATION_SCHEMA as a module-level const in Judge.ts. Remove per-field warnings from _parseEvaluationResponse (keeping it pure) and emit a single warning in evaluate() that includes the judge key and the raw response data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
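A minimal sketch of the split this commit describes: a parser with no side effects, and a caller that logs one warning per failed parse. The function names, the `warn` callback, and the exact message format are assumptions for illustration, not the code in Judge.ts; only the "Could not parse evaluation response" wording comes from the PR.

```typescript
// Hypothetical sketch; not the actual Judge.ts implementation.
interface ParsedEvaluation {
  score: number;
  reasoning: string;
}

// Pure: returns undefined on bad input and never logs.
function parseEvaluationResponse(data: Record<string, unknown>): ParsedEvaluation | undefined {
  const { score, reasoning } = data;
  if (typeof score !== 'number' || typeof reasoning !== 'string') {
    return undefined;
  }
  return { score, reasoning };
}

// The caller owns logging, so each failed evaluation yields exactly one warning
// that carries the judge key and the raw response data.
function evaluateWithWarning(
  judgeKey: string,
  data: Record<string, unknown>,
  warn: (message: string) => void,
): ParsedEvaluation | undefined {
  const parsed = parseEvaluationResponse(data);
  if (parsed === undefined) {
    warn(`Could not parse evaluation response for judge ${judgeKey}: ${JSON.stringify(data)}`);
  }
  return parsed;
}
```

Keeping the parser free of logging makes it easy to unit test, and the single warning site avoids one log line per missing field.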
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d81b202.
configKey is already present in tracker.getTrackData().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review nits: narrow the EVALUATION_SCHEMA type with as const instead of Record<string, unknown>, and add an Array.isArray check in _parseEvaluationResponse.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
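The two nits can be illustrated with a short sketch. The schema body below is an assumption (the PR only states that the shape is a flat score/reasoning object), and `isPlainRecord` and `RequiredField` are hypothetical names. `as const` keeps the literal types that a `Record<string, unknown>` annotation would erase, and the `Array.isArray` check matters because `typeof []` is also `'object'`.

```typescript
// Hypothetical schema contents; only the flat score/reasoning shape is from the PR.
const EVALUATION_SCHEMA = {
  type: 'object',
  properties: {
    score: { type: 'number' },
    reasoning: { type: 'string' },
  },
  required: ['score', 'reasoning'],
} as const;

// With `as const`, the compiler knows the exact required field names.
type RequiredField = typeof EVALUATION_SCHEMA.required[number]; // 'score' | 'reasoning'

// Arrays and null both report typeof 'object', so they are rejected explicitly.
function isPlainRecord(data: unknown): data is Record<string, unknown> {
  return typeof data === 'object' && data !== null && !Array.isArray(data);
}
```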
🤖 I have created a release *beep* *boop*

---

<details><summary>browser: 0.1.16</summary>

## [0.1.16](browser-v0.1.15...browser-v0.1.16) (2026-04-21)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.0 to 4.6.1
</details>

<details><summary>browser-telemetry: 1.0.32</summary>

## [1.0.32](browser-telemetry-v1.0.31...browser-telemetry-v1.0.32) (2026-04-21)

### Bug Fixes

* correct typeof comparisons in browser SDK ([#1301](#1301)) ([f4bd636](f4bd636))
* **js-client-sdk:** better `undefined` handling ([#1303](#1303)) ([4818678](4818678))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.0 to 4.6.1
</details>

<details><summary>js-client-sdk: 4.6.1</summary>

## [4.6.1](js-client-sdk-v4.6.0...js-client-sdk-v4.6.1) (2026-04-21)

### Bug Fixes

* correct typeof comparisons in browser SDK ([#1301](#1301)) ([f4bd636](f4bd636))
* **js-client-sdk:** better `undefined` handling ([#1303](#1303)) ([4818678](4818678))
</details>

<details><summary>react-sdk: 0.2.2</summary>

## [0.2.2](react-sdk-v0.2.1...react-sdk-v0.2.2) (2026-04-21)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from ^4.6.0 to ^4.6.1
</details>

<details><summary>server-sdk-ai: 0.17.0</summary>

## [0.17.0](server-sdk-ai-v0.16.8...server-sdk-ai-v0.17.0) (2026-04-21)

### ⚠ BREAKING CHANGES

* Flatten JudgeResponse and EvalScore into new LDJudgeResult ([#1284](#1284))
* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#1270](#1270))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#1270](#1270)) ([fc25ab7](fc25ab7))
* Flatten JudgeResponse and EvalScore into new LDJudgeResult ([#1284](#1284)) ([aba1221](aba1221))
* Implement agent graph definitions ([#1282](#1282)) ([e7d08e5](e7d08e5))
* simplify evaluation schema to flat score/reasoning shape ([#1286](#1286)) ([c132e9f](c132e9f))

### Bug Fixes

* Add support for graph metric tracking ([#1269](#1269)) ([034a89d](034a89d))
</details>

<details><summary>server-sdk-ai-langchain: 0.5.5</summary>

## [0.5.5](server-sdk-ai-langchain-v0.5.4...server-sdk-ai-langchain-v0.5.5) (2026-04-21)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

<details><summary>server-sdk-ai-openai: 0.5.5</summary>

## [0.5.5](server-sdk-ai-openai-v0.5.4...server-sdk-ai-openai-v0.5.5) (2026-04-21)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

<details><summary>server-sdk-ai-vercel: 0.5.5</summary>

## [0.5.5](server-sdk-ai-vercel-v0.5.4...server-sdk-ai-vercel-v0.5.5) (2026-04-21)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->

---

> [!NOTE]
> **Medium Risk**
> Primarily a version/changelog bump, but it publishes `@launchdarkly/server-sdk-ai` `0.17.0` with documented breaking API changes that can impact downstream consumers and provider peer dependency resolution.
>
> **Overview**
> Bumps release versions across the monorepo via `.release-please-manifest.json`, updating `@launchdarkly/server-sdk-ai` to `0.17.0`, `@launchdarkly/js-client-sdk` to `4.6.1`, and related packages (`@launchdarkly/browser`, `@launchdarkly/react-sdk`, `@launchdarkly/browser-telemetry`, and AI provider packages) accordingly.
>
> Updates package metadata, changelogs, examples, and embedded SDK/wrapper version strings (e.g., `BrowserInfo` and `LDReactClient`) to reflect the new releases, including `server-sdk-ai`’s `0.17.0` breaking-change notes and provider peer dependency bumps to `^0.17.0`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit e7f8c09. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Summary
- EvaluationSchemaBuilder.build() no longer takes an evaluationMetricKey parameter. Since there is only ever a single evaluation metric key per judge config, it does not need to be embedded in the schema sent to the LLM.
- Flat {score, reasoning} shape. The old nested structure ({evaluations: {metricKey: {score, reasoning}}}) is replaced with a simple {score: number, reasoning: string} object. This is easier for LLMs to produce correctly and matches the Python SDK (python-server-sdk-ai#105, "fix: Remove evaluation metric key from schema which failed on some LLMs").
- In Judge.ts, _parseEvaluationResponse now reads score and reasoning directly from the top-level response data. The metric key is still sourced from the judge config's evaluationMetricKey and used to key the result; it just no longer appears in the schema or the LLM response.

Test plan
- Unit tests (yarn workspace @launchdarkly/server-sdk-ai test)
- Lint (yarn workspace @launchdarkly/server-sdk-ai lint)
- _parseEvaluationResponse unit tests updated for the simplified signature and data shape

🤖 Generated with Claude Code
Note
Medium Risk
Changes the structured response contract and parsing for judge evaluations; any callers/providers still emitting the old nested evaluations shape will now fail evaluation parsing.
Overview
Simplifies judge structured-output handling by switching the expected/provider schema from nested evaluations[metricKey]{score,reasoning} to a flat top-level {score, reasoning} object, and removes the dynamic EvaluationSchemaBuilder entirely.
Judge.evaluate now always invokes the provider with the static schema and parses score/reasoning directly; failures log a more specific "Could not parse evaluation response" warning. Tests are updated to use the new response shape and to assert the new warning behavior for missing/malformed responses.
Reviewed by Cursor Bugbot for commit 013a80d.