feat: simplify evaluation schema to flat score/reasoning shape #1286

Merged: jsonbailey merged 6 commits into feat/ai-sdk-next-release from jb/aic-2253/simplify-eval-schema on Apr 17, 2026

@jsonbailey jsonbailey commented Apr 16, 2026

Summary

  • Removed the metric key from the structured output schema. EvaluationSchemaBuilder.build() no longer takes an evaluationMetricKey parameter. Since there is only ever a single evaluation metric key per judge config, it does not need to be embedded in the schema sent to the LLM.
  • Flattened the schema to a top-level {score, reasoning} shape. The old nested structure ({evaluations: {metricKey: {score, reasoning}}}) is replaced with a simple {score: number, reasoning: string} object. This is easier for LLMs to produce correctly and matches the Python SDK (python-server-sdk-ai#105, "fix: Remove evaluation metric key from schema which failed on some LLMs").
  • Updated parsing in Judge.ts. _parseEvaluationResponse now reads score and reasoning directly from the top-level response data. The metric key is still sourced from the judge config's evaluationMetricKey and used to key the result — it just no longer appears in the schema or LLM response.
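
To make the shape change concrete, here is a minimal TypeScript sketch of the old and new response shapes and a flat parser. The names `FLAT_EVALUATION_SCHEMA`, `parseFlatEvaluation`, and `keyResult` are illustrative assumptions, not the identifiers used in Judge.ts, and the schema body is an assumed JSON-Schema-style rendering of `{score: number, reasoning: string}`.

```typescript
// Hypothetical names throughout; the real Judge.ts may differ.

// Old nested shape: the metric key was embedded in the LLM response.
type NestedEvaluation = {
  evaluations: Record<string, { score: number; reasoning: string }>;
};

// New flat shape: score and reasoning sit at the top level.
interface FlatEvaluation {
  score: number;
  reasoning: string;
}

// Assumed JSON-Schema-style description of the flat shape.
const FLAT_EVALUATION_SCHEMA = {
  type: 'object',
  properties: {
    score: { type: 'number' },
    reasoning: { type: 'string' },
  },
  required: ['score', 'reasoning'],
  additionalProperties: false,
} as const;

// Reads score and reasoning from the top level of the response data.
// Returns undefined when the data does not match the flat shape.
function parseFlatEvaluation(data: unknown): FlatEvaluation | undefined {
  if (typeof data !== 'object' || data === null) return undefined;
  const { score, reasoning } = data as Record<string, unknown>;
  if (typeof score !== 'number' || typeof reasoning !== 'string') return undefined;
  return { score, reasoning };
}

// The metric key comes from the judge config, not from the LLM response.
function keyResult(metricKey: string, parsed: FlatEvaluation): Record<string, FlatEvaluation> {
  return { [metricKey]: parsed };
}

const parsed = parseFlatEvaluation({ score: 0.8, reasoning: 'Relevant and concise.' });

// A response in the old nested shape has no top-level score, so it is rejected.
const oldShape: NestedEvaluation = {
  evaluations: { relevance: { score: 0.8, reasoning: 'Relevant and concise.' } },
};
const rejected = parseFlatEvaluation(oldShape);
```

In this sketch a provider that still returns the nested `evaluations` object yields `undefined`, which mirrors the risk note on this PR about callers emitting the old shape.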

Test plan

  • All 144 existing tests pass (yarn workspace @launchdarkly/server-sdk-ai test)
  • Lint passes (yarn workspace @launchdarkly/server-sdk-ai lint)
  • Test mocks updated to use new flat response shape
  • _parseEvaluationResponse unit tests updated for simplified signature and data shape

🤖 Generated with Claude Code


Note

Medium Risk
Changes the structured response contract and parsing for judge evaluations; any callers/providers still emitting the old nested evaluations shape will now fail evaluation parsing.

Overview
Simplifies judge structured-output handling by switching the expected/provider schema from nested evaluations[metricKey]{score,reasoning} to a flat top-level {score, reasoning} object, and removes the dynamic EvaluationSchemaBuilder entirely.

Judge.evaluate now always invokes the provider with the static schema and parses score/reasoning directly; failures log a more specific "Could not parse evaluation response" warning. Tests are updated to use the new response shape and to assert the new warning behavior for missing/malformed responses.
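
The flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not the SDK's code: the `Logger`, `StructuredProvider`, and `JudgeResult` interfaces, the `invokeStructured` method name, and the exact warning format are all hypothetical; only the warning prefix "Could not parse evaluation response" comes from this PR.

```typescript
// Minimal stand-in interfaces; the actual SDK types are not shown on this page.
interface Logger {
  warn(message: string): void;
}
interface StructuredProvider {
  invokeStructured(prompt: string, schema: object): Promise<{ data: unknown }>;
}
interface JudgeResult {
  metricKey: string;
  score: number;
  reasoning: string;
}

// Static schema, defined once at module level (assumed JSON-Schema-style body).
const STATIC_SCHEMA = {
  type: 'object',
  properties: { score: { type: 'number' }, reasoning: { type: 'string' } },
  required: ['score', 'reasoning'],
} as const;

// Pure parser: no logging, just a result or undefined.
function parse(data: unknown): { score: number; reasoning: string } | undefined {
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return undefined;
  const { score, reasoning } = data as Record<string, unknown>;
  if (typeof score !== 'number' || typeof reasoning !== 'string') return undefined;
  return { score, reasoning };
}

// Always calls the provider with the static schema; on a parse failure it
// emits a single warning that includes the raw response data.
async function evaluate(
  provider: StructuredProvider,
  logger: Logger,
  metricKey: string,
  prompt: string,
): Promise<JudgeResult | undefined> {
  const response = await provider.invokeStructured(prompt, STATIC_SCHEMA);
  const parsed = parse(response.data);
  if (!parsed) {
    logger.warn(`Could not parse evaluation response: ${JSON.stringify(response.data)}`);
    return undefined;
  }
  return { metricKey, ...parsed };
}
```

Keeping the parser free of side effects means the caller decides how and where to report a failure, so a malformed response produces one warning rather than one per missing field.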

Reviewed by Cursor Bugbot for commit 013a80d. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 25623 bytes
Compressed size limit: 29000
Uncompressed size: 125843 bytes

@github-actions

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31655 bytes
Compressed size limit: 34000
Uncompressed size: 112792 bytes

@github-actions

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179375 bytes
Compressed size limit: 200000
Uncompressed size: 829982 bytes

@github-actions

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 37169 bytes
Compressed size limit: 38000
Uncompressed size: 204305 bytes

jsonbailey and others added 2 commits April 16, 2026 16:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete EvaluationSchemaBuilder.ts and define EVALUATION_SCHEMA as a
module-level const in Judge.ts. Remove per-field warnings from
_parseEvaluationResponse (keep it pure) and emit a single warning in
evaluate() that includes the judge key and raw response data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 16, 2026 21:55
@jsonbailey jsonbailey requested a review from a team as a code owner April 16, 2026 21:55

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d81b202.

Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
jsonbailey and others added 2 commits April 17, 2026 09:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
configKey is already present in tracker.getTrackData().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@joker23 joker23 left a comment

only nits

Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
Address review nits: narrow EVALUATION_SCHEMA type with as const
instead of Record<string, unknown>, and add Array.isArray check
in _parseEvaluationResponse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
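
The two review nits addressed in that last commit can be illustrated with a small TypeScript sketch. The schema literal and the helper name `isPlainRecord` are hypothetical and only stand in for whatever Judge.ts actually contains.

```typescript
// Annotating as Record<string, unknown> widens every property to unknown.
const wideSchema: Record<string, unknown> = {
  type: 'object',
  required: ['score', 'reasoning'],
};

// `as const` keeps literal types and makes `required` a readonly tuple.
const narrowSchema = {
  type: 'object',
  required: ['score', 'reasoning'],
} as const;

// Derived from the tuple: 'score' | 'reasoning'.
type RequiredField = (typeof narrowSchema.required)[number];
const field: RequiredField = 'score';

// Arrays are objects in JavaScript, so a typeof check alone lets them through.
// Adding Array.isArray rejects a response that arrives as an array.
function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

const arrayPassesTypeof = typeof [] === 'object';
const arrayIsRecord = isPlainRecord([{ score: 1, reasoning: 'x' }]);
const objectIsRecord = isPlainRecord({ score: 1, reasoning: 'x' });
```

Here `arrayPassesTypeof` is `true` while `arrayIsRecord` is `false`, which is the gap the extra `Array.isArray` check closes.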
@jsonbailey jsonbailey merged commit 524c99e into feat/ai-sdk-next-release Apr 17, 2026
44 checks passed
@jsonbailey jsonbailey deleted the jb/aic-2253/simplify-eval-schema branch April 17, 2026 16:43
@github-actions github-actions Bot mentioned this pull request Apr 20, 2026
jsonbailey added a commit that referenced this pull request Apr 21, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>browser: 0.1.16</summary>

##
[0.1.16](browser-v0.1.15...browser-v0.1.16)
(2026-04-21)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.0 to 4.6.1
</details>

<details><summary>browser-telemetry: 1.0.32</summary>

##
[1.0.32](browser-telemetry-v1.0.31...browser-telemetry-v1.0.32)
(2026-04-21)


### Bug Fixes

* correct typeof comparisons in browser SDK
([#1301](#1301))
([f4bd636](f4bd636))
* **js-client-sdk:** better `undefined` handling
([#1303](#1303))
([4818678](4818678))


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.0 to 4.6.1
</details>

<details><summary>js-client-sdk: 4.6.1</summary>

##
[4.6.1](js-client-sdk-v4.6.0...js-client-sdk-v4.6.1)
(2026-04-21)


### Bug Fixes

* correct typeof comparisons in browser SDK
([#1301](#1301))
([f4bd636](f4bd636))
* **js-client-sdk:** better `undefined` handling
([#1303](#1303))
([4818678](4818678))
</details>

<details><summary>react-sdk: 0.2.2</summary>

##
[0.2.2](react-sdk-v0.2.1...react-sdk-v0.2.2)
(2026-04-21)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from ^4.6.0 to ^4.6.1
</details>

<details><summary>server-sdk-ai: 0.17.0</summary>

##
[0.17.0](server-sdk-ai-v0.16.8...server-sdk-ai-v0.17.0)
(2026-04-21)


### ⚠ BREAKING CHANGES

* Flatten JudgeResponse and EvalScore into new LDJudgeResult
([#1284](#1284))
* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#1270](#1270))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#1270](#1270))
([fc25ab7](fc25ab7))
* Flatten JudgeResponse and EvalScore into new LDJudgeResult
([#1284](#1284))
([aba1221](aba1221))
* Implement agent graph definitions
([#1282](#1282))
([e7d08e5](e7d08e5))
* simplify evaluation schema to flat score/reasoning shape
([#1286](#1286))
([c132e9f](c132e9f))


### Bug Fixes

* Add support for graph metric tracking
([#1269](#1269))
([034a89d](034a89d))
</details>

<details><summary>server-sdk-ai-langchain: 0.5.5</summary>

##
[0.5.5](server-sdk-ai-langchain-v0.5.4...server-sdk-ai-langchain-v0.5.5)
(2026-04-21)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

<details><summary>server-sdk-ai-openai: 0.5.5</summary>

##
[0.5.5](server-sdk-ai-openai-v0.5.4...server-sdk-ai-openai-v0.5.5)
(2026-04-21)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

<details><summary>server-sdk-ai-vercel: 0.5.5</summary>

##
[0.5.5](server-sdk-ai-vercel-v0.5.4...server-sdk-ai-vercel-v0.5.5)
(2026-04-21)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.16.8 to ^0.17.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 || ^0.16.0 to ^0.17.0
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Primarily a version/changelog bump, but it publishes
`@launchdarkly/server-sdk-ai` `0.17.0` with documented breaking API
changes that can impact downstream consumers and provider peer
dependency resolution.
> 
> **Overview**
> Bumps release versions across the monorepo via
`.release-please-manifest.json`, updating `@launchdarkly/server-sdk-ai`
to `0.17.0`, `@launchdarkly/js-client-sdk` to `4.6.1`, and related
packages (`@launchdarkly/browser`, `@launchdarkly/react-sdk`,
`@launchdarkly/browser-telemetry`, and AI provider packages)
accordingly.
> 
> Updates package metadata, changelogs, examples, and embedded
SDK/wrapper version strings (e.g., `BrowserInfo` and `LDReactClient`) to
reflect the new releases, including `server-sdk-ai`’s `0.17.0`
breaking-change notes and provider peer dependency bumps to `^0.17.0`.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
e7f8c09. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>