Commit 98b8a9d

Merge pull request #4545 from Agenta-AI/release/v0.101.1
[release] v0.101.1
2 parents a6a2a3f + 2263ba3 commit 98b8a9d

106 files changed

Lines changed: 2423 additions & 691 deletions

api/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "api"
-version = "0.101.0"
+version = "0.101.1"
 description = "Agenta API"
 requires-python = ">=3.11,<3.14"
 authors = [

api/uv.lock

Lines changed: 3 additions & 3 deletions
Generated file; diff not rendered.

clients/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "agenta-client"
-version = "0.101.0"
+version = "0.101.1"
 description = "Fern-generated Python client for the Agenta API."
 requires-python = ">=3.11,<3.14"
 authors = [

clients/python/uv.lock

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

docs/design/annotation-queue-v2/rfc-v2.md

Lines changed: 54 additions & 22 deletions
@@ -129,7 +129,7 @@ The consumer layer is a **convenience API** and a **set of UI views** that orche
    - An EvaluationQueue with optional user assignments
 4. Annotator works through rows → fills in labels → submits
 5. On submit: same annotation creation + result linking as today
-6. **Write-back step** (separate action): User clicks "Save annotations to test set" → creates a new test set revision with annotation values as new columns
+6. **Write-back step** (separate action): User clicks "Add to Testset" → each annotated row is matched to its existing test-set row (by testcase id, falling back to `testcase_dedup_id`), updated **in place** with the annotation columns, and committed as a new revision. See [Write Back / Save as Test Set](#write-back--save-as-test-set) for the identity model and matching rules.
 
 **Key design choice: annotating ≠ modifying the test set.** The annotation step creates annotation traces (OTel spans). These reference the test cases but don't modify them. Writing back to the test set is a separate, explicit action that creates a new revision. This preserves test case immutability and versioning.

@@ -264,26 +264,58 @@ Uses existing endpoints — no change needed:
 
 ### Write Back / Save as Test Set
 
-```
-POST /annotation-queues/{queue_id}/export
-{
-  // For testset-sourced queues: create new revision with annotation columns
-  "target": "testset_revision",
-  "column_mapping": {
-    "correctness": "is_correct",
-    "quality": "quality_score"
-  }
-
-  // For trace-sourced queues: create new test set from annotated traces
-  // "target": "new_testset",
-  // "name": "Curated Q1 traces",
-  // "include_annotations_as_columns": true
-}
-```
-
-The endpoint name is `export` rather than `write-back` to better reflect that it works for both directions: writing annotations back to an existing test set (new revision) or creating an entirely new test set from annotated traces.
-
-**Who triggers this:** The queue creator/admin, not individual annotators. It's a one-time action available on the queue detail page.
+The user clicks **"Add to Testset"** on the queue and either appends to an
+existing test set (new revision) or creates a new one. **As implemented this is a
+client-side operation**, not a backend export: the FE resolves the target's
+latest revision, computes a row delta, and commits a new revision via
+`POST /testsets/revisions/commit`. (The originally-proposed
+`POST /annotation-queues/{queue_id}/export` endpoint was not built — the FE owns
+the delta.)
+
+**Identity model.** FE-created annotation queues are **testcase-id-backed**, not
+testset-revision-backed: the queue references each row by its testcase id and the
+testcase blob's stable `testcase_dedup_id`. Test cases are immutable, so any
+update mints a new testcase id — `testcase_dedup_id` is the only key that survives
+across revisions, and only if it is preserved on every write.
+
+**Behavior (existing test set):**
+
+1. Base the commit on the test set's **latest _non-archived_ revision**.
+2. Match each annotated row to an existing row by **testcase id, falling back to
+   `testcase_dedup_id`** — the id match works on the first save; the dedup
+   fallback carries the match after a prior save reassigned the id.
+3. **Replace** on match, **add** on miss, and **preserve `testcase_dedup_id`** on
+   every replaced row so the lineage stays matchable for the next save.
+4. **Skip unchanged rows** (deep-equal vs the base row), and skip the commit
+   entirely when the resulting delta is empty — re-saving with nothing changed is
+   a no-op (no churn revision).
+
+The annotated row is updated **in place** (new testcase id, same dedup); the row
+count stays stable instead of growing.
+
+**For trace-sourced queues** there is no source test set, so the action always
+creates a new test set from the annotated rows.
+
+**Who triggers this:** the queue creator/admin, not individual annotators. It's a
+one-time action available on the queue detail page.
+
+#### Status & known constraints (AGE-3761)
+
+- **Fixed.** The first implementation committed with blind `add`, appending every
+  annotated row → duplicates. Two further traps were fixed: base rows were read
+  through `normalizeRevision`, which strips `testcase_dedup_id` (so the dedup
+  fallback silently never fired and the second save duplicated) — base rows are
+  now read **raw**; and "latest" was resolved via `retrieve {testset_ref}`, which
+  returns **archived** revisions — it's now resolved via the archived-excluding
+  `query` path.
+- **Not FE-fixable.** A test set whose rows already carry **duplicate or missing
+  `testcase_dedup_id`s** from earlier corruption cannot be cleaned by FE matching
+  (the dedup→row mapping is ambiguous). The durable fix is backend-owned: a stable,
+  unique testcase identity preserved on every write, or an upsert-by-stable-key
+  primitive so the FE never computes the delta.
+- **Deferred — multi-testset queues.** When a queue's rows originate from more than
+  one test set, routing each annotated row back to its source test set is future
+  work; the current single-target modal is kept.
 
 ---

@@ -442,7 +474,7 @@ The metadata-on-traces approach (tagging spans with review status) was considere
 
 3. **Queue visibility in eval runs:** When an eval run has human evaluators, is the auto-created queue visible in the eval run detail view? Or is it fully hidden?
 
-4. **Write-back granularity:** When writing annotations back to a test set, does the user choose which annotation fields become columns? Or do all fields from all evaluators get written back?
+4. **Write-back granularity:** ~~When writing annotations back to a test set, does the user choose which annotation fields become columns?~~ **Resolved (as shipped):** all annotation outputs from the queue-scoped evaluators are written back as columns (keyed per evaluator, e.g. `quality-rating`); there is no per-field picker. The export is scoped to the active queue's annotations so other queues' annotations on the same testcase don't bleed in.
 
 5. **Queue lifecycle:** Do annotation queues have a lifecycle (draft → active → completed)? Or are they always active and implicitly complete when all items are annotated?
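The matching rules that the `rfc-v2.md` change above adds under "Behavior (existing test set)" can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the shipped frontend code: the `Row` and `Delta` shapes and the names `computeDelta` and `deepEqual` are hypothetical, and comparing only the row's `data` when checking for unchanged rows is an assumption. Only the matching order (testcase id, then `testcase_dedup_id`), the replace/add split, the preserved dedup id, and the empty-delta no-op come from the RFC text.

```typescript
// Hypothetical shapes: only `testcase_dedup_id` and the idea of a testcase id
// come from the RFC; the rest is illustrative.
interface Row {
    id: string
    testcase_dedup_id?: string
    data: Record<string, unknown>
}

interface Delta {
    replace: {baseId: string; row: Row}[] // rows matched to an existing base row
    add: Row[] // rows with no match in the base revision
}

// Structural equality for JSON-like values (no cycles, no class instances).
function deepEqual(a: unknown, b: unknown): boolean {
    if (a === b) return true
    if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false
    if (Array.isArray(a) !== Array.isArray(b)) return false
    const ra = a as Record<string, unknown>
    const rb = b as Record<string, unknown>
    const keys = Object.keys(ra)
    if (keys.length !== Object.keys(rb).length) return false
    return keys.every((k) => Object.prototype.hasOwnProperty.call(rb, k) && deepEqual(ra[k], rb[k]))
}

// Returns null when nothing changed, so the caller can skip the commit.
function computeDelta(baseRows: Row[], annotated: Row[]): Delta | null {
    const byId = new Map<string, Row>()
    const byDedup = new Map<string, Row>()
    for (const base of baseRows) {
        byId.set(base.id, base)
        if (base.testcase_dedup_id) byDedup.set(base.testcase_dedup_id, base)
    }

    const delta: Delta = {replace: [], add: []}
    for (const row of annotated) {
        // Match by testcase id first, then fall back to the dedup id.
        const base =
            byId.get(row.id) ??
            (row.testcase_dedup_id ? byDedup.get(row.testcase_dedup_id) : undefined)
        if (!base) {
            delta.add.push(row)
            continue
        }
        if (deepEqual(base.data, row.data)) continue // unchanged row: skip
        // Keep the base row's dedup id so the next save can still match it.
        const dedup = base.testcase_dedup_id ?? row.testcase_dedup_id
        delta.replace.push({baseId: base.id, row: dedup ? {...row, testcase_dedup_id: dedup} : row})
    }
    return delta.replace.length + delta.add.length > 0 ? delta : null
}
```

Building both lookup maps from the raw base rows matters here: per the "Fixed" note in the RFC, reading base rows through a path that strips `testcase_dedup_id` would leave `byDedup` empty and turn every second save into an `add`.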

docs/designs/observability-cell-preview/rfc.md

Lines changed: 13 additions & 13 deletions
@@ -16,7 +16,7 @@ workflow state. The user has to open the drawer to see anything useful.
 This RFC adds a second heuristic layer between chat detection and the raw JSON
 fallback. For span values whose shape we recognize, an extractor pulls a small
 subset of fields and the cell renders only that subset using the existing
-beautified key/value view. Everything else still falls through to raw JSON.
+pretty key/value view. Everything else still falls through to raw JSON.
 
 The two existing detector calls (one for chat, one implicit for JSON) become
 internal rules of a single dispatcher. The cell stops chaining nullable checks
@@ -44,9 +44,9 @@ both the data to render and the renderer to use.
 
 ```ts
 type Preview =
-  | { renderer: "chat"; data: unknown[]; source: string }
-  | { renderer: "beautified"; data: Record<string, unknown>; source: string }
-  | { renderer: "json"; data: unknown; source: string }
+  | { renderer: "chat"; data: unknown[]; source: string }
+  | { renderer: "pretty"; data: Record<string, unknown>; source: string }
+  | { renderer: "json"; data: unknown; source: string }
 
 export function extractPreview(
   value: unknown,
@@ -60,9 +60,9 @@ The cell becomes a single switch.
 function SmartCellContent({value}: {value: unknown}) {
   const preview = extractPreview(value)
   switch (preview.renderer) {
-    case "chat": return <ChatCell value={preview.data} />
-    case "beautified": return <BeautifiedJsonCell value={preview.data} />
-    case "json": return <JsonCell value={preview.data} />
+    case "chat": return <ChatCell value={preview.data} />
+    case "pretty": return <PrettyJsonCell value={preview.data} />
+    case "json": return <JsonCell value={preview.data} />
   }
 }
 ```
@@ -84,7 +84,7 @@ type Rule =
       extract: (v: unknown, ctx: { side?: "input" | "output" }) => unknown[] | null
     }
   | {
-      kind: "beautified"
+      kind: "pretty"
       name: string
       extract: (v: unknown, ctx: { side?: "input" | "output" }) => Record<string, unknown> | null
     }
@@ -158,10 +158,10 @@ behavior automatically because they go through `SmartCellContent`.
 `extractChatMessages` stays as an internal helper that the chat rule wraps.
 External callers that import it directly keep working unchanged.
 
-`JsonCellContent.beautified` already exists. The dispatcher uses it as the
-"beautified" renderer. The `SmartCellContent.beautifyJson` prop remains as a
+`JsonCellContent.pretty` already exists. The dispatcher uses it as the
+"pretty" renderer. The `SmartCellContent.prettyJson` prop remains as a
 caller opt-in for the JSON fallback path (used by `ScenarioListView` to force
-beautified rendering on the raw-JSON branch).
+pretty rendering on the raw-JSON branch).
 
 `LastInputMessageCell` uses the dispatcher and renders only the last message
 when the chat rule matches. For non-chat values it delegates to
@@ -173,8 +173,8 @@ when the chat rule matches. For non-chat values it delegates to
   hookable comes later.
 - Rules that need span type or other span context. Shape-only for the first
   pass.
-- Aligning the cell's beautified styling with the drawer's
-  `BeautifiedJsonView`. The cell version is the lightweight one we already
+- Aligning the cell's pretty styling with the drawer's
+  `PrettyJsonView`. The cell version is the lightweight one we already
   have. Visual parity with the drawer is a separate piece of work.
 - Sharing rules with the drawer. The drawer has its own structure today. If
   the rule set proves valuable, we lift it out and reuse it.
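
The dispatcher that this RFC describes (chat rules first, then shape-specific "pretty" rules, then the raw JSON fallback) could look roughly like the sketch below. It reuses the `Preview` and `Rule` types shown in the diff. The two example rules, their names, the `isRecord` helper, the `"fallback"` source label, and the second `ctx` parameter of `extractPreview` are assumptions for illustration; the diff truncates the real signature.

```typescript
type Ctx = {side?: "input" | "output"}

type Preview =
    | {renderer: "chat"; data: unknown[]; source: string}
    | {renderer: "pretty"; data: Record<string, unknown>; source: string}
    | {renderer: "json"; data: unknown; source: string}

type Rule =
    | {kind: "chat"; name: string; extract: (v: unknown, ctx: Ctx) => unknown[] | null}
    | {kind: "pretty"; name: string; extract: (v: unknown, ctx: Ctx) => Record<string, unknown> | null}

const isRecord = (v: unknown): v is Record<string, unknown> =>
    typeof v === "object" && v !== null && !Array.isArray(v)

// Hypothetical rules, ordered from most to least specific.
const rules: Rule[] = [
    {
        kind: "chat",
        name: "messages-array",
        // Matches a non-empty array whose items all have `role` and `content`.
        extract: (v) =>
            Array.isArray(v) && v.length > 0 && v.every((m) => isRecord(m) && "role" in m && "content" in m)
                ? v
                : null,
    },
    {
        kind: "pretty",
        name: "error-shape",
        // Pulls only the `message` field out of an error-like object.
        extract: (v) => (isRecord(v) && typeof v.message === "string" ? {message: v.message} : null),
    },
]

// First matching rule wins; anything unrecognized falls through to raw JSON.
function extractPreview(value: unknown, ctx: Ctx = {}): Preview {
    for (const rule of rules) {
        if (rule.kind === "chat") {
            const data = rule.extract(value, ctx)
            if (data) return {renderer: "chat", data, source: rule.name}
        } else {
            const data = rule.extract(value, ctx)
            if (data) return {renderer: "pretty", data, source: rule.name}
        }
    }
    return {renderer: "json", data: value, source: "fallback"}
}
```

Because `extractPreview` always returns one of the three variants, the `switch` in `SmartCellContent` needs no null checks, which is the point the RFC makes about the cell no longer chaining nullable detector calls.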

hosting/kubernetes/helm/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@ apiVersion: v2
 name: agenta
 description: A Helm chart for deploying Agenta (OSS or EE) on Kubernetes
 type: application
-version: 0.101.0
-appVersion: "v0.101.0"
+version: 0.101.1
+appVersion: "v0.101.1"
 keywords:
 - agenta
 - llm
- llm

sdks/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "agenta"
-version = "0.101.0"
+version = "0.101.1"
 description = "The SDK for agenta is an open-source LLMOps platform."
 readme = "README.md"
 requires-python = ">=3.11,<3.14"

sdks/python/uv.lock

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered.

services/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "services"
-version = "0.101.0"
+version = "0.101.1"
 description = "Agenta Services (Chat & Completion)"
 requires-python = ">=3.11,<3.14"
 authors = [
