Commit 29ae87f

nick-skriabin authored and robot-ci-heartex committed
fix: BROS-1087: Evaluation results retained between different subsets
GitOrigin-RevId: fd6dfe65c361124a55350ff66f48e18f1a93e827
1 parent 2da6414 commit 29ae87f

6 files changed

Lines changed: 48 additions & 20 deletions

poetry.lock

Lines changed: 10 additions & 10 deletions

reference.md

Lines changed: 10 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -6240,6 +6240,14 @@ client.prompts.subset_tasks(
62406240
<dl>
62416241
<dd>
62426242

6243+
**model_version:** `typing.Optional[int]` — Restrict prefetched predictions to this specific prompt version. Used with parent_model when no model_run is selected so a newly created version does not inherit predictions from prior versions.
6244+
6245+
</dd>
6246+
</dl>
6247+
6248+
<dl>
6249+
<dd>
6250+
62436251
**ordering:** `typing.Optional[str]` — Which field to use when ordering the results.
62446252

62456253
</dd>
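
To make the intent of `model_version` concrete, here is a minimal sketch of the filtering it describes. It is not the server implementation: the `Prediction` record, its fields, and the precedence rule (an explicit `model_run` wins, otherwise `model_version` narrows the set) are assumptions inferred from the parameter description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    # Hypothetical record; the field names are illustrative only.
    task_id: int
    model_run: Optional[int]  # ID of the run that produced the prediction, if any
    model_version: int        # ID of the prompt version that produced it


def select_predictions(
    predictions: List[Prediction],
    model_run: Optional[int] = None,
    model_version: Optional[int] = None,
) -> List[Prediction]:
    """Pick the predictions to prefetch.

    Assumed precedence: a selected model_run takes priority; with no run
    selected, model_version restricts results to a single prompt version.
    """
    if model_run is not None:
        return [p for p in predictions if p.model_run == model_run]
    if model_version is not None:
        return [p for p in predictions if p.model_version == model_version]
    # Neither filter given: everything under the parent model is returned,
    # which is how a new version could appear to inherit older predictions.
    return list(predictions)
```

Under these assumptions, asking for a freshly created version that has no predictions yet returns an empty list rather than the predictions of earlier versions.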
@@ -32487,7 +32495,7 @@ client.projects.roles.get(
3248732495
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
3248832496
</p>
3248932497
</Card>
32490-
Returns label confusion matrix with precision, recall, and top confusion pairs.
32498+
Returns label confusion matrix with precision, recall, and top confusion pairs. In `ground_truth` mode the matrix is directional: rows are GT labels (actual), columns are annotator labels (predicted). In `all` and `accepted` modes — where no canonical "actual vs predicted" axis exists — the matrix is symmetric. When a task has multiple GT annotations the most recently updated one is used. `top_confusion_pairs.rate` is the share of off-diagonal mass.
3249132499
</dd>
3249232500
</dl>
3249332501
</dd>
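
The difference between the directional and symmetric matrices, and the meaning of `top_confusion_pairs.rate`, can be sketched as follows. This is an illustrative model only: the function names, the input shape (a list of label pairs), and the convention of counting each disagreeing pair in both directions for the symmetric case are assumptions, not the endpoint's actual code.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def directional_matrix(pairs: List[Pair]) -> Dict[Pair, int]:
    """Count (gt_label, annotator_label) pairs: rows are actual, columns predicted."""
    return dict(Counter(pairs))


def symmetric_matrix(pairs: List[Pair]) -> Dict[Pair, int]:
    """Count each disagreeing pair in both directions so m[a, b] == m[b, a]."""
    counts: Counter = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1
        if a != b:
            counts[(b, a)] += 1
    return dict(counts)


def top_confusion_pairs(matrix: Dict[Pair, int], limit: int = 3) -> List[Tuple[str, str, float]]:
    """Rank off-diagonal cells; rate is the cell's share of all off-diagonal mass."""
    off_diagonal = {k: v for k, v in matrix.items() if k[0] != k[1]}
    total = sum(off_diagonal.values())
    if total == 0:
        return []
    ranked = sorted(off_diagonal.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, b, count / total) for (a, b), count in ranked[:limit]]
```

Because the rate is normalized by off-diagonal mass only, the rates of all confusion pairs sum to 1 regardless of how many labels agree.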
@@ -32661,7 +32669,7 @@ client.projects.stats.data_quality_agreement_dimensions(
3266132669
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
3266232670
</p>
3266332671
</Card>
32664-
Returns average agreement, histogram buckets, low-agreement count, and total tasks.
32672+
Returns average agreement, a 10-bucket histogram of `Task.precomputed_agreement` (filled on-the-fly from V2 dimension matrices when null), `low_agreement_count`, and `total_tasks`. The low-agreement threshold is `LseProject.agreement_threshold` (the same project setting Data Manager filters and review-routing rules consume); changing that setting moves the count for this endpoint as well.
3266532673
</dd>
3266632674
</dl>
3266732675
</dd>
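
As a rough illustration of the response described above, the sketch below builds a 10-bucket histogram and a low-agreement count from a list of per-task agreement scores. Several details are assumptions: scores and the threshold are taken to be on a 0 to 1 scale, the comparison is strict (`score < threshold`), null scores are simply skipped instead of being filled from dimension matrices, and `total_tasks` counts every task including those without a score.

```python
from typing import Dict, List, Optional


def agreement_distribution(
    scores: List[Optional[float]], threshold: float
) -> Dict[str, object]:
    """Summarize per-task agreement scores (assumed to lie in [0, 1])."""
    known = [s for s in scores if s is not None]

    # Ten equal-width buckets: [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0].
    buckets = [0] * 10
    for s in known:
        index = min(int(s * 10), 9)  # a score of exactly 1.0 goes in the last bucket
        buckets[index] += 1

    average = sum(known) / len(known) if known else None
    return {
        "average_agreement": average,
        "histogram": buckets,
        "low_agreement_count": sum(1 for s in known if s < threshold),
        "total_tasks": len(scores),
    }
```

Calling the function twice with the same scores and different thresholds changes only `low_agreement_count`, which mirrors the note that adjusting the project setting moves the count for this endpoint.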

src/label_studio_sdk/projects/stats/client.py

Lines changed: 4 additions & 4 deletions
@@ -65,7 +65,7 @@ def data_quality_agreement_confusion_matrix(
6565
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
6666
</p>
6767
</Card>
68-
Returns label confusion matrix with precision, recall, and top confusion pairs.
68+
Returns label confusion matrix with precision, recall, and top confusion pairs. In `ground_truth` mode the matrix is directional: rows are GT labels (actual), columns are annotator labels (predicted). In `all` and `accepted` modes — where no canonical "actual vs predicted" axis exists — the matrix is symmetric. When a task has multiple GT annotations the most recently updated one is used. `top_confusion_pairs.rate` is the share of off-diagonal mass.
6969
7070
Parameters
7171
----------
@@ -149,7 +149,7 @@ def data_quality_agreement_distribution(
149149
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
150150
</p>
151151
</Card>
152-
Returns average agreement, histogram buckets, low-agreement count, and total tasks.
152+
Returns average agreement, a 10-bucket histogram of `Task.precomputed_agreement` (filled on-the-fly from V2 dimension matrices when null), `low_agreement_count`, and `total_tasks`. The low-agreement threshold is `LseProject.agreement_threshold` (the same project setting Data Manager filters and review-routing rules consume); changing that setting moves the count for this endpoint as well.
153153
154154
Parameters
155155
----------
@@ -1218,7 +1218,7 @@ async def data_quality_agreement_confusion_matrix(
12181218
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
12191219
</p>
12201220
</Card>
1221-
Returns label confusion matrix with precision, recall, and top confusion pairs.
1221+
Returns label confusion matrix with precision, recall, and top confusion pairs. In `ground_truth` mode the matrix is directional: rows are GT labels (actual), columns are annotator labels (predicted). In `all` and `accepted` modes — where no canonical "actual vs predicted" axis exists — the matrix is symmetric. When a task has multiple GT annotations the most recently updated one is used. `top_confusion_pairs.rate` is the share of off-diagonal mass.
12221222
12231223
Parameters
12241224
----------
@@ -1318,7 +1318,7 @@ async def data_quality_agreement_distribution(
13181318
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
13191319
</p>
13201320
</Card>
1321-
Returns average agreement, histogram buckets, low-agreement count, and total tasks.
1321+
Returns average agreement, a 10-bucket histogram of `Task.precomputed_agreement` (filled on-the-fly from V2 dimension matrices when null), `low_agreement_count`, and `total_tasks`. The low-agreement threshold is `LseProject.agreement_threshold` (the same project setting Data Manager filters and review-routing rules consume); changing that setting moves the count for this endpoint as well.
13221322
13231323
Parameters
13241324
----------

src/label_studio_sdk/projects/stats/raw_client.py

Lines changed: 4 additions & 4 deletions
@@ -61,7 +61,7 @@ def data_quality_agreement_confusion_matrix(
6161
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
6262
</p>
6363
</Card>
64-
Returns label confusion matrix with precision, recall, and top confusion pairs.
64+
Returns label confusion matrix with precision, recall, and top confusion pairs. In `ground_truth` mode the matrix is directional: rows are GT labels (actual), columns are annotator labels (predicted). In `all` and `accepted` modes — where no canonical "actual vs predicted" axis exists — the matrix is symmetric. When a task has multiple GT annotations the most recently updated one is used. `top_confusion_pairs.rate` is the share of off-diagonal mass.
6565
6666
Parameters
6767
----------
@@ -167,7 +167,7 @@ def data_quality_agreement_distribution(
167167
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
168168
</p>
169169
</Card>
170-
Returns average agreement, histogram buckets, low-agreement count, and total tasks.
170+
Returns average agreement, a 10-bucket histogram of `Task.precomputed_agreement` (filled on-the-fly from V2 dimension matrices when null), `low_agreement_count`, and `total_tasks`. The low-agreement threshold is `LseProject.agreement_threshold` (the same project setting Data Manager filters and review-routing rules consume); changing that setting moves the count for this endpoint as well.
171171
172172
Parameters
173173
----------
@@ -1495,7 +1495,7 @@ async def data_quality_agreement_confusion_matrix(
14951495
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
14961496
</p>
14971497
</Card>
1498-
Returns label confusion matrix with precision, recall, and top confusion pairs.
1498+
Returns label confusion matrix with precision, recall, and top confusion pairs. In `ground_truth` mode the matrix is directional: rows are GT labels (actual), columns are annotator labels (predicted). In `all` and `accepted` modes — where no canonical "actual vs predicted" axis exists — the matrix is symmetric. When a task has multiple GT annotations the most recently updated one is used. `top_confusion_pairs.rate` is the share of off-diagonal mass.
14991499
15001500
Parameters
15011501
----------
@@ -1601,7 +1601,7 @@ async def data_quality_agreement_distribution(
16011601
This endpoint is not available in Label Studio Community Edition. [Learn more about Label Studio Enterprise](https://humansignal.com/goenterprise)
16021602
</p>
16031603
</Card>
1604-
Returns average agreement, histogram buckets, low-agreement count, and total tasks.
1604+
Returns average agreement, a 10-bucket histogram of `Task.precomputed_agreement` (filled on-the-fly from V2 dimension matrices when null), `low_agreement_count`, and `total_tasks`. The low-agreement threshold is `LseProject.agreement_threshold` (the same project setting Data Manager filters and review-routing rules consume); changing that setting moves the count for this endpoint as well.
16051605
16061606
Parameters
16071607
----------

src/label_studio_sdk/prompts/client.py

Lines changed: 10 additions & 0 deletions
@@ -172,6 +172,7 @@ def subset_tasks(
172172
alignment_outcome: typing.Optional[SubsetTasksPromptsRequestAlignmentOutcome] = None,
173173
include_total: typing.Optional[bool] = None,
174174
model_run: typing.Optional[int] = None,
175+
model_version: typing.Optional[int] = None,
175176
ordering: typing.Optional[str] = None,
176177
output_class: typing.Optional[str] = None,
177178
output_from_name: typing.Optional[str] = None,
@@ -212,6 +213,9 @@ def subset_tasks(
212213
model_run : typing.Optional[int]
213214
A unique ID of a ModelRun
214215
216+
model_version : typing.Optional[int]
217+
Restrict prefetched predictions to this specific prompt version. Used with parent_model when no model_run is selected so a newly created version does not inherit predictions from prior versions.
218+
215219
ordering : typing.Optional[str]
216220
Which field to use when ordering the results.
217221
@@ -262,6 +266,7 @@ def subset_tasks(
262266
alignment_outcome=alignment_outcome,
263267
include_total=include_total,
264268
model_run=model_run,
269+
model_version=model_version,
265270
ordering=ordering,
266271
output_class=output_class,
267272
output_from_name=output_from_name,
@@ -831,6 +836,7 @@ async def subset_tasks(
831836
alignment_outcome: typing.Optional[SubsetTasksPromptsRequestAlignmentOutcome] = None,
832837
include_total: typing.Optional[bool] = None,
833838
model_run: typing.Optional[int] = None,
839+
model_version: typing.Optional[int] = None,
834840
ordering: typing.Optional[str] = None,
835841
output_class: typing.Optional[str] = None,
836842
output_from_name: typing.Optional[str] = None,
@@ -871,6 +877,9 @@ async def subset_tasks(
871877
model_run : typing.Optional[int]
872878
A unique ID of a ModelRun
873879
880+
model_version : typing.Optional[int]
881+
Restrict prefetched predictions to this specific prompt version. Used with parent_model when no model_run is selected so a newly created version does not inherit predictions from prior versions.
882+
874883
ordering : typing.Optional[str]
875884
Which field to use when ordering the results.
876885
@@ -929,6 +938,7 @@ async def main() -> None:
929938
alignment_outcome=alignment_outcome,
930939
include_total=include_total,
931940
model_run=model_run,
941+
model_version=model_version,
932942
ordering=ordering,
933943
output_class=output_class,
934944
output_from_name=output_from_name,

src/label_studio_sdk/prompts/raw_client.py

Lines changed: 10 additions & 0 deletions
@@ -189,6 +189,7 @@ def subset_tasks(
189189
alignment_outcome: typing.Optional[SubsetTasksPromptsRequestAlignmentOutcome] = None,
190190
include_total: typing.Optional[bool] = None,
191191
model_run: typing.Optional[int] = None,
192+
model_version: typing.Optional[int] = None,
192193
ordering: typing.Optional[str] = None,
193194
output_class: typing.Optional[str] = None,
194195
output_from_name: typing.Optional[str] = None,
@@ -229,6 +230,9 @@ def subset_tasks(
229230
model_run : typing.Optional[int]
230231
A unique ID of a ModelRun
231232
233+
model_version : typing.Optional[int]
234+
Restrict prefetched predictions to this specific prompt version. Used with parent_model when no model_run is selected so a newly created version does not inherit predictions from prior versions.
235+
232236
ordering : typing.Optional[str]
233237
Which field to use when ordering the results.
234238
@@ -270,6 +274,7 @@ def subset_tasks(
270274
"alignment_outcome": alignment_outcome,
271275
"include_total": include_total,
272276
"model_run": model_run,
277+
"model_version": model_version,
273278
"ordering": ordering,
274279
"output_class": output_class,
275280
"output_from_name": output_from_name,
@@ -930,6 +935,7 @@ async def subset_tasks(
930935
alignment_outcome: typing.Optional[SubsetTasksPromptsRequestAlignmentOutcome] = None,
931936
include_total: typing.Optional[bool] = None,
932937
model_run: typing.Optional[int] = None,
938+
model_version: typing.Optional[int] = None,
933939
ordering: typing.Optional[str] = None,
934940
output_class: typing.Optional[str] = None,
935941
output_from_name: typing.Optional[str] = None,
@@ -970,6 +976,9 @@ async def subset_tasks(
970976
model_run : typing.Optional[int]
971977
A unique ID of a ModelRun
972978
979+
model_version : typing.Optional[int]
980+
Restrict prefetched predictions to this specific prompt version. Used with parent_model when no model_run is selected so a newly created version does not inherit predictions from prior versions.
981+
973982
ordering : typing.Optional[str]
974983
Which field to use when ordering the results.
975984
@@ -1011,6 +1020,7 @@ async def subset_tasks(
10111020
"alignment_outcome": alignment_outcome,
10121021
"include_total": include_total,
10131022
"model_run": model_run,
1023+
"model_version": model_version,
10141024
"ordering": ordering,
10151025
"output_class": output_class,
10161026
"output_from_name": output_from_name,
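
The raw client passes `model_version` alongside the other optional query parameters. The sketch below shows one way such a parameter dictionary could be turned into a query string, assuming that parameters left as `None` are omitted from the request; `build_query` is a hypothetical helper written for illustration, not part of the SDK.

```python
from typing import Any
from urllib.parse import urlencode


def build_query(**params: Any) -> str:
    """Encode only the parameters that were actually supplied (not None)."""
    present = {name: value for name, value in params.items() if value is not None}
    return urlencode(present)
```

With this behavior, existing callers that never pass `model_version` would send the same query string as before the change.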
