Commit d5f2b62

christso and claude authored
fix: auto-weight grouped rubrics shorthand by criteria count (#1099)
* fix: auto-weight grouped rubrics shorthand by criteria count

  When string shorthand assertions are mixed with other explicit graders, the rubrics grader created from the strings now gets weight = number of criteria, so each user-visible assertion contributes equal weight to the overall score.

  Before: [contains, "A", "B", "C"] → contains(w=1) + rubrics(w=1) → 50/50
  After: [contains, "A", "B", "C"] → contains(w=1) + rubrics(w=3) → 25/75

  The shorthand abstraction is now transparent: users who write N string criteria alongside M explicit graders get equal weight per visible line, without needing to know about internal grader grouping.

  Closes #1098

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: fix biome formatting
* test: remove redundant shorthand weight tests
* style: fix trailing blank line

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
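
The 50/50 and 25/75 splits quoted in the commit message can be checked with a small sketch. It assumes the overall score is a weighted average over graders and that a grader without an explicit weight counts as 1; neither is shown in this commit, and `WeightedGrader` and `weightShares` are hypothetical names used only for illustration.

```typescript
// Hypothetical grader shape for illustration; only `weight` matters here.
interface WeightedGrader {
  type: string;
  weight?: number;
}

// Assumption: a grader without an explicit weight counts as weight 1,
// and each grader's share of the overall score is weight / total weight.
function weightShares(graders: WeightedGrader[]): number[] {
  const weights = graders.map((g) => g.weight ?? 1);
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

// Before the fix: three string criteria collapse into one rubrics grader with weight 1.
const sharesBefore = weightShares([{ type: 'contains' }, { type: 'rubrics' }]);

// After the fix: the rubrics grader carries weight 3, one per criterion.
const sharesAfter = weightShares([{ type: 'contains' }, { type: 'rubrics', weight: 3 }]);

console.log(sharesBefore, sharesAfter);
```

Under these assumptions the rubrics grader's share rises from one half to three quarters, which is one quarter per visible assertion line.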
1 parent 209db97 commit d5f2b62

2 files changed

Lines changed: 35 additions & 1 deletion

packages/core/src/evaluation/loaders/evaluator-parser.ts

Lines changed: 9 additions & 1 deletion
@@ -288,7 +288,15 @@ async function parseEvaluatorList(
   }
   const placeholderIndex = result.indexOf(PLACEHOLDER);
   if (strings.length > 0 && placeholderIndex !== -1) {
-    result[placeholderIndex] = { type: 'rubrics', criteria: strings };
+    // Set weight = number of criteria so each user-visible string assertion contributes
+    // equal weight to the overall score alongside other explicit graders.
+    // e.g. [contains, "crit1", "crit2", "crit3"] → contains(w=1) + rubrics(w=3)
+    // → each of the 4 visible assertions counts equally.
+    result[placeholderIndex] = {
+      type: 'rubrics',
+      criteria: strings,
+      weight: strings.length,
+    };
   } else if (placeholderIndex !== -1) {
// All strings were empty — remove the placeholder
     result.splice(placeholderIndex, 1);
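
The hunk above shows only the tail of `parseEvaluatorList`. The following is a minimal, self-contained sketch of how such a grouping pass might look. The `Grader` shape, the `groupShorthand` name, the sentinel-object placeholder, and the way strings are collected are all assumptions made for illustration; only the final `if`/`else if` branch mirrors the diff.

```typescript
// Hypothetical, simplified grader shape (not the project's real config types).
interface Grader {
  type: string;
  weight?: number;
  criteria?: string[];
  value?: string;
}

// Assumption: a sentinel object marks where the grouped rubrics grader will go.
const PLACEHOLDER: Grader = { type: '__rubrics_placeholder__' };

function groupShorthand(assertions: Array<string | Grader>): Grader[] {
  const result: Grader[] = [];
  const strings: string[] = [];

  for (const item of assertions) {
    if (typeof item === 'string') {
      // Reserve a slot at the position of the first string assertion.
      if (result.indexOf(PLACEHOLDER) === -1) result.push(PLACEHOLDER);
      const trimmed = item.trim();
      if (trimmed.length > 0) strings.push(trimmed);
    } else {
      result.push(item); // explicit graders pass through unchanged
    }
  }

  const placeholderIndex = result.indexOf(PLACEHOLDER);
  if (strings.length > 0 && placeholderIndex !== -1) {
    // Mirrors the diff: weight equals the number of string criteria.
    result[placeholderIndex] = {
      type: 'rubrics',
      criteria: strings,
      weight: strings.length,
    };
  } else if (placeholderIndex !== -1) {
    // Every string was empty, so drop the reserved slot.
    result.splice(placeholderIndex, 1);
  }
  return result;
}

const grouped = groupShorthand(['A', 'B', 'C', { type: 'contains', value: 'null' }]);
console.log(grouped);
```

In this sketch the explicit `contains` grader keeps whatever weight it was given (here none), while the grouped grader's weight tracks the criteria count.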

packages/core/test/evaluation/loaders/evaluator-parser.test.ts

Lines changed: 26 additions & 0 deletions
@@ -1989,6 +1989,32 @@ describe('parseEvaluators - string shorthand in assertions', () => {

     expect(evaluators).toBeUndefined();
   });
+
+  it('sets rubrics grader weight = criteria count when mixed with other graders', async () => {
+    // User sees 4 assertions; each should contribute equal weight.
+    // rubrics(w=3) + contains(w=1) → each visible assertion = 1/4.
+    const evaluators = await parseEvaluators(
+      {
+        assertions: [
+          'Identifies the undefined access',
+          'Suggests a null-safe fix',
+          'Explains why the original code is dangerous',
+          { type: 'contains', value: 'null' },
+        ],
+      },
+      undefined,
+      ['/tmp'],
+      'test-id',
+    );
+
+    expect(evaluators).toHaveLength(2);
+    const rubrics = evaluators?.[0] as LlmGraderEvaluatorConfig;
+    expect(rubrics.type).toBe('llm-grader');
+    expect(rubrics.rubrics).toHaveLength(3);
+    expect(rubrics.weight).toBe(3);
+    expect(evaluators?.[1].type).toBe('contains');
+    expect(evaluators?.[1].weight).toBeUndefined(); // explicit graders keep their own weight
+  });
 });

 describe('parseEvaluators - file:// prefix prompt resolution', () => {
