In llm_judge/data/judge_prompts.jsonl there are two "You evaluation" typos, which should read "Your evaluation":
- "You evaluation should focus on the assistant's answer to the second question" in "single-math-v1-multi-turn"
- "You evaluation should focus on the assistant's answer to the second question" in "single-v1-multi-turn"
I doubt they make any difference to performance, since this is an easy error for models to parse, but they are worth correcting for completeness.
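For anyone who wants to apply the fix locally, here is a minimal sketch of a script that could do it. It assumes each line of the file is a JSON object and that the prompt identifier lives under a "name" key; the helper names fix_typos and fix_file are hypothetical, not part of the repo.

```python
import json

TYPO = "You evaluation"
FIX = "Your evaluation"


def fix_typos(lines):
    """Return (fixed_lines, names) where names lists the prompts that had the typo."""
    fixed, hits = [], []
    for line in lines:
        if TYPO in line:
            # Assumption: the prompt identifier is stored under the "name" key.
            hits.append(json.loads(line).get("name", "<unnamed>"))
            line = line.replace(TYPO, FIX)
        fixed.append(line)
    return fixed, hits


def fix_file(path):
    """Rewrite the JSONL file at `path` in place and return the affected prompt names."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    fixed, hits = fix_typos(lines)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(fixed)
    return hits
```

Calling fix_file("llm_judge/data/judge_prompts.jsonl") should then return the names of the prompts that were changed, which can be compared against the two listed above.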