Skip to content

Latest commit

 

History

History
44 lines (23 loc) · 675 Bytes

File metadata and controls

44 lines (23 loc) · 675 Bytes
name Eval regression
about A golden-dataset question that used to pass now fails (often auto-filed by eval-nightly).
title eval: regression on [case_id]
labels
bug
test

Case

  • ID:
  • Category:
  • Difficulty:

Question

Expected answer

Actual answer

When did it regress?

Reproduction

uv run pytest eval/test_golden_qa.py -k "<case_id>" -v

Suspected cause

Invariant touched