Commit 960342f
fix(kaggle): Fix cognitive probe benchmarks CSV detection and flexible matching
Bug #1: THLP loads the wrong dataset because of fuzzy path matching. Fix: use endswith for an exact CSV match.
Bug #2: Strict word-boundary matching fails for expected answers that contain parenthetical annotations. The model responds "5 PM", but the regex looks for the full string including the annotation. Fix: Strategy 0 strips parenthetical annotations before matching.
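A minimal sketch of the Bug #1 fix, not the committed code. The helper name `find_csv` and the example paths are hypothetical; the only point illustrated is that an `endswith` check on the full file name rejects paths that merely contain the dataset name.

```python
def find_csv(candidates, csv_name):
    """Return the first path that ends with exactly csv_name, or None.

    Hypothetical helper: a substring test such as `"thlp" in path` would
    also accept unrelated files whose path merely contains that text.
    """
    suffix = "/" + csv_name
    for path in candidates:
        normalized = path.replace("\\", "/")  # tolerate Windows separators
        if normalized == csv_name or normalized.endswith(suffix):
            return path
    return None


# Illustrative paths only (assumed, not taken from the repository).
paths = [
    "/kaggle/input/thlp-extended/other.csv",
    "/kaggle/input/probes/not_thlp.csv",
    "/kaggle/input/probes/thlp.csv",
]
selected = find_csv(paths, "thlp.csv")
```

Anchoring the suffix on the path separator also keeps `not_thlp.csv` from matching `thlp.csv`.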
New flexible matching with five strategies:
- Strategy 0: Strip parentheses for "5 PM (annotation)" cases
- Strategy 1: Exact match
- Strategy 2: Substring for short answers
- Strategy 3: Word boundary for clean multi-word answers
- Strategy 4: Fuzzy word match (all words present, in order)
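The five strategies above could be chained roughly as follows. This is a sketch under stated assumptions, not the code in this commit: the function name `flexible_match`, the 10-character cutoff for "short" answers, and the definition of a "clean" answer (letters, digits, and spaces only) are all illustrative guesses.

```python
import re


def _bounded_search(needle, haystack):
    # Word-boundary search using lookarounds, so needles that start or
    # end with punctuation still behave sensibly.
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return re.search(pattern, haystack) is not None


def flexible_match(expected, response, short_len=10):
    """Return the number of the first strategy that matches, else None."""
    exp = expected.strip().lower()
    resp = response.strip().lower()
    if not exp:
        return None

    # Strategy 0: strip parenthetical annotations, e.g. "5 PM (annotation)".
    core = re.sub(r"\s*\([^)]*\)", "", exp).strip()
    if core and core != exp and _bounded_search(core, resp):
        return 0

    # Strategy 1: exact match.
    if exp == resp:
        return 1

    # Strategy 2: substring for short answers (cutoff is an assumption).
    if len(exp) <= short_len and exp in resp:
        return 2

    # Strategy 3: word boundary for clean multi-word answers.
    if len(exp.split()) > 1 and re.fullmatch(r"[\w ]+", exp):
        if _bounded_search(exp, resp):
            return 3

    # Strategy 4: all expected words present in the response, in order.
    exp_words = re.findall(r"\w+", exp)
    resp_words = iter(re.findall(r"\w+", resp))
    if exp_words and all(word in resp_words for word in exp_words):
        return 4

    return None
```

Because Strategy 0 returns `0`, callers should test the result with `is not None` rather than truthiness.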
Added debug logging for first 10 failures.
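One way to cap debug output at the first 10 failures is sketched below. The logger name, the function `score_with_failure_log`, and the 200-character truncation are hypothetical; the commit message only states that the first 10 failures are logged.

```python
import logging

logger = logging.getLogger("probe_matching")  # hypothetical logger name

MAX_LOGGED_FAILURES = 10


def score_with_failure_log(pairs, matcher):
    """Count matches; log only the first MAX_LOGGED_FAILURES misses."""
    correct = 0
    failures_logged = 0
    for expected, response in pairs:
        if matcher(expected, response):
            correct += 1
        elif failures_logged < MAX_LOGGED_FAILURES:
            failures_logged += 1
            logger.debug(
                "match failure %d: expected=%r response=%r",
                failures_logged,
                expected,
                response[:200],  # truncate long responses (assumed limit)
            )
    return correct
```

Capping the log keeps notebook output readable while still showing enough failures to spot a systematic matching problem.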
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 41b474d commit 960342f
6 files changed
Lines changed: 622 additions & 317 deletions
File tree
- kaggle/notebooks
- task_templates