Commit 960342f

Antigravity Agent and claude committed
fix(kaggle): Fix cognitive probe benchmarks CSV detection and flexible matching
Bug #1: THLP loads wrong dataset due to fuzzy path matching.
Fix: use endswith for exact CSV match.

Bug #2: Strict word boundary matching fails for answers with parentheses.
Model responds "5 PM" but regex looks for full string with annotations.
Fix: Strategy 0 strips parenthetical annotations before matching.

New flexible matching with five strategies:
- Strategy 0: Strip parentheses for "5 PM (annotation)" cases
- Strategy 1: Exact match
- Strategy 2: Substring for short answers
- Strategy 3: Word boundary for clean multi-word answers
- Strategy 4: Fuzzy word match (all words present, in order)

Added debug logging for first 10 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
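The two fixes described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `find_csv` and `flexible_match`, the file name `thlp.csv`, the `short_len` threshold, and the lowercase/whitespace normalization are all assumptions made for the example.

```python
import re


def find_csv(paths, filename):
    """Bug #1 sketch: pick the path whose last component is exactly `filename`.

    Using endswith with a leading separator avoids matching look-alike
    files such as "not_thlp.csv" (hypothetical names).
    """
    for p in paths:
        if p == filename or p.endswith("/" + filename):
            return p
    return None


def _normalize(text):
    # Assumed normalization: lowercase, trim, collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())


def _contains_phrase(phrase, text):
    # Boundary check via lookarounds, so phrases that start or end with
    # punctuation still behave sensibly (plain \b would not).
    return re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text) is not None


def flexible_match(expected, response, short_len=3):
    """Bug #2 sketch: try the five strategies in order, return True on first hit."""
    exp = _normalize(expected)
    resp = _normalize(response)
    if not exp:
        return False

    # Strategy 0: strip parenthetical annotations, e.g. "5 PM (annotation)" -> "5 pm".
    core = _normalize(re.sub(r"\([^)]*\)", " ", expected))
    if core and core != exp and _contains_phrase(core, resp):
        return True

    # Strategy 1: exact match after normalization.
    if resp == exp:
        return True

    # Strategy 2: plain substring, only for short answers (threshold is assumed).
    if len(exp) <= short_len and exp in resp:
        return True

    # Strategy 3: boundary match for "clean" multi-word answers
    # (here taken to mean word characters and spaces only).
    if " " in exp and re.fullmatch(r"[\w ]+", exp) and _contains_phrase(exp, resp):
        return True

    # Strategy 4: every expected word appears in the response, in order.
    words = re.findall(r"\w+", exp)
    remaining = iter(re.findall(r"\w+", resp))
    return bool(words) and all(w in remaining for w in words)
```

With this sketch, `flexible_match("5 PM (end of business day)", "The meeting ends at 5 PM.")` succeeds through Strategy 0, while `flexible_match("blue green", "green then blue")` fails every strategy because the words appear in the wrong order.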
1 parent 41b474d commit 960342f

6 files changed

Lines changed: 622 additions & 317 deletions
