Commit 960342f

Antigravity Agent and a co-author committed

fix(kaggle): Fix cognitive probe benchmarks CSV detection and flexible matching

Bug #1: THLP loads the wrong dataset due to fuzzy path matching.
Fix: use endswith for an exact CSV match.

Bug #2: Strict word boundary matching fails for answers with parentheses.
The model responds "5 PM", but the regex looks for the full string including annotations.
Fix: Strategy 0 strips parenthetical annotations before matching.

New flexible matching with five strategies:
- Strategy 0: Strip parentheses for "5 PM (annotation)" cases
- Strategy 1: Exact match
- Strategy 2: Substring for short answers
- Strategy 3: Word boundary for clean multi-word answers
- Strategy 4: Fuzzy word match (all words present, in order)

Added debug logging for the first 10 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
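The two fixes described in the message can be sketched roughly as follows. This is an illustration only, not the code in this commit: the function names (`pick_csv`, `flexible_match`, `score`), the file paths, the `short_len` threshold, and the leading "/" added to the `endswith` check are all assumptions made for the example.

```python
import logging
import re

# Hypothetical helper: select a CSV by exact file name rather than by substring.
def pick_csv(paths, csv_name):
    for path in paths:
        # A substring test such as `"thlp" in path` would also match a
        # directory like ".../thlp-v2/other.csv". The "/" prefix is an
        # assumption added here so "not_thlp.csv" is rejected as well.
        if path == csv_name or path.endswith("/" + csv_name):
            return path
    return None

_PAREN = re.compile(r"\s*\([^)]*\)")  # matches "(annotation)" plus leading spaces

def _norm(text):
    # Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())

def _word_boundary(needle, haystack):
    return re.search(r"\b" + re.escape(needle) + r"\b", haystack) is not None

def flexible_match(expected, response, short_len=3):
    """Return True if `response` contains `expected` under any strategy."""
    exp, resp = _norm(expected), _norm(response)
    if not exp:
        return False
    # Strategy 0: strip parenthetical annotations, then word-boundary match.
    stripped = _norm(_PAREN.sub("", expected))
    if stripped and stripped != exp and _word_boundary(stripped, resp):
        return True
    # Strategy 1: exact match.
    if exp == resp:
        return True
    # Strategy 2: substring match for short answers (threshold is assumed).
    if len(exp) <= short_len and exp in resp:
        return True
    # Strategy 3: word-boundary match for clean multi-word answers.
    if " " in exp and re.fullmatch(r"[\w ]+", exp) and _word_boundary(exp, resp):
        return True
    # Strategy 4: every expected word appears in the response, in order.
    remaining = iter(re.findall(r"\w+", resp))
    words = re.findall(r"\w+", exp)
    return bool(words) and all(word in remaining for word in words)

def score(pairs, max_logged=10):
    """Fraction of matching (expected, response) pairs; logs early failures."""
    hits = logged = 0
    for expected, response in pairs:
        if flexible_match(expected, response):
            hits += 1
        elif logged < max_logged:
            logging.debug("no match: expected=%r response=%r", expected, response)
            logged += 1
    return hits / len(pairs) if pairs else 0.0
```

With this sketch, `flexible_match("5 PM (local time)", "The meeting ends at 5 PM.")` succeeds through Strategy 0, while a response of "It was 15 PM" is rejected because "5 pm" does not start at a word boundary there.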

1 parent 41b474d, commit 960342f

6 files changed

kaggle/notebooks
- KAGGLE_TASK_TEMPLATE.py
- task_templates
