Add ResearchClawBench eval framework #2174

Open

black-yt wants to merge 1 commit into huggingface:main from black-yt:add-researchclawbench-eval-framework

Conversation

black-yt commented on May 15, 2026

Summary

Adds researchclawbench to the supported evaluation frameworks for benchmark dataset eval.yaml files.

ResearchClawBench is an end-to-end scientific research benchmark for AI agents and standalone LLMs, covering workflows from reading raw data and related work to producing code, figures, and publication-style reports.

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/InternScience/ResearchClawBench

The dataset repo already includes:

  • eval.yaml with evaluation_framework: researchclawbench (a minimal sketch follows this list)
  • .eval_results/*.yaml entries following the benchmark result format
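
A minimal sketch of that eval.yaml, keeping only the key this PR relies on; any other fields the dataset repo may declare are omitted here:

```yaml
# Minimal eval.yaml sketch — only the evaluation_framework key is
# confirmed by this PR; the real file may declare additional fields.
evaluation_framework: researchclawbench
```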

For reference, a similar benchmark setup:
https://huggingface.co/datasets/claw-eval/Claw-Eval

Change

  • Add researchclawbench to EVALUATION_FRAMEWORKS in packages/tasks/src/eval.ts.
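
A sketch of the intended edit, assuming EVALUATION_FRAMEWORKS is a readonly array of framework identifiers (the actual shape of the constant in packages/tasks/src/eval.ts may differ):

```ts
// packages/tasks/src/eval.ts — illustrative sketch, not the verbatim diff.
export const EVALUATION_FRAMEWORKS = [
  // ...existing framework identifiers...
  "researchclawbench", // added by this PR
] as const;
```

With an `as const` array like this, any code that derives a union type from EVALUATION_FRAMEWORKS would pick up "researchclawbench" automatically.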

Notes

This is intended to let the ResearchClawBench dataset be recognized as a Benchmark dataset, so that the Hub displays the benchmark leaderboard/tag for it.
