-ACE-Bench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks.
+FeatureBench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks.
 It provides a unified CLI to run inference, evaluation, and dataset generation.

 ## 📰 News

-🎁 **2026.02.06**: We now support one-click inference for mainstream agent frameworks, including **OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent**. All supported agent frameworks can be found [here](acebench/infer/agents/). We have also open-sourced the ACE-Bench **data pipeline**.
+🎁 **2026.02.06**: We now support one-click inference for mainstream agent frameworks, including **OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent**. All supported agent frameworks can be found [here](featurebench/infer/agents/). We have also open-sourced the FeatureBench **data pipeline**.

 ## 🚀 Quickstart

 **Prerequisites:**
-- `uv` for Python environment management
-- `docker` for reproducible builds and evaluation
+- [uv](https://docs.astral.sh/uv/getting-started/installation/) for Python environment management
+- [docker](https://docs.docker.com/engine/install/) for reproducible builds and evaluation