EntityProcess/agentv-bench-swebench

AgentV Bench: SWE-bench

Benchmark repository for SWE-bench coding evaluations using AgentV. It contains imported SWE-bench examples converted to the AgentV eval format, with Docker workspace configurations for isolated execution.

Overview

This repository stores SWE-bench-style coding evaluation benchmarks in the AgentV eval format. Each eval defines:

  • A Docker workspace based on the official SWE-bench evaluation images
  • A problem statement from a real GitHub issue
  • Assertions that verify the fix by running the project's test suite

SWE-bench tasks test an agent's ability to resolve real-world GitHub issues by producing correct code patches. Evals here are imported from the SWE-bench dataset and converted to AgentV YAML format.

Structure

.agentv/              # AgentV project configuration
  config.yaml         # Studio threshold and settings
  targets.yaml        # Model provider targets
evals/
  swebench-verified/  # SWE-bench Verified examples (curated, human-validated)
  swebench-lite/      # SWE-bench Lite examples (smaller subset)
scripts/
  import-swebench.py  # Import script that pulls examples from Hugging Face datasets
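Given the layout above, a small helper can report how many evals each suite contains. This is a minimal sketch, not part of the repository: `count_evals` is a hypothetical name, and it assumes every eval file uses the `.EVAL.yaml` suffix seen in the example commands.

```python
from pathlib import Path


def count_evals(root: Path) -> dict[str, int]:
    """Count *.EVAL.yaml files in each suite directory under <root>/evals."""
    evals_dir = root / "evals"
    if not evals_dir.is_dir():
        return {}
    counts = {}
    for suite in sorted(p for p in evals_dir.iterdir() if p.is_dir()):
        # Assumption: eval files sit directly inside the suite directory.
        counts[suite.name] = len(list(suite.glob("*.EVAL.yaml")))
    return counts
```

Calling `count_evals(Path("."))` from the repository root would return a mapping such as suite name to file count, for example one entry each for `swebench-verified` and `swebench-lite`.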

Running Evals

Prerequisites:

  • AgentV installed
  • Docker installed and running (SWE-bench images are pulled automatically)

Run a single eval:

agentv eval evals/swebench-verified/django-15180.EVAL.yaml

Run all SWE-bench Verified evals:

agentv eval evals/swebench-verified/

Run with a specific target:

agentv eval evals/swebench-verified/ --target claude-opus

Results

Results are stored in .agentv/results/ (git-ignored). Use agentv studio to view and compare results across targets:

agentv studio

The default pass threshold is set to 0.8 in .agentv/config.yaml.
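For intuition about what a 0.8 threshold implies, here is a small sketch. It rests on two assumptions that this README does not confirm: that an eval's score is the fraction of its assertions that passed, and that a score equal to the threshold counts as a pass. Both function names are hypothetical.

```python
def assertion_score(results: list[bool]) -> float:
    """Fraction of assertions that passed (assumed scoring rule)."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def meets_threshold(score: float, threshold: float = 0.8) -> bool:
    """Assumed comparison: a score at or above the threshold passes."""
    return score >= threshold


# Four of five assertions passing gives 0.8, which meets the threshold.
four_of_five = meets_threshold(assertion_score([True, True, True, True, False]))
# Three of four gives 0.75, which does not.
three_of_four = meets_threshold(assertion_score([True, True, True, False]))
```

Under these assumptions, an eval with five assertions tolerates one failure, while an eval with four or fewer assertions tolerates none.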
