Benchmark Guide

OpenCode++ benchmarks compare context and harness modes in two ways: a static benchmark of retrieval quality and an agent behavior benchmark.

Static Benchmark

opencode-plusplus benchmark benchmarks --top-k 8

This benchmark measures retrieval quality, for example whether the relevant files and required tests appear in the retrieved results.
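
To make "retrieval quality" more concrete, the sketch below scores a ranked list of retrieved files against a set of expected files using precision and recall at k. The function name, the example paths, and the choice of metric are all assumptions made for illustration; this guide does not describe how `opencode-plusplus benchmark` computes its scores.

```python
def precision_recall_at_k(retrieved, relevant, k=8):
    """Return (precision, recall) over the first k retrieved items.

    Hypothetical helper, not part of OpenCode++. Assumes `retrieved`
    is a ranked list without duplicates and `relevant` is a set.
    """
    top_k = retrieved[:k]
    hits = sum(1 for path in top_k if path in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative paths only.
    retrieved = ["src/app.py", "docs/intro.md", "tests/test_app.py", "setup.py"]
    relevant = {"src/app.py", "tests/test_app.py", "src/config.py"}
    print(precision_recall_at_k(retrieved, relevant, k=4))
```

Here `k` plays the same role as the `--top-k` value in the command above: only the first k results are considered when counting hits.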

Agent Behavior Benchmark

opencode-plusplus benchmark-agent benchmarks --executor mock --dry-run

Modes:

no-context
agents-md
context-pack
loop-enabled-harness

Metrics:

wrong_files_changed
forbidden_files_changed
tests_missing
tests_failed
hallucinated_commands
iterations_to_finish
final_decision_accuracy
human_review_needed
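
As one possible reading of the file- and test-related metrics above, the sketch below derives `wrong_files_changed`, `forbidden_files_changed`, and `tests_missing` from the files an agent changed and the tests it ran. The `TaskExpectations` type, its field names, and the exact definition of each metric are assumptions for illustration; the guide lists the metric names but does not define them.

```python
from dataclasses import dataclass, field


@dataclass
class TaskExpectations:
    """Hypothetical description of what a benchmark task permits."""
    allowed_files: set = field(default_factory=set)
    forbidden_files: set = field(default_factory=set)
    required_tests: set = field(default_factory=set)


def file_and_test_metrics(changed_files, tests_run, expected):
    """Count file and test violations for one agent run (assumed definitions)."""
    changed = set(changed_files)
    forbidden = changed & expected.forbidden_files
    # Assumed: a "wrong" file is one that is neither allowed nor forbidden.
    wrong = changed - expected.allowed_files - expected.forbidden_files
    # Assumed: a test is "missing" when it was required but never run.
    missing = expected.required_tests - set(tests_run)
    return {
        "wrong_files_changed": len(wrong),
        "forbidden_files_changed": len(forbidden),
        "tests_missing": len(missing),
    }


if __name__ == "__main__":
    expected = TaskExpectations(
        allowed_files={"src/app.py", "tests/test_app.py"},
        forbidden_files={"secrets.env"},
        required_tests={"tests/test_app.py"},
    )
    print(file_and_test_metrics(
        ["src/app.py", "README.md", "secrets.env"], [], expected
    ))
```

Keeping the wrong and forbidden sets disjoint means a single file is never counted under both metrics, which is a design choice of this sketch rather than documented behavior.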

Use the generic executor hook for real-agent comparisons:

opencode-plusplus benchmark-agent benchmarks \
  --executor opencode \
  --executor-command "opencode run --format json --dir {repo} --file {prompt} \"Follow the attached OpenCode++ task prompt.\"" \
  --max-loops 3 \
  --fail-on required
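
The `--executor-command` value above contains `{repo}` and `{prompt}` placeholders. The sketch below shows one way such a template could be expanded into an argument list. The function name and the example paths are hypothetical, and the expansion strategy is an assumption; the guide does not say how OpenCode++ substitutes these placeholders internally.

```python
import shlex


def expand_executor_command(template, repo, prompt):
    """Split a command template into argv, then fill in the placeholders.

    Hypothetical helper, not part of OpenCode++. Splitting before
    substituting keeps paths that contain spaces as single arguments.
    """
    argv = []
    for token in shlex.split(template):
        argv.append(token.replace("{repo}", repo).replace("{prompt}", prompt))
    return argv


if __name__ == "__main__":
    template = (
        "opencode run --format json --dir {repo} --file {prompt} "
        '"Follow the attached OpenCode++ task prompt."'
    )
    # Illustrative paths only.
    for arg in expand_executor_command(template, "/tmp/my repo", "/tmp/prompt.md"):
        print(arg)
```

The resulting list can be passed to `subprocess.run` without `shell=True`, which avoids re-quoting the substituted paths.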
