Benchmark Guide

OpenCode++ benchmarks compare how different context and harness modes perform.

Static Benchmark

opencode-plusplus benchmark benchmarks --top-k 8

The static benchmark measures retrieval quality, such as whether relevant files and required tests are retrieved.
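
As a rough illustration of what a retrieval-quality score can look like, the sketch below computes recall at k over a ranked list of files. This guide does not say which formula OpenCode++ uses, so the function name `recall_at_k`, the sample file paths, and the reading of `--top-k 8` as "score the first 8 results" are all assumptions made for the example.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items found in the first k ranked results."""
    relevant = set(relevant)
    if not relevant:
        # Nothing was required, so nothing can be missed.
        return 1.0
    top = set(ranked[:k])
    return len(top & relevant) / len(relevant)


# Hypothetical data: what a retriever returned vs. what the task needs.
ranked_files = ["src/auth.py", "README.md", "tests/test_auth.py", "src/db.py"]
relevant_files = {"src/auth.py", "tests/test_auth.py", "src/session.py"}

score = recall_at_k(ranked_files, relevant_files, k=8)
print(f"recall@8 = {score:.2f}")
```

Here two of the three relevant files are in the ranked list, so the score is two thirds. The same function could be applied to required tests instead of files.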

Agent Behavior Benchmark

opencode-plusplus benchmark-agent benchmarks --executor mock --dry-run

Modes:

  • no-context
  • agents-md
  • context-pack
  • loop-enabled-harness

Metrics:

  • wrong_files_changed
  • forbidden_files_changed
  • tests_missing
  • tests_failed
  • hallucinated_commands
  • iterations_to_finish
  • final_decision_accuracy
  • human_review_needed
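
The file and test metrics above could plausibly be derived by comparing sets, as in the following sketch. The definitions here are assumptions rather than the tool's documented behavior: `score_run` is a hypothetical helper, forbidden paths are matched with glob patterns, and a forbidden file is counted only under `forbidden_files_changed` so that it is not double-counted as a wrong file.

```python
from fnmatch import fnmatch


def score_run(changed, expected, forbidden_patterns, required_tests, tests_run):
    """Derive three of the listed metrics from one agent run (assumed definitions)."""
    changed = set(changed)
    # A changed file is forbidden if it matches any forbidden glob pattern.
    forbidden = {f for f in changed
                 if any(fnmatch(f, pat) for pat in forbidden_patterns)}
    # Wrong files: changed, not expected, and not already counted as forbidden.
    wrong = changed - set(expected) - forbidden
    # Missing tests: required by the task but never executed.
    missing = set(required_tests) - set(tests_run)
    return {
        "wrong_files_changed": len(wrong),
        "forbidden_files_changed": len(forbidden),
        "tests_missing": len(missing),
    }


# Hypothetical run: the agent touched one expected file, one unrelated file,
# and a lock file, and ran no tests.
result = score_run(
    changed=["src/auth.py", "src/db.py", "poetry.lock"],
    expected={"src/auth.py"},
    forbidden_patterns=["*.lock"],
    required_tests={"tests/test_auth.py"},
    tests_run=[],
)
print(result)
```

Metrics such as `hallucinated_commands` or `final_decision_accuracy` need information beyond file sets, so they are left out of this sketch.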

Use the generic executor hook for real-agent comparisons:

opencode-plusplus benchmark-agent benchmarks \
  --executor opencode \
  --executor-command "opencode run --format json --dir {repo} --file {prompt} \"Follow the attached OpenCode++ task prompt.\"" \
  --max-loops 3 \
  --fail-on required
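
The `{repo}` and `{prompt}` placeholders in `--executor-command` suggest that the harness fills in a repository directory and a prompt file before launching the agent. How OpenCode++ actually does this is not described here; the sketch below shows one way such a template could be expanded safely, with `build_argv` and the sample paths being hypothetical.

```python
import shlex


def build_argv(template, repo, prompt):
    """Expand {repo} and {prompt} in a command template into an argv list."""
    # Quote each value so paths containing spaces stay a single argument.
    command = (template
               .replace("{repo}", shlex.quote(repo))
               .replace("{prompt}", shlex.quote(prompt)))
    # Split with shell-like rules; the quoted instruction stays one argument.
    return shlex.split(command)


# The template as the harness would receive it after the shell has
# processed the escaped quotes in the command above.
template = ('opencode run --format json --dir {repo} --file {prompt} '
            '"Follow the attached OpenCode++ task prompt."')

argv = build_argv(template, "/tmp/my repo", "/tmp/prompt.md")
print(argv)
```

Passing the resulting list to something like `subprocess.run(argv)` avoids a second round of shell parsing, which is why the values are quoted before splitting rather than interpolated into a shell string.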