Commit 71ee155

22-06-25
1 parent 540aaec commit 71ee155

1 file changed

Lines changed: 9 additions & 3 deletions

knowledge_base/AI/Benchmarks/CodeGeneration.md

@@ -1,4 +1,4 @@
-## LiveCodeBench
+## LiveCodeBench : AI Competitive programming benchmark
 [paper](https://arxiv.org/pdf/2403.07974)
 [blog](https://huggingface.co/blog/leaderboard-livecodebench)
 - solve competitive problems
@@ -11,7 +11,7 @@
 - test cases are generated by Gpt-4-turbo based on problem description
 - verified by running on known solution
 
-## SWE Bench : AI Agent benchmarking
+## SWE Bench : AI Software engineer benchmark
 [site](https://www.swebench.com/),[paper](https://arxiv.org/pdf/2310.06770v2)
 - resolve github issues
 - Input : Issue, Code base snapshot
@@ -26,4 +26,10 @@
 - Filter by increase in test fail-to-pass ratio
 - After filtering, out of 90000 problems, 2294 were selected
 - [openai](https://openai.com/index/introducing-swe-bench-verified) partnered to verify the benchmark
-- makes sure the tests captures that the Issue is fixed and are not dependent on the implementation details (follows BDD)
+- makes sure the tests captures that the Issue is fixed and are not dependent on the implementation details (follows BDD)
+
+## SWE Lancer : AI Freelancer benchmark
+[paper](https://arxiv.org/pdf/2502.12115)
+- a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in realworld payouts.
+- Input/Output same as SWE bench
+- Evaluation : End to End browser automation tests from original freelancer of the task.
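
The SWE Bench notes in this diff mention filtering candidates by fail-to-pass test changes. A minimal sketch of that idea, not the benchmark's actual harness: the function names, the `"pass"`/`"fail"` status strings, and the sample test names are all hypothetical, and treating a test that is absent before the fix as failing is an assumption made here for illustration.

```python
from typing import Dict, List

PASS = "pass"  # hypothetical status labels, not taken from SWE Bench
FAIL = "fail"


def fail_to_pass_tests(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return names of tests that fail before the fix and pass after it.

    Assumption: a test missing from `before` (e.g. added by the fix)
    counts as failing before the fix.
    """
    return sorted(
        name
        for name, status in after.items()
        if status == PASS and before.get(name, FAIL) == FAIL
    )


def keep_candidate(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """Keep a candidate only if the fix flips at least one test to passing."""
    return len(fail_to_pass_tests(before, after)) > 0


if __name__ == "__main__":
    # Hypothetical test results before and after applying a fix.
    before = {"test_parse": FAIL, "test_render": PASS}
    after = {"test_parse": PASS, "test_render": PASS, "test_regression": PASS}
    print(fail_to_pass_tests(before, after))
    print(keep_candidate(before, after))
```

A candidate whose fix changes no test outcome gives an empty list and would be dropped under this sketch, which matches the intent of the filtering bullet.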
