22-06-25

vignesh14052002 · vignesh14052002 · commit 71ee155c7efc · 2025-06-22T20:01:22.000+05:30
diff --git a/knowledge_base/AI/Benchmarks/CodeGeneration.md b/knowledge_base/AI/Benchmarks/CodeGeneration.md
@@ -1,4 +1,4 @@
-## LiveCodeBench
+## LiveCodeBench : AI Competitive programming benchmark
 [paper](https://arxiv.org/pdf/2403.07974)
 [blog](https://huggingface.co/blog/leaderboard-livecodebench)
 - solve competitive problems
@@ -11,7 +11,7 @@
 - test cases are generated by Gpt-4-turbo based on problem description
   - verified by running on known solution
 
-## SWE Bench : AI Agent benchmarking
+## SWE Bench : AI Software engineer benchmark
 [site](https://www.swebench.com/),[paper](https://arxiv.org/pdf/2310.06770v2)
 - resolve github issues
   - Input : Issue, Code base snapshot
@@ -26,4 +26,10 @@
   - Filter by increase in test fail-to-pass ratio
   - After filtering, out of 90000 problems, 2294 were selected
 - [openai](https://openai.com/index/introducing-swe-bench-verified) partnered to verify the benchmark
-  - makes sure the tests captures that the Issue is fixed and are not dependent on the implementation details (follows BDD)
+  - makes sure the tests capture that the issue is fixed and do not depend on implementation details (follows BDD)
+  
+## SWE Lancer : AI Freelancer benchmark
+[paper](https://arxiv.org/pdf/2502.12115)
+- a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD in total real-world payouts.
+  - Input/Output same as SWE bench
+  - Evaluation : end-to-end browser automation tests, triple-verified by experienced software engineers (per the paper).
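
The SWE Lancer evaluation note could be pictured roughly as follows. This is a minimal sketch, assuming each task carries a payout and a list of end-to-end test outcomes, and that a task counts as solved only when every test passes. The `Task` class, its field names, and the sample values are hypothetical and are not taken from the benchmark's own code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical record for one freelance task (illustrative only)."""
    task_id: str
    payout_usd: float
    test_results: List[bool]  # one entry per end-to-end test; True means passed


def is_solved(task: Task) -> bool:
    # Assumption: a task is solved only if it has tests and all of them pass.
    return bool(task.test_results) and all(task.test_results)


def total_earned(tasks: List[Task]) -> float:
    # Sum the payouts of the solved tasks; unsolved tasks earn nothing.
    return sum(t.payout_usd for t in tasks if is_solved(t))


# Made-up sample data, not real benchmark tasks.
sample_tasks = [
    Task("task-a", 250.0, [True, True]),
    Task("task-b", 1000.0, [True, False]),
    Task("task-c", 500.0, []),
]

earned = total_earned(sample_tasks)
available = sum(t.payout_usd for t in sample_tasks)
print(f"earned ${earned:.2f} of ${available:.2f}")
```

Scoring in dollars rather than by task count weights harder, higher-paid tasks more heavily, which is the point of valuing the tasks at their real payouts.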