Moatless Tools used in Stanford/Oxford/Google DeepMind paper

Loved to see Moatless Tools used to set a new SoTA on SWE-Bench Lite using multi-shot sampling (active search).

Read the [paper](https://arxiv.org/pdf/2407.21787)

From a [related article](https://medium.com/@ignacio.de.gregorio.noblejas/large-language-monkeys-is-the-best-model-always-the-best-option-no-da4b27171fe4) on Medium:

“Impressively, when running DeepSeek-V2-coder, a small language model with multiple sampling, the model outperformed state-of-the-art models like GPT-4o or Claude 3.5 Sonnet, achieving a new state-of-the-art 56% in SWE-Bench Lite (a benchmark that evaluates a model’s capacity to solve GitHub issues), while these two models, combined, achieved 43% (Mixed models).”
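For readers wondering how "solved with multiple samples" is usually quantified, here is a minimal sketch of the widely used unbiased pass@k estimator: given `n` attempts on an issue of which `c` were correct, it estimates the chance that at least one of `k` randomly drawn attempts is correct. This is an illustration only, not code from the paper or from Moatless Tools; the function name `pass_at_k` and the example numbers are assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is correct), given c correct out of n.

    Illustrative helper (hypothetical name), using the standard estimator
    1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k incorrect samples: every draw of k contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Made-up numbers: 250 attempts on one issue, 5 of them correct.
    for k in (1, 10, 100):
        print(k, round(pass_at_k(250, 5, k), 3))
```

Averaging this value over all issues in a benchmark gives the fraction of issues solved by at least one sample, which is what grows as more samples are drawn.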
