Commit 57e314a

author
Anonymous
committed
Reorganize repo before update: update readme, move experiments to other repository
1 parent 367278f commit 57e314a

5 files changed

Lines changed: 487 additions & 217 deletions

CONTRIBUTING.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Contributing to CodeEvolve
2+
3+
Thank you for your interest in contributing to CodeEvolve! We welcome contributions from the community and appreciate your efforts to improve the framework.
4+
5+
## Getting Started
6+
7+
1. **Check existing issues:** Browse [open issues](https://github.com/inter-co/science-codeevolve/issues) to see if your idea or bug has already been reported
8+
2. **Create an issue first:** Before starting work, open an issue describing your proposed change, bug fix, or feature
9+
3. **Wait for feedback:** Discuss your approach with maintainers to ensure alignment with project goals
10+
11+
## Contribution Workflow
12+
13+
### 1. Fork and Clone
14+
```bash
15+
git clone https://github.com/YOUR_USERNAME/science-codeevolve.git
16+
cd science-codeevolve
17+
```
18+
19+
### 2. Create a Branch
20+
```bash
21+
git checkout -b feature/your-feature-name
22+
```
23+
24+
### 3. Make Changes
25+
- Write clear, documented code
26+
- Follow existing code style and conventions
27+
- Add tests for new functionality
28+
- Update documentation as needed
29+
30+
### 4. Test Your Changes
31+
```bash
32+
# Run tests to ensure nothing breaks
33+
pytest tests/
34+
```
35+
36+
### 5. Commit and Push
37+
```bash
38+
git add .
39+
git commit -m "Brief description of changes"
40+
git push origin feature/your-feature-name
41+
```
42+
43+
### 6. Submit a Pull Request
44+
- Reference the related issue number
45+
- Provide a clear description of what your PR does
46+
- Explain why the change is necessary
47+
- Include examples or screenshots if applicable
48+
49+
## Pull Request Guidelines
50+
51+
**Keep PRs focused:** Each PR should address a single issue or feature. Avoid combining multiple unrelated changes.
52+
53+
**Write good descriptions:** Clearly explain what your PR does, why it's needed, and how it works.
54+
55+
**Ensure quality:** All code should be tested and documented. PRs with untested changes or massive auto-generated modifications will not be accepted.
56+
57+
**Be responsive:** Address review feedback promptly and be open to suggestions.
58+
59+
## Documentation
60+
61+
Update relevant documentation when making changes:
62+
- README.md for user-facing changes
63+
- Docstrings for API changes
64+
- Comments for complex logic
65+
66+
## Questions?
67+
68+
If you have questions about contributing, feel free to:
69+
- Open an issue for discussion
70+
- Reach out to maintainers
71+
- Check existing documentation
72+
73+
Thank you for helping make CodeEvolve better!

README.md

Lines changed: 104 additions & 25 deletions
@@ -1,66 +1,145 @@
11
# CodeEvolve
2+
23
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
3-
[![preprint](https://img.shields.io/badge/preprint-arxiv.2510.14150-red)](https://arxiv.org/abs/2510.14150)
4+
[![arxiv](https://img.shields.io/badge/arxiv-arxiv.2510.14150-red)](https://arxiv.org/abs/2510.14150)
45

5-
CodeEvolve is an open-source evolutionary coding agent, designed to iteratively improve an initial codebase against a set of user-defined metrics. This project was originally created as an attempt to replicate the results of [AlphaEvolve](https://arxiv.org/abs/2506.13131), a closed-source coding agent announced by Google DeepMind in 2025.
6+
<img src='assets/codeevolve_diagram.png' align="center" width=900 />
67

7-
Our primary goal is to implement a transparent, reproducible, and community-driven framework for LLM-driven algorithm discovery. We evaluate CodeEvolve on the same mathematical benchmarks as AlphaEvolve. CodeEvolve has surpassed the reported state-of-the-art performance on 4 of its 13 problems (see the jupyter notebook with our [results](notebooks/results.ipynb)). We are actively working to improve our method on the remaining benchmarks.
8+
**An open-source framework that combines large language models with evolutionary algorithms to discover and optimize high-performing code solutions.**
89

9-
## Overview
10+
CodeEvolve democratizes algorithmic discovery by making LLM-driven evolutionary search transparent, reproducible, and accessible. Whether you're tackling combinatorial optimization, discovering novel algorithms, or optimizing computational kernels, CodeEvolve provides a modular foundation for automated code synthesis guided by quantifiable metrics.
1011

11-
<img src='assets/codeevolve_diagram.png' align="center" width=900 />
12+
## Why CodeEvolve?
13+
14+
**State-of-the-art performance with transparency.** CodeEvolve matches or exceeds the performance of closed-source systems like Google DeepMind's AlphaEvolve on established algorithm-discovery benchmarks, while remaining fully open and reproducible.
15+
16+
**Cost-effective solutions.** Open-weight models like Qwen often match or outperform expensive closed-source LLMs at a fraction of the compute cost, making cutting-edge algorithmic discovery accessible to researchers and practitioners with limited budgets.
17+
18+
**Designed for real problems.** CodeEvolve addresses meta-optimization tasks where you need to discover programs that solve complex optimization problems.
19+
20+
## Key Features
21+
22+
### Islands-based Genetic Algorithm
23+
Multiple populations evolve independently and periodically exchange top performers, maintaining diversity while propagating successful solutions across the search space. This parallel architecture enables efficient exploration and scales naturally to concurrent evaluation.
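
As a rough illustration of the island scheme described above, here is a minimal Python sketch in which candidates are plain floats and mutation is Gaussian noise. The function name `evolve_islands`, the `migration_interval` parameter, and the ring migration topology are assumptions made for this example; they are not taken from CodeEvolve's source.

```python
import random
from typing import Callable, List


def evolve_islands(
    fitness: Callable[[float], float],
    num_islands: int = 4,
    pop_size: int = 8,
    epochs: int = 40,
    migration_interval: int = 10,  # hypothetical parameter name
    seed: int = 0,
) -> List[List[float]]:
    """Toy island model: each island is a list of float candidates."""
    rng = random.Random(seed)
    islands = [
        [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
        for _ in range(num_islands)
    ]
    for epoch in range(1, epochs + 1):
        for i, pop in enumerate(islands):
            # Islands evolve independently: mutate every member, then keep
            # the pop_size fittest among parents and children (elitism).
            children = [x + rng.gauss(0.0, 1.0) for x in pop]
            islands[i] = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if epoch % migration_interval == 0:
            # Ring migration (an assumed topology): the best member of each
            # island overwrites the worst member of the next island.
            best = [pop[0] for pop in islands]
            for i in range(num_islands):
                islands[(i + 1) % num_islands][-1] = best[i]
    return islands
```

Calling `evolve_islands(lambda x: -(x - 3.0) ** 2)` drives the populations toward `x = 3`. In a real run the candidates would be programs and the mutation step would be an LLM call, which is why independent islands map naturally onto concurrent evaluation.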
24+
25+
### Modular Operators
26+
27+
**Inspiration-based Crossover:** Contextual recombination that combines successful solution patterns while preserving semantic coherence.
28+
29+
**Meta-prompting Exploration:** Evolves the prompts themselves, enabling the LLM to reflect on and rewrite its own instructions for more diverse search trajectories.
30+
31+
**Depth-based Exploitation:** Targeted refinement mechanism that makes precise edits to promising solutions, balancing global search with local optimization.
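
The previous README text described exploitation as rank-based parent selection combined with the parent's k closest ancestors, and said the choice between exploration and exploitation is made with a fixed probability (`exploration_rate`). The sketch below shows one way those three pieces could look in Python. All function names and the linear rank weighting are assumptions for illustration, not CodeEvolve's actual implementation.

```python
import random
from typing import Dict, List, Optional, Sequence


def rank_select(fitnesses: Sequence[float], rng: random.Random) -> int:
    """Pick a parent index with probability proportional to its rank.

    Linear weighting is an assumption: the worst candidate gets weight 1,
    the best gets weight n.
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    weights = [0] * len(fitnesses)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = rank
    return rng.choices(range(len(fitnesses)), weights=weights, k=1)[0]


def closest_ancestors(
    parent_of: Dict[int, Optional[int]], solution: int, k: int
) -> List[int]:
    """Walk up the lineage and return at most k ancestors, nearest first."""
    chain: List[int] = []
    current = parent_of.get(solution)
    while current is not None and len(chain) < k:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def choose_operator(exploration_rate: float, rng: random.Random) -> str:
    """Fixed-probability switch between the two operators."""
    return "meta_prompting" if rng.random() < exploration_rate else "depth"
```

With a lineage of `{3: 2, 2: 1, 1: 0, 0: None}`, `closest_ancestors(lineage, 3, 2)` returns `[2, 1]`. In this sketch the depth operator would pass those ancestors to the LLM as context, while the meta-prompting operator would leave them out.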
32+
33+
## Architecture
34+
35+
CodeEvolve operates through an iterative process at each epoch:
36+
37+
1. **Population Management:** Each island maintains populations of prompts and solutions, evaluated against user-defined fitness metrics
38+
2. **Evolutionary Operators:** Generate new candidates through crossover, mutation, and meta-prompting
39+
3. **LLM Ensemble:** Transforms operator instructions into executable code modifications
40+
4. **Selection & Migration:** Top performers are retained and periodically migrated between islands
41+
5. **Archive:** MAP-Elites-based archive preserves behavioral diversity across the search
42+
43+
Execution feedback and fitness signals guide the entire loop, translating LLM proposals into testable, executable artifacts.
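
To make the archive step more concrete, the following is a minimal sketch of a MAP-Elites-style archive in Python. It keeps only the fittest candidate in each cell of a discretized descriptor space. The class name, the `bins` parameter, and the assumption that descriptors are normalized to `[0, 1]` are all hypothetical; the README does not specify which behavioral descriptors CodeEvolve uses.

```python
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


class MapElitesArchive:
    """Stores the best (fitness, code) pair per descriptor cell."""

    def __init__(self, bins: int = 10) -> None:
        self.bins = bins
        self.cells: Dict[Cell, Tuple[float, str]] = {}

    def _cell(self, descriptor: Sequence[float]) -> Cell:
        # Clamp each coordinate to [0, 1], then map it to a bucket index.
        return tuple(
            min(self.bins - 1, int(max(0.0, min(1.0, d)) * self.bins))
            for d in descriptor
        )

    def add(self, code: str, fitness: float, descriptor: Sequence[float]) -> bool:
        """Insert the candidate if its cell is empty or it beats the incumbent."""
        cell = self._cell(descriptor)
        incumbent = self.cells.get(cell)
        if incumbent is None or fitness > incumbent[0]:
            self.cells[cell] = (fitness, code)
            return True
        return False
```

Because candidates only compete within their own cell, a weaker solution with unusual behavior survives alongside a stronger one elsewhere in the grid, which is how this kind of archive preserves diversity.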
44+
45+
## Performance Highlights
46+
47+
CodeEvolve demonstrates superior performance on several benchmarks previously used to assess AlphaEvolve:
48+
49+
- **Competitive or better results** across diverse algorithm-discovery tasks
50+
- **Open-weight models** (e.g., Qwen) matching closed-source performance at significantly lower cost
51+
- **Extensive ablations** quantifying each component's contribution to search efficiency
1252

13-
CodeEvolve is built upon an island-based genetic algorithm designed to maintain population diversity and increase throughput through parallel search efforts. Within this architecture, the evolutionary cycle is driven by two primary operators designed to balance the search process:
53+
For comprehensive evaluation details, see our [technical report](https://arxiv.org/abs/2510.14150).
1454

15-
1. **Depth exploitation**: This operator selects high-performing parent solutions ($S$) via rank-based selection and uses the LLM Ensemble to perform targeted, incremental refinements. Crucially, it provides the LLM with the parent's full lineage, including its $k$ closest ancestors, $A_k(S)$, to constrain the search space toward local optima refinement.
16-
17-
2. **Meta-Prompting Exploration**: This operator fosters diversity by instructing an auxiliary LLM (MetaPromptingLLM) to analyze the current solution $S$ and its original prompt $P(S)$, generating an enriched new prompt $P'$. This new prompt is then used by the LLMEnsemble to generate a potentially novel solution $S'$. This step intentionally excludes the ancestral context, encouraging the model to explore distinct pathways.
55+
## Reproducing Research Results
1856

19-
Both operators are complemented by an **Inspiration-based Crossover** mechanism, which avoids traditional syntactic splicing by providing the LLMEnsemble with sampled high-performing solutions as additional context. This encourages the LLM to synthesize new solutions by semantically combining successful logic or patterns from multiple parents.
57+
For complete experimental configurations, benchmark implementations, and step-by-step examples demonstrating how to run CodeEvolve on various problems, visit our experiments repository:
58+
59+
**[github.com/inter-co/science-codeevolve-experiments](https://github.com/inter-co/science-codeevolve-experiments)**
60+
61+
This companion repository contains all code necessary to reproduce the results from our [technical report](https://arxiv.org/abs/2510.14150).
62+
63+
## Quick Start
64+
65+
### Installation
66+
67+
Clone this repository and create the conda environment:
2068

21-
## Usage
22-
To setup the proper conda environment, run the following:
2369
```bash
70+
git clone https://github.com/inter-co/science-codeevolve.git
71+
cd science-codeevolve
2472
conda env create -f environment.yml
2573
conda activate codeevolve
2674
```
27-
The command-line version of codeevolve is implemented in ```src/codeevolve/cli.py```, and ```scripts/run.sh``` contains a bash script for running codeevolve on a given benchmark. The most important variables to be defined in this file are the ```API_KEY, API_BASE``` environment variables for connecting with an LLM provider.
2875

29-
More comprehensive tutorials will be released soon.
76+
### Basic Usage
77+
78+
Configure your LLM provider by setting environment variables:
79+
80+
```bash
81+
export API_KEY=your_api_key_here
82+
export API_BASE=your_api_base_url
83+
```
84+
85+
You can run codeevolve via the terminal as follows:
86+
```bash
87+
codeevolve --inpt_dir=INPT_DIR --cfg_path=CFG_PATH --out_dir=RESULTS_DIR --load_ckpt=LOAD_CKPT --terminal_logging
88+
```
89+
See `src/codeevolve/cli.py` for further details. Our [experiments repository](https://github.com/inter-co/science-codeevolve-experiments) contains multiple usage examples.
3090

31-
## Next steps
91+
### Customizing for Your Problem
3292

33-
We are actively developing CodeEvolve to be a more powerful and robust framework. Our immediate roadmap is focused on incorporating more sophisticated evolutionary mechanisms mentioned in our future work:
93+
CodeEvolve is designed for algorithmic problems with quantifiable metrics. To apply it to your domain:
3494

35-
1. **Dynamic Exploration/Exploitation**: Currently, the choice between exploration (meta-prompting) and exploitation (depth) is set by a fixed probability (exploration_rate). A major planned feature is to implement a more dynamic scheduling, potentially using Reinforcement Learning methods, to adapt this trade-off based on the state of the evolutionary search.
95+
1. Define your evaluation function that measures solution quality
96+
2. Specify the initial codebase or problem structure
97+
3. Configure evolutionary parameters (population size, mutation rates, etc.)
98+
4. Choose your LLM ensemble composition
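
As an illustration of the first step, here is a hedged Python sketch of an evaluation function that runs a candidate program in a subprocess and reads back a score. The contract used here (the candidate prints a JSON object with a `score` key on its last stdout line) and the `evaluate` signature are assumptions for this example only; consult the experiments repository for the interface CodeEvolve actually expects.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def evaluate(candidate_source: str, timeout_s: float = 10.0) -> dict:
    """Run a candidate program and return a metrics dictionary.

    Assumed contract (hypothetical): the candidate prints one JSON object
    such as {"score": 0.93} as the last line of its standard output.
    """
    failure = {"valid": False, "score": float("-inf")}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(candidate_source)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {**failure, "error": "timeout"}
    if proc.returncode != 0:
        # Keep only the tail of stderr so the feedback stays short.
        return {**failure, "error": proc.stderr[-500:]}
    try:
        metrics = json.loads(proc.stdout.strip().splitlines()[-1])
        score = float(metrics["score"])
    except (IndexError, KeyError, TypeError, ValueError):
        return {**failure, "error": "malformed output"}
    return {"valid": True, "score": score, "error": ""}
```

Returning a structured failure instead of raising lets the search loop treat crashes and timeouts as low-fitness candidates rather than aborting the run.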
3699

37-
We plan on working on performance improvements, e.g. faster sampling, asynchronous islands algorithm, etc. We also intend on implementing more benchmark problems to test CodeEvolve.
38-
## Project background and inspirations
100+
Comprehensive tutorials and example notebooks are coming soon.
39101

40-
This project was initiated as an effort to replicate and build upon the work presented in Google DeepMind's AlphaEvolve whitepaper. The closed-source nature of that project presented a barrier to community-driven progress. Our goal is to provide a transparent, open-source framework that implements the high-level concepts of LLM-driven evolution, allowing for reproducible research and collective innovation.
102+
## Use Cases
41103

42-
During the initial stages, we drew inspiration from other open-source efforts like [OpenEvolve](https://github.com/codelion/openevolve), which validated the community's interest in this domain. We are grateful to the contributors of such projects for paving the way and demonstrating the viability of open-source research in this field.
104+
The framework is suitable for any domain where solutions can be represented as code and evaluated programmatically. Some common examples include:
105+
106+
- **Mathematical constructions:** Finding solutions to open problems in mathematics
107+
- **Algorithm design:** Optimizing computational kernels and scheduling algorithms
108+
- **Scientific discovery:** Exploring hypothesis spaces expressed as executable code
43109

44110
## Contributing
45111

46-
We are not accepting pull requests at this time, as we are still actively developing and changing most of the features from CodeEvolve. We plan on accepting pull requests soon. However, you can contribute by reporting issues or suggesting features through the creation of a [GitHub issue](https://github.com/inter-co/science-codeevolve/issues).
112+
We welcome contributions from the community! Here's how to get involved:
113+
114+
1. **Start with an issue:** Browse existing issues or create a new one describing your proposed change
115+
2. **Submit a pull request:** Reference the issue in your PR description
116+
3. **Keep PRs focused:** Avoid massive changes—smaller, well-tested contributions are easier to review
117+
4. **Maintain quality:** Ensure code is tested and documented
118+
119+
Please refer to `CONTRIBUTING.md` for detailed guidelines.
47120

48121
## Citation
49122

123+
If you use CodeEvolve in your research, please cite our paper:
124+
50125
```bibtex
51126
@article{assumpção2025codeevolveopensourceevolutionary,
52-
title={CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization},
127+
title={CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization},
53128
author={Henrique Assumpção and Diego Ferreira and Leandro Campos and Fabricio Murai},
54129
year={2025},
55130
eprint={2510.14150},
56131
archivePrefix={arXiv},
57132
primaryClass={cs.AI},
58-
url={https://arxiv.org/abs/2510.14150},
133+
url={https://arxiv.org/abs/2510.14150},
59134
}
60135
```
61136

137+
## Acknowledgements
138+
139+
The authors thank Bruno Grossi for his continuous support during the development of this project. We thank Fernando Augusto and Tiago Machado for useful conversations about possible applications of CodeEvolve. We also thank the [OpenEvolve](https://github.com/codelion/openevolve) community for their inspiration and discussion about evolutionary coding agents.
140+
62141
## License and Disclaimer
63142

64143
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0.
65144

66-
**This is not an official Inter product.**
145+
**This is not an official Inter product.**

assets/codeevolve_diagram.png

-250 KB

scripts/clean_ckpts.sh

Lines changed: 0 additions & 61 deletions
This file was deleted.
