# Neural-Guided Genetic Programming for Symbolic Regression

A Unity-based implementation of advanced evolutionary algorithms for discovering mathematical expressions from data, enhanced with neural network guidance and multi-objective optimization.

## Mathematical Foundation

### Expression Representation

Expressions are represented as binary trees where:

- **Internal nodes**: Mathematical operators ∈ {+, −, ×, ÷, sin, cos, log, exp, ^, √}
- **Leaf nodes**: Variables (x) or constants (c ∈ ℝ)

Example: f(x) = x² + 1 → Tree(+, Tree(^, x, 2), 1)

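Evaluating such a tree is a depth-first recursive walk. The project itself is C#, but the idea can be sketched in a few lines of Python (the tuple encoding and function names here are illustrative, not the repository's API):

```python
import math

# Illustrative encoding: internal node = (op, left, right); unary ops use right=None;
# leaves are the variable "x" or a numeric constant.
def evaluate(node, x):
    if node == "x":
        return x
    if isinstance(node, (int, float)):
        return float(node)
    op, left, right = node
    a = evaluate(left, x)
    b = evaluate(right, x) if right is not None else None
    if op == "+": return a + b
    if op == "-": return a - b
    if op == "*": return a * b
    if op == "/": return a / b if b != 0 else 1.0  # protected division
    if op == "^": return a ** b
    if op == "sin": return math.sin(a)
    if op == "cos": return math.cos(a)
    raise ValueError(f"unknown operator: {op}")

# f(x) = x² + 1  →  Tree(+, Tree(^, x, 2), 1)
tree = ("+", ("^", "x", 2.0), 1.0)
```

Protected division (returning a neutral value on division by zero) is a common GP convention; whether this repository uses exactly that rule is an assumption.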
### Fitness Function

The fitness φ of an individual is computed as:

```
φ = -(MSE + λ·C)
```

where:
- MSE = (1/n)Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)² is the mean squared error
- C = number of nodes in the expression tree (complexity measure)
- λ ∈ ℝ⁺ is the complexity penalty weight
- n is the number of data points

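The scalarized fitness translates directly into code; a minimal Python sketch of the formula above (function name and signature are illustrative, not the repository's API):

```python
def fitness(predictions, targets, num_nodes, lam):
    """phi = -(MSE + lambda * C): higher fitness is better, and larger
    trees (num_nodes = C) are penalized by the weight lam (lambda)."""
    n = len(targets)
    mse = sum((y - yhat) ** 2 for y, yhat in zip(targets, predictions)) / n
    return -(mse + lam * num_nodes)
```

With a perfect fit (MSE = 0), fitness reduces to -λ·C, so among equally accurate candidates the smaller tree wins.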
### Multi-Objective Optimization

The system implements NSGA-II (Non-dominated Sorting Genetic Algorithm II) to optimize two competing objectives:

| Objective | Goal | Formula |
|-----------|------|---------|
| Accuracy | Minimize prediction error | min MSE(f) |
| Simplicity | Minimize expression complexity | min \|nodes(f)\| |

Pareto dominance: f₁ ≻ f₂ iff MSE(f₁) ≤ MSE(f₂) ∧ C(f₁) ≤ C(f₂) ∧ (MSE(f₁) < MSE(f₂) ∨ C(f₁) < C(f₂))

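The dominance relation is the core comparison inside NSGA-II's non-dominated sort; it translates into a one-line predicate (a Python sketch; names are illustrative):

```python
def dominates(f1, f2):
    """True iff f1 Pareto-dominates f2, where each objective
    vector is (mse, complexity) and both objectives are minimized."""
    mse1, c1 = f1
    mse2, c2 = f2
    return (mse1 <= mse2 and c1 <= c2) and (mse1 < mse2 or c1 < c2)
```

Note that two candidates with a trade-off (one more accurate, the other simpler) are mutually non-dominated, which is what yields a Pareto front rather than a single winner.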
## Algorithm Components

### Evolutionary Operators

| Operator | Type | Probability | Description |
|----------|------|-------------|-------------|
| Constant Mutation | Point | 0.25 | δc ~ 𝒰(-σ, σ), c' = c + δc |
| Subtree Mutation | Structural | 0.30 | Replace random subtree with new random tree |
| Node Deletion | Structural | 0.20 | Remove node, promote child |
| Simplification | Algebraic | 0.25 | Collapse constant-only subtrees |
| Crossover | Recombination | 0.70 | Exchange random subtrees between parents |

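As an example of the simplest operator, constant mutation perturbs a leaf constant by uniform noise per the δc ~ 𝒰(-σ, σ) rule in the table (a Python sketch; the repository's actual implementation is C#):

```python
import random

def mutate_constant(c, sigma):
    """Constant (point) mutation: c' = c + delta_c with delta_c ~ U(-sigma, sigma)."""
    return c + random.uniform(-sigma, sigma)
```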
### Island Model Architecture

The system employs a parallel island model for population diversity:

- Islands: {I₁, I₂, I₃, I₄}
- Topology: Ring (Iᵢ → I₍ᵢ₊₁₎ mod 4)
- Migration interval: τ = 20 generations
- Migration rate: m = 3 individuals per island

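Under these settings, every τ generations each island passes its m best individuals one step around the ring. A sketch of a single migration step, representing each island as a list of fitness values (the replace-the-worst policy is an assumption, not necessarily the repository's exact rule):

```python
def migrate(islands, m):
    """Ring migration: island i sends its m fittest to island (i+1) mod k.
    Each island is a list of fitness values; higher is better.
    Incoming migrants replace the m worst residents (assumed policy)."""
    k = len(islands)
    # Select emigrants from every island before modifying any of them.
    emigrants = [sorted(isl, reverse=True)[:m] for isl in islands]
    for i, group in enumerate(emigrants):
        dest = islands[(i + 1) % k]
        dest.sort()        # ascending: worst residents first
        dest[:m] = group   # replace the m worst with the migrants
```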
Each island maintains different evolutionary parameters:

| Island | Population | Mutation Rate μ | Complexity Weight λ | Max Depth d |
|--------|-----------|----------------|---------------------|-------------|
| I₁ | 250 | 0.70 | 1.0 | 4 |
| I₂ | 250 | 0.75 | 1.5 | 5 |
| I₃ | 250 | 0.80 | 2.0 | 4 |
| I₄ | 250 | 0.85 | 2.5 | 5 |

### Neural Guidance System

A shallow neural network predicts expression fitness before evaluation:

- Architecture: input layer → hidden layer → output layer
- Activation: ReLU(x) = max(0, x)
- Training: online gradient descent with learning rate η

Candidates that score worse than the incumbent are still accepted occasionally, following a simulated-annealing schedule:

```
P(accept worse solution) = exp(Δφ / T)
T(t) = T₀ · αᵗ
```

where T₀ = 1.0, α = 0.995 (cooling rate)

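The acceptance rule and geometric cooling schedule above are one-liners; a Python sketch (with Δφ = φ_new − φ_old, so Δφ < 0 for a worse candidate; function names are illustrative):

```python
import math
import random

def accept(delta_phi, temperature):
    """Always accept improvements; accept worse solutions with
    probability exp(delta_phi / T), which shrinks as T cools."""
    if delta_phi >= 0:
        return True
    return random.random() < math.exp(delta_phi / temperature)

def cooled_temperature(t0, alpha, generation):
    """Geometric cooling: T(t) = T0 * alpha**t."""
    return t0 * alpha ** generation
```

With T₀ = 1.0 and α = 0.995, the temperature halves after roughly 138 generations (0.995¹³⁸ ≈ 0.5), so late-run search becomes nearly greedy.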
## Performance Characteristics

### Computational Complexity

| Operation | Time Complexity | Space Complexity |
|-----------|----------------|------------------|
| Tree evaluation | O(n·m) | O(d) |
| Population evolution | O(p·n·m) | O(p·d) |
| NSGA-II sorting | O(p²·k) | O(p) |
| Neural training | O(e·s·h²) | O(h²) |

where:
- n = number of data points
- m = average expression size
- d = tree depth
- p = population size
- k = number of objectives (2)
- e = training epochs
- s = training set size
- h = hidden layer size

### Benchmark Results

| Function | Expression | Generations | Final MSE | Time (s) |
|----------|------------|-------------|-----------|----------|
| Linear | 2x + 1 | 15 | 1.2×10⁻⁸ | 0.8 |
| Quadratic | x² + 1 | 28 | 3.4×10⁻⁷ | 1.5 |
| Trigonometric | sin(x) | 142 | 2.1×10⁻⁴ | 7.3 |
| Rational | (x+1)/(x+2) | 89 | 1.8×10⁻⁵ | 4.2 |

## Installation

### Prerequisites

- Unity 2021.3 LTS or later
- .NET Framework 4.7.1+
- TextMeshPro package

### Setup

1. Clone the repository:
   ```
   git clone https://github.com/InboraStudio/genetic-programming-symbolic-regression.git
   cd genetic-programming-symbolic-regression
   ```
2. Open the project in Unity
3. Install TextMeshPro:
   - Window → Package Manager → Search "TextMesh Pro" → Install
4. Import all scripts into Assets/Scripts/

## Usage

### Basic Configuration

```cs
// Create test dataset
float[] inputs = new float[] { 0, 1, 2, 3, 4, 5 };
float[] outputs = new float[] { 1, 2, 5, 10, 17, 26 }; // x² + 1

// Configure GP parameters
numberOfIslands = 4;
populationPerIsland = 250;
maxGenerations = 1000;
deathRate = 0.7f;
complexityWeight = 2.0f;
```

### Running Evolution

1. Attach `AdvancedSymbolicRegressionGP` to a GameObject
2. Configure parameters in the Inspector
3. Press Play
4. Monitor the Console for evolution progress

### Visualization

Attach `RealTimeGraphSystem` for live plotting:
- Top-left: Evolution metrics (fitness, MSE, complexity)
- Top-right: Function fit comparison
- Bottom: Current best expression and statistics

## Repository Structure

## Algorithm Configuration Parameters

### Recommended Settings

| Parameter | Small Dataset (<50 pts) | Medium (50-500) | Large (>500) |
|-----------|------------------------|-----------------|--------------|
| Population/Island | 100-250 | 250-500 | 500-1000 |
| Max Generations | 100-500 | 500-1000 | 1000-5000 |
| Complexity Weight λ | 1.0-2.0 | 2.0-5.0 | 0.5-1.0 |
| Migration Interval | 10-20 | 20-50 | 50-100 |

### Parameter Sensitivity

High-impact parameters (tune first):
- Population size: Linear relationship with solution quality
- Complexity weight: Balances accuracy vs. simplicity
- Death rate: Controls selection pressure

Low-impact parameters (use defaults):
- Migration topology: Ring vs. Star (< 5% difference)
- Neural guidance interval: 30-100 generations
- Cooling rate: 0.99-0.999

## Theoretical Background

### References

This implementation is based on the following research:

1. **Genetic Programming**: Koza, J. R. (1992). *Genetic Programming: On the Programming of Computers by Means of Natural Selection*. MIT Press.

2. **Island Models**: Whitley, D., Rana, S., & Heckendorn, R. B. (1998). *The Island Model Genetic Algorithm: On Separability, Population Size and Convergence*. Journal of Computing and Information Technology.

3. **NSGA-II**: Deb, K., et al. (2002). *A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II*. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.

4. **Symbolic Regression**: Schmidt, M., & Lipson, H. (2009). *Distilling Free-Form Natural Laws from Experimental Data*. Science, 324(5923), 81-85.

5. **Neural-Guided Evolution**: Lample, G., & Charton, F. (2019). *Deep Learning for Symbolic Mathematics*. arXiv:1912.01412.

## Applications

- **Physics**: Discovery of governing equations from experimental data
- **Engineering**: System identification and model fitting
- **Economics**: Empirical relationship modeling
- **Biology**: Gene regulatory network inference
- **Data Science**: Feature engineering and transformation discovery

## Performance Optimization

### Parallel Execution

The island model enables parallel evolution:

```cs
Parallel.ForEach(islands, island => {
    island.EvolveGeneration(inputData, outputData, deathRate, eliteCount);
});
```

Expected speedup: S ≈ N_islands / (1 + communication_overhead)

### Memory Usage

Approximate memory footprint:

```
Memory ≈ (p × d × 4 bytes) × N_islands + (h² × 4 bytes)
       ≈ (1000 × 20 × 4) × 4 + (64² × 4)
       ≈ 336 KB
```

## Known Limitations

1. **Discontinuous Functions**: Struggles with step functions and piecewise definitions
2. **High-Frequency Oscillations**: May underfit rapidly oscillating functions
3. **Dimensional Analysis**: Does not enforce physical unit consistency
4. **Computational Scaling**: O(n·m) evaluation cost limits large datasets

## Future Enhancements

- [ ] Multi-variable support (f: ℝⁿ → ℝ)
- [ ] Dimensional analysis constraints
- [ ] GPU-accelerated tree evaluation
- [ ] Automatic operator selection based on domain
- [ ] Integration with Unity ML-Agents

## Contributing

Contributions are welcome. Please follow these guidelines:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/YourFeature`
3. Maintain code documentation standards
4. Add unit tests for new functionality
5. Submit a pull request with a detailed description

## License

MIT License - see the LICENSE file for details

## Citation

If you use this software in your research, please cite:

```bibtex
@software{inbora_gp_2025,
  author    = {Dr Chamyoung (Alok)},
  title     = {Neural-Guided Genetic Programming for Symbolic Regression},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/InboraStudio/genetic-programming-symbolic-regression}
}
```