Commit eae8e6d

add cma-es to docs

1 parent 5209fca commit eae8e6d

File tree

4 files changed: +228 -1 lines changed

docs/source/_static/gifs

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+../../gifs

docs/source/api_reference.rst

Lines changed: 3 additions & 1 deletion

@@ -11,7 +11,7 @@ Complete API documentation for all optimizers and their methods.
 Overview
 ========

-All 22 optimizers in Gradient-Free-Optimizers share a common interface,
+All 23 optimizers in Gradient-Free-Optimizers share a common interface,
 making it easy to switch between algorithms without changing your code.

 .. code-block:: python

@@ -82,6 +82,7 @@ Optimizer Categories
 - :class:`~gradient_free_optimizers.GeneticAlgorithmOptimizer`
 - :class:`~gradient_free_optimizers.EvolutionStrategyOptimizer`
 - :class:`~gradient_free_optimizers.DifferentialEvolutionOptimizer`
+- :class:`~gradient_free_optimizers.CMAESOptimizer`

 .. grid-item-card:: Sequential Model-Based
    :class-card: sd-border-start sd-border-warning

@@ -118,6 +119,7 @@ Optimizers
    gradient_free_optimizers.GeneticAlgorithmOptimizer
    gradient_free_optimizers.EvolutionStrategyOptimizer
    gradient_free_optimizers.DifferentialEvolutionOptimizer
+   gradient_free_optimizers.CMAESOptimizer
    gradient_free_optimizers.BayesianOptimizer
    gradient_free_optimizers.TreeStructuredParzenEstimators
    gradient_free_optimizers.ForestOptimizer
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
======
CMA-ES
======

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a state-of-the-art
evolutionary algorithm for continuous optimization. It maintains a multivariate
normal distribution over the search space and adapts a full covariance matrix to
learn the correlation structure of the fitness landscape. Each generation, the
algorithm samples candidate solutions, ranks them by fitness, shifts the
distribution mean toward the best solutions, and updates the covariance matrix
using evolution paths. A cumulative step-size adaptation mechanism controls the
global step size.

CMA-ES is widely regarded as the default algorithm for continuous black-box
optimization in moderate dimensions (up to ~100). Unlike simpler evolution
strategies that use only a scalar or diagonal step size, CMA-ES learns
arbitrarily rotated and scaled ellipsoidal sampling distributions. This makes it
particularly effective when parameters are correlated or have very different
sensitivities. For example, if increasing ``x`` should be accompanied by
decreasing ``y`` to improve the objective, CMA-ES learns this relationship
and samples accordingly.

For mixed search spaces with discrete or categorical dimensions, this
implementation samples in a normalized continuous space and maps each sample
back to a valid grid value by rounding, similar in spirit to mixed-integer
CMA-ES (MI-CMA-ES) variants.

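The continuous-to-grid mapping can be sketched as follows; ``to_grid`` is a
hypothetical helper used only for illustration, not the library's internal API:

```python
import numpy as np

def to_grid(x_norm, grid):
    """Map a normalized coordinate in [0, 1] to the nearest valid grid value
    (simplified illustration of sampling in a normalized continuous space)."""
    x_norm = np.clip(x_norm, 0.0, 1.0)       # keep out-of-range samples valid
    idx = int(round(x_norm * (len(grid) - 1)))  # round to the nearest index
    return grid[idx]

grid = np.linspace(-5, 5, 11)  # valid values: -5, -4, ..., 5
print(to_grid(0.0, grid))      # lowest grid value
print(to_grid(0.52, grid))     # rounds to the middle of the grid
print(to_grid(1.0, grid))      # highest grid value
```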
Algorithm
---------

Each generation:

1. **Sample**: Draw ``population`` candidates from :math:`\mathcal{N}(m, \sigma^2 C)`
2. **Evaluate**: Score all candidates
3. **Rank**: Sort by fitness, select the best ``mu``
4. **Update mean**: Shift ``m`` toward the weighted mean of the best ``mu``
5. **Update evolution paths**: Accumulate step information (``p_sigma``, ``p_c``)
6. **Update covariance**: Rank-one update (from ``p_c``) plus rank-mu update (from the selected solutions)
7. **Adapt step size**: Increase sigma if successive steps are correlated, decrease it if they oscillate

.. code-block:: text

    x_k      = mean + sigma * B @ D @ z_k    # sample (z_k ~ N(0, I))
    mean_new = sum(w_i * x_i, i = 1..mu)     # weighted recombination of the mu best
    p_sigma  = (1 - c_s) * p_sigma + ...     # evolution path for step size
    p_c      = (1 - c_c) * p_c + ...         # evolution path for covariance
    C        = (1 - c_1 - c_mu) * C + c_1 * p_c @ p_c.T + c_mu * rank_mu_update
    sigma    = sigma * exp(c_s/d_s * (||p_sigma|| / E||N(0, I)|| - 1))

The covariance matrix ``C`` is decomposed as :math:`C = B D^2 B^T`, where ``B``
holds the eigenvectors (rotation) and ``D`` the square roots of the eigenvalues
(axis lengths).

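The sampling and recombination steps above can be sketched in NumPy. This is a
simplified illustration with a fixed sigma and no evolution-path or covariance
updates, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_generation(objective, mean, sigma, C, population, mu):
    """One simplified CMA-ES generation: sample, rank, weighted recombination."""
    n = mean.size
    # Decompose C = B D^2 B^T: eigh returns eigenvalues (D^2) and eigenvectors (B)
    eigvals, B = np.linalg.eigh(C)
    D = np.sqrt(np.maximum(eigvals, 0.0))
    # Sample x_k = mean + sigma * B @ (D * z_k) with z_k ~ N(0, I)
    z = rng.standard_normal((population, n))
    X = mean + sigma * (z * D) @ B.T
    # Evaluate and rank (higher score is better, as in Gradient-Free-Optimizers)
    scores = np.array([objective(x) for x in X])
    best = np.argsort(scores)[::-1][:mu]
    # Log-decreasing recombination weights over the mu best candidates
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    return w @ X[best]

# Move the mean toward the optimum of -||x||^2 at the origin
mean, C = np.array([2.0, -1.5]), np.eye(2)
for _ in range(30):
    mean = one_generation(lambda x: -np.sum(x**2), mean, 0.5, C, population=12, mu=6)
print(mean)
```

Even without covariance adaptation, truncation selection plus weighted
recombination pulls the mean toward the optimum; the full algorithm adds the
path and covariance updates on top of this loop.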
.. note::

   CMA-ES automatically sets most internal parameters (learning rates,
   weights, damping) from the dimensionality and population size. You
   typically only need to set ``population``, ``sigma``, and optionally
   ``ipop_restart``.

Parameters
----------

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 50

   * - Parameter
     - Type
     - Default
     - Description
   * - ``population``
     - int | None
     - None
     - Candidates per generation (lambda). ``None`` uses ``4 + floor(3 * ln(n))``.
   * - ``mu``
     - int | None
     - None
     - Number of parents selected. ``None`` uses ``population // 2``.
   * - ``sigma``
     - float
     - 0.3
     - Initial step size as a fraction of the normalized space.
   * - ``ipop_restart``
     - bool
     - False
     - Enable IPOP restart on stagnation (doubles the population each restart).

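The default ``population`` heuristic from the table can be computed directly, as
a quick sanity check of ``4 + floor(3 * ln(n))``:

```python
import math

def default_population(n_dims):
    """Default lambda from the table: 4 + floor(3 * ln(n))."""
    return 4 + int(math.floor(3 * math.log(n_dims)))

for n in (2, 10, 100):
    print(n, default_population(n))  # grows only logarithmically with n
```

Because the default grows so slowly, explicitly raising ``population`` is the
usual knob for harder multi-modal problems.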
Step Size (sigma)
^^^^^^^^^^^^^^^^^

``sigma`` sets the initial spread of samples. CMA-ES adapts the step size
automatically, so the starting value is not critical.

.. code-block:: python

    # Conservative start (fine-tuning around a known good region)
    opt = CMAESOptimizer(search_space, sigma=0.1)

    # Broad initial exploration
    opt = CMAESOptimizer(search_space, sigma=0.5)

IPOP Restart
^^^^^^^^^^^^

When stagnation is detected, IPOP restarts the search with a doubled population
and a new random starting point. This is effective for multi-modal landscapes,
where a single run may converge to a suboptimal local optimum.

.. code-block:: python

    opt = CMAESOptimizer(
        search_space,
        ipop_restart=True,
    )

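The restart schedule can be sketched generically. Here ``run_single`` is a
stand-in stub (plain random search with a stagnation check), used only to
illustrate the population-doubling logic, not the library's internals:

```python
import numpy as np

rng = np.random.default_rng(42)

def run_single(objective, population, max_gens=50, tol=1e-8):
    """Stand-in for one CMA-ES run: stops once the best score has not
    improved for 10 consecutive generations (stagnation)."""
    best, stalled = -np.inf, 0
    for _ in range(max_gens):
        gen_best = objective(rng.uniform(-5, 5, size=(population, 2))).max()
        stalled = stalled + 1 if gen_best <= best + tol else 0
        best = max(best, gen_best)
        if stalled >= 10:
            break
    return best

def ipop_search(objective, population=8, restarts=4):
    """IPOP: after each stagnated run, restart with a doubled population."""
    best = -np.inf
    for _ in range(restarts):
        best = max(best, run_single(objective, population))
        population *= 2  # key IPOP ingredient: double lambda per restart
    return best

score = ipop_search(lambda X: -np.sum(X**2, axis=1))
print(score)
```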
Example
-------

.. code-block:: python

    import numpy as np
    from gradient_free_optimizers import CMAESOptimizer

    def rosenbrock(para):
        x, y = para["x"], para["y"]
        return -(100 * (y - x**2)**2 + (1 - x)**2)

    search_space = {
        "x": np.linspace(-5, 5, 1000),
        "y": np.linspace(-5, 5, 1000),
    }

    opt = CMAESOptimizer(
        search_space,
        population=20,
        sigma=0.3,
    )

    opt.search(rosenbrock, n_iter=500)
    print(f"Best: {opt.best_para}, Score: {opt.best_score}")

When to Use
-----------

**Good for:**

- Continuous optimization with correlated parameters
- Problems where parameter sensitivities differ strongly
- Moderate dimensionality (2-100 dimensions)
- Multi-modal landscapes (with ``ipop_restart=True``)

**Not ideal for:**

- Very high dimensions (>100), where the covariance matrix becomes expensive
- Purely discrete/combinatorial problems (GA or DE are better suited)
- Very tight iteration budgets (CMA-ES needs several generations to adapt)

**Compared to other population-based optimizers:**

- CMA-ES vs. ES: CMA-ES adapts a full covariance matrix; ES uses scalar/diagonal step sizes
- CMA-ES vs. PSO: CMA-ES models the landscape shape; PSO uses velocity/social dynamics
- CMA-ES vs. DE: CMA-ES learns correlations explicitly; DE derives steps from population differences

High-Dimensional Example
------------------------

.. code-block:: python

    import numpy as np
    from gradient_free_optimizers import CMAESOptimizer

    def ellipsoid(para):
        total = 0
        for i, key in enumerate(sorted(para)):
            total += (10 ** (2 * i / 9)) * para[key] ** 2
        return -total

    search_space = {
        f"x{i}": np.linspace(-5, 5, 200)
        for i in range(10)
    }

    opt = CMAESOptimizer(
        search_space,
        population=30,
        sigma=0.3,
        ipop_restart=True,
    )

    opt.search(ellipsoid, n_iter=2000)
    print(f"Best score: {opt.best_score}")

Trade-offs
----------

- **Exploration vs. exploitation**: sigma controls the initial spread; the
  algorithm self-adapts over time. IPOP restart adds macro-level exploration.
- **Computational overhead**: each generation, CMA-ES performs an
  eigendecomposition of the covariance matrix (O(n^3)), making it expensive in
  very high dimensions.
- **Population size**: larger populations improve robustness on multi-modal
  problems but require more evaluations per generation. The default heuristic
  is a good starting point.

Related Algorithms
------------------

- :doc:`evolution_strategy` - Simpler ES with mutation-based search
- :doc:`differential_evolution` - Self-adaptive step sizes from population differences
- :doc:`particle_swarm` - Swarm-based approach with velocity dynamics

docs/source/user_guide/optimizers/population/index.rst

Lines changed: 7 additions & 0 deletions
@@ -28,6 +28,8 @@ Overview
     - Population of hill climbers with occasional mixing.
   * - :doc:`differential_evolution`
     - Creates new solutions from weighted differences.
+  * - :doc:`cma_es`
+    - Adapts a full covariance matrix to learn landscape structure.


 When to Use Population-Based

@@ -120,6 +122,10 @@ Algorithm Comparison
     - Difference vectors
     - Non-linear continuous
     - Self-adaptive steps
+  * - CMA-ES
+    - Covariance matrix
+    - Correlated continuous
+    - Full covariance adaptation


 Conceptual Comparison

@@ -217,3 +223,4 @@ Algorithms
    genetic_algorithm
    evolution_strategy
    differential_evolution
+   cma_es
