@@ -15,7 +15,7 @@ minimal changes to your existing code.
 
 .. code-block:: python
 
-    @Multiprocessing(n_workers=4).distribute
+    @Joblib(n_workers=4).distribute
     def objective(para):
         ...
 
@@ -39,6 +39,14 @@ The optimizer proposes ``n_workers`` positions per batch, the decorated function
 evaluates them in parallel, and the results are fed back. The initialization
 phase (where the optimizer evaluates starting positions) always runs serially.
 
+Async backends like Ray and Dask go one step further. Instead of waiting for
+an entire batch, they process results individually as workers complete and
+immediately submit new work. This keeps all workers busy at all times:
+
+.. code-block:: text
+
+    Async: submit(4) -> W1 done -> tell(1), submit(1) -> W3 done -> tell(1), submit(1) -> ...
+
 
 Basic Usage
 -----------
@@ -47,9 +55,9 @@ Basic Usage
 
     import numpy as np
     from gradient_free_optimizers import HillClimbingOptimizer
-    from gradient_free_optimizers.distributed import Multiprocessing
+    from gradient_free_optimizers.distributed import Joblib
 
-    @Multiprocessing(n_workers=4).distribute
+    @Joblib(n_workers=4).distribute
     def objective(para):
         return -(para["x"] ** 2 + para["y"] ** 2)
 
@@ -61,6 +69,14 @@ Basic Usage
     opt = HillClimbingOptimizer(search_space)
     opt.search(objective, n_iter=100)
 
+The decorator can also be applied without the ``@`` syntax, which is useful when
+the function is defined elsewhere:
+
+.. code-block:: python
+
+    distributed_objective = Ray(n_workers=8).distribute(objective)
+    opt.search(distributed_objective, n_iter=100)
+
 
 Machine Learning Example
 ------------------------
@@ -76,12 +92,12 @@ model independently:
     from sklearn.datasets import load_wine
 
     from gradient_free_optimizers import BayesianOptimizer
-    from gradient_free_optimizers.distributed import Multiprocessing
+    from gradient_free_optimizers.distributed import Joblib
 
     data = load_wine()
     X, y = data.data, data.target
 
-    @Multiprocessing(n_workers=4).distribute
+    @Joblib(n_workers=4).distribute
     def model(para):
         gbc = GradientBoostingClassifier(
             n_estimators=para["n_estimators"],
@@ -104,14 +120,96 @@ Available Backends
 
 .. list-table::
    :header-rows: 1
-   :widths: 25 25 50
+   :widths: 20 15 15 50
 
    * - Backend
+     - Async
      - Dependencies
      - Use case
    * - ``Multiprocessing``
+     - No
      - stdlib only
-     - Local parallelism on a single machine
+     - Local parallelism via fork. No extra dependencies needed.
+   * - ``Joblib``
+     - No
+     - joblib
+     - Local parallelism with automatic fork/spawn handling. Included with scikit-learn.
+   * - ``Ray``
+     - Yes
+     - ray
+     - Local or cluster-wide parallelism. Handles serialization via cloudpickle.
+   * - ``Dask``
+     - Yes
+     - dask[distributed]
+     - Local or cluster-wide parallelism. Integrates with existing Dask infrastructure.
+
+Sync backends (Multiprocessing, Joblib) evaluate a full batch and wait for all
+results before proposing the next batch. Async backends (Ray, Dask) process
+results individually as they arrive, keeping workers busy at all times.
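
The scheduling difference can be checked with a toy makespan calculation in plain Python. ``sync_makespan`` and ``async_makespan`` are hypothetical helpers written for this sketch, independent of any backend:

```python
import heapq

def sync_makespan(durations, n_workers):
    # Sync backends: each batch waits for its slowest evaluation
    # before the optimizer proposes the next batch.
    total = 0
    for i in range(0, len(durations), n_workers):
        total += max(durations[i:i + n_workers])
    return total

def async_makespan(durations, n_workers):
    # Async backends: greedy list scheduling, where each worker
    # that finishes immediately takes the next evaluation.
    workers = [0] * n_workers  # min-heap of worker finish times
    heapq.heapify(workers)
    for d in durations:
        t = heapq.heappop(workers)
        heapq.heappush(workers, t + d)
    return max(workers)

durations = [10, 40, 10, 40, 10, 40, 10, 40]  # mixed fast/slow evaluations
print(sync_makespan(durations, 4))   # 80: every batch pays for its slowest task
print(async_makespan(durations, 4))  # 70: fast workers pick up new work early
```

With uniform durations the two schedules coincide; the gap grows as evaluation times vary more, which matches the guidance below that async pays off for workloads with uneven runtimes.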
+
+
+Async Mode
+----------
+
+When using an async backend like Ray, most optimizers run in true async mode,
+where each completed evaluation immediately triggers a new proposal. Three
+optimizers (Downhill Simplex, Powell's Method, DIRECT) use a batch-async
+fallback, where positions are submitted asynchronously but collected per batch.
+This distinction is handled automatically.
+
+.. code-block:: python
+
+    from gradient_free_optimizers import ParticleSwarmOptimizer
+    from gradient_free_optimizers.distributed import Ray
+
+    @Ray(n_workers=8).distribute
+    def expensive_simulation(para):
+        # Each evaluation takes 10-60 seconds
+        return run_simulation(para)
+
+    opt = ParticleSwarmOptimizer(search_space, population=20)
+    opt.search(expensive_simulation, n_iter=200)
+
+True async is most beneficial when evaluation times vary widely: slow
+evaluations no longer block fast ones from being processed and replaced.
+
+
+Error Handling
+--------------
+
+The ``catch`` parameter works with distributed evaluation. Exceptions are caught
+inside each worker process, and the fallback score is returned in place of the
+failed evaluation:
+
+.. code-block:: python
+
+    @Joblib(n_workers=4).distribute
+    def flaky_model(para):
+        # Might fail for certain hyperparameter combinations
+        return train_and_evaluate(para)
+
+    opt.search(flaky_model, n_iter=100, catch={ValueError: -1000.0})
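
The worker-side behavior can be sketched in a few lines of plain Python. ``safe_call`` and ``flaky`` are hypothetical names invented for this illustration (using an exact-type lookup for simplicity), not the library's internals:

```python
def safe_call(func, para, catch):
    # Run the objective; if it raises a configured exception type,
    # return the mapped fallback score instead of propagating.
    try:
        return func(para)
    except tuple(catch) as exc:
        return catch[type(exc)]

def flaky(para):
    # Fails for part of the search space, like a bad hyperparameter combo.
    if para["x"] < 0:
        raise ValueError("invalid region")
    return -para["x"] ** 2

print(safe_call(flaky, {"x": -1.0}, {ValueError: -1000.0}))  # -1000.0
print(safe_call(flaky, {"x": 2.0}, {ValueError: -1000.0}))   # -4.0
```

Because the mapping happens inside the worker, a failed evaluation still produces a score and the optimization loop continues uninterrupted.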
+
+
+Storage Integration
+-------------------
+
+Distributed evaluation works with :doc:`storage backends <storage>`. Positions
+are checked against the cache before being dispatched to workers, and new
+results are stored after evaluation. This avoids redundant computation and
+enables crash recovery:
+
+.. code-block:: python
+
+    from gradient_free_optimizers.distributed import Joblib
+    from gradient_free_optimizers.storage import SQLiteStorage
+
+    @Joblib(n_workers=4).distribute
+    def model(para):
+        return expensive_training(para)
+
+    storage = SQLiteStorage("results.db")
+    opt.search(model, n_iter=100, memory=storage)
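
The check-then-store pattern can be illustrated with a plain dict as the cache. ``evaluate_batch`` and ``expensive`` are hypothetical helpers for this sketch, not the library's storage API:

```python
calls = 0

def expensive(para):
    # Stand-in for a costly objective; counts how often it actually runs.
    global calls
    calls += 1
    return -(para["x"] ** 2)

def evaluate_batch(func, params_batch, cache):
    # Check each position against the cache first; only misses are
    # evaluated (these would be dispatched to workers in a real backend).
    keys = [tuple(sorted(p.items())) for p in params_batch]
    for key, para in zip(keys, params_batch):
        if key not in cache:
            cache[key] = func(para)
    return [cache[key] for key in keys]

cache = {}
batch = [{"x": 1}, {"x": 2}, {"x": 1}]  # contains a duplicate position
scores = evaluate_batch(expensive, batch, cache)
print(scores, calls)  # [-1, -4, -1] 2
```

Only two evaluations run for three requested positions; with a persistent store like SQLite, the same lookup also survives a crash and restart.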
 
 
 Custom Backends
@@ -126,7 +224,6 @@ and implementing ``_distribute``:
 
     class MyClusterBackend(BaseDistribution):
         def _distribute(self, func, params_batch):
-            # Send evaluations to your cluster, collect scores
            scores = my_cluster.map(func, params_batch)
            return scores
 
@@ -138,6 +235,25 @@ The ``_distribute`` method receives the original objective function and a
 list of parameter dictionaries. It must return a list of scores in the
 same order.
 
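The ``_distribute`` contract can be sketched standalone with ``concurrent.futures``. ``ThreadBackend`` here is a hypothetical stand-in that skips the ``BaseDistribution`` subclassing so the snippet runs on its own:

```python
from concurrent.futures import ThreadPoolExecutor

class ThreadBackend:
    # Illustrative stand-in; a real backend would subclass BaseDistribution.
    # Only the _distribute contract is shown: params in, scores out, in order.
    def __init__(self, n_workers):
        self.n_workers = n_workers

    def _distribute(self, func, params_batch):
        # Evaluate the whole batch in parallel; pool.map preserves input order.
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            return list(pool.map(func, params_batch))

backend = ThreadBackend(n_workers=4)
scores = backend._distribute(lambda p: -(p["x"] ** 2),
                             [{"x": 0}, {"x": 1}, {"x": 2}])
print(scores)  # [0, -1, -4]
```

The key requirement is ordering: the i-th score must correspond to the i-th parameter dict, which ``pool.map`` guarantees even when evaluations finish out of order.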
+For async support, also implement ``_submit`` and ``_wait_any`` and set
+``_is_async = True``:
+
+.. code-block:: python
+
+    class MyAsyncBackend(BaseDistribution):
+        _is_async = True
+
+        def _distribute(self, func, params_batch):
+            futures = [self._submit(func, p) for p in params_batch]
+            return [f.result() for f in futures]
+
+        def _submit(self, func, params):
+            return my_cluster.submit(func, params)
+
+        def _wait_any(self, futures):
+            done = my_cluster.wait_any(futures)
+            return done, done.result()
+
 
 Batch Size and Algorithm Interaction
 ------------------------------------
@@ -157,18 +273,16 @@ are evaluated simultaneously:
     opt = HillClimbingOptimizer(search_space, n_neighbours=5)
     opt.search(objective, n_iter=100)  # objective has n_workers=8
 
+For surrogate model-based optimizers (Bayesian Optimization, TPE, Forest
+Optimizer), batch positions are selected using KMeans clustering on the
+acquisition landscape to ensure diversity. Without this, all batch positions
+would cluster around the single highest acquisition peak.
+
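The clustering idea can be sketched with scikit-learn's ``KMeans``: cluster a pool of promising candidates and take the best one per cluster, so the batch spreads over several acquisition peaks. ``diverse_batch`` and the quadratic stand-in acquisition are hypothetical, not the library's exact selection code:

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_batch(candidates, acq_values, batch_size):
    # Keep a pool of the most promising candidates, cluster them,
    # then pick the highest-acquisition candidate from each cluster.
    order = np.argsort(acq_values)[::-1]
    top = order[: max(batch_size * 10, batch_size)]
    km = KMeans(n_clusters=batch_size, n_init=10, random_state=0)
    km.fit(candidates[top])
    picks = []
    for c in range(batch_size):
        members = top[km.labels_ == c]          # candidate indices in cluster c
        picks.append(int(members[np.argmax(acq_values[members])]))
    return picks

rng = np.random.default_rng(0)
cands = rng.uniform(-5.0, 5.0, size=(200, 2))
acq = -np.sum(cands ** 2, axis=1)  # stand-in acquisition values, peak at origin
batch = diverse_batch(cands, acq, batch_size=4)
print(len(batch), len(set(batch)))  # 4 4
```

Greedily taking the four highest ``acq`` values would pick four near-identical points around the peak; the per-cluster selection returns four distinct positions instead.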
 
 Limitations
 -----------
 
-The current implementation has a few constraints:
-
-**Memory caching** is not supported with distributed evaluation. When a
-distributed decorator is detected, memory is automatically disabled.
-
-**The catch parameter** for error handling is not yet supported in
-distributed mode. Worker exceptions propagate directly.
-
 **The objective function** must be defined at module level (not a lambda
 or closure) for the ``Multiprocessing`` backend on systems that do not
-support the ``fork`` start method.
+support the ``fork`` start method. Ray and Dask handle closures via
+cloudpickle.