DistributedFunSearch uses Docker Compose to run two containers: disfun-main (pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime) for the evolutionary search with GPU support, and rabbitmq (rabbitmq:3.13.4-management) for message passing. Both containers communicate via a Docker bridge network.
CUDA Compatibility: The devcontainer uses PyTorch 2.2.2 with CUDA 12.1. Check your server's CUDA version with nvidia-smi, then update the base image in .devcontainer/Dockerfile to match (e.g., cuda11.8 or cuda12.4). Compatible base images are listed on the pytorch/pytorch Docker Hub page.
Start the containers from the .devcontainer directory:
```bash
cd .devcontainer
docker compose up --build -d
docker exec -it disfun-main bash
```

Inside the container, initialize conda and create the environment:
```bash
conda init bash && source ~/.bashrc
conda create -n env python=3.11 pip numpy==1.26.4 -y
conda activate env
```

Install PyTorch matching your CUDA version. For CUDA 12.1:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

For other CUDA versions, find the matching installation command on the PyTorch "Get Started" page. You can skip this step if using API models.
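The wheel index URL above simply encodes the CUDA version. The helper below is our own illustration (not part of DistributedFunSearch), and the generated URL should still be verified against the PyTorch site before use:

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version string (as reported by nvidia-smi) to the
    PyTorch pip index URL, e.g. '12.1' -> .../whl/cu121."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

print(pytorch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```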
Install DistributedFunSearch:
```bash
cd /workspace/DistributedFunSearch
pip install .  # or pip install -e . for development mode
```

Build the FastGraph C++ module (optional, for graph problems):
This compiles for your active Python version (e.g., cpython-311 for Python 3.11). If nodes have different Python versions, build on each node separately.
First install build dependencies:
```bash
apt-get update && apt-get install -y build-essential liblmdb-dev
```

Then build:

```bash
./tools/build_fast_graph.sh
```

Install a C compiler (required for local models):
If using local models with vLLM, install gcc/g++ (required for Triton to compile CUDA kernels):
```bash
conda install -c conda-forge gcc_linux-64 gxx_linux-64 -y
```

Before running the experiment, update your config to use the Docker RabbitMQ service name:

```python
# Edit src/experiments/experiment1/config.py
# Change: host='localhost'
# To:     host='rabbitmq'
```

Run an experiment:
```bash
cd src/experiments/experiment1
python -m disfun
```

The web-based monitoring dashboard is enabled by default and available at http://localhost:15672 with login credentials guest/guest.
If running on a remote server, the interface is not directly accessible. Forward port 15672 with an SSH tunnel started from your local machine:
```bash
# Standard SSH tunnel
ssh -L 15672:localhost:15672 user@remote-server -N -f

# With a jump server
ssh -J jump-user@jump-server -L 15672:localhost:15672 user@remote-server -N -f
```

Then access http://localhost:15672 on your local machine and log in with guest/guest.
To run parallel experiments without interference, use RabbitMQ virtual hosts. Set a different vhost in each experiment's config.py (e.g., vhost='exp1', vhost='exp2'), then create the vhost and set permissions:
```bash
docker exec rabbitmq rabbitmqctl add_vhost exp1
docker exec rabbitmq rabbitmqctl set_permissions -p exp1 guest ".*" ".*" ".*"
```

Repeat for each experiment with a different vhost name. Each experiment then has completely isolated queues.
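For reference, AMQP clients select a vhost through the path component of the connection URL. The sketch below is purely illustrative (DistributedFunSearch's config passes host and vhost as separate fields rather than as a URL):

```python
from urllib.parse import quote

def amqp_url(host: str, vhost: str, user: str = "guest",
             password: str = "guest", port: int = 5672) -> str:
    """Build an AMQP connection URL. The vhost lives in the URL path and
    must be percent-encoded (the default vhost '/' becomes %2F)."""
    return f"amqp://{user}:{password}@{host}:{port}/{quote(vhost, safe='')}"

print(amqp_url("rabbitmq", "exp1"))  # amqp://guest:guest@rabbitmq:5672/exp1
```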
To scale across multiple machines, run RabbitMQ and the ProgramsDatabase on a main node, then attach additional samplers and evaluators from worker nodes. The main node uses .devcontainer/ (runs both RabbitMQ and disfun-main), while worker nodes use .devcontainer/external/.devcontainer/ (runs only disfun-main).
Main node setup:
Start both containers and run the full experiment (includes ProgramsDatabase, samplers, and evaluators):
```bash
cd /workspace/DistributedFunSearch/.devcontainer
docker compose up --build -d
docker exec -it disfun-main bash
# Follow the installation steps above, then:
cd src/experiments/experiment1
python -m disfun
```

Worker node setup:
Requirement: Worker nodes must be able to reach the main node on port 5672 (RabbitMQ).
Get the main node's IP (run on main node):
```bash
hostname -I | awk '{print $1}'
```

Test connectivity (run on the worker node):

```bash
python -c "import socket; s=socket.socket(); s.connect(('<main-node-ip>', 5672)); print('OK'); s.close()"
```

Start the external devcontainer, which uses network_mode: "host" to share the host's network:
```bash
cd /workspace/DistributedFunSearch/.devcontainer/external/.devcontainer
docker compose up --build -d
docker exec -it disfun-main bash
```

Inside the worker container, follow the installation steps above (conda env, PyTorch, DistributedFunSearch). Update the RabbitMQ host to point to the main node:
```python
# Edit src/experiments/experiment1/config.py
# Change: host='localhost'
# To:     host='192.168.1.10'  # main node's IP or hostname
```

Then attach only samplers and evaluators (don't run the full experiment, which would create a duplicate ProgramsDatabase):
```bash
cd src/experiments/experiment1

# Attach evaluators only
python -m disfun --attach evaluators

# Or attach samplers only
python -m disfun --attach samplers
```
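Workers fail immediately if the main node's RabbitMQ is not yet accepting connections, so it can help to poll port 5672 before attaching. This generic helper is a sketch of our own, not part of DistributedFunSearch (the host IP in the comment is an example):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)  # broker not up yet; retry
    return False

# Example (hypothetical main-node IP): abort early if the broker is unreachable.
# if not wait_for_port("192.168.1.10", 5672):
#     raise SystemExit("RabbitMQ not reachable; check firewall rules for port 5672")
```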