This project uses Hydra Launchers to enable parallel execution of parameter sweeps locally or on a SLURM cluster via the hydra-submitit-launcher plugin.
When you run an experiment using the --multirun (or -m) flag, Hydra generates a list of jobs based on the parameter combinations you specify.
# Example: This generates 4 jobs
uv run lambda-hat-m model=small,base data=small,baseBy default, Hydra runs these jobs sequentially. To run them in parallel, you need to select a different launcher.
Launcher configurations are defined in conf/hydra/launcher/.
To dispatch jobs to a SLURM cluster, select the submitit_slurm launcher.
Important: You must specify the SLURM partition required by your cluster using the hydra.launcher.partition override.
uv run lambda-hat --multirun \
hydra/launcher=submitit_slurm \
hydra.launcher.partition=YOUR_PARTITION_NAME \
data.n_data=1000,5000 \
model=small,baseThis command submits 4 independent jobs to the specified SLURM partition.
The default resources requested per job are defined in the submitit_slurm.yaml configuration file. You can override these settings via the command line:
uv run lambda-hat --multirun hydra/launcher=submitit_slurm \
hydra.launcher.partition=YOUR_PARTITION_NAME \
hydra.launcher.timeout_min=480 \
hydra.launcher.cpus_per_task=8 \
hydra.launcher.gpus_per_node=1 \
+hydra.launcher.additional_parameters.gpus=1If your cluster requires an account, use hydra.launcher.account=YOUR_ACCOUNT.
Example with GPU on a cluster that needs --gpus=1:
uv run lambda-hat --multirun sampler=hmc \
hydra/launcher=submitit_slurm \
hydra.launcher.gpus_per_node=1 \
+hydra.launcher.additional_parameters.gpus=1 \
hydra.launcher.account=OD-228158Notes:
hydra/launcher=submitit_slurmselects the Slurm launcher.hydra.launcher.account=...maps tosbatch --account=....- If your cluster uses
--gpus=Ninstead of--gres=gpu:N, pass it exactly as shown viaadditional_parameters.gpus.
Create a tiny config file and use it by name:
lambda_hat/conf/hydra/launcher/slurm_account.yaml
# Merges into the submitit_slurm launcher
# Usage: hydra/launcher=submitit_slurm,hydra/launcher=slurm_account
slurm:
additional_parameters:
account: OD-228158Then:
uv run lambda-hat-build-target hydra/launcher=submitit_slurm,hydra/launcher=slurm_accountTo run jobs in parallel locally (e.g., utilizing multiple cores on a workstation), use the submitit_local launcher.
uv run lambda-hat --multirun hydra/launcher=submitit_local model=small,baseThis runs the jobs concurrently using local processes.
The codebase uses JAX. When using Submitit (SLURM or Local), it is crucial to ensure that each job has access to the resources it requested.
- SLURM: When
gpus_per_node=1is set, SLURM typically handles the allocation and ensures the job only sees that GPU (e.g., viaCUDA_VISIBLE_DEVICES). - Local: For CPU-only local execution, ensure
hydra.launcher.gpus_per_node=0is set in the config or via override.
Submitit creates log files (stdout/stderr) for each job. These are stored in a directory specified by submitit_folder within the Hydra sweep output directory (e.g., multirun/YYYY-MM-DD/HH-MM-SS/.submitit_slurm/).