This guide walks system administrators through installing ProteinDJ and all its dependencies. The installation steps only need to be completed once per cluster or system.
Before starting, ensure you have:
- Linux/Unix system with internet access
- Sufficient storage space: ~11 GB for downloading models and ~50 GB for containers
- Administrative access or ability to install software
- SLURM cluster (if using HPC environment)
| Component | Minimum | Recommended |
|---|---|---|
| Storage | 60 GB | 100 GB |
| RAM | 32 GB | 48 GB+ |
| GPU | NVIDIA GPU with 16GB+ VRAM | NVIDIA A30/A100 |
| CPU | 8 cores | 24+ cores |
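As a quick sanity check against the minimums in the table, a small Bash script along these lines can report cores, RAM and free disk space on the current node. It is a sketch and not part of ProteinDJ. The helper name meets_min is hypothetical, and the thresholds are copied from the Minimum column. RAM and disk figures are computed in GiB from /proc/meminfo and df, so the script assumes a Linux node.

```bash
# Rough preflight against the "Minimum" column of the requirements table.
# Thresholds are copied from that table; run this on the node you intend to use.

# meets_min LABEL HAVE NEED: print a status line, return non-zero if HAVE < NEED
meets_min() {
  local label="$1" have="$2" need="$3"
  if [ "$have" -ge "$need" ]; then
    echo "OK   ${label}: ${have} (minimum ${need})"
  else
    echo "LOW  ${label}: ${have} (minimum ${need})"
    return 1
  fi
}

# CPU cores available to this shell (respects CPU affinity, e.g. a SLURM allocation)
cores=$(nproc)

# Total RAM in whole GiB from /proc/meminfo (Linux only; MemTotal is in kB)
ram_gb=0
if [ -r /proc/meminfo ]; then
  ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
fi

# Free space in whole GiB on the filesystem holding the current directory
free_gb=$(df -Pk . | awk 'NR == 2 {printf "%d", $4 / 1024 / 1024}')

meets_min "CPU cores" "$cores" 8
meets_min "RAM (GiB)" "$ram_gb" 32
meets_min "Free storage (GiB)" "$free_gb" 60

# GPU name and total VRAM, if the NVIDIA driver utilities are installed
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found; run this on a GPU node to check VRAM"
fi
```

MemTotal is usually slightly below the installed RAM, so a nominal 32 GB node may report 31 GiB. Treat that as borderline rather than a failure. Login nodes often differ from GPU nodes, so run the script inside a compute-node allocation.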
First, clone the ProteinDJ repository:
git clone https://github.com/PapenfussLab/proteindj
cd proteindj
ProteinDJ requires two key dependencies, Apptainer and Nextflow, to be installed and accessible in your PATH. These are common software packages, so they may already be available in your HPC environment (e.g. module load apptainer nextflow):
Install Apptainer for containerization.
Ubuntu/Debian:
# Add repository and install
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer
CentOS/RHEL:
# Install from EPEL
sudo yum install -y epel-release
sudo yum install -y apptainer
Install Nextflow (≥ v24.04):
# Install Nextflow
curl -s https://get.nextflow.io | bash
chmod +x nextflow
sudo mv nextflow /usr/local/bin/
Verify installations:
apptainer --version
nextflow -version
💡 Tip: The model downloads below total ~11 GB. Consider storing them in a shared location so that other users can access the files.
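The version commands above only print versions. If you prefer a scripted check, a sketch like the following confirms that both tools are on PATH and compares the Nextflow version against the ≥ v24.04 requirement. The version_ge helper is hypothetical and relies on GNU sort -V ordering. The script also assumes the nextflow -version banner contains the version as its first X.Y.Z number.

```bash
# Check that Apptainer and Nextflow are on PATH, and that Nextflow is >= 24.04.

# version_ge A B: succeed if version A >= version B, using GNU `sort -V` ordering
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for tool in apptainer nextflow; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool -> $(command -v "$tool")"
  else
    echo "missing: $tool (on HPC systems, try: module load $tool)"
  fi
done

if command -v nextflow >/dev/null 2>&1; then
  # Assumes the banner contains the version as the first X.Y.Z number
  nf_version=$(nextflow -version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
  if [ -n "$nf_version" ] && version_ge "$nf_version" 24.04; then
    echo "Nextflow $nf_version satisfies >= 24.04"
  else
    echo "Nextflow version '${nf_version}' is older than 24.04 or could not be parsed"
  fi
fi
```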
RFdiffusion requires several diffusion model checkpoints (~3.7 GB). If you have not already downloaded the models, use the commands below, and update the rfd_models variable in nextflow.config to the location of the model directory (e.g. './models/rfd'):
mkdir -p models/rfd && cd models/rfd
# Download all required checkpoints
wget http://files.ipd.uw.edu/pub/RFdiffusion/6f5902ac237024bdd0c176cb93063dc4/Base_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/e29311f6f1bf1af907f9ef9f44b8328b/Complex_base_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/60f09a193fb5e5ccdc4980417708dbab/Complex_Fold_base_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/74f51cfb8b440f50d70878e05361d8f0/InpaintSeq_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/76d00716416567174cdb7ca96e208296/InpaintSeq_Fold_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/5532d2e1f3a4738decd58b19d633b3c3/ActiveSite_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/12fc204edeae5b57713c5ad7dcb97d39/Base_epoch8_ckpt.pt
wget http://files.ipd.uw.edu/pub/RFdiffusion/f572d396fae9206628714fb2ce00f72e/Complex_beta_ckpt.pt
cd ../..
To perform AlphaFold2 predictions, you will need to download the AF2 models from DeepMind's repository. If you have not already downloaded the models, use the commands below, and update the af2_models variable in nextflow.config to the location of the model directory (e.g. './models/af2'):
mkdir -p models/af2 && cd models/af2
# Download and extract AF2 parameters (only need the first model for AlphaFold2 Initial-Guess)
wget https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar -xf alphafold_params_2022-12-06.tar
rm -f alphafold_params_2022-12-06.tar
cd ../..
To perform Boltz-2 predictions, you will need to download the Boltz-2 models (~3.8 GB download, ~2.2 GB after extraction). If you have not already downloaded the models, use the commands below, and update the boltz_models variable in nextflow.config to the location of the model directory (e.g. './models/boltz'):
mkdir -p models/boltz && cd models/boltz
# Download Boltz-2 checkpoints and molecular data
wget https://huggingface.co/boltz-community/boltz-2/resolve/main/boltz2_conf.ckpt
wget https://huggingface.co/boltz-community/boltz-2/resolve/main/mols.tar
# We only need the amino acid CCD files for protein design
tar -tf mols.tar | grep -E 'mols/(ALA|ARG|ASN|ASP|CYS|GLU|GLN|GLY|HIS|ILE|LEU|LYS|MET|PHE|PRO|SER|THR|TRP|TYR|VAL|UNK)\.pkl$' | xargs -d '\n' -I {} tar -xvf mols.tar "{}"
rm -f mols.tar
# Create dummy affinity checkpoint file (required by Boltz2 but not used for protein design)
touch boltz2_aff.ckpt
cd ../..
Verify downloads:
# Check that all required files exist
ls -la models/rfd/ # Should contain 8 .pt files
ls -la models/af2/ # Should contain params directory
ls -la models/boltz/ # Should contain .ckpt files and mols directory
ProteinDJ requires several containers for its different dependencies. By default, these are fetched when Nextflow runs and are cached in the location specified by the environment variable NXF_APPTAINER_CACHEDIR. On a shared system, we recommend creating a profile in nextflow.config with apptainer.cacheDir set to a shared location with read/write permissions for all users (see the Milton/WEHI profile as an example).
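For the shared-cache setup described above, one option is a group-writable directory with the setgid bit set, so that files added by different users keep a common group. The sketch below reflects an assumption about typical site policy and is not a ProteinDJ requirement. The init_shared_cache function, its optional group argument and the example path are all hypothetical.

```bash
# Hypothetical one-off setup (run by an admin): a shared, group-writable cache directory.
# init_shared_cache DIR [GROUP]: create DIR with mode 2775 (group rwx + setgid)
init_shared_cache() {
  local dir="$1" group="${2:-}"
  mkdir -p "$dir" || return 1
  # Optionally hand the directory to a shared UNIX group (group name is site-specific)
  if [ -n "$group" ]; then
    chgrp "$group" "$dir" || return 1
  fi
  # setgid on a directory makes new files and subdirectories inherit its group
  chmod 2775 "$dir" || return 1
  echo "Shared Apptainer cache ready at $dir"
}

# Per user (e.g. in ~/.bashrc or a site module); the path is a placeholder:
# export NXF_APPTAINER_CACHEDIR=/shared/software/proteindj/apptainer_cache
```

Users also need a umask such as 002 for the files they add to be group-writable. Setting apptainer.cacheDir in a site profile, as recommended above, achieves the same thing from inside nextflow.config.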
We recommend using our containers from the cloud. If you would like to build or modify containers locally, we provide definition (.def) files in proteindj/apptainer. When updating ProteinDJ, make sure to also update your local containers to maintain compatibility with the pipeline. You may already have similar containers for some of these programs. However, we have modified their source code and environments, so we do not recommend using other containers with ProteinDJ. Once the containers have been built, update the path to each one in nextflow.config, e.g. for BindCraft:
withLabel: 'BC' {
container = "/path/to/containers/bindcraft.sif"
containerOptions = """--nv \
--bind ${params.af2_models}:/af2params \
"""
}
We have provided a script for building the containers in a series of sbatch jobs (apptainer/build_containers.sh). You may need to tweak the SLURM parameters and environment settings for your cluster.
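To illustrate roughly what such a batch build involves, here is a sketch that submits one sbatch job per .def file. It is not the contents of build_containers.sh. The submit_builds function, the DRY_RUN switch and the resource requests (8 CPUs, 32G, 6 hours) are placeholder assumptions to adapt to your cluster. It reuses the BUILD_DIRECTORY variable name from the provided script and assumes paths without spaces.

```bash
# Illustrative only, NOT the contents of build_containers.sh: one sbatch job per .def file.
# Resource requests are placeholder guesses; DRY_RUN=1 prints commands instead of submitting.
BUILD_DIRECTORY="${BUILD_DIRECTORY:-/path/to/your/containers}"
DRY_RUN="${DRY_RUN:-1}"

submit_builds() {
  local def name
  local -a cmd
  [ "$DRY_RUN" = "1" ] || mkdir -p "$BUILD_DIRECTORY"
  for def in *.def; do
    [ -e "$def" ] || continue   # glob matched nothing
    name="${def%.def}"
    cmd=(sbatch --job-name="build_${name}" --cpus-per-task=8 --mem=32G --time=06:00:00
         --wrap="apptainer build --fakeroot ${BUILD_DIRECTORY}/${name}.sif ${def}")
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Usage, from the apptainer directory:
#   DRY_RUN=1 submit_builds    # review the commands
#   DRY_RUN=0 submit_builds    # actually submit
```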
⚠️ Important: Container building requires significant resources and may take several hours.
- Navigate to the apptainer directory:
cd apptainer
- Edit the build script:
# Edit the BUILD_DIRECTORY in build_containers.sh
nano build_containers.sh
# Update this line to your desired location:
BUILD_DIRECTORY="/path/to/your/containers"
- Submit the build jobs:
./build_containers.sh
- Monitor progress:
# Check SLURM queue
squeue -u $USER
If you don't have SLURM or prefer manual building:
cd apptainer
mkdir -p ../containers
# Build each container individually into ../containers
apptainer build --fakeroot ../containers/af2.sif af2.def
apptainer build --fakeroot ../containers/bindsweeper.sif bindsweeper.def
apptainer build --fakeroot ../containers/boltz2.sif boltz2.def
apptainer build --fakeroot ../containers/dl_binder_design.sif dl_binder_design.def
apptainer build --fakeroot ../containers/fampnn.sif fampnn.def
apptainer build --fakeroot ../containers/pyrosetta_tools.sif pyrosetta_tools.def
apptainer build --fakeroot ../containers/rfdiffusion.sif rfdiffusion.def
cd ..
Verify container builds:
ls -la containers/ # Should contain 7 .sif files
Edit the nextflow.config file to match your system configuration:
nano nextflow.config
Key parameters to update:
| Parameter | Description | Examples |
|---|---|---|
| rfd_models | Path to RFdiffusion models | "${projectDir}/models/rfd" |
| af2_models | Path to AlphaFold2 models | "${projectDir}/models/af2" |
| boltz_models | Path to Boltz-2 models | "${projectDir}/models/boltz" |
| gpu_model | Your GPU type | 'A30', 'V100', 'A100' |
| gpus | Number of GPUs to request | 1, 2, 4, 8 |
| cpus_per_gpu | Number of CPUs to request per GPU | 8, 12 |
| memory_gpu | Memory to request for GPU jobs | '24GB', '48GB' |
| cpus | Number of CPUs to request for CPU-only jobs | 12, 24 |
| memory_cpu | Memory to request for CPU-only jobs | '24GB', '48GB' |
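After editing these parameters, you can check that the three model paths resolve to existing directories. The sketch below makes two assumptions. First, that nextflow config -flat prints one "key = value" line per setting (for example params.af2_models = '/path/to/af2'). Second, that the model paths live in the params scope, as params.af2_models in the BindCraft snippet suggests. check_model_paths is a hypothetical helper.

```bash
# Sketch: check that model paths resolved from nextflow.config point at real directories.
# Assumes `nextflow config -flat` emits lines like: params.af2_models = '/path/to/af2'

# check_model_paths: read "key = value" lines on stdin; report each *_models entry
check_model_paths() {
  local key value status=0
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    case "$key" in
      params.rfd_models|params.af2_models|params.boltz_models) ;;
      *) continue ;;
    esac
    # Trim surrounding whitespace, then one layer of single or double quotes
    value=$(printf '%s' "$value" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
      -e "s/^['\"]//" -e "s/['\"]\$//")
    if [ -d "$value" ]; then
      echo "OK: $key -> $value"
    else
      echo "MISSING: $key -> $value"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the ProteinDJ root directory (requires Nextflow on PATH):
#   nextflow config -flat | check_model_paths
```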
Before running production workloads, verify your installation works correctly.
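Before spending GPU time on the test profiles, you can confirm that the expected files are in place. This sketch simply counts the files this guide says to expect:

- 8 RFdiffusion .pt files
- 2 Boltz-2 .ckpt files
- the 21 amino-acid .pkl files kept by the tar filter (20 standard residues plus UNK)
- at least one AlphaFold2 parameter file
- 7 .sif images

The helper names and default directories are assumptions based on the examples in this guide. The .npz extension for the AF2 parameters is also an assumption. The container count only applies if you built the images locally.

```bash
# Sketch: confirm the expected files exist before launching the test profiles.
# Directory defaults follow the examples in this guide; override them if yours differ.
RFD_DIR="${RFD_DIR:-models/rfd}"
AF2_DIR="${AF2_DIR:-models/af2}"
BOLTZ_DIR="${BOLTZ_DIR:-models/boltz}"
CONTAINER_DIR="${CONTAINER_DIR:-containers}"

# count_matches DIR PATTERN [MAXDEPTH]: number of entries under DIR matching PATTERN
count_matches() { find "$1" -maxdepth "${3:-1}" -name "$2" 2>/dev/null | wc -l | tr -d ' '; }

# expect_count LABEL FOUND EXPECTED: print a status line, return non-zero on mismatch
expect_count() {
  if [ "$2" -eq "$3" ]; then
    echo "OK: $1 ($2)"
  else
    echo "CHECK: $1 (found $2, expected $3)"
    return 1
  fi
}

preflight() {
  local status=0
  expect_count "RFdiffusion checkpoints (*.pt)" "$(count_matches "$RFD_DIR" '*.pt')" 8 || status=1
  expect_count "Boltz-2 checkpoints (*.ckpt)" "$(count_matches "$BOLTZ_DIR" '*.ckpt')" 2 || status=1
  # 20 standard amino acids plus UNK, as selected by the tar filter in the Boltz-2 step
  expect_count "Boltz-2 CCD files (mols/*.pkl)" "$(count_matches "$BOLTZ_DIR/mols" '*.pkl')" 21 || status=1
  # AF2 parameters assumed to be .npz files, directly in AF2_DIR or one level down
  if [ "$(count_matches "$AF2_DIR" '*.npz' 2)" -gt 0 ]; then
    echo "OK: AlphaFold2 parameters (*.npz)"
  else
    echo "CHECK: no .npz files under $AF2_DIR"
    status=1
  fi
  # Only meaningful if you built the images locally rather than pulling them
  expect_count "Apptainer images (*.sif)" "$(count_matches "$CONTAINER_DIR" '*.sif')" 7 || status=1
  return "$status"
}

# Usage, from the ProteinDJ root directory:
#   preflight
```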
# From the proteindj root directory
nextflow run main.nf -profile test,monomer_denovo
Expected output:
- Uses RFdiffusion, Full-atom MPNN, and Boltz-2
- Generates a small number of de novo monomers
- Creates output directory with results
nextflow run main.nf -profile test,binder_denovo
Expected output:
- Uses RFdiffusion, ProteinMPNN, and AlphaFold2 Initial-Guess
- Generates a small number of binders
- Creates output directory with results
For thorough validation, we provide an end-to-end testing script, scripts/end2end_test.sh, that performs multiple runs in different modes. See our testing documentation for more details:
# Run full end-to-end tests
./scripts/end2end_test.sh apptainer /home/$USER/test_outputs
Container build failures:
# Check available space
df -h
# Check Apptainer version
apptainer --version
# Try building with more verbose output
apptainer --verbose build --fakeroot <container>.sif <container>.def
Configuration issues:
# Test Nextflow configuration
nextflow config
# Verify file paths exist
ls -la /path/to/models
ls -la /path/to/containers
Test failures:
# Check Nextflow work directory
ls -la work/
# Review error logs
cat .nextflow.log
# Clean and retry
nextflow clean -f
For better performance:
- Store models on fast SSD storage
- Use a shared filesystem for multi-user setups
- Adjust CPU/memory requests based on available resources
- Consider using local scratch space for intermediate files
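One way to act on the last tip is Nextflow's scratch process directive. It runs each task in a temporary directory on the compute node's local storage and copies the declared outputs back to the work directory. You can try it without editing nextflow.config by passing an extra config file with -c. In the sketch below, write_scratch_config and the file name scratch.config are hypothetical, and whether scratch helps depends on your nodes' local disks.

```bash
# Sketch: enable Nextflow's `scratch` process directive through an extra config file.
# write_scratch_config [FILE]: write the config (default: scratch.config) and print its path
write_scratch_config() {
  local out="${1:-scratch.config}"
  cat > "$out" <<'EOF'
// Run each task in a temporary directory on node-local storage,
// then copy its declared outputs back to the shared work directory
process.scratch = true
EOF
  echo "$out"
}

# Usage, from the ProteinDJ root directory:
#   nextflow run main.nf -profile test,monomer_denovo -c "$(write_scratch_config)"
```

The -c option adds to the project's nextflow.config rather than replacing it, so the existing profiles still apply.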
After successful installation:
- Read the Getting Started Guide for your first protein design
- Review ProteinDJ Modes to understand available options
- Configure Parameters for your specific needs
If you encounter issues:
- Check the Troubleshooting Guide
- Search existing GitHub Issues
- Create a new issue with detailed error information
🎉 Congratulations! You're now ready to DJ some proteins with ProteinDJ!