ColabMDA is a specialized tool for running Molecular Dynamics (MD) simulations on Google Colab without the fear of losing your work. Colab sessions disconnect and time out, which can destroy hours of simulation data. ColabMDA addresses this with a "resume-safe" system that automatically saves your progress to Google Drive: if your session expires, you can resume exactly where you left off with one simple command. From modeling protein mutations to generating publication-ready analysis, ColabMDA handles the complex setup for you, making it an easy way to get high-quality MD results using free cloud GPUs.
📖 Full Documentation: Visit our official manual at colabmda.readthedocs.io
| Category | Details |
|---|---|
| Dependencies | OpenMM, Modeller, MDAnalysis, MDTraj |
| Platform | Google Colab (GPU), Linux workstations, HPC (SLURM) |
| Structure | Source Layout |
ColabMDA is organized into clear, functional modules:
- `src/colabmda/modeller/`: Homology modeling engine (biological numbering supported).
- `src/colabmda/openmm_pw/`: OpenMM simulation engines (modular EM/NVT/NPT/MD).
- `envs/`: Automated installation scripts for scientific environments.
- `scripts/`: Quick-start bootstrap scripts for Google Colab.
- `notebooks/`: Ready-to-use Colab notebooks for simulation and analysis.
💡 Terminal Access: All bash commands should be run in the Colab Terminal (Open via the ⋮ menu -> Terminal).
Before starting, ensure your environment is ready:
- Enable GPU: Go to `Runtime` → `Change runtime type` and select T4 GPU.
- Verify GPU: Run `!nvidia-smi` in a cell to confirm GPU access.
- Mount Drive: Click the Folder icon 📂 in the left sidebar, then click the Drive icon (Mount Drive), or run the code block below:

```python
from google.colab import drive
drive.mount('/content/drive')
!nvidia-smi
```
Run the following in the Colab Terminal (⋮ → Terminal). Estimated time: ~3–5 minutes.
```bash
# 1. Install the core scientific environment
cd /content
curl -fsSL https://raw.githubusercontent.com/paulshamrat/ColabMDA/main/scripts/bootstrap_colab_openmm_gpu.sh -o bootstrap_colab_openmm_gpu.sh
WITH_MODELLER=1 bash bootstrap_colab_openmm_gpu.sh latest
# 2. Install ColabMDA package
python3 -m pip install --upgrade "git+https://github.com/paulshamrat/ColabMDA.git@main"
```

Important
Modeller License Prompt: During Step 1, the script will pause and ask you to Enter your Modeller License Key. You must paste your key and press Enter to proceed. The installation will not complete without it.
🔑 Get a Free License: If you don't have one, register at salilab.org/modeller/registration.html (Academic licenses are free and sent instantly via email).
If your Google Colab session expires:
- Remount Google Drive and re-run installation Steps 1 and 2 above to reinstall the environment.
- Run the exact same `colabmda openmm run` command you used before.
- The tool automatically detects your `.chk` checkpoint files and resumes from where it left off.
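Conceptually, resuming means skipping the stages that already left a checkpoint and continuing production from the first unfinished chunk. The sketch below illustrates only that bookkeeping and is not ColabMDA's implementation: the stage list, the `nvt.chk` name, the `prod_0000.dcd`-style chunk names, and the `resume_plan` helper are hypothetical, and it assumes one production chunk per `--checkpoint-ps` interval.

```python
import math
import re
from pathlib import Path

# Hypothetical equilibration stages, in order. em.chk and npt.chk appear in
# this README's folder layout; nvt.chk is assumed here for illustration.
EQUIL_STAGES = ["em", "nvt", "npt"]
# Hypothetical names for finished production chunks: prod_0000.dcd, prod_0001.dcd, ...
CHUNK_PATTERN = re.compile(r"prod_(\d{4})\.dcd$")


def resume_plan(workdir, total_ns, checkpoint_ps):
    """Return (equilibration stages still to run, production chunk indices still to run)."""
    workdir = Path(workdir)
    # Resume after the last equilibration stage that left a checkpoint behind.
    finished = [s for s in EQUIL_STAGES if (workdir / f"{s}.chk").exists()]
    start = EQUIL_STAGES.index(finished[-1]) + 1 if finished else 0
    stages = EQUIL_STAGES[start:]
    # Assume one production chunk per checkpoint interval (10 ns / 1000 ps = 10 chunks).
    n_chunks = math.ceil(round(total_ns * 1000 / checkpoint_ps, 6))
    done = set()
    for path in workdir.iterdir():
        match = CHUNK_PATTERN.match(path.name)
        if match:
            done.add(int(match.group(1)))
    todo = [i for i in range(n_chunks) if i not in done]
    return stages, todo
```

On a fresh folder with `total_ns=10.0` and `checkpoint_ps=1000`, this returns all three stages and chunks 0 through 9; every completed checkpoint or chunk shrinks the plan.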
⚠️ Note: This section is for local workstations, HPC, or advanced manual setups. For standard Google Colab runs, please use the Quick Start (Section 1) above.
🛠️ A. Manual Terminal Installation (Step-by-Step)
In the Colab Terminal (⋮ → Terminal), run each step one at a time:
```bash
# Step 1: Download & install Miniforge (Conda)
wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O /tmp/miniforge.sh && \
bash /tmp/miniforge.sh -b -p "$HOME/miniforge3"
# Step 2: Initialize Conda in this shell
export PATH="$HOME/miniforge3/bin:$PATH" && source "$HOME/miniforge3/etc/profile.d/conda.sh"
# Step 3: Install Mamba into the base environment
conda install -y -n base -c conda-forge mamba
# Step 4: Install CUDA-enabled OpenMM and OpenMMTools
mamba install -y -c conda-forge cudatoolkit=11.8 openmm openmmtools
# Step 5: Install PDBFixer (conda, fallback to pip)
conda install -y -c conda-forge pdbfixer || pip install pdbfixer
# Step 6: Install MDAnalysis, MDTraj, NumPy, Matplotlib, and Biopython
mamba install -y -c conda-forge mdanalysis mdtraj numpy matplotlib biopython
# Step 7: Verify installations
python3 - << 'EOF'
from openmm import Platform; print("OpenMM platforms:", [Platform.getPlatform(i).getName() for i in range(Platform.getNumPlatforms())])
import MDAnalysis, mdtraj, Bio; print("MDAnalysis:", MDAnalysis.__version__, "MDTraj:", mdtraj.__version__, "Biopython:", Bio.__version__)
EOF
```

📜 B. Alternative: Script-based Installation
```bash
cd /content
curl -fsSL https://raw.githubusercontent.com/paulshamrat/ColabMDA/main/scripts/install_colabmda_release.sh -o install_colabmda_release.sh
bash install_colabmda_release.sh latest /content/colabmda
```

🧬 C. Modeller CPU Environment Setup
```bash
cd /content/drive/MyDrive/openmm/ColabMDA
bash envs/install_modeller_env.sh
```

💻 D. Local Workstation Setup (Laptop/Desktop)
Beyond the Cloud ☁️: ColabMDA works on any Linux system with an NVIDIA GPU. Use the provided environment.yml to create a production-ready environment:
```bash
mamba env create -f environment.yml
conda activate colabmda
```

🏢 E. HPC Usage (SLURM)
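You can incorporate ColabMDA into SLURM batch scripts. Because production runs are written in checkpointed chunks and resume automatically, a long simulation can be split across several jobs on cluster partitions with wall-time limits: each job simply re-runs the same `colabmda openmm run` command.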
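To size such a job chain, you can estimate how many wall-time-limited jobs a resume-safe run needs from your measured throughput. This is plain arithmetic, not a ColabMDA feature; the `jobs_needed` helper, the throughput figure, and the per-job overhead (startup time plus progress lost since the last checkpoint) are hypothetical inputs to replace with your own numbers.

```python
import math


def jobs_needed(total_ns, ns_per_day, walltime_hours, overhead_hours=0.25):
    """Estimate how many wall-time-limited jobs a resume-safe run needs.

    overhead_hours is a hypothetical per-job allowance for startup and for
    progress lost since the last checkpoint when the job hits its time limit.
    """
    usable_hours = walltime_hours - overhead_hours
    if usable_hours <= 0:
        raise ValueError("walltime must exceed the per-job overhead")
    # Simulated nanoseconds one job can complete in its usable time.
    ns_per_job = ns_per_day * usable_hours / 24.0
    return math.ceil(total_ns / ns_per_job)
```

For example, a hypothetical 100 ns run at 50 ns/day on a 24 h partition needs two jobs with no overhead, but three once a 15-minute overhead is reserved.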
- Model: Build structures (WT & mutants) in `structures/`
- Stage: Initialize the simulation folder in `simulations/`
- Run: Execute the MD simulation (resume-safe)
- Merge: Combine trajectory chunks into a final file
- Analyze: Generate RMSD, Rg, and RMSF plots
Environment: modeller_env
The build workflow now includes Biological Numbering and Automated Quality Control.
- `--uniprot-numbering`: Physically renumbers the PDB residues to match the UniProt biological index (best practice).
- Automatic Alignment Summary: Displays a full sequence comparison before building to catch range errors early.
- Post-Build Sanity Check: Verifies every residue in the final PDB against the UniProt reference and reports ✅ SUCCESS or ❌ FAILED.
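The renumbering and sanity-check ideas can be illustrated on raw PDB text. This is a simplified stand-in, not ColabMDA's implementation: it assumes a constant offset between PDB and UniProt numbering (ColabMDA builds the mapping from a sequence alignment), recognizes only standard residue names plus a few protonation variants, and the helpers `renumber` and `sanity_check` are hypothetical. It relies on the fixed PDB columns for residue name (18-20), chain ID (22), and residue number (23-26).

```python
# Three-letter to one-letter codes for standard residues, plus common
# protonation-state variants written by force-field tools.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "HID": "H", "HIE": "H", "HIP": "H", "CYX": "C",
}


def renumber(lines, offset):
    """Shift the residue number (PDB columns 23-26) of every ATOM/HETATM line."""
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            resseq = int(line[22:26]) + offset
            line = line[:22] + f"{resseq:4d}" + line[26:]
        out.append(line)
    return out


def sanity_check(lines, uniprot_seq, chain="A"):
    """List (resseq, pdb_residue, uniprot_residue) disagreements, assuming
    residue numbers are 1-based UniProt positions."""
    observed = {}
    for line in lines:
        if line.startswith("ATOM  ") and line[21] == chain:
            observed[int(line[22:26])] = THREE_TO_ONE.get(line[17:20], "X")
    mismatches = []
    for resseq, residue in sorted(observed.items()):
        in_range = 1 <= resseq <= len(uniprot_seq)
        expected = uniprot_seq[resseq - 1] if in_range else "-"
        if residue != expected:
            mismatches.append((resseq, residue, expected))
    return mismatches
```

An empty result from `sanity_check` corresponds to the ✅ SUCCESS case; any entry points to the residue where model and reference disagree.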
```bash
source "$HOME/miniforge3/etc/profile.d/conda.sh"
conda activate modeller_env
cd /path/to/your/project
# Example: Build Wild-Type KRAS (Starting at Residue 1)
colabmda modeller build --pdb-id 4ldj --uniprot-id P01116 --chain A --range 1 169 --uniprot-numbering --outdir structures/4ldj/wt
# Example: Create G12D Mutant (Preserves Numbering)
colabmda modeller mutate --pdb-in structures/4ldj/wt/target.B99990001_with_cryst.pdb --chain A --mut G12D --outdir-mut structures/4ldj/mutants/4ldj_G12D
```

Environment: openmm_env
```bash
source "$HOME/miniforge3/etc/profile.d/conda.sh"
conda activate openmm_env
cd /path/to/your/project
# 1. Initialize the simulation folder
# For Wild-Type:
colabmda openmm stage --pdb-file structures/4ldj/wt/target.B99990001_with_cryst.pdb --name 4ldj_wt --replica r1
# For Mutant (G12D):
colabmda openmm stage --pdb-file structures/4ldj/mutants/4ldj_G12D/target.B99990001_G12D.pdb --name 4ldj_G12D --replica r1
# 2. Start the Production Run (Example: 10ns)
# For Wild-Type:
colabmda openmm run --name 4ldj_wt --replica r1 --total-ns 10.0 --traj-interval 10 --equil-time 1000 --checkpoint-ps 1000
# For Mutant (G12D):
colabmda openmm run --name 4ldj_G12D --replica r1 --total-ns 10.0 --traj-interval 10 --equil-time 1000 --checkpoint-ps 1000
```

💡 Storage Tip: For a typical system (e.g., KRAS in water, ~30,000 atoms), a 100 ns run saved at high resolution (every 1 ps) can produce over 36 GB of data. On a free 15 GB Google Drive, we recommend using `--traj-interval 10` to reduce this to ~3.6 GB. Always calculate your storage needs based on your specific system size before starting long runs.
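The storage figures above follow from the DCD layout: every frame stores single-precision x, y, and z for every atom, about 12 bytes per atom per frame. A back-of-the-envelope estimator, assuming a standard CHARMM/OpenMM-style DCD with an optional unit-cell record per frame and ignoring the small file header (`estimate_dcd_gb` is a hypothetical helper):

```python
def estimate_dcd_gb(n_atoms, total_ns, traj_interval_ps, periodic=True):
    """Rough DCD size in GB (1 GB = 1e9 bytes), ignoring the small file header."""
    n_frames = round(total_ns * 1000 / traj_interval_ps)
    # Coordinates: three Fortran records (x, y, z) of float32 values,
    # each wrapped in two 4-byte length markers.
    bytes_per_frame = 3 * (4 * n_atoms + 8)
    if periodic:
        # Unit-cell record: six float64 values plus two 4-byte markers.
        bytes_per_frame += 6 * 8 + 8
    return n_frames * bytes_per_frame / 1e9
```

With 30,000 atoms over 100 ns, a 1 ps interval comes to roughly 36 GB and a 10 ps interval to roughly 3.6 GB, matching the tip.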
Note: The `run` command includes an Automated Stability Gate. It automatically analyzes the equilibration logs and aborts if the system has not stabilized, saving GPU time.
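As an illustration of what such a gate can check, the sketch below reads one column from an OpenMM `StateDataReporter`-style CSV log and tests whether its mean still drifts across the last window. The criterion, window size, tolerance, and helper names are hypothetical; ColabMDA's actual gate may use different signals and thresholds.

```python
import csv
import io
import statistics


def column_from_log(text, column):
    """Extract one numeric column from an OpenMM StateDataReporter-style CSV log."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The reporter writes its header as a comment line: #"Step","Temperature (K)",...
    header = lines[0].lstrip("#")
    reader = csv.DictReader(io.StringIO("\n".join([header] + lines[1:])))
    return [float(row[column]) for row in reader]


def looks_equilibrated(values, window=50, rel_tol=0.01):
    """Hypothetical gate: compare the means of the two halves of the last
    `window` points and require their difference to stay within rel_tol."""
    if len(values) < window:
        return False  # not enough data to judge
    tail = values[-window:]
    half = window // 2
    first, second = statistics.mean(tail[:half]), statistics.mean(tail[half:])
    return abs(second - first) <= rel_tol * abs(first)
```

A temperature that fluctuates around a fixed value passes, while one that is still climbing fails; in practice you would apply the same idea to columns such as `Potential Energy (kJ/mole)` or `Density (g/mL)`.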
Combine the trajectory chunks into a single DCD, apply periodic boundary condition (PBC) correction, and center the protein. A robust MDAnalysis-based merge engine is also available via `--mda` (see the merge-mode table below).
```bash
# Standard Merge (Center + Wrap, All Atoms)
# For Wild-Type:
colabmda openmm merge --pdb-dir simulations/4ldj_wt/r1 --center --wrap
# For Mutant (G12D):
colabmda openmm merge --pdb-dir simulations/4ldj_G12D/r1 --center --wrap
```

After a standard merge, the following files are created in the simulation replica folder:
- `prod_full.dcd`: The final, concatenated, and centered/wrapped trajectory file.
- `prod_full.log`: The consolidated log file containing energy and temperature statistics.
💡 Pro-Tip for Long Runs: Merging processes trajectories frame by frame, so it won't crash your RAM. You can merge without striding (`--stride 1`) for full resolution, or use `--stride 10` to create a lightweight file for local viewing. When you merge with `--stride 10`, the logs are automatically strided to match, and the analysis command (`colabmda openmm analysis`) reads the logs to correctly infer the time scale.
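The time-scale inference can be sketched as follows: the spacing between consecutive `Time (ps)` entries of a strided log already includes the stride, so converting frame indices to nanoseconds needs no extra input. This is an illustration under that assumption, not ColabMDA's code; the helpers and the fallback rule (trajectory interval × stride) are hypothetical.

```python
import statistics


def infer_frame_spacing_ps(log_times_ps, traj_interval_ps=None, stride=1):
    """Picoseconds between consecutive merged frames.

    Prefer the spacing recorded in the (already strided) log; otherwise fall
    back to traj_interval_ps * stride (a hypothetical fallback rule).
    """
    if len(log_times_ps) >= 2:
        gaps = [b - a for a, b in zip(log_times_ps, log_times_ps[1:])]
        # The median is robust to an odd gap, e.g. at a chunk boundary.
        return statistics.median(gaps)
    if traj_interval_ps is None:
        raise ValueError("cannot infer spacing; pass the trajectory interval")
    return traj_interval_ps * stride


def time_axis_ns(n_frames, spacing_ps, start_ps=0.0):
    """Convert frame indices to simulation time in nanoseconds."""
    return [(start_ps + i * spacing_ps) / 1000.0 for i in range(n_frames)]
```

With a 10 ps interval merged at `--stride 10`, each merged frame spans 100 ps, so 1,000 frames cover about 100 ns; assuming 10 ps per frame instead labels the same data as 10 ns, which is exactly the wrong-time-scale symptom described in the analysis tip.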
💡 Protein-Only Trajectory (Optional):
To save significant disk space (often more than 85% for a solvated protein), add the `--protein-only` flag to the merge command. This extracts only the protein atoms, discarding water and ions:

```bash
colabmda openmm merge --pdb-dir simulations/4ldj_wt/r1 --center --wrap --protein-only
```

Outputs with `--protein-only`:
- `prod_full.dcd`: Thinned trajectory containing protein atoms only.
- `prod_full.pdb`: Matching protein-only topology PDB file (required for subsequent analysis/visualization to avoid an atom mismatch).
- `prod_full.log`: Consolidated log file.
To quickly check the status, simulation time, and number of frames in your merged files, run:

```bash
colabmda openmm status --pdb-dir simulations/4ldj_wt/r1
```

This prints a comprehensive status report including topology stats, chunks, and exact frame counts:

```text
[STATUS]
Workdir : /path/to/simulations/4ldj_wt/r1
Chunks (DCD/log) : 100 / 100
Topology File : solvated.pdb
  └─ 27273 atoms, 8388 residues
  └─ (169 protein, 8170 water, 49 ions)
Trajectory File : prod_full.dcd (10000 frames)
Log File : prod_full.log (10000 frames)
Frames (from logs): 10000
```
Alternatively, you can query the frame count with a Python one-liner using MDAnalysis (run it inside the replica folder; after a standard merge, use `solvated.pdb` as the topology instead of `prod_full.pdb`):

```bash
python3 -c "import MDAnalysis as mda; u = mda.Universe('prod_full.pdb', 'prod_full.dcd'); print('Frames:', len(u.trajectory))"
```

To calculate how much simulation time (in nanoseconds) your trajectory represents, or how many frames to expect, use this quick guide:
- Total Simulation Time: Controlled by `--total-ns` (e.g., `100.0` ns = `100,000` ps).
- Frame Saving Frequency: Controlled by `--traj-interval` in picoseconds (default `10.0` ps = `0.01` ns).
- Calculating Expected Frames:

  $$\text{Expected Frames} = \frac{\text{Total Time (ps)}}{\text{Trajectory Saving Interval (ps)}}$$

- Example: If you run a `100.0` ns simulation with a `10.0` ps saving interval:

  $$\text{Expected Frames} = \frac{100{,}000\text{ ps}}{10\text{ ps}} = 10{,}000\text{ frames}$$

- Effect of Striding on Merged Trajectories: If you merge chunks using a stride (e.g., `--stride 10` for lightweight local viewing):

  $$\text{Merged Frames} = \frac{\text{Total Frames}}{\text{Stride}} = \frac{10{,}000}{10} = 1{,}000\text{ frames}$$

The `status` command displays this discrepancy clearly:

```text
Frames (from logs): 10000 # Original simulation frames
Merged DCD : YES (1000 frames) # Thinned frames after striding
```
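The same bookkeeping in code, handy for checking `status` output or planning runs. A minimal sketch with hypothetical helper names; `merged_frames` assumes the merge keeps every stride-th frame starting from the first, so a frame count that is not a multiple of the stride rounds up.

```python
def expected_frames(total_ns, traj_interval_ps=10.0):
    """Frames written by a run: total time (ps) / saving interval (ps)."""
    return round(total_ns * 1000 / traj_interval_ps)


def merged_frames(total_frames, stride=1):
    """Frames kept when every stride-th frame is retained, starting from the first."""
    return len(range(0, total_frames, stride))


def trajectory_ns(n_frames, traj_interval_ps=10.0, stride=1):
    """Simulation time (ns) represented by n_frames merged frames."""
    return n_frames * traj_interval_ps * stride / 1000.0
```

For the worked example, `expected_frames(100.0, 10.0)` gives 10,000 frames, `merged_frames(10000, 10)` gives 1,000, and `trajectory_ns(1000, 10.0, 10)` confirms those 1,000 merged frames still span 100 ns.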
When merging trajectories, the reference topology file used and the resulting output files depend on your merge options:
| Merge Mode | Command Flags | Reference Topology | Generated Topology File | Subsequent Command Usage |
|---|---|---|---|---|
| Standard Merge | `colabmda openmm merge --center --wrap` | `solvated.pdb` | None (only `prod_full.dcd` is written) | Use `solvated.pdb` for analysis/visualization. |
| MDAnalysis Merge | `colabmda openmm merge --mda --center --wrap` | `solvated.pdb` | `prod_full.pdb` (all atoms) | Use `prod_full.pdb` (or `solvated.pdb`). |
| Protein-Only Merge | `colabmda openmm merge --protein-only` | `solvated.pdb` | `prod_full.pdb` (protein atoms only) | Must use `prod_full.pdb` (since `prod_full.dcd` contains only protein coordinates). |
If you merge using the --protein-only flag, your prod_full.dcd will only contain protein coordinates (~2,600 atoms). Attempting to load this trajectory along with the original solvated.pdb (~27,000 atoms) in PyMOL or MDAnalysis will result in a fatal atom mismatch error. Always match the trajectory file with its corresponding topology file as shown in the table above.
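Before loading a pair of files, you can check that their atom counts agree. The sketch below uses only the standard library: it counts ATOM/HETATM records in the first model of a PDB and reads the atom count from a DCD header. It assumes the common CHARMM/OpenMM DCD layout with 32-bit record markers, and the helper names are hypothetical.

```python
import struct


def pdb_atom_count(pdb_path):
    """Count ATOM/HETATM records in the first model of a PDB file."""
    count = 0
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("ENDMDL"):
                break  # only the first model defines the topology
            if line.startswith(("ATOM  ", "HETATM")):
                count += 1
    return count


def dcd_atom_count(dcd_path):
    """Read the atom count from a DCD header (32-bit Fortran record markers)."""
    with open(dcd_path, "rb") as handle:
        marker = handle.read(4)
        # The first record is the 84-byte control block; its marker reveals byte order.
        if struct.unpack("<i", marker)[0] == 84:
            endian = "<"
        elif struct.unpack(">i", marker)[0] == 84:
            endian = ">"
        else:
            raise ValueError(f"{dcd_path} does not look like a DCD file")
        handle.seek(84 + 4, 1)  # skip the control block and its closing marker
        (title_len,) = struct.unpack(endian + "i", handle.read(4))
        handle.seek(title_len + 4, 1)  # skip the title block and its closing marker
        (natom_len,) = struct.unpack(endian + "i", handle.read(4))
        if natom_len != 4:
            raise ValueError("unexpected DCD header layout")
        (natoms,) = struct.unpack(endian + "i", handle.read(4))
        return natoms


def check_topology_matches(pdb_path, dcd_path):
    """Return the shared atom count, or raise ValueError on a mismatch."""
    n_pdb, n_dcd = pdb_atom_count(pdb_path), dcd_atom_count(dcd_path)
    if n_pdb != n_dcd:
        raise ValueError(
            f"atom mismatch: {pdb_path} has {n_pdb} atoms, {dcd_path} has {n_dcd}"
        )
    return n_pdb
```

Running `check_topology_matches("solvated.pdb", "prod_full.dcd")` after a protein-only merge would raise immediately, pointing you to `prod_full.pdb` instead.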
```bash
# For Wild-Type:
colabmda openmm analysis --pdb-id 4ldj_wt
# For Mutant (G12D):
colabmda openmm analysis --pdb-id 4ldj_G12D
```
⚠️ Analysis Tip: If your plots show the wrong time scale (e.g., 10 ns instead of 100 ns), provide the frame interval manually. For example, if you ran with `--traj-interval 10`:

```bash
colabmda openmm analysis --pdb-id 4ldj_wt --interval 10
```
To overlay WT and mutant results across replicas, run:

```bash
colabmda openmm compare \
  --series "WT=analysis/single/4ldj_wt/r1,analysis/single/4ldj_wt/r2" \
  --series "G12D=analysis/single/4ldj_G12D/r1,analysis/single/4ldj_G12D/r2" \
  --outdir analysis/compare/wt_vs_g12d_avg
```

To visualize molecular dynamics trajectories in PyMOL with full secondary structure (ribbon/cartoon) and detailed side-chain representations, use the built-in `colabmda openmm view` command.
Since molecular dynamics simulations are performed on Google Colab, trajectory visualization is run locally on your workstation/laptop.
Note
For Windows Users: It is highly recommended to run the visualization locally either using WSL (Windows Subsystem for Linux) or using native PyMOL on Windows (by running the CLI to generate visualize.pml, then opening it in the Windows PyMOL GUI).
It is highly recommended to create a dedicated conda environment (e.g. colabmda_env) containing both PyMOL and the ColabMDA package:
```bash
# 1. Create a clean environment and install PyMOL
conda create -y -n colabmda_env -c conda-forge python=3.11 pymol-open-source
# 2. Activate the environment
conda activate colabmda_env
# 3. Install ColabMDA package (Lightweight local installation)
python3 -m pip install --upgrade "git+https://github.com/paulshamrat/ColabMDA.git@main"
# 4. Verify PyMOL installation and version
pymol --version
```

(Note: This local installation is extremely lightweight and does not require heavy simulation engines like OpenMM or MDAnalysis just to view files.)
Simply run the view command from your simulation folder (it will automatically look for prod_full.pdb and prod_full.dcd, generate a PyMOL script, and launch PyMOL):
```bash
# 1. Run the view command from within your replica directory:
cd simulations/4ldj_wt/r1
colabmda openmm view
# 2. Or, from the project root, specify the directory path:
# For Wild-Type:
colabmda openmm view --pdb-dir simulations/4ldj_wt/r1
# For Mutant (e.g., G12D):
colabmda openmm view --pdb-dir simulations/4ldj_G12D/r1
```

💡 Custom Residue/Topology/Trajectory:
By default, the tool highlights residue index 12 (the mutation site). You can customize the highlighted residue and load custom trajectory files using flags:

```bash
colabmda openmm view --pdb-dir simulations/4ldj_wt/r1 --resi 12 -t prod_full.pdb -x prod_full.dcd
```
The command automatically generates a visualize.pml script in the folder.
Example of the generated PyMOL script (visualize.pml):

```
# 1. Clear out all old overlapping objects from memory
reinitialize
bg_color white
# 2. Load the merged trajectory files (prod_full.pdb and prod_full.dcd)
load prod_full.pdb, kras
load_traj prod_full.dcd, kras
# 2b. Align the trajectory to the crystal reference for an identical viewing orientation (e.g., 4ldj_wt.pdb)
load 4ldj_wt.pdb, crystal_ref
align kras and name CA, crystal_ref and name CA, mobile_state=1, target_state=1
delete crystal_ref
# 3. Hide everything, including solvent water and ions (cartoon and sticks are re-shown below)
hide everything, all
hide nonbonded, all
hide nb_spheres, all
# 4. Fit every frame onto the first one (CA atoms) to remove overall tumbling
intra_fit kras and name CA
# 5. Assign secondary structure and draw the cartoon ribbon
dss kras
cartoon automatic, kras
show cartoon, kras
# 6. Color carbons cyan and other atoms by element
color cyan, kras and name C*
util.cnc("kras")
# 7. Show the residue 12 side chain as sticks (SG is colored yellow when residue 12 is a cysteine, e.g., G12C)
show sticks, kras and resi 12 and not name N+C+O+H
color yellow, kras and resi 12 and name SG
set stick_radius, 0.25
# 8. Focus the camera on the protein
zoom kras and polymer, buffer=4
```
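If you want to script a similar view outside the CLI, for example to highlight a different residue, a small generator can write a trimmed-down `.pml` file. This is a hypothetical sketch (`render_pml` is not part of ColabMDA) that reuses a subset of the PyMOL commands from the generated script; the object name `kras` and the default file names mirror that script.

```python
def render_pml(topology="prod_full.pdb", trajectory="prod_full.dcd", resi=12, obj="kras"):
    """Return a minimal PyMOL script that loads a trajectory and highlights one residue."""
    commands = [
        "reinitialize",
        "bg_color white",
        f"load {topology}, {obj}",
        f"load_traj {trajectory}, {obj}",
        "hide everything, all",
        # Fit every state onto state 1 using CA atoms to remove overall tumbling.
        f"intra_fit {obj} and name CA",
        f"dss {obj}",
        f"show cartoon, {obj}",
        f"show sticks, {obj} and resi {resi} and not name N+C+O+H",
        f"zoom {obj} and polymer, buffer=4",
    ]
    return "\n".join(commands) + "\n"
```

Write it with `Path("visualize_custom.pml").write_text(render_pml(resi=61))` (using `pathlib.Path`) and open the file with `pymol visualize_custom.pml`.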
To render publication-quality structural snapshot grids comparing transitions over simulation trajectories (such as WT vs Mutants across specific frames), you can use the colabmda openmm snapshots command.
⚠️ Prerequisite: This command requires the Python `pymol` and `Pillow` (PIL) libraries to be installed in the active environment (e.g., `pymol-viz`).
```bash
# Generate the default 3x8 transition snapshot grid (WT vs G12C vs G12D)
colabmda openmm snapshots
```

By default, the command uses a built-in template designed for the KRAS WT/G12C/G12D trajectory transition grid. You can customize the behavior by supplying a custom JSON configuration file:

```bash
colabmda openmm snapshots --config my_config.json --output figures/comparison_grid.png
```

Example configuration (`my_config.json`):

```json
{
  "align_ref_pdb": "structures/4ldj/wt/target.B99990001_with_cryst.pdb",
  "stable_core_sel": "resi 1-10 or resi 40-55 or resi 80-169",
  "camera_view": [
    0.0, 1.0, 0.0,
    0.0, 0.0, 1.0,
    1.0, 0.0, 0.0,
    0.0, 0.0, -50.0
  ],
  "systems": {
    "WT": {
      "pdb": "simulations/4ldj_wt/r1/prod_full.pdb",
      "dcd": "simulations/4ldj_wt/r1/prod_full.dcd",
      "states": [1, 200],
      "times": ["0.00 ns", "2.00 ns"],
      "mut_residue": 12
    }
  }
}
```

Organize work in three phases:
- Preparation: Build WT first in `structures/<pdbid>/wt/`, then generate mutants.
- Simulation: Run WT and mutants in separate folders under `simulations/`.
- Analysis: Store per-system analysis in `analysis/single/`, then generate overlays in `analysis/compare/`.
```text
/content/drive/MyDrive/openmm/
  structures/
    4ldj/
      wt/            # Wild-type modeled PDBs
      mutants/       # G12D/G12C modeled PDBs
  simulations/
    4ldj_wt/
      r1/            # Replica 1 (em.chk, npt.chk, prod.dcd)
      r2/            # Replica 2
    4ldj_G12D/
      r1/
      r2/
  analysis/
    single/
      4ldj_wt/       # [r1, r2, aggregate] reports
      4ldj_G12D/
    compare/         # Final WT vs Mutant overlays
```
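To create this skeleton up front, a few lines of `pathlib` suffice. This is a convenience sketch, not part of ColabMDA (its commands may create some of these folders themselves); `make_layout` and the default replica names are hypothetical.

```python
from pathlib import Path


def make_layout(root, pdb_id, systems, replicas=("r1", "r2")):
    """Create the structures/, simulations/, and analysis/ skeleton; return the folders."""
    root = Path(root)
    folders = [
        root / "structures" / pdb_id / "wt",
        root / "structures" / pdb_id / "mutants",
        root / "analysis" / "compare",
    ]
    for system in systems:
        # One folder per replica, plus a per-system analysis folder.
        folders += [root / "simulations" / system / rep for rep in replicas]
        folders.append(root / "analysis" / "single" / system)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
    return folders
```

For the layout shown above: `make_layout("/content/drive/MyDrive/openmm", "4ldj", ["4ldj_wt", "4ldj_G12D"])`.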
- OpenMM & PDBFixer
- Modeller
- MDAnalysis & MDTraj
- NumPy, Matplotlib, Biopython
- Google Colab & Miniforge/Conda
- Modular Pipeline: New modular CLI for EM, NVT, NPT, and Production MD.
- Resume-Safe Engine: Integrated checkpointing logic for fail-safe simulations on Google Colab.
- Modeling: Automated Wild-Type building and mutation support via Modeller.
- Analysis: Robust trajectory merging and comparative RMSD/Rg/RMSF analysis tools.
- Professional Standards: Added CI/CD workflows, Black formatting, and Ruff linting.
This repository was inspired by the methodologies established in the research published below. Originally developed as a simple GROMACS-on-Colab workflow, ColabMDA has since evolved into a specialized OpenMM-centered pipeline. If you use this tool, please consider citing the underlying study:
Paul SK, Saddam M, Rahaman KA, Choi JG, Lee SS, Hasan M. Molecular modeling, molecular dynamics simulation, and essential dynamics analysis of grancalcin: An upregulated biomarker in experimental autoimmune encephalomyelitis mice. Heliyon. 2022 Oct 23;8(10):e11232. doi: 10.1016/j.heliyon.2022.e11232. PMID: 36340004; PMCID: PMC9626934.