CCPBioSim
diff --git a/‎notebooks/protein_ligand_complex_parameterisation_and_md.ipynb‎
Lines changed: 39 additions & 32 deletions b/‎notebooks/protein_ligand_complex_parameterisation_and_md.ipynb‎
Lines changed: 39 additions & 32 deletions
diff --git a/‎notebooks/small_molecule_parameterisation.ipynb‎
Lines changed: 3 additions & 3 deletions b/‎notebooks/small_molecule_parameterisation.ipynb‎
Lines changed: 3 additions & 3 deletions
diff --git a/‎notebooks/trajectory_gpu.dcd‎
79.7 MB b/‎notebooks/trajectory_gpu.dcd‎
79.7 MB
diff --git a/‎structures/6o6f_ligand.sdf‎
Lines changed: 1 addition & 1 deletion b/‎structures/6o6f_ligand.sdf‎
Lines changed: 1 addition & 1 deletion
@@ -6,10 +6,10 @@
     "tags": []
    },
    "source": [
-    "# Parameterisation, Molecular Dynamics, and Trajectory Analysis of a Protein-Ligand Complex with OpenFF, OpenMM, and MDTraj\n",
+    "# Parameterisation, Molecular Dynamics, and Trajectory Analysis of a Protein-Ligand Complex with OpenFF, OpenMM, MDAnalysis, and ProLIF\n",
     "\n",
     "This is the second of two jupyter notebooks on handling force fields using [Open Force Field's](https://openforcefield.org/) software, and subsequent molecular dynamics and analysis. The first notebook (`small_molecule_parameterisation.ipynb`) introduced fundamental concepts in OpenFF and demonstrated parameterisation of a small molecule.\n",
-    "This notebook demonstrates how to prepare a system that combines solvent, a ligand using Sage, and a protein using a standard Amber force field. We'll take the structures of the MCL-1 and the bound ligand from the crystal structure, but we could just as easily use a ligand pose from docking. We'll solvate the complex, assemble the system, parameterise it, and finally simulate it with OpenMM and visualize the results, all without leaving the notebook. Have fun!\n",
+    "This notebook demonstrates how to prepare a system that combines solvent, a ligand using Sage, and a protein using a standard AMBER force field. We'll take the structures of the MCL-1 and the bound ligand from the crystal structure, but we could just as easily use a ligand pose from docking. We'll solvate the complex, assemble the system, parameterise it, and finally simulate it with OpenMM and visualize the results, all without leaving the notebook. Have fun!\n",
     "\n",
     "### Prerequisites\n",
     "\n",
@@ -269,8 +269,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<div class=\"alert alert-info\" style=\"max-width: 700px; margin-left: auto; margin-right: auto;\">\n",
-    "ℹ️ Have a play with this visualization! Try clearing the default representations with <code>view.clear()</code> and configuring your own cartoon <em>(Hint: <a href=https://nglviewer.org/nglview/latest/api.html#nglview.NGLWidget>Check</a> the <a href=https://nglviewer.org/ngl/api/manual/molecular-representations.html>docs</a>)</em>. See if you can display the ligand in a way you like. When you're happy with what you've made, save the image with <code>view.download_image()</code>\n",
+    "<div class=\"alert alert-success\" style=\"max-width: 500px; margin-left: auto; margin-right: auto; border-left: 6px solid #5cb85c; background-color: #f1fff1;\">\n",
+    "✏️ <b>Exercise:</b>️ Have a play with this visualization! Try clearing the default representations with <code>view.clear()</code> and configuring your own cartoon <em>(Hint: <a href=https://nglviewer.org/nglview/latest/api.html#nglview.NGLWidget>Check the docs</a>)</em>. You'll need <code>view.add_representation</code>. See if you can display the ligand in a way you like. When you're happy with what you've made, save the image with <code>view.download_image()</code>\n",
     "</div>"
    ]
   },
@@ -281,7 +281,7 @@
     "<a id=\"parameterise\"></a>\n",
     "## 3. We Can Assemble a Combined `ForceField` and use this to Parameterise the Whole System\n",
     "\n",
-    "Now that we've prepared our co-ordinates, we should choose the force field. For now, we don't have any single SMIRNOFF force field that can handle both proteins and small molecules; the Rosemary 3.0.0 force field will support this, but it's not yet ready. As an alternative, we'll combine the [Sage] small molecule force field with the SMIRNOFF port of Amber ff14SB. These force fields parameterise non-bonded parameters in similar ways and with the same functional form, so we don't expect any outrageous artifacts, but they also haven't been carefully tested together (TODO: Still true?). Note that Sage also includes the TIP3P water model, which is appropriate for Amber ff14SB too.\n",
+    "Now that we've prepared our co-ordinates, we should choose the force field. For now, we don't have any single SMIRNOFF force field that can handle both proteins and small molecules; the Rosemary 3.0.0 force field will support this, but it's not yet ready. As an alternative, we'll combine the AMBER-compatible [Sage] small molecule force field with the SMIRNOFF port of AMBER ff14SB. Note that Sage also includes the TIP3P water model, which is appropriate for AMBER ff14SB too.\n",
     "\n",
     "When we combine multiple SMIRNOFF force fields into one, we provide them in an order from general to specific. Sage includes parameters that could be applied to a protein, but they're general across all molecules; ff14SB's parameters are specific to proteins. Since the Toolkit always applies the last parameters that match a moiety, this order makes sure the right parameters get assigned.\n",
     "\n",
@@ -342,7 +342,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "*(This should take about a minute, largely because of the complexity of the Amber protein force field port. In the future, this should be faster.)*"
+    "*(This should take about a minute, largely because of the complexity of the AMBER protein force field port. In the future, this should be faster.)*"
    ]
   },
   {
@@ -397,7 +397,7 @@
     "\n",
     "### 4.1 Configure and run the simulation\n",
     "\n",
-    "Here, we'll use a Langevin thermostat at 300 Kelvin and a 2 fs time step. We'll write the structure to disk every 10 steps."
+    "Here, we'll use a Langevin thermostat at 300 Kelvin and a 2 fs time step. We'll write the structure to disk every 10 steps. In contrast to the previous notebook, we'll add a MonteCarloBarostat to fix the pressure, while allowing the volume to fluctuate. Our simulation corresponds to the $NPT$ ensemble."
    ]
   },
   {
@@ -415,19 +415,24 @@
    "source": [
     "import openmm\n",
     "\n",
-    "# Construct and configure a Langevin integrator at 300 K with an appropriate friction constant and time-step\n",
-    "integrator = openmm.LangevinIntegrator(\n",
-    "    300 * openmm.unit.kelvin,\n",
-    "    1 / openmm.unit.picosecond,\n",
-    "    0.002 * openmm.unit.picoseconds,\n",
+    "TEMPERATURE = 300 * openmm.unit.kelvin\n",
+    "PRESSURE = 1 * openmm.unit.atmosphere\n",
+    "FRICTION_COEFFICIENT = 1 / openmm.unit.picosecond\n",
+    "TIMESTEP = 0.002 * openmm.unit.picoseconds\n",
+    "\n",
+    "# Construct and configure a LangevinMiddleIntegrator at 300 K with an appropriate friction constant and time-step\n",
+    "integrator = openmm.LangevinMiddleIntegrator(\n",
+    "    TEMPERATURE,\n",
+    "    FRICTION_COEFFICIENT,\n",
+    "    TIMESTEP,\n",
     ")\n",
+    "barostat = openmm.MonteCarloBarostat(PRESSURE, TEMPERATURE)\n",
     "\n",
     "# Under the hood, this creates *OpenMM* `System` and `Topology` objects, then combines them together\n",
-    "simulation = interchange.to_openmm_simulation(integrator=integrator)\n",
+    "simulation = interchange.to_openmm_simulation(integrator=integrator, additional_forces=[barostat])\n",
     "\n",
     "# Add a reporter to record the structure every 50 steps\n",
     "dcd_reporter = openmm.app.DCDReporter(\"trajectory.dcd\", 50)\n",
-    "pdb_reporter = openmm.app.PDBReporter(\"trajectory.pdb\", 50)\n",
     "simulation.reporters.append(dcd_reporter)"
    ]
   },
@@ -505,7 +510,7 @@
    },
    "outputs": [],
    "source": [
-    "simulation.context.setVelocitiesToTemperature(300 * openmm.unit.kelvin)\n",
+    "simulation.context.setVelocitiesToTemperature(TEMPERATURE)\n",
     "simulation.runForClockTime(1.0 * openmm.unit.minute)"
    ]
   },
@@ -520,7 +525,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "While that runs, let's talk a bit about OpenFF (TODO: Link to discussion of OpenFF from previous notebook and remove this?)\n",
+    "While that runs, let's talk a bit about OpenFF.\n",
     "\n",
     "### Open Source Force Fields\n",
     "\n",
@@ -553,11 +558,7 @@
    "source": [
     "### 4.4 Visualize the simulation with nglview\n",
     "\n",
-    "NGLView can display single structures and entire trajectories. Mouse over the widget to see the animation controls.\n",
-    "\n",
-    "<div class=\"alert alert-warning\" style=\"max-width: 700px; margin-left: auto; margin-right: auto;\">\n",
-    "    ❓ Can you visualize this trajectory in VMD (or another visualization tool of your choice)? Hint: DCD files only include the trajectory data (positions of each atoms over time) and lack topological information, so you might need another file.\n",
-    "</div>"
+    "NGLView can display single structures and entire trajectories. Mouse over the widget to see the animation controls. Note that the trajectory generated above is `trajectory.dcd`. However, as this was run on CPUs (slow), we've provided a longer trajectory (`trajectory_gpu.dcd`, saved every 500 steps (1 ps)) for you to use for analysis. Feel free to visualise/ analyse both."
    ]
   },
   {
@@ -575,7 +576,7 @@
    "source": [
     "import MDAnalysis as mda\n",
     "\n",
-    "u = mda.Universe(interchange.to_openmm_topology(), \"trajectory.dcd\")\n",
+    "u = mda.Universe(interchange.to_openmm_topology(), \"trajectory_gpu.dcd\")\n",
     "\n",
     "view = nglview.show_mdanalysis(u)\n",
     "view.add_representation(\"line\", selection=\"protein\")\n",
@@ -586,8 +587,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<div class=\"alert alert-info\" style=\"max-width: 700px; margin-left: auto; margin-right: auto;\">\n",
-    "ℹ️ MDTraj is a great library for analysis. Check out the <a href=https://mdtraj.org/1.9.4/api/generated/mdtraj.Trajectory.html>docs</a> for the <code>Trajectory</code> object you just created, as well as their <a href=https://mdtraj.org/1.9.4/analysis.html>analysis functions</a>, and see if you can compute something interesting. Its real superpower is that it provides the coordinates of the trajectory as a <a href=https://numpy.org/doc/stable/reference/generated/numpy.array.html>NumPy array</a>, so if you're really keen try computing something directly from <code>mdt_traj.xyz</code>\n",
+    "<div class=\"alert alert-success\" style=\"max-width: 500px; margin-left: auto; margin-right: auto; border-left: 6px solid #5cb85c; background-color: #f1fff1;\">\n",
+    "    ✏️ <b>Exercise:</b> Can you visualize this trajectory in VMD (or another visualization tool of your choice)? Hint: DCD files only include the trajectory data (positions of each atoms over time) and lack topological information, so you might need another file.\n",
     "</div>"
    ]
   },
@@ -617,9 +618,8 @@
     "from MDAnalysis.analysis import rms\n",
     "import pandas as pd\n",
     "\n",
-    "# Define selection strings for the ligand and the binding pocket residues\n",
-    "LIGAND = \"resname UNK\"\n",
-    "POCKET_RESIDUES = \"protein and byres around 10.0 resname UNK\"\n",
+    "# Define a selection string for the ligand\n",
+    "LIGAND = \"resname LIG\"\n",
     "\n",
     "# Compute the RMSD of the ligand aligned to itself\n",
     "R = rms.RMSD(u,  # universe to align\n",
@@ -650,8 +650,13 @@
     "</div>\n",
     "\n",
     "<div class=\"alert alert-success\" style=\"max-width: 500px; margin-left: auto; margin-right: auto; border-left: 6px solid #5cb85c; background-color: #f1fff1;\">\n",
-    "    ✏️ <b>Exercise:</b> Why is the above analysis not sufficient for determining pose stability? (Hint: what would happen if a rigid ligand drifted out of the binding site?) Can you modify the code above to compute an RMSD which is more informative about binding pose stability? (Hint: think about the alignment reference. You can specify additional groups which are used to calculate RMSD but not to align the trajectory -- have a look at the <a href=\"https://userguide.mdanalysis.org/stable/examples/analysis/alignment_and_rms/rmsd.html\">MDAnalysis RMSD tutorial</a>. You might want to use the <code>POCKET_RESIDUES</code> selection string.) When would you expect this new RMSD to be similar or different to the RMSD computed above?\n",
-    "</div>"
+    "    ✏️ <b>Exercise:</b> Why is the above analysis not sufficient for determining pose stability? (Hint: what would happen if a rigid ligand drifted out of the binding site?) Can you modify the code above to compute an RMSD which is more informative about binding pose stability? (Hint: think about the alignment reference. You can specify additional groups which are used to calculate RMSD but not to align the trajectory -- have a look at the <a href=\"https://userguide.mdanalysis.org/stable/examples/analysis/alignment_and_rms/rmsd.html\">MDAnalysis RMSD tutorial</a>. You might want to check out the <a href=\"https://userguide.mdanalysis.org/stable/selections.html\"> MDAnalysis atom selection language documentation </a>.) When would you expect this new RMSD to be similar or different to the RMSD computed above? If you're stuck, you can click to reveal another hint below.\n",
+    "</div>\n",
+    "\n",
+    "<details>\n",
+    "  <summary><b>Click here to reveal another hint</b></summary>\n",
+    "You'll likely want to use a selection string similar to <code>POCKET_RESIDUES = \"protein and byres around 10.0 resname LIG\"</code>\n",
+    "</details>"
    ]
   },
   {
@@ -689,9 +694,11 @@
    "source": [
     "import prolif as plf\n",
     "\n",
+    "assert 'POCKET_RESIDUES' in globals(), \"Please define POCKET_RESIDUES based on your answer to the previous exercise (see the extra hint if needed).\"\n",
+    "\n",
     "# create selections for the ligand and protein\n",
     "ligand_selection = u.select_atoms(LIGAND)\n",
-    "protein_selection = u.select_atoms(POCKET_RESIDUES)\n",
+    "protein_selection = u.select_atoms(POCKET_RESIDUES) # noqa: F821 (this just tells the auto-formatter not to complain)\n",
     "ligand_selection, protein_selection"
    ]
   },
@@ -819,7 +826,7 @@
    "metadata": {},
    "source": [
     "<div class=\"alert alert-success\" style=\"max-width: 500px; margin-left: auto; margin-right: auto; border-left: 6px solid #5cb85c; background-color: #f1fff1;\">\n",
-    "    ✏️ <b>Exercise:</b> Repeat this entire notebook using a ligand from docked to MCL-1 during this morning's session. (Hint: You'll need to convert the pdbqt files to sdf files using obabel, adding protons as appropriate for pH 7. This will look something like <code>obabel input_lig.pdbqt  -p 7.0  -O output_lig.sdf</code>.) Is the binding pose stable? Are similar interactions formed by the docked ligand and the crystallographic ligand? Which do you think is likely to bind more strongly?\n",
+    "    ✏️ <b>Exercise:</b> Repeat this entire notebook using a ligand from docked to MCL-1 during this morning's session. (Hint: You'll need to convert the pdbqt files to sdf files using obabel, adding protons as appropriate for pH 7. This will look something like <code>obabel docked_ligand.pdbqt -opdb | obabel -ipdb -osdf -p 7.0 -O docked_ligand.sdf</code>. Make sure to use the docked coordinates! An example docked pdbqt file is provided at <code>../structures/docked_ligand.pdbqt</code>) Is the binding pose stable? Are similar interactions formed by the docked ligand and the crystallographic ligand? Which do you think is likely to bind more strongly? What would be required to answer these questions robustly?\n",
     "</div>"
    ]
   },
@@ -857,7 +864,7 @@
     "<a id=\"further_non_openff_materials\"></a>\n",
     "## 8. Beyond OpenFF\n",
     "\n",
-    "You can parameterise your complex and run molecular dynamics -- so what's next? If you're interested in quantiatively assessing the binding affinity of your ligand for your target, then [alchemical (and path-based) free energy calculations are the gold-standard method](https://www.nature.com/articles/s42004-023-01019-9). [Open Free Energy](https://openfree.energy/) is another [Open Molecular Software Foundation](https://omsf.io/) initiative, which develops open-source tools for binding free energy calculations. Head to their [tutorials](https://docs.openfree.energy/en/latest/tutorials/index.html) to learn more! However, these calculations are computationally demanding. If you're interested in a relatively fast (but relatively inaccurate) ranking of the binding affinities of a set of ligands, methods such as MM/GBSA may be appropriate (see the upcoming tutorial)."
+    "You can parameterise your complex and run molecular dynamics -- so what's next? If you're interested in quantiatively assessing the binding affinity of your ligand for your target, then [alchemical (and path-based) free energy calculations are the gold-standard method](https://www.nature.com/articles/s42004-023-01019-9). [Open Free Energy](https://openfree.energy/) is another [Open Molecular Software Foundation](https://omsf.io/) initiative, which develops open-source tools for binding free energy calculations. Head to their [tutorials](https://docs.openfree.energy/en/latest/tutorials/index.html) to learn more! However, these calculations are computationally demanding. If you're interested in a relatively fast (but relatively inaccurate) ranking of the binding affinities of a set of ligands, methods such as MM/GBSA may be appropriate."
    ]
   }
  ],
 
@@ -743,7 +743,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "interchange.to_amber(prefix=\"ligand\")"
+    "interchange.to_AMBER(prefix=\"ligand\")"
    ]
   },
   {
@@ -762,7 +762,7 @@
    "id": "212385e0",
    "metadata": {},
    "source": [
-    "Here, we'll export to OpenMM and run a short simulation directly from the noteboook. We can create an OpenMM `Simulation` object from the `Interchange` and run for a specified wall clock time using `runForClockTime` (the simluation time will depend on how quickly it runs on your machine). We keep the volume (V), number of particles (N), and average temperature (T) (using the LangevinMiddleIntegrator) constant and the simulation corresponds to the NVT ensemble."
+    "Here, we'll export to OpenMM and run a short simulation directly from the noteboook. We can create an OpenMM `Simulation` object from the `Interchange` and run for a specified wall clock time using `runForClockTime` (the simluation time will depend on how quickly it runs on your machine). We keep the volume ($V$), number of particles ($N$), and average temperature ($T$) (using the LangevinMiddleIntegrator) constant and the simulation corresponds to the $NVT$ ensemble."
    ]
   },
   {
@@ -946,7 +946,7 @@
    "outputs": [],
    "source": [
     "# normally when we call `ForceField.create_interchange` or `ForceField.create_openmm_system`, the toolkit will call\n",
-    "# AmberTools or OEChem to assign partial charges, since that's what's in the force field file. A future OpenFF release\n",
+    "# AMBERTools or OEChem to assign partial charges, since that's what's in the force field file. A future OpenFF release\n",
     "# which uses NAGL for charge assignment will encode this instruction in the force field file itself, but until that we\n",
     "# can use the `charge_from_molecules` argument to tell it to use the charges that we just assigned# for more, see:\n",
     "# https://docs.openforcefield.org/projects/toolkit/en/stable/api/generated/openff.toolkit.typing.engines.smirnoff.ForceField.html#openff.toolkit.typing.engines.smirnoff.ForceField.create_openmm_system\n",
 
@@ -1,4 +1,4 @@
-6o6f_ligand.pdb
+LIG
  OpenBabel09182516053D
 
  51 55  0  0  1  0  0  0  0  0999 V2000
Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,4 @@`
`1`		`-6o6f_ligand.pdb`
	`1`	`+LIG`
`2`	`2`	`OpenBabel09182516053D`
`3`	`3`
`4`	`4`	`51 55 0 0 1 0 0 0 0 0999 V2000`