09.3 How to Run the Jupyter Notebook

Chris Swain edited this page Apr 27, 2021 · 1 revision

Double-clicking the notebook on the home menu opens it in a new tab, where you are connected to the kernel and can run the notebook. The notebook should open with the SMINA Jupyter kernel (highlighted in red below), which contains the correct environment for docking on the cluster.

The benefit of running your docking experiments on the cluster is that you do not need to install any other programmes on your own machine. The SMINA kernel opens already running Python 3 in the conda environment created specially for this work. This environment contains all the libraries and tools you will need (SMINA, RDKit, rfscoresvs etc.).

Before you start running the notebook, scroll through it and enter the exact names of your files in the correct places. The first one to change is the “CompoundsforDocking.sdf” file (red arrow line).

Here our sdf file containing all the structures we wished to dock was named “CompoundsforDocking” (as can be seen in the earlier screenshots of the home screen). You must ensure that sdfFilePath is set to the name of your sdf file, and that ConfoutputFilePath (blue arrow line) is named after your sdf file too.

For example, if the sdf file containing the compounds you wish to dock is called “ZincDataset.sdf”, then you would replace “CompoundsforDocking.sdf” with “ZincDataset.sdf” on the red arrow line, and on the blue arrow line you would replace “CompoundsforDockingconformations.sdf” with “ZincDatasetconformations.sdf”.
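The renaming rule above can be written down as a short sketch. The variable names `sdfFilePath` and `ConfoutputFilePath` are the ones used in the notebook; deriving the second from the first with `pathlib` is an assumption made here for illustration and is not necessarily how the notebook itself does it.

```python
from pathlib import Path

# Name of your own sdf file (example taken from the text above).
sdfFilePath = "ZincDataset.sdf"

# Build the conformations file name from the sdf file name:
# strip the ".sdf" extension, then append "conformations.sdf".
ConfoutputFilePath = Path(sdfFilePath).stem + "conformations.sdf"

print(ConfoutputFilePath)
```

Deriving one name from the other means there is only one place where a typo can creep in.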

Next, scroll down to the Docking to Protein section. Here you will need to enter the exact names of your pdb files.

Here our “protein only.pdb” file is the file of the crystal structure containing just the protein, and our “ligand only.pdb” file is the file containing only the ligand. If your files are named differently you will need to replace these file names with your own.

Finally, scroll down to the “Rescore using Random Forest Model” section to enter your remaining file name.

Here our “protein and ligand.pdb” file is the file containing the crystal structure of the protein with the ligand bound. As above, you will need to replace this with your own file name if your file is named differently.

Now that you have corrected all the file names, you will be able to run the notebook.
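Because the notebook needs the exact file names, it can help to confirm that every file is present before running anything. The following is a minimal sketch, not part of the notebook: the helper `find_missing_files` is hypothetical, and the file names are the placeholders used in this section, so replace them with your own.

```python
from pathlib import Path

def find_missing_files(file_names, folder="."):
    """Return the names from file_names that do not exist in folder."""
    folder = Path(folder)
    return [name for name in file_names if not (folder / name).is_file()]

# Placeholder names from this section; replace them with your own file names.
required = [
    "CompoundsforDocking.sdf",
    "protein only.pdb",
    "ligand only.pdb",
    "protein and ligand.pdb",
]

missing = find_missing_files(required)
if missing:
    print("Not found in the current folder:", ", ".join(missing))
else:
    print("All input files found.")
```

The check is case-sensitive on the cluster, so “Protein.pdb” and “protein.pdb” count as different files.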

Click on the first cell of the notebook to highlight it (as shown below), then click run.

<img src="https://github.com/UCL/Open_Docking_Lab_Handbook/blob/main/media/image150.png" style="width:6.26644in;height:3.30357in" />

You can manually run through all the cells in the notebook by clicking run each time a new cell is highlighted.

Here you can see after running the “In[2]” cell, an “Out[2]” is created. This output shows you the number of compounds in your sdf file that you are going to dock. Our sdf file for docking contained 2 structures (from section 8) and this has been displayed here.
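If you want to confirm the compound count independently of the notebook, the sketch below counts the records in an sdf file using only the Python standard library. It relies on the fact that every record in an SDF file ends with a line containing `$$$$`. The notebook itself most likely uses RDKit for this; the function `count_sdf_records` is a hypothetical stand-in.

```python
def count_sdf_records(sdf_text):
    """Count records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")

# Tiny stand-in for a real sdf file containing two records.
example = "mol_A\n  header\n\nM  END\n$$$$\nmol_B\n  header\n\nM  END\n$$$$\n"

print(count_sdf_records(example))
```

For a real file you could pass in the file contents, for example `count_sdf_records(open("CompoundsforDocking.sdf").read())`.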

Continue to run through all of the cells. When you get to this one, you will be able to see RDKit generating the conformations of your compounds for docking (see arrow on screenshot). This step takes longer the more structures you have in your sdf file.

The cell below, “In[5]”, checks how many conformations have been generated per compound. Here you can see the “Out[5]” was 6. Each conformation will then be docked into your target protein (see below), generating a number of docked poses per conformation.
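The per-compound count can also be reproduced outside the notebook. The sketch below assumes that every conformation in the conformations file keeps its parent compound's name as the title (first line) of its SDF record; that is an assumption about the file layout, and `conformations_per_compound` is a hypothetical helper rather than the notebook's own code.

```python
from collections import Counter

def conformations_per_compound(sdf_text):
    """Count SDF records per molecule title (the first line of each record)."""
    counts = Counter()
    for block in sdf_text.split("$$$$"):
        lines = block.strip("\r\n").splitlines()
        if lines:  # skip the empty remainder after the last '$$$$'
            counts[lines[0].strip()] += 1
    return counts

# Stand-in data: two conformations of mol_A and one of mol_B.
example = (
    "mol_A\n  header\n\nM  END\n$$$$\n"
    "mol_A\n  header\n\nM  END\n$$$$\n"
    "mol_B\n  header\n\nM  END\n$$$$\n"
)

for name, n in conformations_per_compound(example).items():
    print(name, n)
```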

When you get to the cell below, you will see SMINA actually perform the docking experiments. This is the part that takes the longest to run. If you have many compounds in your dataset, you can run this cell overnight. Just leave your laptop on and plugged in, and ensure you have a steady internet connection.
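For orientation, a SMINA docking run boils down to a command line like the one assembled below. This is a sketch only: the exact options used by the notebook are not shown in this section, the output name `docked.sdf.gz` is hypothetical, and the input names are the placeholders from above. The flags `-r`, `-l`, `--autobox_ligand` and `-o` are standard smina options. The code builds and prints the command without running it.

```python
import shlex

def build_smina_command(receptor, ligands, autobox_ligand, out):
    """Return a smina command line as a list of arguments (nothing is executed)."""
    return [
        "smina",
        "-r", receptor,                      # protein-only structure
        "-l", ligands,                       # conformations to dock
        "--autobox_ligand", autobox_ligand,  # ligand that defines the search box
        "-o", out,                           # docked poses are written here
    ]

cmd = build_smina_command(
    "protein only.pdb",
    "CompoundsforDockingconformations.sdf",
    "ligand only.pdb",
    "docked.sdf.gz",  # hypothetical output name
)

# shlex.join quotes arguments that contain spaces, so the result is safe to paste into a shell.
print(shlex.join(cmd))
```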

The current maximum runtime for jobs on the cluster is 48 hours, and the maximum time a notebook can remain idle before being shut down is 2 hours. A notebook becomes idle if you have any internet outages or if you lose your connection to UCL via your VPN. This idle window should mean that your experiment will not abort midway through docking just because your WiFi goes down.
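A rough estimate can tell you in advance whether a dataset is likely to fit inside the 48-hour limit. The sketch below assumes conformations are docked one after another and that you have measured an average time per conformation on a small test run. The figures in the example are hypothetical, not benchmarks.

```python
MAX_JOB_HOURS = 48  # cluster limit quoted above

def estimated_hours(n_compounds, conformations_each, seconds_per_conformation):
    """Rough docking time in hours, assuming conformations are docked sequentially."""
    total_seconds = n_compounds * conformations_each * seconds_per_conformation
    return total_seconds / 3600

def fits_in_job_limit(n_compounds, conformations_each, seconds_per_conformation):
    """True if the rough estimate is within the maximum job runtime."""
    hours = estimated_hours(n_compounds, conformations_each, seconds_per_conformation)
    return hours <= MAX_JOB_HOURS

# Hypothetical figures: 500 compounds, 6 conformations each, 60 s per conformation.
print(estimated_hours(500, 6, 60))
print(fits_in_job_limit(500, 6, 60))
```

If the estimate comes out above the limit, splitting the sdf file into smaller batches is one way to stay within it.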

You will be able to see the conformations being docked in real time as this bar fills up from 0% to 100%.

You will know when your docking experiments have finished because the cell “In[*]” will change to “In[7]” (where * denotes running).

Next comes the flexible docking part of the notebook. Flexible docking allows all the residues within a defined region of your active site to occupy multiple different conformations rather than being held rigid. Whilst this can be more accurate, as proteins are not static, it does take much longer.

By default, the notebook is set up not to perform flexible docking. If you wish to do some flexible docking experiments too, then you will need to remove the “#” from the start of the code line.
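In smina, flexible side chains around a ligand are switched on with the `--flexdist_ligand` and `--flexdist` options. The sketch below shows how such options could be appended to a rigid docking command. The helper `add_flexible_docking`, the `RUN_FLEXIBLE` switch, the 3.5 Å distance and the output file name are all hypothetical; the notebook's own commented-out line may look different.

```python
def add_flexible_docking(cmd, flexdist_ligand, flexdist):
    """Return a copy of a smina argument list with flexible side chains enabled."""
    return cmd + [
        "--flexdist_ligand", flexdist_ligand,  # ligand the distance is measured from
        "--flexdist", str(flexdist),           # side chains within this distance become flexible
    ]

# Rigid docking command using the placeholder file names from this section.
rigid = [
    "smina",
    "-r", "protein only.pdb",
    "-l", "CompoundsforDockingconformations.sdf",
    "--autobox_ligand", "ligand only.pdb",
    "-o", "docked.sdf.gz",  # hypothetical output name
]

RUN_FLEXIBLE = False  # plays the role of the "#" in the notebook
cmd = add_flexible_docking(rigid, "ligand only.pdb", 3.5) if RUN_FLEXIBLE else rigid

print(cmd)
```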

You can run several cells in succession even if earlier ones are still running (“In[*]”). Eventually the notebook will catch up and all cells will show as complete.

Towards the end of the notebook, you will be able to see the poses generated from your compounds and be able to see their associated minimised affinity (binding affinity) and RFScore (docking score).
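Once you have the results as an sdf file, the scores can also be read without any extra libraries. The sketch below parses the `> <name>` data fields of each SDF record and ranks the poses by affinity, where a more negative value means stronger predicted binding. smina writes its score under the field name `minimizedAffinity`; whether the notebook's output file uses exactly that name, and what the RFScore field is called, are assumptions you should check against your own file.

```python
def read_sdf_properties(sdf_text):
    """Return one dict per SDF record: its title plus its '> <name>' data fields."""
    records = []
    for block in sdf_text.split("$$$$"):
        lines = block.strip("\r\n").splitlines()
        if not lines:
            continue
        props = {"title": lines[0].strip()}
        for i, line in enumerate(lines):
            if line.startswith(">") and "<" in line:
                start, end = line.index("<") + 1, line.rindex(">")
                if start < end and i + 1 < len(lines):
                    props[line[start:end]] = lines[i + 1].strip()
        records.append(props)
    return records

def rank_by_affinity(records, field="minimizedAffinity"):
    """Sort records so the most negative (best) affinity comes first."""
    scored = [r for r in records if field in r]
    return sorted(scored, key=lambda r: float(r[field]))

# Stand-in data with made-up scores for two poses.
example = (
    "cmpd1\n  header\n\nM  END\n> <minimizedAffinity>\n-7.2\n\n$$$$\n"
    "cmpd2\n  header\n\nM  END\n> <minimizedAffinity>\n-8.9\n\n$$$$\n"
)

for pose in rank_by_affinity(read_sdf_properties(example)):
    print(pose["title"], pose["minimizedAffinity"])
```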

The last step is to save your results. Once you have run through every cell in the notebook, click back on the home tab.

Here you will see all the files you have generated whilst running the notebook, the most important being “Alldata.sdf.gz”. This is the file you will want to download to analyse your docking results.

Note: this is a compressed file (hence .gz). You will need to extract the contents before you can view/analyse your results (see section 10).
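Section 10 describes the extraction route used in this handbook. As an alternative, if you have Python on your own machine, a `.gz` file can be unpacked with the standard library alone. The helper `extract_gz` below is a hypothetical sketch, not part of the notebook.

```python
import gzip
import shutil
from pathlib import Path

def extract_gz(gz_path):
    """Decompress e.g. 'Alldata.sdf.gz' to 'Alldata.sdf' next to it and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops the trailing '.gz'
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data so large files are not loaded into memory
    return out_path
```

After downloading the file, calling `extract_gz("Alldata.sdf.gz")` in the same folder would write “Alldata.sdf” alongside it.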

Notice also that the notebook will show up green and say that it is still running if you have not closed down the notebook tab.

The files generated by running the notebook will remain on your homepage (as will all uploaded files) as long as they are uniquely named. Some files, like the Alldata file, will be overwritten each time you perform a docking experiment unless you rename them before starting a new experiment.
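Renaming can be done by hand in the Jupyter file list, or with a few lines of Python run in a notebook cell before the next experiment. The sketch below is one possible approach: the helper `keep_previous_results` and its naming scheme (a label or timestamp placed in front of the original name) are assumptions, not part of the notebook.

```python
from datetime import datetime
from pathlib import Path

def keep_previous_results(path="Alldata.sdf.gz", label=None):
    """Rename an existing results file so the next run does not overwrite it.

    Returns the new path, or None if there was nothing to rename.
    """
    path = Path(path)
    if not path.exists():
        return None
    # Use the given label, or fall back to a timestamp such as 20210427-153000.
    label = label or datetime.now().strftime("%Y%m%d-%H%M%S")
    new_path = path.with_name(f"{label}_{path.name}")
    if new_path.exists():
        raise FileExistsError(f"{new_path} already exists; choose another label")
    path.rename(new_path)
    return new_path
```

For example, `keep_previous_results(label="ZincDataset")` would rename “Alldata.sdf.gz” to “ZincDataset_Alldata.sdf.gz”.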

You can create folders to organise your experiments into if you wish to keep all the files on your Jupyter account. If not, you can simply download the files you want and then delete them from the list.
