06.2 Viewing your Dataset in DataWarrior

Chris Swain edited this page Apr 27, 2021 · 1 revision

6.2 Viewing your Dataset in DataWarrior

You will have downloaded compound datasets as sdf files from online databases (see section 5) or saved designs from ChemDraw as sdf files (see section 8). You can now view these saved sdf files in DataWarrior: simply open the file as you would any other file on your computer.

Here is the dataset downloaded earlier in this guide in the Zinc15 tutorial (section 5.3). DataWarrior displays the dataset in several formats, including two graph views (the bottom two windows). We won’t need these views for this tutorial, so you can go ahead and close these two windows.

There are three main areas of the DataWarrior display. The first (1.) is the Main View Area, in this case featuring the main windows 1 and 2. Window 1 is essentially a table listing every compound in your dataset: each row is a compound, and each column is a property defined in the sdf file. The Zinc15 dataset contained each compound’s structure, SMILES and zinc ID, and also numbered each compound within the dataset. You can see this information in the four columns of the table: the left-most column displays the 2D structure, the next lists the compound’s structure number, and the two on the far right give the zinc ID and SMILES respectively. Note – not all datasets number the compounds. This is important to remember and check when saving files as sdfs after examining them in DataWarrior (see later). The second window in the main view area displays every compound as its 2D structure (ordered 1, 2, 3, etc. from left to right). Just as in the first window, you can click on and select any of the compounds shown here.
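To make the row and column idea concrete, here is a minimal Python sketch of how the data fields in an sdf file map onto a table like window 1. It is not how DataWarrior reads files internally. The field names `zinc_id` and `smiles` and the IDs in the sample are placeholders chosen for illustration, so check the tags in your own file.

```python
def parse_sdf_properties(text):
    """Return one dict of data fields per compound record in an SDF string."""
    records, fields, name = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # "$$$$" closes one compound record, i.e. one table row
            records.append(fields)
            fields, name = {}, None
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <zinc_id>": the tag becomes a column name
            name = line[line.index("<") + 1 : line.rindex(">")]
            fields[name] = ""
        elif name is not None:
            if line.strip():
                # Value line(s) belonging to the current tag
                fields[name] = (fields[name] + "\n" + line).strip()
            else:
                # A blank line ends the current data item
                name = None
    return records


def sample_record(zinc_id, smiles):
    """Build a tiny one-atom record; the IDs used below are placeholders."""
    return "\n".join([
        "", "  sketch", "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
        "> <zinc_id>", zinc_id, "",
        "> <smiles>", smiles, "",
        "$$$$",
    ])


SAMPLE = "\n".join([
    sample_record("PLACEHOLDER_ID_1", "C"),
    sample_record("PLACEHOLDER_ID_2", "C"),
])

rows = parse_sdf_properties(SAMPLE)
for number, row in enumerate(rows, start=1):
    # One printed line per compound, one value per property column
    print(number, row["zinc_id"], row["smiles"])
```

If a file does not carry a numbering field, the compound number has to be generated (as `enumerate` does here), which is why it is worth checking whether your dataset numbers its compounds.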

The second area is the Filter Area (2.). Here you can select different filters for your dataset based on the properties displayed in table 1 (window 1). You can alter the filters using the sliders (here the only slider currently showing is structure number) and by using text filters. We will explore this filter area in more detail later.

The final area to note is in the bottom right-hand corner and is called the Detail Area (3.). This is where the 2D and 3D structures of the selected compound are displayed (if the sdf file contained this information). Here the dataset downloaded from Zinc15 contained both 2D and 3D conformations of each compound, so the selected compound’s structures appear here, along with a data window listing all the properties shown in table 1 (window 1). You can easily scroll through these properties for your selected compound in this window. You can also make any of these corner windows smaller or bigger to see them better, or hide any you aren’t currently interested in. To do this, simply change the size of the desired window and the other two will adjust their size accordingly.

In the screenshot above we have selected compound 3 (clicking on it in window 1 highlights it in both main windows). By altering the size of the detail area windows we have made the 2D structure bigger and hidden the 3D structure. IMPORTANT NOTE: DataWarrior can be fiddly. Hovering your mouse over any compound in the main view area (windows 1 or 2) will cause that compound to be displayed in the detail area instead. Be careful to check that the compound you wanted to select or look at is indeed the one showing in the detail area.

One final feature to note, highlighted in the above screenshot (red box), is found at the bottom of the page. This area is known as the Status Area. It tells you how many compounds are in your dataset (total:…), how many compounds you currently have selected (selected:…), and how many compounds the current filters are displaying (visible:…). You will notice these values changing as you perform various tasks in DataWarrior.
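The relationship between the three status values can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not DataWarrior's own code. The `number` and `selected` keys and the slider range are hypothetical, and counting selected compounds independently of the filter is an assumption.

```python
def status_counts(rows, lowest, highest):
    """Mimic the status area for a slider filter on the structure number."""
    # Rows that pass the filter are the ones still displayed
    visible = [r for r in rows if lowest <= r["number"] <= highest]
    # Assumption: selection is counted over the whole dataset
    selected = [r for r in rows if r["selected"]]
    return {"total": len(rows), "visible": len(visible), "selected": len(selected)}


# Five hypothetical compounds, with compound 3 selected
rows = [{"number": n, "selected": n == 3} for n in range(1, 6)]

# Slider narrowed to structure numbers 2 to 4
print(status_counts(rows, 2, 4))
```

Narrowing the slider lowers the visible count while the total stays fixed, which is the behaviour you should see in the status area as you adjust filters.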
