This is a guide to extracting data from GSD files generated by HOOMD-blue simulations and running basic analyses for research in the Rheoinformatic Colloids Team.
This guide is optimized for MacOS. See Simulating waterDPD for help running colloids simulations with HOOMD-blue.
[Last Update: August 2022]
Different parts of our analysis workflow were developed by Mohammad (Nabi) Nabizadeh and Dr. Deepak Mangal. They were streamlined for this guide by Rob Campbell.
- Types of Analysis
- Basic Analyses of waterDPD
- Modyfing waterDPD and Analyzing the Changes
- Shearing waterDPD
- (OPTIONAL) Installing R Locally
For qualitative analysis of a simulation we always view the GSD file in VMD to visualy inspect the results (as discussed in the guide to Using VMD).
The primary way to collect quantitative data from a HOOMD-blue simulation is to save data to the GSD file.
Generally the best practice is to a run a simulation and then, after the simulation is completed, extract specific data from the GSD file as needed. This is the method we will discuss here; however, it is also possible to output data to a table as you run the simulation, and to save that table to a file (as explained in the HOOMD-blue documentation).
We usually work with the data in a GSD file in one of three ways:
- For basic analyses (such as confirming that a simulation has run correctly), we usually use a python script to extract data from the GSD file and then plot it with Matplotlib.
- For more intensive analyses (such as calculating average contact number, mean squared displacement, pair correlation function, etc.), we often use a Fortran module to efficiently calculate these values from the GSD data. The Fortran module is run from a Python script using f2py.
- Python and Matplotlib are usually sufficient for plots; however, for more plotting options and more efficient statistical analysis, members of our group have also used the R programming language.
This tutorial focuse on basic analyses (type 1), and provides some advice on installing and learning R (type 3). Information about the Fortran module is included in the Gelation and Shearing guide.
Start by viewing the Equilibrium.gsd file in VMD. You should be able to see the water particles start with very fast motion that slowly equilibrates and reaches a steady state by the end of the simulation (see Using VMD for more advice on visualizing the simulation).
The basic quantitive analyses to confirm that this simulation has run correctly are to:
- plot the change in system kinetic temperature (kT) over time
- plot the change in the negative of the xy-component of the pressure tensor (AKA the shear stress) over time
To do this we first extract these data from the GSD file using a Python script, and then we plot the data.
On either Discovery or your local computer, go to your waterDPD folder and cd into the 01-waterDPD/analysis folder.
The sim-analysis-waterDPD.py script extracts the temperature and xy-component of the pressure tensor (i.e. the negative of the xy stress) from each frame of the Equilibrium.gsd file and saves it to a file called "gsd-properties.txt" (data is labeled with the simulation frame where it occured).
The plot-kT.py script uses the gsd-properties.txt data to plot temperature versus DPD time. This should show a large spike in temperature at the start of the simulation, which then stabilizes at the chosen kT value (in this case kT = 0.1).
The plot-shearstress.py script uses the gsd-properties.txt data to plot shear stress versus DPD time. This should show a relatively constant average around zero (becuase no shear is being applied to the equilibrium simulation).
If these plots do not reach these equilibrium values, than either something went wrong in the simulation OR the simulation needs to run for more time.
You can run each of these scripts individually (either locally or on Discovery using srun), or you can run them in the background on Discovery using the analyze job script to run sim-analysis-waterDPD.py, and the plotter job script to run the plot scripts:
sbatch analyze
sbatch plotter(Remember to edit these job scripts to update the filepath for your installation of hoomd4.2.1-mod before running them)
Now that we've successfully run a simple simulation, we can play around with it.
-
Try reruning the simulation for different lengths of time (change N_timesteps) and then see how the simulation visually and quantitatively changes.
-
Try rerunning the simulation for different system temperatures (change kT) and see how the simulation visually and quantitatively changes.
In addition to changing overall simulation parameters, you can access the invidual particles in a simulation and change their parameters as well.
For example, the 02-particle-arch folder contains a script where the position of particles in the center of the simulation are moved into a semi-circle, turning the initial configuration of the system into an H-shaped arch.
If you look at the script you'll see that after randomly distributing the particles in the simulation box for init.gsd, we access the particle positions and move them into the arch shape.
If you run the simulation and then open it's H-Equilibrium.gsd file in VMD, you will be able to see what this looks like.
This method can also be used to change other particle parameters, such as giving them an initial velocity.
Now that we've brought a simulation to equilibrium, we can apply a simple shear flow.
If you haven't already, install our modified version of HOOMD-blue and take a look at the shearing guide in the Background Reading to understand how we apply shear to a simulation.
An example code for a shearing simulation is the sim-shear-waterDPD.py script in the 03-shear-waterDPD folder. This script follows the same basic structure as our previous sims, however there are two major changes:
- Even though this simulation does not include any colloid particles, we have to set additional parameters as if we are including colloidal particles. This is because our modifications for colloids and for shearing are integrated. We cannot use our shearing method without also providing dummy variables for colloidal interactions. These parameters are explained in the next Gelation and Shearing guide, but you do not need to worry about them right now.
- After setting the particle interactions and the integration method, we also add several steps to manually control the Box Resize process and avoid HOOMD-blue's default shearing method. In basic HOOMD-blue, shear is applied by changing the shape of the simulation box to change the position of particles, and then that change in position is converted into a change in velocity. We want to do the opposite: apply a change in velocity that results in a change in position. That is why we have to add several steps to control the Box Resize process in addition to adding a shear rate (SR). More details of our shearing method are in the shearing background reading.
Be sure to run sim-shear-waterDPD.py using the modified version of HOOMD-blue (and not the basic version of HOOMD-blue). You can do this manually (by sourcing into the correct version locally or while in an srun session), or you can use the job-shear-waterDPD sbatch script.
For the analysis script, most of it is the same; however, our time units are different. We want to report the data from each frame in terms of the number of strains, not the DPD time. Calculating this value requires additional information from the simulation.
Once we have extracted the data, the plot of shear stress versus strain gives us the system's stress-strain curve for our chosen shear rate.
You can analyze the data from these simulations in pretty much any language (Python, MATLAB, R, etc.), but the quality of plotting options in R and it's versatility and ease of use for statistics makes it especially great for our advanced analyses.
Nabi has done extensive analysis in R, but most of those scripts worked from a different format of data files (output by an older version of HOOMD-blue). If you'd like to start doing analysis in R you will probably have to write your own analysis scripts, but you can chat with Rob or Nabi for some of their older examples.
For more resources on getting started, see the LinkedIn Learning courses and related resources listed in the Programming Resources document.
You can install R directly from the R Project website by downloading from a CRAN mirror (to get automatically redirected to the closest available mirror, use the "0-Cloud" link)
Note: If you are downloading for MacOS there are two releases of the latest version of R, one for Intel-based computers and one for computers with Apple silicon (ARM processors). Be sure to choose the correct version for your needs.
Once you have installed R you should install RStudio from the RStudio website, by choosing the free desktop version
Note: The website should automatically detect your operating system and give you the correct download link.
For getting started with R, it is recommended that you install the tidyverse collection of packages designed for data science. You can do this in RStudio by going to the Console and running
install.packages("tidyverse")You are now ready to use R!
Pro Tip: Got to "RStudio" on the MacOS Menu Bar and select "Preferences..." to open the Options window. Select "Code" from the sidebar. On the "Display" tab you can check a box to turn on "Rainbow parentheses." This will change the color of nested parentheses to a ROYGBIV order, so that the outermost are red and the innermost are purple. This can make it easier to keep track of nested parentheses and avoid parentheses-related errors.
If you would like to modify the appearance of RStudio further, go to "Appearance" in the sidebar of the Options window to change the overall "RStudio theme", adjust the font and font size, and/or change your "Editor theme" to match your color preferences (or to match another IDE, such as "Eclipse").
RStudio has 4 main sections
- [top-left] A new or existing file, where you write your R scripts. This will only be visible if you have a file open.
- [bottom-left] A Console, Terminal, and Jobs manager. You can use the Console for basic math, to run R commands separate from a full script, etc. The Console will also display the status of executed code from an R script file.
- [top-right] A list of data sets loaded into the Environment, as well as History, Connections, and Tutorial tabs.
- [bottom-right] A Files viewer, Plots, available Packages, Help, and general Viewer.
To get started we will write our code in a file in the top-left, view information about executed code in the Console at the bottom-left, and view Plots in the bottom-right.
Go to the "File" menu on the MacOS Menu Bar, and select "New File," "R Script."
You should now be familiar with all the basic steps required for our research! The next step is to see the Gelation and Shearing Guide and put all this to work simulating colloids!