|
| 1 | +# Radii Benchmarking Pipeline |
| 2 | + |
| 3 | +Comparison of VDW radii sets and scaling strategies for reproducing isodensity molecular volumes and buried volumes. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Three benchmarking analyses are provided: |
| 8 | + |
| 9 | +1. **Radii comparison** — qualitative comparison of Bondi vs Charry-Tkatchenko radii |
| 10 | +2. **Parity plots** — quantitative accuracy of three radii conditions vs isodensity reference (486 molecules, 1044 buried volume samples) |
| 11 | +3. **Optimal scaling** — systematic sweep of Bondi scaling factors to minimise deviation from isodensity reference |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## 1. Radii Comparison |
| 16 | + |
| 17 | +**Script:** `compare_radii.py` |
| 18 | +**Output:** `radii_comparison.png` |
| 19 | + |
| 20 | +Generates a 3-panel figure comparing the two VDW radii sets available in DBSTEP: |
| 21 | + |
| 22 | +- **(a)** Scatter plot of Bondi vs Charry-Tkatchenko radii for all 54 shared elements. The Charry-Tkatchenko radii (free-atom, derived from dipole polarizability) are systematically larger than Bondi radii for most elements, particularly for transition metals (e.g. Cu, Zn, Pd, Pt). |
| 23 | +- **(b)** Molecular volumes computed with both radii sets for all 22 molecules in `dbstep/data/`. |
| 24 | +- **(c)** Percent buried volumes (%V_Bur) at R = 3.5 Å for the same molecules. |
| 25 | + |
| 26 | +**References:** |
| 27 | +- Bondi: *J. Phys. Chem.* **1964**, *68*, 441; Mantina et al. *J. Phys. Chem. A* **2009**, *113*, 5806 |
| 28 | +- Charry-Tkatchenko: *J. Chem. Theory Comput.* **2024**, *20*, 7844-7855 (R_vdW^free[alpha] from Table S1, SI) |
| 29 | + |
| 30 | +```bash |
| 31 | +uv run python analysis/radii/compare_radii.py |
| 32 | +``` |
| 33 | + |
| 34 | + |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## 2. Parity Plots vs Isodensity Reference |
| 39 | + |
| 40 | +**Script:** `parity_plots.py` |
| 41 | +**Output:** `parity_plots.png` |
| 42 | + |
| 43 | +A 2×3 figure comparing three radii conditions against the isodensity reference (0.0016 e/Bohr³ electron density surface): |
| 44 | + |
| 45 | +| Condition | Radii | Scaling | Hydrogens | |
| 46 | +|-----------|-------|---------|-----------| |
| 47 | +| Bondi | Bondi | 1.0× | included | |
| 48 | +| Charry-Tkatchenko | Charry-Tkatchenko | 1.0× | included | |
| 49 | +| SambVca | Bondi | 1.17× | excluded | |
| 50 | + |
| 51 | +**Top row — molecular volumes** (486 molecules from ZINC dataset): |
| 52 | +- Parity plots of VDW vs isodensity molecular volume for each condition |
| 53 | +- Reference: `isodensity_volumes.txt`, `bondi_volumes.txt`, `charry_volumes.txt`, `sambvca_volumes.txt` |
| 54 | + |
| 55 | +**Bottom row — buried volumes** (1044 stratified atom-center samples): |
| 56 | +- Parity plots of %V_Bur vs isodensity %V_Bur, coloured by center-atom element |
| 57 | +- Reference: `isodensity_sampled.csv`, `bondi_sampled.csv`, `charry_sampled.csv`, `sambvca_sampled.csv` |
| 58 | + |
| 59 | +### Sampling strategy |
| 60 | + |
| 61 | +Buried volume samples were generated using a stratified element-based approach to ensure diverse element coverage: |
| 62 | + |
| 63 | +| Element | Quota | Notes | |
| 64 | +|---------|-------|-------| |
| 65 | +| C | 250 | | |
| 66 | +| H | 250 | | |
| 67 | +| N | 150 | | |
| 68 | +| O | 150 | | |
| 69 | +| S | 100 | | |
| 70 | +| F | 94 | capped (only 94 molecules contain F) | |
| 71 | +| Cl | 50 | | |
| 72 | +| **Total** | **1044** | | |
| 73 | + |
| 74 | +At most one atom of each element is sampled per molecule. See `sample_atoms.py` for details; `sample_atoms.csv` contains the full list of (mol_file, atom1_idx, element) triples. |
| 75 | + |
| 76 | +### Running the calculations |
| 77 | + |
| 78 | +Buried volume calculations for all three conditions are run with: |
| 79 | + |
| 80 | +```bash |
| 81 | +uv run python analysis/radii/run_sampled_vbur.py |
| 82 | +``` |
| 83 | + |
| 84 | +This produces `bondi_sampled.csv`, `charry_sampled.csv`, and `sambvca_sampled.csv`. |
| 85 | + |
| 86 | +The isodensity reference (`isodensity_sampled.csv`) requires cube files and should be run on a server: |
| 87 | + |
| 88 | +```bash |
| 89 | +# Copy sample_atoms.csv and run_sampled_vbur.py to server, then: |
| 90 | +python analysis/radii/run_sampled_vbur.py --isodensity |
| 91 | +``` |
| 92 | + |
| 93 | +### Generating the figure |
| 94 | + |
| 95 | +```bash |
| 96 | +uv run python analysis/radii/parity_plots.py |
| 97 | +``` |
| 98 | + |
| 99 | + |
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## 3. Optimal Bondi Scaling Factor |
| 104 | + |
| 105 | +**Script:** `optimal_scaling.py` |
| 106 | +**Output:** `optimal_scaling.png` |
| 107 | + |
| 108 | +Determines the optimal Bondi radius scaling factor for reproducing isodensity molecular volumes and buried volumes by running actual DBSTEP calculations across a sweep of scaling factors. |
| 109 | + |
| 110 | +**Sweep:** s = 1.00 to 1.20 in steps of 0.01 (21 values total) |
| 111 | +- s = 1.00 uses `bondi_sampled.csv` |
| 112 | +- s = 1.01–1.20 uses `bondi_x{s:.2f}_sampled.csv` |
| 113 | + |
| 114 | +**Reference data:** |
| 115 | +- Molecular volumes: `isodensity_volumes.txt` (486 molecules, mol_id from filename) |
| 116 | +- Buried volumes: `isodensity_sampled.csv` (1044 samples) |
| 117 | + |
| 118 | +**Output panels:** |
| 119 | +- **(a)** RMSE vs scaling factor for both molecular volume (ų, left y-axis) and %V_Bur (%, right y-axis), with vertical lines marking the optimal s for each metric |
| 120 | +- **(b)** Molecular volume parity plot at the optimal scaling factor |
| 121 | +- **(c)** %V_Bur parity plot at the optimal scaling factor, coloured by center-atom element |
| 122 | + |
| 123 | +### Running the sweep |
| 124 | + |
| 125 | +First generate the sweep CSV files (skips any that already exist): |
| 126 | + |
| 127 | +```bash |
| 128 | +uv run python analysis/radii/run_sampled_vbur.py --sweep |
| 129 | +``` |
| 130 | + |
| 131 | +Then generate the figure: |
| 132 | + |
| 133 | +```bash |
| 134 | +uv run python analysis/radii/optimal_scaling.py |
| 135 | +``` |
| 136 | + |
| 137 | + |
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +## Full Pipeline |
| 142 | + |
| 143 | +To reproduce all results from scratch (excluding the isodensity reference which requires cube files): |
| 144 | + |
| 145 | +```bash |
| 146 | +# 1. Generate atom sample list |
| 147 | +uv run python analysis/radii/sample_atoms.py |
| 148 | + |
| 149 | +# 2. Run VDW calculations (Bondi, Charry-Tkatchenko, SambVca) |
| 150 | +uv run python analysis/radii/run_sampled_vbur.py |
| 151 | + |
| 152 | +# 3. Run scaling sweep (Bondi x1.01 to x1.20) |
| 153 | +uv run python analysis/radii/run_sampled_vbur.py --sweep |
| 154 | + |
| 155 | +# 4. Generate figures |
| 156 | +uv run python analysis/radii/compare_radii.py |
| 157 | +uv run python analysis/radii/parity_plots.py |
| 158 | +uv run python analysis/radii/optimal_scaling.py |
| 159 | +``` |
| 160 | + |
| 161 | +Steps 2–4 require `isodensity_sampled.csv` and the `*_volumes.txt` files from the isodensity reference calculations. |
0 commit comments