Skip to content

Commit 8a77d4d

Browse files
committed
add radii benchmarking pipeline & fix occ_vol storage bug
Move compare_radii.py and assets into analysis/radii/, add stratified buried volume sampling (sample_atoms.py/csv), runner script (run_sampled_vbur.py), parity plots vs isodensity reference (parity_plots.py), optimal Bondi scaling sweep (optimal_scaling.py), and a README documenting the full pipeline. Fix Dbstep.py occ_vol storage condition (was gated on measure==grid, now stores whenever occ_vol is not None).
1 parent ae39ff8 commit 8a77d4d

529 files changed

Lines changed: 51730 additions & 28 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

analysis/README.md

Lines changed: 0 additions & 27 deletions
This file was deleted.

analysis/radii/README.md

Lines changed: 161 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,161 @@
1+
# Radii Benchmarking Pipeline
2+
3+
Comparison of VDW radii sets and scaling strategies for reproducing isodensity molecular volumes and buried volumes.
4+
5+
## Overview
6+
7+
Three benchmarking analyses are provided:
8+
9+
1. **Radii comparison** — qualitative comparison of Bondi vs Charry-Tkatchenko radii
10+
2. **Parity plots** — quantitative accuracy of three radii conditions vs isodensity reference (486 molecules, 1044 buried volume samples)
11+
3. **Optimal scaling** — systematic sweep of Bondi scaling factors to minimise deviation from isodensity reference
12+
13+
---
14+
15+
## 1. Radii Comparison
16+
17+
**Script:** `compare_radii.py`
18+
**Output:** `radii_comparison.png`
19+
20+
Generates a 3-panel figure comparing the two VDW radii sets available in DBSTEP:
21+
22+
- **(a)** Scatter plot of Bondi vs Charry-Tkatchenko radii for all 54 shared elements. The Charry-Tkatchenko radii (free-atom, derived from dipole polarizability) are systematically larger than Bondi radii for most elements, particularly for transition metals (e.g. Cu, Zn, Pd, Pt).
23+
- **(b)** Molecular volumes computed with both radii sets for all 22 molecules in `dbstep/data/`.
24+
- **(c)** Percent buried volumes (%V_Bur) at R = 3.5 Å for the same molecules.
25+
26+
**References:**
27+
- Bondi: *J. Phys. Chem.* **1964**, *68*, 441; Mantina et al. *J. Phys. Chem. A* **2009**, *113*, 5806
28+
- Charry-Tkatchenko: *J. Chem. Theory Comput.* **2024**, *20*, 7844-7855 (R_vdW^free[alpha] from Table S1, SI)
29+
30+
```bash
31+
uv run python analysis/radii/compare_radii.py
32+
```
33+
34+
![Radii Comparison](radii_comparison.png)
35+
36+
---
37+
38+
## 2. Parity Plots vs Isodensity Reference
39+
40+
**Script:** `parity_plots.py`
41+
**Output:** `parity_plots.png`
42+
43+
A 2×3 figure comparing three radii conditions against the isodensity reference (0.0016 e/Bohr³ electron density surface):
44+
45+
| Condition | Radii | Scaling | Hydrogens |
46+
|-----------|-------|---------|-----------|
47+
| Bondi | Bondi | 1.0× | included |
48+
| Charry-Tkatchenko | Charry-Tkatchenko | 1.0× | included |
49+
| SambVca | Bondi | 1.17× | excluded |
50+
51+
**Top row — molecular volumes** (486 molecules from ZINC dataset):
52+
- Parity plots of VDW vs isodensity molecular volume for each condition
53+
- Reference: `isodensity_volumes.txt`, `bondi_volumes.txt`, `charry_volumes.txt`, `sambvca_volumes.txt`
54+
55+
**Bottom row — buried volumes** (1044 stratified atom-center samples):
56+
- Parity plots of %V_Bur vs isodensity %V_Bur, coloured by center-atom element
57+
- Reference: `isodensity_sampled.csv`, `bondi_sampled.csv`, `charry_sampled.csv`, `sambvca_sampled.csv`
58+
59+
### Sampling strategy
60+
61+
Buried volume samples were generated using a stratified element-based approach to ensure diverse element coverage:
62+
63+
| Element | Quota | Notes |
64+
|---------|-------|-------|
65+
| C | 250 | |
66+
| H | 250 | |
67+
| N | 150 | |
68+
| O | 150 | |
69+
| S | 100 | |
70+
| F | 94 | capped (only 94 molecules contain F) |
71+
| Cl | 50 | |
72+
| **Total** | **1044** | |
73+
74+
At most one atom of each element is sampled per molecule. See `sample_atoms.py` for details; `sample_atoms.csv` contains the full list of (mol_file, atom1_idx, element) triples.
75+
76+
### Running the calculations
77+
78+
Buried volume calculations for all three conditions are run with:
79+
80+
```bash
81+
uv run python analysis/radii/run_sampled_vbur.py
82+
```
83+
84+
This produces `bondi_sampled.csv`, `charry_sampled.csv`, and `sambvca_sampled.csv`.
85+
86+
The isodensity reference (`isodensity_sampled.csv`) requires cube files and should be run on a server:
87+
88+
```bash
89+
# Copy sample_atoms.csv and run_sampled_vbur.py to server, then:
90+
python analysis/radii/run_sampled_vbur.py --isodensity
91+
```
92+
93+
### Generating the figure
94+
95+
```bash
96+
uv run python analysis/radii/parity_plots.py
97+
```
98+
99+
![Parity Plots](parity_plots.png)
100+
101+
---
102+
103+
## 3. Optimal Bondi Scaling Factor
104+
105+
**Script:** `optimal_scaling.py`
106+
**Output:** `optimal_scaling.png`
107+
108+
Determines the optimal Bondi radius scaling factor for reproducing isodensity molecular volumes and buried volumes by running actual DBSTEP calculations across a sweep of scaling factors.
109+
110+
**Sweep:** s = 1.00 to 1.20 in steps of 0.01 (21 values total)
111+
- s = 1.00 uses `bondi_sampled.csv`
112+
- s = 1.01–1.20 uses `bondi_x{s:.2f}_sampled.csv`
113+
114+
**Reference data:**
115+
- Molecular volumes: `isodensity_volumes.txt` (486 molecules, mol_id from filename)
116+
- Buried volumes: `isodensity_sampled.csv` (1044 samples)
117+
118+
**Output panels:**
119+
- **(a)** RMSE vs scaling factor for both molecular volume (ų, left y-axis) and %V_Bur (%, right y-axis), with vertical lines marking the optimal s for each metric
120+
- **(b)** Molecular volume parity plot at the optimal scaling factor
121+
- **(c)** %V_Bur parity plot at the optimal scaling factor, coloured by center-atom element
122+
123+
### Running the sweep
124+
125+
First generate the sweep CSV files (skips any that already exist):
126+
127+
```bash
128+
uv run python analysis/radii/run_sampled_vbur.py --sweep
129+
```
130+
131+
Then generate the figure:
132+
133+
```bash
134+
uv run python analysis/radii/optimal_scaling.py
135+
```
136+
137+
![Optimal Scaling](optimal_scaling.png)
138+
139+
---
140+
141+
## Full Pipeline
142+
143+
To reproduce all results from scratch (excluding the isodensity reference which requires cube files):
144+
145+
```bash
146+
# 1. Generate atom sample list
147+
uv run python analysis/radii/sample_atoms.py
148+
149+
# 2. Run VDW calculations (Bondi, Charry-Tkatchenko, SambVca)
150+
uv run python analysis/radii/run_sampled_vbur.py
151+
152+
# 3. Run scaling sweep (Bondi x1.01 to x1.20)
153+
uv run python analysis/radii/run_sampled_vbur.py --sweep
154+
155+
# 4. Generate figures
156+
uv run python analysis/radii/compare_radii.py
157+
uv run python analysis/radii/parity_plots.py
158+
uv run python analysis/radii/optimal_scaling.py
159+
```
160+
161+
Steps 2–4 require `isodensity_sampled.csv` and the `*_volumes.txt` files from the isodensity reference calculations.

0 commit comments

Comments
 (0)