Skip to content

Commit df834ea

Browse files
authored
Merge pull request #2933 to add uncertainties in quadrature
Uncertainty Part 1.5: Adding in quadrature. Some changes to how uncertainties are handled. Most significant is that they are now added "in quadrature" (ie. square, add, root). But the assumed error on a thermo Benson group has changed too (derived from actual GA vs Benchmark numbers). And a Jupyter notebook is updatade. This has been reviewed many times, and I just ran the entire test suite again locally, so I'm merging.
2 parents e980b7c + 687ed9a commit df834ea

4 files changed

Lines changed: 86 additions & 58 deletions

File tree

ipython/local_uncertainty.ipynb

Lines changed: 36 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -114,11 +114,29 @@
114114
"\n",
115115
"$$\\Delta G = \\frac{1}{\\sqrt{12}}(G_{max} - G_{min})$$\n",
116116
"\n",
117-
"Several parameters are used to formulate $\\Delta G$. These are $\\Delta G_\\mathrm{library}$, $\\Delta G_\\mathrm{QM}$, $\\Delta G_\\mathrm{GAV}$, and $\\Delta _\\mathrm{group}$.\n",
117+
"But first, to formulate $G$, we need to compile all the uncertainty sources. These are $ G_\\mathrm{library}$, $ G_\\mathrm{QM}$, $ G_\\mathrm{GAV}$, and $G _\\mathrm{group}$. We treat each as continuous random variable and add them to get the species uncertainty.\n",
118118
" \n",
119-
"$$\\Delta G = \\delta_\\mathrm{library} \\Delta G_\\mathrm{library} + \\delta_\\mathrm{QM} \\Delta G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( \\Delta G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} \\Delta G_{\\mathrm{group},j} \\right)$$\n",
119+
"$$G = \\delta_\\mathrm{library} G_\\mathrm{library} + \\delta_\\mathrm{QM} G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} G_{\\mathrm{group},j} \\right)$$\n",
120+
"\n",
121+
"Here $\\delta$ is the Kronecker delta function which equals one if the species thermochemistry parameter contains the particular source type and $d_{j}$ is the degeneracy (number of appearances) of the thermo group used to construct the species thermochemistry in the group additivity method.\n",
122+
"\n",
123+
"To compute $\\Delta G$, we'll use the property that the variance of the sum of two random variables, $X$ and $Y$, is:\n",
124+
"$$Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$$\n",
125+
"\n",
126+
"If we assume that each uncertainty source is uncorrelated with the others, then the covariance term drops out:\n",
127+
"\n",
128+
"$$Var(X+Y)=Var(X)+Var(Y), \\ \\ \\ \\ \\ X \\neq Y$$\n",
129+
"\n",
130+
"And we can compute the variance of $G$:\n",
131+
"\n",
132+
"$$Var(G) = \\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM})+\\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) +\\sum_{group\\ j} d_j^2 Var(G_\\mathrm{group\\ j}) \\right)$$\n",
133+
"\n",
134+
"\n",
135+
"The standard deviation $\\Delta G$ is then the square root of that (this is an example of adding in quadrature):\n",
136+
"$$\\Delta(G) = \\sqrt{\\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM})+\\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) +\\sum_{group\\ j} d_j^2 Var(G_\\mathrm{group\\ j}) \\right)}$$\n",
137+
"\n",
138+
"\n",
120139
"\n",
121-
"where $\\delta$ is the Kronecker delta function which equals one if the species thermochemistry parameter contains the particular source type and $d_{j}$ is the degeneracy (number of appearances) of the thermo group used to construct the species thermochemistry in the group additivity method.\n",
122140
"\n",
123141
"### Kinetics Uncertainty\n",
124142
"\n",
@@ -130,13 +148,20 @@
130148
"\n",
131149
"$$\\Delta \\ln(k) = \\frac{1}{\\sqrt{12}}(\\ln k_{max} - \\ln k_{min})$$\n",
132150
"\n",
133-
"The parameters used to formulate $\\Delta \\ln k$ are $\\Delta \\ln k_\\mathrm{library}$, $\\Delta \\ln k_\\mathrm{training}$, $\\Delta \\ln k_\\mathrm{pdep}$, $\\Delta \\ln k_\\mathrm{family}$, $\\Delta \\ln k_\\mathrm{non-exact}$, and $\\Delta \\ln k_\\mathrm{rule}$.\n",
151+
"The sources used to formulate $ \\ln k$ are $ \\ln k_\\mathrm{library}$, $ \\ln k_\\mathrm{training}$, $ \\ln k_\\mathrm{pdep}$, $ \\ln k_\\mathrm{family}$, $ \\ln k_\\mathrm{non-exact}$, and $ \\ln k_\\mathrm{rule}$.\n",
152+
"\n",
153+
"For exact training data matches and library and pdep reactions, the kinetic uncertainty is assigned according to their uncertainty type. For kinetics estimated using RMG's decision trees, the following formula is used to calculate the uncertainty:\n",
134154
"\n",
135-
"For library, training, and pdep reactions, the kinetic uncertainty is assigned according to their uncertainty type. For kinetics estimated using RMG's rate rules, the following formula is used to calculate the uncertainty:\n",
155+
"$$ \\ln k_\\mathrm{rate\\; rules} = \\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\ln k_{\\mathrm{rule},i}$$\n",
136156
"\n",
137-
"$$\\Delta \\ln k_\\mathrm{rate\\; rules} = \\Delta\\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\Delta\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\Delta \\ln k_{\\mathrm{rule},i}$$\n",
157+
"where N is the total number of rate rules or training reactions used and $w_{i}$ is the weight of the rate rule or training reaction in the averaging scheme for that kinetics estimate.\n",
138158
"\n",
139-
"where N is the total number of rate rules used and $w_{i}$ is the weight of the rate rule in the averaging scheme for that kinetics estimate. "
159+
"Once again we use the property of adding variances with the assumption that sources are uncorrelated to each other. The standard deviation then adds in quadrature.\n",
160+
"\n",
161+
"$$\\Delta\\ln k_\\mathrm{rate\\; rules} = \\sqrt{(\\Delta \\ln k_\\mathrm{family})^2 + \\left(\\log_{10}(N+1) \\right)^2\\left(\\Delta \\ln k_\\mathrm{non-exact}\\right)^2 + \\sum_{\\mathrm{rule}\\; i} w_i^2 (\\Delta \\ln k_{\\mathrm{rule},i})^2}$$\n",
162+
"\n",
163+
"\n",
164+
"\n"
140165
]
141166
},
142167
{
@@ -178,9 +203,7 @@
178203
{
179204
"cell_type": "code",
180205
"execution_count": null,
181-
"metadata": {
182-
"scrolled": true
183-
},
206+
"metadata": {},
184207
"outputs": [],
185208
"source": [
186209
"result = uncertainty.local_analysis(sensitive_species, correlated=False, number=5, fileformat='.png')\n",
@@ -190,9 +213,7 @@
190213
{
191214
"cell_type": "code",
192215
"execution_count": null,
193-
"metadata": {
194-
"scrolled": true
195-
},
216+
"metadata": {},
196217
"outputs": [],
197218
"source": [
198219
"# Show the uncertainty plots\n",
@@ -230,9 +251,7 @@
230251
{
231252
"cell_type": "code",
232253
"execution_count": null,
233-
"metadata": {
234-
"scrolled": true
235-
},
254+
"metadata": {},
236255
"outputs": [],
237256
"source": [
238257
"uncertainty.assign_parameter_uncertainties(correlated=True)\n",
@@ -243,9 +262,7 @@
243262
{
244263
"cell_type": "code",
245264
"execution_count": null,
246-
"metadata": {
247-
"scrolled": true
248-
},
265+
"metadata": {},
249266
"outputs": [],
250267
"source": [
251268
"# Show the uncertainty plots\n",

rmgpy/data/thermo.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2716,11 +2716,13 @@ def extract_source_from_comments(self, species):
27162716
Parses the verbose string of comments from the thermo data of the species object,
27172717
and extracts the thermo sources.
27182718
2719-
Returns a dictionary with keys of either 'Library', 'QM', and/or 'GAV'.
2719+
Returns a dictionary with keys of 'Library', 'QM', 'ADS', and/or 'GAV'.
27202720
Commonly, species thermo are estimated using only one of these sources.
27212721
However, a radical can be estimated with more than one type of source, for
27222722
instance a saturated library value and a GAV HBI correction, or a QM saturated value
2723-
and a GAV HBI correction.
2723+
and a GAV HBI correction. Adsorbates can be estimated using a single library
2724+
for the adsorbate or a combination of a gas phase library for the
2725+
gas phase portion and an adsorption correction.
27242726
27252727
source = {'Library': String_Name_of_Library_Used,
27262728
'QM': String_of_Method_Used,

rmgpy/tools/uncertainty.py

Lines changed: 37 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ class ThermoParameterUncertainty(object):
4343
This class is an engine that generates the species uncertainty based on its thermo sources.
4444
"""
4545

46-
def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.1, dG_ADS_correction=6.918, dG_surf_lib=6.918):
46+
def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.7159, dG_ADS_correction=6.918, dG_surf_lib=6.918):
4747
"""
4848
Initialize the different uncertainties dG_library, dG_QM, dG_GAV, and dG_other with set values
4949
in units of kcal/mol.
@@ -62,22 +62,22 @@ def get_uncertainty_value(self, source):
6262
"""
6363
Retrieve the uncertainty value in kcal/mol when the source of the thermo of a species is given.
6464
"""
65-
dG = 0.0
65+
varG = 0.0
6666
if 'Library' in source:
67-
dG += self.dG_library
67+
varG += self.dG_library * self.dG_library
6868
if 'Surface_Library' in source:
69-
dG += self.dG_surf_lib
69+
varG += self.dG_surf_lib * self.dG_surf_lib
7070
if 'QM' in source:
71-
dG += self.dG_QM
71+
varG += self.dG_QM * self.dG_QM
7272
if 'GAV' in source:
73-
dG += self.dG_GAV # Add a fixed uncertainty for the GAV method
73+
varG += self.dG_GAV * self.dG_GAV # Add a fixed uncertainty for the GAV method
7474
for group_type, group_entries in source['GAV'].items():
7575
group_weights = [groupTuple[-1] for groupTuple in group_entries]
76-
dG += np.sum([weight * self.dG_group for weight in group_weights])
76+
varG += np.sum([weight * weight * self.dG_group * self.dG_group for weight in group_weights])
7777
if 'ADS' in source:
78-
dG += self.dG_ADS_correction # Add adsorption correction uncertainty
78+
varG += self.dG_ADS_correction * self.dG_ADS_correction # Add adsorption correction uncertainty
7979

80-
return dG
80+
return np.sqrt(varG)
8181

8282
def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_group_type=None):
8383
"""
@@ -170,39 +170,39 @@ def get_uncertainty_value(self, source):
170170
"""
171171
Retrieve the dlnk uncertainty when the source of the reaction kinetics are given
172172
"""
173-
dlnk = 0.0
173+
varlnk = 0.0
174174
if 'Library' in source:
175175
# Should be a single library reaction source
176-
dlnk += self.dlnk_library
176+
varlnk += self.dlnk_library * self.dlnk_library
177177
elif 'Surface_Library' in source:
178178
# Should be a single library reaction source
179-
dlnk += self.dlnk_surf_library
179+
varlnk += self.dlnk_surf_library * self.dlnk_surf_library
180180
elif 'PDep' in source:
181181
# Should be a single pdep reaction source
182-
dlnk += self.dlnk_pdep
182+
varlnk += self.dlnk_pdep * self.dlnk_pdep
183183
elif 'Training' in source:
184184
# Should be a single training reaction
185185
# Although some training entries may be used in reverse,
186-
# We still consider the kinetics to be directly dependent
186+
# We still consider the kinetics to be directly dependent
187187
if 'surface' in source['Training'][0].lower():
188-
dlnk += self.dlnk_surf_training
188+
varlnk += self.dlnk_surf_training * self.dlnk_surf_training
189189
else:
190-
dlnk += self.dlnk_training
190+
varlnk += self.dlnk_training * self.dlnk_training
191191
elif 'Rate Rules' in source:
192192
family_label = source['Rate Rules'][0]
193193
source_dict = source['Rate Rules'][1]
194194
exact = source_dict['exact']
195195
rule_weights = [ruleTuple[-1] for ruleTuple in source_dict['rules']]
196196
training_weights = [trainingTuple[-1] for trainingTuple in source_dict['training']]
197197

198-
dlnk += self.dlnk_family
198+
varlnk += self.dlnk_family * self.dlnk_family
199199

200200
N = len(rule_weights) + len(training_weights)
201201
if 'node_std_dev' in source_dict:
202202
# Handle autogen BM trees
203203
if source_dict['node_std_dev'] < 0:
204204
raise ValueError('Invalid value for std dev of kinetics family rule node')
205-
dlnk += source_dict['node_std_dev']
205+
varlnk += np.float_power(source_dict['node_std_dev'], 2.0)
206206
if source_dict['node_n_train'] is None:
207207
raise ValueError('Invalid number of training reactions for kinetics family rule node')
208208
N = source_dict['node_n_train']
@@ -211,28 +211,28 @@ def get_uncertainty_value(self, source):
211211
# every node template has its own fitted rate rule by definition, but here we use the
212212
# number of training reactions as an approximation of the node's specificity/generality
213213
# and add a penalty for being too general (large # of training reactions)
214-
dlnk += np.log10(N + 1) * self.dlnk_nonexact
214+
varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2
215215
else:
216216
# Handle hand-made trees
217217
if not exact:
218218
# nonexactness contribution increases as N increases
219-
dlnk += np.log10(N + 1) * self.dlnk_nonexact
219+
varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2
220220

221221
if 'surface' in family_label.lower():
222-
dlnk += np.sum([weight * self.dlnk_surf_rule for weight in rule_weights])
223-
dlnk += np.sum([weight * self.dlnk_surf_training for weight in training_weights])
222+
varlnk += np.sum([weight * weight * self.dlnk_surf_rule * self.dlnk_surf_rule for weight in rule_weights])
223+
varlnk += np.sum([weight * weight * self.dlnk_surf_training * self.dlnk_surf_training for weight in training_weights])
224224
else:
225225
# Add the contributions from rules
226-
dlnk += np.sum([weight * self.dlnk_rule for weight in rule_weights])
226+
varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in rule_weights])
227227
# Add the contributions from training
228228
# Even though these source from training reactions, we actually
229229
# use the uncertainty for rate rules, since these are now approximations
230230
# of the original reaction. We consider these to be independent of original the training
231231
# parameters because the rate rules may be reversing the training reactions,
232232
# which leads to more complicated dependence
233-
dlnk += np.sum([weight * self.dlnk_rule for weight in training_weights])
233+
varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in training_weights])
234234

235-
return dlnk
235+
return np.sqrt(varlnk)
236236

237237
def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_family=None):
238238
"""
@@ -442,10 +442,16 @@ def extract_sources_from_model(self):
442442
"""
443443
self.species_sources_dict = {}
444444
self.extra_species = []
445+
allowed_source_keys = {'Library', 'QM', 'GAV', 'ADS'}
445446
for species in self.species_list:
446447
if species not in self.extra_species:
447448
source = self.database.thermo.extract_source_from_comments(species)
448-
assert source.keys() <= {'Library', 'QM', 'GAV', 'ADS'}, 'Source of thermo must be either Library, QM, GAV, or ADS'
449+
unexpected_source_keys = set(source.keys()) - allowed_source_keys
450+
if unexpected_source_keys:
451+
raise ValueError(
452+
f'Source of thermo must be either Library, QM, GAV, or ADS; '
453+
f'got unexpected source keys {unexpected_source_keys} for species {species.label}'
454+
)
449455

450456
# Now prep the source data
451457
# Do not alter the GAV information, but reassign QM and Library sources to the species indices that they came from
@@ -501,7 +507,11 @@ def extract_sources_from_model(self):
501507
elif len(source) == 3:
502508
# combination of adsorption correction, GAV (radical), and Library/ML
503509

504-
assert species.contains_surface_site(), 'only surface species should have 3 sources: adsorption correction, GAV, library/ML'
510+
if not species.contains_surface_site():
511+
raise ValueError(
512+
f'Only surface species should have 3 thermo sources (adsorption correction, GAV, and library/QM); '
513+
f'got species={species.label}, source={source}'
514+
)
505515

506516
# retrieve the desorbed version of the surface species-- the thing the adsorption correction was applied to during thermo estimation
507517
dummy_gas_species = Species()

test/rmgpy/tools/uncertaintyTest.py

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -169,13 +169,12 @@ def test_uncertainty_assignment(self):
169169

170170
np.testing.assert_allclose(
171171
thermo_unc,
172-
[1.5, 1.5, 2.0, 1.9, 3.1, 1.5, 1.9, 2.0, 2.0, 1.9, 2.2, 1.9, 2.0, 1.5, 3.1, 1.9, 1.5, 2.0, 1.7, 1.8, 1.8, 1.9, 1.8, 1.9, 1.9],
172+
[1.5, 1.5, 2.61966, 2.51994, 2.23886, 1.5, 2.30761, 2.41611, 2.61966, 2.51994, 2.61966, 2.51994, 2.61966, 1.5, 2.23886, 2.30761, 1.5, 2.61966, 2.07366, 2.19376, 2.19376, 2.30761, 1.94616, 2.07366, 2.07366],
173173
rtol=1e-4,
174174
)
175175
np.testing.assert_allclose(
176176
kinetic_unc,
177-
# the 7.81 value comes from the SIDT tree node uncertainty: 5.756 + non-exact penalty for N=1: log10(1+1) * 3.5 + family uncertainty: 1.0
178-
[0.5, 1.5, 3.169924, 3.169924, 2.553605, 0.5, 2.0, 7.81, 7.81, 0.5],
177+
[0.5, 1.118, 1.9783, 1.9783, 1.5363, 0.5, 2.0, 5.9369, 5.9369, 0.5],
179178
rtol=1e-4
180179
)
181180

@@ -185,13 +184,13 @@ def test_specific_species_uncertainties(self):
185184
"""
186185

187186
expected_results = { # order is (total_uncertainty, [group_names], [group_counts])
188-
'CCCC': (1.9, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]),
189-
'CCCCCCCCCC': (2.5, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]),
190-
'CC(OO)CC': (2.1, ['O2s-OsCs', 'O2s-OsH', 'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]),
191-
'C=NCC': (1.9, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]),
192-
'C=C': (1.7, ['Cds-CdsHH'], [2]),
193-
'C*': (10.018, ['CH3'], [1]), # Gas library + radical + adsorption correction
194-
'O=[CH]*': (8.618, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction
187+
'CCCC': (2.5199409675625337, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]),
188+
'CCCCCCCCCC': (6.091048438487417, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]),
189+
'CC(OO)CC': (2.5199409675625337, ['O2s-OsCs', 'O2s-OsH', 'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]),
190+
'C=NCC': (2.07365649035707, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]),
191+
'C=C': (2.07365649035707, ['Cds-CdsHH'], [2]),
192+
'C*': (7.271261019245562, ['CH3'], [1]), # Gas library + radical + adsorption correction
193+
'O=[CH]*': (7.150786643440007, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction
195194
}
196195

197196
uncertainty = rmgpy.tools.uncertainty.Uncertainty()

0 commit comments

Comments
 (0)