Merged
55 changes: 36 additions & 19 deletions ipython/local_uncertainty.ipynb
Original file line number Diff line number Diff line change
@@ -114,11 +114,29 @@
"\n",
"$$\\Delta G = \\frac{1}{\\sqrt{12}}(G_{max} - G_{min})$$\n",
"\n",
"Several parameters are used to formulate $\\Delta G$. These are $\\Delta G_\\mathrm{library}$, $\\Delta G_\\mathrm{QM}$, $\\Delta G_\\mathrm{GAV}$, and $\\Delta _\\mathrm{group}$.\n",
"But first, to formulate $G$, we need to compile all the uncertainty sources: $G_\\mathrm{library}$, $G_\\mathrm{QM}$, $G_\\mathrm{GAV}$, and $G_\\mathrm{group}$. We treat each as a continuous random variable and sum them to obtain $G$.\n",
" \n",
"$$\\Delta G = \\delta_\\mathrm{library} \\Delta G_\\mathrm{library} + \\delta_\\mathrm{QM} \\Delta G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( \\Delta G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} \\Delta G_{\\mathrm{group},j} \\right)$$\n",
"$$G = \\delta_\\mathrm{library} G_\\mathrm{library} + \\delta_\\mathrm{QM} G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} G_{\\mathrm{group},j} \\right)$$\n",
"\n",
"Here $\\delta$ is the Kronecker delta function which equals one if the species thermochemistry parameter contains the particular source type and $d_{j}$ is the degeneracy (number of appearances) of the thermo group used to construct the species thermochemistry in the group additivity method.\n",
"\n",
"To compute $\\Delta G$, we'll use the property that the variance of the sum of two random variables, $X$ and $Y$, is:\n",
"$$Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$$\n",
"\n",
"If we assume that each uncertainty source is uncorrelated with the others, then the covariance term drops out:\n",
"\n",
"$$Var(X+Y)=Var(X)+Var(Y), \\quad X \\text{ and } Y \\text{ uncorrelated}$$\n",
"\n",
"And we can compute the variance of $G$:\n",
"\n",
"$$Var(G) = \\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM}) + \\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) + \\sum_{\\mathrm{group}\\; j} d_j^2\\, Var(G_{\\mathrm{group},j}) \\right)$$\n",
"\n",
"\n",
"The standard deviation $\\Delta G$ is then the square root of the variance (the sources add in quadrature):\n",
"$$\\Delta G = \\sqrt{\\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM}) + \\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) + \\sum_{\\mathrm{group}\\; j} d_j^2\\, Var(G_{\\mathrm{group},j}) \\right)}$$\n",
"\n",
"\n",
"\n",
"\n",
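"The quadrature combination above can be sketched in a few lines. The numeric values below are illustrative placeholders, not RMG's actual defaults:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Placeholder standard deviations in kcal/mol (assumed for illustration;\n",
"# RMG's real defaults live in ThermoParameterUncertainty)\n",
"DG_LIBRARY = 1.5\n",
"DG_GAV = 1.5\n",
"DG_GROUP = 0.7\n",
"\n",
"def species_uncertainty(has_library, has_gav, group_degeneracies):\n",
"    \"\"\"Combine uncorrelated thermo uncertainty sources in quadrature.\"\"\"\n",
"    var = 0.0\n",
"    if has_library:  # delta_library = 1\n",
"        var += DG_LIBRARY ** 2\n",
"    if has_gav:      # delta_GAV = 1\n",
"        var += DG_GAV ** 2\n",
"        # each group contributes d_j**2 * Var(G_group)\n",
"        var += sum(d * d * DG_GROUP ** 2 for d in group_degeneracies)\n",
"    return np.sqrt(var)\n",
"\n",
"# GAV species built from two thermo groups, each appearing twice\n",
"print(species_uncertainty(False, True, [2, 2]))\n",
"```\n",
"\n",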
"### Kinetics Uncertainty\n",
"\n",
@@ -130,13 +148,20 @@
"\n",
"$$\\Delta \\ln(k) = \\frac{1}{\\sqrt{12}}(\\ln k_{max} - \\ln k_{min})$$\n",
"\n",
"The parameters used to formulate $\\Delta \\ln k$ are $\\Delta \\ln k_\\mathrm{library}$, $\\Delta \\ln k_\\mathrm{training}$, $\\Delta \\ln k_\\mathrm{pdep}$, $\\Delta \\ln k_\\mathrm{family}$, $\\Delta \\ln k_\\mathrm{non-exact}$, and $\\Delta \\ln k_\\mathrm{rule}$.\n",
"The sources used to formulate $\\ln k$ are $\\ln k_\\mathrm{library}$, $\\ln k_\\mathrm{training}$, $\\ln k_\\mathrm{pdep}$, $\\ln k_\\mathrm{family}$, $\\ln k_\\mathrm{non-exact}$, and $\\ln k_\\mathrm{rule}$.\n",
"\n",
"For exact training data matches and library and pdep reactions, the kinetic uncertainty is assigned according to their uncertainty type. For kinetics estimated using RMG's decision trees, the following formula is used to calculate the uncertainty:\n",
"\n",
"For library, training, and pdep reactions, the kinetic uncertainty is assigned according to their uncertainty type. For kinetics estimated using RMG's rate rules, the following formula is used to calculate the uncertainty:\n",
"$$ \\ln k_\\mathrm{rate\\; rules} = \\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\ln k_{\\mathrm{rule},i}$$\n",
"\n",
"$$\\Delta \\ln k_\\mathrm{rate\\; rules} = \\Delta\\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\Delta\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\Delta \\ln k_{\\mathrm{rule},i}$$\n",
"where N is the total number of rate rules or training reactions used and $w_{i}$ is the weight of the rate rule or training reaction in the averaging scheme for that kinetics estimate.\n",
"\n",
"where N is the total number of rate rules used and $w_{i}$ is the weight of the rate rule in the averaging scheme for that kinetics estimate. "
"Once again we use the additivity of variances, assuming the sources are uncorrelated with each other; the standard deviations then add in quadrature.\n",
"\n",
"$$\\Delta\\ln k_\\mathrm{rate\\; rules} = \\sqrt{(\\Delta \\ln k_\\mathrm{family})^2 + \\left(\\log_{10}(N+1) \\right)^2\\left(\\Delta \\ln k_\\mathrm{non-exact}\\right)^2 + \\sum_{\\mathrm{rule}\\; i} w_i^2 (\\Delta \\ln k_{\\mathrm{rule},i})^2}$$\n",
"\n",
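"A sketch of the rate-rules case; the $\\Delta \\ln k$ parameters here are assumed values for illustration only:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumed per-source uncertainties in ln(k) space (illustrative)\n",
"DLNK_FAMILY = 1.0\n",
"DLNK_NONEXACT = 3.5\n",
"DLNK_RULE = 0.5\n",
"\n",
"def rate_rule_uncertainty(n, weights, exact=False):\n",
"    \"\"\"Quadrature sum of family, non-exactness, and per-rule terms.\"\"\"\n",
"    var = DLNK_FAMILY ** 2\n",
"    if not exact:\n",
"        # non-exactness penalty grows with the number of averaged rules\n",
"        var += (np.log10(n + 1) * DLNK_NONEXACT) ** 2\n",
"    var += sum(w * w * DLNK_RULE ** 2 for w in weights)\n",
"    return np.sqrt(var)\n",
"\n",
"# one non-exact rule with full weight\n",
"print(rate_rule_uncertainty(1, [1.0]))\n",
"```\n",
"\n",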
"\n",
"\n"
]
},
{
@@ -178,9 +203,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"result = uncertainty.local_analysis(sensitive_species, correlated=False, number=5, fileformat='.png')\n",
Expand All @@ -190,9 +213,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"# Show the uncertainty plots\n",
Expand Down Expand Up @@ -230,9 +251,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"uncertainty.assign_parameter_uncertainties(correlated=True)\n",
Expand All @@ -243,9 +262,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [],
"source": [
"# Show the uncertainty plots\n",
6 changes: 4 additions & 2 deletions rmgpy/data/thermo.py
@@ -2716,11 +2716,13 @@ def extract_source_from_comments(self, species):
Parses the verbose string of comments from the thermo data of the species object,
and extracts the thermo sources.

Returns a dictionary with keys of either 'Library', 'QM', and/or 'GAV'.
Returns a dictionary with keys of 'Library', 'QM', 'ADS', and/or 'GAV'.
Commonly, species thermo are estimated using only one of these sources.
However, a radical can be estimated with more than one type of source, for
instance a saturated library value and a GAV HBI correction, or a QM saturated value
and a GAV HBI correction.
and a GAV HBI correction. Adsorbates can be estimated using a single library
for the adsorbate or a combination of a gas phase library for the
gas phase portion and an adsorption correction.

source = {'Library': String_Name_of_Library_Used,
'QM': String_of_Method_Used,
Expand Down
64 changes: 37 additions & 27 deletions rmgpy/tools/uncertainty.py
@@ -43,7 +43,7 @@ class ThermoParameterUncertainty(object):
This class is an engine that generates the species uncertainty based on its thermo sources.
"""

def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.1, dG_ADS_correction=6.918, dG_surf_lib=6.918):
def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.7159, dG_ADS_correction=6.918, dG_surf_lib=6.918):
"""
Initialize the different uncertainties dG_library, dG_QM, dG_GAV, and dG_other with set values
in units of kcal/mol.
@@ -62,22 +62,22 @@ def get_uncertainty_value(self, source):
"""
Retrieve the uncertainty value in kcal/mol when the source of the thermo of a species is given.
"""
dG = 0.0
varG = 0.0
if 'Library' in source:
dG += self.dG_library
varG += self.dG_library * self.dG_library
if 'Surface_Library' in source:
dG += self.dG_surf_lib
varG += self.dG_surf_lib * self.dG_surf_lib
if 'QM' in source:
dG += self.dG_QM
varG += self.dG_QM * self.dG_QM
if 'GAV' in source:
dG += self.dG_GAV # Add a fixed uncertainty for the GAV method
varG += self.dG_GAV * self.dG_GAV # Add a fixed uncertainty for the GAV method
for group_type, group_entries in source['GAV'].items():
group_weights = [groupTuple[-1] for groupTuple in group_entries]
dG += np.sum([weight * self.dG_group for weight in group_weights])
varG += np.sum([weight * weight * self.dG_group * self.dG_group for weight in group_weights])
if 'ADS' in source:
dG += self.dG_ADS_correction # Add adsorption correction uncertainty
varG += self.dG_ADS_correction * self.dG_ADS_correction # Add adsorption correction uncertainty

return dG
return np.sqrt(varG)
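
A trimmed, self-contained re-implementation of the method above can be used to sanity-check the new quadrature behavior. The source-dict shape (group tuples with the weight in the last position) is inferred from the surrounding code, and the defaults mirror the constructor in this diff:

```python
import numpy as np

class ThermoUncertaintySketch:
    """Reduced sketch of ThermoParameterUncertainty.get_uncertainty_value."""

    def __init__(self, dG_library=1.5, dG_GAV=1.5, dG_group=0.7159):
        self.dG_library = dG_library
        self.dG_GAV = dG_GAV
        self.dG_group = dG_group

    def get_uncertainty_value(self, source):
        varG = 0.0
        if 'Library' in source:
            varG += self.dG_library ** 2
        if 'GAV' in source:
            varG += self.dG_GAV ** 2
            for group_type, group_entries in source['GAV'].items():
                weights = [entry[-1] for entry in group_entries]
                varG += np.sum([w * w * self.dG_group ** 2 for w in weights])
        return np.sqrt(varG)

# butane-like example: two thermo groups, each used twice
engine = ThermoUncertaintySketch()
source = {'GAV': {'group': [('Cs-CsCsHH', 2), ('Cs-CsHHH', 2)]}}
print(engine.get_uncertainty_value(engine_source := source))  # ~2.52 kcal/mol
```

This reproduces the 2.51994 value that appears for `CCCC` in the updated test expectations below.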

def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_group_type=None):
"""
@@ -170,39 +170,39 @@ def get_uncertainty_value(self, source):
"""
Retrieve the dlnk uncertainty when the source of the reaction kinetics are given
"""
dlnk = 0.0
varlnk = 0.0
if 'Library' in source:
# Should be a single library reaction source
dlnk += self.dlnk_library
varlnk += self.dlnk_library * self.dlnk_library
elif 'Surface_Library' in source:
# Should be a single library reaction source
dlnk += self.dlnk_surf_library
varlnk += self.dlnk_surf_library * self.dlnk_surf_library
elif 'PDep' in source:
# Should be a single pdep reaction source
dlnk += self.dlnk_pdep
varlnk += self.dlnk_pdep * self.dlnk_pdep
elif 'Training' in source:
# Should be a single training reaction
# Although some training entries may be used in reverse,
# We still consider the kinetics to be directly dependent
# We still consider the kinetics to be directly dependent
if 'surface' in source['Training'][0].lower():
dlnk += self.dlnk_surf_training
varlnk += self.dlnk_surf_training * self.dlnk_surf_training
else:
dlnk += self.dlnk_training
varlnk += self.dlnk_training * self.dlnk_training
elif 'Rate Rules' in source:
family_label = source['Rate Rules'][0]
source_dict = source['Rate Rules'][1]
exact = source_dict['exact']
rule_weights = [ruleTuple[-1] for ruleTuple in source_dict['rules']]
training_weights = [trainingTuple[-1] for trainingTuple in source_dict['training']]

dlnk += self.dlnk_family
varlnk += self.dlnk_family * self.dlnk_family

N = len(rule_weights) + len(training_weights)
if 'node_std_dev' in source_dict:
# Handle autogen BM trees
if source_dict['node_std_dev'] < 0:
raise ValueError('Invalid value for std dev of kinetics family rule node')
dlnk += source_dict['node_std_dev']
varlnk += np.float_power(source_dict['node_std_dev'], 2.0)
if source_dict['node_n_train'] is None:
raise ValueError('Invalid number of training reactions for kinetics family rule node')
N = source_dict['node_n_train']
@@ -211,28 +211,28 @@ def get_uncertainty_value(self, source):
# every node template has its own fitted rate rule by definition, but here we use the
# number of training reactions as an approximation of the node's specificity/generality
# and add a penalty for being too general (large # of training reactions)
dlnk += np.log10(N + 1) * self.dlnk_nonexact
varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2
else:
# Handle hand-made trees
if not exact:
# nonexactness contribution increases as N increases
dlnk += np.log10(N + 1) * self.dlnk_nonexact
varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2

if 'surface' in family_label.lower():
dlnk += np.sum([weight * self.dlnk_surf_rule for weight in rule_weights])
dlnk += np.sum([weight * self.dlnk_surf_training for weight in training_weights])
varlnk += np.sum([weight * weight * self.dlnk_surf_rule * self.dlnk_surf_rule for weight in rule_weights])
varlnk += np.sum([weight * weight * self.dlnk_surf_training * self.dlnk_surf_training for weight in training_weights])
else:
# Add the contributions from rules
dlnk += np.sum([weight * self.dlnk_rule for weight in rule_weights])
varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in rule_weights])
# Add the contributions from training
# Even though these source from training reactions, we actually
# use the uncertainty for rate rules, since these are now approximations
# of the original reaction. We consider these to be independent of original the training
# parameters because the rate rules may be reversing the training reactions,
# which leads to more complicated dependence
dlnk += np.sum([weight * self.dlnk_rule for weight in training_weights])
varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in training_weights])

return dlnk
return np.sqrt(varlnk)
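
One consequence of returning `np.sqrt(varlnk)` instead of the direct sum is that the reported uncertainty can only shrink or stay equal: for non-negative contributions, the quadrature sum is bounded above by the linear sum. A quick numeric check with made-up sigmas:

```python
import numpy as np

sigmas = np.array([1.0, 1.05, 0.5])  # hypothetical contributions to dln(k)

linear = sigmas.sum()                      # old behavior: add sigmas directly
quadrature = np.sqrt((sigmas ** 2).sum())  # new behavior: add variances

print(linear, quadrature)
assert quadrature <= linear  # always holds for non-negative sigmas
```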

def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_family=None):
"""
@@ -442,10 +442,16 @@ def extract_sources_from_model(self):
"""
self.species_sources_dict = {}
self.extra_species = []
allowed_source_keys = {'Library', 'QM', 'GAV', 'ADS'}
for species in self.species_list:
if species not in self.extra_species:
source = self.database.thermo.extract_source_from_comments(species)
assert source.keys() <= {'Library', 'QM', 'GAV', 'ADS'}, 'Source of thermo must be either Library, QM, GAV, or ADS'
unexpected_source_keys = set(source.keys()) - allowed_source_keys
if unexpected_source_keys:
raise ValueError(
f'Source of thermo must be either Library, QM, GAV, or ADS; '
f'got unexpected source keys {unexpected_source_keys} for species {species.label}'
)
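
Replacing the `assert` with an explicit `ValueError` also keeps the check active when Python runs with `-O`, which strips assert statements. A minimal standalone version of the same guard:

```python
ALLOWED_SOURCE_KEYS = {'Library', 'QM', 'GAV', 'ADS'}

def validate_source_keys(source):
    """Raise (rather than assert) on unexpected thermo source keys."""
    unexpected = set(source) - ALLOWED_SOURCE_KEYS
    if unexpected:
        raise ValueError(f'Unexpected thermo source keys: {sorted(unexpected)}')

validate_source_keys({'GAV': {}, 'Library': 'foo'})  # passes silently
try:
    validate_source_keys({'ML': {}})
except ValueError as err:
    print(err)
```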

# Now prep the source data
# Do not alter the GAV information, but reassign QM and Library sources to the species indices that they came from
@@ -501,7 +507,11 @@ def extract_sources_from_model(self):
elif len(source) == 3:
# combination of adsorption correction, GAV (radical), and Library/ML

assert species.contains_surface_site(), 'only surface species should have 3 sources: adsorption correction, GAV, library/ML'
if not species.contains_surface_site():
raise ValueError(
f'Only surface species should have 3 thermo sources (adsorption correction, GAV, and library/QM); '
f'got species={species.label}, source={source}'
)

# retrieve the desorbed version of the surface species-- the thing the adsorption correction was applied to during thermo estimation
dummy_gas_species = Species()
19 changes: 9 additions & 10 deletions test/rmgpy/tools/uncertaintyTest.py
@@ -169,13 +169,12 @@ def test_uncertainty_assignment(self):

np.testing.assert_allclose(
thermo_unc,
[1.5, 1.5, 2.0, 1.9, 3.1, 1.5, 1.9, 2.0, 2.0, 1.9, 2.2, 1.9, 2.0, 1.5, 3.1, 1.9, 1.5, 2.0, 1.7, 1.8, 1.8, 1.9, 1.8, 1.9, 1.9],
[1.5, 1.5, 2.61966, 2.51994, 2.23886, 1.5, 2.30761, 2.41611, 2.61966, 2.51994, 2.61966, 2.51994, 2.61966, 1.5, 2.23886, 2.30761, 1.5, 2.61966, 2.07366, 2.19376, 2.19376, 2.30761, 1.94616, 2.07366, 2.07366],
rtol=1e-4,
)
np.testing.assert_allclose(
kinetic_unc,
# the 7.81 value comes from the SIDT tree node uncertainty: 5.756 + non-exact penalty for N=1: log10(1+1) * 3.5 + family uncertainty: 1.0
[0.5, 1.5, 3.169924, 3.169924, 2.553605, 0.5, 2.0, 7.81, 7.81, 0.5],
[0.5, 1.118, 1.9783, 1.9783, 1.5363, 0.5, 2.0, 5.9369, 5.9369, 0.5],
rtol=1e-4
)

@@ -185,13 +184,13 @@ def test_specific_species_uncertainties(self):
"""

expected_results = { # order is (total_uncertainty, [group_names], [group_counts])
'CCCC': (1.9, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]),
'CCCCCCCCCC': (2.5, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]),
'CC(OO)CC': (2.1, ['O2s-OsCs', 'O2s-OsH', 'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]),
'C=NCC': (1.9, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]),
'C=C': (1.7, ['Cds-CdsHH'], [2]),
'C*': (10.018, ['CH3'], [1]), # Gas library + radical + adsorption correction
'O=[CH]*': (8.618, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction
'CCCC': (2.5199409675625337, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]),
'CCCCCCCCCC': (6.091048438487417, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]),
'CC(OO)CC': (2.5199409675625337, ['O2s-OsCs', 'O2s-OsH', 'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]),
'C=NCC': (2.07365649035707, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]),
'C=C': (2.07365649035707, ['Cds-CdsHH'], [2]),
'C*': (7.271261019245562, ['CH3'], [1]), # Gas library + radical + adsorption correction
'O=[CH]*': (7.150786643440007, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction
}

uncertainty = rmgpy.tools.uncertainty.Uncertainty()