diff --git a/ipython/local_uncertainty.ipynb b/ipython/local_uncertainty.ipynb index 9afc08d09f..d0d9d24a2f 100644 --- a/ipython/local_uncertainty.ipynb +++ b/ipython/local_uncertainty.ipynb @@ -114,11 +114,29 @@ "\n", "$$\\Delta G = \\frac{1}{\\sqrt{12}}(G_{max} - G_{min})$$\n", "\n", - "Several parameters are used to formulate $\\Delta G$. These are $\\Delta G_\\mathrm{library}$, $\\Delta G_\\mathrm{QM}$, $\\Delta G_\\mathrm{GAV}$, and $\\Delta _\\mathrm{group}$.\n", + "But first, to formulate $G$, we need to compile all the uncertainty sources. These are $G_\\mathrm{library}$, $G_\\mathrm{QM}$, $G_\\mathrm{GAV}$, and $G_\\mathrm{group}$. We treat each as a continuous random variable and add them to obtain $G$, from which the species uncertainty $\\Delta G$ follows.\n", " \n", - "$$\\Delta G = \\delta_\\mathrm{library} \\Delta G_\\mathrm{library} + \\delta_\\mathrm{QM} \\Delta G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( \\Delta G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} \\Delta G_{\\mathrm{group},j} \\right)$$\n", + "$$G = \\delta_\\mathrm{library} G_\\mathrm{library} + \\delta_\\mathrm{QM} G_\\mathrm{QM} + \\delta_\\mathrm{GAV} \\left( G_\\mathrm{GAV} + \\sum_{\\mathrm{group}\\; j} d_{j} G_{\\mathrm{group},j} \\right)$$\n", + "\n", + "Here $\\delta$ is the Kronecker delta function, which equals one if the species thermochemistry parameter contains the particular source type and zero otherwise, and $d_{j}$ is the degeneracy (number of appearances) of the thermo group used to construct the species thermochemistry in the group additivity method.\n", + "\n", + "To compute $\\Delta G$, we'll use the property that the variance of the sum of two random variables, $X$ and $Y$, is:\n", + "$$Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$$\n", + "\n", + "If we assume that each uncertainty source is uncorrelated with the others, then the covariance term drops out:\n", + "\n", + "$$Var(X+Y)=Var(X)+Var(Y), \\ \\ \\ \\ \\ Cov(X,Y)=0$$\n", + "\n", + "And we can compute the variance of $G$:\n", + "\n", + "$$Var(G) = 
\\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM})+\\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) +\\sum_{\\mathrm{group}\\; j} d_j^2 Var(G_{\\mathrm{group},j}) \\right)$$\n", + "\n", + "\n", + "The standard deviation $\\Delta G$ is then the square root of the variance (this is an example of adding in quadrature):\n", + "$$\\Delta G = \\sqrt{\\delta_\\mathrm{library} Var(G_\\mathrm{library}) + \\delta_\\mathrm{QM} Var(G_\\mathrm{QM})+\\delta_\\mathrm{GAV} \\left(Var(G_\\mathrm{GAV}) +\\sum_{\\mathrm{group}\\; j} d_j^2 Var(G_{\\mathrm{group},j}) \\right)}$$\n", + "\n", + "\n", + "\n", "\n", - "where $\\delta$ is the Kronecker delta function which equals one if the species thermochemistry parameter contains the particular source type and $d_{j}$ is the degeneracy (number of appearances) of the thermo group used to construct the species thermochemistry in the group additivity method.\n", "\n", "### Kinetics Uncertainty\n", "\n", @@ -130,13 +148,20 @@ "\n", "$$\\Delta \\ln(k) = \\frac{1}{\\sqrt{12}}(\\ln k_{max} - \\ln k_{min})$$\n", "\n", - "The parameters used to formulate $\\Delta \\ln k$ are $\\Delta \\ln k_\\mathrm{library}$, $\\Delta \\ln k_\\mathrm{training}$, $\\Delta \\ln k_\\mathrm{pdep}$, $\\Delta \\ln k_\\mathrm{family}$, $\\Delta \\ln k_\\mathrm{non-exact}$, and $\\Delta \\ln k_\\mathrm{rule}$.\n", + "The sources used to formulate $\\ln k$ are $\\ln k_\\mathrm{library}$, $\\ln k_\\mathrm{training}$, $\\ln k_\\mathrm{pdep}$, $\\ln k_\\mathrm{family}$, $\\ln k_\\mathrm{non-exact}$, and $\\ln k_\\mathrm{rule}$.\n", + "\n", + "For exact training data matches, and for library and pdep reactions, the kinetic uncertainty is assigned according to the source's uncertainty type. For kinetics estimated using RMG's decision trees, the following formula is used to calculate the uncertainty:\n", "\n", - "For library, training, and pdep reactions, the kinetic uncertainty is assigned according to their uncertainty type. 
For kinetics estimated using RMG's rate rules, the following formula is used to calculate the uncertainty:\n", + "$$\\ln k_\\mathrm{rate\\; rules} = \\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\ln k_{\\mathrm{rule},i}$$\n", "\n", - "$$\\Delta \\ln k_\\mathrm{rate\\; rules} = \\Delta\\ln k_\\mathrm{family} + \\log_{10}(N+1) \\left(\\Delta\\ln k_\\mathrm{non-exact}\\right) + \\sum_{\\mathrm{rule}\\; i} w_i \\Delta \\ln k_{\\mathrm{rule},i}$$\n", + "where $N$ is the total number of rate rules or training reactions used and $w_{i}$ is the weight of the rate rule or training reaction in the averaging scheme for that kinetics estimate.\n", "\n", - "where N is the total number of rate rules used and $w_{i}$ is the weight of the rate rule in the averaging scheme for that kinetics estimate. " + "Once again we use the property of adding variances under the assumption that the sources are uncorrelated with each other. The standard deviation then adds in quadrature:\n", + "\n", + "$$\\Delta\\ln k_\\mathrm{rate\\; rules} = \\sqrt{(\\Delta \\ln k_\\mathrm{family})^2 + \\left(\\log_{10}(N+1) \\right)^2\\left(\\Delta \\ln k_\\mathrm{non-exact}\\right)^2 + \\sum_{\\mathrm{rule}\\; i} w_i^2 (\\Delta \\ln k_{\\mathrm{rule},i})^2}$$\n", + "\n", + "\n", + "\n" ] }, { @@ -178,9 +203,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "result = uncertainty.local_analysis(sensitive_species, correlated=False, number=5, fileformat='.png')\n", @@ -190,9 +213,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# Show the uncertainty plots\n", @@ -230,9 +251,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "uncertainty.assign_parameter_uncertainties(correlated=True)\n", 
@@ -243,9 +262,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# Show the uncertainty plots\n", diff --git a/rmgpy/data/thermo.py b/rmgpy/data/thermo.py index 2ab242de81..ba748fa280 100644 --- a/rmgpy/data/thermo.py +++ b/rmgpy/data/thermo.py @@ -2716,11 +2716,13 @@ def extract_source_from_comments(self, species): Parses the verbose string of comments from the thermo data of the species object, and extracts the thermo sources. - Returns a dictionary with keys of either 'Library', 'QM', and/or 'GAV'. + Returns a dictionary with keys of 'Library', 'QM', 'ADS', and/or 'GAV'. Commonly, species thermo are estimated using only one of these sources. However, a radical can be estimated with more than one type of source, for instance a saturated library value and a GAV HBI correction, or a QM saturated value - and a GAV HBI correction. + and a GAV HBI correction. Adsorbates can be estimated using a single library + for the adsorbate or a combination of a gas phase library for the + gas phase portion and an adsorption correction. source = {'Library': String_Name_of_Library_Used, 'QM': String_of_Method_Used, diff --git a/rmgpy/tools/uncertainty.py b/rmgpy/tools/uncertainty.py index 4321b679db..75b150db48 100644 --- a/rmgpy/tools/uncertainty.py +++ b/rmgpy/tools/uncertainty.py @@ -43,7 +43,7 @@ class ThermoParameterUncertainty(object): This class is an engine that generates the species uncertainty based on its thermo sources. """ - def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.1, dG_ADS_correction=6.918, dG_surf_lib=6.918): + def __init__(self, dG_library=1.5, dG_QM=3.0, dG_GAV=1.5, dG_group=0.7159, dG_ADS_correction=6.918, dG_surf_lib=6.918): """ Initialize the different uncertainties dG_library, dG_QM, dG_GAV, and dG_other with set values in units of kcal/mol. 
@@ -62,22 +62,22 @@ def get_uncertainty_value(self, source): """ Retrieve the uncertainty value in kcal/mol when the source of the thermo of a species is given. """ - dG = 0.0 + varG = 0.0 if 'Library' in source: - dG += self.dG_library + varG += self.dG_library * self.dG_library if 'Surface_Library' in source: - dG += self.dG_surf_lib + varG += self.dG_surf_lib * self.dG_surf_lib if 'QM' in source: - dG += self.dG_QM + varG += self.dG_QM * self.dG_QM if 'GAV' in source: - dG += self.dG_GAV # Add a fixed uncertainty for the GAV method + varG += self.dG_GAV * self.dG_GAV # Add a fixed uncertainty for the GAV method for group_type, group_entries in source['GAV'].items(): group_weights = [groupTuple[-1] for groupTuple in group_entries] - dG += np.sum([weight * self.dG_group for weight in group_weights]) + varG += np.sum([weight * weight * self.dG_group * self.dG_group for weight in group_weights]) if 'ADS' in source: - dG += self.dG_ADS_correction # Add adsorption correction uncertainty + varG += self.dG_ADS_correction * self.dG_ADS_correction # Add adsorption correction uncertainty - return dG + return np.sqrt(varG) def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_group_type=None): """ @@ -170,24 +170,24 @@ def get_uncertainty_value(self, source): """ Retrieve the dlnk uncertainty when the source of the reaction kinetics are given """ - dlnk = 0.0 + varlnk = 0.0 if 'Library' in source: # Should be a single library reaction source - dlnk += self.dlnk_library + varlnk += self.dlnk_library * self.dlnk_library elif 'Surface_Library' in source: # Should be a single library reaction source - dlnk += self.dlnk_surf_library + varlnk += self.dlnk_surf_library * self.dlnk_surf_library elif 'PDep' in source: # Should be a single pdep reaction source - dlnk += self.dlnk_pdep + varlnk += self.dlnk_pdep * self.dlnk_pdep elif 'Training' in source: # Should be a single training reaction # Although some training entries may be used in reverse, - 
# We still consider the kinetics to be directly dependent + # We still consider the kinetics to be directly dependent if 'surface' in source['Training'][0].lower(): - dlnk += self.dlnk_surf_training + varlnk += self.dlnk_surf_training * self.dlnk_surf_training else: - dlnk += self.dlnk_training + varlnk += self.dlnk_training * self.dlnk_training elif 'Rate Rules' in source: family_label = source['Rate Rules'][0] source_dict = source['Rate Rules'][1] @@ -195,14 +195,14 @@ def get_uncertainty_value(self, source): rule_weights = [ruleTuple[-1] for ruleTuple in source_dict['rules']] training_weights = [trainingTuple[-1] for trainingTuple in source_dict['training']] - dlnk += self.dlnk_family + varlnk += self.dlnk_family * self.dlnk_family N = len(rule_weights) + len(training_weights) if 'node_std_dev' in source_dict: # Handle autogen BM trees if source_dict['node_std_dev'] < 0: raise ValueError('Invalid value for std dev of kinetics family rule node') - dlnk += source_dict['node_std_dev'] + varlnk += np.float_power(source_dict['node_std_dev'], 2.0) if source_dict['node_n_train'] is None: raise ValueError('Invalid number of training reactions for kinetics family rule node') N = source_dict['node_n_train'] @@ -211,28 +211,28 @@ def get_uncertainty_value(self, source): # every node template has its own fitted rate rule by definition, but here we use the # number of training reactions as an approximation of the node's specificity/generality # and add a penalty for being too general (large # of training reactions) - dlnk += np.log10(N + 1) * self.dlnk_nonexact + varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2 else: # Handle hand-made trees if not exact: # nonexactness contribution increases as N increases - dlnk += np.log10(N + 1) * self.dlnk_nonexact + varlnk += (np.log10(N + 1) * self.dlnk_nonexact) ** 2 if 'surface' in family_label.lower(): - dlnk += np.sum([weight * self.dlnk_surf_rule for weight in rule_weights]) - dlnk += np.sum([weight * self.dlnk_surf_training 
for weight in training_weights]) + varlnk += np.sum([weight * weight * self.dlnk_surf_rule * self.dlnk_surf_rule for weight in rule_weights]) + varlnk += np.sum([weight * weight * self.dlnk_surf_training * self.dlnk_surf_training for weight in training_weights]) else: # Add the contributions from rules - dlnk += np.sum([weight * self.dlnk_rule for weight in rule_weights]) + varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in rule_weights]) # Add the contributions from training # Even though these source from training reactions, we actually # use the uncertainty for rate rules, since these are now approximations # of the original reaction. We consider these to be independent of original the training # parameters because the rate rules may be reversing the training reactions, # which leads to more complicated dependence - dlnk += np.sum([weight * self.dlnk_rule for weight in training_weights]) + varlnk += np.sum([weight * weight * self.dlnk_rule * self.dlnk_rule for weight in training_weights]) - return dlnk + return np.sqrt(varlnk) def get_partial_uncertainty_value(self, source, corr_source_type, corr_param=None, corr_family=None): """ @@ -442,10 +442,16 @@ def extract_sources_from_model(self): """ self.species_sources_dict = {} self.extra_species = [] + allowed_source_keys = {'Library', 'QM', 'GAV', 'ADS'} for species in self.species_list: if species not in self.extra_species: source = self.database.thermo.extract_source_from_comments(species) - assert source.keys() <= {'Library', 'QM', 'GAV', 'ADS'}, 'Source of thermo must be either Library, QM, GAV, or ADS' + unexpected_source_keys = set(source.keys()) - allowed_source_keys + if unexpected_source_keys: + raise ValueError( + f'Source of thermo must be either Library, QM, GAV, or ADS; ' + f'got unexpected source keys {unexpected_source_keys} for species {species.label}' + ) # Now prep the source data # Do not alter the GAV information, but reassign QM and Library sources to the species 
indices that they came from @@ -501,7 +507,11 @@ def extract_sources_from_model(self): elif len(source) == 3: # combination of adsorption correction, GAV (radical), and Library/ML - assert species.contains_surface_site(), 'only surface species should have 3 sources: adsorption correction, GAV, library/ML' + if not species.contains_surface_site(): + raise ValueError( + f'Only surface species should have 3 thermo sources (adsorption correction, GAV, and library/QM); ' + f'got species={species.label}, source={source}' + ) # retrieve the desorbed version of the surface species-- the thing the adsorption correction was applied to during thermo estimation dummy_gas_species = Species() diff --git a/test/rmgpy/tools/uncertaintyTest.py b/test/rmgpy/tools/uncertaintyTest.py index a47866eef4..14a4e26f37 100644 --- a/test/rmgpy/tools/uncertaintyTest.py +++ b/test/rmgpy/tools/uncertaintyTest.py @@ -169,13 +169,12 @@ def test_uncertainty_assignment(self): np.testing.assert_allclose( thermo_unc, - [1.5, 1.5, 2.0, 1.9, 3.1, 1.5, 1.9, 2.0, 2.0, 1.9, 2.2, 1.9, 2.0, 1.5, 3.1, 1.9, 1.5, 2.0, 1.7, 1.8, 1.8, 1.9, 1.8, 1.9, 1.9], + [1.5, 1.5, 2.61966, 2.51994, 2.23886, 1.5, 2.30761, 2.41611, 2.61966, 2.51994, 2.61966, 2.51994, 2.61966, 1.5, 2.23886, 2.30761, 1.5, 2.61966, 2.07366, 2.19376, 2.19376, 2.30761, 1.94616, 2.07366, 2.07366], rtol=1e-4, ) np.testing.assert_allclose( kinetic_unc, - # the 7.81 value comes from the SIDT tree node uncertainty: 5.756 + non-exact penalty for N=1: log10(1+1) * 3.5 + family uncertainty: 1.0 - [0.5, 1.5, 3.169924, 3.169924, 2.553605, 0.5, 2.0, 7.81, 7.81, 0.5], + [0.5, 1.118, 1.9783, 1.9783, 1.5363, 0.5, 2.0, 5.9369, 5.9369, 0.5], rtol=1e-4 ) @@ -185,13 +184,13 @@ def test_specific_species_uncertainties(self): """ expected_results = { # order is (total_uncertainty, [group_names], [group_counts]) - 'CCCC': (1.9, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]), - 'CCCCCCCCCC': (2.5, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]), - 'CC(OO)CC': (2.1, ['O2s-OsCs', 'O2s-OsH', 
'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]), - 'C=NCC': (1.9, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]), - 'C=C': (1.7, ['Cds-CdsHH'], [2]), - 'C*': (10.018, ['CH3'], [1]), # Gas library + radical + adsorption correction - 'O=[CH]*': (8.618, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction + 'CCCC': (2.5199409675625337, ['Cs-CsCsHH', 'Cs-CsHHH'], [2, 2]), + 'CCCCCCCCCC': (6.091048438487417, ['Cs-CsCsHH', 'Cs-CsHHH'], [8, 2]), + 'CC(OO)CC': (2.5199409675625337, ['O2s-OsCs', 'O2s-OsH', 'Cs-CsCsOsH', 'Cs-CsCsHH', 'Cs-CsHHH'], [1, 1, 1, 1, 2]), + 'C=NCC': (2.07365649035707, ['N3d-CdCs', 'Cs-(N3dCd)CsHH', 'Cs-CsHHH', 'Cd-N3dHH'], [1, 1, 1, 1]), + 'C=C': (2.07365649035707, ['Cds-CdsHH'], [2]), + 'C*': (7.271261019245562, ['CH3'], [1]), # Gas library + radical + adsorption correction + 'O=[CH]*': (7.150786643440007, ['Cds-OdHH', 'HCdsJO'], [1, 1]), # GAV + radical + adsorption correction } uncertainty = rmgpy.tools.uncertainty.Uncertainty()
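The core change in this patch, replacing the linear sum of uncertainty contributions with a root-sum-of-squares (quadrature) combination, can be sanity-checked outside RMG with a few lines of standalone Python. This is a sketch, not the rmgpy API: `combine_in_quadrature` is a hypothetical helper, and the values 1.5 (dG_GAV) and 0.7159 (dG_group, kcal/mol) are the defaults introduced in the diff above.

```python
import math

def combine_in_quadrature(contributions):
    """Combine independent 1-sigma uncertainty contributions.

    Each contribution is (weight, sigma). Since Var(w*X) = w**2 * Var(X)
    and variances of uncorrelated variables add, the combined standard
    deviation is sqrt(sum((w * sigma)**2)).
    """
    return math.sqrt(sum((w * s) ** 2 for w, s in contributions))

# GAV-estimated species with two group types, each of degeneracy 2,
# mirroring the 'CCCC' case in the updated test expectations.
dG_GAV, dG_group = 1.5, 0.7159
dG = combine_in_quadrature([(1, dG_GAV), (2, dG_group), (2, dG_group)])
print(round(dG, 4))  # 2.5199, matching the new expected value for 'CCCC'
```

Note that weights enter squared, so a group of degeneracy 2 contributes four times the variance of a single occurrence, rather than twice the standard deviation as in the old linear scheme.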