@@ -348,53 +348,90 @@ The figure above provides a schematic representation of this feature scaling met
348348Diagonal basis
349349--------------
350350
351- Performing the training and validation split without diagonalising the :math:`t_0` covmat :math:`C_{0}` neglects
352- any correlations that may be present between training and validation data. To remedy this,
353- we rotate to a basis in which the correlation matrix is diagonal before performing any training/validation split.
354- Starting from the definition of the :math:`\chi^2` function in the NNPDF methodology, we have
351+ Training and validation data are obtained by performing a random split of the
352+ original data set. However, data points in the two sets are not necessarily
353+ statistically independent, as they may be correlated through the fitting
354+ covariance matrix :math:`C_{\rm fit}`. Here the fitting covariance matrix is the
355+ sum of the :math:`t_{0}` experimental covariance matrix :math:`C_{0}` and any
356+ theory covariance matrix :math:`C_{\rm th}` used in the fit, i.e., :math:`C_{\rm
357+ fit} = C_{0} + C_{\rm th}`. In order to disentangle the training and
358+ validation data, we perform the training-validation split in a basis in which
359+ the correlation matrix is diagonal.
360+
361+ We first compute the correlation matrix :math:`\rho`, which is defined as
355362
356363.. math::
357364
358- \chi ^2 &= (D-T)^T C_0 ^{-1 } (D-T) \\
359- &= (D-T)^T R^{-1 } R C_0 ^{-1 } R R^{-1 } (D-T) \\
360- &= (D-T)^T R^{-1 } \left ( R^{-1 } C_0 R^{-1 } \right )^{-1 } R^{-1 } (D-T) \\
361- &\equiv \tilde {\epsilon }^T \rho ^{-1 } \tilde {\epsilon } \, ,
365+ \rho = \Sigma ^{-1 } C_{\rm fit} \Sigma ^{-1 } \, ,
362366
363- where we have defined :math:`\tilde{\epsilon} \equiv R^{-1}(D-T)` and :math:`\rho = R^{-1} C_0 R^{-1}`.
367+ where we have defined :math:`\Sigma_{ij} = \sqrt{C_{\rm fit, ii}} \delta_{ij}` and
368+ :math:`(\Sigma^{-1})_{ij} = \frac{1}{\sqrt{C_{\rm fit, ii}}} \delta_{ij}`. The
369+ correlation matrix is symmetric and positive definite, so it can be diagonalized
370+ by an orthogonal transformation with strictly positive eigenvalues. We can therefore write
364371
365- Choosing :math:`R_{ii} = \sqrt{C_{0, ii}}`, we have that :math:`R^{-1} C_0 R^{-1}` coincides with the usual definition of the correlation matrix.
372+ .. math::
373+
374+ \rho = V \Lambda V^T \, ,
375+
376+ where :math:`V` is an orthogonal matrix and :math:`\Lambda` is a diagonal matrix
377+ containing the eigenvalues of :math:`\rho`. The original fitting covariance
378+ matrix can then be written as
379+
380+ .. math::
381+
382+ C_{\rm fit} &= \Sigma \rho \Sigma \\
383+ &= (\Sigma V) \Lambda (V^T \Sigma ) \\
384+ &\equiv U \Lambda U^T \, ,
366385
367- Next, we move to the basis in which :math:`\rho` is diagonal. Writing :math:`\rho = \tilde{U}^T \tilde{\Lambda} \tilde{U}`, we find
386+ where we have defined the non-orthogonal matrix :math:`U = \Sigma V`. Its
387+ inverse defines the transformation matrix that diagonalizes the
388+ :math:`\chi^2`, and is given by :math:`R^T \equiv U^{-1} = V^T \Sigma^{-1}`.
389+ Therefore, the inverse of the fitting covariance matrix can be written as
368390
369391.. math::
370392
371- \chi ^2 &= \tilde {\epsilon }^T \rho ^{-1 } \tilde {\epsilon } \\
372- &= \tilde {\epsilon }^T (\tilde {U}^T \tilde {\Lambda } \tilde {U})^{-1 } \tilde {\epsilon } \\
373- &= \tilde {\epsilon }^T \tilde {U}^T \tilde {\Lambda }^{-1 } \tilde {U} \tilde {\epsilon } \\
374- &\equiv \tilde {\tilde {\epsilon }}^T \tilde {\Lambda }^{-1 } \tilde {\tilde {\epsilon }} \, ,
393+ C_{\rm fit}^{-1 } &= (U \Lambda U^T)^{-1 } \\
394+ &= (U^T)^{-1 } \Lambda ^{-1 } U^{-1 } \\
395+ &= R \Lambda ^{-1 } R^T \, .
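
The decomposition above can be checked numerically. The following is a minimal NumPy sketch, not the code used in the fit: the matrix ``C_fit`` is a randomly generated toy covariance and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric positive-definite "fitting covariance" (illustrative only).
A = rng.normal(size=(5, 5))
C_fit = A @ A.T + 5.0 * np.eye(5)

# Sigma is diagonal, so it is stored as the vector of standard deviations.
sigma = np.sqrt(np.diag(C_fit))

# Correlation matrix rho = Sigma^{-1} C_fit Sigma^{-1}.
rho = C_fit / np.outer(sigma, sigma)

# Eigendecomposition rho = V Lambda V^T (eigh is meant for symmetric matrices).
lam, V = np.linalg.eigh(rho)

# U = Sigma V and R^T = U^{-1} = V^T Sigma^{-1}.
U = sigma[:, None] * V
R_T = V.T / sigma[None, :]

# Rebuild C_fit = U Lambda U^T and its inverse R Lambda^{-1} R^T.
C_rebuilt = U @ np.diag(lam) @ U.T
C_inv = R_T.T @ np.diag(1.0 / lam) @ R_T
```

Because :math:`\Sigma` is diagonal, it is applied here by broadcasting the vector of standard deviations rather than by building the full matrix.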
375396
376- where on the last line we have defined
397+ Considering the definition of the :math:`\chi^2` function in the NNPDF
398+ methodology, we finally have
377399
378400.. math::
379401
380- \tilde {\tilde {\epsilon }} \equiv \tilde {U}\tilde {\epsilon } = \tilde {U}R^{-1 }(D-T).
402+ \chi ^2 &= (D-T)^T C_{\rm fit}^{-1 } (D-T) \\
403+ &= (D-T)^T R \Lambda ^{-1 } R^T (D-T) \\
404+ &= \epsilon ^T \Lambda ^{-1 } \epsilon \\
405+ &= \lVert \epsilon \rVert ^2 _{\Lambda ^{-1 }} \, ,
406+
407+ where we have defined the residuals in the diagonal basis as :math:`\epsilon \equiv R^T(D-T)` or, in index notation,
408+
409+ .. math::
410+
411+ \epsilon _i = (V^T)_{ij} \frac {(D-T)_j}{\sqrt {C_{\rm fit, jj}}} \,.
412+
413+ In this basis, the :math:`\chi^2` becomes a weighted norm of the residuals,
414+ where the weights are given by the inverse of the eigenvalues of the correlation
415+ matrix.
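
As an illustration, the equality between the :math:`\chi^2` in the original basis and the weighted norm in the diagonal basis can be verified on toy inputs. This is a sketch under stated assumptions: ``C_fit``, ``D`` and ``T`` are random stand-ins for the fitting covariance matrix, the data and the theory predictions, and none of the names come from the fitting code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Toy inputs (illustrative only): covariance, data and theory predictions.
A = rng.normal(size=(n, n))
C_fit = A @ A.T + n * np.eye(n)
D = rng.normal(size=n)
T = rng.normal(size=n)

# Correlation matrix and its eigendecomposition rho = V Lambda V^T.
sigma = np.sqrt(np.diag(C_fit))
rho = C_fit / np.outer(sigma, sigma)
lam, V = np.linalg.eigh(rho)

# Residuals in the diagonal basis:
# epsilon_i = (V^T)_{ij} (D - T)_j / sqrt(C_fit,jj).
eps = V.T @ ((D - T) / sigma)

# chi2 as a weighted norm with weights 1 / lambda_i ...
chi2_diag = np.sum(eps**2 / lam)

# ... and directly as (D - T)^T C_fit^{-1} (D - T).
chi2_direct = (D - T) @ np.linalg.solve(C_fit, D - T)
```

The two values agree up to floating-point precision, since the change of basis only rewrites the same quadratic form.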
381416
382- In index notation, this reads
417+ The transformed residuals :math:`\epsilon` are uncorrelated in the
418+ diagonal basis of the correlation matrix :math:`\rho`. As a cross-check, we
419+ can compute the covariance of :math:`\epsilon`,
383420
384421.. math::
385422
386- \tilde {\tilde {\epsilon _i}} = \tilde {U}_{ij} \frac {(D-T)_j}{\sqrt {C_{0 , jj}}}
423+ \mathbb {E}[\epsilon \epsilon ^T] &= \mathbb {E}[R^T(D-T)(D-T)^T R] \\
424+ &= R^T \mathbb {E}[(D-T)(D-T)^T] R \\
425+ &= R^T C_{\rm fit} R \\
426+ &= R^T U \Lambda U^T R \\
427+ &= \Lambda \,,
387428
388- The transformed data :math:`\tilde{\tilde{\epsilon}}` is statistically independent in the diagonal basis of the correlation matrix :math:`\rho`.
389- Computing the covariance of :math:`\tilde{\tilde{\epsilon}}`,
429+ where we have used the fact that :math:`R^T U = I` (and hence :math:`U^T R = I`) and the assumption that the
430+ data are distributed according to the fitting covariance matrix :math:`C_{\rm fit}`, i.e.,
390431
391432.. math::
392433
393- \mathbb {E}[\tilde {\tilde {\epsilon }}\tilde {\tilde {\epsilon }}^T]
394- &= \mathbb {E} \big [ (\tilde {U} R^{-1 }(D-T)) (\tilde {U} R^{-1 }(D-T))^T \big ] \\
395- &= \tilde {U} R^{-1 } \mathbb {E}[(D-T)(D-T)^T] R^{-1 } \tilde {U}^T \\
396- &= \tilde {U} \rho \tilde {U}^T \\
397- &= \tilde {U}\tilde {U}^T \tilde {\Lambda } \tilde {U}\tilde {U}^T \\
398- &= \tilde {\Lambda } \, ,
434+ \mathbb {E}[(D-T)(D-T)^T] = C_{\rm fit} \, .
399435
400- we find that it is diagonal, which demonstrates that the training/validation data are statistically independent indeed.
436+ This shows that the covariance of :math:`\epsilon` is indeed diagonal, and demonstrates that the
437+ training/validation data are uncorrelated in this basis.
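
The cross-check above can also be reproduced numerically. The sketch below uses a toy covariance, Gaussian pseudo-residuals and a hypothetical training fraction ``frac``. It is only meant to illustrate the argument and is not the implementation used in the fit. It checks the covariance of :math:`\epsilon` both analytically and by sampling, and shows that a random split in the diagonal basis makes the :math:`\chi^2` separate into a training and a validation contribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Toy fitting covariance (illustrative only).
A = rng.normal(size=(n, n))
C_fit = A @ A.T + n * np.eye(n)

sigma = np.sqrt(np.diag(C_fit))
lam, V = np.linalg.eigh(C_fit / np.outer(sigma, sigma))
R_T = V.T / sigma[None, :]  # R^T = V^T Sigma^{-1}

# Analytic covariance of epsilon: R^T C_fit R, expected to equal Lambda.
cov_eps = R_T @ C_fit @ R_T.T

# Sampling check: draw residuals D - T ~ N(0, C_fit) and transform each row.
res = rng.multivariate_normal(np.zeros(n), C_fit, size=200_000)
eps = res @ R_T.T
cov_eps_mc = eps.T @ eps / len(eps)

# Hypothetical random training/validation split performed in the diagonal basis.
frac = 0.75
tr_mask = rng.random(n) < frac

# For a single pseudo-residual vector the chi2 splits into two separate sums.
chi2_tr = np.sum(eps[0, tr_mask] ** 2 / lam[tr_mask])
chi2_vl = np.sum(eps[0, ~tr_mask] ** 2 / lam[~tr_mask])
chi2_total = res[0] @ np.linalg.solve(C_fit, res[0])
```

In the original basis such a separation would not hold in general, because the off-diagonal elements of :math:`C_{\rm fit}^{-1}` couple training and validation points.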