Commit e802729

Change notation for diagonal basis in documentation - first try
1 parent f68bafd commit e802729

1 file changed

Lines changed: 65 additions & 28 deletions

doc/sphinx/source/n3fit/methodology.rst

@@ -348,53 +348,90 @@ The figure above provides a schematic representation of this feature scaling met
348348
Diagonal basis
349349
--------------
350350

351-
Performing the training and validation split without diagonalising the :math:`t_0` covmat :math:`C_{0}` neglects
352-
any correlations that may be present between training and validation data. To remedy this,
353-
we rotate to a basis in which the correlation matrix is diagonal before performing any training/validation split.
354-
Starting from the definition of the :math:`\chi^2` function in the NNPDF methodology, we have
351+
Training and validation data are obtained by performing a random split of the
352+
original data set. However, data points in the two sets are not necessarily
353+
statistically independent, as they may be correlated through the fitting
354+
covariance matrix :math:`C_{\rm fit}`. Here the fitting covariance matrix is the
355+
sum of the :math:`t_{0}` experimental covariance matrix :math:`C_{0}` and any
356+
theory covariance matrix :math:`C_{\rm th}` used in the fit, i.e., :math:`C_{\rm
357+
fit} = C_{0} + C_{\rm th}`. In order to disentangle the training and
358+
validation data, we perform the training-validation split in a basis in which
359+
the correlation matrix is diagonal.
360+
361+
We first compute the correlation matrix :math:`\rho`, which is defined as
355362

356363
.. math::
357364
358-
\chi^2 &= (D-T)^T C_0^{-1} (D-T) \\
359-
&= (D-T)^T R^{-1} R C_0^{-1} R R^{-1} (D-T) \\
360-
&= (D-T)^T R^{-1} \left( R^{-1} C_0 R^{-1} \right)^{-1} R^{-1} (D-T) \\
361-
&\equiv \tilde{\epsilon}^T \rho^{-1} \tilde{\epsilon} \, ,
365+
\rho = \Sigma^{-1} C_{\rm fit} \Sigma^{-1} \, ,
362366
363-
where we have defined :math:`\tilde{\epsilon} \equiv R^{-1}(D-T)` and :math:`\rho = R^{-1} C_0 R^{-1}`.
367+
where we have defined :math:`\Sigma_{ij} = \sqrt{C_{\rm fit, ii}} \delta_{ij}` and
368+
:math:`(\Sigma^{-1})_{ij} = \frac{1}{\sqrt{C_{\rm fit, ii}}} \delta_{ij}`. The
369+
correlation matrix is symmetric and positive definite; being symmetric, it can
370+
be diagonalized by an orthogonal transformation, so that we can write
364371

365-
Choosing :math:`R_{ii} = \sqrt{C_{0, ii}}`, we have that :math:`R^{-1} C_0 R^{-1}` coincides with the usual definition of the correlation matrix.
372+
.. math::
373+
374+
\rho = V \Lambda V^T \, ,
375+
376+
where :math:`V` is an orthogonal matrix and :math:`\Lambda` is a diagonal matrix
377+
containing the eigenvalues of :math:`\rho`. The original fitting covariance
378+
matrix can then be written as
379+
380+
.. math::
381+
382+
C_{\rm fit} &= \Sigma \rho \Sigma \\
383+
&= (\Sigma V) \Lambda (V^T \Sigma) \\
384+
&\equiv U \Lambda U^T \, ,
366385
367-
Next, we move to the basis in which :math:`\rho` is diagonal. Writing :math:`\rho = \tilde{U}^T \tilde{\Lambda} \tilde{U}`, we find
386+
where we have defined the non-orthogonal matrix :math:`U = \Sigma V`. Its
387+
inverse defines the transformation matrix that diagonalizes the
388+
:math:`\chi^2`, and is given by :math:`R^T \equiv U^{-1} = V^T \Sigma^{-1}`.
389+
Therefore, the inverse of the fitting covariance matrix can be written as
368390

369391
.. math::
370392
371-
\chi^2 &= \tilde{\epsilon}^T \rho^{-1} \tilde{\epsilon} \\
372-
&= \tilde{\epsilon}^T (\tilde{U}^T \tilde{\Lambda} \tilde{U})^{-1} \tilde{\epsilon} \\
373-
&= \tilde{\epsilon}^T \tilde{U}^T \tilde{\Lambda}^{-1} \tilde{U} \tilde{\epsilon} \\
374-
&\equiv \tilde{\tilde{\epsilon}}^T \tilde{\Lambda}^{-1} \tilde{\tilde{\epsilon}} \, ,
393+
C_{\rm fit}^{-1} &= (U \Lambda U^T)^{-1} \\
394+
&= (U^T)^{-1} \Lambda^{-1} U^{-1} \\
395+
&= R \Lambda^{-1} R^T \, .
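The decomposition above can be checked numerically. The following is a minimal NumPy sketch, not the n3fit implementation: the toy matrix ``C_fit`` and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy covariance standing in for C_fit (symmetric positive definite).
A = rng.normal(size=(5, 5))
C_fit = A @ A.T + 5.0 * np.eye(5)

# Diagonal of Sigma: the standard deviations sqrt(C_fit,ii).
sigma = np.sqrt(np.diag(C_fit))

# Correlation matrix rho = Sigma^{-1} C_fit Sigma^{-1}.
rho = C_fit / np.outer(sigma, sigma)

# rho = V Lambda V^T; eigh returns orthonormal eigenvectors as the columns of V.
eigvals, V = np.linalg.eigh(rho)

# U = Sigma V, and R^T = U^{-1} = V^T Sigma^{-1}, i.e. R = Sigma^{-1} V.
U = sigma[:, None] * V
R = V / sigma[:, None]

# C_fit^{-1} = R Lambda^{-1} R^T.
C_fit_inv = R @ np.diag(1.0 / eigvals) @ R.T
```

Scaling rows by ``sigma`` avoids building the diagonal matrix :math:`\Sigma` explicitly, which is only a convenience of this sketch.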
375396
376-
where on the last line we have defined
397+
Considering the definition of the :math:`\chi^2` function in the NNPDF
398+
methodology, we finally have
377399

378400
.. math::
379401
380-
\tilde{\tilde{\epsilon}} \equiv \tilde{U}\tilde{\epsilon} = \tilde{U}R^{-1}(D-T).
402+
\chi^2 &= (D-T)^T C_{\rm fit}^{-1} (D-T) \\
403+
&= (D-T)^T R \Lambda^{-1} R^T (D-T) \\
404+
&= \epsilon^T \Lambda^{-1} \epsilon \\
405+
&= \lVert \epsilon \rVert^2_{\Lambda^{-1}} \, ,
406+
407+
where we have defined the residuals in the diagonal basis as :math:`\epsilon \equiv R^T(D-T)` or, in index notation with an implicit sum over :math:`j`,
408+
409+
.. math::
410+
411+
\epsilon_i = (V^T)_{ij} \frac{(D-T)_j}{\sqrt{C_{\rm fit, jj}}} \,.
412+
413+
In this basis, the :math:`\chi^2` becomes a weighted norm of the residuals,
414+
where the weights are given by the inverse of the eigenvalues of the correlation
415+
matrix.
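As an illustration of the statement above, the sketch below computes the residuals in the diagonal basis and compares the weighted norm with the :math:`\chi^2` evaluated directly from the covariance matrix. The helper ``to_diagonal_basis`` and the toy inputs are hypothetical and are not part of the n3fit code.

```python
import numpy as np


def to_diagonal_basis(C_fit, residuals):
    """Return (epsilon, eigvals), where epsilon = R^T (D - T).

    Hypothetical helper written for this example only.
    """
    sigma = np.sqrt(np.diag(C_fit))
    rho = C_fit / np.outer(sigma, sigma)
    eigvals, V = np.linalg.eigh(rho)
    # epsilon_i = (V^T)_{ij} (D - T)_j / sqrt(C_fit,jj)
    epsilon = V.T @ (residuals / sigma)
    return epsilon, eigvals


rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
C_fit = A @ A.T + 6.0 * np.eye(6)  # toy covariance matrix (assumption)
D_minus_T = rng.normal(size=6)     # toy residuals D - T (assumption)

epsilon, eigvals = to_diagonal_basis(C_fit, D_minus_T)

# chi2 as a weighted norm in the diagonal basis.
chi2_diag = np.sum(epsilon**2 / eigvals)

# chi2 from the original definition (D-T)^T C_fit^{-1} (D-T).
chi2_full = D_minus_T @ np.linalg.solve(C_fit, D_minus_T)
```

The two values should agree up to floating-point rounding.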
381416

382-
In index notation, this reads
417+
The transformed data :math:`\epsilon` are uncorrelated in the
418+
diagonal basis of the correlation matrix :math:`\rho`. As a cross-check, we
419+
can compute the covariance of :math:`\epsilon`,
383420

384421
.. math::
385422
386-
\tilde{\tilde{\epsilon_i}} = \tilde{U}_{ij} \frac{(D-T)_j}{\sqrt{C_{0, jj}}}
423+
\mathbb{E}[\epsilon \epsilon^T] &= \mathbb{E}[R^T(D-T)(D-T)^T R] \\
424+
&= R^T \mathbb{E}[(D-T)(D-T)^T] R \\
425+
&= R^T C_{\rm fit} R \\
426+
&= R^T U \Lambda U^T R \\
427+
&= \Lambda \,,
387428
388-
The transformed data :math:`\tilde{\tilde{\epsilon}}` is statistically independent in the diagonal basis of the correlation matrix :math:`\rho`.
389-
Computing the covariance of :math:`\tilde{\tilde{\epsilon}}`,
429+
where we have used the fact that :math:`R^T U = I` and the assumption that the
430+
data are distributed according to the fitting covariance matrix :math:`C_{\rm fit}`,
390431

391432
.. math::
392433
393-
\mathbb{E}[\tilde{\tilde{\epsilon}}\tilde{\tilde{\epsilon}}^T]
394-
&= \mathbb{E} \big[ (\tilde{U} R^{-1}(D-T)) (\tilde{U} R^{-1}(D-T))^T \big] \\
395-
&= \tilde{U} R^{-1} \mathbb{E}[(D-T)(D-T)^T] R^{-1} \tilde{U}^T \\
396-
&= \tilde{U} \rho \tilde{U}^T \\
397-
&= \tilde{U}\tilde{U}^T \tilde{\Lambda} \tilde{U}\tilde{U}^T \\
398-
&= \tilde{\Lambda} \, ,
434+
\mathbb{E}[(D-T)(D-T)^T] = C_{\rm fit} \, .
399435
400-
we find that it is diagonal, which demonstrates that the training/validation data are statistically independent indeed.
436+
This shows that the covariance of :math:`\epsilon` is indeed diagonal, and demonstrates that the
437+
training/validation data are uncorrelated.
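The result can be verified numerically, together with a consequence for the split: since the covariance of :math:`\epsilon` is diagonal, the :math:`\chi^2` separates into a training and a validation part with no cross terms. The sketch below is an assumption-laden illustration; the toy covariance matrix, the mask, and the training fraction of 0.75 are hypothetical and do not describe how n3fit performs the split.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Hypothetical toy covariance standing in for C_fit.
A = rng.normal(size=(n, n))
C_fit = A @ A.T + n * np.eye(n)

sigma = np.sqrt(np.diag(C_fit))
eigvals, V = np.linalg.eigh(C_fit / np.outer(sigma, sigma))
R = V / sigma[:, None]  # R = Sigma^{-1} V

# Covariance of epsilon: R^T C_fit R should equal Lambda.
cov_eps = R.T @ C_fit @ R

# Toy residuals drawn with covariance C_fit, rotated to the diagonal basis.
residuals = rng.multivariate_normal(np.zeros(n), C_fit)
epsilon = R.T @ residuals

# Random split of the diagonal-basis components (fraction is an assumption).
tr_mask = rng.random(n) < 0.75
chi2_tr = np.sum(epsilon[tr_mask] ** 2 / eigvals[tr_mask])
chi2_vl = np.sum(epsilon[~tr_mask] ** 2 / eigvals[~tr_mask])

# Total chi2 in the original basis, for comparison.
chi2_total = residuals @ np.linalg.solve(C_fit, residuals)
```

Here ``chi2_tr + chi2_vl`` reproduces ``chi2_total``, which would not hold for a split in the original basis when off-diagonal correlations are present.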
