Skip to content

Commit 3acc2ee

Browse files
authored
docs: add method documentation
Signed-off-by: Athan <kgryte@gmail.com>
1 parent 69bd1a1 commit 3acc2ee

1 file changed

Lines changed: 103 additions & 0 deletions

File tree

  • lib/node_modules/@stdlib/stats/strided/dpcorrwd/src

lib/node_modules/@stdlib/stats/strided/dpcorrwd/src/main.c

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,109 @@ double API_SUFFIX(stdlib_strided_dpcorrwd)( const CBLAS_INT N, const double *X,
4040
/**
4141
* Computes the sample Pearson product-moment correlation coefficient of two double-precision floating-point strided arrays using Welford's algorithm and alternative indexing semantics.
4242
*
43+
* ## Method
44+
*
45+
* - We begin by defining the co-moment \\(C_n\\)
46+
*
47+
* ```tex
48+
* C_n = \sum_{i=1}^{n} ( x_i - \bar{x}_n ) ( y_i - \bar{y}_n )
49+
* ```
50+
*
51+
* where \\(\bar{x}_n\\) and \\(\bar{y}_n\\) are the sample means for \\(x\\) and \\(y\\), respectively.
52+
*
53+
* - Based on Welford's method, we know the update formulas for the sample means are given by
54+
*
55+
* ```tex
56+
* \bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}
57+
* ```
58+
*
59+
* and
60+
*
61+
* ```tex
62+
* \bar{y}_n = \bar{y}_{n-1} + \frac{y_n - \bar{y}_{n-1}}{n}
63+
* ```
64+
*
65+
* - Substituting into the equation for \\(C_n\\) and rearranging terms
66+
*
67+
* ```tex
68+
* C_n = C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})
69+
* ```
70+
*
71+
* where the apparent asymmetry arises from
72+
*
73+
* ```tex
74+
* x_n - \bar{x}_n = \frac{n-1}{n} (x_n - \bar{x}_{n-1})
75+
* ```
76+
*
77+
* and, hence, the update term can be equivalently expressed
78+
*
79+
* ```tex
80+
* \frac{n-1}{n} (x_n - \bar{x}_{n-1}) (y_n - \bar{y}_{n-1})
81+
* ```
82+
*
83+
* - The covariance can be defined
84+
*
85+
* ```tex
86+
* \begin{align*}
87+
* \operatorname{cov}_n(x,y) &= \frac{C_n}{n} \\
88+
* &= \frac{C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n} \\
89+
* &= \frac{(n-1)\operatorname{cov}_{n-1}(x,y) + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n}
90+
* \end{align*}
91+
* ```
92+
*
93+
* - Applying Bessel's correction, we arrive at an update formula for calculating an unbiased sample covariance
94+
*
95+
* ```tex
96+
* \begin{align*}
97+
* \operatorname{cov}_n(x,y) &= \frac{n}{n-1}\cdot\frac{(n-1)\operatorname{cov}_{n-1}(x,y) + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n} \\
98+
* &= \operatorname{cov}_{n-1}(x,y) + \frac{(x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n-1} \\
99+
* &= \frac{C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n-1}
100+
* &= \frac{C_{n-1} + (x_n - \bar{x}_{n-1}) (y_n - \bar{y}_n)}{n-1}
101+
* \end{align*}
102+
* ```
103+
*
104+
* - To calculate the corrected sample standard deviation, we can use Welford's method, which can be derived as follows. We can express the variance as
105+
*
106+
* ```tex
107+
* \begin{align*}
108+
* S_n &= n \sigma_n^2 \\
109+
* &= \sum_{i=1}^{n} (x_i - \mu_n)^2 \\
110+
* &= \biggl(\sum_{i=1}^{n} x_i^2 \biggr) - n\mu_n^2
111+
* \end{align*}
112+
* ```
113+
*
114+
* Accordingly,
115+
*
116+
* ```tex
117+
* \begin{align*}
118+
* S_n - S_{n-1} &= \sum_{i=1}^{n} x_i^2 - n\mu_n^2 - \sum_{i=1}^{n-1} x_i^2 + (n-1)\mu_{n-1}^2 \\
119+
* &= x_n^2 - n\mu_n^2 + (n-1)\mu_{n-1}^2 \\
120+
* &= x_n^2 - \mu_{n-1}^2 + n(\mu_{n-1}^2 - \mu_n^2) \\
121+
* &= x_n^2 - \mu_{n-1}^2 + n(\mu_{n-1} - \mu_n)(\mu_{n-1} + \mu_n) \\
122+
* &= x_n^2 - \mu_{n-1}^2 + (\mu_{n-1} - x_n)(\mu_{n-1} + \mu_n) \\
123+
* &= x_n^2 - \mu_{n-1}^2 + \mu_{n-1}^2 - x_n\mu_n - x_n\mu_{n-1} + \mu_n\mu_{n-1} \\
124+
* &= x_n^2 - x_n\mu_n - x_n\mu_{n-1} + \mu_n\mu_{n-1} \\
125+
* &= (x_n - \mu_{n-1})(x_n - \mu_n) \\
126+
* &= S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n)
127+
* \end{align*}
128+
* ```
129+
*
130+
* where we use the identity
131+
*
132+
* ```tex
133+
* x_n - \mu_{n-1} = n (\mu_n - \mu_{n-1})
134+
* ```
135+
*
136+
* - To compute the corrected sample standard deviation, we apply Bessel's correction and take the square root.
137+
*
138+
* - The sample Pearson product-moment correlation coefficient can thus be calculated as
139+
*
140+
* ```tex
141+
* r = \frac{\operatorname{cov}_n(x,y)}{\sigma_x \sigma_y}
142+
* ```
143+
*
144+
* where \\(\sigma_x\\) and \\(\sigma_y\\) are the corrected sample standard deviations for \\(x\\) and \\(y\\), respectively.
145+
*
43146
* @param N number of indexed elements
44147
* @param X first input array
45148
* @param strideX stride length of `X`

0 commit comments

Comments
 (0)