@@ -40,6 +40,109 @@ double API_SUFFIX(stdlib_strided_dpcorrwd)( const CBLAS_INT N, const double *X,
4040/**
4141* Computes the sample Pearson product-moment correlation coefficient of two double-precision floating-point strided arrays using Welford's algorithm and alternative indexing semantics.
4242*
43+ * ## Method
44+ *
45+ * - We begin by defining the co-moment \\(C_n\\)
46+ *
47+ * ```tex
48+ * C_n = \sum_{i=1}^{n} ( x_i - \bar{x}_n ) ( y_i - \bar{y}_n )
49+ * ```
50+ *
51+ * where \\(\bar{x}_n\\) and \\(\bar{y}_n\\) are the sample means for \\(x\\) and \\(y\\), respectively.
52+ *
53+ * - Based on Welford's method, we know the update formulas for the sample means are given by
54+ *
55+ * ```tex
56+ * \bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}
57+ * ```
58+ *
59+ * and
60+ *
61+ * ```tex
62+ * \bar{y}_n = \bar{y}_{n-1} + \frac{y_n - \bar{y}_{n-1}}{n}
63+ * ```
64+ *
65+ * - Substituting into the equation for \\(C_n\\) and rearranging terms
66+ *
67+ * ```tex
68+ * C_n = C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})
69+ * ```
70+ *
71+ * where the apparent asymmetry arises from
72+ *
73+ * ```tex
74+ * x_n - \bar{x}_n = \frac{n-1}{n} (x_n - \bar{x}_{n-1})
75+ * ```
76+ *
77+ * and, hence, the update term can be equivalently expressed
78+ *
79+ * ```tex
80+ * \frac{n-1}{n} (x_n - \bar{x}_{n-1}) (y_n - \bar{y}_{n-1})
81+ * ```
82+ *
83+ * - The covariance can be defined
84+ *
85+ * ```tex
86+ * \begin{align*}
87+ * \operatorname{cov}_n(x,y) &= \frac{C_n}{n} \\
88+ * &= \frac{C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n} \\
89+ * &= \frac{(n-1)\operatorname{cov}_{n-1}(x,y) + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n}
90+ * \end{align*}
91+ * ```
92+ *
93+ * - Applying Bessel's correction, we arrive at an update formula for calculating an unbiased sample covariance
94+ *
95+ * ```tex
96+ * \begin{align*}
97+ * \operatorname{cov}_n(x,y) &= \frac{n}{n-1}\cdot\frac{(n-1)\operatorname{cov}_{n-1}(x,y) + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n} \\
98+ * &= \operatorname{cov}_{n-1}(x,y) + \frac{(x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n-1} \\
99+ * &= \frac{C_{n-1} + (x_n - \bar{x}_n) (y_n - \bar{y}_{n-1})}{n-1}
100+ * &= \frac{C_{n-1} + (x_n - \bar{x}_{n-1}) (y_n - \bar{y}_n)}{n-1}
101+ * \end{align*}
102+ * ```
103+ *
104+ * - To calculate the corrected sample standard deviation, we can use Welford's method, which can be derived as follows. We can express the variance as
105+ *
106+ * ```tex
107+ * \begin{align*}
108+ * S_n &= n \sigma_n^2 \\
109+ * &= \sum_{i=1}^{n} (x_i - \mu_n)^2 \\
110+ * &= \biggl(\sum_{i=1}^{n} x_i^2 \biggr) - n\mu_n^2
111+ * \end{align*}
112+ * ```
113+ *
114+ * Accordingly,
115+ *
116+ * ```tex
117+ * \begin{align*}
118+ * S_n - S_{n-1} &= \sum_{i=1}^{n} x_i^2 - n\mu_n^2 - \sum_{i=1}^{n-1} x_i^2 + (n-1)\mu_{n-1}^2 \\
119+ * &= x_n^2 - n\mu_n^2 + (n-1)\mu_{n-1}^2 \\
120+ * &= x_n^2 - \mu_{n-1}^2 + n(\mu_{n-1}^2 - \mu_n^2) \\
121+ * &= x_n^2 - \mu_{n-1}^2 + n(\mu_{n-1} - \mu_n)(\mu_{n-1} + \mu_n) \\
122+ * &= x_n^2 - \mu_{n-1}^2 + (\mu_{n-1} - x_n)(\mu_{n-1} + \mu_n) \\
123+ * &= x_n^2 - \mu_{n-1}^2 + \mu_{n-1}^2 - x_n\mu_n - x_n\mu_{n-1} + \mu_n\mu_{n-1} \\
124+ * &= x_n^2 - x_n\mu_n - x_n\mu_{n-1} + \mu_n\mu_{n-1} \\
125+ * &= (x_n - \mu_{n-1})(x_n - \mu_n) \\
126+ * &= S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n)
127+ * \end{align*}
128+ * ```
129+ *
130+ * where we use the identity
131+ *
132+ * ```tex
133+ * x_n - \mu_{n-1} = n (\mu_n - \mu_{n-1})
134+ * ```
135+ *
136+ * - To compute the corrected sample standard deviation, we apply Bessel's correction and take the square root.
137+ *
138+ * - The sample Pearson product-moment correlation coefficient can thus be calculated as
139+ *
140+ * ```tex
141+ * r = \frac{\operatorname{cov}_n(x,y)}{\sigma_x \sigma_y}
142+ * ```
143+ *
144+ * where \\(\sigma_x\\) and \\(\sigma_y\\) are the corrected sample standard deviations for \\(x\\) and \\(y\\), respectively.
145+ *
43146* @param N number of indexed elements
44147* @param X first input array
45148* @param strideX stride length of `X`
0 commit comments