Adding tTestTwoSample() and tTestPaired() methods
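
For reference, the quantities computed by the two new methods can be written out as below. This is a sketch that assumes `self::variance()` returns the sample (n - 1) variance, written here as $s^2$; the symbols $\bar{x}$, $s$, $n$, $\nu$ and $d_i$ are notation introduced for this note and do not appear in the code.

```latex
% Welch's two-sample t-test (tTestTwoSample)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}

% Paired t-test (tTestPaired): one-sample t-test on d_i = x_{1,i} - x_{2,i} with \mu_0 = 0
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\nu = n - 1
```

The p-value is then read from the Student's t-distribution with $\nu$ degrees of freedom, two-sided or one-sided depending on the chosen `Alternative`.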

roberto-butti · roberto-butti · commit 77807d8a35f9 · 2026-02-23T22:37:35.000+01:00
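
As a rough cross-check of the Welch computation added in this commit, the t statistic and the Welch-Satterthwaite degrees of freedom can be reproduced in plain PHP without the library. The sketch below is illustrative only: `welchSketch()` is a hypothetical helper, not part of `HiFolks\Statistics`. It assumes sample (n - 1) variances, skips the input validation that `tTestTwoSample()` performs, and omits the p-value because that needs the `StudentT` CDF.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): Welch's t statistic and
 * Welch-Satterthwaite degrees of freedom, using sample (n - 1) variances.
 * No validation: assumes each sample has at least 2 values and that the
 * two variances are not both zero.
 *
 * @param  list<int|float>  $a
 * @param  list<int|float>  $b
 * @return array{t: float, df: float}
 */
function welchSketch(array $a, array $b): array
{
    $n1 = count($a);
    $n2 = count($b);
    $mean1 = array_sum($a) / $n1;
    $mean2 = array_sum($b) / $n2;

    // Sample variance: sum of squared deviations from the mean, divided by n - 1.
    $sampleVariance = static fn(array $xs, float $mean, int $n): float
        => array_sum(array_map(static fn($x) => ($x - $mean) ** 2, $xs)) / ($n - 1);

    // Per-sample contribution to the squared standard error.
    $v1 = $sampleVariance($a, $mean1, $n1) / $n1;
    $v2 = $sampleVariance($b, $mean2, $n2) / $n2;

    return [
        't' => ($mean1 - $mean2) / sqrt($v1 + $v2),
        'df' => ($v1 + $v2) ** 2 / ($v1 ** 2 / ($n1 - 1) + $v2 ** 2 / ($n2 - 1)),
    ];
}

// Same two groups as the README example for tTestTwoSample().
$r = welchSketch(
    [30.02, 29.99, 30.11, 29.97, 30.01, 29.99],
    [29.89, 29.93, 29.72, 29.98, 30.02, 29.98],
);
printf("t = %.2f, df = %.2f\n", $r['t'], $r['df']); // t = 1.96, df = 7.03
```

Comparing these two numbers against the `tStatistic` and `degreesOfFreedom` returned by `Stat::tTestTwoSample()` for the same inputs is a quick way to confirm the documented example output.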
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,8 @@
 # Changelog
 
 ## 1.3.1 - WIP
+- Adding `tTestTwoSample()` method for two-sample independent t-test (Welch's t-test) — compares the means of two independent groups without assuming equal variances
+- Adding `tTestPaired()` method for paired t-test — tests whether the mean difference between paired observations (e.g. before/after) is significantly different from zero
 - Adding `StudentT` class for the Student's t-distribution (pdf, cdf, invCdf) — building block for t-tests and confidence intervals with small samples
 - Adding `tTest()` method for one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown
 - Adding `zTest()` method for one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean (includes p-value calculation)
diff --git a/README.md b/README.md
@@ -102,6 +102,8 @@ The various mathematical statistics are listed below:
 | `confidenceInterval()` | confidence interval for the mean using the normal (z) distribution |
 | `zTest()` | one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean |
 | `tTest()` | one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown |
+| `tTestTwoSample()` | two-sample independent t-test (Welch's) — compares the means of two independent groups without assuming equal variances |
+| `tTestPaired()` | paired t-test — tests whether the mean difference between paired observations is significantly different from zero |
 | `kde()` | kernel density estimation — returns a closure that estimates the probability density (or CDF) at any point |
 | `kdeRandom()` | random sampling from a kernel density estimate — returns a closure that generates random floats from the KDE distribution |
 
@@ -717,6 +719,53 @@ $result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
 // ['tStatistic' => 2.6458, 'pValue' => 0.0331, 'degreesOfFreedom' => 7]
 ```
 
+#### Stat::tTestTwoSample( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
+Perform a two-sample independent t-test (Welch's t-test). Compares the means of two independent groups without assuming equal variances. Uses the Welch–Satterthwaite approximation for degrees of freedom.
+
+Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
+
+Requires at least 2 data points in each sample.
+
+```php
+use HiFolks\Statistics\Stat;
+use HiFolks\Statistics\Enums\Alternative;
+
+// Compare two groups
+$group1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99];
+$group2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98];
+$result = Stat::tTestTwoSample($group1, $group2);
+// ['tStatistic' => 1.959..., 'pValue' => 0.0907..., 'degreesOfFreedom' => 7.03...]
+
+// One-tailed test: is group1 mean greater than group2 mean?
+$result = Stat::tTestTwoSample($group1, $group2, alternative: Alternative::Greater);
+
+// Groups can have different sizes
+$result = Stat::tTestTwoSample([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 5], round: 4);
+```
+
+#### Stat::tTestPaired( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
+Perform a paired t-test. Tests whether the mean difference between paired observations (e.g. before/after measurements on the same subjects) is significantly different from zero.
+
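+Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. Both arrays must have the same length. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`. Differences are computed as `$data1[i] - $data2[i]`, so `Greater` tests whether the mean of `$data1` exceeds the mean of `$data2`.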
+
+Requires at least 2 paired observations.
+
+```php
+use HiFolks\Statistics\Stat;
+use HiFolks\Statistics\Enums\Alternative;
+
+// Before and after treatment measurements
+$before = [200, 190, 210, 220, 215, 205, 195, 225];
+$after  = [192, 186, 198, 212, 208, 198, 188, 215];
+$result = Stat::tTestPaired($before, $after);
+// ['tStatistic' => 9.45..., 'pValue' => 0.00003..., 'degreesOfFreedom' => 7]
+
+// One-tailed: did the treatment decrease the values? (Greater: mean of before - after > 0)
+$result = Stat::tTestPaired($before, $after, alternative: Alternative::Greater);
+
+$result = Stat::tTestPaired($before, $after, round: 4);
+```
+
 #### Stat::kde ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , bool $cumulative = false )
 Create a continuous probability density function (or cumulative distribution function) from discrete sample data using Kernel Density Estimation.
 Returns a `Closure` that can be called with any point to estimate the density (or CDF value).
diff --git a/TODO.md b/TODO.md
@@ -11,7 +11,7 @@
 
 ### Hypothesis Testing
 
-- T-test (two-sample, paired) — one-sample is done
+- ~~T-test (two-sample, paired) — one-sample is done~~ DONE: `tTestTwoSample()` (Welch's) and `tTestPaired()`
 - Chi-squared test
 
 ### Other Distributions (beyond Normal)
diff --git a/src/Stat.php b/src/Stat.php
@@ -1874,4 +1874,115 @@ public static function tTest(
             'degreesOfFreedom' => $df,
         ];
     }
+
+    /**
+     * Perform a two-sample independent t-test (Welch's t-test).
+     *
+     * Tests whether two independent samples have different means.
+     * Uses Welch's approximation for degrees of freedom, which does not
+     * assume equal variances.
+     *
+     * @param  array<int|float>  $data1  the first sample (at least 2 elements)
+     * @param  array<int|float>  $data2  the second sample (at least 2 elements)
+     * @param  Alternative  $alternative  the alternative hypothesis
+     * @param  int|null  $round  optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: float}
+     *
+     * @throws InvalidDataInputException if either sample has fewer than 2 elements, or if both samples have zero variance
+     */
+    public static function tTestTwoSample(
+        array $data1,
+        array $data2,
+        Alternative $alternative = Alternative::TwoSided,
+        ?int $round = null,
+    ): array {
+        $n1 = self::count($data1);
+        $n2 = self::count($data2);
+
+        if ($n1 < 2 || $n2 < 2) {
+            throw new InvalidDataInputException(
+                "Two-sample t-test requires at least 2 data points in each sample.",
+            );
+        }
+
+        $mean1 = self::mean($data1);
+        $mean2 = self::mean($data2);
+        $var1 = self::variance($data1);
+        $var2 = self::variance($data2);
+
+        $se = sqrt($var1 / $n1 + $var2 / $n2);
+
+        if ($se === 0.0) {
+            throw new InvalidDataInputException(
+                "Two-sample t-test requires non-zero variance in at least one sample.",
+            );
+        }
+
+        $tStatistic = ($mean1 - $mean2) / $se;
+
+        // Welch–Satterthwaite degrees of freedom
+        $v1 = $var1 / $n1;
+        $v2 = $var2 / $n2;
+        $df = (($v1 + $v2) ** 2) / (($v1 ** 2) / ($n1 - 1) + ($v2 ** 2) / ($n2 - 1));
+
+        $studentT = new StudentT($df);
+
+        $pValue = match ($alternative) {
+            Alternative::TwoSided => 2 * (1 - $studentT->cdf(abs($tStatistic))),
+            Alternative::Greater => 1 - $studentT->cdf($tStatistic),
+            Alternative::Less => $studentT->cdf($tStatistic),
+        };
+
+        return [
+            'tStatistic' => Math::round($tStatistic, $round),
+            'pValue' => Math::round($pValue, $round),
+            'degreesOfFreedom' => Math::round($df, $round),
+        ];
+    }
+
+    /**
+     * Perform a paired t-test.
+     *
+     * Tests whether the mean difference between paired observations is
+     * significantly different from zero. This is equivalent to a one-sample
+     * t-test on the differences.
+     *
+     * @param  array<int|float>  $data1  the first set of observations
+     * @param  array<int|float>  $data2  the second set of observations (same length as $data1)
+     * @param  Alternative  $alternative  the alternative hypothesis
+     * @param  int|null  $round  optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
+     *
+     * @throws InvalidDataInputException if arrays have different lengths or fewer than 2 elements
+     */
+    public static function tTestPaired(
+        array $data1,
+        array $data2,
+        Alternative $alternative = Alternative::TwoSided,
+        ?int $round = null,
+    ): array {
+        $n1 = self::count($data1);
+        $n2 = self::count($data2);
+
+        if ($n1 !== $n2) {
+            throw new InvalidDataInputException(
+                "Paired t-test requires both samples to have the same number of observations.",
+            );
+        }
+
+        if ($n1 < 2) {
+            throw new InvalidDataInputException(
+                "Paired t-test requires at least 2 data points.",
+            );
+        }
+
+        // Compute element-wise differences (assumes both arrays are 0-indexed lists)
+        $differences = [];
+        for ($i = 0; $i < $n1; $i++) {
+            $differences[] = $data1[$i] - $data2[$i];
+        }
+
+        // Paired t-test is a one-sample t-test on the differences with μ₀ = 0
+        return self::tTest($differences, 0.0, $alternative, $round);
+    }
 }
diff --git a/src/Statistics.php b/src/Statistics.php
@@ -318,6 +318,36 @@ public function tTest(float $populationMean, Alternative $alternative = Alternat
         return Stat::tTest($this->numericalArray(), $populationMean, $alternative, $round);
     }
 
+    /**
+     * Perform a two-sample independent t-test (Welch's t-test).
+     *
+     * @param  array<int|float>  $data2  the second sample
+     * @param  Alternative  $alternative  the alternative hypothesis
+     * @param  int|null  $round  optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: float}
+     *
+     * @see Stat::tTestTwoSample()
+     */
+    public function tTestTwoSample(array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null): array
+    {
+        return Stat::tTestTwoSample($this->numericalArray(), $data2, $alternative, $round);
+    }
+
+    /**
+     * Perform a paired t-test.
+     *
+     * @param  array<int|float>  $data2  the second set of observations (same length)
+     * @param  Alternative  $alternative  the alternative hypothesis
+     * @param  int|null  $round  optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
+     *
+     * @see Stat::tTestPaired()
+     */
+    public function tTestPaired(array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null): array
+    {
+        return Stat::tTestPaired($this->numericalArray(), $data2, $alternative, $round);
+    }
+
     /**
      * Return the mean absolute deviation (MAD).
      *
diff --git a/tests/StatTest.php b/tests/StatTest.php