
Commit 77807d8

Adding tTestTwoSample() and tTestPaired() methods
1 parent b0e66a5 commit 77807d8

6 files changed

Lines changed: 377 additions & 1 deletion


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
# Changelog
22

33
## 1.3.1 - WIP
4+
- Adding `tTestTwoSample()` method for two-sample independent t-test (Welch's t-test) — compares the means of two independent groups without assuming equal variances
5+
- Adding `tTestPaired()` method for paired t-test — tests whether the mean difference between paired observations (e.g. before/after) is significantly different from zero
46
- Adding `StudentT` class for the Student's t-distribution (pdf, cdf, invCdf) — building block for t-tests and confidence intervals with small samples
57
- Adding `tTest()` method for one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown
68
- Adding `zTest()` method for one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean (includes p-value calculation)

README.md

Lines changed: 49 additions & 0 deletions
@@ -102,6 +102,8 @@ The various mathematical statistics are listed below:
102102
| `confidenceInterval()` | confidence interval for the mean using the normal (z) distribution |
103103
| `zTest()` | one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean |
104104
| `tTest()` | one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown |
105+
| `tTestTwoSample()` | two-sample independent t-test (Welch's) — compares the means of two independent groups without assuming equal variances |
106+
| `tTestPaired()` | paired t-test — tests whether the mean difference between paired observations is significantly different from zero |
105107
| `kde()` | kernel density estimation — returns a closure that estimates the probability density (or CDF) at any point |
106108
| `kdeRandom()` | random sampling from a kernel density estimate — returns a closure that generates random floats from the KDE distribution |
107109

@@ -717,6 +719,53 @@ $result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
717719
// ['tStatistic' => 2.6458, 'pValue' => 0.0331, 'degreesOfFreedom' => 7]
718720
```
719721

722+
#### Stat::tTestTwoSample( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
723+
Perform a two-sample independent t-test (Welch's t-test). Compares the means of two independent groups without assuming equal variances. Uses the Welch–Satterthwaite approximation for degrees of freedom.
724+
725+
Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
726+
727+
Requires at least 2 data points in each sample.
728+
729+
```php
730+
use HiFolks\Statistics\Stat;
731+
use HiFolks\Statistics\Enums\Alternative;
732+
733+
// Compare two groups
734+
$group1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99];
735+
$group2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98];
736+
$result = Stat::tTestTwoSample($group1, $group2);
737+
// ['tStatistic' => 1.959..., 'pValue' => 0.0907..., 'degreesOfFreedom' => 7.03...]
738+
739+
// One-tailed test: is group1 mean greater than group2 mean?
740+
$result = Stat::tTestTwoSample($group1, $group2, alternative: Alternative::Greater);
741+
742+
// Groups can have different sizes
743+
$result = Stat::tTestTwoSample([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 5], round: 4);
744+
```
745+
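The statistic and degrees of freedom described above can be reproduced by hand. The following is a minimal standalone sketch, not the library's implementation: `welchSketch()` is a hypothetical helper name, it assumes sample variances (divisor n - 1), and it leaves out the p-value because that needs the Student's t CDF.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): Welch's t statistic and
 * Welch–Satterthwaite degrees of freedom, using sample variances (n - 1).
 *
 * @param array<int|float> $a
 * @param array<int|float> $b
 * @return array{t: float, df: float}
 */
function welchSketch(array $a, array $b): array
{
    $n1 = count($a);
    $n2 = count($b);
    if ($n1 < 2 || $n2 < 2) {
        throw new InvalidArgumentException('Each sample needs at least 2 values.');
    }

    $mean = static fn(array $x): float => array_sum($x) / count($x);

    // Unbiased sample variance: sum of squared deviations over (n - 1).
    $sampleVariance = static function (array $x) use ($mean): float {
        $m = $mean($x);
        $sumSquares = 0.0;
        foreach ($x as $value) {
            $sumSquares += ($value - $m) ** 2;
        }
        return $sumSquares / (count($x) - 1);
    };

    // Per-sample contribution to the squared standard error.
    $v1 = $sampleVariance($a) / $n1;
    $v2 = $sampleVariance($b) / $n2;
    if ($v1 + $v2 <= 0.0) {
        throw new InvalidArgumentException('At least one sample needs non-zero variance.');
    }

    $t = ($mean($a) - $mean($b)) / sqrt($v1 + $v2);
    $df = (($v1 + $v2) ** 2) / ($v1 ** 2 / ($n1 - 1) + $v2 ** 2 / ($n2 - 1));

    return ['t' => $t, 'df' => $df];
}

$group1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99];
$group2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98];
$r = welchSketch($group1, $group2);
printf("t = %.3f, df = %.2f\n", $r['t'], $r['df']); // t = 1.959, df = 7.03
```

Note that the degrees of freedom are generally not an integer, which is why the method documents `degreesOfFreedom` as a float.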
746+
#### Stat::tTestPaired( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
747+
Perform a paired t-test. Tests whether the mean difference between paired observations (e.g. before/after measurements on the same subjects) is significantly different from zero.
748+
749+
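Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. Both arrays must have the same length. Differences are computed as `$data1` minus `$data2`, element by element, so `Greater` tests whether the `$data1` values tend to exceed their `$data2` counterparts and `Less` tests the opposite.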
750+
751+
Requires at least 2 paired observations.
752+
753+
```php
754+
use HiFolks\Statistics\Stat;
755+
use HiFolks\Statistics\Enums\Alternative;
756+
757+
// Before and after treatment measurements
758+
$before = [200, 190, 210, 220, 215, 205, 195, 225];
759+
$after = [192, 186, 198, 212, 208, 198, 188, 215];
760+
$result = Stat::tTestPaired($before, $after);
761+
// ['tStatistic' => 9.451..., 'pValue' => 0.00003..., 'degreesOfFreedom' => 7]
762+
763+
// One-tailed: did the treatment decrease the values?
764+
$result = Stat::tTestPaired($before, $after, alternative: Alternative::Greater);
765+
766+
$result = Stat::tTestPaired($before, $after, round: 4);
767+
```
768+
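Because the paired test is a one-sample t-test on the element-wise differences, its statistic is easy to check without the library. The sketch below is an illustration under stated assumptions: `pairedTSketch()` is a hypothetical name, the sample standard deviation uses n - 1, and the p-value is omitted because it requires the Student's t CDF.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): paired t statistic computed
 * as a one-sample t statistic on the differences against a mean of zero.
 *
 * @param array<int|float> $first
 * @param array<int|float> $second
 * @return array{t: float, df: int}
 */
function pairedTSketch(array $first, array $second): array
{
    // Re-index so the pairing is positional regardless of the array keys.
    $first = array_values($first);
    $second = array_values($second);

    $n = count($first);
    if ($n !== count($second)) {
        throw new InvalidArgumentException('Both samples need the same length.');
    }
    if ($n < 2) {
        throw new InvalidArgumentException('At least 2 pairs are required.');
    }

    $diffs = array_map(static fn($x, $y) => $x - $y, $first, $second);
    $meanDiff = array_sum($diffs) / $n;

    $sumSquares = 0.0;
    foreach ($diffs as $d) {
        $sumSquares += ($d - $meanDiff) ** 2;
    }
    $sd = sqrt($sumSquares / ($n - 1));
    if ($sd <= 0.0) {
        throw new InvalidArgumentException('Differences need non-zero variance.');
    }

    return ['t' => $meanDiff / ($sd / sqrt($n)), 'df' => $n - 1];
}

$before = [200, 190, 210, 220, 215, 205, 195, 225];
$after = [192, 186, 198, 212, 208, 198, 188, 215];
$r = pairedTSketch($before, $after);
printf("t = %.2f, df = %d\n", $r['t'], $r['df']); // t = 9.45, df = 7
```

Swapping the argument order flips the sign of the statistic, which is what determines whether `Greater` or `Less` is the appropriate one-tailed alternative.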
720769
#### Stat::kde ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , bool $cumulative = false )
721770
Create a continuous probability density function (or cumulative distribution function) from discrete sample data using Kernel Density Estimation.
722771
Returns a `Closure` that can be called with any point to estimate the density (or CDF value).

TODO.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
1111

1212
### Hypothesis Testing
1313

14-
- T-test (two-sample, paired) — one-sample is done
14+
- ~~T-test (two-sample, paired) — one-sample is done~~ DONE: `tTestTwoSample()` (Welch's) and `tTestPaired()`
1515
- Chi-squared test
1616

1717
### Other Distributions (beyond Normal)

src/Stat.php

Lines changed: 111 additions & 0 deletions
@@ -1874,4 +1874,115 @@ public static function tTest(
18741874
'degreesOfFreedom' => $df,
18751875
];
18761876
}
1877+
1878+
/**
1879+
* Perform a two-sample independent t-test (Welch's t-test).
1880+
*
1881+
* Tests whether two independent samples have different means.
1882+
* Uses Welch's approximation for degrees of freedom, which does not
1883+
* assume equal variances.
1884+
*
1885+
* @param array<int|float> $data1 the first sample (at least 2 elements)
1886+
* @param array<int|float> $data2 the second sample (at least 2 elements)
1887+
* @param Alternative $alternative the alternative hypothesis
1888+
* @param int|null $round optional decimal precision for rounding results
1889+
* @return array{tStatistic: float, pValue: float, degreesOfFreedom: float}
1890+
*
1891+
* @throws InvalidDataInputException if either sample has fewer than 2 elements
1892+
*/
1893+
public static function tTestTwoSample(
1894+
array $data1,
1895+
array $data2,
1896+
Alternative $alternative = Alternative::TwoSided,
1897+
?int $round = null,
1898+
): array {
1899+
$n1 = self::count($data1);
1900+
$n2 = self::count($data2);
1901+
1902+
if ($n1 < 2 || $n2 < 2) {
1903+
throw new InvalidDataInputException(
1904+
"Two-sample t-test requires at least 2 data points in each sample.",
1905+
);
1906+
}
1907+
1908+
$mean1 = self::mean($data1);
1909+
$mean2 = self::mean($data2);
1910+
$var1 = self::variance($data1);
1911+
$var2 = self::variance($data2);
1912+
1913+
$se = sqrt($var1 / $n1 + $var2 / $n2);
1914+
1915+
if ($se === 0.0) {
1916+
throw new InvalidDataInputException(
1917+
"Two-sample t-test requires non-zero variance in at least one sample.",
1918+
);
1919+
}
1920+
1921+
$tStatistic = ($mean1 - $mean2) / $se;
1922+
1923+
// Welch–Satterthwaite degrees of freedom
1924+
$v1 = $var1 / $n1;
1925+
$v2 = $var2 / $n2;
1926+
$df = (($v1 + $v2) ** 2) / (($v1 ** 2) / ($n1 - 1) + ($v2 ** 2) / ($n2 - 1));
1927+
1928+
$studentT = new StudentT($df);
1929+
1930+
$pValue = match ($alternative) {
1931+
Alternative::TwoSided => 2 * (1 - $studentT->cdf(abs($tStatistic))),
1932+
Alternative::Greater => 1 - $studentT->cdf($tStatistic),
1933+
Alternative::Less => $studentT->cdf($tStatistic),
1934+
};
1935+
1936+
return [
1937+
'tStatistic' => Math::round($tStatistic, $round),
1938+
'pValue' => Math::round($pValue, $round),
1939+
'degreesOfFreedom' => Math::round($df, $round),
1940+
];
1941+
}
1942+
1943+
/**
1944+
* Perform a paired t-test.
1945+
*
1946+
* Tests whether the mean difference between paired observations is
1947+
* significantly different from zero. This is equivalent to a one-sample
1948+
* t-test on the differences.
1949+
*
1950+
* @param array<int|float> $data1 the first set of observations
1951+
* @param array<int|float> $data2 the second set of observations (same length as $data1)
1952+
* @param Alternative $alternative the alternative hypothesis
1953+
* @param int|null $round optional decimal precision for rounding results
1954+
* @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
1955+
*
1956+
* @throws InvalidDataInputException if arrays have different lengths or fewer than 2 elements
1957+
*/
1958+
public static function tTestPaired(
1959+
array $data1,
1960+
array $data2,
1961+
Alternative $alternative = Alternative::TwoSided,
1962+
?int $round = null,
1963+
): array {
1964+
$n1 = self::count($data1);
1965+
$n2 = self::count($data2);
1966+
1967+
if ($n1 !== $n2) {
1968+
throw new InvalidDataInputException(
1969+
"Paired t-test requires both samples to have the same number of observations.",
1970+
);
1971+
}
1972+
1973+
if ($n1 < 2) {
1974+
throw new InvalidDataInputException(
1975+
"Paired t-test requires at least 2 data points.",
1976+
);
1977+
}
1978+
1979+
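// Compute differences; re-index first so arrays with non-sequential keys pair up by position
1980+
$differences = [];
1981+
foreach (array_map(null, array_values($data1), array_values($data2)) as [$x, $y]) {
1982+
$differences[] = $x - $y;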
1983+
}
1984+
1985+
// Paired t-test is a one-sample t-test on the differences with μ₀ = 0
1986+
return self::tTest($differences, 0.0, $alternative, $round);
1987+
}
18771988
}
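
For reference, the quantities computed by `tTestTwoSample()` above can be written out as follows. The notation is introduced here for illustration: $\bar{x}_i$, $s_i^2$ and $n_i$ are the mean, variance and size of sample $i$ (assuming `variance()` is the sample variance), $v_i$ matches `$v1` and `$v2` in the code, and $F_\nu$ denotes the Student's t CDF with $\nu$ degrees of freedom.

```latex
\begin{align*}
v_i &= \frac{s_i^2}{n_i}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{v_1 + v_2}}, \qquad
\nu = \frac{(v_1 + v_2)^2}{\dfrac{v_1^2}{n_1 - 1} + \dfrac{v_2^2}{n_2 - 1}} \\[6pt]
p &=
\begin{cases}
2\,\bigl(1 - F_\nu(|t|)\bigr) & \text{TwoSided} \\
1 - F_\nu(t) & \text{Greater} \\
F_\nu(t) & \text{Less}
\end{cases}
\end{align*}
```

The paired test reuses the one-sample case: it applies `tTest()` to the differences with a hypothesized mean of zero, so its degrees of freedom are $n - 1$ for $n$ pairs.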

src/Statistics.php

Lines changed: 30 additions & 0 deletions
@@ -318,6 +318,36 @@ public function tTest(float $populationMean, Alternative $alternative = Alternat
318318
return Stat::tTest($this->numericalArray(), $populationMean, $alternative, $round);
319319
}
320320

321+
/**
322+
* Perform a two-sample independent t-test (Welch's t-test).
323+
*
324+
* @param array<int|float> $data2 the second sample
325+
* @param Alternative $alternative the alternative hypothesis
326+
* @param int|null $round optional decimal precision for rounding results
327+
* @return array{tStatistic: float, pValue: float, degreesOfFreedom: float}
328+
*
329+
* @see Stat::tTestTwoSample()
330+
*/
331+
public function tTestTwoSample(array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null): array
332+
{
333+
return Stat::tTestTwoSample($this->numericalArray(), $data2, $alternative, $round);
334+
}
335+
336+
/**
337+
* Perform a paired t-test.
338+
*
339+
* @param array<int|float> $data2 the second set of observations (same length)
340+
* @param Alternative $alternative the alternative hypothesis
341+
* @param int|null $round optional decimal precision for rounding results
342+
* @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
343+
*
344+
* @see Stat::tTestPaired()
345+
*/
346+
public function tTestPaired(array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null): array
347+
{
348+
return Stat::tTestPaired($this->numericalArray(), $data2, $alternative, $round);
349+
}
350+
321351
/**
322352
* Return the mean absolute deviation (MAD).
323353
*
