Commit a7d77c3

Adding StudentT class for the Student's t-distribution
1 parent fd8db60 commit a7d77c3

9 files changed

Lines changed: 637 additions & 2 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # Changelog
 
 ## 1.3.1 - WIP
+- Adding `StudentT` class for the Student's t-distribution (pdf, cdf, invCdf) — building block for t-tests and confidence intervals with small samples
+- Adding `tTest()` method for one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown
 - Adding `zTest()` method for one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean (includes p-value calculation)
 - Adding `Alternative` enum (`TwoSided`, `Greater`, `Less`) for hypothesis testing
 - Adding `confidenceInterval()` method for computing confidence intervals for the mean using the normal (z) distribution

README.md

Lines changed: 59 additions & 0 deletions
@@ -101,6 +101,7 @@ The various mathematical statistics are listed below:
 | `rSquared()` | coefficient of determination (R²) — proportion of variance explained by linear regression |
 | `confidenceInterval()` | confidence interval for the mean using the normal (z) distribution |
 | `zTest()` | one-sample Z-test — tests whether the sample mean differs significantly from a hypothesized population mean |
+| `tTest()` | one-sample t-test — like z-test but appropriate for small samples where the population standard deviation is unknown |
 | `kde()` | kernel density estimation — returns a closure that estimates the probability density (or CDF) at any point |
 | `kdeRandom()` | random sampling from a kernel density estimate — returns a closure that generates random floats from the KDE distribution |
 
@@ -695,6 +696,27 @@ $result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
 // ['zScore' => 2.6458, 'pValue' => 0.0081]
 ```
 
+#### Stat::tTest( array $data, float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null )
+Perform a one-sample t-test for the mean. Tests whether the sample mean differs significantly from a hypothesized population mean using the Student's t-distribution. Unlike the z-test, the t-test is appropriate for small samples where the population standard deviation is unknown.
+
+Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.
+
+Requires at least 2 data points.
+
+```php
+use HiFolks\Statistics\Stat;
+use HiFolks\Statistics\Enums\Alternative;
+
+$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0);
+// ['tStatistic' => 2.6457..., 'pValue' => 0.0331..., 'degreesOfFreedom' => 7]
+
+$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, alternative: Alternative::Greater);
+// one-tailed test: is the sample mean greater than 3?
+
+$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
+// ['tStatistic' => 2.6458, 'pValue' => 0.0331, 'degreesOfFreedom' => 7]
+```
+
 #### Stat::kde ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , bool $cumulative = false )
 Create a continuous probability density function (or cumulative distribution function) from discrete sample data using Kernel Density Estimation.
 Returns a `Closure` that can be called with any point to estimate the density (or CDF value).
@@ -1217,7 +1239,44 @@ $tempCelsius->getSigmaRounded(1); // 2.5
 
 This class is inspired by Python’s `statistics.NormalDist` and aims to provide similar functionality for PHP users. (Work in Progress)
 
+## `StudentT` class
+
+The `StudentT` class represents the Student’s t-distribution, which is used for hypothesis testing and confidence intervals when the population standard deviation is unknown, especially with small sample sizes. As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.
+
+### Creating a StudentT instance
 
+```php
+use HiFolks\Statistics\StudentT;
+
+$t = new StudentT(df: 10); // 10 degrees of freedom
+```
+
+### Probability Density Function (PDF)
+
+```php
+$t = new StudentT(5);
+$t->pdf(0); // ≈ 0.37961 (peak of the distribution)
+$t->pdf(2.0); // density at t=2
+$t->pdfRounded(0); // 0.38
+```
+
+### Cumulative Distribution Function (CDF)
+
+```php
+$t = new StudentT(5);
+$t->cdf(0); // 0.5 (symmetric around zero)
+$t->cdf(2.0); // ≈ 0.94903
+$t->cdfRounded(2.0); // 0.949
+```
+
+### Inverse CDF (Quantile Function)
+
+```php
+$t = new StudentT(10);
+$t->invCdf(0.975); // ≈ 2.228 (critical value for 95% two-sided test)
+$t->invCdf(0.5); // 0.0 (median)
+$t->invCdfRounded(0.975, 3); // 2.228
+```
 
 ## StreamingStat (Experimental)

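The README examples in this diff quote `pdf(0)` ≈ 0.37961 for 5 degrees of freedom. As a cross-check, here is a minimal sketch of the textbook density formula in plain PHP. It is not the `StudentT` implementation from this commit, which is not shown here. The function names `gammaHalfInteger` and `studentTPdfSketch` are hypothetical, and the sketch assumes integer degrees of freedom so that the gamma function can be evaluated with a simple recurrence (PHP's core math functions do not include a gamma function).

```php
<?php

declare(strict_types=1);

/**
 * Gamma(k/2) for a positive integer k, built from Gamma(1/2) = sqrt(pi),
 * Gamma(1) = 1 and the recurrence Gamma(x + 1) = x * Gamma(x).
 * Hypothetical helper; overflows to INF for large k (roughly k > 340).
 */
function gammaHalfInteger(int $k): float
{
    if ($k < 1) {
        throw new InvalidArgumentException('k must be a positive integer.');
    }

    $isOdd = $k % 2 === 1;
    $x = $isOdd ? 0.5 : 1.0;             // starting argument
    $value = $isOdd ? sqrt(M_PI) : 1.0;  // Gamma at the starting argument

    while ($x < $k / 2) {
        $value *= $x; // Gamma(x + 1) = x * Gamma(x)
        $x += 1.0;
    }

    return $value;
}

/**
 * Student's t density with integer degrees of freedom $df:
 * Gamma((df + 1) / 2) / (sqrt(df * pi) * Gamma(df / 2)) * (1 + t^2 / df)^(-(df + 1) / 2)
 */
function studentTPdfSketch(float $t, int $df): float
{
    $normalizer = gammaHalfInteger($df + 1) / (sqrt($df * M_PI) * gammaHalfInteger($df));

    return $normalizer * (1 + $t * $t / $df) ** (-($df + 1) / 2);
}

printf("%.5f\n", studentTPdfSketch(0.0, 5)); // 0.37961
```

With 1 degree of freedom the same function returns 1/π at zero, the peak of the standard Cauchy density, which gives a second easy sanity check.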
TODO.md

Lines changed: 1 addition & 2 deletions
@@ -11,12 +11,11 @@
 
 ### Hypothesis Testing
 
-- T-test (one-sample, two-sample, paired)
+- T-test (two-sample, paired) — one-sample is done
 - Chi-squared test
 
 ### Other Distributions (beyond Normal)
 
-- Student's t-distribution
 - Chi-squared distribution
 - Binomial distribution
 - Poisson distribution

src/Stat.php

Lines changed: 47 additions & 0 deletions
@@ -6,6 +6,7 @@
 use HiFolks\Statistics\Enums\KdeKernel;
 use HiFolks\Statistics\Exception\InvalidDataInputException;
 use HiFolks\Statistics\NormalDist;
+use HiFolks\Statistics\StudentT;
 
 class Stat
 {
@@ -1823,4 +1824,50 @@ public static function zTest(
             'pValue' => Math::round($pValue, $round),
         ];
     }
+
+    /**
+     * Perform a one-sample t-test for the mean.
+     *
+     * Tests whether the sample mean differs significantly from a hypothesized
+     * population mean using the Student's t-distribution. Unlike the z-test,
+     * the t-test is appropriate for small samples where the population standard
+     * deviation is unknown.
+     *
+     * @param array<int|float> $data the sample data (at least 2 elements)
+     * @param float $populationMean the hypothesized population mean (H₀: μ = populationMean)
+     * @param Alternative $alternative the alternative hypothesis
+     * @param int|null $round optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
+     *
+     * @throws InvalidDataInputException if data has fewer than 2 elements
+     */
+    public static function tTest(
+        array $data,
+        float $populationMean,
+        Alternative $alternative = Alternative::TwoSided,
+        ?int $round = null,
+    ): array {
+        if (self::count($data) < 2) {
+            throw new InvalidDataInputException(
+                "T-test requires at least 2 data points.",
+            );
+        }
+
+        $df = self::count($data) - 1;
+        $tStatistic = (self::mean($data) - $populationMean) / self::sem($data);
+
+        $studentT = new StudentT($df);
+
+        $pValue = match ($alternative) {
+            Alternative::TwoSided => 2 * (1 - $studentT->cdf(abs($tStatistic))),
+            Alternative::Greater => 1 - $studentT->cdf($tStatistic),
+            Alternative::Less => $studentT->cdf($tStatistic),
+        };
+
+        return [
+            'tStatistic' => Math::round($tStatistic, $round),
+            'pValue' => Math::round($pValue, $round),
+            'degreesOfFreedom' => $df,
+        ];
+    }
 }

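To make the arithmetic in `tTest()` concrete, the following sketch recomputes the statistic in plain PHP, without the library. The helper names (`oneSampleTStatistic`, `cauchyCdf`, `pValueFromCdf`) are hypothetical and not part of the package. Since the `StudentT` CDF is not shown in this diff, the p-value step is illustrated with 1 degree of freedom, where the t-distribution is the standard Cauchy distribution with the closed-form CDF 1/2 + arctan(t)/π. The string labels stand in for the `Alternative` enum.

```php
<?php

declare(strict_types=1);

/**
 * t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation
 * (n - 1 in the denominator).
 *
 * @param list<int|float> $data
 * @return array{tStatistic: float, degreesOfFreedom: int}
 */
function oneSampleTStatistic(array $data, float $populationMean): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least 2 data points are required.');
    }

    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }

    $standardError = sqrt($sumSquares / ($n - 1)) / sqrt($n);
    if ($standardError == 0.0) {
        // Assumption of this sketch: reject constant data instead of dividing by zero.
        throw new InvalidArgumentException('Standard error is zero; t is undefined.');
    }

    return [
        'tStatistic' => ($mean - $populationMean) / $standardError,
        'degreesOfFreedom' => $n - 1,
    ];
}

/** CDF of Student's t with 1 degree of freedom (standard Cauchy). */
function cauchyCdf(float $t): float
{
    return 0.5 + atan($t) / M_PI;
}

/** Same three-way mapping as the match expression in tTest(), keyed by a string label. */
function pValueFromCdf(callable $cdf, float $t, string $alternative): float
{
    return match ($alternative) {
        'two-sided' => 2 * (1 - $cdf(abs($t))),
        'greater' => 1 - $cdf($t),
        'less' => $cdf($t),
        default => throw new InvalidArgumentException("Unknown alternative: $alternative"),
    };
}

$result = oneSampleTStatistic([2, 4, 4, 4, 5, 5, 7, 9], 3.0);
printf("t = %.4f, df = %d\n", $result['tStatistic'], $result['degreesOfFreedom']); // t = 2.6458, df = 7
printf("%.2f\n", pValueFromCdf('cauchyCdf', 1.0, 'two-sided')); // 0.50
```

The zero standard error guard is an addition of this sketch. Whether `Stat::sem()` or `tTest()` handles constant data is not visible in this diff.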
src/Statistics.php

Lines changed: 15 additions & 0 deletions
@@ -303,6 +303,21 @@ public function zTest(float $populationMean, Alternative $alternative = Alternat
         return Stat::zTest($this->numericalArray(), $populationMean, $alternative, $round);
     }
 
+    /**
+     * Perform a one-sample t-test for the mean.
+     *
+     * @param float $populationMean the hypothesized population mean
+     * @param Alternative $alternative the alternative hypothesis
+     * @param int|null $round optional decimal precision for rounding results
+     * @return array{tStatistic: float, pValue: float, degreesOfFreedom: int}
+     *
+     * @see Stat::tTest()
+     */
+    public function tTest(float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null): array
+    {
+        return Stat::tTest($this->numericalArray(), $populationMean, $alternative, $round);
+    }
+
     /**
      * Return the mean absolute deviation (MAD).
      *
