Commit 620810e

Adding percentile() and coefficientOfVariation()
1 parent 1e3a9c8 commit 620810e

6 files changed

Lines changed: 273 additions & 0 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

 ## 1.3.0 - WIP
 - Adding `StreamingStat` class (experimental) for streaming/online computation of mean, variance, stdev, skewness, kurtosis, sum, min, and max with O(1) memory
+- Adding `percentile()` method for computing the value at any percentile (0–100) with linear interpolation
+- Adding `coefficientOfVariation()` method for relative dispersion (CV%), supporting both sample and population modes

 ## 1.2.5 - 2026-02-22
 - Adding `kurtosis()` method for excess kurtosis

README.md

Lines changed: 35 additions & 0 deletions
@@ -73,13 +73,15 @@ The various mathematical statistics are listed below:
 | `quantiles()` | cut points dividing the range of a probability distribution into continuous intervals with equal probabilities (supports `exclusive` and `inclusive` methods) |
 | `thirdQuartile()` | 3rd quartile, is the value at which 75 percent of the data is below it |
 | `firstQuartile()` | first quartile, is the value at which 25 percent of the data is below it |
+| `percentile()` | value at any percentile (0–100) with linear interpolation |
 | `pstdev()` | Population standard deviation |
 | `stdev()` | Sample standard deviation |
 | `pvariance()` | variance for a population (supports pre-computed mean via `mu`) |
 | `variance()` | variance for a sample (supports pre-computed mean via `xbar`) |
 | `skewness()` | adjusted Fisher-Pearson sample skewness |
 | `pskewness()` | population (biased) skewness |
 | `kurtosis()` | excess kurtosis (sample formula, 0 for normal distribution) |
+| `coefficientOfVariation()` | coefficient of variation (CV%), relative dispersion as percentage |
 | `geometricMean()` | geometric mean |
 | `harmonicMean()` | harmonic mean |
 | `correlation()` | Pearson’s or Spearman’s rank correlation coefficient for two inputs |
@@ -265,6 +267,20 @@ $percentile = Stat::thirdQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
 // 92.0
 ```

+#### Stat::percentile( array $data, float $p, ?int $round = null )
+Return the value at the given percentile of the data, using linear interpolation between adjacent data points (exclusive method, consistent with `quantiles()`).
+
+The percentile `$p` must be between 0 and 100. Requires at least 2 data points.
+
+```php
+use HiFolks\Statistics\Stat;
+$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 50);
+// 55.0 (median)
+
+$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 90);
+// 99.0
+```
+
 #### Stat::pstdev( array $data )
 Return the **Population** Standard Deviation, a measure of the amount of variation or dispersion of a set of values.
 A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
@@ -359,6 +375,25 @@ $kurtosis = Stat::kurtosis([1, 2, 2, 2, 2, 2, 2, 2, 2, 50]);
 // positive (leptokurtic, heavier tails due to outlier)
 ```

+#### Stat::coefficientOfVariation( array $data, ?int $round = null, bool $population = false )
+The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage. It measures relative variability and is useful for comparing dispersion across datasets with different units or scales.
+
+By default it uses the sample standard deviation. Pass `population: true` to use the population standard deviation instead.
+
+Requires at least 2 data points (sample) or 1 (population). Throws if the mean is zero.
+
+```php
+use HiFolks\Statistics\Stat;
+$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50]);
+// ~52.70 (sample)
+
+$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], round: 2);
+// 52.7
+
+$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], population: true);
+// ~47.14 (population)
+```
+
 #### Stat::covariance ( array $x , array $y )
 Covariance, static method, returns the sample covariance of two inputs *$x* and *$y*.
 Covariance is a measure of the joint variability of two inputs.

src/Stat.php

Lines changed: 77 additions & 0 deletions
@@ -412,6 +412,52 @@ public static function thirdQuartile(array $data): mixed
         return $quartiles[2];
     }

+    /**
+     * Return the value at the given percentile of the data.
+     *
+     * Uses linear interpolation between adjacent data points,
+     * consistent with the exclusive quantile method.
+     *
+     * @param array<int|float> $data
+     * @param float $p percentile in range 0..100
+     * @param int|null $round number of decimal places to round to, or null for no rounding
+     * @return float the interpolated value at the given percentile
+     *
+     * @throws InvalidDataInputException if the data has fewer than 2 elements or p is out of range
+     */
+    public static function percentile(array $data, float $p, ?int $round = null): float
+    {
+        $count = self::count($data);
+        if ($count < 2) {
+            throw new InvalidDataInputException(
+                "Percentile requires at least 2 data points.",
+            );
+        }
+        if ($p < 0 || $p > 100) {
+            throw new InvalidDataInputException(
+                "Percentile must be between 0 and 100.",
+            );
+        }
+
+        sort($data);
+
+        // Exclusive method: rank = p/100 * (n + 1), 1-based index
+        $rank = ($p / 100) * ($count + 1);
+
+        if ($rank <= 1) {
+            return Math::round((float) $data[0], $round);
+        }
+        if ($rank >= $count) {
+            return Math::round((float) $data[$count - 1], $round);
+        }
+
+        $lower = (int) floor($rank) - 1;
+        $fraction = $rank - floor($rank);
+        $interpolated = $data[$lower] + $fraction * ($data[$lower + 1] - $data[$lower]);
+
+        return Math::round($interpolated, $round);
+    }
+
     /**
      * Return the **population** standard deviation,
      * a measure of the amount of variation or dispersion of a set of values.
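
The rank arithmetic in the `percentile()` hunk above can be checked outside the library. The following is a minimal sketch, assuming plain PHP with no dependencies; `percentileSketch()` is a hypothetical stand-in that mirrors the committed logic but skips `Math::round()` and throws `InvalidArgumentException` rather than the library's `InvalidDataInputException`.

```php
<?php

// Hypothetical helper for illustration; not part of HiFolks\Statistics.
// Mirrors the exclusive-rank interpolation shown in the diff, without rounding.
function percentileSketch(array $data, float $p): float
{
    $count = count($data);
    if ($count < 2 || $p < 0 || $p > 100) {
        throw new InvalidArgumentException(
            "Need at least 2 data points and a percentile between 0 and 100."
        );
    }

    sort($data);

    // Exclusive method: 1-based rank = p/100 * (n + 1).
    $rank = ($p / 100) * ($count + 1);

    // Ranks outside the sample are clamped to the smallest or largest value.
    if ($rank <= 1) {
        return (float) $data[0];
    }
    if ($rank >= $count) {
        return (float) $data[$count - 1];
    }

    // Interpolate between the two neighbours of the fractional rank.
    $lower = (int) floor($rank) - 1;   // 0-based index of the lower neighbour
    $fraction = $rank - floor($rank);  // distance from the lower neighbour
    return $data[$lower] + $fraction * ($data[$lower + 1] - $data[$lower]);
}

$data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
echo round(percentileSketch($data, 50), 6), PHP_EOL; // rank 5.5: halfway between 50 and 60
echo round(percentileSketch($data, 90), 6), PHP_EOL; // rank 9.9: nine tenths of the way from 90 to 100
```

For the ten-value data set, p = 90 gives rank 9.9, so the result is 90 + 0.9 × 10 = 99.0. An inclusive rank, 1 + p/100 · (n − 1), would give 91.0 for the same input, so the choice of method matters most near the tails.
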
@@ -625,6 +671,37 @@ public static function kurtosis(array $data, ?int $round = null): float
         return Math::round($kurtosis, $round);
     }

+    /**
+     * Return the coefficient of variation (CV) of the data.
+     * The coefficient of variation is the ratio of the standard deviation
+     * to the mean, expressed as a percentage. It measures relative variability
+     * and is useful for comparing dispersion across datasets with different units or scales.
+     *
+     * @param array<int|float> $data
+     * @param int|null $round number of decimal places to round to, or null for no rounding
+     * @param bool $population if true, use the population standard deviation; otherwise the sample one
+     * @return float the coefficient of variation as a percentage
+     *
+     * @throws InvalidDataInputException if the data has fewer than 2 elements (sample)
+     * or is empty (population), or if the mean is zero
+     */
+    public static function coefficientOfVariation(
+        array $data,
+        ?int $round = null,
+        bool $population = false,
+    ): float {
+        $mean = self::mean($data);
+        if ($mean == 0) {
+            throw new InvalidDataInputException(
+                "Coefficient of variation is undefined when the mean is zero.",
+            );
+        }
+
+        $sd = $population ? self::pstdev($data) : self::stdev($data);
+
+        return Math::round(($sd / abs($mean)) * 100, $round);
+    }
+
     /**
      * Return the geometric mean of the numeric data.
      * That is the number that can replace each of these numbers so that their product
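
The README figures for `coefficientOfVariation()` can be reproduced by hand. The sketch below assumes plain PHP with no dependencies; `cvSketch()` is a hypothetical helper that computes the standard deviation inline instead of calling `Stat::stdev()` or `Stat::pstdev()`, and it throws `InvalidArgumentException` rather than the library's exception class.

```php
<?php

// Hypothetical helper for illustration; not part of HiFolks\Statistics.
function cvSketch(array $data, bool $population = false): float
{
    $n = count($data);
    $minimum = $population ? 1 : 2;
    if ($n < $minimum) {
        throw new InvalidArgumentException("Not enough data points.");
    }

    $mean = array_sum($data) / $n;
    if ($mean == 0) {
        throw new InvalidArgumentException("CV is undefined when the mean is zero.");
    }

    // Sum of squared deviations from the mean.
    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }

    // Sample variance divides by n - 1, population variance by n.
    $sd = sqrt($sumSquares / ($population ? $n : $n - 1));

    // abs() keeps the percentage positive for data with a negative mean.
    return ($sd / abs($mean)) * 100;
}

$data = [10, 20, 30, 40, 50];
echo round(cvSketch($data), 2), PHP_EOL;       // sample CV
echo round(cvSketch($data, true), 2), PHP_EOL; // population CV
```

With `[10, 20, 30, 40, 50]` the mean is 30. The sample standard deviation is √250 ≈ 15.81 and the population one is √200 ≈ 14.14, which gives roughly 52.70 and 47.14 respectively.
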

src/Statistics.php

Lines changed: 26 additions & 0 deletions
@@ -307,6 +307,32 @@ public function kurtosis(?int $round = null): float
         return Stat::kurtosis($this->numericalArray(), $round);
     }

+    /**
+     * Return the value at the given percentile.
+     *
+     * @param float $p percentile in range 0..100
+     * @param int|null $round number of decimal places to round to, or null for no rounding
+     *
+     * @see Stat::percentile()
+     */
+    public function percentile(float $p, ?int $round = null): float
+    {
+        return Stat::percentile($this->numericalArray(), $p, $round);
+    }
+
+    /**
+     * Return the coefficient of variation (CV%) of the numeric data.
+     *
+     * @param int|null $round number of decimal places to round to, or null for no rounding
+     * @param bool $population if true, use the population standard deviation
+     *
+     * @see Stat::coefficientOfVariation()
+     */
+    public function coefficientOfVariation(?int $round = null, bool $population = false): float
+    {
+        return Stat::coefficientOfVariation($this->numericalArray(), $round, $population);
+    }
+
     /**
      * Return the geometric mean of the numeric data.
      *

tests/StatTest.php

Lines changed: 101 additions & 0 deletions
@@ -1015,4 +1015,105 @@ public function test_kde_random_triweight_covers_both_signs(): void
             $this->assertIsFloat($value);
         }
     }
+
+    // --- percentile ---
+
+    public function test_percentile_median_matches(): void
+    {
+        $data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+        $p50 = Stat::percentile($data, 50);
+        $this->assertEqualsWithDelta(Stat::median($data), $p50, 1e-10);
+    }
+
+    public function test_percentile_quartiles(): void
+    {
+        $data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
+        $q1 = Stat::percentile($data, 25);
+        $q3 = Stat::percentile($data, 75);
+        $this->assertEqualsWithDelta(Stat::firstQuartile($data), $q1, 1e-10);
+        $this->assertEqualsWithDelta(Stat::thirdQuartile($data), $q3, 1e-10);
+    }
+
+    public function test_percentile_boundaries(): void
+    {
+        $data = [10, 20, 30, 40, 50];
+        $this->assertEqualsWithDelta(10.0, Stat::percentile($data, 0), 1e-10);
+        $this->assertEqualsWithDelta(50.0, Stat::percentile($data, 100), 1e-10);
+    }
+
+    public function test_percentile_rounding(): void
+    {
+        $data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+        $result = Stat::percentile($data, 33, 2);
+        $this->assertEquals(round($result, 2), $result);
+    }
+
+    public function test_percentile_too_few_data_throws(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+        Stat::percentile([1], 50);
+    }
+
+    public function test_percentile_out_of_range_throws(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+        Stat::percentile([1, 2, 3], 101);
+    }
+
+    public function test_percentile_negative_throws(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+        Stat::percentile([1, 2, 3], -1);
+    }
+
+    // --- coefficientOfVariation ---
+
+    public function test_coefficient_of_variation(): void
+    {
+        $data = [10, 20, 30, 40, 50];
+        $expected = (Stat::stdev($data) / abs((float) Stat::mean($data))) * 100;
+        $this->assertEqualsWithDelta($expected, Stat::coefficientOfVariation($data), 1e-10);
+    }
+
+    public function test_coefficient_of_variation_population(): void
+    {
+        $data = [10, 20, 30, 40, 50];
+        $expected = (Stat::pstdev($data) / abs((float) Stat::mean($data))) * 100;
+        $this->assertEqualsWithDelta($expected, Stat::coefficientOfVariation($data, population: true), 1e-10);
+    }
+
+    public function test_coefficient_of_variation_rounding(): void
+    {
+        $data = [10, 20, 30, 40, 50];
+        $result = Stat::coefficientOfVariation($data, 2);
+        $this->assertEquals(round($result, 2), $result);
+    }
+
+    public function test_coefficient_of_variation_low_dispersion(): void
+    {
+        // Nearly identical values → low CV
+        $data = [100, 100.1, 99.9, 100.2, 99.8];
+        $cv = Stat::coefficientOfVariation($data);
+        $this->assertLessThan(1.0, $cv);
+    }
+
+    public function test_coefficient_of_variation_zero_mean_throws(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+        Stat::coefficientOfVariation([-1, 0, 1]);
+    }
+
+    public function test_coefficient_of_variation_negative_mean(): void
+    {
+        $data = [-10, -20, -30];
+        // Should use abs(mean), so CV is still positive
+        $cv = Stat::coefficientOfVariation($data);
+        $this->assertGreaterThan(0, $cv);
+    }
+
+    public function test_coefficient_of_variation_too_few_data_throws(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+        Stat::coefficientOfVariation([5]);
+    }
 }

tests/StatisticTest.php

Lines changed: 32 additions & 0 deletions
@@ -241,4 +241,36 @@ public function test_min_with_empty_array(): void
     {
         $this->assertEquals(0, Statistics::make([])->min());
     }
+
+    public function test_percentile(): void
+    {
+        $s = Statistics::make([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]);
+        $this->assertEqualsWithDelta($s->median(), $s->percentile(50), 1e-10);
+        $this->assertEqualsWithDelta($s->firstQuartile(), $s->percentile(25), 1e-10);
+        $this->assertEqualsWithDelta($s->thirdQuartile(), $s->percentile(75), 1e-10);
+    }
+
+    public function test_percentile_with_rounding(): void
+    {
+        $s = Statistics::make([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
+        $result = $s->percentile(33, 2);
+        $this->assertEquals(round($result, 2), $result);
+    }
+
+    public function test_coefficient_of_variation(): void
+    {
+        $s = Statistics::make([10, 20, 30, 40, 50]);
+        $cv = $s->coefficientOfVariation();
+        $this->assertGreaterThan(0, $cv);
+
+        $cvPop = $s->coefficientOfVariation(population: true);
+        $this->assertLessThan($cv, $cvPop);
+    }
+
+    public function test_coefficient_of_variation_with_rounding(): void
+    {
+        $s = Statistics::make([10, 20, 30, 40, 50]);
+        $result = $s->coefficientOfVariation(2);
+        $this->assertEquals(round($result, 2), $result);
+    }
 }
