
Commit 9aeafd9

Adding kendallTau() method for Kendall tau-b rank correlation with tie handling
1 parent 3bf22a4 commit 9aeafd9

7 files changed

Lines changed: 219 additions & 18 deletions


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Changelog
 
+## 1.5.2 - WIP
+- Adding `kendallTau()` method for Kendall tau-b rank correlation with tie handling
+- Adding `kendall` method support to `correlation()`
+
 ## 1.5.1 - 2026-05-12
 - Adding `rank()` method for assigning 1-based ranks to data points, with support for `average`, `min`, `max`, `dense`, and `ordinal` tie strategies
 - Adding `percentileRank()` method for calculating the percentile position of a value, with `weak`, `strict`, `mean`, and `rank` variants

README.md

Lines changed: 29 additions & 4 deletions
@@ -97,7 +97,8 @@ The various mathematical statistics are listed below:
 | `iqrOutliers()` | outlier detection based on IQR method (box plot whiskers), robust for skewed data |
 | `geometricMean()` | geometric mean |
 | `harmonicMean()` | harmonic mean |
-| `correlation()` | Pearson’s or Spearman’s rank correlation coefficient for two inputs |
+| `correlation()` | Pearson’s, Spearman’s rank, or Kendall tau correlation coefficient for two inputs |
+| `kendallTau()` | Kendall tau-b rank correlation coefficient for ordinal association with tie handling |
 | `covariance()` | the sample covariance of two inputs |
 | `linearRegression()` | return the slope and intercept of simple linear regression parameters estimated using ordinary least squares (supports `proportional: true` for regression through the origin) |
 | `logarithmicRegression()` | logarithmic regression — fits `y = a × ln(x) + b`, ideal for diminishing returns patterns (e.g., athletic improvement, learning curves) |
@@ -609,10 +610,12 @@ $covariance = Stat::covariance(
 // -7.5
 ```
 
-#### Stat::correlation ( array $x , array $y, string $method = linear )
+#### Stat::correlation ( array $x , array $y, string $method = 'linear' )
 Return the Pearson’s correlation coefficient for two inputs. Pearson’s correlation coefficient r takes values between -1 and +1. It measures the strength and direction of the linear relationship, where +1 means very strong, positive linear relationship, -1 very strong, negative linear relationship, and 0 no linear relationship.
 
-Use `$method = ‘ranked’` for Spearman’s rank correlation, which measures monotonic relationships (not just linear). Spearman’s correlation is computed by applying Pearson’s formula to the ranks of the data.
+Use `$method = 'ranked'` for Spearman’s rank correlation, which measures monotonic relationships (not just linear). Spearman’s correlation is computed by applying Pearson’s formula to the ranks of the data.
+
+Use `$method = 'kendall'` for Kendall tau-b correlation, which measures ordinal association by comparing concordant and discordant pairs. Kendall tau-b is often useful for ordinal data, small samples, and datasets with ties.
 
 ```php
 $correlation = Stat::correlation(
@@ -635,11 +638,33 @@ Spearman’s rank correlation (non-linear but monotonic relationship):
 $correlation = Stat::correlation(
     [1, 2, 3, 4, 5],
     [1, 4, 9, 16, 25],
-    ranked
+    'ranked'
 );
 // 1.0
 ```
 
+Kendall tau-b rank correlation with ties:
+```php
+$correlation = Stat::correlation(
+    [12, 2, 1, 12, 2],
+    [1, 4, 7, 1, 0],
+    'kendall'
+);
+// -0.47140452079103173
+```
+
+#### Stat::kendallTau ( array $x , array $y, ?int $round = null )
+Return Kendall's tau-b rank correlation coefficient for two inputs.
+
+```php
+$correlation = Stat::kendallTau(
+    [12, 2, 1, 12, 2],
+    [1, 4, 7, 1, 0],
+    4
+);
+// -0.4714
+```
+
 #### Stat::linearRegression ( array $x , array $y , bool $proportional = false )
 Return the slope and intercept of simple linear regression parameters estimated using ordinary least squares.
 Simple linear regression describes the relationship between an independent variable *$x* and a dependent variable *$y* in terms of a linear function.
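A hand check of the README value `-0.47140452079103173`, classifying pairs the same way `kendallTau()` in `src/Stat.php` does. Of the 10 pairs in `x = [12, 2, 1, 12, 2]`, `y = [1, 4, 7, 1, 0]`: C = 2 are concordant, D = 6 are discordant, T_x = 1 is tied only in x (the two 2s), T_y = 0 are tied only in y, and one pair (the two (12, 1) points) is tied in both inputs and is skipped.

```latex
\tau_b = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}
       = \frac{2 - 6}{\sqrt{(2 + 6 + 1)(2 + 6 + 0)}}
       = \frac{-4}{\sqrt{72}}
       = -\frac{\sqrt{2}}{3}
       \approx -0.4714
```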

TODO.md

Lines changed: 3 additions & 4 deletions
@@ -9,10 +9,9 @@
 
 ### Priority 2: Correlation
 
-- `kendallTau()` - Kendall tau rank correlation.
-- Useful for ordinal data and small samples.
-- Complements the existing Pearson and Spearman support in `correlation()`.
-- Consider extending `correlation()` with a Kendall method option.
+- DONE: `kendallTau()` - Kendall tau-b rank correlation.
+- Useful for ordinal data, small samples, and datasets with ties.
+- DONE: Extend `correlation()` with a Kendall method option.
 
 ### Priority 3: Hypothesis Testing
 

examples/article-boston-marathon-analysis.php

Lines changed: 4 additions & 0 deletions
@@ -199,11 +199,13 @@
 
 $pearson = Stat::correlation($ages, $finishTimes);
 $spearman = Stat::correlation($ages, $finishTimes, 'ranked');
+$kendall = Stat::correlation($ages, $finishTimes, 'kendall');
 $regression = Stat::linearRegression($ages, $finishTimes);
 $r2 = Stat::rSquared($ages, $finishTimes, false, 4);
 
 echo "Pearson correlation: " . round($pearson, 4) . PHP_EOL;
 echo "Spearman correlation: " . round($spearman, 4) . PHP_EOL;
+echo "Kendall tau: " . round($kendall, 4) . PHP_EOL;
 echo PHP_EOL;
 echo "Linear regression: finish = " . round($regression[0], 1) . " × age + " . round($regression[1]) . PHP_EOL;
 echo "R-squared: " . $r2 . PHP_EOL;
@@ -212,6 +214,7 @@
 echo "How to interpret:" . PHP_EOL;
 echo "- Pearson and Spearman close to +1 = strong positive relationship (older = slower)." . PHP_EOL;
 echo "- If both correlations are similar, the relationship is linear, not just monotonic." . PHP_EOL;
+echo "- Kendall tau compares every pair of data points; it is usually smaller in magnitude than Spearman and handles ties explicitly." . PHP_EOL;
 echo "- The slope tells you seconds added per year of age. Divide by 60 for minutes." . PHP_EOL;
 echo "- R-squared tells you what fraction of variation age explains (0 = none, 1 = all)." . PHP_EOL;
 
@@ -458,6 +461,7 @@
     ['Stat::tTestPaired()', '3'],
     ['Stat::correlation() — Pearson', '4'],
     ['Stat::correlation() — Spearman', '4'],
+    ['Stat::correlation() — Kendall', '4'],
     ['Stat::linearRegression()', '4'],
     ['Stat::rSquared()', '4'],
     ['Stat::coefficientOfVariation()', '5'],
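The Boston Marathon example now reports Kendall tau next to Pearson and Spearman, and the README recommends tau-b for datasets with ties. To make the tie adjustment concrete, here is a minimal self-contained sketch (not the library's implementation) that computes both the uncorrected tau-a, (C - D) / (n(n - 1)/2), and tau-b on the README's tied example data. The functions `pairCounts()`, `kendallTauA()`, and `kendallTauB()` are hypothetical names used only for this illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): classify every pair (i, j) with i < j.
 *
 * @param list<int|float> $x
 * @param list<int|float> $y
 * @return array{int, int, int, int, int} [concordant, discordant, tied only in x, tied only in y, total pairs]
 */
function pairCounts(array $x, array $y): array
{
    $n = count($x);
    $concordant = $discordant = $tiesX = $tiesY = 0;
    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $sx = $x[$i] <=> $x[$j];
            $sy = $y[$i] <=> $y[$j];
            if ($sx === 0 && $sy === 0) {
                continue; // tied in both inputs: neither concordant nor discordant
            }
            if ($sx === 0) {
                $tiesX++;
            } elseif ($sy === 0) {
                $tiesY++;
            } elseif ($sx === $sy) {
                $concordant++;
            } else {
                $discordant++;
            }
        }
    }

    return [$concordant, $discordant, $tiesX, $tiesY, intdiv($n * ($n - 1), 2)];
}

/** tau-a: divides by all n(n - 1)/2 pairs, so ties pull its magnitude toward 0. */
function kendallTauA(array $x, array $y): float
{
    [$c, $d, , , $n0] = pairCounts($x, $y);
    if ($n0 === 0) {
        throw new InvalidArgumentException('At least two data points are required.');
    }

    return ($c - $d) / $n0;
}

/** tau-b: divides by the geometric mean of the pairs untied in x and the pairs untied in y. */
function kendallTauB(array $x, array $y): float
{
    [$c, $d, $tx, $ty] = pairCounts($x, $y);
    $denominator = sqrt(($c + $d + $tx) * ($c + $d + $ty));
    if ($denominator == 0.0) {
        throw new InvalidArgumentException('At least one input is constant.');
    }

    return ($c - $d) / $denominator;
}

$x = [12, 2, 1, 12, 2];
$y = [1, 4, 7, 1, 0];
printf("tau-a = %.4f, tau-b = %.4f\n", kendallTauA($x, $y), kendallTauB($x, $y));
```

On this tied data the sketch prints `tau-a = -0.4000, tau-b = -0.4714`; the tau-b value matches the README. When neither input has ties, every pair is concordant or discordant and the two variants coincide.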

examples/stat_methods.php

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,10 @@
 // 60.0
 $percentileRank = Stat::percentileRank([10, 20, 20, 30, 40], 20, Stat::PERCENTILE_RANK_STRICT);
 // 20.0
+$correlation = Stat::kendallTau([12, 2, 1, 12, 2], [1, 4, 7, 1, 0], 4);
+// -0.4714
+$correlation = Stat::correlation([12, 2, 1, 12, 2], [1, 4, 7, 1, 0], 'kendall');
+// -0.47140452079103173
 $quantiles = Stat::quantiles([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
 // [ 55.0, 88.0, 92.0 ]
 $quantiles = Stat::quantiles([105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110, 100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129, 106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86, 111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95, 103, 107, 101, 81, 109, 104], 10);

src/Stat.php

Lines changed: 90 additions & 6 deletions
@@ -1257,8 +1257,8 @@ public static function harmonicMean(
      * Return the sample covariance of two inputs *$x* and *$y*.
      * Covariance is a measure of the joint variability of two inputs.
      *
-     * @param array<int|float> $x
-     * @param array<int|float> $y
+     * @param array<mixed> $x
+     * @param array<mixed> $y
      *
      * @throws InvalidDataInputException if 2 arrays have different size,
      * or if the length of arrays are < 2, or if the 2 input arrays has not numeric elements
@@ -1311,8 +1311,8 @@ public static function covariance(array $x, array $y): false|float
      * -1 very strong, negative linear relationship,
      * and 0 no linear relationship.
      *
-     * @param array<int|float> $x
-     * @param array<int|float> $y
+     * @param array<mixed> $x
+     * @param array<mixed> $y
      *
      * @throws InvalidDataInputException if 2 arrays have different size,
      * or if the length of arrays are < 2, or if the 2 input arrays has not numeric elements,
@@ -1323,9 +1323,9 @@ public static function correlation(
         array $y,
         string $method = "linear",
     ): false|float {
-        if ($method !== "linear" && $method !== "ranked") {
+        if (!in_array($method, ["linear", "ranked", "kendall"], true)) {
             throw new InvalidDataInputException(
-                "Correlation method must be 'linear' or 'ranked'.",
+                "Correlation method must be 'linear', 'ranked', or 'kendall'.",
             );
         }
 
@@ -1346,6 +1346,9 @@ public static function correlation(
             $x = self::ranks($x);
             $y = self::ranks($y);
         }
+        if ($method === "kendall") {
+            return self::kendallTau($x, $y);
+        }
 
         $meanX = self::mean($x);
         $meanY = self::mean($y);
@@ -1370,6 +1373,87 @@ public static function correlation(
         return $a / $b;
     }
 
+    /**
+     * Return Kendall's tau-b rank correlation coefficient for two inputs.
+     *
+     * Kendall's tau measures ordinal association by comparing concordant and
+     * discordant pairs. The tau-b variant adjusts for ties in either input.
+     *
+     * @param array<mixed> $x
+     * @param array<mixed> $y
+     * @param int|null $round number of decimal places to round the result to, or null for no rounding
+     *
+     * @throws InvalidDataInputException if inputs have different sizes, fewer than 2 data points,
+     * or non-numeric/constant data
+     */
+    public static function kendallTau(array $x, array $y, ?int $round = null): float
+    {
+        $countX = count($x);
+        $countY = count($y);
+        if ($countX !== $countY) {
+            throw new InvalidDataInputException(
+                "Kendall tau requires that both inputs have same number of data points.",
+            );
+        }
+        if ($countX < 2) {
+            throw new InvalidDataInputException(
+                "Kendall tau requires at least two data points.",
+            );
+        }
+
+        $concordant = 0;
+        $discordant = 0;
+        $tiesX = 0;
+        $tiesY = 0;
+
+        for ($i = 0; $i < $countX - 1; $i++) {
+            if (!is_numeric($x[$i]) || !is_numeric($y[$i])) {
+                throw new InvalidDataInputException(
+                    "Kendall tau requires numeric data points.",
+                );
+            }
+            for ($j = $i + 1; $j < $countX; $j++) {
+                if (!is_numeric($x[$j]) || !is_numeric($y[$j])) {
+                    throw new InvalidDataInputException(
+                        "Kendall tau requires numeric data points.",
+                    );
+                }
+
+                $xComparison = $x[$i] <=> $x[$j];
+                $yComparison = $y[$i] <=> $y[$j];
+
+                if ($xComparison === 0 && $yComparison === 0) {
+                    continue;
+                }
+                if ($xComparison === 0) {
+                    $tiesX++;
+                    continue;
+                }
+                if ($yComparison === 0) {
+                    $tiesY++;
+                    continue;
+                }
+                if ($xComparison === $yComparison) {
+                    $concordant++;
+                    continue;
+                }
+                $discordant++;
+            }
+        }
+
+        $denominator = sqrt(
+            ($concordant + $discordant + $tiesX)
+            * ($concordant + $discordant + $tiesY),
+        );
+        if ($denominator == 0) {
+            throw new InvalidDataInputException(
+                "Kendall tau, at least one of the inputs is constant.",
+            );
+        }
+
+        return Math::round(($concordant - $discordant) / $denominator, $round);
+    }
+
     /**
      * Assign average ranks to data values (handles ties by averaging).
      *
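The denominator in `kendallTau()` above is built from pairwise counts, sqrt((C + D + tiesX)(C + D + tiesY)), where tiesX and tiesY count pairs tied only in x or only in y. Textbook tau-b instead uses sqrt((n0 - n1)(n0 - n2)), with n0 = n(n - 1)/2 and n1, n2 the number of tied pairs within x and within y. The two agree because C + D + tiesY counts exactly the pairs not tied in x (n0 - n1), and C + D + tiesX counts the pairs not tied in y (n0 - n2). The minimal sketch below checks this on the README data; `tiedPairs()` and `tauBDenominators()` are hypothetical helpers, and grouping values by their string form is a simplifying assumption that holds for integers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: number of pairs tied within one input,
 * i.e. the sum of t(t - 1)/2 over each group of t equal values.
 *
 * @param list<int|float> $values
 */
function tiedPairs(array $values): int
{
    // array_count_values() only accepts int and string values, so group by string form.
    // Assumption: equal values share a string form (true for ints, not guaranteed for floats).
    $groups = array_count_values(array_map('strval', $values));
    $pairs = 0;
    foreach ($groups as $t) {
        $pairs += intdiv($t * ($t - 1), 2);
    }

    return $pairs;
}

/**
 * Hypothetical helper: return both forms of the tau-b denominator.
 *
 * @param list<int|float> $x
 * @param list<int|float> $y
 * @return array{float, float} [pairwise form, grouped textbook form]
 */
function tauBDenominators(array $x, array $y): array
{
    $n = count($x);
    $concordant = $discordant = $tiesX = $tiesY = 0;
    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $sx = $x[$i] <=> $x[$j];
            $sy = $y[$i] <=> $y[$j];
            if ($sx === 0 && $sy === 0) {
                continue; // tied in both: excluded from C, D, tiesX and tiesY
            }
            if ($sx === 0) {
                $tiesX++;
            } elseif ($sy === 0) {
                $tiesY++;
            } elseif ($sx === $sy) {
                $concordant++;
            } else {
                $discordant++;
            }
        }
    }

    $n0 = intdiv($n * ($n - 1), 2);
    $pairwise = sqrt(($concordant + $discordant + $tiesX) * ($concordant + $discordant + $tiesY));
    $grouped = sqrt(($n0 - tiedPairs($x)) * ($n0 - tiedPairs($y)));

    return [$pairwise, $grouped];
}

[$pairwise, $grouped] = tauBDenominators([12, 2, 1, 12, 2], [1, 4, 7, 1, 0]);
printf("pairwise: %.6f, grouped: %.6f\n", $pairwise, $grouped);
```

For the README data both forms print `8.485281` (sqrt(72)), so (C - D) / sqrt(72) = -4 / sqrt(72) reproduces the tested value of about -0.4714.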

tests/StatTest.php

Lines changed: 85 additions & 4 deletions
@@ -522,14 +522,14 @@ public function test_calculates_covariance_with_non_numeric_first(): void
     {
         $this->expectException(InvalidDataInputException::class);
         // Intentionally passing non-numeric values to test exception handling
-        Stat::covariance(['a', 1], ['b', 2]); // @phpstan-ignore argument.type, argument.type
+        Stat::covariance(['a', 1], ['b', 2]);
     }
 
     public function test_calculates_covariance_with_non_numeric_second(): void
     {
         $this->expectException(InvalidDataInputException::class);
         // Intentionally passing non-numeric values to test exception handling
-        Stat::covariance([3, 1], ['b', 2]); // @phpstan-ignore argument.type
+        Stat::covariance([3, 1], ['b', 2]);
     }
 
     public function test_calculates_correlation(): void
@@ -629,6 +629,87 @@ public function test_calculates_spearman_correlation_with_ties(): void
         $this->assertEqualsWithDelta(1.0, $correlation, 1e-9);
     }
 
+    public function test_calculates_kendall_tau(): void
+    {
+        $this->assertEqualsWithDelta(
+            1.0,
+            Stat::kendallTau([1, 2, 3, 4], [10, 20, 30, 40]),
+            1e-9,
+        );
+        $this->assertEqualsWithDelta(
+            -1.0,
+            Stat::kendallTau([1, 2, 3, 4], [40, 30, 20, 10]),
+            1e-9,
+        );
+    }
+
+    public function test_calculates_kendall_tau_with_ties(): void
+    {
+        $tau = Stat::kendallTau([12, 2, 1, 12, 2], [1, 4, 7, 1, 0]);
+
+        $this->assertEqualsWithDelta(-0.47140452079103173, $tau, 1e-12);
+    }
+
+    public function test_calculates_kendall_tau_with_ties_in_y_only(): void
+    {
+        $tau = Stat::kendallTau([1, 2, 3], [1, 1, 2]);
+
+        $this->assertEqualsWithDelta(0.8164965809277261, $tau, 1e-12);
+    }
+
+    public function test_calculates_kendall_tau_with_rounding(): void
+    {
+        $tau = Stat::kendallTau([12, 2, 1, 12, 2], [1, 4, 7, 1, 0], 4);
+
+        $this->assertSame(-0.4714, $tau);
+    }
+
+    public function test_calculates_correlation_with_kendall_method(): void
+    {
+        $correlation = Stat::correlation(
+            [12, 2, 1, 12, 2],
+            [1, 4, 7, 1, 0],
+            'kendall',
+        );
+
+        $this->assertEqualsWithDelta(-0.47140452079103173, $correlation, 1e-12);
+    }
+
+    public function test_calculates_kendall_tau_wrong_usage_different_lengths(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+
+        Stat::kendallTau([1, 2], [1, 2, 3]);
+    }
+
+    public function test_calculates_kendall_tau_wrong_usage_empty(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+
+        Stat::kendallTau([], []);
+    }
+
+    public function test_calculates_kendall_tau_wrong_usage_constant(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+
+        Stat::kendallTau([1, 1, 1], [1, 2, 3]);
+    }
+
+    public function test_calculates_kendall_tau_with_non_numeric_data(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+
+        Stat::kendallTau([1, 'a', 3], [1, 2, 3]);
+    }
+
+    public function test_calculates_kendall_tau_with_non_numeric_first_pair(): void
+    {
+        $this->expectException(InvalidDataInputException::class);
+
+        Stat::kendallTau(['a', 1, 3], [1, 2, 3]);
+    }
+
     public function test_calculates_correlation_invalid_method(): void
     {
         $this->expectException(InvalidDataInputException::class);
@@ -1493,13 +1574,13 @@ public function test_covariance_non_numeric_x_throws(): void
         $this->expectException(InvalidDataInputException::class);
         // true passes mean()'s string filter and array_sum without warnings,
         // but is_numeric(true) returns false, triggering the loop guard
-        Stat::covariance([true, 1, 2], [3, 4, 5]); // @phpstan-ignore argument.type
+        Stat::covariance([true, 1, 2], [3, 4, 5]);
     }
 
     public function test_covariance_non_numeric_y_throws(): void
     {
         $this->expectException(InvalidDataInputException::class);
-        Stat::covariance([1, 2, 3], [true, 4, 5]); // @phpstan-ignore argument.type
+        Stat::covariance([1, 2, 3], [true, 4, 5]);
     }
 
     public function test_kde_cumulative_bounded_kernels(): void
