| 1 | + Missing Functions |
| 2 | + |
| 3 | + Python Function: median_grouped(data, interval) |
| 4 | + Description: Median of grouped/binned continuous data |
| 5 | + Status: Missing |
| 6 | + ──────────────────────────────────────── |
| 7 | + Python Function: kde(data, h, kernel) |
| 8 | + Description: Kernel Density Estimation |
| 9 | + Status: Missing |
| 10 | + ──────────────────────────────────────── |
| 11 | + Python Function: kde_random(data, h, kernel) |
| 12 | + Description: Random sampling from KDE |
| 13 | + Status: Missing |
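As a reference point for porting median_grouped, the interpolation can be sketched in Python roughly as follows. This is a minimal sketch, not the package's code: the helper name `median_grouped_sketch` is hypothetical, and it assumes each value is the midpoint of a class of width `interval`. The result is compared with the standard library's `statistics.median_grouped`.

```python
import math
import statistics
from bisect import bisect_left, bisect_right


def median_grouped_sketch(data, interval=1.0):
    """Hypothetical helper: median of grouped data via linear interpolation."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    x = values[n // 2]                 # midpoint of the class holding the median
    lower = x - interval / 2           # lower boundary of that class
    cf = bisect_left(values, x)        # number of values below the median class
    f = bisect_right(values, x) - cf   # number of values inside the median class
    return lower + interval * (n / 2 - cf) / f


sample = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
result = median_grouped_sketch(sample)
reference = statistics.median_grouped(sample)
print(result, reference)
```

Comparing against the standard library on a handful of inputs like this is one low-effort way to check a port.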
| 14 | + |
| 15 | + Missing Parameters/Variants |
| 16 | + |
| 17 | + Feature: correlation() with method='ranked' |
| 18 | + Python: Supports both Pearson and Spearman rank correlation |
| 19 | + This Package: Only Pearson |
| 20 | + ──────────────────────────────────────── |
| 21 | + Feature: linear_regression() with proportional=True |
| 22 | + Python: Supports proportional regression (intercept forced to 0) |
| 23 | + This Package: No proportional option |
| 24 | + ──────────────────────────────────────── |
| 25 | + Feature: variance(data, xbar) / pvariance(data, mu) |
| 26 | + Python: Can pass pre-computed mean to avoid recalculation |
| 27 | + This Package: No pre-computed mean parameter |
| 28 | + ──────────────────────────────────────── |
| 29 | + Feature: quantiles() with method='inclusive' |
| 30 | + Python: Supports both exclusive and inclusive methods |
| 31 | + This Package: No method parameter |
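To make the parameter gaps concrete, the sketch below shows the Python side of three of them. The `quantiles` and `variance` calls are standard-library calls; `proportional_slope` is a hypothetical helper that illustrates what forcing the intercept to 0 amounts to, written by hand so that it does not depend on a recent Python version.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# quantiles(): 'exclusive' is Python's default; 'inclusive' treats the data
# as already containing the population's minimum and maximum.
q_exclusive = statistics.quantiles(data, n=4)
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

# variance(): passing a pre-computed mean skips recomputing it internally.
xbar = statistics.mean(data)
var_with_mean = statistics.variance(data, xbar)


def proportional_slope(x, y):
    """Hypothetical helper: least-squares slope with the intercept fixed at 0."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a non-zero value")
    # Unlike ordinary regression, the sums are NOT centered on the means.
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


slope = proportional_slope([1, 2, 3], [2, 4, 6])
print(q_exclusive, q_inclusive, var_with_mean, slope)
```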
| 32 | + |
| 33 | + Summary |
| 34 | + |
| 35 | + The package is close to full parity with Python's statistics
| 36 | + module. The remaining gaps are:
| 37 | + |
| 38 | + 1. median_grouped - interpolation-based median for grouped/binned data |
| 39 | + 2. kde / kde_random - Kernel Density Estimation and random sampling
| 40 | + from the estimate (both added in Python 3.13)
| 41 | + 3. Spearman rank correlation - via method parameter on correlation() |
| 42 | + 4. Proportional linear regression - forcing intercept through origin |
| 43 | + 5. Minor parameter additions (xbar on variance/stdev, mu on pvariance/pstdev, method on quantiles)
| 44 | + |
| 45 | + Items 1, 3, and 4 would be the most practical additions to reach near-complete |
| 46 | + parity with Python's statistics module. The KDE functions (2) are newer and |
| 47 | + more niche. |
| 48 | + |
| 49 | + |
| 50 | + |
| 51 | + |
| 52 | + Currently Implemented (for reference) |
| 53 | + |
| 54 | + Central tendency, variance/stdev, median variants, mode/multimode, |
| 55 | + geometric/harmonic mean, quantiles, covariance, correlation, linear |
| 56 | + regression, normal distribution (PDF, CDF, inverse CDF, z-score), frequency |
| 57 | + tables. |
| 58 | + |
| 59 | + --- |
| 60 | + Missing Functions |
| 61 | + |
| 62 | + Descriptive Statistics |
| 63 | + |
| 64 | + - Trimmed/Truncated mean - mean after discarding the top and bottom x% of values
| 65 | + - Weighted median - median with weights (fmean supports weights, but
| 66 | + median does not)
| 67 | + - Skewness - measure of asymmetry of the distribution |
| 68 | + - Kurtosis - measure of "tailedness" of the distribution |
| 69 | + - Standard error of the mean (SEM) |
| 70 | + - Coefficient of variation (CV) - stdev / mean, useful for comparing |
| 71 | + variability across datasets |
| 72 | + - Mean absolute deviation (MAD) - average absolute distance from the mean
| 73 | + - Percentile - arbitrary percentile (e.g., 90th percentile); quantiles()
| 74 | + exists, but a direct percentile($data, $p) would be convenient
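Several of the descriptive statistics listed above reduce to a few lines each. The following Python sketch shows one possible set of definitions. All function names are hypothetical, skewness and kurtosis use the population (divide-by-n) moments, and other conventions (for example bias-corrected sample skewness) would give different numbers.

```python
import math
import statistics


def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Population skewness g1 = m3 / m2**1.5 (undefined for constant data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Population excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(data) / statistics.fmean(data)


def standard_error(data):
    """Standard error of the mean: sample stdev / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))


def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(len(values) * proportion)  # number of values cut from each end
    return statistics.fmean(values[k:len(values) - k])


data = [1, 2, 3, 4, 5]
print(skewness(data), excess_kurtosis(data))
print(coefficient_of_variation(data), standard_error(data))
print(trimmed_mean([1, 2, 3, 4, 100], 0.2))
```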
| 75 | + |
| 76 | + Correlation & Regression |
| 77 | + |
| 78 | + - Spearman rank correlation - non-parametric correlation |
| 79 | + - Kendall tau correlation - another rank-based correlation |
| 80 | + - Multiple/polynomial regression |
| 81 | + - R-squared (coefficient of determination) |
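Spearman correlation and R-squared can both be built on top of a Pearson implementation, which the package already has. The Python sketch below illustrates the idea under stated assumptions: the function names are hypothetical, ties receive the average of their positions, and the R-squared shortcut holds only for simple (one-predictor) linear regression with an intercept.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def r_squared(x, y):
    """For simple linear regression, R^2 equals Pearson's r squared."""
    return pearson(x, y) ** 2


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(pearson(x, y), spearman(x, y), r_squared(x, y))
```

The `average_ranks` helper also covers the plain rank function listed under Ranking & Order Statistics.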
| 82 | + |
| 83 | + Hypothesis Testing |
| 84 | + |
| 85 | + - T-test (one-sample, two-sample, paired) |
| 86 | + - Chi-squared test |
| 87 | + - Z-test |
| 88 | + - P-value calculation |
| 89 | + - Confidence intervals |
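Because the package already has the normal distribution's CDF and inverse CDF, a z-test, its p-value, and a normal-based confidence interval need little extra machinery. The Python sketch below uses the standard library's `NormalDist` as a stand-in for those building blocks. The function names are hypothetical, and both functions assume the population standard deviation `sigma` is known. T-tests and chi-squared tests would additionally need their own distributions.

```python
import math
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma):
    """One-sample, two-sided z-test with a known population sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability of a |Z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Normal-based interval for the mean, again assuming a known sigma."""
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - margin, center + margin


z, p = z_test([4, 5, 6], mu0=5, sigma=1)
low, high = mean_confidence_interval([4, 5, 6], sigma=1)
print(z, p, low, high)
```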
| 90 | + |
| 91 | + Other Distributions (beyond Normal) |
| 92 | + |
| 93 | + - Student's t-distribution |
| 94 | + - Chi-squared distribution |
| 95 | + - Binomial distribution |
| 96 | + - Poisson distribution |
| 97 | + - Uniform distribution |
| 98 | + - Exponential distribution |
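The discrete and closed-form distributions in this list are comparatively cheap to add. The following Python sketch, with hypothetical function names, shows the core formulas for four of them. Student's t and chi-squared are omitted here because their CDFs need the incomplete beta and gamma functions, which is a larger undertaking.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ Uniform(a, b), with a < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def exponential_cdf(x, rate):
    """P(X <= x) for X ~ Exponential(rate), with rate > 0."""
    return 1 - math.exp(-rate * x) if x > 0 else 0.0


print(binomial_pmf(2, 4, 0.5))
print(poisson_pmf(0, 2.0))
print(uniform_cdf(2.5, 0, 10))
print(exponential_cdf(1.0, 1.0))
```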
| 99 | + |
| 100 | + Outlier Detection |
| 101 | + |
| 102 | + - IQR-based outlier detection (the building blocks exist with |
| 103 | + firstQuartile/thirdQuartile, but no dedicated method) |
| 104 | + - Z-score based outlier detection |
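Both outlier rules are thin wrappers over existing building blocks. The Python sketch below shows one way they could look. The function names and default thresholds are assumptions, and the quartiles come from `statistics.quantiles` with its default exclusive method, which may not match how the package's firstQuartile/thirdQuartile are computed.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


def zscore_outliers(data, threshold=3.0):
    """Values whose absolute z-score (using the sample stdev) exceeds threshold."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]


data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 100]
print(iqr_outliers(data))
print(zscore_outliers(data, threshold=2.5))
```

One design point worth noting: in a sample of n values, no z-score computed with the sample standard deviation can exceed (n - 1) / sqrt(n), so a fixed threshold of 3 can never flag anything in very small samples. The IQR rule does not have that limitation.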
| 105 | + |
| 106 | + Ranking & Order Statistics |
| 107 | + |
| 108 | + - Rank - assign ranks to data points |
| 109 | + - Percentile rank - what percentile a given value falls at |
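Percentile rank is the inverse question to the direct percentile lookup mentioned under Descriptive Statistics, so the two are sketched together here in Python. Both function names are hypothetical. `percentile` uses linear interpolation between closest ranks, and `percentile_rank` counts ties as half below. Each is one of several common conventions, so a port should document which one it picks.

```python
import math


def percentile(data, p):
    """Value at percentile p (0..100), linearly interpolated between ranks."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(data)
    position = (len(values) - 1) * p / 100   # fractional index into values
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return values[lower] + fraction * (values[upper] - values[lower])


def percentile_rank(data, value):
    """Percentage of the data below `value`, counting ties as half."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for x in data if x < value)
    equal = sum(1 for x in data if x == value)
    return 100 * (below + 0.5 * equal) / len(data)


scores = [1, 2, 3, 4, 5]
print(percentile(scores, 50), percentile(scores, 90))
print(percentile_rank(scores, 3))
```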
| 110 | + |
| 111 | + --- |
| 112 | + The most impactful additions would likely be skewness, kurtosis, coefficient
| 113 | + of variation, percentile, and Spearman correlation. These are commonly needed
| 114 | + and align well with the package's existing scope (inspired by Python's
| 115 | + statistics module).