Summary
The codebase declares generic type parameters <T> on classes but uses hardcoded double internally, breaking the generic contract. Users who instantiate these classes with float, decimal, or other numeric types will get silent precision loss, type conversion failures, or incorrect results.
Scale: ~4,563 instances of hardcoded double types across ~4,665 of the 5,552 generic <T> class files in src/.
The Problem
What's happening
Classes are declared as generic <T> (accepting float, double, decimal, etc.) but internally use double for:
- Field declarations (`Vector<double>`, `Matrix<double>`, `double[]`)
- Local variables and intermediate computations
- Return types from helper methods
- Parameter types in private methods
Why it matters
- Silent precision loss: A user instantiating `SuperLearner<decimal>` for financial ML gets their high-precision data silently truncated to `double` precision internally.
- Type conversion failures: `NumOps.ToDouble()` → compute in `double` → `NumOps.FromDouble()` loses digits on types with more precision than `double` (such as `decimal`).
- Broken generic contract: The `<T>` parameter is a lie; the class doesn't actually operate in type `T`.
- Inconsistent behavior: Some code paths use `T` correctly while others use `double`, so results can differ subtly depending on which path runs.
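The precision-loss point can be reproduced without any AiDotNet types. This is a minimal stand-alone illustration for `T = decimal`; plain casts stand in for `NumOps.ToDouble()` / `NumOps.FromDouble()`, and the assumption is that those helpers behave like the casts for `decimal`.

```csharp
using System;

// Illustrates the T -> double -> T round trip for T = decimal.
// Plain casts stand in for NumOps.ToDouble()/FromDouble() (assumed equivalent).
public static class PrecisionLossDemo
{
    public static decimal RoundTripThroughDouble(decimal value)
    {
        double intermediate = (double)value; // only ~15-17 significant digits survive
        return (decimal)intermediate;        // converting back cannot restore dropped digits
    }
}
```

For example, `RoundTripThroughDouble(1.0000000000000000000000000001m)` returns `1m`: the last digit is gone, with no exception or warning.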
Concrete Examples
Example 1: `Regression/SuperLearner.cs` (38 instances)

```csharp
// Class declared as generic <T>
public class SuperLearner<T> : NonLinearRegressionBase<T>
{
    // BUT internal state uses hardcoded double!
    private Vector<double>? _cvPerformance; // Line 62: should be Vector<T>
    private Vector<double>? _predMeans;     // Line 67: should be Vector<T>
    private Vector<double>? _predStds;      // Line 72: should be Vector<T>

    // Internal computations hardcoded to double (excerpts from method bodies)
    var metaFeatures = new Matrix<double>(n, numModels); // Line 132
    var yData = new Vector<double>(n);                   // Line 133
    double foldMse = 0;                                  // Line 165

    // Return types hardcoded
    public Vector<double> GetCVPerformance()      // Line 264: should be Vector<T>
    public Vector<double> GetModelContributions() // Line 273: should be Vector<T>
}
```
Example 2: `Regression/MixedEffectsModel.cs` (40 instances)

```csharp
public class MixedEffectsModel<T> : NonLinearRegressionBase<T>
{
    // Hardcoded double in conversions
    var yData = new Vector<double>(y.Length);        // Line 138
    var beta = new Vector<double>(_numFeatures + 1); // Line 149

    // Return type hardcoded
    public double ComputeICC()              // Line 330: should be T
    public double GetLogLikelihood(...)     // Line 352: should be T

    // Private helpers all use double
    private Matrix<double> InitializeRandomEffectVariance(int dim) // Line 421
    private Dictionary<int, Vector<double>> ComputeBLUPs(...)      // Line 481
}
```
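As an illustration of what "should be `T`" means for a method like `ComputeICC()`, here is a sketch that keeps the whole computation in `T`. It assumes a random-intercept model, where ICC = σ²_u / (σ²_u + σ²_e); the real `ComputeICC()` may compute it differently. `INumericOps<T>` and `DecimalOps` are hypothetical minimal stand-ins for `INumericOperations<T>`, not the library's actual API.

```csharp
using System;

// Hypothetical minimal stand-in for AiDotNet's INumericOperations<T>.
public interface INumericOps<T>
{
    T Add(T a, T b);
    T Divide(T a, T b);
}

// Hypothetical decimal implementation of the stand-in interface.
public sealed class DecimalOps : INumericOps<decimal>
{
    public decimal Add(decimal a, decimal b) => a + b;
    public decimal Divide(decimal a, decimal b) => a / b;
}

public static class IccSketch
{
    // ICC for a random-intercept model: between-group variance / total variance.
    // No double intermediate, so a decimal model keeps decimal precision.
    public static T ComputeICC<T>(INumericOps<T> ops, T randomEffectVariance, T residualVariance)
    {
        T total = ops.Add(randomEffectVariance, residualVariance);
        return ops.Divide(randomEffectVariance, total);
    }
}
```

With `DecimalOps`, `ComputeICC(new DecimalOps(), 1m, 3m)` returns `0.25m`, and the return type matches the model's `T`.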
Example 3: `Preprocessing/FeatureSelection/` (1,512 instances in 100+ files)
This is the worst offender directory. Statistical computations are entirely in `double`:

```csharp
// GenericUnivariateSelect.cs
private double[]? _scores;                  // Line 48
private double[]? _pValues;                 // Line 49
public double[]? Scores => _scores;         // Line 55

private (double[] Scores, double[] PValues) ComputeFClassif( // Line 240
    Matrix<T> data, Vector<T> target, int n, int p)
{
    var scores = new double[p];             // Line 242
    var pValues = new double[p];            // Line 243
    // ...
    double overallMean = 0;                 // Line 258
    // ...
    double ssb = 0, ssw = 0;                // Line 263
    // ...
}
```
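A sketch of how the `ComputeFClassif`-style accumulation (overall mean, `ssb`, `ssw`) could stay in `T`, converting to `double` only once the finished F statistic is handed to a p-value routine. It assumes a one-way ANOVA F test for a single feature with integer class labels; `INumericOps<T>`, `DecimalOps`, and `FromInt` are hypothetical stand-ins, not AiDotNet's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal stand-in for INumericOperations<T>.
public interface INumericOps<T>
{
    T Zero { get; }
    T Add(T a, T b);
    T Subtract(T a, T b);
    T Multiply(T a, T b);
    T Divide(T a, T b);
    T FromInt(int value);
    double ToDouble(T value);
}

public sealed class DecimalOps : INumericOps<decimal>
{
    public decimal Zero => 0m;
    public decimal Add(decimal a, decimal b) => a + b;
    public decimal Subtract(decimal a, decimal b) => a - b;
    public decimal Multiply(decimal a, decimal b) => a * b;
    public decimal Divide(decimal a, decimal b) => a / b;
    public decimal FromInt(int value) => value;
    public double ToDouble(decimal value) => (double)value;
}

public static class AnovaSketch
{
    // One-way ANOVA F = (SSB / (k - 1)) / (SSW / (n - k)), accumulated in T.
    // Assumes at least two groups and more samples than groups.
    public static T FStatistic<T>(INumericOps<T> ops, T[] x, int[] groups)
    {
        int n = x.Length;
        var sums = new Dictionary<int, T>();
        var counts = new Dictionary<int, int>();
        T total = ops.Zero;
        for (int i = 0; i < n; i++)
        {
            int g = groups[i];
            sums[g] = sums.TryGetValue(g, out var s) ? ops.Add(s, x[i]) : x[i];
            counts[g] = counts.TryGetValue(g, out var c) ? c + 1 : 1;
            total = ops.Add(total, x[i]);
        }

        T overallMean = ops.Divide(total, ops.FromInt(n));
        var means = new Dictionary<int, T>();
        T ssb = ops.Zero;
        foreach (int g in sums.Keys)
        {
            T mean = ops.Divide(sums[g], ops.FromInt(counts[g]));
            means[g] = mean;
            T d = ops.Subtract(mean, overallMean);
            ssb = ops.Add(ssb, ops.Multiply(ops.FromInt(counts[g]), ops.Multiply(d, d)));
        }

        T ssw = ops.Zero;
        for (int i = 0; i < n; i++)
        {
            T d = ops.Subtract(x[i], means[groups[i]]);
            ssw = ops.Add(ssw, ops.Multiply(d, d));
        }

        int k = sums.Count;
        T msb = ops.Divide(ssb, ops.FromInt(k - 1));
        T msw = ops.Divide(ssw, ops.FromInt(n - k));
        return ops.Divide(msb, msw);
    }
}
```

Only the final `ops.ToDouble(f)` call crosses into `double`, which is the boundary the p-value computation actually needs.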
Top files in `Preprocessing/FeatureSelection/`:

| File | `double` count |
|---|---|
| Filter/Univariate/SelectPercentile.cs | 69 |
| Filter/Univariate/SelectKBest.cs | 68 |
| Helpers/StatisticalTestHelper.cs | 60 |
| SelectPercentile.cs | 54 |
| Bioinformatics/VolcanoPlotSelector.cs | 52 |
| Causal/FCI_Selector.cs | 51 |
Affected Directories (by instance count)
| Directory | Instances | Description |
|---|---|---|
| Preprocessing/ | 1,865 | Feature selection, time series transforms, scalers |
| AiDotNet.Playground/ | 212 | Example service (may be acceptable here) |
| MetaLearning/ | 197 | Meta-learning algorithms |
| Finance/ | 192 | Financial forecasting (precision critical!) |
| AnomalyDetection/ | 156 | Anomaly detection algorithms |
| Regression/ | 154 | Regression models |
| FederatedLearning/ | 153 | Federated training |
| Clustering/ | 120 | Clustering algorithms |
| NeuralNetworks/ | 115 | Neural network layers |
| Data/ | 93 | Data loading |
| TextToSpeech/ | 90 | TTS models |
| Classification/ | 79 | Classification models |
| Evaluation/ | 77 | Model evaluation |
| Audio/ | 63 | Audio processing |
| ComputerVision/ | 58 | Vision models |
The Correct Pattern
AiDotNet already has the right infrastructure, `INumericOperations<T>`; it just isn't being used consistently:

```csharp
// ❌ WRONG: Hardcoded double
private Vector<double>? _cvPerformance;
double foldMse = 0;
double diff = yData[valIdx[i]] - NumOps.ToDouble(predictions[i]);

// ✅ CORRECT: Use generic T with NumericOperations
private Vector<T>? _cvPerformance;
T foldMse = NumOps.Zero;
T diff = NumOps.Subtract(yData[valIdx[i]], predictions[i]);
```
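Put together, the corrected lines form a fold-MSE loop that never leaves `T`. This is a sketch: `INumericOps<T>`, `DecimalOps`, and `FromInt` are hypothetical stand-ins for `INumericOperations<T>`, and plain `T[]` arrays replace AiDotNet's `Vector<T>` to keep the example self-contained.

```csharp
using System;

// Hypothetical minimal stand-in for INumericOperations<T>; real member names may differ.
public interface INumericOps<T>
{
    T Zero { get; }
    T Add(T a, T b);
    T Subtract(T a, T b);
    T Multiply(T a, T b);
    T Divide(T a, T b);
    T FromInt(int value);
}

public sealed class DecimalOps : INumericOps<decimal>
{
    public decimal Zero => 0m;
    public decimal Add(decimal a, decimal b) => a + b;
    public decimal Subtract(decimal a, decimal b) => a - b;
    public decimal Multiply(decimal a, decimal b) => a * b;
    public decimal Divide(decimal a, decimal b) => a / b;
    public decimal FromInt(int value) => value;
}

public static class CrossValidationSketch
{
    // Mean squared error over one validation fold, accumulated in T.
    // predictions[i] is the prediction for sample valIdx[i].
    public static T FoldMse<T>(INumericOps<T> ops, T[] yData, T[] predictions, int[] valIdx)
    {
        if (valIdx.Length == 0)
            throw new ArgumentException("Fold must not be empty.", nameof(valIdx));

        T foldMse = ops.Zero;
        for (int i = 0; i < valIdx.Length; i++)
        {
            T diff = ops.Subtract(yData[valIdx[i]], predictions[i]);
            foldMse = ops.Add(foldMse, ops.Multiply(diff, diff));
        }
        return ops.Divide(foldMse, ops.FromInt(valIdx.Length));
    }
}
```

The same method body serves `float`, `double`, and `decimal`; only the ops implementation changes.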
For statistical computations where `double` is genuinely needed (e.g., p-values, F-statistics):

```csharp
// ✅ ACCEPTABLE: Using double for well-defined statistical outputs
// that are always in double regardless of model precision
public double[]? PValues => _pValues; // p-values are always double

// ✅ CORRECT: Convert at boundaries, compute in T internally
T featureScore = ComputeScore(data, target);                  // Internal: use T
double pValue = ComputePValue(NumOps.ToDouble(featureScore)); // Boundary: convert to double for stats
```
Proposed Fix Strategy
Phase 1: Audit and Categorize (~1 day)
- Categorize each `double` usage as:
  - Must fix: Fields, return types, and parameters that should be `T`
  - Acceptable: Statistical outputs, p-values, and probability values that are inherently `double`
  - Boundary: Conversions at I/O boundaries (logging, display, serialization)
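The audit can start from a candidate list rather than a manual read of every file. The helper below is a rough, regex-based triage sketch (not a parser, and not existing tooling in the repo): it lists every line mentioning `double` in a source file that declares a class with a `T` type parameter. It does not skip comments or strings, so it over-counts; each hit still needs a human Must fix / Acceptable / Boundary decision.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DoubleAudit
{
    // Matches e.g. "class SuperLearner<T>" or "class Foo<TKey, T>" (heuristic).
    private static readonly Regex GenericClass = new Regex(@"\bclass\s+\w+\s*<[^>]*\bT\b");
    private static readonly Regex DoubleKeyword = new Regex(@"\bdouble\b");

    // Returns (1-based line number, trimmed text) for each line mentioning `double`
    // in a file that declares a generic <T> class; empty list otherwise.
    public static List<(int Line, string Text)> FindCandidates(string source)
    {
        var hits = new List<(int Line, string Text)>();
        if (!GenericClass.IsMatch(source)) return hits;

        string[] lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (DoubleKeyword.IsMatch(lines[i]))
                hits.Add((i + 1, lines[i].Trim()));
        }
        return hits;
    }
}
```

Running it over each `.cs` file under `src/` (e.g., via `Directory.EnumerateFiles(path, "*.cs", SearchOption.AllDirectories)`) gives a per-file list to categorize.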
Phase 2: Fix by Module (incremental PRs)
Priority order based on impact and user visibility:
1. `Regression/`: Core regression models (user-facing)
2. `Classification/`: Core classification models (user-facing)
3. `Preprocessing/`: Feature selection and transforms (affects all pipelines)
4. `MetaLearning/`: Meta-learning algorithms
5. `Clustering/`: Clustering algorithms
6. `AnomalyDetection/`: Anomaly detection
7. Remaining directories
Phase 3: Add Roslyn Analyzer
Create a custom Roslyn analyzer that flags `double` usage inside `<T>` generic classes, so new hardcoded `double` cannot creep back in after the fix.
Impact
- Severity: Medium-High (silent data corruption for non-`double` types)
- Probability: High for any user using `float` or `decimal` (100% of code paths affected)
- Risk of fix: Medium (incremental module-by-module approach limits blast radius)
Related Issues