Summary — verified end-to-end
`NormalApproximationStrategy.normal_approximation` (and `WilsonApproximationStrategy.wilson_approximation`, via inheritance) raises `ValueError: math domain error` when an algebraic metric's estimate lands just outside its declared `value_range=(0, 1)` because of ordinary IEEE-754 rounding during accumulation in the user's `accumulate_metrics_fn`.
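To show why a tiny overshoot is fatal, here is a minimal sketch of a normal-approximation interval for a metric rescaled to [0, 1]. It assumes a Bernoulli-style variance `p * (1 - p) / n`; the actual formula in `online_strategies.py` is not quoted in this report, so `normal_ci` is a hypothetical stand-in, not rapidfireai code. Any `p` above 1 or below 0 makes the variance negative, and `math.sqrt` rejects it.

```python
import math


def normal_ci(estimate, sample_size, value_range=(0.0, 1.0), z=1.96):
    """Hypothetical normal-approximation CI; NOT rapidfireai's implementation.

    Assumes the estimate is rescaled to [0, 1] and uses the Bernoulli-style
    variance p * (1 - p) / n.
    """
    lo, hi = value_range
    p = (estimate - lo) / (hi - lo)          # rescale to [0, 1]
    variance = p * (1.0 - p) / sample_size   # negative if p > 1 or p < 0
    std_error = math.sqrt(variance)          # raises ValueError when variance < 0
    half_width = z * std_error * (hi - lo)   # map back to the original scale
    return estimate, estimate - half_width, estimate + half_width


# Inside the range: a normal interval comes back.
print(normal_ci(0.9, 36))

# One part in 10**12 over the upper bound: p * (1 - p) < 0, so sqrt fails.
try:
    normal_ci(1.0 + 1e-12, 36)
except ValueError as exc:
    print("ValueError:", exc)
```

The same failure occurs just below the lower bound (for example `estimate=-1e-10`), since `p` is then negative while `1 - p` stays positive.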
This is a hard crash of `Experiment.run_evals`, not a silently dropped metric. The exception escapes through the final-aggregation path at `controller.py:699` (`_compute_final_metrics_for_pipelines` → `aggregator.online_strategy.add_confidence_interval_info`), which is NOT wrapped by the broad try/except at `controller.py:1248-1251` (that handler only protects a separate live-metrics path). `run_evals` re-raises the `ValueError`, so the caller gets no results (in the notebook, `results` stays `None`).
Verified end-to-end via a working notebook that uses rapidfireai's normal public API (`Experiment`, `RFLangChainRagSpec`, `RFAPIModelConfig`, `RFGridSearch`) with real OpenAI `gpt-4o-mini` calls over a small FAISS + PyMuPDF RAG pipeline. No source edits, no monkey-patching. Notebook: https://gist.github.com/kamran-rapidfireAI/a510d4c15aba56d7968bed4167404407
End-to-end repro
The notebook constructs an `accumulate_metrics_fn` that returns

```python
{"DriftMetric": {"value": 1.0 + 1e-12, "is_algebraic": True, "value_range": (0, 1)}}
```

to simulate the float-drift scenario this issue describes. Run output:
```
[accumulate] true_mean=1.0 drifted_mean=1.000000000001 over_upper_bound? True
run_evals raised: ValueError math domain error
run_evals completed in 84.5s. results type: NoneType
```
Stack trace captured in `~/rapidfireai/logs/bug244_repro/rapidfire.log`:

```
  File ".../rapidfireai/experiment.py", line 539, in run_evals
    results = self.controller.run_multi_pipeline_inference(...)
  File ".../rapidfireai/evals/scheduling/controller.py", line 1544, in run_multi_pipeline_inference
    final_results = self._compute_final_metrics_for_pipelines(...)
  File ".../rapidfireai/evals/scheduling/controller.py", line 699, in _compute_final_metrics_for_pipelines
    cumulative_metrics = aggregator.online_strategy.add_confidence_interval_info(...)
  File ".../rapidfireai/evals/metrics/online_strategies.py", line 104, in add_confidence_interval_info
    estimate, lower, upper = self.get_confidence_interval_algebraic(...)
  File ".../rapidfireai/evals/metrics/online_strategies.py", line 314, in get_confidence_interval_algebraic
    return self.normal_approximation(...)
  File ".../rapidfireai/evals/metrics/online_strategies.py", line 294, in normal_approximation
    std_error = math.sqrt(variance)
ValueError: math domain error
```
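For comparison, one possible library-side guard at the failing `math.sqrt(variance)` call is to clamp the estimate into `value_range` before computing the variance. This is a hypothetical sketch, not the maintainers' fix: `guarded_normal_ci` is not a rapidfireai function, and it assumes the same Bernoulli-style variance `p * (1 - p) / n` as an illustration.

```python
import math


def guarded_normal_ci(estimate, sample_size, value_range=(0.0, 1.0), z=1.96):
    """Hypothetical normal-approximation CI that tolerates float drift.

    Assumes a Bernoulli-style variance p * (1 - p) / n on the rescaled estimate.
    """
    lo, hi = value_range
    # Clamp first: a drifted 1.0 + 1e-12 becomes exactly hi.
    clamped = min(max(estimate, lo), hi)
    p = (clamped - lo) / (hi - lo)
    # Never hand a negative number to sqrt, even if rounding slips through.
    variance = max(p * (1.0 - p) / sample_size, 0.0)
    half_width = z * math.sqrt(variance) * (hi - lo)
    # Keep the interval itself inside the declared range as well.
    return clamped, max(lo, clamped - half_width), min(hi, clamped + half_width)


print(guarded_normal_ci(1.0000000002, 36, value_range=(0, 1)))
print(guarded_normal_ci(-1e-10, 36, value_range=(0, 1)))
print(guarded_normal_ci(0.5, 36, value_range=(0, 1)))
```

A silent clamp would also hide a metric that is out of range by a large margin; whether to clamp silently, warn, or raise above some tolerance is a design choice for the maintainers.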
Minimal direct repro (without run_evals)
```python
from rapidfireai.evals.metrics.online_strategies import NormalApproximationStrategy

NormalApproximationStrategy().get_confidence_interval_algebraic(
    estimate=1.0000000002, sample_size=36, value_range=(0, 1)
)
# ValueError: math domain error
```
This is triggered in practice when `accumulate_metrics_fn` returns a weighted mean of `[0, 1]`-valued per-batch metrics and rounding pushes the result past a bound: e.g. a Precision@k that is `1.0` on every batch can come out as `1.0 + 1 ULP` instead of exactly `1.0`, depending on the weight values. The lower bound fails the same way, e.g. `estimate = -1e-10`.
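A minimal sketch of one way this drift can arise, assuming (hypothetically) that the weighted sum and the total weight are accumulated in different orders, for example one streamed batch by batch and the other summed from a reordered list. Every per-batch value is exactly `1.0`, yet the ratio lands above `1.0` because floating-point addition is not associative. The weights are made up for illustration.

```python
import math

# Hypothetical per-batch results: Precision@k is exactly 1.0 on every batch.
batch_values = [1.0, 1.0, 1.0]
batch_weights = [0.1, 0.2, 0.3]

# Weighted sum accumulated in streaming (batch) order.
weighted_sum = 0.0
for value, weight in zip(batch_values, batch_weights):
    weighted_sum += value * weight

# Total weight accumulated in a different (here: reversed) order.
total_weight = 0.0
for weight in reversed(batch_weights):
    total_weight += weight

drifted_mean = weighted_sum / total_weight
print(weighted_sum, total_weight, drifted_mean)
print("over upper bound?", drifted_mean > 1.0)
print("ULPs above 1.0:", (drifted_mean - 1.0) / math.ulp(1.0))
```

Feeding such a value straight into a `value_range=(0, 1)` metric is exactly the situation the minimal repro above exercises.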
Workaround
Clamp `value` to the exact `value_range` boundary inside `accumulate_metrics_fn` before returning. We use a `1e-9` snap tolerance:

```python
def _clamp01(v):
    return 1.0 if v >= 1.0 - 1e-9 else (0.0 if v <= 1e-9 else v)
```
Environment
- rapidfireai: `main` (HEAD `91d94de`); same crash on 0.15.2 from PyPI
- Python 3.12, Linux
- Setup used in the verification notebook: experiment_name `bug244_repro`, endpoint `bug244_gpt4omini`, model `gpt-4o-mini` via MLflow gateway, 1 config, 2 questions, 1 shard