[BUG] normal_approximation crashes with math.sqrt domain error when estimate drifts past [0,1]

## Summary — verified end-to-end

`NormalApproximationStrategy.normal_approximation` (and `WilsonApproximationStrategy.wilson_approximation` via inheritance) raises `ValueError: math domain error` when an algebraic metric `estimate` lands just outside its declared `value_range=(0, 1)` due to ordinary IEEE-754 accumulation in the user's `accumulate_metrics_fn`.

**This is a hard crash of `Experiment.run_evals`, not a silent metric drop.** The exception escapes through the final-aggregation path at `controller.py:699` (`_compute_final_metrics_for_pipelines` → `aggregator.online_strategy.add_confidence_interval_info`), which is NOT wrapped by the broad try/except at `controller.py:1248-1251` (that catch only protects a separate live-metrics path). `run_evals` re-raises the `ValueError`, and the caller ends up with `None` instead of results.

**Verified end-to-end** via a working notebook that uses rapidfireai's normal public API (`Experiment`, `RFLangChainRagSpec`, `RFAPIModelConfig`, `RFGridSearch`) with real OpenAI gpt-4o-mini calls over a small FAISS+PyMuPDF RAG. No source-poking, no monkey-patching. Notebook: https://gist.github.com/kamran-rapidfireAI/a510d4c15aba56d7968bed4167404407

## End-to-end repro

The notebook constructs an `accumulate_metrics_fn` that returns

```python
{"DriftMetric": {"value": 1.0 + 1e-12, "is_algebraic": True, "value_range": (0, 1)}}
```

simulating the float-drift scenario this issue describes. Run output:

```
[accumulate] true_mean=1.0 drifted_mean=1.000000000001 over_upper_bound? True
run_evals raised: ValueError math domain error
run_evals completed in 84.5s. results type: NoneType
```

Stack trace captured in `~/rapidfireai/logs/bug244_repro/rapidfire.log`:

```
File ".../rapidfireai/experiment.py", line 539, in run_evals
    results = self.controller.run_multi_pipeline_inference(...)
File ".../rapidfireai/evals/scheduling/controller.py", line 1544, in run_multi_pipeline_inference
    final_results = self._compute_final_metrics_for_pipelines(...)
File ".../rapidfireai/evals/scheduling/controller.py", line 699, in _compute_final_metrics_for_pipelines
    cumulative_metrics = aggregator.online_strategy.add_confidence_interval_info(...)
File ".../rapidfireai/evals/metrics/online_strategies.py", line 104, in add_confidence_interval_info
    estimate, lower, upper = self.get_confidence_interval_algebraic(...)
File ".../rapidfireai/evals/metrics/online_strategies.py", line 314, in get_confidence_interval_algebraic
    return self.normal_approximation(...)
File ".../rapidfireai/evals/metrics/online_strategies.py", line 294, in normal_approximation
    std_error = math.sqrt(variance)
ValueError: math domain error
```

## Minimal direct repro (without run_evals)

```python
from rapidfireai.evals.metrics.online_strategies import NormalApproximationStrategy
NormalApproximationStrategy().get_confidence_interval_algebraic(
    estimate=1.0000000002, sample_size=36, value_range=(0, 1))
# ValueError: math domain error
```

This is triggered in practice whenever `accumulate_metrics_fn` returns a weighted mean of `[0, 1]`-valued per-batch metrics: for example, a Precision@k that is `1.0` on every batch can accumulate to `1.0` plus one ULP rather than exactly `1.0`, depending on the weight values. The same crash occurs on the lower bound, e.g. for `estimate = -1e-10`.
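
The negative argument to `math.sqrt` can be illustrated in isolation. The sketch below assumes the variance takes the usual binomial form `p * (1 - p) / n` after rescaling the estimate into `value_range`. That formula is an assumption: the stack trace only shows the `math.sqrt(variance)` line, and `binomial_variance` and `sqrt_fails` are hypothetical helpers, not rapidfireai functions.

```python
import math


def binomial_variance(estimate, sample_size, value_range=(0.0, 1.0)):
    """Hypothetical stand-in for the variance step (assumed form p(1-p)/n)."""
    low, high = value_range
    p = (estimate - low) / (high - low)  # rescale the estimate into [0, 1]
    return p * (1.0 - p) / sample_size


def sqrt_fails(variance):
    """Return True if math.sqrt rejects the value with a ValueError."""
    try:
        math.sqrt(variance)
    except ValueError:
        return True
    return False


one_ulp_over = math.nextafter(1.0, math.inf)  # smallest float above 1.0
for estimate in (1.0000000002, one_ulp_over, -1e-10):
    variance = binomial_variance(estimate, sample_size=36)
    print(f"estimate={estimate!r} variance={variance!r} sqrt_fails={sqrt_fails(variance)}")
```

Under this assumption, any estimate outside the range yields a negative variance no matter how small the overshoot, which is consistent with a single ULP of drift being enough to crash.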

## Workaround

Clamp `value` to the exact `value_range` boundary inside `accumulate_metrics_fn` before returning. We use a `1e-9` snap tolerance:

```python
def _clamp01(v, tol=1e-9):
    return 1.0 if v >= 1.0 - tol else (0.0 if v <= tol else v)
```
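
For metrics whose `value_range` is not `(0, 1)`, the same idea can be generalized. The following is a minimal sketch, not part of rapidfireai: `clamp_to_range` and `clamp_metrics` are hypothetical helpers, and the dict shape is taken from the repro above. Unlike the snap above, it only moves values that are already outside the range, and it raises when the overshoot exceeds the tolerance, on the assumption that a larger overshoot indicates a real bug rather than rounding.

```python
def clamp_to_range(value, value_range=(0.0, 1.0), tol=1e-9):
    """Clamp a value that overshoots value_range by at most tol (hypothetical helper)."""
    low, high = map(float, value_range)
    if value < low - tol or value > high + tol:
        # Too far out to be float drift: surface it instead of hiding it.
        raise ValueError(f"{value!r} is outside {value_range!r} by more than {tol}")
    return min(max(value, low), high)


def clamp_metrics(metrics, tol=1e-9):
    """Return a copy of a metrics dict with every ranged value clamped."""
    clamped = {}
    for name, info in metrics.items():
        info = dict(info)  # shallow copy so the caller's dict is not mutated
        if "value_range" in info:
            info["value"] = clamp_to_range(info["value"], info["value_range"], tol)
        clamped[name] = info
    return clamped


drifted = {"DriftMetric": {"value": 1.0 + 1e-12, "is_algebraic": True, "value_range": (0, 1)}}
print(clamp_metrics(drifted))
```

Calling `clamp_metrics` on the dict just before `accumulate_metrics_fn` returns would keep the workaround in one place.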

## Environment
- rapidfireai: `main` (HEAD `91d94de`); same crash on 0.15.2 PyPI
- Python 3.12, Linux
- Setup used in the verification notebook: experiment_name `bug244_repro`, endpoint `bug244_gpt4omini`, model `gpt-4o-mini` via MLflow gateway, 1 config, 2 questions, 1 shard
