[pull] develop from freqtrade:develop #1795

Merged
pull[bot] merged 10 commits into Uncodedtech:develop from freqtrade:develop
Jun 21, 2026

pull[bot] commented Jun 21, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

yongzhe2160cs and others added 10 commits June 4, 2026 14:50
Freqtrade already reports the SQN, which is the t-statistic of the mean
per-trade return (sqrt(n) * mean / std), but never tells the user whether
that value is statistically distinguishable from zero. This adds a
two-sided p-value for the null hypothesis "mean trade return = 0" so users
can judge whether a backtest result represents a real edge or just noise.

- `calculate_p_value()` in data/metrics.py computes the one-sample Student's
  t-test p-value. The Student's t CDF is evaluated via a pure-Python
  regularized incomplete beta function (continued fraction, Numerical
  Recipes), so no SciPy dependency is added to the core backtest path
  (SciPy is only an optional hyperopt extra).
- The value is added to the strategy stats as `profit_p_value` and shown as
  "Mean profit p-value" in the SUMMARY METRICS table, right after SQN.
- Backward compatible: older stored results without the key render "N/A".
- Tests validate the p-value against scipy.stats.ttest_1samp reference
  values, scale invariance, and edge cases (n<2, zero variance). Docs
  updated with the metric description and its i.i.d. caveat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
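
To make the approach in the commit above concrete, here is a minimal sketch of a SciPy-free p-value for the null hypothesis "mean trade return = 0". It is not the code from the PR: the function names are hypothetical, the continued fraction follows the textbook modified-Lentz scheme from Numerical Recipes, and returning NaN for n<2 and zero variance is an assumption, since the commit message does not say what those edge cases return.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300  # guard against division by zero / underflow
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    delta = 0.0
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even step, then the odd step.
        for aa in (
            m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), with domain guards for x <= 0 and x >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def mean_return_p_value(returns: list[float]) -> float:
    """Two-sided one-sample t-test p-value for 'mean return = 0'."""
    n = len(returns)
    if n < 2:
        return math.nan  # assumption: undefined for fewer than two trades
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    if var == 0.0:
        return math.nan  # assumption: undefined for zero variance
    t = math.sqrt(n) * mean / math.sqrt(var)  # same form as the SQN
    df = n - 1
    # Two-sided tail of Student's t: I_{df / (df + t^2)}(df / 2, 1 / 2).
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
```

Break-even returns such as `[1.0, -1.0]` give t = 0, so x = 1 and the p-value is exactly 1.0 through the x>=1 guard. Multiplying every return by a positive constant leaves t, and therefore the p-value, unchanged.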
Following review feedback on the backtest report's "Mean profit p-value"
metric, trim the legend entry to a crisp definition and add a plain-English
note for non-statisticians. The note explains that the p-value is the chance
that pure luck would produce an average result at least this far from zero
if the strategy had no edge (so p=0.48 means roughly a 48% chance from noise
alone); that a lower value means the result is less likely a fluke; that the
usual bar is <0.05; and that, because of the i.i.d. assumption and multiple
testing, the metric flags absence of significance rather than proving an edge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov flagged 7 uncovered patch lines, all in the new pure-Python
incomplete-beta helpers behind calculate_p_value. Add targeted tests:

- _regularized_incomplete_beta against closed-form values (uniform, x**2,
  2x-x**2, 3x**2-2x**3, arcsine) plus the x<=0 / x>=1 domain guards.
- calculate_p_value with break-even (zero-mean) returns -> p-value 1.0,
  exercising the x>=1 path through the public API.
- _beta_continued_fraction at degenerate points that trip the Lentz
  underflow guards, asserting the result stays finite.

The two continued-fraction c-underflow guards cannot be reached for the
bounded (a, b) argument range this routine uses (verified empirically over
~3.2M argument combinations), so they are marked `# pragma: no cover`.
Changed lines are now fully covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
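
For reference, the closed-form values named in the commit above match the regularized incomplete beta function at small parameter pairs. The (a, b) pairing below is inferred from the formulas rather than stated in the commit, so treat it as an assumption:

```math
\begin{aligned}
I_x(1, 1) &= x && \text{(uniform)} \\
I_x(2, 1) &= x^2 \\
I_x(1, 2) &= 1 - (1 - x)^2 = 2x - x^2 \\
I_x(2, 2) &= 3x^2 - 2x^3 \\
I_x\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) &= \tfrac{2}{\pi} \arcsin\sqrt{x} && \text{(arcsine)}
\end{aligned}
```

Each identity holds for 0 <= x <= 1. The domain guards clamp the result to 0 for x <= 0 and to 1 for x >= 1.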
Apply @xmatthias's review feedback (PR #13227):

- metrics.py: drop the misleading `# pragma: no cover` markers and the
  false "cannot be reached" claim on the continued-fraction underflow
  guards; leave them as honest, plainly-uncovered defensive code.
- Rename the stats key `profit_p_value` -> `p_value` to match the bare
  sibling keys (sqn, calmar, sharpe, sortino); no prefix, no plan for
  multiple p-value types.
- docs/backtesting.md: use the actual `.4g`-formatted value (0.4799)
  instead of the hand-rounded 0.48, and collapse the explanatory
  admonition (`!!!` -> `???`).
- test_optimize_reports.py: assert the exact computed p-value via
  pytest.approx instead of the weak 0 <= p <= 1 range check.
- test_metrics.py: compute the reference with scipy.stats.ttest_1samp
  directly (with an importorskip guard) rather than hard-coding values;
  drop the misleading "stay self-contained / don't introduce scipy"
  comment (scipy is in the dev/test env).
- conftest.py: give the macOS torch mock a dummy `Tensor` attribute so
  SciPy's array-API dispatch stays usable under the mock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per maintainer feedback on PR #13227:
- Replace hand-rolled betainc / t-distribution implementation with a
  one-liner: scipy.stats.ttest_1samp. Much simpler and avoids reinventing
  well-tested numerics.
- Move scipy from requirements-hyperopt.txt to requirements.txt so it is
  always installed and a missing import cannot cause runtime errors.
- Update test to derive expected p-values live via scipy rather than
  hard-coded reference numbers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
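
A minimal sketch of what the SciPy-based version described above might look like. The function name and the NaN handling for n<2 and zero variance are assumptions for illustration, not the merged code. `scipy.stats.ttest_1samp` tests against `popmean` and is two-sided by default.

```python
import math

import numpy as np
from scipy import stats


def mean_return_p_value_scipy(returns) -> float:
    """Two-sided p-value for the null hypothesis 'mean trade return = 0'."""
    values = np.asarray(returns, dtype=float)
    # Assumption: report NaN where the t-test is undefined (n < 2, zero variance).
    if values.size < 2 or values.max() == values.min():
        return math.nan
    result = stats.ttest_1samp(values, popmean=0.0)
    return float(result.pvalue)


def t_statistic(returns) -> float:
    """The t-statistic, sqrt(n) * mean / std, which has the same form as the SQN."""
    values = np.asarray(returns, dtype=float)
    return float(stats.ttest_1samp(values, popmean=0.0).statistic)
```

For the two returns `[1.0, 3.0]` the statistic is sqrt(2) * 2 / sqrt(2) = 2 with one degree of freedom, so the p-value can be checked by hand against the Cauchy tail, 1 - (2/π)·atan(2).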
The Mean profit p-value in the backtesting example was hand-computed
(0.4799). Regenerated it from the documented backtest command
(SampleStrategy on tests/testdata/config.tests.usdt.json, bybit futures,
20250701-20250801, 5m) so the figure is reproducible and verifiable:
the actual value is 0.4768. All other metrics in the example are
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enables freqUI to display this.
…pvalue

Add mean-trade-return p-value to backtest summary metrics
pull[bot] locked and limited conversation to collaborators Jun 21, 2026
pull[bot] added the ⤵️ pull label Jun 21, 2026
pull[bot] merged commit 064e67c into Uncodedtech:develop Jun 21, 2026


2 participants