[pull] develop from freqtrade:develop#1795
Merged
Freqtrade already reports the SQN, which is the t-statistic of the mean per-trade return (sqrt(n) * mean / std), but never tells the user whether that value is statistically distinguishable from zero. This adds a two-sided p-value for the null hypothesis "mean trade return = 0" so users can judge whether a backtest result represents a real edge or just noise.

- `calculate_p_value()` in data/metrics.py computes the one-sample Student's t-test p-value. The Student's t CDF is evaluated via a pure-Python regularized incomplete beta function (continued fraction, Numerical Recipes), so no SciPy dependency is added to the core backtest path (SciPy is only an optional hyperopt extra).
- The value is added to the strategy stats as `profit_p_value` and shown as "Mean profit p-value" in the SUMMARY METRICS table, right after SQN.
- Backward compatible: older stored results without the key render "N/A".
- Tests validate the p-value against scipy.stats.ttest_1samp reference values, scale invariance, and edge cases (n<2, zero variance).

Docs updated with the metric description and its i.i.d. caveat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
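The pure-Python route described in this commit (later replaced by SciPy) relies on the identity that the two-sided tail of Student's t with ν = n - 1 degrees of freedom equals the regularized incomplete beta I_x(ν/2, 1/2) evaluated at x = ν / (ν + t²). The sketch below shows that approach using the Numerical Recipes continued fraction with the modified Lentz method. It is an illustration, not the PR's code: the names `regularized_incomplete_beta`, `_beta_continued_fraction` and `t_test_p_value`, the tolerances, and returning `None` for n < 2 or zero variance are all assumptions.

```python
import math
from statistics import mean, stdev
from typing import Optional, Sequence

_FPMIN = 1e-300  # floor that keeps the Lentz recurrences away from division by ~0
_EPS = 3e-14     # relative convergence tolerance
_MAX_ITER = 300


def _beta_continued_fraction(a: float, b: float, x: float) -> float:
    """Modified Lentz evaluation of the incomplete-beta continued fraction (hypothetical helper)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= _FPMIN else _FPMIN)
    h = d
    for m in range(1, _MAX_ITER + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= _FPMIN else _FPMIN)
        c = 1.0 + aa / c
        c = c if abs(c) >= _FPMIN else _FPMIN  # underflow guard on c
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= _FPMIN else _FPMIN)
        c = 1.0 + aa / c
        c = c if abs(c) >= _FPMIN else _FPMIN  # underflow guard on c
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for a, b > 0, including the x <= 0 / x >= 1 domain guards."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges fast.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_test_p_value(returns: Sequence[float]) -> Optional[float]:
    """Two-sided one-sample t-test p-value for H0: mean per-trade return == 0."""
    n = len(returns)
    if n < 2:
        return None  # assumption: undefined for fewer than two trades
    sd = stdev(returns)
    if sd == 0.0:
        return None  # assumption: zero variance gives no finite t-statistic
    t = mean(returns) / (sd / math.sqrt(n))  # the sqrt(n) * mean / std quantity described above
    dof = n - 1
    # Two-sided Student's t tail expressed through the incomplete beta function.
    return regularized_incomplete_beta(dof / 2.0, 0.5, dof / (dof + t * t))
```

Values from this sketch can be cross-checked against `scipy.stats.ttest_1samp(returns, 0.0).pvalue`, which is what the PR's tests compare against.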
Following review feedback on the backtest report's "Mean profit p-value" metric, trim the legend entry to a crisp definition and add a plain-English note for non-statisticians: the p-value is the probability that pure luck would produce an average result at least this far from zero if the strategy had no edge (so p=0.48 means roughly a 48% chance that noise alone would produce a result this far from zero). Lower values make a fluke less likely, and the usual bar is p < 0.05. Because of the i.i.d. assumption and multiple testing, the metric flags an absence of significance rather than proving an edge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
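The "chance pure luck would produce an average at least this far from zero" reading can be made concrete by simulation. The sketch below assumes the "no edge" world is i.i.d. normal per-trade returns with mean 0 (the same assumption the t-test makes), simulates many such backtests with the same trade count, and counts how often their |t| reaches the observed one. A second helper illustrates the multiple-testing caveat, assuming k independent tests. All function names here are hypothetical.

```python
import math
import random
from typing import Sequence


def t_statistic(returns: Sequence[float]) -> float:
    """sqrt(n) * mean / sample std of the per-trade returns."""
    n = len(returns)
    mu = sum(returns) / n
    var = sum((r - mu) ** 2 for r in returns) / (n - 1)
    return mu / math.sqrt(var / n)


def simulated_p_value(observed: Sequence[float], n_sims: int = 50_000, seed: int = 1) -> float:
    """Share of zero-edge simulated backtests whose |t| is at least the observed |t|.

    Assumes i.i.d. normal returns with mean 0 as the "no edge" world. The t-statistic
    is scale-invariant, so a unit standard deviation is enough.
    """
    rng = random.Random(seed)
    n = len(observed)
    t_obs = abs(t_statistic(observed))
    hits = 0
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(noise)) >= t_obs:
            hits += 1
    return hits / n_sims


def chance_of_some_fluke(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent no-edge strategies reaches p < alpha)."""
    return 1.0 - (1.0 - alpha) ** k
```

For `[1.0, 3.0, 5.0]` the simulated share should land close to the analytic t-test p-value of about 0.12. `chance_of_some_fluke(20)` is about 0.64, which is why trying many strategies and keeping the one with p < 0.05 does not by itself demonstrate an edge.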
Codecov flagged 7 uncovered patch lines, all in the new pure-Python incomplete-beta helpers behind `calculate_p_value`. Add targeted tests:

- `_regularized_incomplete_beta` against closed-form values (uniform, x**2, 2x-x**2, 3x**2-2x**3, arcsine) plus the x<=0 / x>=1 domain guards.
- `calculate_p_value` with break-even (zero-mean) returns -> p-value 1.0, exercising the x>=1 path through the public API.
- `_beta_continued_fraction` at degenerate points that trip the Lentz underflow guards, asserting the result stays finite.

The two continued-fraction c-underflow guards cannot be reached for the bounded (a, b) argument range this routine uses (verified empirically over ~3.2M argument combinations), so they are marked `# pragma: no cover`. Changed lines are now fully covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
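The closed-form references listed above correspond to specific (a, b) pairs of the regularized incomplete beta function. The pairing below is derived here from the definition rather than stated in the commit, and it also shows why break-even returns land on the x >= 1 guard: a t-statistic of 0 maps to x = 1.

```latex
I_x(a,b) = \frac{1}{B(a,b)} \int_0^x u^{a-1} (1-u)^{b-1} \, du, \qquad
I_x(a,b) = 0 \text{ for } x \le 0, \quad I_x(a,b) = 1 \text{ for } x \ge 1

I_x(1,1) = x, \quad
I_x(2,1) = x^2, \quad
I_x(1,2) = 2x - x^2, \quad
I_x(2,2) = 3x^2 - 2x^3, \quad
I_x\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{2}{\pi} \arcsin\sqrt{x}

p = I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right), \quad \nu = n - 1
\;\Rightarrow\; t = 0 \text{ gives } p = I_1\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right) = 1
```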
Apply @xmatthias's review feedback (PR #13227):

- metrics.py: drop the misleading `# pragma: no cover` markers and the false "cannot be reached" claim on the continued-fraction underflow guards; leave them as honest, plainly uncovered defensive code.
- Rename the stats key `profit_p_value` -> `p_value` to match the bare sibling keys (sqn, calmar, sharpe, sortino); no prefix, no plan for multiple p-value types.
- docs/backtesting.md: use the actual `.4g`-formatted value (0.4799) instead of the hand-rounded 0.48, and collapse the explanatory admonition (`!!!` -> `???`).
- test_optimize_reports.py: assert the exact computed p-value via pytest.approx instead of the weak 0 <= p <= 1 range check.
- test_metrics.py: compute the reference with scipy.stats.ttest_1samp directly (with an importorskip guard) rather than hard-coding values; drop the misleading "stay self-contained / don't introduce scipy" comment (scipy is in the dev/test env).
- conftest.py: give the macOS torch mock a dummy `Tensor` attribute so SciPy's array-API dispatch stays usable under the mock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
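The key rename, the `.4g` formatting and the "N/A" fallback for older stored results combine into a small rendering rule. The sketch below is a hypothetical helper, not freqtrade's actual report code: it assumes the stats are a plain mapping under the `p_value` key, and treating NaN as "N/A" is an extra assumption.

```python
import math
from typing import Any, Mapping, Tuple

LABEL = "Mean profit p-value"


def format_p_value_row(strat_results: Mapping[str, Any]) -> Tuple[str, str]:
    """Return a (label, value) row for the SUMMARY METRICS table (hypothetical helper)."""
    p_value = strat_results.get("p_value")
    # Results stored before the metric existed have no "p_value" key at all.
    if p_value is None or (isinstance(p_value, float) and math.isnan(p_value)):
        return (LABEL, "N/A")
    # .4g keeps four significant digits, e.g. 0.476834 -> "0.4768".
    return (LABEL, f"{p_value:.4g}")
```

Reading the key with `.get()` rather than indexing is what keeps old backtest result files renderable without a migration.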
Per maintainer feedback on PR #13227:

- Replace the hand-rolled betainc / t-distribution implementation with a one-liner: scipy.stats.ttest_1samp. This is much simpler and avoids reinventing well-tested numerics.
- Move scipy from requirements-hyperopt.txt to requirements.txt so it is always installed and a missing scipy can no longer cause import errors at runtime.
- Update the test to derive expected p-values live via scipy rather than from hard-coded reference numbers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
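With SciPy as a core dependency, the calculation reduces to a single `scipy.stats.ttest_1samp` call. The sketch below assumes per-trade profit ratios arrive as a plain sequence; the function name mirrors `calculate_p_value()` from the PR, but its signature and the `None` result for fewer than two trades or zero variance are assumptions, not the PR's confirmed behavior.

```python
from typing import Optional, Sequence

import numpy as np
from scipy import stats


def calculate_p_value(profit_ratios: Sequence[float]) -> Optional[float]:
    """Two-sided p-value for H0: mean per-trade return == 0 (sketch)."""
    returns = np.asarray(profit_ratios, dtype=float)
    # Assumption: fewer than two trades or zero spread leaves no usable t-statistic.
    if returns.size < 2 or np.std(returns, ddof=1) == 0.0:
        return None
    result = stats.ttest_1samp(returns, popmean=0.0)  # two-sided by default
    return float(result.pvalue)
```

Because the t-statistic divides the mean by the standard error, multiplying every return by the same positive factor leaves the p-value unchanged, which is the scale-invariance property the tests check.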
The Mean profit p-value in the backtesting example was hand-computed (0.4799). Regenerated it from the documented backtest command (SampleStrategy on tests/testdata/config.tests.usdt.json, bybit futures, 20250701-20250801, 5m) so the figure is reproducible and verifiable: the actual value is 0.4768. All other metrics in the example are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enables freqUI to display this.
…pvalue Add mean-trade-return p-value to backtest summary metrics
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)