Skip to content

Commit 2f20dcd

Browse files
committed
fix(backtesting): aggregate summary across all walk-forward runs (#511)
Primary fix ----------- - Backtest.save() now always rebuilds backtest_summary from the current backtest_runs (via generate_backtest_summary_metrics) before writing summary.json, so the on-disk summary is always self-consistent with the per-run metrics.json files (number_of_windows, total_net_gain, number_of_trades_closed, win_rate, drawdown, ...). - Backtest.merge() no longer calls the non-existent BacktestSummaryMetrics.add() method (which silently produced a stale / single-window summary). It now rebuilds the summary from the combined runs and propagates algorithm_id. Secondary fixes (B1-B5 from the issue) -------------------------------------- - B1: per-run BacktestMetrics.total_loss is now the gross loss magnitude (sum(abs(net_gain)) over losing trades), and total_loss_percentage is total_loss / initial_unallocated as a non-negative decimal. Snapshot-based get_total_loss is kept and marked deprecated in its docstring. - B2: aggregate total_loss is sum(per-run total_loss); aggregate total_loss_percentage is sum(total_loss) / sum(initial_unallocated) - same units (positive currency / non-negative decimal). - B3, B4: documented total_growth*/average_* as legacy duplicates / cross-window aggregates in the BacktestMetrics and BacktestSummaryMetrics docstrings (no API change). - B5: _compound_percentage_returns now expects/returns decimals (e.g. 0.10 for 10%), matching the rest of the framework. The earlier whole-number-percentage assumption silently produced results off by a factor of ~100 in multi-window summaries. - create_backtest_metrics now populates trading_symbol, initial_unallocated, and backtest_date_range_name on the BacktestMetrics it returns, so per-run metrics carry the context needed for downstream aggregation. Tests ----- - Added regression test test_save_multi_window_summary_aggregates_all_runs that constructs a 3-window backtest, deliberately seeds a stale single-window summary, calls save(), and asserts that summary.json on disk reports number_of_windows=3 and the correct sums. - Added test_merge_rebuilds_summary_from_all_runs verifying the merge path no longer raises AttributeError and produces a 2-window summary. - Added test_total_loss_is_gross_loss_magnitude regression test for B1. - Updated existing test_combine.py and test_generate_backtest_summary_metrics.py fixtures/assertions for the new gross-loss semantics and decimal-percentage convention. Refs: #511
1 parent 544eaa7 commit 2f20dcd

12 files changed

Lines changed: 413 additions & 84 deletions

File tree

investing_algorithm_framework/domain/backtesting/backtest.py

Lines changed: 30 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -418,6 +418,23 @@ def save(
418418
os.makedirs(destination_run_path, exist_ok=True)
419419
br.save(destination_run_path)
420420

421+
# Always rebuild the summary from the current set of backtest runs
422+
# before writing it to disk. This guarantees that summary.json is
423+
# self-consistent with the per-run metrics.json files (e.g.
424+
# number_of_windows == number of runs, total_net_gain == sum of
425+
# per-run total_net_gain, etc.) regardless of how the in-memory
426+
# Backtest was constructed (single backtest, walk-forward,
427+
# merge(), …). See issue #511.
428+
if self.backtest_runs:
429+
per_run_metrics = [
430+
br.backtest_metrics for br in self.backtest_runs
431+
if br.backtest_metrics is not None
432+
]
433+
if per_run_metrics:
434+
self.backtest_summary = generate_backtest_summary_metrics(
435+
per_run_metrics
436+
)
437+
421438
# Save combined backtest metrics if available
422439
if self.backtest_summary:
423440
summary_file = os.path.join(
@@ -517,15 +534,22 @@ def merge(self, other: 'Backtest') -> 'Backtest':
517534
Backtest: The merged Backtest instance.
518535
"""
519536

520-
merged = Backtest()
537+
merged = Backtest(algorithm_id=self.algorithm_id)
521538
merged.backtest_runs = self.backtest_runs + other.backtest_runs
522539

523-
summary = BacktestSummaryMetrics()
524-
525-
for bt_run in merged.get_all_backtest_metrics():
526-
summary.add(bt_run)
540+
# Rebuild the summary from the full set of merged backtest runs.
541+
# `BacktestSummaryMetrics` is a plain dataclass and does not expose
542+
# an `add()` method, so the previous incremental approach raised
543+
# AttributeError and silently produced a stale / single-window
544+
# summary. See issue #511.
545+
merged_metrics = merged.get_all_backtest_metrics()
546+
if merged_metrics:
547+
merged.backtest_summary = generate_backtest_summary_metrics(
548+
merged_metrics
549+
)
550+
else:
551+
merged.backtest_summary = BacktestSummaryMetrics()
527552

528-
merged.backtest_summary = summary
529553
merged.backtest_permutation_tests = \
530554
self.backtest_permutation_tests + other.backtest_permutation_tests
531555

investing_algorithm_framework/domain/backtesting/backtest_metrics.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,21 @@ class BacktestMetrics:
2020
total return, annualized return, volatility, Sharpe ratio,
2121
and maximum drawdown.
2222
23+
.. note:: Field semantics & known duplicates (issue #511)
24+
25+
- ``total_loss`` is the **gross loss magnitude** (a non-negative
26+
number equal to ``sum(abs(net_gain))`` over losing trades).
27+
``total_loss_percentage`` is ``total_loss /
28+
initial_unallocated`` (decimal, non-negative). These fields
29+
no longer mirror ``total_net_gain`` / ``total_net_gain_pct``;
30+
see B1 in issue #511.
31+
32+
- ``total_growth`` / ``total_growth_percentage`` are
33+
numerically equivalent to ``total_net_gain`` /
34+
``total_net_gain_percentage`` for the standard
35+
mark-to-market portfolio used in backtests. They are kept as
36+
legacy aliases. See B3 in issue #511.
37+
2338
Attributes:
2439
backtest_date_range_name (str): The name of the date range
2540
used for the backtest.

investing_algorithm_framework/domain/backtesting/backtest_summary_metrics.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,30 @@ class BacktestSummaryMetrics:
1313
Represents the summarized results of a backtest,
1414
focusing on key headline performance and risk metrics.
1515
16+
.. note:: Field semantics & known duplicates (issue #511)
17+
18+
- ``total_loss`` / ``total_loss_percentage`` are gross-loss
19+
based: ``total_loss`` is ``sum(per-run gross_loss)`` (a
20+
non-negative magnitude in account currency) and
21+
``total_loss_percentage`` is ``total_loss /
22+
sum(initial_unallocated)`` (decimal). They no longer mix
23+
with net-return semantics. See B1/B2 in issue #511.
24+
25+
- ``total_growth`` / ``total_growth_percentage`` are
26+
numerically equivalent to ``total_net_gain`` /
27+
``total_net_gain_percentage`` for closed-position backtests
28+
because both are derived from the same start/end portfolio
29+
values. They are kept for backwards compatibility but should
30+
be considered legacy aliases. See B3 in issue #511.
31+
32+
- ``average_net_gain``, ``average_loss``, ``average_growth``
33+
(and their ``*_percentage`` counterparts) are time-weighted
34+
means **across windows**. For a single-window backtest they
35+
collapse to the corresponding ``total_*`` value by
36+
definition (weighted mean of one element equals that
37+
element). See B4 in issue #511. For per-trade averages use
38+
``average_trade_*`` instead.
39+
1640
Attributes:
1741
total_net_gain (float): Total net gain from the backtest.
1842
total_net_gain_percentage (float): Total net gain percentage

investing_algorithm_framework/domain/backtesting/combine_backtests.py

Lines changed: 35 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -39,28 +39,32 @@ def _compound_percentage_returns(percentages):
3939
Compound percentage returns across multiple periods.
4040
4141
For example, if period 1 has 10% return and period 2 has 5% return,
42-
the compounded return is: (1 + 0.10) * (1 + 0.05) - 1 = 15.5%
43-
NOT simply 10% + 5% = 15%
42+
the compounded return is: (1 + 0.10) * (1 + 0.05) - 1 = 0.155 = 15.5%
43+
NOT simply 0.10 + 0.05 = 0.15.
44+
45+
The framework consistently represents percentages as **decimals**
46+
(e.g. ``0.10`` for 10%), so this helper expects decimal inputs and
47+
returns a decimal. See issue #511 (B5) — earlier versions assumed
48+
whole-number percentages, which silently produced results off by a
49+
factor of ~100 once multi-window aggregation was exercised.
4450
4551
Args:
46-
percentages (List[float | None]): List of percentage returns
47-
(as whole numbers, e.g., 10 for 10%).
52+
percentages (List[float | None]): List of period returns expressed
53+
as decimals (e.g. ``0.10`` for 10%).
4854
4955
Returns:
50-
float | None: The compounded percentage return, or None if no
51-
valid percentages.
56+
float | None: The compounded return as a decimal, or ``None`` if
57+
no valid percentages.
5258
"""
5359
valid_percentages = [p for p in percentages if p is not None]
5460
if not valid_percentages:
5561
return None
5662

57-
# Convert percentages to decimals, compound, then convert back
5863
compounded = 1.0
5964
for pct in valid_percentages:
60-
compounded *= (1 + pct / 100)
65+
compounded *= (1 + pct)
6166

62-
# Convert back to percentage
63-
return (compounded - 1) * 100
67+
return compounded - 1
6468

6569

6670
def combine_backtests(backtests):
@@ -173,9 +177,13 @@ def generate_backtest_summary_metrics(
173177
b.total_net_gain for b in valid_metrics
174178
if b.total_net_gain is not None
175179
)
180+
# B1/B2 fix (issue #511): per-run ``total_loss`` is now the gross
181+
# loss magnitude, so the aggregate is simply the sum of per-run
182+
# ``total_loss`` (equivalent to ``sum(gross_loss)``). Both per-run
183+
# and aggregate use the same unit (positive currency).
176184
total_loss = sum(
177-
b.gross_loss for b in valid_metrics
178-
if b.gross_loss is not None
185+
b.total_loss for b in valid_metrics
186+
if b.total_loss is not None
179187
)
180188
total_growth = sum(
181189
b.total_growth for b in valid_metrics
@@ -184,13 +192,24 @@ def generate_backtest_summary_metrics(
184192

185193
# === PERCENTAGE RETURNS (compounded, not summed) ===
186194
# Compound returns: (1 + r1) * (1 + r2) * ... - 1
187-
# For percentages stored as whole numbers (e.g., 10 for 10%)
195+
# All percentages are stored as decimals (e.g. 0.10 for 10%).
188196
total_net_gain_percentage = _compound_percentage_returns(
189197
[b.total_net_gain_percentage for b in valid_metrics]
190198
)
191-
total_loss_percentage = _compound_percentage_returns(
192-
[b.total_loss_percentage for b in valid_metrics]
193-
)
199+
# ``total_loss`` is a non-multiplicative magnitude (it does not
200+
# compound across windows). Express the aggregate as the sum of
201+
# gross losses divided by the sum of initial capital across
202+
# windows, which keeps the unit (decimal fraction) consistent
203+
# with the per-run definition. See issue #511 (B2).
204+
total_initial_value = 0.0
205+
for b in valid_metrics:
206+
iv = getattr(b, "initial_unallocated", None)
207+
if isinstance(iv, (int, float)) and iv > 0:
208+
total_initial_value += iv
209+
if total_initial_value > 0 and total_loss is not None:
210+
total_loss_percentage = total_loss / total_initial_value
211+
else:
212+
total_loss_percentage = None
194213
total_growth_percentage = _compound_percentage_returns(
195214
[b.total_growth_percentage for b in valid_metrics]
196215
)

investing_algorithm_framework/services/metrics/generate.py

Lines changed: 24 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
get_average_monthly_return, get_average_monthly_return_winning_months, \
2222
get_average_monthly_return_losing_months, get_cumulative_return, \
2323
get_cumulative_return_series
24-
from .returns import get_total_return, get_final_value, get_total_loss, \
24+
from .returns import get_total_return, get_final_value, \
2525
get_total_growth
2626
from .sharpe_ratio import get_sharpe_ratio, get_rolling_sharpe_ratio
2727
from .sortino_ratio import get_sortino_ratio
@@ -232,6 +232,11 @@ def create_backtest_metrics(
232232
backtest_metrics = BacktestMetrics(
233233
backtest_start_date=backtest_run.backtest_start_date,
234234
backtest_end_date=backtest_run.backtest_end_date,
235+
backtest_date_range_name=(
236+
backtest_run.backtest_date_range_name or ""
237+
),
238+
trading_symbol=backtest_run.trading_symbol or "",
239+
initial_unallocated=backtest_run.initial_unallocated or 0.0,
235240
)
236241

237242
def safe_set(metric_name, func, *args, index=None):
@@ -268,11 +273,26 @@ def safe_set(metric_name, func, *args, index=None):
268273

269274
if "total_loss" in metrics or "total_loss_percentage" in metrics:
270275
try:
271-
total_loss = get_total_loss(backtest_run.portfolio_snapshots)
276+
# B1 fix (issue #511): "total_loss" is now the **gross loss**
277+
# (sum of P&L of all losing trades, expressed as a non-negative
278+
# magnitude) rather than the snapshot net-return clamped at
279+
# zero. The latter was indistinguishable from
280+
# ``total_net_gain`` whenever the period was unprofitable and
281+
# always 0 otherwise. ``total_loss_percentage`` is the gross
282+
# loss expressed as a fraction of the initial unallocated
283+
# capital (decimal, e.g. ``0.05`` for a 5% loss magnitude).
284+
gross_loss_value = get_gross_loss(backtest_run.trades)
285+
initial_value = backtest_run.initial_unallocated or 0.0
286+
272287
if "total_loss" in metrics:
273-
backtest_metrics.total_loss = total_loss[0]
288+
backtest_metrics.total_loss = gross_loss_value
274289
if "total_loss_percentage" in metrics:
275-
backtest_metrics.total_loss_percentage = total_loss[1]
290+
if initial_value > 0:
291+
backtest_metrics.total_loss_percentage = (
292+
gross_loss_value / initial_value
293+
)
294+
else:
295+
backtest_metrics.total_loss_percentage = 0.0
276296
except OperationalException as e:
277297
logger.warning(f"total_loss failed: {e}")
278298

investing_algorithm_framework/services/metrics/returns.py

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -302,10 +302,19 @@ def get_total_loss(
302302
snapshots: List[PortfolioSnapshot]
303303
) -> Tuple[float, float]:
304304
"""
305-
Calculate the total loss from portfolio snapshots.
306-
307-
The total loss is calculated as the percentage change in portfolio value
308-
from the first snapshot to the last snapshot, only if there is a loss.
305+
Legacy helper — net portfolio change clamped at zero when positive.
306+
307+
.. deprecated::
308+
This is **not** a real loss metric. It is the net portfolio
309+
return clamped to zero whenever the period was profitable, which
310+
means it is identical to ``total_net_gain`` whenever the period
311+
is unprofitable and ``0.0`` otherwise. It is no longer used to
312+
populate ``BacktestMetrics.total_loss`` — that field now holds
313+
the **gross loss** (sum of P&L of all losing trades). See issue
314+
#511 (B1). This function is kept for backwards compatibility and
315+
will be removed in a future release; prefer
316+
:func:`get_gross_loss` from
317+
``investing_algorithm_framework.services.metrics.profit_factor``.
309318
310319
Args:
311320
snapshots (List[PortfolioSnapshot]): List of portfolio snapshots.

0 commit comments

Comments
 (0)