v8.7.2
Bug fixes — backtest pipeline memory
v8.7.1 bounded in-flight tasks, but two underlying causes of OOM
remained for users with thousands of large backtests:

- `recalculate_backtests(List[Backtest])` requires the entire batch
  to be resident in the parent process before the call. With
  thousands of backtests holding portfolio snapshots, trades and
  timeseries, that alone is tens of GB.
- `ProcessPoolExecutor` reuses worker processes across tasks. Heavy
  metric calculations leave behind cached pandas / polars / numpy
  buffers that the worker's allocator never returns to the OS, so RSS
  grows over time even with bounded in-flight tasks.
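A toy sketch of the first cause (all names here are stand-ins, not framework APIs): materialising a full batch keeps every backtest-sized payload resident in the parent at once, while a streaming style holds at most one at a time.

```python
import sys

def load_backtest(i):
    # Stand-in for loading one large backtest bundle (real ones hold
    # portfolio snapshots, trades and timeseries).
    return bytearray(1_000_000)  # ~1 MB payload

def batch_api(n):
    # Mirrors the list-based style: the whole batch is built in the
    # parent process before any work starts.
    return [load_backtest(i) for i in range(n)]

def streaming_api(n):
    # Mirrors the streaming style: yield one backtest at a time so the
    # previous one can be freed before the next is loaded.
    for i in range(n):
        yield load_backtest(i)

batch = batch_api(50)  # ~50 MB resident simultaneously
peak_batch = sum(sys.getsizeof(b) for b in batch)

# Only one ~1 MB payload alive per iteration.
peak_stream = max(sys.getsizeof(b) for b in streaming_api(50))

print(peak_batch // peak_stream)  # 50: resident memory scales with batch size
```

With thousands of real backtests the same ratio turns into tens of GB, which is what the streaming APIs below avoid.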
New streaming APIs
```python
from investing_algorithm_framework import (
    recalculate_backtests_in_directory,
    iter_backtests_from_directory,
)

# Stream-recalculate every bundle on disk; the parent never holds a
# Backtest in memory.
recalculate_backtests_in_directory("./backtests", workers=4)

# Process backtests one at a time without materialising a list.
for bt in iter_backtests_from_directory("./backtests"):
    do_something(bt)
    del bt
```

`recalculate_backtests_in_directory` loads, recalculates and saves
each backtest inside a worker process. The Backtest never crosses
the process boundary; workers return only the destination path and a
small index row, used to rewrite index.parquet incrementally.
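The per-worker unit of work follows a load → recalculate → save shape. A minimal sketch of that pattern (function names, the JSON format and the toy metric are illustrative, not the framework's internals; the real code runs this inside `ProcessPoolExecutor` workers):

```python
import json
import os
import tempfile

def recalculate_one(src_path, dst_dir):
    """Load -> recalculate -> save one bundle; return only small values."""
    with open(src_path) as f:
        bundle = json.load(f)  # stand-in for loading a Backtest bundle
    # Toy metric recalculation standing in for the heavy metric pass.
    bundle["total_return"] = bundle["final"] / bundle["initial"] - 1
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    with open(dst_path, "w") as f:
        json.dump(bundle, f)  # saved inside the "worker", not the parent
    # Only this path and tiny row cross back to the parent, which uses
    # them to rebuild the index incrementally.
    return dst_path, {"name": bundle["name"],
                      "total_return": bundle["total_return"]}

# Demo: three small bundles on disk, processed one at a time.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(src, f"bt{i}.json"), "w") as f:
        json.dump({"name": f"bt{i}", "initial": 100.0, "final": 100.0 + i}, f)

index_rows = []
for name in sorted(os.listdir(src)):
    path, row = recalculate_one(os.path.join(src, name), dst)
    index_rows.append(row)  # incremental index, never the full Backtest

print(len(index_rows))  # 3
```

Because only the path and index row are pickled back, the parent's memory use is independent of backtest size.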
Worker recycling
`max_tasks_per_child=16` was added to:

- `recalculate_backtests`
- `recalculate_backtests_in_directory`
- `load_backtests_from_directory`
- `migrate_backtests`
On Python 3.11+ this uses the native `ProcessPoolExecutor` parameter;
on 3.10 it is emulated by closing and re-opening the executor every
`max_tasks_per_child × workers` completions. This keeps long-running
batches' RSS bounded.
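The 3.10 fallback can be sketched as follows (assumed structure, not the framework's exact code). The demo uses `ThreadPoolExecutor` so the sketch runs anywhere without pickling concerns; the real code uses `ProcessPoolExecutor`, where tearing the pool down is what actually returns worker RSS to the OS:

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_recycling(fn, items, workers=2, max_tasks_per_child=4,
                       executor_cls=ThreadPoolExecutor):
    # Each executor instance handles at most this many tasks, then is
    # closed and replaced, emulating max_tasks_per_child on 3.10.
    batch_size = max_tasks_per_child * workers
    results, recycles = [], 0
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        with executor_cls(max_workers=workers) as ex:  # fresh pool per chunk
            results.extend(ex.map(fn, chunk))
        recycles += 1  # pool (and, for processes, its RSS) released here
    return results, recycles

results, recycles = run_with_recycling(lambda x: x * x, list(range(20)))
print(recycles)  # 20 items / (4 * 2) per pool -> 3 pools created
```

The trade-off is the cost of re-spawning workers every `max_tasks_per_child × workers` tasks, which is small relative to heavy metric calculations.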