
v8.7.2


@MDUYN released this 05 May 21:00
· 24 commits to main since this release

Bug fixes — backtest pipeline memory

v8.7.1 bounded the number of in-flight tasks, but two underlying
causes of OOM remained for users with thousands of large backtests:

  1. recalculate_backtests(List[Backtest]) requires the entire batch
    to be resident in the parent process before the call. With
    thousands of backtests holding portfolio snapshots, trades and
    timeseries, that alone is tens of GB.
  2. ProcessPoolExecutor reuses worker processes across tasks. Heavy
    metric calculations leave behind cached pandas / polars / numpy
    buffers the worker's allocator never returns to the OS, so RSS
    grows over time even with bounded in-flight tasks.
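Cause 1 is the familiar eager-list pattern. A toy illustration in plain Python (stand-in objects, not the framework's types) of why streaming the items one at a time changes peak memory:

```python
def make_items(n):
    """Yield one large stand-in object at a time. Each item can be
    garbage-collected before the next is created, so peak memory stays
    at roughly one item instead of the whole batch."""
    for _ in range(n):
        yield [0] * 1_000  # stand-in for one backtest's snapshots/trades

eager = [[0] * 1_000 for _ in range(4)]  # all four resident at once
lazy = make_items(4)                     # nothing built yet

total = sum(len(item) for item in lazy)  # consumes one item at a time
```

With thousands of real backtests instead of toy lists, the difference between the two is the difference between tens of GB and a single bundle's footprint.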

New streaming APIs

from investing_algorithm_framework import (
    recalculate_backtests_in_directory,
    iter_backtests_from_directory,
)

# Stream-recalculate every bundle on disk; the parent never holds a
# Backtest in memory.
recalculate_backtests_in_directory("./backtests", workers=4)

# Process backtests one at a time without materialising a list.
for bt in iter_backtests_from_directory("./backtests"):
    do_something(bt)
    del bt

recalculate_backtests_in_directory loads, recalculates and saves
each backtest inside a worker process. The Backtest never crosses
the process boundary — workers return only the destination path and a
small index row, used to rewrite index.parquet incrementally.
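The pattern can be sketched in isolation. This is a toy stand-in (hypothetical helper names, not the framework's internals): the worker performs load, recalculate and save itself and hands back only a small index row, so no large object ever crosses the process boundary.

```python
from concurrent.futures import ProcessPoolExecutor


def _recalculate_one(path):
    # Stand-in for: load the bundle at `path`, recompute metrics,
    # and save it back to disk, all inside this worker process.
    heavy = [0] * 10_000                       # pretend loaded Backtest
    return {"path": path, "size": len(heavy)}  # small index row only


def stream_recalculate(paths, workers=2):
    rows = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for row in pool.map(_recalculate_one, paths):
            rows.append(row)  # an incremental index rewrite would go here
    return rows
```

The parent's memory use is proportional to the index rows, not to the backtests themselves.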

Worker recycling

max_tasks_per_child=16 was added to:

  • recalculate_backtests
  • recalculate_backtests_in_directory
  • load_backtests_from_directory
  • migrate_backtests

On Python 3.11+ this uses the native ProcessPoolExecutor parameter;
on 3.10 it is emulated by closing and re-opening the executor every
max_tasks_per_child × workers completions. Either way, worker
processes are replaced before their RSS can grow without bound, which
keeps memory flat across long-running batches.