Commit b84cce7

perf(backtesting): 3x faster Backtest.open + dashboard bundle support
- Add domain/datetime_parsing.parse_datetime fast helper using datetime.fromisoformat with dateutil fallback. The dateutil parser was 88% of Backtest.open time per profiling.
- Use the fast helper in PortfolioSnapshot, Order, Trade, TradeStopLoss, TradeTakeProfit hot from_dict paths. Backtest.open: 1025ms -> 334ms on the real example batch (3x faster).
- BacktestReport.open() now discovers .iafbt bundle files in addition to legacy directory layouts, and accepts an optional workers= kwarg for parallel loading of large batches.
- Document the .iafbt bundle format in README and Getting Started docs as the framework's custom optimized backtest persistence format (~21x smaller, ~27x fewer files, ~3x faster to load).
1 parent 0c56940 commit b84cce7

10 files changed

Lines changed: 127 additions & 23 deletions

README.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -86,6 +86,7 @@ This framework is built around the full loop: **create strategies → vector bac
8686
- 🎯 **Return Scenario Projections** — Good, average, bad & very bad year projections from backtest data
8787
- 📉 **Benchmark Comparison** — Beat-rate analysis vs Buy & Hold, DCA, risk-free & custom benchmarks
8888
- 📄 **One-Click HTML Report** — Self-contained file, no server, dark & light theme, shareable
89+
- 📦 **Custom `.iafbt` Bundle Format** — Optimized binary persistence (zstd + MessagePack) for backtest results: ~21× smaller and ~27× fewer files than the legacy directory layout, with parallel I/O for fast load/save of large batches
8990
- 🌐 **Load External Data** — Fetch CSV, JSON, or Parquet from any URL with caching and auto-refresh
9091
-**[Record Custom Variables](https://coding-kitties.github.io/investing-algorithm-framework/Advanced%20Concepts/recording-variables)** — Track any indicator or metric during backtests with `context.record()`
9192
- 🚀 **Build → Backtest → Deploy** — Local dev, cloud deploy (AWS / Azure), or monetize on Finterion

docusaurus/docs/Getting Started/backtest-reports.md

Lines changed: 11 additions & 1 deletion
@@ -60,7 +60,17 @@ report = BacktestReport.open(directory_path="./my_backtests")
6060
report.show()
6161
```
6262

63-
The `open()` method recursively finds all valid backtest directories (containing `algorithm_id.json` and a `runs/` folder) and loads them into a single report.
63+
The `open()` method recursively finds all valid backtest directories (containing `algorithm_id.json` and a `runs/` folder) **and** any `.iafbt` bundle files, and loads them into a single report.
64+
65+
:::tip Optimized `.iafbt` bundle format
66+
Backtests are saved by default in the framework's custom **`.iafbt` bundle format** — a single binary file per backtest combining zstd compression and MessagePack encoding. It is purpose-built for backtest reports: ~21× smaller and ~27× fewer files than the legacy directory format, and `BacktestReport.open()` loads it ~3× faster. The legacy directory format is still fully supported for backwards compatibility, and you can mix both in the same folder.
67+
68+
For very large batches, opt into parallel loading:
69+
70+
```python
71+
report = BacktestReport.open(directory_path="./my_backtests", workers=4)
72+
```
73+
:::
6474

6575
You can also combine disk and in-memory backtests:
6676

docusaurus/docs/Getting Started/backtesting.md

Lines changed: 4 additions & 0 deletions
@@ -85,6 +85,10 @@ from investing_algorithm_framework import load_backtests_from_directory
8585
backtests = load_backtests_from_directory("./my_backtests")
8686
```
8787

88+
:::info `.iafbt` bundle format
89+
Backtests are persisted in the framework's optimized **`.iafbt` bundle format** — a single binary file per backtest using zstd compression + MessagePack encoding. Compared to the legacy directory layout it is ~21× smaller, ~27× fewer files, and ~3× faster to load. Both `save_backtests_to_directory` and `load_backtests_from_directory` support parallel I/O via `workers=N`. Existing legacy directories keep working transparently; use `iaf migrate-backtests --src ... --dst ...` to convert them.
90+
:::
91+
8892
## Reporting
8993

9094
Use [Backtest Reports](/docs/Getting%20Started/backtest-reports) to turn

investing_algorithm_framework/app/reporting/backtest_report.py

Lines changed: 53 additions & 7 deletions
@@ -12,7 +12,7 @@
1212
from jinja2 import Environment, FileSystemLoader
1313

1414
from investing_algorithm_framework.domain import (
15-
Backtest, OperationalException, tqdm
15+
Backtest, OperationalException, tqdm, BUNDLE_EXT
1616
)
1717

1818
logger = logging.getLogger("investing_algorithm_framework")
@@ -217,9 +217,15 @@ def save(self, path):
217217

218218
@staticmethod
219219
def _is_backtest(backtest_path):
220+
if not os.path.exists(backtest_path):
221+
return False
222+
# Bundle file (.iafbt)
223+
if os.path.isfile(backtest_path) and \
224+
backtest_path.endswith(BUNDLE_EXT):
225+
return True
226+
# Legacy directory layout
220227
return (
221-
os.path.exists(backtest_path)
222-
and os.path.isdir(backtest_path)
228+
os.path.isdir(backtest_path)
223229
and os.path.isfile(
224230
os.path.join(backtest_path, "algorithm_id.json")
225231
)
@@ -231,6 +237,7 @@ def open(
231237
backtests: List[Backtest] = None,
232238
directory_path: Union[str, List[str], None] = None,
233239
show_progress: bool = False,
240+
workers: Union[int, None] = None,
234241
) -> "BacktestReport":
235242
loaded = []
236243
source_tags = []
@@ -254,7 +261,7 @@ def open(
254261
if BacktestReport._is_backtest(dp):
255262
backtest_paths.append((dp, tag))
256263
else:
257-
for root, dirs, _ in os.walk(dp):
264+
for root, dirs, files in os.walk(dp):
258265
for dir_name in dirs:
259266
subdir = os.path.join(
260267
root, dir_name
@@ -265,6 +272,11 @@ def open(
265272
backtest_paths.append(
266273
(subdir, tag)
267274
)
275+
for file_name in files:
276+
if file_name.endswith(BUNDLE_EXT):
277+
backtest_paths.append(
278+
(os.path.join(root, file_name), tag)
279+
)
268280

269281
iterator = backtest_paths
270282
if show_progress:
@@ -273,9 +285,43 @@ def open(
273285
desc="Loading backtests",
274286
)
275287

276-
for path, tag in iterator:
277-
loaded.append(Backtest.open(path))
278-
source_tags.append(tag)
288+
# Parallel load is opt-in (workers > 1). ProcessPoolExecutor
289+
# startup costs typically dwarf the per-backtest decode for
290+
# batches < ~30, so keep default behaviour serial. Pass
291+
# ``workers=N`` (or ``-1`` for cpu_count) to load large
292+
# batches in parallel.
293+
from investing_algorithm_framework.domain.backtesting.\
294+
backtest_utils import (
295+
_load_one_dispatch, _resolve_workers,
296+
)
297+
298+
n = len(backtest_paths)
299+
resolved_workers = (
300+
_resolve_workers(workers) if workers is not None else 1
301+
)
302+
if resolved_workers > 1 and n >= 4:
303+
from concurrent.futures import ProcessPoolExecutor
304+
305+
items = [
306+
(path, "bundle" if path.endswith(BUNDLE_EXT) else "dir")
307+
for path, _ in backtest_paths
308+
]
309+
with ProcessPoolExecutor(max_workers=resolved_workers) as ex:
310+
results = list(
311+
tqdm(
312+
ex.map(_load_one_dispatch, items),
313+
total=n,
314+
desc="Loading backtests",
315+
disable=not show_progress,
316+
)
317+
)
318+
for bt, (_, tag) in zip(results, backtest_paths):
319+
loaded.append(bt)
320+
source_tags.append(tag)
321+
else:
322+
for path, tag in iterator:
323+
loaded.append(Backtest.open(path))
324+
source_tags.append(tag)
279325

280326
for bt in backtests:
281327
if not isinstance(bt, Backtest):
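
The opt-in gate in this hunk (serial by default, parallel only when `workers > 1` and the batch has at least 4 items) relies on `Executor.map` returning results in input order, so the results can be zipped back onto their source tags. A minimal sketch of that pattern follows. It swaps in `ThreadPoolExecutor` for the commit's `ProcessPoolExecutor` so the snippet runs anywhere. `resolve_workers_sketch` is a hypothetical stand-in for the private `_resolve_workers` helper, whose real behaviour is not shown in this diff. `load_all` and `fake_load` are likewise made up for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MIN_PARALLEL_BATCH = 4  # mirrors the `n >= 4` gate in the diff


def resolve_workers_sketch(workers):
    """Hypothetical resolver: None -> 1, -1 -> CPU count, else >= 1."""
    if workers is None:
        return 1
    if workers == -1:
        return os.cpu_count() or 1
    return max(1, int(workers))


def load_all(paths_with_tags, load_one, workers=None):
    """Return (loaded, tags) in input order; serial unless opted in."""
    resolved = resolve_workers_sketch(workers)
    paths = [path for path, _ in paths_with_tags]
    if resolved > 1 and len(paths) >= MIN_PARALLEL_BATCH:
        # Executor.map yields results in the order of its inputs.
        with ThreadPoolExecutor(max_workers=resolved) as ex:
            results = list(ex.map(load_one, paths))
    else:
        results = [load_one(path) for path in paths]
    return results, [tag for _, tag in paths_with_tags]


def fake_load(path):
    """Stand-in for Backtest.open: just upper-cases the path."""
    return path.upper()


batch = [(f"bt_{i}.iafbt", "src_a") for i in range(6)]
serial, tags_serial = load_all(batch, fake_load)
parallel, tags_parallel = load_all(batch, fake_load, workers=3)
```

With a process pool the loader must also be picklable, which is presumably why the commit dispatches through a module-level function rather than a lambda.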
investing_algorithm_framework/domain/datetime_parsing.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
"""Fast ISO-8601 datetime parsing helper.
2+
3+
``dateutil.parser.parse`` is the bottleneck in :class:`Backtest` loading
4+
(see issue #487 profiling notes). The strings emitted by :py:meth:`Backtest.to_dict`
5+
are always produced by :py:meth:`datetime.isoformat`, so the standard-library
6+
:py:meth:`datetime.fromisoformat` parser handles them ~50x faster.
7+
8+
This helper:
9+
10+
- Returns ``None`` for falsy / ``None`` values.
11+
- Passes already-parsed :class:`datetime.datetime` objects through unchanged.
12+
- Tries :py:meth:`datetime.fromisoformat` first (fast path).
13+
- Falls back to ``dateutil.parser.parse`` for anything exotic.
14+
"""
15+
from datetime import datetime
16+
17+
18+
def parse_datetime(value):
19+
if value is None or value == "":
20+
return None
21+
if isinstance(value, datetime):
22+
return value
23+
try:
24+
return datetime.fromisoformat(value)
25+
except (TypeError, ValueError):
26+
pass
27+
from dateutil.parser import parse as _du_parse
28+
return _du_parse(value)
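
The fast path works because `datetime.fromisoformat` reliably parses what `datetime.isoformat` emits, and raises `ValueError` on other formats so the fallback can take over. The sketch below restates the helper using only the standard library. The `fallback` parameter and `strptime_fallback` are hypothetical stand-ins for `dateutil.parser.parse`, used so the snippet has no third-party dependency.

```python
from datetime import datetime, timezone


def parse_datetime_sketch(value, fallback=None):
    """Fast ISO path first, then an optional slower fallback parser."""
    if value is None or value == "":
        return None
    if isinstance(value, datetime):
        return value
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        if fallback is None:
            raise
        return fallback(value)


def strptime_fallback(text):
    """Hypothetical fallback for one non-ISO layout (day/month/year)."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M")


aware = datetime(2024, 3, 5, 10, 30, 15, 123456, tzinfo=timezone.utc)
naive = datetime(2024, 3, 5, 10, 30)

# isoformat() output round-trips through the fast path unchanged.
round_trip_aware = parse_datetime_sketch(aware.isoformat())
round_trip_naive = parse_datetime_sketch(naive.isoformat())

# A non-ISO string fails the fast path and reaches the fallback.
exotic = parse_datetime_sketch("05/03/2024 10:30", fallback=strptime_fallback)
```

Timezone awareness survives the round trip: an aware value comes back aware and a naive value comes back naive.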

investing_algorithm_framework/domain/models/order/order.py

Lines changed: 5 additions & 2 deletions
@@ -2,6 +2,9 @@
22
from datetime import datetime, timezone
33

44
from dateutil.parser import parse
5+
from investing_algorithm_framework.domain.datetime_parsing import (
6+
parse_datetime as _parse_dt,
7+
)
58

69
from investing_algorithm_framework.domain.exceptions import \
710
OperationalException
@@ -280,10 +283,10 @@ def from_dict(data: dict):
280283
trading_symbol = data.get("trading_symbol", None)
281284

282285
if created_at is not None:
283-
created_at = parse(created_at)
286+
created_at = _parse_dt(created_at)
284287

285288
if updated_at is not None:
286-
updated_at = parse(updated_at)
289+
updated_at = _parse_dt(updated_at)
287290

288291
order = Order(
289292
id=data.get("id", None),

investing_algorithm_framework/domain/models/portfolio/portfolio_snapshot.py

Lines changed: 5 additions & 2 deletions
@@ -3,6 +3,9 @@
33
from dateutil import parser
44

55
from investing_algorithm_framework.domain.models.base_model import BaseModel
6+
from investing_algorithm_framework.domain.datetime_parsing import (
7+
parse_datetime as _parse_dt,
8+
)
69

710

811
class PortfolioSnapshot(BaseModel):
@@ -36,7 +39,7 @@ def __init__(
3639
self.metadata = metadata if metadata is not None else {}
3740

3841
if created_at is not None and isinstance(created_at, str):
39-
self.created_at = parser.parse(created_at)
42+
self.created_at = _parse_dt(created_at)
4043
else:
4144
self.created_at = created_at
4245

@@ -185,7 +188,7 @@ def from_dict(data):
185188
PortfolioSnapshot: An instance of PortfolioSnapshot.
186189
"""
187190
created_at_str = data.get("created_at")
188-
created_at = parser.parse(created_at_str)
191+
created_at = _parse_dt(created_at_str)
189192

190193
# Ensure created_at is timezone aware
191194
created_at = created_at.replace(tzinfo=timezone.utc)
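
One detail worth keeping in mind when reading the `replace(tzinfo=timezone.utc)` line above: `replace` only relabels the timezone, while `astimezone` converts the instant. For naive timestamps meant as UTC the two give the same result. For a string that already carries a non-UTC offset they differ. The sketch below uses made-up timestamps, and the diff does not show whether stored values ever carry such offsets.

```python
from datetime import datetime, timedelta, timezone

# An aware timestamp with a +02:00 offset, parsed via the fast path.
parsed = datetime.fromisoformat("2024-03-05T12:00:00+02:00")

# replace() keeps the wall-clock time and swaps the label: 12:00 UTC.
relabelled = parsed.replace(tzinfo=timezone.utc)

# astimezone() keeps the instant and shifts the wall clock: 10:00 UTC.
converted = parsed.astimezone(timezone.utc)

# For a naive value, replace() is the usual way to declare it UTC.
naive = datetime.fromisoformat("2024-03-05T12:00:00")
naive_as_utc = naive.replace(tzinfo=timezone.utc)
```

If the serialized values are always naive or already UTC, the existing `replace` call is equivalent to a conversion.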

investing_algorithm_framework/domain/models/trade/trade.py

Lines changed: 6 additions & 3 deletions
@@ -1,4 +1,7 @@
11
from dateutil.parser import parse
2+
from investing_algorithm_framework.domain.datetime_parsing import (
3+
parse_datetime as _parse_dt,
4+
)
25
from datetime import timezone
36

47
from investing_algorithm_framework.domain.models.base_model import BaseModel
@@ -318,13 +321,13 @@ def from_dict(data):
318321
orders = None
319322

320323
if "opened_at" in data and data["opened_at"] is not None:
321-
opened_at = parse(data["opened_at"])
324+
opened_at = _parse_dt(data["opened_at"])
322325

323326
if "closed_at" in data and data["closed_at"] is not None:
324-
closed_at = parse(data["closed_at"])
327+
closed_at = _parse_dt(data["closed_at"])
325328

326329
if "updated_at" in data and data["updated_at"] is not None:
327-
updated_at = parse(data["updated_at"])
330+
updated_at = _parse_dt(data["updated_at"])
328331

329332
if "stop_losses" in data and data["stop_losses"] is not None:
330333
stop_losses = [

investing_algorithm_framework/domain/models/trade/trade_stop_loss.py

Lines changed: 7 additions & 4 deletions
@@ -1,4 +1,7 @@
11
from dateutil.parser import parse
2+
from investing_algorithm_framework.domain.datetime_parsing import (
3+
parse_datetime as _parse_dt,
4+
)
25
from datetime import timezone
36
from datetime import datetime
47

@@ -273,13 +276,13 @@ def ensure_iso(value):
273276

274277
@staticmethod
275278
def from_dict(data: dict):
276-
created_at = parse(data["created_at"]) \
279+
created_at = _parse_dt(data["created_at"]) \
277280
if data.get("created_at") is not None else None
278-
updated_at = parse(data["updated_at"]) \
281+
updated_at = _parse_dt(data["updated_at"]) \
279282
if data.get("updated_at") is not None else None
280-
triggered_at = parse(data["triggered_at"]) \
283+
triggered_at = _parse_dt(data["triggered_at"]) \
281284
if data.get("triggered_at") is not None else None
282-
high_water_mark_date = parse(data.get("high_water_mark_date")) \
285+
high_water_mark_date = _parse_dt(data.get("high_water_mark_date")) \
283286
if data.get("high_water_mark_date") is not None else None
284287

285288
# Make sure all the dates are timezone utc aware

investing_algorithm_framework/domain/models/trade/trade_take_profit.py

Lines changed: 7 additions & 4 deletions
@@ -1,5 +1,8 @@
11
from datetime import timezone, datetime
22
from dateutil.parser import parse
3+
from investing_algorithm_framework.domain.datetime_parsing import (
4+
parse_datetime as _parse_dt,
5+
)
36

47
from investing_algorithm_framework.domain.models.base_model import BaseModel
58

@@ -307,13 +310,13 @@ def ensure_iso(value):
307310

308311
@staticmethod
309312
def from_dict(data: dict):
310-
created_at = parse(data["created_at"]) \
313+
created_at = _parse_dt(data["created_at"]) \
311314
if data.get("created_at") is not None else None
312-
updated_at = parse(data["updated_at"]) \
315+
updated_at = _parse_dt(data["updated_at"]) \
313316
if data.get("updated_at") is not None else None
314-
triggered_at = parse(data["triggered_at"]) \
317+
triggered_at = _parse_dt(data["triggered_at"]) \
315318
if data.get("triggered_at") is not None else None
316-
high_water_mark_date = parse(data.get("high_water_mark_date")) \
319+
high_water_mark_date = _parse_dt(data.get("high_water_mark_date")) \
317320
if data.get("high_water_mark_date") is not None else None
318321

319322
# Make sure all the dates are timezone utc aware
