
Commit a0751e9

docs(site): add Backtest Storage Layer page + report-scaling guidance
New 'Getting Started/Backtest Storage Layer' page covering:

- mental model (Tier-1 SQLite / Tier-2 Parquet / canonical .iafbt / Tier-3 content-addressed OHLCV)
- the BacktestStore protocol and when to pick LocalDirStore vs LocalTieredStore
- the canonical 5-step developer workflow: run sweep -> build index -> filter/rank in SQLite -> materialise winners -> render report
- 'Avoid overloading your report.html': size-vs-bundle table, the BacktestReport.open(directory_path=...) anti-pattern, rules of thumb for narrow vs mega-reports
- pointers to examples/storage_layer_demo and the migrate-store CLI

Wired into the Getting Started sidebar between backtest-reports and deployment, and added a tip block on backtest-reports.md pointing to it for users with thousands of backtests.
1 parent f823991 commit a0751e9

3 files changed

Lines changed: 241 additions & 42 deletions

File tree

docusaurus/docs/Getting Started/backtest-reports.md

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ sidebar_position: 10
 
 The framework generates self-contained HTML dashboard reports for analyzing backtest results. Reports work for both single- and multi-strategy backtests, with no external dependencies required.
 
+:::tip Working with hundreds or thousands of backtests?
+A `BacktestReport` inlines every backtest into a single HTML file, which becomes too heavy for a browser past a few dozen backtests. Use the [Backtest Storage Layer](./backtest-storage.md) to filter your collection down in SQLite (in under 100 ms) and render reports only over the winners.
+:::
+
 ## Quick Start
 
 ```python

docusaurus/docs/Getting Started/backtest-storage.md

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
---
sidebar_position: 11
---

# Backtest Storage Layer

Once you start sweeping parameter grids and walk-forward windows, you quickly end up with **hundreds or thousands of backtests on disk**. A flat folder of `.iafbt` bundles works well for a few dozen, but it stops scaling once you want to compare them all in a single HTML dashboard. Every comparison re-decodes multi-MB Parquet metric blobs just to read a Sharpe ratio, and the resulting `report.html` becomes too heavy for a browser to open.

The **backtest storage layer** is the framework's answer to that. It separates *where bundles live* from *how you query them*, and it gives you the tools to keep your dashboards fast even when the backing collection grows into the thousands.

## Mental model

```
┌─────────────────────────────────────────────┐
│ Tier-1: SQLite index (index.sqlite)         │
│   - one row per .iafbt                      │
│   - all summary metrics promoted to columns │
│   - sub-100 ms ranks / filters over 10k+    │
└─────────────────────────────────────────────┘
                       │ derived from
                       ▼
┌─────────────────────────────────────────────┐
│ Tier-2: Parquet sidecars (analytics-ready)  │
│   - hive-partitioned on run_id              │
│   - portfolio_snapshots / trades / orders   │
└─────────────────────────────────────────────┘
                       │ derived from
                       ▼
┌─────────────────────────────────────────────┐
│ CANONICAL: .iafbt bundles                   │
│   - the single source of truth              │
│   - everything else can be rebuilt from it  │
└─────────────────────────────────────────────┘
                       │ references
                       ▼
┌─────────────────────────────────────────────┐
│ Tier-3: content-addressed OHLCV chunks      │
│   - <sha256>.parquet, deduped across all    │
│     bundles that reference the same data    │
└─────────────────────────────────────────────┘
```

The `.iafbt` bundle is **canonical**. The SQLite index, the Tier-2 Parquet sidecars and the Tier-3 OHLCV chunks are all *derived*. They can be rebuilt from the bundles at any time, and they are best-effort, so a malformed sidecar never blocks a read or write against the bundle.
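
The sketch below illustrates the Tier-3 idea of content-addressed storage, using only the standard library. Each chunk is named after the SHA-256 digest of its bytes, so writing identical OHLCV data twice stores it only once. The `put_chunk` helper and the flat `<sha256>.parquet` directory layout are assumptions made for illustration; this is not the framework's actual Tier-3 implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def put_chunk(chunk_dir: Path, data: bytes) -> str:
    """Store `data` under its SHA-256 digest and return the digest.

    Hypothetical helper: identical bytes always map to the same file name,
    so writing the same OHLCV payload a second time is a no-op.
    """
    digest = hashlib.sha256(data).hexdigest()
    path = chunk_dir / f"{digest}.parquet"
    if not path.exists():
        chunk_dir.mkdir(parents=True, exist_ok=True)
        # Write under a temporary name, then rename, so readers should
        # not observe a half-written chunk.
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)
    return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        chunks = Path(root) / "ohlcv"
        # Placeholder bytes stand in for real Parquet-encoded candles.
        first = put_chunk(chunks, b"BTC/EUR:1h candles")
        second = put_chunk(chunks, b"BTC/EUR:1h candles")
        print(first == second, len(list(chunks.iterdir())))  # True 1
```

Because the file name is derived from the content, deduplication needs no central registry. Two bundles that reference the same data simply point at the same digest.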

## The `BacktestStore` protocol

Two concrete implementations ship today, both exposing the same API:

| Store | Layout | Best for |
|---|---|---|
| `LocalDirStore` | Flat folder of `.iafbt` files (plus `index.sqlite`) | Most users; simple to inspect, fast `ls` |
| `LocalTieredStore` | Full Tier-1/2/3 layout | Large collections, OHLCV dedup, analytics workflows |

The two are drop-in interchangeable, so you can swap the implementation without touching call sites:

```python
from investing_algorithm_framework.services.backtest_store import (
    LocalDirStore,
)
from investing_algorithm_framework.services.backtest_store.local_tiered_store import (
    LocalTieredStore,
)

store = LocalDirStore("./my-backtests/")
# store = LocalTieredStore("./my-backtests/")  # same API

len(store)                           # how many bundles?
"momentum_v1.iafbt" in store         # does this bundle exist?
bt = store.open("momentum_v1.iafbt")
for handle in store.iter_handles():
    ...  # e.g. open or inspect each stored bundle
```
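
The calls above (`len()`, `in`, `open()`, `iter_handles()`) define the surface that both stores share. That surface can be written down as a `typing.Protocol`. The sketch below is inferred from those calls only, so the framework's real protocol may define more methods or different type annotations. `BacktestStoreLike` and `InMemoryStore` are hypothetical names. The toy store shows that any object with these methods can stand in at a call site.

```python
from typing import Any, Iterator, Protocol, runtime_checkable


@runtime_checkable
class BacktestStoreLike(Protocol):
    """Hypothetical protocol inferred from the calls shown above."""

    def __len__(self) -> int: ...
    def __contains__(self, handle: object) -> bool: ...
    def open(self, handle: str) -> Any: ...
    def iter_handles(self) -> Iterator[str]: ...


class InMemoryStore:
    """Toy store mapping handles (e.g. 'momentum_v1.iafbt') to objects."""

    def __init__(self, bundles: dict[str, Any]) -> None:
        self._bundles = dict(bundles)

    def __len__(self) -> int:
        return len(self._bundles)

    def __contains__(self, handle: object) -> bool:
        return handle in self._bundles

    def open(self, handle: str) -> Any:
        return self._bundles[handle]

    def iter_handles(self) -> Iterator[str]:
        return iter(sorted(self._bundles))


def count_bundles(store: BacktestStoreLike) -> int:
    # Call sites depend only on the protocol, so any conforming
    # implementation can be swapped in.
    return sum(1 for _ in store.iter_handles())
```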

## The normal developer workflow

Below is the canonical loop most users will run. The **same five steps** hold whether you have 10 backtests or 10,000; you just lean harder on the index as the collection grows.

### 1. Run a sweep and persist the bundles

```python
# `app`, StrategyA/B/C and the range_20xx date ranges are assumed to be
# defined earlier in your script.
backtests = app.run_vector_backtests(
    strategies=[StrategyA(), StrategyB(), StrategyC()],
    backtest_date_ranges=[range_2022, range_2023, range_2024],
    n_workers=-1,
    backtest_storage_directory="./my-backtests/",  # writes .iafbt here
    show_progress=True,
)
```

After this you have a folder of `.iafbt` bundles on disk. That folder is *the* artifact, and everything downstream operates on it.

### 2. Build the Tier-1 index

```bash
iaf index ./my-backtests/
```

Or from Python:

```python
from investing_algorithm_framework.cli.index_command import build_index

build_index("./my-backtests/")
```

This walks the folder once and writes `index.sqlite`, with every scalar field of `BacktestSummaryMetrics` promoted to its own column. The command is **idempotent**, so you can re-run it any time after adding new bundles.
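
Here, idempotent means a re-run converges to the same index instead of appending duplicate rows. The sketch below shows that property with Python's built-in `sqlite3` module. The `bundles` table, its four columns and the `index_rows` helper are hypothetical. The real schema written by `build_index` is wider and may use different names.

```python
import sqlite3


def index_rows(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Write one row per bundle, keyed on its path.

    Hypothetical schema: `INSERT OR REPLACE` on the primary key means
    re-running with the same (or updated) rows never creates duplicates.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundles ("
        " bundle_path TEXT PRIMARY KEY,"
        " algorithm_id TEXT,"
        " summary_sharpe_ratio REAL,"
        " summary_number_of_trades INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO bundles VALUES ("
        ":bundle_path, :algorithm_id,"
        " :summary_sharpe_ratio, :summary_number_of_trades)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [
        {"bundle_path": "a.iafbt", "algorithm_id": "a",
         "summary_sharpe_ratio": 1.2, "summary_number_of_trades": 80},
    ]
    index_rows(conn, rows)
    index_rows(conn, rows)  # second run: same table, no duplicate row
    print(conn.execute("SELECT COUNT(*) FROM bundles").fetchone()[0])  # 1
```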

### 3. Filter / rank in SQLite (no bundles opened)

The point of the index is that **you never need to decode a Parquet metric blob just to choose which backtests are interesting**. Instead, pick winners with a SQL `WHERE` clause:

```python
from investing_algorithm_framework.cli.index_command import (
    list_index, rank_index,
)

# Top 20 by Sharpe, but only among bundles with more than 50 trades.
top = rank_index(
    "./my-backtests/",
    by="sharpe_ratio",
    where="summary_number_of_trades > 50",
    limit=20,
)

for r in top:
    print(r["algorithm_id"], r["summary_sharpe_ratio"])
```

Or from the shell:

```bash
iaf rank ./my-backtests/ \
    --by sharpe_ratio \
    --where "summary_number_of_trades > 50" -n 20
iaf list ./my-backtests/ --sort calmar_ratio --json
```

This step takes **under 100 ms** even over 10k+ bundles. There are no Parquet reads, no decompression and no bundle opens.
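
Conceptually, the ranked filter above is a single `SELECT` over summary columns. The sketch below runs an equivalent query with `sqlite3` against a toy in-memory table. It is not how `rank_index` is implemented. The table name `bundles` is an assumption, while the column names mirror the examples above. "Top by Sharpe" is taken to mean descending order.

```python
import sqlite3


def top_by_sharpe(conn: sqlite3.Connection, min_trades: int, limit: int):
    """Top `limit` bundles by Sharpe among those with > `min_trades` trades."""
    return conn.execute(
        """
        SELECT bundle_path, summary_sharpe_ratio
        FROM bundles
        WHERE summary_number_of_trades > ?
        ORDER BY summary_sharpe_ratio DESC
        LIMIT ?
        """,
        (min_trades, limit),
    ).fetchall()


# Toy stand-in for index.sqlite; the table name `bundles` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bundles (bundle_path TEXT PRIMARY KEY, "
    "summary_sharpe_ratio REAL, summary_number_of_trades INTEGER)"
)
conn.executemany(
    "INSERT INTO bundles VALUES (?, ?, ?)",
    [
        ("a.iafbt", 1.9, 12),   # best Sharpe, but too few trades
        ("b.iafbt", 1.4, 120),
        ("c.iafbt", 0.8, 75),
        ("d.iafbt", 1.1, 60),
    ],
)
print(top_by_sharpe(conn, min_trades=50, limit=2))
# [('b.iafbt', 1.4), ('d.iafbt', 1.1)]
```

The trade-count threshold and the limit are bound as query parameters rather than formatted into the SQL string.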

### 4. Materialise only the bundles you actually need

```python
from investing_algorithm_framework.services.backtest_store import LocalDirStore

store = LocalDirStore("./my-backtests/")
backtests = [store.open(row["bundle_path"]) for row in top]  # `top` from step 3
```

`bundle_path` from the index row is exactly the store handle, so this is a one-liner. **You only pay the bundle-decode cost for the bundles you selected**, not for the whole collection.

### 5. Render the report

```python
from investing_algorithm_framework import BacktestReport

BacktestReport(backtests=backtests).save("top20.html")
```

That's the whole loop.

## Avoid overloading your `report.html`

`BacktestReport` produces a **self-contained** HTML file. Every backtest's full per-run data (equity curve, drawdown series, trades, positions, monthly returns) is inlined into the document, so the dashboard works offline with no server.

The trade-off is that file size grows roughly linearly with the number of inlined backtests. Rough orders of magnitude:

| Backtests in report | Approx. HTML size | Browser experience |
|---|---|---|
| 1–10 | Tens of KB to ~1 MB | Instant |
| 11–50 | A few MB | Smooth |
| 51–200 | 10–50 MB | Slower to load, still usable |
| More than 200 | 100 MB+ | Browsers struggle or refuse to open the file |

The point of the storage layer is that **you don't need to put 200 backtests in one report to compare them**. The Tier-1 index is your comparison surface for the full collection; the HTML report is your deep-dive surface for a small, hand-picked subset.
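
Because size scales roughly linearly with the number of inlined backtests, you can budget a report from a small sample. Render a handful of backtests, measure the file, and extrapolate. The sketch below shows that arithmetic. The helper names and the example numbers are illustrative assumptions, not measurements or framework defaults.

```python
import math


def max_backtests_for_budget(sample_bytes: int, sample_count: int,
                             budget_bytes: int) -> int:
    """Estimate how many backtests fit in `budget_bytes`.

    Dividing the whole sample size (fixed HTML/JS overhead included) by the
    sample count overstates the per-backtest cost, so the estimate errs low.
    """
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    per_backtest = sample_bytes / sample_count
    return math.floor(budget_bytes / per_backtest)


def reports_needed(n_backtests: int, per_report: int) -> int:
    """Number of narrower reports needed to cover `n_backtests`."""
    return math.ceil(n_backtests / per_report)


# Illustrative numbers: a 10-backtest sample report weighing 5 MB and a
# 25 MB per-file budget.
per_report = max_backtests_for_budget(5_000_000, 10, 25_000_000)
print(per_report, reports_needed(1_000, per_report))  # 50 20
```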

### Anti-pattern

```python
# DON'T do this with thousands of bundles.
report = BacktestReport.open(directory_path="./my-backtests/", workers=-1)
report.save("everything.html")  # multi-hundred-MB file; the browser gives up
```

This decodes every bundle in the folder and inlines all of them. That is fine for a few dozen bundles and fatal at scale.

### The right pattern

```python
from investing_algorithm_framework import BacktestReport
from investing_algorithm_framework.cli.index_command import rank_index
from investing_algorithm_framework.services.backtest_store import LocalDirStore

# Filter in SQLite first, then render only the winners.
top = rank_index("./my-backtests/", by="sharpe_ratio", limit=25)
store = LocalDirStore("./my-backtests/")
BacktestReport(
    backtests=[store.open(r["bundle_path"]) for r in top],
).save("top25_by_sharpe.html")
```

The same principle applies to slicing by anything else: most trades, best Calmar, lowest drawdown, only 2024 windows, only momentum strategies, and so on. Compose several narrow reports rather than one giant one.
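
One way to compose several narrow reports from a single ranked query is to group the index rows by some key and render one report per group. In the sketch below, `group_rows` and the "strategy family" key are hypothetical. The rendering lines are left as comments and reuse `rank_index`, `store.open` and `BacktestReport(...).save(...)` as in the pattern above.

```python
from collections import defaultdict
from typing import Callable, Iterable


def group_rows(rows: Iterable[dict], key: Callable[[dict], str],
               per_group: int) -> dict[str, list[dict]]:
    """Bucket index rows by `key`, keeping at most `per_group` per bucket.

    Rows are assumed to arrive already ranked (best first), so each bucket
    keeps its best rows.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        bucket = groups[key(row)]
        if len(bucket) < per_group:
            bucket.append(row)
    return dict(groups)


# Hypothetical usage; splitting algorithm_id on "_" is only an example
# naming convention for strategy families.
# rows = rank_index("./my-backtests/", by="sharpe_ratio", limit=500)
# families = group_rows(
#     rows, key=lambda r: r["algorithm_id"].split("_")[0], per_group=25
# )
# for family, picked in families.items():
#     BacktestReport(
#         backtests=[store.open(r["bundle_path"]) for r in picked],
#     ).save(f"top25_{family}.html")
```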

### Rules of thumb

- **Keep any single `report.html` to ≤ 50 backtests.** Past that, render multiple narrower reports (one per strategy family, one per regime, one for the top-N) instead of one mega-report.
- **Use the index as your comparison plane** for the full collection. CLI: `iaf list` / `iaf rank`. Python: `list_index` / `rank_index`. SQL: `sqlite3 index.sqlite` for anything ad-hoc.
- **Render for the audience.** A "winners" report (top 10–20) is what you actually send to teammates; a "full deep-dive" report on one strategy is what you keep for yourself.
- **Don't rely on `BacktestReport.open(directory_path=…)` at scale.** It walks and decodes the whole folder. It is a convenience for directories of ≤ 50 bundles, not a scaling story.
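
For the ad-hoc SQL route, you don't need to know the index schema up front, because SQLite can describe itself. The stdlib sketch below lists every table in an index file along with its columns, using `sqlite_master` and `PRAGMA table_info`. The `describe` helper is hypothetical, and the path in the usage comment is only an example.

```python
import sqlite3
from pathlib import Path


def describe(db_path: str) -> dict[str, list[str]]:
    """Map every table in a SQLite file to its column names."""
    # Open read-only via a URI so a typo'd path raises instead of
    # silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()


# Point this at your own index, e.g.:
# print(describe("./my-backtests/index.sqlite"))
```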

## When to use which store

- **`LocalDirStore`**: start here. A flat folder of `.iafbt` files is something every standard tool understands (you can `ls`, `rsync`, `tar`, or `git lfs` it). The Tier-1 SQLite index is built next to the bundles. This is the default for `app.run_vector_backtests(backtest_storage_directory=...)`.

- **`LocalTieredStore`**: switch to this when you need any of the following:
  - **Cross-bundle analytics** without decoding bundles (DuckDB / Polars over the Tier-2 Parquet sidecars: `read_parquet('store/parquet/trades/**/*.parquet', hive_partitioning=True)`).
  - **OHLCV deduplication**: every bundle that references the same `BTC/EUR:1h` data shares one `<sha256>.parquet` blob on disk; reclaim orphans with `store.garbage_collect_ohlcv()`.
  - **A migration target** for archival / production pipelines.
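
In a content-addressed store, reclaiming orphans comes down to a set difference: the chunks on disk minus the chunks that some bundle still references. The stdlib sketch below illustrates only that idea. This section does not document how `store.garbage_collect_ohlcv()` discovers references, so here the `referenced` set is an input you would have to supply. `collect_orphans` is a hypothetical helper.

```python
from pathlib import Path


def collect_orphans(chunk_dir: Path, referenced: set[str],
                    dry_run: bool = True) -> list[str]:
    """Return the digests of chunks that no bundle references.

    Hypothetical helper: with dry_run=False the orphaned
    `<digest>.parquet` files are also deleted.
    """
    on_disk = {p.stem for p in chunk_dir.glob("*.parquet")}
    orphans = sorted(on_disk - referenced)
    if not dry_run:
        for digest in orphans:
            (chunk_dir / f"{digest}.parquet").unlink()
    return orphans
```

Defaulting to a dry run is a deliberate choice. It lists the orphans first, so you can check the reference set before anything is deleted.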

Move a whole collection between store kinds with a single command:

```bash
iaf migrate-store --from local-dir --src ./my-backtests/ \
    --to local-tiered --dst ./tiered/
```

## End-to-end runnable example

A complete worked example (seed bundles → build index → rank → load winners → render dashboard) lives in the repo at [`examples/storage_layer_demo/`](https://github.com/coding-kitties/investing-algorithm-framework/tree/main/examples/storage_layer_demo). Run it from a checkout:

```bash
source .venv/bin/activate
python examples/storage_layer_demo/demo.py
```

It prints each step and leaves the bundles, index and dashboard in a temporary directory. It also shows you the exact `iaf` CLI commands you could run by hand against the same data.

## Reference

- CLI: `iaf index`, `iaf list`, `iaf rank`, `iaf migrate-store` (see `iaf <cmd> --help`)
- Python: `investing_algorithm_framework.cli.index_command.{build_index, list_index, rank_index}`
- Stores: `investing_algorithm_framework.services.backtest_store.{LocalDirStore, LocalTieredStore}`
- Bundle format: see [Backtest Data](../Data/backtest_data.md)
- Report API: see [Backtest Reports](./backtest-reports.md)

docusaurus/sidebars.js

Lines changed: 2 additions & 42 deletions
@@ -9,10 +9,6 @@ const sidebars = {
 type: 'doc',
 id: 'Getting Started/installation',
 },
-{
-type: 'doc',
-id: 'Getting Started/simple-example',
-},
 {
 type: 'doc',
 id: 'Getting Started/application-setup',
@@ -45,21 +41,13 @@ const sidebars = {
 type: 'doc',
 id: 'Getting Started/backtesting',
 },
-{
-type: 'doc',
-id: 'Getting Started/event-backtesting',
-},
-{
-type: 'doc',
-id: 'Getting Started/vector-backtesting',
-},
 {
 type: 'doc',
 id: 'Getting Started/backtest-reports',
 },
 {
 type: 'doc',
-id: 'Getting Started/metrics',
+id: 'Getting Started/backtest-storage',
 },
 {
 type: 'doc',
@@ -87,10 +75,6 @@ const sidebars = {
 type: 'doc',
 id: 'Data/external-data',
 },
-{
-type: 'doc',
-id: 'Data/backtest_data',
-},
 ],
 },
 {
@@ -129,35 +113,11 @@ const sidebars = {
 type: 'doc',
 id: 'Advanced Concepts/PARALLEL_PROCESSING_GUIDE',
 },
-{
-type: 'doc',
-id: 'Advanced Concepts/recording-variables',
-},
-{
-type: 'doc',
-id: 'Advanced Concepts/portfolio-sync',
-},
-{
-type: 'doc',
-id: 'Advanced Concepts/pipelines',
-},
-{
-type: 'doc',
-id: 'Advanced Concepts/pipelines-event-backtest',
-},
-{
-type: 'doc',
-id: 'Advanced Concepts/pipelines-vector-backtest',
-},
-{
-type: 'doc',
-id: 'Advanced Concepts/pipelines-live',
-},
 ],
 },
 {
 type: "category",
-label: "Contributing Guide",
+label: "Contributing",
 items: [
 {
 type: 'doc',
