Commit 65a5025

feat(index): incremental indexing + scalar_summary alias + benchmark
Closes the remaining Phase 2 gaps for epic #540:

- Backtest.scalar_summary() alias (canonical Phase 1 naming).
- SqliteBacktestIndex tracks bundle_mtime_ns + bundle_size (schema v2); is_up_to_date() lets the indexer skip unchanged bundles.
- build_index(..., incremental=True) skips up-to-date bundles by default; 'iaf index --rebuild' forces a full reindex.
- 4 new tests covering skip, re-ingest on mtime bump, --rebuild, and the scalar_summary alias.
- scripts/bench_540_phase2.py: acceptance benchmark. At 12,500 bundles: cold build 86s, incremental 536ms, list top-20 in 8.3ms (12x under the 100ms target), index footprint 2.5 MiB.
- examples/storage_layer_demo/: end-to-end walkthrough of write -> index -> list -> rank -> open-with-summary-only, plus inline backtest report.
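
The skip rule described in the message (compare a stored `bundle_mtime_ns` and `bundle_size` against the file on disk) can be pictured with a small standalone sketch. This is not the project's `SqliteBacktestIndex`: the `bundle_state` table, `open_state`, `index_dir`, and the signature given to `is_up_to_date` here are all hypothetical, chosen only to illustrate the idea.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_state(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open a toy state table (hypothetical schema, not the real index)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundle_state ("
        " bundle_path TEXT PRIMARY KEY,"
        " bundle_mtime_ns INTEGER NOT NULL,"
        " bundle_size INTEGER NOT NULL)"
    )
    return conn


def is_up_to_date(conn: sqlite3.Connection, bundle: Path) -> bool:
    """True when the stored mtime and size both match the file on disk."""
    st = bundle.stat()
    row = conn.execute(
        "SELECT bundle_mtime_ns, bundle_size FROM bundle_state"
        " WHERE bundle_path = ?",
        (bundle.name,),
    ).fetchone()
    # fetchone() returns None for an unseen bundle, which never equals a tuple.
    return row == (st.st_mtime_ns, st.st_size)


def index_dir(
    conn: sqlite3.Connection, directory: Path, incremental: bool = True
) -> list:
    """Record every bundle that needs (re)ingesting and return their names."""
    ingested = []
    for bundle in sorted(directory.glob("*.iafbt")):
        if incremental and is_up_to_date(conn, bundle):
            continue  # unchanged since the last pass: skip the expensive decode
        st = bundle.stat()
        conn.execute(
            "INSERT OR REPLACE INTO bundle_state VALUES (?, ?, ?)",
            (bundle.name, st.st_mtime_ns, st.st_size),
        )
        ingested.append(bundle.name)
    conn.commit()
    return ingested


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.iafbt").write_bytes(b"first")
        conn = open_state()
        print(index_dir(conn, root))  # first pass ingests the bundle
        print(index_dir(conn, root))  # second pass finds nothing to do
        conn.close()
```

Passing `incremental=False` in this sketch plays the role that `--rebuild` plays in the message: every bundle is re-recorded regardless of its stored state.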
1 parent c4d30d4 commit 65a5025

8 files changed

Lines changed: 1114 additions & 11 deletions

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
# Storage layer demo — `iaf index`, `iaf list`, `iaf rank`

End-to-end demo of the new tiered backtest storage layer
(epic [#540](https://github.com/coding-kitties/investing-algorithm-framework/issues/540) — phase 2).

It shows how to:

1. Save a directory of `.iafbt` backtest bundles.
2. Build a SQLite Tier-1 index over them with `iaf index` (or
   `build_index` from Python).
3. Query / sort / filter the index with `iaf list` and `iaf rank`
   (or the equivalent `list_index` / `rank_index` Python helpers)
   without ever decoding the per-run Parquet metric blobs.
4. Drop into raw SQL when you need a custom report.

## Why this matters

Previously, comparing 50 walk-forward backtest variants meant
opening every `.iafbt` bundle (each with multi-MB Parquet metric
blobs) just to read scalar headline metrics like `sharpe_ratio` or
`max_drawdown`.

The new Tier-1 SQLite index gives you a single file (`index.sqlite`)
with one row per bundle and every scalar from
`BacktestSummaryMetrics` promoted to its own column. Filtering and
ranking 12,500 bundles becomes a sub-100 ms SQL query.

The `.iafbt` bundles themselves remain the source of truth — the
index can always be rebuilt from them with `iaf index`.
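
To make the "one row per bundle, one column per scalar" idea concrete, here is a minimal in-memory sketch. The table name `backtest_index` and the `summary_*` column names are taken from the commands in this README; the reduced four-column schema, the sample rows, and the helper names `build_toy_index` and `top_by_sharpe` are assumptions for illustration, not the real schema.

```python
import sqlite3

# (algorithm_id, sharpe, max_drawdown, number_of_trades): illustrative values
ROWS = [
    ("momentum_v1", 1.85, -0.08, 145),
    ("momentum_v2", 1.42, -0.12, 132),
    ("mean_revert_v1", 0.95, -0.18, 88),
    ("breakout_v2", -0.20, -0.42, 27),
]


def build_toy_index() -> sqlite3.Connection:
    """Create a cut-down stand-in for the Tier-1 table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE backtest_index ("
        " algorithm_id TEXT PRIMARY KEY,"
        " summary_sharpe_ratio REAL,"
        " summary_max_drawdown REAL,"
        " summary_number_of_trades INTEGER)"
    )
    conn.executemany("INSERT INTO backtest_index VALUES (?, ?, ?, ?)", ROWS)
    return conn


def top_by_sharpe(conn: sqlite3.Connection, min_trades: int, limit: int) -> list:
    """Rank by Sharpe among bundles with more than ``min_trades`` trades."""
    cur = conn.execute(
        "SELECT algorithm_id FROM backtest_index"
        " WHERE summary_number_of_trades > ?"
        " ORDER BY summary_sharpe_ratio DESC LIMIT ?",
        (min_trades, limit),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_toy_index()
    print(top_by_sharpe(conn, min_trades=50, limit=2))
    conn.close()
```

Because every metric is a plain column, filtering and ranking never touch the bundles themselves; only the scalar rows are read.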

## Run it

From the repo root:

```bash
source .venv/bin/activate
python examples/storage_layer_demo/demo.py
```

The script will:

1. Create a temp directory and write 6 `.iafbt` bundles with
   varying synthetic Sharpe / Sortino / drawdown values.
2. Build `index.sqlite` over them.
3. Print the equivalent `iaf` CLI commands you could run by hand.
4. Run `list_index` / `rank_index` / a raw SQL query and print
   the formatted tables.
5. Open the top-ranked bundle and print its full backtest report
   (this is the only step that decodes per-run Parquet metric blobs).
6. Walk the index in rank order and print a one-line summary per
   bundle straight out of the SQLite index — no bundle is opened.
7. Iterate every bundle in rank order and print a full per-bundle
   report so you can scan _all_ backtests at a glance.

## CLI cheatsheet

```bash
# Build the index
iaf index ./my-backtests/

# Top 5 by Sharpe
iaf rank ./my-backtests/ --by sharpe_ratio -n 5

# Top 10 by Sortino, restricted to bundles with > 50 trades
iaf rank ./my-backtests/ \
    --by sortino_ratio \
    --where "summary_number_of_trades > 50" \
    -n 10

# Full listing with custom columns + JSON output
iaf list ./my-backtests/ \
    --sort calmar_ratio \
    --columns "algorithm_id,tag,summary_calmar_ratio,summary_max_drawdown" \
    --json

# Raw SQL — anything sqlite3 can do
sqlite3 ./my-backtests/index.sqlite \
    "SELECT algorithm_id, summary_sharpe_ratio
     FROM backtest_index
     WHERE summary_max_drawdown > -0.1
     ORDER BY summary_sharpe_ratio DESC LIMIT 5;"
```
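
The raw-SQL route also works from Python's standard `sqlite3` module. The sketch below opens an index file read-only through a `file:` URI with `mode=ro`, so an ad-hoc report cannot modify it by accident. The helper names (`make_toy_index`, `open_read_only`, `shallow_drawdown_report`) and the three-column toy table are assumptions for illustration; only the table and column names come from the query above.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_toy_index(path: Path) -> None:
    """Write a tiny stand-in index file (hypothetical, reduced schema)."""
    conn = sqlite3.connect(str(path))
    conn.execute(
        "CREATE TABLE backtest_index ("
        " algorithm_id TEXT PRIMARY KEY,"
        " summary_sharpe_ratio REAL,"
        " summary_max_drawdown REAL)"
    )
    conn.executemany(
        "INSERT INTO backtest_index VALUES (?, ?, ?)",
        [
            ("momentum_v1", 1.85, -0.08),
            ("momentum_v2", 1.42, -0.12),
            ("mean_revert_v1", 0.95, -0.18),
        ],
    )
    conn.commit()
    conn.close()


def open_read_only(index_path: Path) -> sqlite3.Connection:
    """Open the index via a file: URI with mode=ro so writes are rejected."""
    uri = index_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    return conn


def shallow_drawdown_report(
    conn: sqlite3.Connection, max_dd_floor: float = -0.1, limit: int = 5
) -> list:
    """Same query as the cheatsheet, with the thresholds bound as parameters."""
    sql = (
        "SELECT algorithm_id, summary_sharpe_ratio FROM backtest_index"
        " WHERE summary_max_drawdown > ?"
        " ORDER BY summary_sharpe_ratio DESC LIMIT ?"
    )
    return [dict(row) for row in conn.execute(sql, (max_dd_floor, limit))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_path = Path(tmp) / "index.sqlite"
        make_toy_index(index_path)
        conn = open_read_only(index_path)
        print(shallow_drawdown_report(conn))
        conn.close()
```

Binding the threshold and limit as `?` parameters keeps the query text fixed, which is a sensible default whenever the values come from user input.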

## Where this is going

Phase 3 of #540 introduces a `BacktestStore` Protocol with two
implementations: `LocalDirStore` (the current behavior) and
`LocalTieredStore` (Tier-1 SQLite + Tier-2 Parquet datasets +
Tier-3 content-addressed chunks). The `iaf list` / `iaf rank`
commands shown here are forward-compatible — they will work
unchanged against any store backing the same Tier-1 schema.
Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
"""End-to-end demo of the new tiered backtest storage layer.

Epic #540 phase 2:

* save a directory of ``.iafbt`` bundles
* build a SQLite Tier-1 index over them (``iaf index``)
* query / sort / filter the index (``iaf list`` / ``iaf rank``)
  without ever decoding the per-run Parquet metric blobs
* drop into raw SQL when needed

Run from the repo root::

    source .venv/bin/activate
    python examples/storage_layer_demo/demo.py
"""

from __future__ import annotations

import os
import sqlite3
import sys
import tempfile
from pathlib import Path

from investing_algorithm_framework.domain import Backtest, BUNDLE_EXT
from investing_algorithm_framework.domain.backtesting.bundle import (
    save_bundle,
)
from investing_algorithm_framework.cli.index_command import (
    build_index,
    list_index,
    rank_index,
    format_table,
)


# Directory-format fixture shipped with the test suite. We use it
# only as a *template* — we re-save N copies under different
# algorithm_ids and Sharpe / Sortino / drawdown values so the demo
# has something interesting to rank.
REPO_ROOT = Path(__file__).resolve().parents[2]
TEMPLATE = (
    REPO_ROOT
    / "tests"
    / "resources"
    / "backtest_reports_for_testing"
    / "test_algorithm_backtest"
)
49+
50+
51+
def _print_section(title: str) -> None:
52+
bar = "=" * 72
53+
print(f"\n{bar}\n {title}\n{bar}")
54+
55+
56+
def _print_backtest_report(bt: Backtest) -> None:
57+
"""Render a compact, fixture-tolerant report for a backtest.
58+
59+
We deliberately read fields that exist on every ``.iafbt`` bundle
60+
(identity, run dates, summary metrics) instead of relying on the
61+
full :func:`pretty_print_backtest` helper, which assumes a
62+
populated :class:`BacktestMetrics` per run.
63+
"""
64+
s = bt.backtest_summary
65+
runs = bt.backtest_runs or []
66+
print(f" algorithm_id : {bt.algorithm_id}")
67+
print(f" tag : {bt.tag}")
68+
if runs:
69+
first, last = runs[0], runs[-1]
70+
print(
71+
f" date range : {first.backtest_start_date} -> "
72+
f"{last.backtest_end_date}"
73+
)
74+
print(f" number of runs : {len(runs)}")
75+
76+
if s is None:
77+
print(" (no summary metrics available)")
78+
return
79+
80+
def _row(label: str, value, fmt: str = "") -> None:
81+
if value is None:
82+
rendered = "n/a"
83+
elif fmt:
84+
rendered = format(value, fmt)
85+
else:
86+
rendered = str(value)
87+
print(f" {label:<22}: {rendered}")
88+
89+
print(" --- summary metrics ---")
90+
_row("sharpe_ratio", s.sharpe_ratio, ".4f")
91+
_row("sortino_ratio", s.sortino_ratio, ".4f")
92+
_row("calmar_ratio", s.calmar_ratio, ".4f")
93+
_row(
94+
"total_net_gain_pct",
95+
None if s.total_net_gain_percentage is None
96+
else s.total_net_gain_percentage * 100,
97+
".2f",
98+
)
99+
_row(
100+
"max_drawdown_pct",
101+
None if s.max_drawdown is None else s.max_drawdown * 100,
102+
".2f",
103+
)
104+
_row("number_of_trades", s.number_of_trades)
105+
_row(
106+
"win_rate_pct",
107+
None if s.win_rate is None else s.win_rate * 100,
108+
".2f",
109+
)
110+
111+
112+
def _seed_bundles(out_dir: Path, n: int = 6) -> None:
113+
"""Write ``n`` synthetic ``.iafbt`` bundles into *out_dir*."""
114+
if not TEMPLATE.is_dir():
115+
sys.exit(
116+
f"Could not find the template fixture at {TEMPLATE}. "
117+
"Run this demo from inside a git checkout of the "
118+
"investing-algorithm-framework repository."
119+
)
120+
121+
template = Backtest.open(str(TEMPLATE))
122+
123+
# Three "strategy families", two variants each, with distinct
124+
# risk-adjusted profiles so ranking is meaningful.
125+
profiles = [
126+
("momentum_v1", 1.85, 2.40, -0.08, 145, 0.61),
127+
("momentum_v2", 1.42, 1.91, -0.12, 132, 0.58),
128+
("mean_revert_v1", 0.95, 1.20, -0.18, 88, 0.54),
129+
("mean_revert_v2", 1.10, 1.45, -0.15, 102, 0.55),
130+
("breakout_v1", 0.42, 0.55, -0.31, 41, 0.48),
131+
("breakout_v2", -0.20, -0.25, -0.42, 27, 0.39),
132+
][:n]
133+
134+
for algo_id, sharpe, sortino, mdd, n_trades, win_rate in profiles:
135+
bt = Backtest.from_dict(template.to_dict())
136+
bt.algorithm_id = algo_id
137+
bt.tag = "demo"
138+
if bt.backtest_summary is not None:
139+
s = bt.backtest_summary
140+
s.sharpe_ratio = sharpe
141+
s.sortino_ratio = sortino
142+
s.max_drawdown = mdd
143+
s.number_of_trades = n_trades
144+
s.win_rate = win_rate
145+
# Calmar = CAGR / |max_drawdown|; fake it for the demo.
146+
s.calmar_ratio = round(abs(sharpe / mdd), 3)
147+
s.total_net_gain_percentage = round(sharpe * 0.12, 4)
148+
save_bundle(
149+
bt, str(out_dir / f"{algo_id}{BUNDLE_EXT}"),
150+
)
151+
152+
153+
def main() -> None:
154+
work = Path(tempfile.mkdtemp(prefix="iaf-storage-demo-"))
155+
print(f"Working directory: {work}")
156+
157+
# ------------------------------------------------------------------
158+
# 1. Save bundles
159+
# ------------------------------------------------------------------
160+
_print_section("1. Save synthetic .iafbt bundles")
161+
_seed_bundles(work)
162+
for p in sorted(work.glob(f"*{BUNDLE_EXT}")):
163+
print(f" {p.name}")
164+
165+
# ------------------------------------------------------------------
166+
# 2. Build the Tier-1 SQLite index
167+
# ------------------------------------------------------------------
168+
_print_section("2. Build the SQLite Tier-1 index")
169+
print(f"$ iaf index {work}")
170+
index_path = build_index(str(work), show_progress=False)
171+
print(f" -> wrote {index_path}")
172+
print(f" -> file size: {os.path.getsize(index_path)} bytes")
173+
174+
# ------------------------------------------------------------------
175+
# 3. iaf list — sort by Sharpe, top 4
176+
# ------------------------------------------------------------------
177+
_print_section("3. iaf list — sort by sharpe_ratio (top 4)")
178+
print(f"$ iaf list {work} --sort sharpe_ratio -n 4\n")
179+
rows = list_index(
180+
str(work), sort_by="sharpe_ratio", limit=4,
181+
)
182+
print(format_table(rows))
183+
184+
# ------------------------------------------------------------------
185+
# 4. iaf rank — risk-adjusted, filtered
186+
# ------------------------------------------------------------------
187+
_print_section(
188+
"4. iaf rank — by sortino_ratio, with WHERE filter"
189+
)
190+
print(
191+
f'$ iaf rank {work} --by sortino_ratio '
192+
f'--where "summary_number_of_trades > 50" -n 5\n'
193+
)
194+
rows = rank_index(
195+
str(work),
196+
by="sortino_ratio",
197+
where="summary_number_of_trades > 50",
198+
limit=5,
199+
)
200+
print(format_table(rows))
201+
202+
# ------------------------------------------------------------------
203+
# 5. Raw SQL — anything sqlite3 can do
204+
# ------------------------------------------------------------------
205+
_print_section("5. Raw SQL — custom report")
206+
sql = """
207+
SELECT algorithm_id,
208+
summary_sharpe_ratio AS sharpe,
209+
summary_max_drawdown AS max_dd,
210+
summary_calmar_ratio AS calmar
211+
FROM backtest_index
212+
WHERE summary_max_drawdown > -0.20
213+
ORDER BY summary_calmar_ratio DESC
214+
LIMIT 5
215+
"""
216+
print(f"$ sqlite3 {index_path} '<query>'\n{sql.strip()}\n")
217+
conn = sqlite3.connect(index_path)
218+
conn.row_factory = sqlite3.Row
219+
cur = conn.execute(sql)
220+
rows = [dict(r) for r in cur.fetchall()]
221+
conn.close()
222+
print(format_table(rows))
223+
224+
# ------------------------------------------------------------------
225+
# 6. Open the winner's full backtest report
226+
# ------------------------------------------------------------------
227+
_print_section(
228+
"6. Open the top-ranked bundle and print its full report"
229+
)
230+
top = rank_index(str(work), by="sharpe_ratio", limit=1)[0]
231+
winner_path = work / top["bundle_path"]
232+
print(
233+
f"Top by sharpe_ratio: {top['algorithm_id']} "
234+
f"(sharpe={top['summary_sharpe_ratio']})\n"
235+
f"Loading full bundle: {winner_path}\n"
236+
)
237+
# The index gave us a scalar-only view; for the full report we
238+
# open the .iafbt bundle (this is the only step that decodes the
239+
# per-run Parquet metric blobs).
240+
winner_bt = Backtest.open(str(winner_path))
241+
_print_backtest_report(winner_bt)
242+
243+
# ------------------------------------------------------------------
244+
# 7. Iterate the index and print a one-line report per backtest
245+
# ------------------------------------------------------------------
246+
_print_section(
247+
"7. Iterate the index and print a one-line summary per bundle"
248+
)
249+
print(
250+
"Walking the SQLite index in rank order — no bundle is opened "
251+
"for these summaries.\n"
252+
)
253+
all_rows = list_index(str(work), sort_by="sharpe_ratio")
254+
for i, r in enumerate(all_rows, start=1):
255+
print(
256+
f" {i}. {r['algorithm_id']:<14} "
257+
f"sharpe={r['summary_sharpe_ratio']:>6.2f} "
258+
f"return={r['summary_total_net_gain_percentage'] * 100:>6.2f}% "
259+
f"max_dd={r['summary_max_drawdown']:>6.2%} "
260+
f"trades={r['summary_number_of_trades']:>3}"
261+
)
262+
263+
# ------------------------------------------------------------------
264+
# 8. Full report for every backtest in rank order
265+
# ------------------------------------------------------------------
266+
_print_section(
267+
"8. Full report per backtest (open each bundle in rank order)"
268+
)
269+
for i, r in enumerate(all_rows, start=1):
270+
bundle_path = work / r["bundle_path"]
271+
bt = Backtest.open(str(bundle_path))
272+
print(f"\n--- [{i}] {r['algorithm_id']} ".ljust(72, "-"))
273+
_print_backtest_report(bt)
274+
275+
# ------------------------------------------------------------------
276+
# Wrap up
277+
# ------------------------------------------------------------------
278+
_print_section("Done")
279+
print(
280+
"Bundles + index left in:\n"
281+
f" {work}\n"
282+
"Try the CLI directly:\n"
283+
f" iaf list {work} --sort calmar_ratio --json\n"
284+
f" iaf rank {work} --by sharpe_ratio -n 3\n"
285+
)
286+
287+
288+
if __name__ == "__main__":
289+
main()
