feat(store): epic #540 phase 3 (a-d) + docs/demo — rescue merge to dev #549
Merged
Closes the remaining Phase 2 gaps for epic #540:

- Backtest.scalar_summary() alias (canonical Phase 1 naming).
- SqliteBacktestIndex tracks bundle_mtime_ns + bundle_size (schema v2);
  is_up_to_date() lets the indexer skip unchanged bundles.
- build_index(..., incremental=True) skips up-to-date bundles by default;
  'iaf index --rebuild' forces a full reindex.
- 4 new tests covering skip, re-ingest on mtime bump, --rebuild, and the
  scalar_summary alias.
- scripts/bench_540_phase2.py: acceptance benchmark. At 12,500 bundles:
  cold build 86s, incremental 536ms, list top-20 in 8.3ms (12x under the
  100ms target), index footprint 2.5 MiB.
- examples/storage_layer_demo/: end-to-end walkthrough of write -> index ->
  list -> rank -> open-with-summary-only, plus inline backtest report.
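The skip logic described above can be sketched with the standard library alone. This is a minimal illustration, not the SqliteBacktestIndex implementation: the `bundles` table, its column layout, and the simplified `is_up_to_date` / `build_index` signatures are assumptions made for the example. Only the idea of comparing a stored (mtime_ns, size) fingerprint against the file on disk comes from the commit message.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one row per bundle with its stat fingerprint.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bundles (
    path TEXT PRIMARY KEY,
    bundle_mtime_ns INTEGER NOT NULL,
    bundle_size INTEGER NOT NULL
)
"""


def is_up_to_date(conn: sqlite3.Connection, path: Path) -> bool:
    """True when the stored (mtime_ns, size) pair matches the file on disk."""
    row = conn.execute(
        "SELECT bundle_mtime_ns, bundle_size FROM bundles WHERE path = ?",
        (str(path),),
    ).fetchone()
    if row is None:
        return False
    st = path.stat()
    return row == (st.st_mtime_ns, st.st_size)


def build_index(conn: sqlite3.Connection, root: Path, incremental: bool = True) -> int:
    """Index every *.iafbt file under root; return how many were (re)ingested."""
    conn.execute(SCHEMA)
    ingested = 0
    for path in sorted(root.rglob("*.iafbt")):
        if incremental and is_up_to_date(conn, path):
            continue  # unchanged bundle: skip the expensive decode
        st = path.stat()
        conn.execute(
            "INSERT OR REPLACE INTO bundles VALUES (?, ?, ?)",
            (str(path), st.st_mtime_ns, st.st_size),
        )
        ingested += 1
    conn.commit()
    return ingested


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("a.iafbt", "b.iafbt"):
            (root / name).write_bytes(b"payload")
        conn = sqlite3.connect(":memory:")
        cold = build_index(conn, root)        # both bundles ingested
        warm = build_index(conn, root)        # nothing changed, nothing ingested
        target = root / "a.iafbt"
        st = target.stat()
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        bumped = build_index(conn, root)      # only the touched bundle
        rebuilt = build_index(conn, root, incremental=False)  # full reindex
        conn.close()
        return cold, warm, bumped, rebuilt


if __name__ == "__main__":
    print(demo())
```

Passing `incremental=False` plays the role that `--rebuild` plays on the command line: the fingerprint check is bypassed and every bundle is ingested again.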
Introduces the storage seam that decouples *where* a backtest is persisted
from the rest of the framework. Phase 3a is intentionally scoped to the
Protocol + a thin adapter over today's .iafbt layout; LocalTieredStore
(Tier-2 Parquet + Tier-3 chunks) and FinterionStore land in follow-up PRs.

- BacktestStore Protocol: write / open / exists / delete / iter_handles /
  iter_index_rows / __len__ / __contains__. Mirrors today's
  Backtest.save_bundle / Backtest.open semantics so LocalDirStore is a 1:1
  adapter.
- StoreHandle: opaque str token (relative bundle path for LocalDirStore;
  uuid7 run_id for the upcoming tiered stores).
- StoreError + StoreHandleNotFoundError.
- Optional capability mixin SupportsCopyFrom, declared as a separate
  runtime_checkable Protocol so 'iaf migrate-store' (Phase 3d) and
  'finterion push' (closed-source) can isinstance-test for it. Future
  capabilities (SupportsRelations for the strategy/version/report graph,
  SupportsContentAddressedChunks for Tier-3 dedup) follow the same pattern.
- LocalDirStore: handle = bundle path relative to the root, so the store
  stays portable across moves. Sidecar SqliteBacktestIndex (built lazily and
  incrementally, using the same machinery as 'iaf index') backs
  iter_index_rows so listing does not re-decode bundles. Path-traversal
  guards reject handles that escape the root.
- Tests (19, all passing): Protocol/SupportsCopyFrom conformance,
  round-trip, summary_only, default-handle derivation, sidecar index
  caching, copy_from with and without a handle subset, handle
  normalisation, path-traversal rejection, missing-handle error, delete
  idempotency.

Targeted suite (store + index + cli): 86 / 86 passing.
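The capability-mixin pattern and the path-traversal guard can be illustrated with a toy in-memory store. Everything below is a sketch under assumptions: the method signatures, the `MemoryStore` / `CopyingStore` classes and the `normalise_handle` helper are hypothetical and only mirror the names listed above, and the Protocol is trimmed to five methods. Marking BacktestStore itself as runtime_checkable is also an assumption made so the example can isinstance-test it.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Iterator, Optional, Protocol, runtime_checkable


class StoreError(Exception):
    """Base error for store failures."""


class StoreHandleNotFoundError(StoreError):
    """Raised when a handle does not exist in the store."""


@runtime_checkable
class BacktestStore(Protocol):
    # Trimmed, assumed signatures; the real Protocol lists more methods.
    def write(self, backtest: object, handle: Optional[str] = None) -> str: ...
    def open(self, handle: str) -> object: ...
    def exists(self, handle: str) -> bool: ...
    def delete(self, handle: str) -> None: ...
    def iter_handles(self) -> Iterator[str]: ...


@runtime_checkable
class SupportsCopyFrom(Protocol):
    # Optional capability, declared separately so callers can isinstance-test it.
    def copy_from(self, src: BacktestStore,
                  handles: Optional[Iterable[str]] = None) -> int: ...


def normalise_handle(handle: str) -> str:
    """Return a clean relative POSIX handle, rejecting anything that escapes the root."""
    path = PurePosixPath(handle.replace("\\", "/"))
    if path.is_absolute() or ".." in path.parts or not path.parts:
        raise StoreError(f"handle escapes the store root: {handle!r}")
    return path.as_posix()


class MemoryStore:
    """Toy store keeping payloads in a dict; stands in for a real backend."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def write(self, backtest: object, handle: Optional[str] = None) -> str:
        key = normalise_handle(handle or f"run-{len(self._items)}")
        self._items[key] = backtest
        return key

    def open(self, handle: str) -> object:
        key = normalise_handle(handle)
        if key not in self._items:
            raise StoreHandleNotFoundError(key)
        return self._items[key]

    def exists(self, handle: str) -> bool:
        return normalise_handle(handle) in self._items

    def delete(self, handle: str) -> None:
        self._items.pop(normalise_handle(handle), None)  # idempotent

    def iter_handles(self) -> Iterator[str]:
        return iter(sorted(self._items))


class CopyingStore(MemoryStore):
    """Same toy store plus the optional copy_from capability."""

    def copy_from(self, src: BacktestStore,
                  handles: Optional[Iterable[str]] = None) -> int:
        copied = 0
        for handle in (handles if handles is not None else src.iter_handles()):
            if not self.exists(handle):  # already-present handles are skipped
                self.write(src.open(handle), handle)
                copied += 1
        return copied
```

A caller such as a migration command can then branch on `isinstance(dst, SupportsCopyFrom)` without the base Protocol having to grow an optional method.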
…540 phase 3b)

Second slice of Phase 3 of epic #540. Adds a real tiered storage
implementation that ships the analytics value (cross-run DuckDB / Polars
queries) without yet replacing the canonical .iafbt bundle.

Layout under <root>:

    index.sqlite                                  Tier-1 (always in sync)
    bundles/<handle>.iafbt                        canonical bytes
    parquet/portfolio_snapshots/run_id=<h>/...    Tier-2 hive-partitioned
    parquet/trades/run_id=<h>/...                 Tier-2
    parquet/orders/run_id=<h>/...                 Tier-2

Phase 3b deliberately keeps the bundle as the canonical representation;
Tier-2 sidecars are auxiliary, written best-effort, and a malformed sidecar
never blocks a write or a read. This trivially preserves byte-identical
Backtest round-trips today. Byte-identical Tier-2 -> Backtest reassembly
(no bundle on the read path) is Phase 3d.

- decompose.py: Backtest -> flat record lists for snapshots / trades /
  orders, adding run_id and window_name columns so downstream tools group
  cleanly across walk-forward windows. Extension point for metric_series
  and any future kind is the DATASETS tuple.
- LocalTieredStore: implements BacktestStore + SupportsCopyFrom. write()
  saves the bundle, upserts the Tier-1 row, and writes hive-partitioned
  Parquet sidecars per dataset. delete() removes all three tiers.
  iter_index_rows() serves from SQLite directly. rebuild_index() recreates
  Tier-1 from the bundles (useful after a software upgrade that adds new
  index columns).
- scan('portfolio_snapshots' | 'trades' | 'orders') returns a
  pyarrow.dataset.Dataset that DuckDB / Polars can query across every run
  with partition pruning on run_id.
- 15 new tests: Protocol + SupportsCopyFrom conformance, three-tier layout,
  handle normalisation, round-trip, summary_only, Tier-1 always-in-sync
  (write/delete/len), Tier-2 cross-run scan, copy_from from LocalDirStore,
  rebuild_index, missing-handle errors. Includes a synthetic-records test
  that asserts hive partitions are written and that scan() returns the
  expected rows + columns.
Targeted suite (backtest_store + backtest_index + cli): 101 / 101 passing.
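The decomposition step can be pictured as a flattening pass that stamps every record with its run and window. The sketch below is an illustration only: the nested-dict input shape, the `decompose` signature and the `partition_dir` helper are assumptions, not the contents of decompose.py. The dataset names and the `run_id=<h>` directory convention are taken from the layout above.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Extension point: adding a new kind means adding a name here.
DATASETS = ("portfolio_snapshots", "trades", "orders")


def decompose(run_id: str,
              windows: Dict[str, Dict[str, List[dict]]]) -> Dict[str, List[dict]]:
    """Flatten per-window records into one list per dataset.

    `windows` maps a walk-forward window name to its records per dataset
    (an assumed input shape). Each output row gains run_id and window_name.
    """
    flat: Dict[str, List[dict]] = {name: [] for name in DATASETS}
    for window_name, datasets in windows.items():
        for name in DATASETS:
            for record in datasets.get(name, []):
                flat[name].append(
                    {**record, "run_id": run_id, "window_name": window_name}
                )
    return flat


def partition_dir(dataset: str, run_id: str) -> str:
    """Hive-style partition directory, relative to the store root."""
    return (PurePosixPath("parquet") / dataset / f"run_id={run_id}").as_posix()


if __name__ == "__main__":
    rows = decompose(
        "run-001",
        {
            "2021": {"trades": [{"symbol": "BTC", "net_gain": 12.5}]},
            "2022": {"trades": [{"symbol": "ETH", "net_gain": -3.0}],
                     "orders": [{"symbol": "ETH", "side": "BUY"}]},
        },
    )
    print(len(rows["trades"]), partition_dir("trades", "run-001"))
```

Because the partition key is encoded in the directory name, a reader that understands hive partitioning (for example `pyarrow.dataset.dataset(..., partitioning="hive")`) can recover `run_id` as a column and prune whole directories when filtering on it.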
…e (epic #540 phase 3c)

Wires LocalTieredStore into the existing OHLCV side-store machinery so
identical (symbol, timeframe) Parquet bytes are written exactly once and
shared across every bundle that references them.

- write() now routes save_bundle's OHLCV writes to <root>/ohlcv/ whenever
  backtest.ohlcv is non-empty. The bundle envelope keeps its
  content-addressed manifest unchanged, so old bundles remain readable.
- open() forwards the same shared directory to open_bundle so OHLCV lookups
  resolve regardless of what path the bundle was originally written with.
- delete() intentionally does NOT touch ohlcv/. Chunks are globally shared;
  orphans are reclaimed via garbage_collect_ohlcv(dry_run=…).
- Introspection helpers required by the dedup-upload protocol
  (docs/design/ohlcv-dedup-protocol.md):
  * iter_ohlcv_hashes() / ohlcv_referenced_hashes()
  * ohlcv_stored_hashes()
  * ohlcv_stats() -> stored_blobs / stored_bytes / referenced_blobs /
    orphan_blobs / missing_blobs
  * garbage_collect_ohlcv(dry_run=False)
  Manifests are decoded straight from the bundle envelope (_decode_payload)
  so the cost is one msgpack read per bundle, with no full Backtest
  instantiation.

9 new tests:

- No OHLCV -> no chunk dir created.
- Identical OHLCV is stored once across distinct handles (dedup).
- Different OHLCV yields separate chunks.
- Round-trip via store.open() resolves OHLCV from the shared dir.
- delete() keeps still-referenced chunks; orphans only after GC.
- garbage_collect_ohlcv(dry_run=True) lists without deleting; the real call
  removes them.
- iter_ohlcv_hashes() emits per-reference; ohlcv_referenced_hashes() dedups.
- Hash strings are 64-char lowercase hex (matches the upload protocol spec).

Targeted suite (backtest_store + backtest_index + cli): 110 / 110 passing.
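The dedup, delete and garbage-collection semantics above can be modelled in a few lines. This is a sketch under assumptions: the `ChunkStore` class, its in-memory manifest map and its method names are hypothetical, and SHA-256 is assumed only because it yields the 64-character lowercase hex strings the commit mentions. The commit does not name the hash function.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Set


class ChunkStore:
    """Toy content-addressed chunk directory shared by every handle."""

    def __init__(self, root: Path) -> None:
        self.dir = root / "ohlcv"
        self.manifests: Dict[str, List[str]] = {}  # handle -> referenced hashes

    def put(self, handle: str, blobs: Iterable[bytes]) -> None:
        hashes = []
        for blob in blobs:
            digest = hashlib.sha256(blob).hexdigest()  # assumed hash function
            self.dir.mkdir(parents=True, exist_ok=True)
            target = self.dir / digest
            if not target.exists():  # identical bytes are written exactly once
                target.write_bytes(blob)
            hashes.append(digest)
        self.manifests[handle] = hashes

    def delete(self, handle: str) -> None:
        # Only the reference goes away; shared chunks stay until GC.
        self.manifests.pop(handle, None)

    def referenced(self) -> Set[str]:
        return {h for hashes in self.manifests.values() for h in hashes}

    def stored(self) -> Set[str]:
        if not self.dir.exists():
            return set()
        return {p.name for p in self.dir.iterdir()}

    def stats(self) -> Dict[str, int]:
        stored, referenced = self.stored(), self.referenced()
        return {
            "stored_blobs": len(stored),
            "stored_bytes": sum((self.dir / d).stat().st_size for d in stored),
            "referenced_blobs": len(referenced),
            "orphan_blobs": len(stored - referenced),
            "missing_blobs": len(referenced - stored),
        }

    def garbage_collect(self, dry_run: bool = False) -> List[str]:
        orphans = sorted(self.stored() - self.referenced())
        if not dry_run:
            for digest in orphans:
                (self.dir / digest).unlink()
        return orphans


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        store = ChunkStore(Path(tmp))
        store.put("run-a", [b"BTC-1h", b"ETH-1h"])
        store.put("run-b", [b"BTC-1h"])        # shares the BTC chunk with run-a
        before = store.stats()
        store.delete("run-a")                   # ETH chunk becomes an orphan
        listed = store.garbage_collect(dry_run=True)
        still_stored = len(store.stored())      # the dry run removed nothing
        removed = store.garbage_collect()
        return before, listed == removed, still_stored, store.stats()


if __name__ == "__main__":
    print(demo())
```

Keeping delete() away from the chunk directory is what makes the shared layout safe: a chunk can only be removed once a full pass over all manifests shows that nothing references it.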
…phase 3d)

Closes the open Phase 3 deliverables that turn the new store abstraction
into something users can actually move data through:

- iaf migrate-store --from <kind> --src <path> --to <kind> --dst <path>
  delegates to dst.copy_from(src), so it is incremental, restartable, and
  tier-aware: when the destination is a local-tiered store, identical OHLCV
  chunks are written exactly once across the destination regardless of how
  many bundles reference them (Phase 3c invariant). Optional --handles
  subset selector for partial migrations.
- migrate_store() programmatic helper for in-process pipelines.
- BacktestStoreContractTest: a parameterised conformance suite that runs
  identical scenarios against every concrete store implementation
  (LocalDirStore, LocalTieredStore today, future remote stores tomorrow).
  Catches divergence as a failing subTest with the store class name in the
  label. Covers Protocol + SupportsCopyFrom conformance, write/open
  round-trip, summary_only, exists, idempotent delete, missing-handle
  errors, listing, iter_index_rows, and copy_from with both full and subset
  handle selection.
- bug fix in LazyOhlcvDict: items() and values() were inheriting the empty
  backing dict's iteration, so any code path that did
  'for k, v in bt.ohlcv.items()' silently dropped every blob after a tiered
  round-trip. Now both methods walk the manifest and materialise lazily on
  access. Caught by the migration dedup test.

Note on what is *not* in this PR: byte-identical Tier-2 -> Backtest
reassembly (so .iafbt could become export-only) is intentionally deferred.
The current model where the bundle is canonical and Tier-1/2/3 are derived
is simpler, preserves the existing round-trip contract bit-for-bit, and is
what every test in the contract suite already exercises against both
stores.

Targeted suite (backtest_store + backtest_index + cli): 128 / 128 passing
+ 26 subTests. Full non-scenario suite: 1705 / 1705 passing with no
regressions from the LazyOhlcvDict fix.
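The LazyOhlcvDict failure mode is easy to reproduce in isolation. The two classes below are hypothetical stand-ins written for this illustration, not the library's code: a dict subclass that loads values on `__getitem__` but leaves `items()` and `values()` pointing at the empty backing dict, followed by a variant that walks the manifest instead.

```python
from typing import Callable, Dict, Iterator, Tuple


class NaiveLazyDict(dict):
    """Loads a blob on first __getitem__, but inherits dict's items()/values()."""

    def __init__(self, manifest: Dict[str, str],
                 loader: Callable[[str], bytes]) -> None:
        super().__init__()          # the backing dict starts empty
        self._manifest = manifest   # key -> content hash
        self._loader = loader       # content hash -> bytes

    def __getitem__(self, key: str) -> bytes:
        if not dict.__contains__(self, key):
            if key not in self._manifest:
                raise KeyError(key)
            dict.__setitem__(self, key, self._loader(self._manifest[key]))
        return dict.__getitem__(self, key)


class ManifestLazyDict(NaiveLazyDict):
    """Same lazy loading, but iteration is driven by the manifest."""

    def values(self) -> Iterator[bytes]:
        for key in self._manifest:
            yield self[key]         # materialises each blob on access

    def items(self) -> Iterator[Tuple[str, bytes]]:
        for key in self._manifest:
            yield key, self[key]


if __name__ == "__main__":
    blobs = {"h1": b"btc-bars", "h2": b"eth-bars"}
    manifest = {"BTC/EUR": "h1", "ETH/EUR": "h2"}
    naive = NaiveLazyDict(manifest, blobs.__getitem__)
    fixed = ManifestLazyDict(manifest, blobs.__getitem__)
    print(list(naive.items()))      # nothing loaded yet, so nothing is yielded
    print(list(fixed.items()))
```

The naive variant looks correct under key lookups, which is why the problem only surfaces in code that iterates, such as a migration that copies every OHLCV blob.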
…L dashboard

Two new sections in examples/storage_layer_demo/demo.py:

- 6b. _print_backtest_full_report(): per-run breakdown (window / days /
  orders / trades / positions / final_value), end-of-backtest positions
  snapshot, first few trades, and a richer slice of per-run BacktestMetrics
  (cagr, annual_volatility, max_drawdown_absolute, gross_profit/loss,
  best_trade, max consecutive wins/losses) with safe n/a fallbacks. Built
  on top of the existing compact _print_backtest_report().
- 9. Storage layer -> HTML dashboard: wires the Tier-1 SQLite index, the
  Tier-2 LocalDirStore and the BacktestReport HTML dashboard end-to-end.
  rank_index() picks the top-N bundles from SQLite alone,
  store.open(handle) materialises just those via the BacktestStore
  protocol, BacktestReport(backtests=[...]).save() renders a self-contained
  interactive HTML dashboard. Demonstrates that the new storage layer plugs
  straight into the existing reporting stack with no glue code.

README updated to describe both new sections.
- New feature bullet linking the storage_layer_demo
- New '<details>' section explaining Tier-1 SQLite index, Tier-2
  BacktestStore adapters (LocalDirStore / LocalTieredStore) and Tier-3
  content-addressed OHLCV chunks
- Python + CLI workflow showing the canonical pattern:
  build_index -> rank_index -> store.open(handle) ->
  BacktestReport(backtests=[...]).save(...)
- Links to examples/storage_layer_demo/ for the runnable end-to-end
Add a 'From backtest results to a report' subsection under 'Backtest
Analysis & Dashboard' demonstrating the canonical paths from a Backtest (or
list of Backtests) to a BacktestReport:

- single event-driven app.run_backtest(...)
- a sweep via app.run_vector_backtests(..., backtest_storage_directory=...)
- loading a persisted folder back via
  BacktestReport.open(directory_path=..., workers=-1)

Cross-links to the Backtest Storage Layer section for sweeps that scale
into the thousands.
New 'Getting Started/Backtest Storage Layer' page covering:
- mental model (Tier-1 SQLite / Tier-2 Parquet / canonical .iafbt /
Tier-3 content-addressed OHLCV)
- the BacktestStore protocol and when to pick LocalDirStore vs
LocalTieredStore
- the canonical 5-step developer workflow:
run sweep -> build index -> filter/rank in SQLite ->
materialise winners -> render report
- 'Avoid overloading your report.html': size-vs-bundle table,
the BacktestReport.open(directory_path=...) anti-pattern,
rules of thumb for narrow vs mega-reports
- pointers to examples/storage_layer_demo and the migrate-store CLI
Wired into the Getting Started sidebar between backtest-reports and
deployment, and added a tip block on backtest-reports.md pointing to
it for users with thousands of backtests.
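The "filter/rank in SQLite, then materialise winners" steps of the workflow can be sketched without the framework. The `runs` table, its columns, and the `rank_handles` / `materialise` helpers below are hypothetical and exist only for this illustration. The point it demonstrates is the one the page makes: ranking touches the index alone, and only the top-N handles are ever opened.

```python
import sqlite3
from typing import Callable, List

# Hypothetical summary columns; a whitelist keeps the ORDER BY injection-safe.
RANKABLE = {"cagr", "max_drawdown"}


def rank_handles(conn: sqlite3.Connection, metric: str, top_n: int) -> List[str]:
    """Return the handles of the top_n runs by metric, using the index only."""
    if metric not in RANKABLE:
        raise ValueError(f"unknown metric: {metric!r}")
    rows = conn.execute(
        f"SELECT handle FROM runs ORDER BY {metric} DESC LIMIT ?", (top_n,)
    ).fetchall()
    return [row[0] for row in rows]


def materialise(handles: List[str], opener: Callable[[str], object]) -> List[object]:
    """Open only the selected handles; `opener` stands in for a store's open()."""
    return [opener(handle) for handle in handles]


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (handle TEXT PRIMARY KEY, cagr REAL, max_drawdown REAL)")
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?)",
        [(f"run-{i:04d}", i / 1000.0, 0.5) for i in range(1000)],
    )
    opened: List[str] = []

    def fake_open(handle: str) -> dict:
        opened.append(handle)          # record every expensive open
        return {"handle": handle}

    winners = rank_handles(conn, "cagr", top_n=3)
    backtests = materialise(winners, fake_open)
    conn.close()
    return winners, len(opened), len(backtests)


if __name__ == "__main__":
    print(demo())
```

With 1,000 indexed runs the sketch opens three bundles, which is the behaviour that keeps a rendered report small when a sweep scales into the thousands.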
Brings epic #540 phases 3a-3d (plus the docs, demo, and README updates) onto
dev.

What happened

PRs #543, #544, #545, #546 and #547 were all marked merged on GitHub but their
commits never reached dev. The stack was opened with each PR's base set to
the previous branch in the chain (#543 -> feature/bundle-format-v2,
#544 -> feature/iaf-index-cli, …). When #537 -> dev merged first,
GitHub did not auto-retarget the rest of the stack, so each subsequent
"merge" landed on an intermediate branch that was already a dead end.

Net result: only #537 and #541 are actually on dev. This PR carries
the rest across in a single squash-friendly delivery.
What's in this PR

The 9 commits already reviewed and merged via #543–#547:

- 65a50258 feat(index): incremental indexing + scalar_summary alias + benchmark
- 13d8a58f feat(store): BacktestStore Protocol + LocalDirStore (phase 3a, was #544)
- f4072bb4 feat(store): LocalTieredStore — Tier-1 SQLite + Tier-2 Parquet (phase 3b, was #545)
- 0950f1b0 feat(store): Tier-3 content-addressed OHLCV chunks (phase 3c, was #546)
- b6ebab0f feat(store): iaf migrate-store + dual-store contract suite (phase 3d, was #547)
- 1655a44b docs(demo): expand storage_layer_demo with full backtest report + HTML dashboard
- c6cce6a4 docs(readme): add Backtest Storage Layer feature + workflow example
- f8239916 docs(readme): show vector/event backtest -> report workflow
- adfeaa0f docs(site): add Backtest Storage Layer page + report-scaling guidance

No code changes versus what was already approved on the original PRs; this
is purely a re-targeting of the merge.
Verification

- Targeted suite (backtest_store + backtest_index + cli): 128 / 128 passing + 26 subTests
- examples/storage_layer_demo/demo.py runs end-to-end (sections 1-9, including the HTML dashboard render)

Closes the gap left by the dead-ended stack merges; epic #540 phase 3 is then
fully landed on dev.