feat(store): epic #540 phase 3 (a-d) + docs/demo — rescue merge to dev #549

Merged
MDUYN merged 9 commits into dev from feature/iaf-migrate-store on May 12, 2026

MDUYN commented May 12, 2026

Brings epic #540 phases 3a-3d (plus the docs, demo, and README updates) onto dev.

What happened

PRs #543, #544, #545, #546 and #547 were all marked merged on GitHub but their
commits never reached dev. The stack was opened with each PR's base set to
the previous branch in the chain (#543 -> feature/bundle-format-v2,
#544 -> feature/iaf-index-cli, …). When #537 -> dev merged first,
GitHub did not auto-retarget the rest of the stack, so each subsequent
"merge" landed on an intermediate branch that was already a dead end.

Net result: only #537 and #541 are actually on dev. This PR carries
the rest across in a single squash-friendly delivery.
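
The dead-end effect described above can be reproduced with a toy model. This is only a sketch: branches are modelled as sets of commit ids, and the names `branch-a`, `branch-b`, `branch-c` and the commit ids are placeholders, not the real branches in the stack.

```python
def merge(branches, head, base):
    """Merge `head` into `base`: base gains every commit that head has."""
    branches[base] |= branches[head]


# Placeholder stack: each PR targets the previous branch in the chain.
branches = {
    "dev": {"c0"},
    "branch-a": {"c0", "a1"},              # PR A -> dev
    "branch-b": {"c0", "a1", "b1"},        # PR B -> branch-a
    "branch-c": {"c0", "a1", "b1", "c1"},  # PR C -> branch-b
}

# PR A merges into dev first.
merge(branches, "branch-a", "dev")

# PRs B and C keep their original bases, so their merges land on
# intermediate branches that nothing merges into dev afterwards.
merge(branches, "branch-b", "branch-a")
merge(branches, "branch-c", "branch-b")

# Commits that are "merged" from the PR's point of view but absent from dev.
missing = sorted(branches["branch-c"] - branches["dev"])
print(missing)

# The rescue: one merge of the tip of the stack straight into dev.
merge(branches, "branch-c", "dev")
still_missing = sorted(branches["branch-c"] - branches["dev"])
print(still_missing)
```

In this model every PR reports a successful merge, yet only the first one changes `dev` until the tip of the stack is merged directly.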

What's in this PR

The 9 commits already reviewed and merged via #543 through #547.

No code changes versus what was already approved on the original PRs — this
is purely a re-targeting of the merge.

Verification

  • Targeted suite (backtest_store + backtest_index + cli): 128 / 128 passing + 26 subTests
  • Full non-scenario suite: 1705 / 1705 passing
  • examples/storage_layer_demo/demo.py runs end-to-end (sections 1-9, including the HTML dashboard render)

Closes the gap left by the dead-ended stack merges; epic #540 phase 3 is then
fully landed on dev.

MDUYN added 9 commits May 11, 2026 13:16
Closes the remaining Phase 2 gaps for epic #540:

- Backtest.scalar_summary() alias (canonical Phase 1 naming).
- SqliteBacktestIndex tracks bundle_mtime_ns + bundle_size (schema v2);
  is_up_to_date() lets the indexer skip unchanged bundles.
- build_index(..., incremental=True) skips up-to-date bundles by default;
  'iaf index --rebuild' forces a full reindex.
- 4 new tests covering skip, re-ingest on mtime bump, --rebuild, and the
  scalar_summary alias.
- scripts/bench_540_phase2.py: acceptance benchmark. At 12,500 bundles:
  cold build 86s, incremental 536ms, list top-20 in 8.3ms (12x under the
  100ms target), index footprint 2.5 MiB.
- examples/storage_layer_demo/: end-to-end walkthrough of write -> index
  -> list -> rank -> open-with-summary-only, plus inline backtest report.
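
The incremental skip described in this commit can be illustrated with plain `sqlite3`. This is a minimal sketch, not the actual `SqliteBacktestIndex`: the table layout, the `open_index` helper and the function signatures are assumptions; only the idea of comparing a stored `(mtime_ns, size)` pair against the file on disk comes from the commit message.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_index(db_path=":memory:"):
    """Create the (hypothetical) index table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundles ("
        "path TEXT PRIMARY KEY, mtime_ns INTEGER NOT NULL, size INTEGER NOT NULL)"
    )
    return conn


def is_up_to_date(conn, bundle):
    """True when the stored (mtime_ns, size) pair matches the file on disk."""
    st = bundle.stat()
    row = conn.execute(
        "SELECT mtime_ns, size FROM bundles WHERE path = ?", (str(bundle),)
    ).fetchone()
    return row == (st.st_mtime_ns, st.st_size)


def build_index(conn, root, incremental=True):
    """Ingest bundles under root; return how many were (re-)ingested."""
    ingested = 0
    for bundle in sorted(Path(root).rglob("*.iafbt")):
        if incremental and is_up_to_date(conn, bundle):
            continue  # unchanged since the last run, so skip the decode
        st = bundle.stat()
        conn.execute(
            "INSERT OR REPLACE INTO bundles VALUES (?, ?, ?)",
            (str(bundle), st.st_mtime_ns, st.st_size),
        )
        ingested += 1
    conn.commit()
    return ingested


def demo():
    counts = []
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.iafbt", "b.iafbt"):
            (Path(root) / name).write_bytes(b"payload")
        conn = open_index()
        counts.append(build_index(conn, root))  # cold build
        counts.append(build_index(conn, root))  # nothing changed
        target = Path(root) / "a.iafbt"
        st = target.stat()
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        counts.append(build_index(conn, root))  # one mtime bump
        counts.append(build_index(conn, root, incremental=False))  # forced rebuild
        conn.close()
    return counts


counts = demo()
print(counts)
```

The four counts correspond to the four scenarios the new tests cover: cold build, skip, re-ingest after an mtime bump, and a forced rebuild.
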
Introduces the storage seam that decouples *where* a backtest is
persisted from the rest of the framework. Phase 3a is intentionally
scoped to the Protocol + a thin adapter over today's .iafbt layout;
LocalTieredStore (Tier-2 Parquet + Tier-3 chunks) and FinterionStore
land in follow-up PRs.

- BacktestStore Protocol: write / open / exists / delete / iter_handles
  / iter_index_rows / __len__ / __contains__. Mirrors today's
  Backtest.save_bundle / Backtest.open semantics so LocalDirStore is
  a 1:1 adapter.
- StoreHandle: opaque str token (relative bundle path for
  LocalDirStore; uuid7 run_id for the upcoming tiered stores).
- StoreError + StoreHandleNotFoundError.
- Optional capability mixin SupportsCopyFrom — declared as a separate
  runtime_checkable Protocol so 'iaf migrate-store' (Phase 3d) and
  'finterion push' (closed-source) can isinstance-test for it. Future
  capabilities (SupportsRelations for the strategy/version/report
  graph, SupportsContentAddressedChunks for Tier-3 dedup) follow the
  same pattern.
- LocalDirStore: handle = bundle path relative to the root, so the
  store stays portable across moves. Sidecar SqliteBacktestIndex
  (built lazily, incrementally — same machinery as 'iaf index') backs
  iter_index_rows so listing does not re-decode bundles. Path-traversal
  guards reject handles that escape the root.
- Tests (19, all passing): Protocol/SupportsCopyFrom conformance,
  round-trip, summary_only, default-handle derivation, sidecar index
  caching, copy_from with and without a handle subset, handle
  normalisation, path-traversal rejection, missing-handle error,
  delete idempotency.

Targeted suite (store + index + cli): 86 / 86 passing.
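
A rough sketch of the two ideas in this commit, the optional capability Protocol and the path-traversal guard. The method names follow the commit message; the signatures, the `bytes` payload standing in for a Backtest, and the helpers `MemoryStore`, `CopyableMemoryStore` and `resolve_handle` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, Optional, Protocol, runtime_checkable


@runtime_checkable
class BacktestStore(Protocol):
    # Signatures are assumed; only the method names come from the commit.
    def write(self, payload: bytes, handle: str) -> str: ...
    def open(self, handle: str) -> bytes: ...
    def exists(self, handle: str) -> bool: ...
    def delete(self, handle: str) -> None: ...
    def iter_handles(self) -> Iterator[str]: ...


@runtime_checkable
class SupportsCopyFrom(Protocol):
    def copy_from(
        self, src: BacktestStore, handles: Optional[Iterable[str]] = None
    ) -> int: ...


class MemoryStore:
    """Hypothetical in-memory store without the copy capability."""

    def __init__(self):
        self._data = {}

    def write(self, payload, handle):
        self._data[handle] = payload
        return handle

    def open(self, handle):
        return self._data[handle]

    def exists(self, handle):
        return handle in self._data

    def delete(self, handle):
        self._data.pop(handle, None)

    def iter_handles(self):
        return iter(sorted(self._data))


class CopyableMemoryStore(MemoryStore):
    """Same store plus the optional copy_from capability."""

    def copy_from(self, src, handles=None):
        wanted = list(src.iter_handles()) if handles is None else list(handles)
        for handle in wanted:
            self.write(src.open(handle), handle)
        return len(wanted)


def resolve_handle(root, handle):
    """Map a relative handle to a path, rejecting anything outside root."""
    root = Path(root).resolve()
    target = (root / handle).resolve()
    if target == root or root not in target.parents:
        raise ValueError(f"handle escapes store root: {handle!r}")
    return target


plain, copyable = MemoryStore(), CopyableMemoryStore()
is_store = [isinstance(s, BacktestStore) for s in (plain, copyable)]
can_copy = [isinstance(s, SupportsCopyFrom) for s in (plain, copyable)]

with tempfile.TemporaryDirectory() as root:
    inside = resolve_handle(root, "runs/a.iafbt")
    try:
        resolve_handle(root, "../outside.iafbt")
        rejected = False
    except ValueError:
        rejected = True
```

Note that `isinstance` against a `runtime_checkable` Protocol only checks that the methods exist, not their signatures, so it is a cheap capability probe rather than a full conformance check.
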
…540 phase 3b)

Second slice of Phase 3 of epic #540. Adds a real tiered storage
implementation that ships the analytics value (cross-run DuckDB /
Polars queries) without yet replacing the canonical .iafbt bundle.

Layout under <root>:
  index.sqlite                                    Tier-1 (always in sync)
  bundles/<handle>.iafbt                          canonical bytes
  parquet/portfolio_snapshots/run_id=<h>/...      Tier-2 hive-partitioned
  parquet/trades/run_id=<h>/...                   Tier-2
  parquet/orders/run_id=<h>/...                   Tier-2

Phase 3b deliberately keeps the bundle as the canonical representation;
Tier-2 sidecars are auxiliary, written best-effort, and a malformed
sidecar never blocks a write or a read. This trivially preserves
byte-identical Backtest round-trips today. Byte-identical
Tier-2 -> Backtest reassembly (no bundle on the read path) is Phase 3d.

- decompose.py: Backtest -> flat record lists for snapshots / trades /
  orders, adding run_id and window_name columns so downstream tools
  group cleanly across walk-forward windows. Extension point for
  metric_series and any future kind is the DATASETS tuple.

- LocalTieredStore: implements BacktestStore + SupportsCopyFrom.
  write() saves the bundle, upserts the Tier-1 row, and writes
  hive-partitioned Parquet sidecars per dataset. delete() removes all
  three tiers. iter_index_rows() serves from SQLite directly.
  rebuild_index() recreates Tier-1 from the bundles (useful after a
  software upgrade that adds new index columns).

- scan('portfolio_snapshots' | 'trades' | 'orders') returns a
  pyarrow.dataset.Dataset that DuckDB / Polars can query across every
  run with partition pruning on run_id.

- 15 new tests: Protocol + SupportsCopyFrom conformance, three-tier
  layout, handle normalisation, round-trip, summary_only, Tier-1
  always-in-sync (write/delete/len), Tier-2 cross-run scan, copy_from
  from LocalDirStore, rebuild_index, missing-handle errors. Includes
  a synthetic-records test that asserts hive partitions are written
  and that scan() returns the expected rows + columns.

Targeted suite (backtest_store + backtest_index + cli): 101 / 101 passing.
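
To make the decomposition and the partition layout concrete, here is a standard-library sketch. It does not write Parquet; the `Run` dataclass, the record fields and the `partition_dir` helper are hypothetical, and only the `run_id` / `window_name` columns, the `DATASETS` tuple and the `run_id=<h>` directory convention come from the commit message.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Extension point: adding a dataset name here adds another record list.
DATASETS = ("trades", "orders")


@dataclass
class Run:
    """Hypothetical stand-in for one walk-forward window of a backtest."""
    window_name: str
    trades: list = field(default_factory=list)
    orders: list = field(default_factory=list)


def decompose(run_id, runs):
    """Flatten nested runs into one record list per dataset."""
    out = {name: [] for name in DATASETS}
    for run in runs:
        for name in DATASETS:
            for record in getattr(run, name):
                # Prepend the grouping columns to every record.
                out[name].append(
                    {"run_id": run_id, "window_name": run.window_name, **record}
                )
    return out


def partition_dir(root, dataset, run_id):
    """Hive-style partition directory for one dataset of one run."""
    return PurePosixPath(root) / "parquet" / dataset / f"run_id={run_id}"


runs = [
    Run("2023-H1", trades=[{"symbol": "BTC", "pnl": 12.5}],
        orders=[{"symbol": "BTC", "side": "buy"}]),
    Run("2023-H2", trades=[{"symbol": "ETH", "pnl": -3.0}]),
]
records = decompose("abc123", runs)
trades_dir = str(partition_dir("store", "trades", "abc123"))
print(trades_dir)
```

Because the run id is encoded in the directory name, a dataset reader that understands hive partitioning can skip whole directories when a query filters on `run_id`.
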
…e (epic #540 phase 3c)

Wires LocalTieredStore into the existing OHLCV side-store machinery
so identical (symbol, timeframe) Parquet bytes are written exactly
once and shared across every bundle that references them.

- write() now routes save_bundle's OHLCV writes to <root>/ohlcv/
  whenever backtest.ohlcv is non-empty. The bundle envelope keeps
  its content-addressed manifest unchanged, so old bundles remain
  readable.
- open() forwards the same shared directory to open_bundle so OHLCV
  lookups resolve regardless of what path the bundle was originally
  written with.
- delete() intentionally does NOT touch ohlcv/. Chunks are globally
  shared; orphans are reclaimed via garbage_collect_ohlcv(dry_run=…).
- Introspection helpers required by the dedup-upload protocol
  (docs/design/ohlcv-dedup-protocol.md):
    * iter_ohlcv_hashes() / ohlcv_referenced_hashes()
    * ohlcv_stored_hashes()
    * ohlcv_stats() -> stored_blobs / stored_bytes / referenced_blobs
                       / orphan_blobs / missing_blobs
    * garbage_collect_ohlcv(dry_run=False)
  Manifests are decoded straight from the bundle envelope
  (_decode_payload) so the cost is one msgpack read per bundle —
  no full Backtest instantiation.

9 new tests:
- No OHLCV -> no chunk dir created.
- Identical OHLCV is stored once across distinct handles (dedup).
- Different OHLCV yields separate chunks.
- Round-trip via store.open() resolves OHLCV from the shared dir.
- delete() keeps still-referenced chunks; orphans only after GC.
- garbage_collect_ohlcv(dry_run=True) lists without deleting; the
  real call removes them.
- iter_ohlcv_hashes() emits per-reference; ohlcv_referenced_hashes()
  dedups.
- Hash strings are 64-char lowercase hex (matches the upload protocol
  spec).

Targeted suite (backtest_store + backtest_index + cli): 110 / 110 passing.
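
The dedup and garbage-collection behaviour can be sketched with a small content-addressed store. Several things here are assumptions: the `ChunkStore` class and its method names are hypothetical, the in-memory `refs` mapping stands in for the manifests the real store decodes from bundle envelopes, and SHA-256 is assumed only because it yields the 64-character lowercase hex strings the commit mentions.

```python
import hashlib
import tempfile
from pathlib import Path


class ChunkStore:
    """Hypothetical content-addressed blob store with deferred cleanup."""

    def __init__(self, root):
        self.dir = Path(root) / "ohlcv"
        self.refs = {}  # handle -> list of chunk hashes (stand-in manifests)

    def put(self, handle, blobs):
        hashes = []
        for blob in blobs:
            digest = hashlib.sha256(blob).hexdigest()
            path = self.dir / digest
            if not path.exists():  # identical bytes are written only once
                self.dir.mkdir(parents=True, exist_ok=True)
                path.write_bytes(blob)
            hashes.append(digest)
        self.refs[handle] = hashes

    def delete(self, handle):
        # Chunks are shared, so deleting a handle never touches them.
        self.refs.pop(handle, None)

    def referenced(self):
        return {h for hashes in self.refs.values() for h in hashes}

    def stored(self):
        if not self.dir.exists():
            return set()
        return {p.name for p in self.dir.iterdir()}

    def garbage_collect(self, dry_run=False):
        """Return orphaned hashes; remove them unless dry_run is set."""
        orphans = sorted(self.stored() - self.referenced())
        if not dry_run:
            for digest in orphans:
                (self.dir / digest).unlink()
        return orphans


btc, eth = b"btc-1h-candles", b"eth-1h-candles"
with tempfile.TemporaryDirectory() as root:
    store = ChunkStore(root)
    store.put("run-a", [btc, eth])
    store.put("run-b", [btc])           # btc chunk is shared, not rewritten
    after_writes = len(store.stored())
    store.delete("run-a")               # eth chunk becomes an orphan
    after_delete = len(store.stored())
    preview = store.garbage_collect(dry_run=True)
    after_preview = len(store.stored())
    removed = store.garbage_collect()
    after_gc = store.stored()
```

Separating `delete()` from cleanup keeps deletes cheap and safe when chunks are shared; the cost is that orphans linger until a collection pass runs.
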
…phase 3d)

Closes the open Phase 3 deliverables that turn the new store
abstraction into something users can actually move data through:

- iaf migrate-store --from <kind> --src <path> --to <kind> --dst <path>
  delegates to dst.copy_from(src), so it is incremental, restartable,
  and tier-aware: when the destination is a local-tiered store,
  identical OHLCV chunks are written exactly once across the
  destination regardless of how many bundles reference them
  (Phase 3c invariant). Optional --handles subset selector for
  partial migrations.
- migrate_store() programmatic helper for in-process pipelines.
- BacktestStoreContractTest: a parameterised conformance suite
  that runs identical scenarios against every concrete store
  implementation (LocalDirStore, LocalTieredStore today, future
  remote stores tomorrow). Catches divergence as a failing subTest
  with the store class name in the label. Covers Protocol +
  SupportsCopyFrom conformance, write/open round-trip, summary_only,
  exists, idempotent delete, missing-handle errors, listing,
  iter_index_rows, and copy_from with both full and subset handle
  selection.
- bug fix in LazyOhlcvDict: items() and values() were inheriting
  the empty backing dict's iteration, so any code path that did
  'for k, v in bt.ohlcv.items()' silently dropped every blob after
  a tiered round-trip. Now both methods walk the manifest and
  materialise lazily on access. Caught by the migration dedup
  test.
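
The class of bug described in the last bullet is easy to reproduce with any `dict` subclass that keeps its real contents elsewhere. The classes below are illustrative stand-ins, not the actual `LazyOhlcvDict`; the manifest shape and the loader are assumptions.

```python
class BuggyLazyDict(dict):
    """Lazy mapping that overrides lookup but not items() / values()."""

    def __init__(self, manifest, loader):
        super().__init__()  # the backing dict starts out empty
        self._manifest = dict(manifest)
        self._loader = loader

    def __getitem__(self, key):
        if not dict.__contains__(self, key):
            # Materialise on first access and cache in the backing dict.
            dict.__setitem__(self, key, self._loader(self._manifest[key]))
        return dict.__getitem__(self, key)

    def __iter__(self):
        return iter(self._manifest)

    def __len__(self):
        return len(self._manifest)

    def __contains__(self, key):
        return key in self._manifest

    def keys(self):
        return self._manifest.keys()


class FixedLazyDict(BuggyLazyDict):
    """Same mapping, with items() and values() walking the manifest."""

    def items(self):
        return ((key, self[key]) for key in self._manifest)

    def values(self):
        return (self[key] for key in self._manifest)


loads = []


def load(digest):
    loads.append(digest)  # record each materialisation
    return f"blob:{digest}"


manifest = {"BTC/1h": "h1", "ETH/1h": "h2"}

buggy = BuggyLazyDict(manifest, load)
buggy_items = list(buggy.items())   # inherits dict.items(): sees the empty backing dict

fixed = FixedLazyDict(manifest, load)
loads_before = list(loads)          # nothing has been loaded yet
fixed_items = list(fixed.items())   # walks the manifest and loads on access
print(len(buggy), buggy_items, fixed_items)
```

The built-in `dict.items()` and `dict.values()` read the underlying storage directly and never call an overridden `__getitem__` or `__iter__`, which is why `len()` can report entries while iteration over `items()` yields nothing.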

Note on what is *not* in this PR: byte-identical Tier-2 -> Backtest
reassembly (so .iafbt could become export-only) is intentionally
deferred. The current model where the bundle is canonical and
Tier-1/2/3 are derived is simpler, preserves the existing
round-trip contract bit-for-bit, and is what every test in the
contract suite already exercises against both stores.

Targeted suite (backtest_store + backtest_index + cli): 128 / 128
passing + 26 subTests. Full non-scenario suite: 1705 / 1705 passing
with no regressions from the LazyOhlcvDict fix.
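
A parameterised conformance suite of the kind described for Phase 3d might look roughly like this. Everything below is a hedged sketch: `DictStore`, `ListStore`, `CopyFromMixin` and the test names are hypothetical, payloads are plain bytes, and missing handles raise `KeyError` here rather than the project's `StoreHandleNotFoundError`.

```python
import io
import unittest


class CopyFromMixin:
    def copy_from(self, src, handles=None):
        """Copy missing handles from src; return how many were written."""
        wanted = list(src.iter_handles()) if handles is None else list(handles)
        copied = 0
        for handle in wanted:
            if self.exists(handle):
                continue  # already present, which makes reruns restartable
            self.write(src.open(handle), handle)
            copied += 1
        return copied


class DictStore(CopyFromMixin):
    def __init__(self):
        self._data = {}

    def write(self, payload, handle):
        self._data[handle] = payload
        return handle

    def open(self, handle):
        return self._data[handle]  # KeyError for a missing handle

    def exists(self, handle):
        return handle in self._data

    def delete(self, handle):
        self._data.pop(handle, None)

    def iter_handles(self):
        return iter(sorted(self._data))


class ListStore(CopyFromMixin):
    def __init__(self):
        self._rows = []

    def write(self, payload, handle):
        self.delete(handle)
        self._rows.append((handle, payload))
        return handle

    def open(self, handle):
        for key, payload in self._rows:
            if key == handle:
                return payload
        raise KeyError(handle)

    def exists(self, handle):
        return any(key == handle for key, _ in self._rows)

    def delete(self, handle):
        self._rows = [row for row in self._rows if row[0] != handle]

    def iter_handles(self):
        return iter(sorted(key for key, _ in self._rows))


class StoreContractTest(unittest.TestCase):
    STORES = (DictStore, ListStore)  # every implementation under contract

    def test_round_trip_and_delete(self):
        for factory in self.STORES:
            with self.subTest(store=factory.__name__):
                store = factory()
                store.write(b"one", "a")
                self.assertEqual(store.open("a"), b"one")
                store.delete("a")
                store.delete("a")  # second delete must not raise
                self.assertFalse(store.exists("a"))
                with self.assertRaises(KeyError):
                    store.open("a")

    def test_copy_from(self):
        for factory in self.STORES:
            with self.subTest(store=factory.__name__):
                src, dst = DictStore(), factory()
                for handle in ("a", "b", "c"):
                    src.write(handle.encode(), handle)
                self.assertEqual(dst.copy_from(src, handles=["a"]), 1)
                self.assertEqual(dst.copy_from(src), 2)  # only the rest
                self.assertEqual(dst.copy_from(src), 0)  # rerun is a no-op
                self.assertEqual(list(dst.iter_handles()), ["a", "b", "c"])


def run_contract():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StoreContractTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_contract()
print(result.testsRun, result.wasSuccessful())
```

With `subTest`, a failure in one store is reported with that store's class name in the label while the loop continues over the remaining implementations.
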
…L dashboard

Two new sections in examples/storage_layer_demo/demo.py:

- 6b. _print_backtest_full_report(): per-run breakdown (window /
  days / orders / trades / positions / final_value), end-of-backtest
  positions snapshot, first few trades, and a richer slice of
  per-run BacktestMetrics (cagr, annual_volatility,
  max_drawdown_absolute, gross_profit/loss, best_trade, max
  consecutive wins/losses) with safe n/a fallbacks. Built on top
  of the existing compact _print_backtest_report().

- 9. Storage layer -> HTML dashboard: wires the Tier-1 SQLite
  index, the Tier-2 LocalDirStore and the BacktestReport HTML
  dashboard end-to-end. rank_index() picks the top-N bundles from
  SQLite alone, store.open(handle) materialises just those via the
  BacktestStore protocol, BacktestReport(backtests=[...]).save()
  renders a self-contained interactive HTML dashboard. Demonstrates
  that the new storage layer plugs straight into the existing
  reporting stack with no glue code.
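
The rank-then-materialise pattern from section 9 can be approximated with plain `sqlite3`. The `runs` table, its columns, and the `rank_handles` and `fake_open` helpers below are hypothetical; they stand in for the real index schema, `rank_index()` and `store.open(handle)`, whose signatures are not shown in this PR.

```python
import sqlite3

# Columns that may be used for ranking; a whitelist keeps the
# interpolated column name safe.
RANKABLE = {"sharpe_ratio", "cagr"}


def rank_handles(conn, metric, top_n):
    """Return the top_n handles by metric using the index alone."""
    if metric not in RANKABLE:
        raise ValueError(f"cannot rank by {metric!r}")
    rows = conn.execute(
        f"SELECT handle FROM runs ORDER BY {metric} DESC LIMIT ?", (top_n,)
    ).fetchall()
    return [handle for (handle,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (handle TEXT PRIMARY KEY, sharpe_ratio REAL, cagr REAL)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("run-a", 1.4, 0.12),
        ("run-b", 0.3, 0.02),
        ("run-c", 2.1, 0.18),
        ("run-d", 0.9, 0.25),
    ],
)

opened = []


def fake_open(handle):
    """Stand-in for the expensive step of decoding a full bundle."""
    opened.append(handle)
    return {"handle": handle}


top = rank_handles(conn, "sharpe_ratio", 2)
winners = [fake_open(handle) for handle in top]  # only the winners are opened
conn.close()
print(top)
```

The point of the pattern is that ranking touches only index rows, so the number of bundles decoded equals the size of the shortlist rather than the size of the sweep.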

README updated to describe both new sections.
- New feature bullet linking the storage_layer_demo
- New '<details>' section explaining Tier-1 SQLite index, Tier-2
  BacktestStore adapters (LocalDirStore / LocalTieredStore) and
  Tier-3 content-addressed OHLCV chunks
- Python + CLI workflow showing the canonical pattern:
  build_index -> rank_index -> store.open(handle) ->
  BacktestReport(backtests=[...]).save(...)
- Links to examples/storage_layer_demo/ for the runnable end-to-end

Add a 'From backtest results to a report' subsection under
'Backtest Analysis & Dashboard' demonstrating the canonical paths
from a Backtest (or list of Backtests) to a BacktestReport:

- single event-driven app.run_backtest(...)
- a sweep via app.run_vector_backtests(..., backtest_storage_directory=...)
- loading a persisted folder back via BacktestReport.open(directory_path=..., workers=-1)

Cross-links to the Backtest Storage Layer section for sweeps that
scale into the thousands.

New 'Getting Started/Backtest Storage Layer' page covering:

- mental model (Tier-1 SQLite / Tier-2 Parquet / canonical .iafbt /
  Tier-3 content-addressed OHLCV)
- the BacktestStore protocol and when to pick LocalDirStore vs
  LocalTieredStore
- the canonical 5-step developer workflow:
    run sweep -> build index -> filter/rank in SQLite ->
    materialise winners -> render report
- 'Avoid overloading your report.html': size-vs-bundle table,
  the BacktestReport.open(directory_path=...) anti-pattern,
  rules of thumb for narrow vs mega-reports
- pointers to examples/storage_layer_demo and the migrate-store CLI

Wired into the Getting Started sidebar between backtest-reports and
deployment, and added a tip block on backtest-reports.md pointing to
it for users with thousands of backtests.

MDUYN merged commit d929c4b into dev on May 12, 2026. 1 check passed.