feat(store): LocalTieredStore — Tier-1 SQLite + Tier-2 Parquet (epic #540 phase 3b)#545

Merged
MDUYN merged 1 commit into feature/iaf-backtest-store from feature/iaf-local-tiered-store on May 12, 2026

@MDUYN MDUYN commented May 11, 2026

Stacked on #544 (feature/iaf-backtest-store), which is stacked on #543 → #537. Merge order: #537 → #543 → #544 → this PR.

Second slice of Phase 3 of epic #540 — adds a real tiered storage implementation that delivers the cross-run analytics value while keeping the canonical .iafbt bundle as the source of truth (full Tier-2-as-canonical lands in Phase 3d).

What's in this PR

Storage layout

<root>/
  index.sqlite                                  ← Tier-1, always in sync
  bundles/<handle>.iafbt                        ← canonical bytes
  parquet/
    portfolio_snapshots/run_id=<handle>/part-0.parquet   ← Tier-2 hive-partitioned
    trades/run_id=<handle>/part-0.parquet                ← Tier-2
    orders/run_id=<handle>/part-0.parquet                ← Tier-2
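
As a rough illustration of how one handle maps onto the layout above, here is a minimal sketch using only `pathlib`. The helper names (`normalise_handle`, `tier_paths`) and the exact contents of `DATASETS` are assumptions made for this example, not the PR's actual code; only the directory shape and the `.iafbt` suffix stripping come from the description.

```python
from pathlib import Path

# Dataset names taken from the layout above; the helper names are hypothetical.
DATASETS = ("portfolio_snapshots", "trades", "orders")


def normalise_handle(handle: str) -> str:
    """Strip a trailing .iafbt so 'run_a' and 'run_a.iafbt' name the same run."""
    return handle[: -len(".iafbt")] if handle.endswith(".iafbt") else handle


def tier_paths(root: Path, handle: str) -> dict[str, Path]:
    """Return the location of every artefact that belongs to one run."""
    run_id = normalise_handle(handle)
    paths = {
        "index": root / "index.sqlite",                  # Tier-1, shared by all runs
        "bundle": root / "bundles" / f"{run_id}.iafbt",  # canonical bytes
    }
    for name in DATASETS:
        # Tier-2: one hive-style partition directory per run and dataset.
        paths[name] = root / "parquet" / name / f"run_id={run_id}" / "part-0.parquet"
    return paths
```

Calling `tier_paths(Path("store"), "run_a.iafbt")` would give `store/bundles/run_a.iafbt` for the bundle and `store/parquet/trades/run_id=run_a/part-0.parquet` for the trades sidecar.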

The bundle is the canonical representation. Tier-1 and Tier-2 are derived, eagerly maintained, and best-effort: a malformed sidecar never blocks a write or read against the bundle. This keeps Phase 3b's invariants simple and trivially preserves byte-identical Backtest.save_bundle / Backtest.open round-trips today. Byte-identical Tier-2 → Backtest reassembly (no bundle on the read path) is Phase 3d.
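
The "canonical bundle, best-effort sidecars" rule can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `write_with_sidecars` and the callable-per-sidecar shape are hypothetical, and the real store may handle errors differently.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("tiered_store_sketch")


def write_with_sidecars(
    bundle_path: Path,
    bundle_bytes: bytes,
    sidecar_writers: dict[str, Callable[[], None]],
) -> list[str]:
    """Write the canonical bundle, then attempt each derived sidecar.

    Returns the names of sidecars that failed. A bundle write error still
    propagates, because the bundle is the source of truth.
    """
    bundle_path.parent.mkdir(parents=True, exist_ok=True)
    bundle_path.write_bytes(bundle_bytes)  # canonical: failures are not swallowed

    failed = []
    for name, writer in sidecar_writers.items():
        try:
            writer()
        except Exception:  # derived data: log and carry on
            log.warning("sidecar %r failed; bundle is unaffected", name, exc_info=True)
            failed.append(name)
    return failed
```

The design point is the asymmetry: anything derived can be regenerated from the bundle later, so only the bundle write is allowed to fail the operation.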

Modules

  • decompose.py: Backtest → flat record lists for snapshots / trades / orders, adding run_id and window_name columns so downstream tools group cleanly across walk-forward windows. Extension point for metric_series and any future kind is the DATASETS tuple.
  • LocalTieredStore — implements BacktestStore + SupportsCopyFrom. write() saves the bundle, upserts the Tier-1 row, and writes hive-partitioned Parquet sidecars per dataset. delete() removes all three tiers. iter_index_rows() serves from SQLite directly. rebuild_index() recreates Tier-1 from the bundles (useful after a software upgrade that adds new index columns).
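
To make the decomposition step concrete, here is a minimal sketch of flattening per-window records into one list per dataset. The `Window` dataclass is a hypothetical stand-in for whatever the real Backtest object exposes, and the contents of `DATASETS` are assumed; only the added `run_id` and `window_name` columns come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Window:  # hypothetical stand-in for one walk-forward window
    name: str
    trades: list[dict] = field(default_factory=list)
    orders: list[dict] = field(default_factory=list)
    portfolio_snapshots: list[dict] = field(default_factory=list)


DATASETS = ("portfolio_snapshots", "trades", "orders")  # assumed contents


def decompose(run_id: str, windows: list[Window]) -> dict[str, list[dict]]:
    """Flatten per-window records into one list per dataset, tagging each row."""
    out: dict[str, list[dict]] = {name: [] for name in DATASETS}
    for window in windows:
        for name in DATASETS:
            for record in getattr(window, name):
                # The tags are applied last so they win over any same-named keys.
                out[name].append({**record, "run_id": run_id, "window_name": window.name})
    return out
```

With this shape, supporting a new kind of record would only mean adding a name to `DATASETS` and a matching attribute on the source object.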

Cross-run analytics

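import pyarrow.dataset as pads

store = LocalTieredStore("~/.iaf/store")
snapshots = store.scan("portfolio_snapshots")  # pyarrow.dataset.Dataset
# field() is a module-level function in pyarrow.dataset, not a Dataset method.
df = snapshots.to_table(filter=pads.field("run_id") == "top_run_x").to_pandas()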

Or directly from DuckDB:

SELECT run_id, total_value
FROM read_parquet('store/parquet/portfolio_snapshots/**/*.parquet',
                  hive_partitioning=True)
WHERE run_id IN (SELECT bundle_path FROM read_csv('top20.csv'));

Partition pruning on run_id is automatic — DuckDB scans only the relevant directories.

Tests

15 new tests, all passing:

  • Protocol + SupportsCopyFrom conformance.
  • Three-tier layout on write().
  • Handle normalisation (.iafbt suffix stripped).
  • Round-trip preserves algorithm_id; summary_only honoured.
  • Tier-1 always in sync: iter_index_rows after writes; delete removes all tiers; __len__ uses the index.
  • Tier-2 cross-run scan returns an Arrow Dataset with run_id column; unknown dataset name raises.
  • copy_from from LocalDirStore (interop with Phase 3a).
  • rebuild_index() recreates Tier-1 from bundles.
  • Missing handles raise StoreHandleNotFoundError.
  • Synthetic-records test asserts hive partitions are written and that scan() returns the expected rows + columns when records are present.
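
The Tier-1 behaviours these tests exercise (upsert on write, delete, `__len__` from the index, rebuild from bundles) can be sketched with the standard-library `sqlite3` module. The `IndexSketch` class, the `runs` table and its two columns are assumptions made for illustration; the real schema is not shown in this PR description. The upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id       TEXT PRIMARY KEY,
    algorithm_id TEXT
)
"""


class IndexSketch:
    """Hypothetical Tier-1 index: one row per run, keyed by handle."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def upsert(self, run_id: str, algorithm_id: str) -> None:
        # ON CONFLICT ... DO UPDATE makes rewriting an existing handle idempotent.
        with self.conn:
            self.conn.execute(
                "INSERT INTO runs (run_id, algorithm_id) VALUES (?, ?) "
                "ON CONFLICT(run_id) DO UPDATE SET algorithm_id = excluded.algorithm_id",
                (run_id, algorithm_id),
            )

    def delete(self, run_id: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM runs WHERE run_id = ?", (run_id,))

    def rebuild(self, bundles: dict[str, str]) -> None:
        """Drop every row and re-derive the index from the canonical bundles.

        `bundles` maps run_id to algorithm_id and stands in for reading each
        bundle from disk.
        """
        with self.conn:
            self.conn.execute("DELETE FROM runs")
            self.conn.executemany(
                "INSERT INTO runs (run_id, algorithm_id) VALUES (?, ?)",
                bundles.items(),
            )

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
```

Each mutating method runs inside `with self.conn:`, so the change is committed as one transaction or rolled back on error.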

Targeted suite (tests/services/backtest_store/ + tests/services/backtest_index/ + tests/cli/): 101 / 101 passing.

What's still coming in Phase 3

| Slice | Scope |
|-------|-------|
| 3c | Tier-3 content-addressed chunks (ohlcv, code, params, symbols) with SHA-256 dedup; this is where the 64 GB → 20 GB headline lives |
| 3d | iaf migrate-store --from local-dir --to local-tiered; byte-identical Tier-2 → Backtest reassembly (.iafbt becomes export-only); the parameterised test fixture that runs every backtest test against both stores |
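
For readers unfamiliar with content addressing, the idea behind slice 3c can be sketched in a few lines. This is a generic illustration of SHA-256 dedup, not the planned Tier-3 design: `ChunkStoreSketch` is a hypothetical in-memory class, and the real implementation's chunking and on-disk format are not described here.

```python
import hashlib


class ChunkStoreSketch:
    """Hypothetical in-memory content-addressed store."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 digest and return that digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._chunks.setdefault(digest, data)  # identical bytes are stored once
        return digest

    def get(self, digest: str) -> bytes:
        return self._chunks[digest]

    def __len__(self) -> int:
        return len(self._chunks)
```

Because the key is derived from the bytes, two runs that share the same ohlcv data or strategy code would reference a single stored chunk, which is where the space saving would come from.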

…540 phase 3b)

Second slice of Phase 3 of epic #540. Adds a real tiered storage
implementation that ships the analytics value (cross-run DuckDB /
Polars queries) without yet replacing the canonical .iafbt bundle.

Layout under <root>:
  index.sqlite                                    Tier-1 (always in sync)
  bundles/<handle>.iafbt                          canonical bytes
  parquet/portfolio_snapshots/run_id=<h>/...      Tier-2 hive-partitioned
  parquet/trades/run_id=<h>/...                   Tier-2
  parquet/orders/run_id=<h>/...                   Tier-2

Phase 3b deliberately keeps the bundle as the canonical representation;
Tier-2 sidecars are auxiliary, written best-effort, and a malformed
sidecar never blocks a write or a read. This trivially preserves
byte-identical Backtest round-trips today. Byte-identical
Tier-2 -> Backtest reassembly (no bundle on the read path) is Phase 3d.

- decompose.py: Backtest -> flat record lists for snapshots / trades /
  orders, adding run_id and window_name columns so downstream tools
  group cleanly across walk-forward windows. Extension point for
  metric_series and any future kind is the DATASETS tuple.

- LocalTieredStore: implements BacktestStore + SupportsCopyFrom.
  write() saves the bundle, upserts the Tier-1 row, and writes
  hive-partitioned Parquet sidecars per dataset. delete() removes all
  three tiers. iter_index_rows() serves from SQLite directly.
  rebuild_index() recreates Tier-1 from the bundles (useful after a
  software upgrade that adds new index columns).

- scan('portfolio_snapshots' | 'trades' | 'orders') returns a
  pyarrow.dataset.Dataset that DuckDB / Polars can query across every
  run with partition pruning on run_id.

- 15 new tests: Protocol + SupportsCopyFrom conformance, three-tier
  layout, handle normalisation, round-trip, summary_only, Tier-1
  always-in-sync (write/delete/len), Tier-2 cross-run scan, copy_from
  from LocalDirStore, rebuild_index, missing-handle errors. Includes
  a synthetic-records test that asserts hive partitions are written
  and that scan() returns the expected rows + columns.

Targeted suite (backtest_store + backtest_index + cli): 101 / 101 passing.
@MDUYN MDUYN merged commit f5fefbe into feature/iaf-backtest-store May 12, 2026
2 of 8 checks passed