
Commit b3f86a0

Merge pull request #537 from coding-kitties/feature/bundle-format-v2
feat(bundle): format v2 + groundwork for tiered storage rewrite (epic #540)
2 parents d34fdb7 + 78a203b commit b3f86a0

22 files changed

Lines changed: 2501 additions & 58 deletions

docs/design/bundle-format-v2.md

Lines changed: 241 additions & 0 deletions

# Bundle Format v2: Public Specification

**Status:** Stable. Default writer since `v8.9.0` (May 2026).
**File extension:** `.iafbt`
**Backwards compatibility:** v1 bundles remain readable indefinitely.

This document describes the on-disk binary format produced by
`save_bundle()` and consumed by `open_bundle()` /
`Backtest.open()`. Third-party tools (e.g. the Finterion upload CLI
and ingestion pipeline) can rely on this contract.

---

## Why v2

v1 stored the entire `Backtest.to_dict()` graph as a single
zstd-compressed MessagePack document. That worked well for
small backtests, but two structural problems became visible at scale
(thousands of bundles per user):

1. **Heavy time series were stored as JSON-ish lists of `(float,
   ISO-string)` tuples.** The strings dominate the on-disk size for
   long-running backtests (e.g. 10y daily ≈ 2,500 entries × 8 series).
   ISO-8601 strings are ~25 bytes each, whereas an `int64` epoch-ms is
   8 bytes, and Parquet's columnar layout plus zstd compression shrinks
   a regular, monotonically increasing timestamp column further.

2. **There was no way to distinguish vector from event backtests** in the
   on-disk envelope, even though they're produced by separate engines
   with subtly different semantics. Reports and analyses had to
   guess from the filename or metadata.

v2 fixes both without breaking v1.
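A rough, standard-library check of the size argument in point 1. It assumes timestamps render like `datetime.isoformat()` on a UTC-aware value (the framework's exact string format may differ) and ignores msgpack framing and compression, so the totals are raw timestamp bytes only.

```python
import struct
from datetime import datetime, timezone

# One daily timestamp, encoded the v1 way (ISO-8601 text) and the v2 way (int64).
moment = datetime(2024, 1, 1, tzinfo=timezone.utc)

iso_text = moment.isoformat()                   # "2024-01-01T00:00:00+00:00"
iso_bytes = len(iso_text.encode("utf-8"))       # 25

epoch_ms = int(moment.timestamp()) * 1000       # UTC epoch milliseconds
int64_bytes = len(struct.pack("<q", epoch_ms))  # 8

# Raw timestamp bytes for one ~10y daily series, before msgpack framing or compression.
ENTRIES = 2_500
print(ENTRIES * iso_bytes, ENTRIES * int64_bytes)  # 62500 20000
```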

---

## Outer envelope (unchanged from v1)

```
+-----------+-----------+--------------------------------+
|  4 bytes  |  4 bytes  |            N bytes             |
|  "IAFB"   | uint32 LE |  zstd(level=7, msgpack(doc))   |
+-----------+-----------+--------------------------------+
    magic      version           compressed body
```

The 4-byte little-endian uint32 holds the format version (1 or 2).
The body is always zstd-compressed MessagePack with `use_bin_type=True`.

Readers MUST reject any version greater than the highest they support, and
SHOULD verify the magic before attempting to decompress.
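A minimal sketch of the header handling described above, using only `struct`. `pack_header`, `parse_header`, and `MAX_SUPPORTED_VERSION` are hypothetical names, not the framework's API. Decompressing and unpacking the body (zstd, then msgpack) are deliberately left out.

```python
import struct

MAGIC = b"IAFB"
HEADER = struct.Struct("<4sI")  # 4-byte magic + little-endian uint32 version (8 bytes)
MAX_SUPPORTED_VERSION = 2       # hypothetical: mirrors the highest version this reader knows


def pack_header(version: int) -> bytes:
    """Build the 8-byte envelope header that precedes the compressed body."""
    return HEADER.pack(MAGIC, version)


def parse_header(data: bytes) -> int:
    """Validate the envelope header and return the format version."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a bundle header")
    magic, version = HEADER.unpack_from(data)
    # Check the magic first so non-bundles are never handed to zstd.
    if magic != MAGIC:
        raise ValueError(f"not a bundle: bad magic {magic!r}")
    if version > MAX_SUPPORTED_VERSION:
        raise ValueError(f"unsupported bundle version {version}")
    return version
```

Everything after the first `HEADER.size` bytes is the compressed body, which a reader would only decompress once `parse_header` has succeeded.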

---

## v2 document structure

```python
{
    "format_version": 2,
    "engine_type": "vector" | "event" | None,

    # Engine-agnostic top-level fields (carry across both engines)
    "algorithm_id": str,
    "metadata": dict,
    "risk_free_rate": float | None,
    "strategy_ids": list,
    "parameters": dict,
    "tag": str | None,
    "backtest_permutation_tests": list | None,

    # Exactly ONE of these pairs is populated based on engine_type:
    "vector_runs": [run_dict, ...],      # if engine_type == "vector"
    "vector_metrics": summary_dict,      # if engine_type == "vector"

    "event_runs": [run_dict, ...],       # if engine_type == "event"
    "event_metrics": summary_dict,       # if engine_type == "event"

    # Fallback for legacy / unknown-engine bundles:
    "backtest_runs": [run_dict, ...],    # if engine_type is None
    "backtest_summary": summary_dict,    # if engine_type is None

    # Optional: embedded heavy-series Parquet blobs
    "blobs": {
        "runs/<idx>/metrics/<field>.parquet": bytes,
        ...
    },

    # Optional: OHLCV manifest (unchanged from v1)
    "ohlcv": {
        "store_dir": str,                # relative to bundle file
        "manifest": {key: relative_path},
    },
}
```

### Engine routing

| `engine_type` | Runs key        | Summary key        |
| ------------- | --------------- | ------------------ |
| `"vector"`    | `vector_runs`   | `vector_metrics`   |
| `"event"`     | `event_runs`    | `event_metrics`    |
| `None`        | `backtest_runs` | `backtest_summary` |

A bundle holds exactly **one** engine's results. Mixing engines in a
single bundle is not supported in v2; produce two bundles and store
them in the same directory.
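The routing table can be expressed as a lookup. This sketch assumes a decoded v2 document is a plain `dict` and treats a pair as "populated" when either key holds a non-empty value. `ENGINE_KEYS` and `route_document` are hypothetical names, and this is not the framework's reader.

```python
from typing import Any, Dict, Optional, Tuple

# Engine routing table from the spec: engine_type -> (runs key, summary key).
ENGINE_KEYS: Dict[Optional[str], Tuple[str, str]] = {
    "vector": ("vector_runs", "vector_metrics"),
    "event": ("event_runs", "event_metrics"),
    None: ("backtest_runs", "backtest_summary"),
}


def route_document(doc: Dict[str, Any]) -> Tuple[list, dict]:
    """Return (runs, summary) for a v2 document, enforcing one engine per bundle."""
    engine_type = doc.get("engine_type")
    if engine_type not in ENGINE_KEYS:
        raise ValueError(f"unknown engine_type {engine_type!r}")
    runs_key, summary_key = ENGINE_KEYS[engine_type]
    # Every other engine's pair must be empty: mixing engines is unsupported.
    for other, (other_runs, other_summary) in ENGINE_KEYS.items():
        if other != engine_type and (doc.get(other_runs) or doc.get(other_summary)):
            raise ValueError(f"bundle mixes engines: found {other_runs!r}/{other_summary!r}")
    return doc.get(runs_key, []), doc.get(summary_key, {})
```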

### Metric blob extraction

Eight `BacktestMetrics` fields are extracted from each run's
`backtest_metrics` dict and replaced with a `{"@blob": "<key>"}`
reference; the actual Parquet bytes go into the top-level `blobs` map.

The eight fields are all `List[Tuple[float, datetime|date]]`:

- `equity_curve`
- `drawdown_series`
- `cumulative_return_series`
- `rolling_sharpe_ratio`
- `monthly_returns`
- `yearly_returns`
- `twr_equity_curve`
- `twr_drawdown_series`

Each blob is a 2-column Parquet file (zstd compression level 5):

| Column  | Type    | Semantics              |
| ------- | ------- | ---------------------- |
| `ts`    | int64   | UTC epoch milliseconds |
| `value` | float64 | The metric value       |
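The column layout above maps onto a simple split of each `(value, datetime|date)` list. Below is a standard-library sketch. It assumes bare `date` values mean midnight UTC and that naive datetimes are UTC, and the helper names are hypothetical. The Parquet write itself is omitted; with pyarrow, for example, it would go through `pyarrow.parquet.write_table` with `compression="zstd"` and `compression_level=5`.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, List, Sequence, Tuple, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
Moment = Union[datetime, date]


def to_epoch_ms(moment: Moment) -> int:
    """Convert a datetime or date to integer UTC epoch milliseconds."""
    # Assumption: bare dates mean midnight UTC and naive datetimes are UTC.
    if not isinstance(moment, datetime):  # datetime subclasses date, so test it first
        moment = datetime(moment.year, moment.month, moment.day)
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    # timedelta // timedelta yields an exact int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def series_to_columns(series: Sequence[Tuple[float, Moment]]) -> Dict[str, list]:
    """Split [(value, moment), ...] into the `ts` / `value` columns of a blob."""
    return {
        "ts": [to_epoch_ms(moment) for _, moment in series],
        "value": [float(value) for value, _ in series],
    }


def columns_to_series(columns: Dict[str, list]) -> List[Tuple[float, str]]:
    """Rebuild [(value, iso_string), ...], the shape the reader contract describes."""
    return [
        (value, (EPOCH + timedelta(milliseconds=ms)).isoformat())
        for ms, value in zip(columns["ts"], columns["value"])
    ]
```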

The blob key follows the convention
`runs/<index>/metrics/<field_name>.parquet` where `<index>` is the
zero-based offset of the run within `vector_runs` / `event_runs` /
`backtest_runs` and `<field_name>` is one of the eight names above.

If a series has fewer than 2 entries, the writer leaves it inline
(no blob extraction). Readers MUST handle both cases for any field.
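Putting the key convention and the inline threshold together, one run's extraction step might look like the following. `extract_metric_blobs`, `blob_key`, and the `encode` callable (which would produce the Parquet bytes) are hypothetical. The `< 2` threshold follows the rule stated in this section.

```python
from typing import Any, Callable, Dict, Tuple

# The eight heavy series named in this section.
BLOB_FIELDS = (
    "equity_curve", "drawdown_series", "cumulative_return_series",
    "rolling_sharpe_ratio", "monthly_returns", "yearly_returns",
    "twr_equity_curve", "twr_drawdown_series",
)


def blob_key(run_index: int, field_name: str) -> str:
    """Key convention: runs/<index>/metrics/<field_name>.parquet."""
    return f"runs/{run_index}/metrics/{field_name}.parquet"


def extract_metric_blobs(
    run_index: int,
    backtest_metrics: Dict[str, Any],
    encode: Callable[[list], bytes],
) -> Tuple[Dict[str, Any], Dict[str, bytes]]:
    """Swap heavy series for {"@blob": key} references; return (metrics, blobs)."""
    metrics = dict(backtest_metrics)  # shallow copy; the caller's dict is left untouched
    blobs: Dict[str, bytes] = {}
    for field_name in BLOB_FIELDS:
        series = metrics.get(field_name)
        # Missing, non-list, or short (< 2 entries) series stay inline.
        if not isinstance(series, list) or len(series) < 2:
            continue
        key = blob_key(run_index, field_name)
        blobs[key] = encode(series)
        metrics[field_name] = {"@blob": key}
    return metrics, blobs
```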

### Other fields

Fields that are NOT extracted into Parquet blobs in v2:

- `portfolio_snapshots`, `trades`, `orders`, `positions`: stay as
  msgpack lists of dicts. Their schemas are unstable across model
  changes, and msgpack is sufficient for the typical row counts.
- All scalar metrics (`sharpe_ratio`, `max_drawdown`, etc.): stay
  inline. The whole point is keeping these fast to read.
- `signals`, `signal_events`, `recorded_values`, `data_sources`,
  `metadata` on each run: stay inline.

A future v2.x revision MAY extract additional fields. Readers MUST
treat the `blobs` map as authoritative: any key found there
overrides the inline value (the writer is required to leave the
inline placeholder as `{"@blob": "<key>"}` to make this unambiguous).
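A sketch of the "blobs map is authoritative" rule applied to a single run's metrics dict. The helper names and the `decode` callable (Parquet bytes to `[(value, iso_string), ...]`) are hypothetical. Raising on a dangling reference is an assumption, since the spec does not say what readers do in that case. Unreferenced keys in `blobs` are simply never looked up, which is consistent with the versioning rule that readers ignore unknown blob keys.

```python
from typing import Any, Callable, Dict, Mapping


def is_blob_ref(value: Any) -> bool:
    """True for the writer's placeholder shape: exactly {"@blob": "<key>"}."""
    return isinstance(value, dict) and set(value) == {"@blob"}


def resolve_blob_refs(
    backtest_metrics: Dict[str, Any],
    blobs: Mapping[str, bytes],
    decode: Callable[[bytes], list],
) -> Dict[str, Any]:
    """Replace every {"@blob": key} placeholder with its decoded series."""
    resolved = dict(backtest_metrics)
    for field_name, value in backtest_metrics.items():
        if is_blob_ref(value):
            key = value["@blob"]
            if key not in blobs:
                # Assumption: treat a missing blob as a corrupt bundle.
                raise KeyError(f"dangling blob reference {key!r}")
            resolved[field_name] = decode(blobs[key])
        # Anything else (scalars, short inline series) is kept as-is.
    return resolved
```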

---

## Reader contract

`open_bundle(path)` MUST:

1. Read 8 bytes; verify magic, parse version.
2. Decompress (zstd) and unpack (msgpack) the body.
3. If `version == 1`: dispatch through the v1 reader (legacy
   `{"backtest": <to_dict>}` envelope).
4. If `version == 2`: route runs/summary based on `engine_type`,
   resolve blob references against the `blobs` map (replacing each
   `{"@blob": "<key>"}` with the decoded `[(value, iso_string), ...]`
   list), and reconstruct a `Backtest` via `Backtest.from_dict`.
5. Reject any `version > BUNDLE_FORMAT_VERSION`.
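The ordering of these steps can be sketched with injected codec callables, so the control flow is visible without the zstd and msgpack dependencies. `open_document` and its parameters are hypothetical, and `BUNDLE_FORMAT_VERSION` here is a local stand-in for the framework constant. The version check runs right after the header (step 5 moved early) so unsupported files are never decompressed. Blob resolution and `Backtest.from_dict` are left out.

```python
import struct
from typing import Any, Callable, Dict

BUNDLE_FORMAT_VERSION = 2  # local stand-in for the framework constant (currently 2)


def open_document(
    data: bytes,
    decompress: Callable[[bytes], bytes],
    unpack: Callable[[bytes], Dict[str, Any]],
) -> Dict[str, Any]:
    """Walk the reader steps up to the point where engine routing begins."""
    # Step 1: read 8 bytes, verify magic, parse version.
    magic, version = struct.unpack_from("<4sI", data)
    if magic != b"IAFB":
        raise ValueError(f"bad magic {magic!r}")
    # Step 5, applied early so unsupported files are never decompressed.
    if version > BUNDLE_FORMAT_VERSION:
        raise ValueError(f"unsupported bundle version {version}")
    # Step 2: decompress and unpack the body that follows the header.
    doc = unpack(decompress(data[8:]))
    # Step 3: v1 wraps the whole Backtest.to_dict() graph in {"backtest": ...}.
    if version == 1:
        return doc["backtest"]
    # Step 4 (routing, blob resolution, Backtest.from_dict) would continue here.
    return doc
```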

### Summary-only mode

`open_bundle(path, summary_only=True)` skips the Parquet decode step.
Each blob reference is replaced with an empty list (so
`BacktestMetrics.from_dict` doesn't choke). All scalar summary
metrics (Sharpe, Sortino, max DD, CAGR, win-rate, …) remain fully
populated. Use this for bulk listing / ranking pipelines that don't
draw charts.
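Summary-only mode needs only a shallow pass over each run's metrics. The sketch below uses a hypothetical helper name and assumes the same `{"@blob": "<key>"}` placeholder shape that the writer emits.

```python
from typing import Any, Dict


def strip_blob_refs(backtest_metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Summary-only mode: swap each {"@blob": key} for [] without decoding Parquet."""
    return {
        # Only exact placeholders are replaced; scalars and inline series pass through.
        field_name: [] if isinstance(value, dict) and set(value) == {"@blob"} else value
        for field_name, value in backtest_metrics.items()
    }
```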

---

## Writer contract

`save_bundle(backtest, path)` MUST:

1. Default to `format_version = BUNDLE_FORMAT_VERSION` (currently 2).
2. Accept `format_version=1` for explicit downgrade.
3. Write atomically (write to `<path>.tmp`, then `os.replace`).
4. Set `engine_type` from `backtest.engine_type`.
5. For v2: extract the eight metric series into Parquet blobs only
   when the source list has at least one usable `(value, datetime)`
   pair; leave malformed or empty series inline.
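Step 3 of the writer contract is the familiar temp-file-then-rename pattern. In this minimal sketch, `write_atomically` is a hypothetical helper, and the `fsync` call is an extra durability step that the contract does not require.

```python
import os
from pathlib import Path
from typing import Union


def write_atomically(path: Union[str, Path], payload: bytes) -> None:
    """Write to `<path>.tmp`, then os.replace() it over the final path."""
    final_path = os.fspath(path)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # extra durability step, not required by the spec
    # os.replace overwrites any existing file; the rename is atomic on POSIX.
    os.replace(tmp_path, final_path)
```

Readers therefore see either the previous complete bundle or the new one, never a half-written file at the final path.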

### OHLCV float32 quantization

`save_bundle(..., float32_ohlcv=True)` downcasts float64 OHLCV
columns to float32 before Parquet encoding. Typical reduction is ~2x
on the OHLCV side store; backtest metrics are unaffected for
crypto / equity time series. Off by default to preserve the v1
exact-round-trip contract; opt in for upload / archive workflows.
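The saving is ~2x because a float32 stores 4 bytes instead of 8. The cost is precision: a float32 keeps a 24-bit significand, or about 7 significant decimal digits. The sketch below round-trips a hypothetical price through single precision with `struct`. It only illustrates the rounding and does not show how the writer downcasts Parquet columns.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (float64) through IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


price = 43123.57  # hypothetical close price
quantized = to_float32(price)
relative_error = abs(quantized - price) / price

# float32 keeps a 24-bit significand, so round-to-nearest error is at most 2**-24.
assert relative_error <= 2 ** -24
print(quantized, relative_error)
```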

---

## Size expectations

For a 10-year daily backtest with one run, three trades per week,
typical metric-series savings:

| Item                           | v1 inline (ISO strings) | v2 Parquet blob |
| ------------------------------ | ----------------------: | --------------: |
| `equity_curve` (2,500 entries) |                 ~120 KB |          ~25 KB |
| `drawdown_series` (2,500)      |                 ~120 KB |          ~22 KB |
| `monthly_returns` (120)        |                   ~6 KB |           ~2 KB |
| 8 series total                 |                 ~500 KB |          ~80 KB |

Typical full-bundle size reduction for "metric-heavy" backtests
(many runs, long horizons): **30-80%**. For "snapshot-heavy"
backtests where `portfolio_snapshots` dominates, savings are smaller
(snapshots aren't extracted in v2.0); a future v2.x revision will
address this.

For tiny / smoke-test backtests with <50 entries per series, v2 can
be **slightly larger** than v1 because each blob carries Parquet's fixed
per-file overhead (magic bytes, footer, schema metadata), which exceeds
the savings. This is expected and harmless.

---

## Versioning policy

- Bumping the bundle `format_version` integer is a **breaking change
  for readers** of older framework versions.
- The framework will continue to read all historical versions
  indefinitely. There is no plan to drop v1 read support.
- Writers default to the highest version the framework knows about.
- Additive changes within v2 (e.g. extracting more fields into
  blobs) MUST be safe for v2 readers that don't know about the new
  blobs; they should receive the inline value as a fallback.
- A bundle with `format_version=2` MAY contain blob keys the reader
  doesn't recognise. Readers MUST ignore unknown blob keys.
