|
1 | 1 | --- |
2 | 2 | sidebar_position: 11 |
3 | | -title: Pipelines — Vector backtest (roadmap) |
4 | | -description: Vector-mode pipeline execution. Tracked under #502. |
| 3 | +title: Pipelines — Vector backtest |
| 4 | +description: Vector-mode pipeline execution. Phase 2 (#502) — shipped. |
5 | 5 | --- |
6 | 6 |
|
7 | 7 | # Pipelines: Vector backtest |
8 | 8 |
|
9 | | -:::info Status: not yet shipped (Phase 2) |
10 | | - |
11 | | -Vector-mode pipelines are tracked under |
12 | | -[#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502). |
13 | | -The public API (`Pipeline`, `Factor`, `Filter`) defined in |
14 | | -[Pipelines: Event-driven backtest](pipelines-event-backtest.md) is |
15 | | -intentionally engine-agnostic, so strategies you write against Phase 1 |
16 | | -will keep working when Phase 2 lands. |
| 9 | +:::tip Status: shipped (Phase 2) |
| 10 | +Vector-mode pipelines run by default whenever you backtest with |
| 11 | +`BacktestService` — no opt-in needed. The vector engine evaluates each |
| 12 | +declared factor across the **entire** backtest window once per |
| 13 | +strategy iteration, with shared sub-expression caching. |
17 | 14 | ::: |
18 | 15 |
|
19 | | -## What's planned |
| 16 | +## How it works |
| 17 | + |
| 18 | +When `BacktestService` runs, it inspects each strategy's `pipelines` |
| 19 | +list and routes them through `VectorPipelineEngine`. For every |
| 20 | +iteration: |
| 21 | + |
| 22 | +1. The engine builds a long-form Polars panel |
| 23 | + `(datetime, symbol, open, high, low, close, volume)` truncated at |
| 24 | + the current bar (no look-ahead). |
| 25 | +2. Each declared `Factor` is evaluated **once** in vectorised Polars, |
| 26 | + per symbol, over the full window. |
| 27 | +3. A per-evaluation cache (a `ContextVar`) memoises shared |
| 28 | + sub-expressions — for example `r.zscore() - r.demean()` only |
| 29 | + computes `r` once. |
| 30 | +4. The optional `universe` mask filters the result; the universe |
| 31 | + column itself is dropped from the output. |
| 32 | +5. The strategy receives the wide frame via |
| 33 | + `data["YourPipelineClassName"]`. |
| 34 | + |
| 35 | +The strategy author surface is unchanged from |
| 36 | +[Pipelines: Event-driven backtest](pipelines-event-backtest.md): you |
| 37 | +write the same `Pipeline` subclasses and read the same |
| 38 | +`data["..."]` frames. |
| 39 | + |
| 40 | +## Lazy / streaming execution |
| 41 | + |
| 42 | +For memory-bound runs over very large universes, you can run the
| 43 | +post-factor steps (universe filter, drop, sort) on Polars'
| 44 | +streaming engine:
| 45 | + |
| 46 | +```python |
| 47 | +from investing_algorithm_framework.services.pipeline import ( |
| 48 | + VectorPipelineEngine, |
| 49 | +) |
20 | 50 |
|
21 | | -- A vector executor that materialises every factor in the pipeline |
22 | | - once over the **entire** backtest window, instead of rebuilding the |
23 | | - panel on each event. |
24 | | -- Integration with the existing vector backtester (see |
25 | | - [Vector backtesting](vector-backtesting.md)). |
26 | | -- Cached intermediate frames so a `rank` of a `Returns` doesn't |
27 | | - recompute returns. |
28 | | -- Optional Polars **lazy** execution path for memory-bound runs. |
| 51 | +engine = VectorPipelineEngine(lazy=True) |
| 52 | +result = engine.evaluate_window( |
| 53 | + pipeline_cls=MomentumScreener, |
| 54 | + data_object=panel_data, |
| 55 | + symbol_to_identifier=sym_id, |
| 56 | +) |
| 57 | +``` |
29 | 58 |
|
30 | | -## What stays the same |
| 59 | +`lazy=True` is **bit-for-bit equivalent** to the default eager mode |
| 60 | +(this is verified by an equivalence test in the suite). It only |
| 61 | +changes how the result frame is collected — factors themselves still |
| 62 | +return eager `pl.Series` values per symbol. On older Polars versions |
| 63 | +that don't accept `engine="streaming"` on `collect`, the engine falls |
| 64 | +back to a default collect transparently. |
31 | 65 |
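The fallback described above could look roughly like the following sketch. It assumes that an older `collect` rejects the `engine` keyword with a `TypeError`; the engine's real fallback logic may differ. `collect_with_fallback` and the two stand-in classes are hypothetical and only mimic the shape of a Polars `LazyFrame` so the example runs without Polars installed.

```python
from typing import Any


def collect_with_fallback(lazy_frame: Any) -> Any:
    """Try a streaming collect first, then fall back to a default collect.

    Assumption: a `collect` that does not know the `engine` keyword
    raises TypeError. Note that this also swallows a TypeError raised
    inside `collect` itself, so a real implementation may want a
    narrower check.
    """
    try:
        return lazy_frame.collect(engine="streaming")
    except TypeError:
        return lazy_frame.collect()


class OldStyleLazyFrame:
    """Stand-in for a LazyFrame whose collect() has no `engine` keyword."""

    def collect(self) -> str:
        return "default collect"


class NewStyleLazyFrame:
    """Stand-in for a LazyFrame whose collect() accepts `engine`."""

    def collect(self, engine: str = "auto") -> str:
        return f"{engine} collect"
```

With these stand-ins, `collect_with_fallback(OldStyleLazyFrame())` takes the fallback path and `collect_with_fallback(NewStyleLazyFrame())` takes the streaming path, and the caller sees no difference in how it invokes either one.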
|
32 | | -The strategy author surface — declaring a `Pipeline` subclass, listing |
33 | | -it on `strategy.pipelines`, and reading |
34 | | -`data["YourPipelineClassName"]` inside `run_strategy` — does not |
35 | | -change. Switching from event mode to vector mode is meant to be a |
36 | | -runner choice, not a strategy rewrite. |
| 66 | +You typically don't need to instantiate `VectorPipelineEngine` |
| 67 | +yourself; `BacktestService` handles it. The `lazy` flag is exposed for |
| 68 | +direct users of the engine and for performance experiments. |
37 | 69 |
|
38 | | -## Want to help? |
| 70 | +## Equivalence with event mode |
39 | 71 |
|
40 | | -Track or comment on the implementation issue: |
41 | | -[#502 — Pipeline API: Phase 2 (vector executor)](https://github.com/coding-kitties/investing-algorithm-framework/issues/502). |
| 72 | +Vector and event modes are required to produce **identical** factor
| 73 | +values for the same panel and the same `as_of`. The test suite enforces
| 74 | +this with cross-mode equivalence tests in |
| 75 | +`tests/services/pipeline/test_vector_pipeline_engine.py`. If you find |
| 76 | +a discrepancy, that's a bug — please file an issue. |
42 | 77 |
|
43 | 78 | ## See also |
44 | 79 |
|
45 | | -- [Pipelines](pipelines.md) — concept page. |
46 | | -- [Pipelines: Event-driven backtest](pipelines-event-backtest.md) — what works today. |
| 80 | +- [Pipelines](pipelines.md) — concept page (factor algebra, transforms). |
| 81 | +- [Pipelines: Event-driven backtest](pipelines-event-backtest.md) — |
| 82 | + same surface, event executor. |
| 83 | +- [Pipelines: Live trading](pipelines-live.md) — stateless / serverless |
| 84 | + notes. |