Commit 4af8b2d

docs(pipeline): document factor algebra, cross-sectional transforms, lazy executor (#502)
- Add 'Cross-sectional transforms' and 'Factor algebra' sections to pipelines.md covering zscore/demean/winsorize and arithmetic operators with shared sub-expression caching.
- Rewrite pipelines-vector-backtest.md from roadmap stub to user-facing reference: how the engine evaluates factors, lazy/streaming option, equivalence guarantee with event mode.
- Update Phase 2 status to shipped.
1 parent aabf9ae commit 4af8b2d

2 files changed

Lines changed: 116 additions & 31 deletions

Lines changed: 68 additions & 30 deletions
Original file line number | Diff line number | Diff line change
@@ -1,46 +1,84 @@
11
---
22
sidebar_position: 11
3-
title: Pipelines — Vector backtest (roadmap)
4-
description: Vector-mode pipeline execution. Tracked under #502.
3+
title: Pipelines — Vector backtest
4+
description: Vector-mode pipeline execution. Phase 2 (#502) — shipped.
55
---
66

77
# Pipelines: Vector backtest
88

9-
:::info Status: not yet shipped (Phase 2)
10-
11-
Vector-mode pipelines are tracked under
12-
[#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502).
13-
The public API (`Pipeline`, `Factor`, `Filter`) defined in
14-
[Pipelines: Event-driven backtest](pipelines-event-backtest.md) is
15-
intentionally engine-agnostic, so strategies you write against Phase 1
16-
will keep working when Phase 2 lands.
9+
:::tip Status: shipped (Phase 2)
10+
Vector-mode pipelines run by default whenever you backtest with
11+
`BacktestService` — no opt-in needed. The vector engine evaluates each
12+
declared factor across the **entire** backtest window once per
13+
strategy iteration, with shared sub-expression caching.
1714
:::
1815

19-
## What's planned
16+
## How it works
17+
18+
When `BacktestService` runs, it inspects each strategy's `pipelines`
19+
list and routes them through `VectorPipelineEngine`. For every
20+
iteration:
21+
22+
1. The engine builds a long-form Polars panel
23+
`(datetime, symbol, open, high, low, close, volume)` truncated at
24+
the current bar (no look-ahead).
25+
2. Each declared `Factor` is evaluated **once** in vectorised Polars,
26+
per symbol, over the full window.
27+
3. A per-evaluation cache (a `ContextVar`) memoises shared
28+
sub-expressions — for example `r.zscore() - r.demean()` only
29+
computes `r` once.
30+
4. The optional `universe` mask filters the result; the universe
31+
column itself is dropped from the output.
32+
5. The strategy receives the wide frame via
33+
`data["YourPipelineClassName"]`.
34+
35+
The strategy author surface is unchanged from
36+
[Pipelines: Event-driven backtest](pipelines-event-backtest.md): you
37+
write the same `Pipeline` subclasses and read the same
38+
`data["..."]` frames.
39+
40+
## Lazy / streaming execution
41+
42+
For memory-bound runs over very large universes, you can move the
43+
post-factor stage (universe filter, column drop, and sort) onto Polars'
44+
streaming engine:
45+
46+
```python
47+
from investing_algorithm_framework.services.pipeline import (
48+
VectorPipelineEngine,
49+
)
2050

21-
- A vector executor that materialises every factor in the pipeline
22-
once over the **entire** backtest window, instead of rebuilding the
23-
panel on each event.
24-
- Integration with the existing vector backtester (see
25-
[Vector backtesting](vector-backtesting.md)).
26-
- Cached intermediate frames so a `rank` of a `Returns` doesn't
27-
recompute returns.
28-
- Optional Polars **lazy** execution path for memory-bound runs.
51+
engine = VectorPipelineEngine(lazy=True)
52+
result = engine.evaluate_window(
53+
pipeline_cls=MomentumScreener,
54+
data_object=panel_data,
55+
symbol_to_identifier=sym_id,
56+
)
57+
```
2958

30-
## What stays the same
59+
`lazy=True` is **bit-for-bit equivalent** to the default eager mode
60+
(this is verified by an equivalence test in the suite). It only
61+
changes how the result frame is collected — factors themselves still
62+
return eager `pl.Series` values per symbol. On older Polars versions
63+
that don't accept `engine="streaming"` on `collect`, the engine falls
64+
back to a default collect transparently.
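
The fallback described above could look roughly like the following. This is a sketch under stated assumptions, not the engine's actual code: `collect_with_fallback` is a hypothetical helper, and the two stand-in classes only mimic the shape of `collect` on newer and older Polars so the snippet runs without Polars installed.

```python
def collect_with_fallback(lazy_frame):
    """Collect a LazyFrame-like object, preferring the streaming engine."""
    try:
        return lazy_frame.collect(engine="streaming")
    except TypeError:
        # Assumed failure mode on older versions: `collect` rejects the
        # unknown `engine` keyword, so retry with the default collect.
        return lazy_frame.collect()


class NewStyleFrame:
    """Stand-in for a LazyFrame whose `collect` accepts `engine` (hypothetical)."""

    def collect(self, engine=None):
        return f"collected with engine={engine}"


class OldStyleFrame:
    """Stand-in for a LazyFrame whose `collect` has no `engine` keyword."""

    def collect(self):
        return "collected with default engine"


new_result = collect_with_fallback(NewStyleFrame())
old_result = collect_with_fallback(OldStyleFrame())
```

One caveat of this pattern is that catching `TypeError` broadly would also hide a `TypeError` raised for an unrelated reason during collection; checking the installed Polars version up front is an alternative.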
3165

32-
The strategy author surface — declaring a `Pipeline` subclass, listing
33-
it on `strategy.pipelines`, and reading
34-
`data["YourPipelineClassName"]` inside `run_strategy` — does not
35-
change. Switching from event mode to vector mode is meant to be a
36-
runner choice, not a strategy rewrite.
66+
You typically don't need to instantiate `VectorPipelineEngine`
67+
yourself; `BacktestService` handles it. The `lazy` flag is exposed for
68+
direct users of the engine and for performance experiments.
3769

38-
## Want to help?
70+
## Equivalence with event mode
3971

40-
Track or comment on the implementation issue:
41-
[#502 — Pipeline API: Phase 2 (vector executor)](https://github.com/coding-kitties/investing-algorithm-framework/issues/502).
72+
Vector and event mode are required to produce **identical** factor
73+
values for the same panel and same `as_of`. The test suite enforces
74+
this with cross-mode equivalence tests in
75+
`tests/services/pipeline/test_vector_pipeline_engine.py`. If you find
76+
a discrepancy, that's a bug — please file an issue.
4277

4378
## See also
4479

45-
- [Pipelines](pipelines.md) — concept page.
46-
- [Pipelines: Event-driven backtest](pipelines-event-backtest.md) — what works today.
80+
- [Pipelines](pipelines.md) — concept page (factor algebra, transforms).
81+
- [Pipelines: Event-driven backtest](pipelines-event-backtest.md):
82+
same surface, event executor.
83+
- [Pipelines: Live trading](pipelines-live.md) — stateless / serverless
84+
notes.

docusaurus/docs/Advanced Concepts/pipelines.md

Lines changed: 48 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -90,6 +90,53 @@ factor.top(n) # boolean mask: top-n by descending value
9090
factor.bottom(n) # boolean mask: bottom-n by ascending value
9191
```
9292

93+
### Cross-sectional transforms
94+
95+
Per-bar normalisation operators (Phase 2). Each takes an optional
96+
`mask` so the statistic is computed only over the universe that
97+
passes the mask:
98+
99+
```python
100+
factor.zscore(mask=universe) # (x - mean) / std per bar
101+
factor.demean(mask=universe) # x - mean per bar
102+
factor.winsorize(0.01, 0.99, # clip to per-bar quantiles
103+
mask=universe)
104+
```
105+
106+
Where the cross-sectional `std` is `0` or undefined (e.g. only one
107+
symbol survives the mask), `zscore` returns `null` rather than
108+
`inf`/`NaN`. Masked-out symbols are excluded from the bar's
109+
statistic *and* from the bar's output.
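
The null-on-zero-dispersion rule can be illustrated for a single bar in plain Python. This is a minimal sketch, not the framework's implementation: `zscore_bar` is a hypothetical helper, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def zscore_bar(values, mask=None):
    """Z-score one bar. values: symbol -> float; mask: symbol -> bool."""
    # Masked-out symbols are dropped from the statistic and from the output.
    kept = {s: v for s, v in values.items() if mask is None or mask.get(s, False)}
    xs = list(kept.values())
    if len(xs) < 2:
        # Sample std is undefined for fewer than two points: return nulls.
        return {s: None for s in kept}
    mu, sigma = mean(xs), stdev(xs)  # assumption: ddof=1
    if sigma == 0:
        # Zero dispersion: null rather than inf/NaN.
        return {s: None for s in kept}
    return {s: (v - mu) / sigma for s, v in kept.items()}


bar = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 100.0}
universe = {"A": True, "B": True, "C": True, "D": False}

scored = zscore_bar(bar, mask=universe)
lonely = zscore_bar(bar, mask={"A": True})
flat = zscore_bar({"A": 5.0, "B": 5.0})
```

Here `D` is absent from `scored` and does not distort the mean, while `lonely` and `flat` contain only `None` values.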
110+
111+
### Factor algebra
112+
113+
Factors compose via the standard arithmetic operators. The framework
114+
auto-coerces scalar operands and shares sub-expression results via a
115+
per-evaluation cache, so the same input factor is computed once even
116+
when it appears multiple times:
117+
118+
```python
119+
class MyScreener(Pipeline):
120+
momentum = Returns(window=30)
121+
vol = Volatility(window=30)
122+
123+
universe = AverageDollarVolume(window=30).top(100)
124+
125+
# Composite alphas — `momentum` is computed once even though it
126+
# appears in two terms.
127+
risk_adjusted = momentum / vol
128+
score = (
129+
momentum.zscore(mask=universe)
130+
- 0.5 * vol.zscore(mask=universe)
131+
)
132+
```
133+
134+
Supported operators: `+`, `-`, `*`, `/`, unary `-`. Both operands may
135+
be `Factor` instances; either may be a Python `int` or `float`.
136+
Division by zero leaves `inf` in place (downstream filters can drop
137+
it) — for safe normalisation prefer `zscore`, which guards against
138+
zero dispersion.
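
To make the operator rules concrete, here is a toy expression class in plain Python. Everything in it (`Expr`, `_coerce`, `_ieee_div`) is hypothetical and only mirrors the behaviour described above: scalar operands are coerced on either side, and division by zero yields `inf` instead of raising. Plain Python floats raise `ZeroDivisionError`, so the sketch emulates IEEE-754 division explicitly.

```python
import math
import operator


def _ieee_div(a, b):
    """Float division that yields inf/nan on a zero divisor instead of raising.

    Simplification: the sign of a negative zero divisor is ignored.
    """
    if b == 0.0:
        return math.nan if a == 0.0 or math.isnan(a) else math.copysign(math.inf, a)
    return a / b


def _coerce(value):
    """Wrap int/float scalars as constant expressions; reject other types."""
    if isinstance(value, Expr):
        return value
    if isinstance(value, (int, float)):
        return Expr(lambda row, c=float(value): c)
    return None


class Expr:
    """Toy factor: a function from one row (a dict) to a float."""

    def __init__(self, fn):
        self.fn = fn

    def evaluate(self, row):
        return self.fn(row)

    def _binary(self, other, op, reflected=False):
        rhs = _coerce(other)
        if rhs is None:
            return NotImplemented
        left, right = (rhs, self) if reflected else (self, rhs)
        return Expr(lambda row: op(left.evaluate(row), right.evaluate(row)))

    def __add__(self, o): return self._binary(o, operator.add)
    def __radd__(self, o): return self._binary(o, operator.add, True)
    def __sub__(self, o): return self._binary(o, operator.sub)
    def __rsub__(self, o): return self._binary(o, operator.sub, True)
    def __mul__(self, o): return self._binary(o, operator.mul)
    def __rmul__(self, o): return self._binary(o, operator.mul, True)
    def __truediv__(self, o): return self._binary(o, _ieee_div)
    def __rtruediv__(self, o): return self._binary(o, _ieee_div, True)
    def __neg__(self): return Expr(lambda row: -self.evaluate(row))


momentum = Expr(lambda row: row["momentum"])
vol = Expr(lambda row: row["vol"])
row = {"momentum": 0.2, "vol": 0.0}

risk_adjusted = (momentum / vol).evaluate(row)   # zero divisor leaves inf
score = (momentum - 0.5 * vol).evaluate(row)     # scalar on the left is coerced
```

The reflected methods (`__radd__`, `__rsub__`, and so on) are what allow a scalar to appear on the left-hand side, as in `0.5 * vol`.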
139+
93140
## Phased rollout
94141

95142
Pipelines run today in the **event-driven backtest** path and in
@@ -99,7 +146,7 @@ and cached/lazy execution are tracked separately.
99146
| Mode | Status | Page |
100147
| --- | --- | --- |
101148
| Event-driven backtest | ✅ Phase 1 | [Pipelines: Event-driven backtest](pipelines-event-backtest.md) |
102-
| Vector backtest | 🚧 Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502)) | [Pipelines: Vector backtest](pipelines-vector-backtest.md) |
149+
| Vector backtest | ✅ Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502)) | [Pipelines: Vector backtest](pipelines-vector-backtest.md) |
103150
| Live trading | 🚧 Phase 3 ([#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503)) | [Pipelines: Live trading](pipelines-live.md) |
104151

105152
Start with the event-driven backtest page — it covers the full Phase 1
