Skip to content

Commit 9ece0fb

Browse files
authored
Merge pull request #506 from coding-kitties/feat/501-pipeline-api-phase1
feat(pipeline): add Phase 1 cross-sectional Pipeline API (closes #501)
2 parents 355c4ec + 43c7631 commit 9ece0fb

28 files changed

Lines changed: 2257 additions & 1 deletion

File tree

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,4 +164,5 @@ venv
164164
examples/tutorial/data
165165
examples/tutorial/backtest_results
166166
examples/tutorial/resources
167+
examples/**/resources/
167168
.data_cache/
Lines changed: 239 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,239 @@
1+
---
2+
sidebar_position: 10
3+
title: Pipelines — Event-driven backtest
4+
description: Use cross-sectional pipelines in the event-driven backtest engine (Phase 1, available today).
5+
---
6+
7+
# Pipelines: Event-driven backtest
8+
9+
This is the full Phase 1 reference. Read [Pipelines](pipelines.md) first
10+
for the high-level concept.
11+
12+
## Quick start
13+
14+
```python
15+
from typing import Any, Dict
16+
17+
from investing_algorithm_framework import (
18+
AverageDollarVolume,
19+
BacktestDateRange,
20+
Context,
21+
DataSource,
22+
Pipeline,
23+
Returns,
24+
TimeUnit,
25+
TradingStrategy,
26+
create_app,
27+
)
28+
29+
30+
class MomentumScreener(Pipeline):
31+
dollar_volume = AverageDollarVolume(window=30)
32+
momentum = Returns(window=30)
33+
34+
universe = dollar_volume.top(3)
35+
alpha = momentum.rank(mask=universe)
36+
37+
38+
class CrossSectionalMomentum(TradingStrategy):
39+
algorithm_id = "cross-sectional-momentum"
40+
time_unit = TimeUnit.DAY
41+
interval = 1
42+
data_sources = [
43+
DataSource(
44+
data_type="OHLCV",
45+
market="binance",
46+
symbol=symbol,
47+
warmup_window=60,
48+
time_frame="1d",
49+
identifier=f"{symbol}-ohlcv",
50+
)
51+
for symbol in ["BTC/EUR", "ETH/EUR", "SOL/EUR", "ADA/EUR", "XRP/EUR"]
52+
]
53+
pipelines = [MomentumScreener]
54+
55+
def run_strategy(self, context: Context, data: Dict[str, Any]):
56+
screen = data["MomentumScreener"]
57+
top = screen.sort("alpha", descending=True).head(2)
58+
for row in top.iter_rows(named=True):
59+
print(row["symbol"], row["momentum"], row["alpha"])
60+
61+
62+
app = create_app()
63+
app.add_strategy(CrossSectionalMomentum)
64+
app.add_market(market="binance", trading_symbol="EUR", initial_balance=1000)
65+
66+
if __name__ == "__main__":
67+
app.run_backtest(
68+
backtest_date_range=BacktestDateRange(
69+
start_date="2024-01-01", end_date="2024-06-01"
70+
),
71+
)
72+
```
73+
74+
A complete runnable example lives in
75+
[`examples/pipeline_momentum_screener.py`](https://github.com/coding-kitties/investing-algorithm-framework/blob/dev/examples/pipeline_momentum_screener.py).
76+
77+
## How it works
78+
79+
On every iteration of the event loop:
80+
81+
1. The framework discovers your strategy's OHLCV data sources from
82+
`strategy.data_sources` (filtered by `DataType.OHLCV`).
83+
2. For each `Pipeline` listed in `strategy.pipelines`:
84+
- Each per-symbol OHLCV frame is converted to Polars (Pandas inputs
85+
are auto-converted) and lower-cased.
86+
- The frames are stacked into a long-form panel and **truncated at
87+
the current bar** (`datetime <= as_of`) — guaranteed no
88+
look-ahead.
89+
- Each declared `Factor` is computed in vectorised Polars over the
90+
full panel.
91+
- The pipeline's `universe` filter (if any) is applied as a top-level
92+
mask; symbols failing the mask are dropped.
93+
- The frame is sliced to the current bar, and the result is stored
94+
under `data["YourPipelineClassName"]`.
95+
96+
The output is a `polars.DataFrame` with columns:
97+
98+
```
99+
symbol | <factor_1> | <factor_2> | ... | <factor_n>
100+
```
101+
102+
Symbols with no data at the current bar (e.g. listed late) are dropped.
103+
104+
## Declaring a pipeline
105+
106+
```python
107+
from investing_algorithm_framework import (
108+
AverageDollarVolume, Pipeline, Returns, RSI, SMA, Volatility,
109+
)
110+
111+
class MyScreen(Pipeline):
112+
# Any factor declared as a class attribute becomes an output column
113+
# named after the attribute.
114+
momentum = Returns(window=30)
115+
rsi = RSI(window=14)
116+
vol = Volatility(window=30)
117+
118+
# Optional: a Filter assigned to `universe` becomes the master mask.
119+
# Every other column is restricted to symbols where universe is True.
120+
# The universe column itself is NOT exposed in the output.
121+
universe = AverageDollarVolume(window=30).top(100)
122+
123+
# `rank` works inside the universe.
124+
alpha = momentum.rank(mask=universe)
125+
```
126+
127+
Rules enforced at class definition time:
128+
129+
- A pipeline must declare at least one factor column (otherwise
130+
`TypeError`).
131+
- The `universe` attribute, if present, **must** be a `Filter` (e.g.
132+
`factor.top(n)` / `factor.bottom(n)`); using a plain `Factor` raises
133+
`TypeError`.
134+
- Factors and filters are inherited via the MRO; subclass declarations
135+
override parent ones with the same name.
136+
137+
## Built-in factors
138+
139+
| Class | Inputs | Notes |
140+
| --- | --- | --- |
141+
| `Returns(window)` | close | `close[t] / close[t - window] - 1` |
142+
| `AverageDollarVolume(window)` | close, volume | rolling mean of `close * volume` |
143+
| `SMA(window)` | close | simple moving average |
144+
| `RSI(window)` | close | Wilder's RSI; clamps to 100 when there are no losses |
145+
| `Volatility(window, periods_per_year=252)` | close | rolling stdev of log returns × √periods_per_year |
146+
147+
All built-ins compute per-symbol via `over("symbol")` so symbols are
148+
independent.
149+
150+
## Custom factors
151+
152+
Subclass `CustomFactor` and implement `compute_panel`:
153+
154+
```python
155+
import polars as pl
156+
from investing_algorithm_framework import CustomFactor
157+
158+
class HighLowRange(CustomFactor):
159+
inputs = ["high", "low"]
160+
window = 1
161+
162+
def compute_panel(self, panel: pl.DataFrame) -> pl.Series:
163+
return (panel["high"] - panel["low"]).rename("range")
164+
```
165+
166+
`compute_panel` receives the full long-form panel and must return a
167+
`pl.Series` aligned with the panel rows. Set:
168+
169+
- `inputs` — the OHLCV columns you read from the panel.
170+
- `window` — the lookback in bars (used for warmup sizing checks in
171+
future phases; also exposed via `pipeline.required_window()`).
172+
173+
## Cross-sectional ops
174+
175+
```python
176+
factor.rank(mask=optional_filter) # ascending ordinal rank per bar
177+
factor.top(n) # mask: top-n per bar by descending value
178+
factor.bottom(n) # mask: bottom-n per bar by ascending value
179+
```
180+
181+
`rank` returns ordinal ranks (1, 2, 3, …) within each `datetime`. With
182+
a `mask`, symbols outside the mask receive `null`.
183+
184+
## Reading the result
185+
186+
```python
187+
def run_strategy(self, context, data):
188+
screen: pl.DataFrame = data["MomentumScreener"]
189+
if screen.is_empty():
190+
return # universe drained or warmup not yet satisfied
191+
192+
top = screen.sort("alpha", descending=True).head(5)
193+
symbols = top["symbol"].to_list()
194+
```
195+
196+
Common patterns:
197+
198+
```python
199+
# Symbol → row dict
200+
rows = {row["symbol"]: row for row in screen.iter_rows(named=True)}
201+
202+
# To pandas if you prefer:
203+
pdf = screen.to_pandas()
204+
```
205+
206+
## Performance notes
207+
208+
Phase 1 is **eager**: the panel is rebuilt on every iteration. That is
209+
fine for daily/hourly backtests with up to a few hundred symbols. If
210+
you need to push further, Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502))
211+
introduces a vector-mode pipeline executor that materialises factors
212+
once over the full backtest window.
213+
214+
## Limitations (Phase 1)
215+
216+
- No factor arithmetic (`a + b`, `a / b`, `(a - b).zscore()`); use a
217+
`CustomFactor` for now.
218+
- No cached results between bars (rebuilt each iteration).
219+
- Only OHLCV inputs. External data joining is on the roadmap.
220+
221+
These are intentional — the goal of Phase 1 is to nail down the public
222+
declarative surface (`Pipeline`, `Factor`, `Filter`, `top` / `bottom` /
223+
`rank`) before scaling the executor.
224+
225+
## Troubleshooting
226+
227+
- **`TypeError: <Pipeline>.universe must be a Filter`** — assign a
228+
`Filter` (e.g. `factor.top(100)`), not a raw `Factor`.
229+
- **`KeyError: ... missing required column 'volume'`** — your data
230+
source is not OHLCV; pipelines need full OHLCV frames.
231+
- **Empty result frame** — either your warmup hasn't yet satisfied the
232+
largest `window`, or your universe filtered every symbol out.
233+
234+
## See also
235+
236+
- [Pipelines](pipelines.md) — concept page.
237+
- [Pipelines: Vector backtest](pipelines-vector-backtest.md) — Phase 2 roadmap.
238+
- [Pipelines: Live trading](pipelines-live.md) — Phase 3 roadmap.
239+
- Design doc: [`docs/design/pipeline-api.md`](https://github.com/coding-kitties/investing-algorithm-framework/blob/dev/docs/design/pipeline-api.md).
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
---
2+
sidebar_position: 12
3+
title: Pipelines — Live trading (roadmap)
4+
description: Pipelines in live and paper trading. Tracked under #503.
5+
---
6+
7+
# Pipelines: Live trading
8+
9+
:::info Status: hardening (Phase 3)
10+
11+
Because pipelines run inside the same event loop as backtests, they
12+
**already execute** when you run a strategy live or in paper mode.
13+
However, Phase 3 covers the production-grade work needed to declare
14+
this officially supported: streaming-friendly panel updates, tighter
15+
warmup-window guarantees, and explicit market-hours / partial-bar
16+
handling.
17+
18+
Tracked under
19+
[#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503).
20+
:::
21+
22+
## What works today
23+
24+
Strategies with `pipelines = [...]` run end-to-end against any
25+
configured live market. The pipeline output appears in
26+
`data["YourPipelineClassName"]` exactly like in a backtest.
27+
28+
If you want to experiment, the same example covers both:
29+
[`examples/pipeline_momentum_screener.py`](https://github.com/coding-kitties/investing-algorithm-framework/blob/dev/examples/pipeline_momentum_screener.py).
30+
31+
## What's planned for Phase 3
32+
33+
- A streaming panel that appends new bars instead of rebuilding from
34+
scratch.
35+
- Validation of `warmup_window` against `pipeline.required_window()`
36+
so you get a clear error at startup if any data source is too short.
37+
- First-class handling of partial bars (so you don't accidentally trade
38+
on an unclosed candle).
39+
- Live observability hooks for pipeline output (print/log/snapshot).
40+
41+
## What stays the same
42+
43+
The same `Pipeline` subclasses you use in backtests run live.
44+
45+
## Want to help?
46+
47+
Track or comment on the implementation issue:
48+
[#503 — Pipeline API: Phase 3 (live hardening)](https://github.com/coding-kitties/investing-algorithm-framework/issues/503).
49+
50+
## See also
51+
52+
- [Pipelines](pipelines.md) — concept page.
53+
- [Pipelines: Event-driven backtest](pipelines-event-backtest.md) — full Phase 1 reference.
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
sidebar_position: 11
3+
title: Pipelines — Vector backtest (roadmap)
4+
description: Vector-mode pipeline execution. Tracked under #502.
5+
---
6+
7+
# Pipelines: Vector backtest
8+
9+
:::info Status: not yet shipped (Phase 2)
10+
11+
Vector-mode pipelines are tracked under
12+
[#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502).
13+
The public API (`Pipeline`, `Factor`, `Filter`) defined in
14+
[Pipelines: Event-driven backtest](pipelines-event-backtest.md) is
15+
intentionally engine-agnostic, so strategies you write against Phase 1
16+
will keep working when Phase 2 lands.
17+
:::
18+
19+
## What's planned
20+
21+
- A vector executor that materialises every factor in the pipeline
22+
once over the **entire** backtest window, instead of rebuilding the
23+
panel on each event.
24+
- Integration with the existing vector backtester (see
25+
[Vector backtesting](vector-backtesting.md)).
26+
- Cached intermediate frames so a `rank` of a `Returns` doesn't
27+
recompute returns.
28+
- Optional Polars **lazy** execution path for memory-bound runs.
29+
30+
## What stays the same
31+
32+
The strategy author surface — declaring a `Pipeline` subclass, listing
33+
it on `strategy.pipelines`, and reading
34+
`data["YourPipelineClassName"]` inside `run_strategy` — does not
35+
change. Switching from event mode to vector mode is meant to be a
36+
runner choice, not a strategy rewrite.
37+
38+
## Want to help?
39+
40+
Track or comment on the implementation issue:
41+
[#502 — Pipeline API: Phase 2 (vector executor)](https://github.com/coding-kitties/investing-algorithm-framework/issues/502).
42+
43+
## See also
44+
45+
- [Pipelines](pipelines.md) — concept page.
46+
- [Pipelines: Event-driven backtest](pipelines-event-backtest.md) — what works today.

0 commit comments

Comments
 (0)