|
| 1 | +--- |
| 2 | +sidebar_position: 10 |
| 3 | +title: Pipelines — Event-driven backtest |
| 4 | +description: Use cross-sectional pipelines in the event-driven backtest engine (Phase 1, available today). |
| 5 | +--- |
| 6 | + |
| 7 | +# Pipelines: Event-driven backtest |
| 8 | + |
| 9 | +This is the full Phase 1 reference. Read [Pipelines](pipelines.md) first |
| 10 | +for the high-level concept. |
| 11 | + |
| 12 | +## Quick start |
| 13 | + |
| 14 | +```python |
| 15 | +from typing import Any, Dict |
| 16 | + |
| 17 | +from investing_algorithm_framework import ( |
| 18 | + AverageDollarVolume, |
| 19 | + BacktestDateRange, |
| 20 | + Context, |
| 21 | + DataSource, |
| 22 | + Pipeline, |
| 23 | + Returns, |
| 24 | + TimeUnit, |
| 25 | + TradingStrategy, |
| 26 | + create_app, |
| 27 | +) |
| 28 | + |
| 29 | + |
| 30 | +class MomentumScreener(Pipeline): |
| 31 | + dollar_volume = AverageDollarVolume(window=30) |
| 32 | + momentum = Returns(window=30) |
| 33 | + |
| 34 | + universe = dollar_volume.top(3) |
| 35 | + alpha = momentum.rank(mask=universe) |
| 36 | + |
| 37 | + |
| 38 | +class CrossSectionalMomentum(TradingStrategy): |
| 39 | + algorithm_id = "cross-sectional-momentum" |
| 40 | + time_unit = TimeUnit.DAY |
| 41 | + interval = 1 |
| 42 | + data_sources = [ |
| 43 | + DataSource( |
| 44 | + data_type="OHLCV", |
| 45 | + market="binance", |
| 46 | + symbol=symbol, |
| 47 | + warmup_window=60, |
| 48 | + time_frame="1d", |
| 49 | + identifier=f"{symbol}-ohlcv", |
| 50 | + ) |
| 51 | + for symbol in ["BTC/EUR", "ETH/EUR", "SOL/EUR", "ADA/EUR", "XRP/EUR"] |
| 52 | + ] |
| 53 | + pipelines = [MomentumScreener] |
| 54 | + |
| 55 | + def run_strategy(self, context: Context, data: Dict[str, Any]): |
| 56 | + screen = data["MomentumScreener"] |
| 57 | + top = screen.sort("alpha", descending=True).head(2) |
| 58 | + for row in top.iter_rows(named=True): |
| 59 | + print(row["symbol"], row["momentum"], row["alpha"]) |
| 60 | + |
| 61 | + |
| 62 | +app = create_app() |
| 63 | +app.add_strategy(CrossSectionalMomentum) |
| 64 | +app.add_market(market="binance", trading_symbol="EUR", initial_balance=1000) |
| 65 | + |
| 66 | +if __name__ == "__main__": |
| 67 | + app.run_backtest( |
| 68 | + backtest_date_range=BacktestDateRange( |
| 69 | + start_date="2024-01-01", end_date="2024-06-01" |
| 70 | + ), |
| 71 | + ) |
| 72 | +``` |
| 73 | + |
| 74 | +A complete runnable example lives in |
| 75 | +[`examples/pipeline_momentum_screener.py`](https://github.com/coding-kitties/investing-algorithm-framework/blob/dev/examples/pipeline_momentum_screener.py). |
| 76 | + |
| 77 | +## How it works |
| 78 | + |
| 79 | +On every iteration of the event loop: |
| 80 | + |
| 81 | +1. The framework discovers your strategy's OHLCV data sources from |
| 82 | + `strategy.data_sources` (filtered by `DataType.OHLCV`). |
| 83 | +2. For each `Pipeline` listed in `strategy.pipelines`: |
| 84 | + - Each per-symbol OHLCV frame is converted to Polars (Pandas inputs |
| 85 | + are auto-converted) and lower-cased. |
| 86 | + - The frames are stacked into a long-form panel and **truncated at |
| 87 | + the current bar** (`datetime <= as_of`) — guaranteed no |
| 88 | + look-ahead. |
| 89 | + - Each declared `Factor` is computed in vectorised Polars over the |
| 90 | + full panel. |
| 91 | + - The pipeline's `universe` filter (if any) is applied as a top-level |
| 92 | + mask; symbols failing the mask are dropped. |
| 93 | + - The frame is sliced to the current bar, and the result is stored |
| 94 | + under `data["YourPipelineClassName"]`. |
| 95 | + |
| 96 | +The output is a `polars.DataFrame` with columns: |
| 97 | + |
| 98 | +``` |
| 99 | +symbol | <factor_1> | <factor_2> | ... | <factor_n> |
| 100 | +``` |
| 101 | + |
| 102 | +Symbols with no data at the current bar (e.g. listed late) are dropped. |
| 103 | + |
| 104 | +## Declaring a pipeline |
| 105 | + |
| 106 | +```python |
| 107 | +from investing_algorithm_framework import ( |
| 108 | + AverageDollarVolume, Pipeline, Returns, RSI, SMA, Volatility, |
| 109 | +) |
| 110 | + |
| 111 | +class MyScreen(Pipeline): |
| 112 | + # Any factor declared as a class attribute becomes an output column |
| 113 | + # named after the attribute. |
| 114 | + momentum = Returns(window=30) |
| 115 | + rsi = RSI(window=14) |
| 116 | + vol = Volatility(window=30) |
| 117 | + |
| 118 | + # Optional: a Filter assigned to `universe` becomes the master mask. |
| 119 | + # Every other column is restricted to symbols where universe is True. |
| 120 | + # The universe column itself is NOT exposed in the output. |
| 121 | + universe = AverageDollarVolume(window=30).top(100) |
| 122 | + |
| 123 | + # `rank` works inside the universe. |
| 124 | + alpha = momentum.rank(mask=universe) |
| 125 | +``` |
| 126 | + |
| 127 | +Rules enforced at class definition time: |
| 128 | + |
| 129 | +- A pipeline must declare at least one factor column (otherwise |
| 130 | + `TypeError`). |
| 131 | +- The `universe` attribute, if present, **must** be a `Filter` (e.g. |
| 132 | + `factor.top(n)` / `factor.bottom(n)`); using a plain `Factor` raises |
| 133 | + `TypeError`. |
| 134 | +- Factors and filters are inherited via the MRO; subclass declarations |
| 135 | + override parent ones with the same name. |
| 136 | + |
| 137 | +## Built-in factors |
| 138 | + |
| 139 | +| Class | Inputs | Notes | |
| 140 | +| --- | --- | --- | |
| 141 | +| `Returns(window)` | close | `close[t] / close[t - window] - 1` | |
| 142 | +| `AverageDollarVolume(window)` | close, volume | rolling mean of `close * volume` | |
| 143 | +| `SMA(window)` | close | simple moving average | |
| 144 | +| `RSI(window)` | close | Wilder's RSI; clamps to 100 when there are no losses | |
| 145 | +| `Volatility(window, periods_per_year=252)` | close | rolling stdev of log returns × √periods_per_year | |
| 146 | + |
| 147 | +All built-ins compute per-symbol via `over("symbol")` so symbols are |
| 148 | +independent. |
| 149 | + |
| 150 | +## Custom factors |
| 151 | + |
| 152 | +Subclass `CustomFactor` and implement `compute_panel`: |
| 153 | + |
| 154 | +```python |
| 155 | +import polars as pl |
| 156 | +from investing_algorithm_framework import CustomFactor |
| 157 | + |
| 158 | +class HighLowRange(CustomFactor): |
| 159 | + inputs = ["high", "low"] |
| 160 | + window = 1 |
| 161 | + |
| 162 | + def compute_panel(self, panel: pl.DataFrame) -> pl.Series: |
| 163 | + return (panel["high"] - panel["low"]).rename("range") |
| 164 | +``` |
| 165 | + |
| 166 | +`compute_panel` receives the full long-form panel and must return a |
| 167 | +`pl.Series` aligned with the panel rows. Set: |
| 168 | + |
| 169 | +- `inputs` — the OHLCV columns you read from the panel. |
| 170 | +- `window` — the lookback in bars (used for warmup sizing checks in |
| 171 | + future phases; also exposed via `pipeline.required_window()`). |
| 172 | + |
| 173 | +## Cross-sectional ops |
| 174 | + |
| 175 | +```python |
| 176 | +factor.rank(mask=optional_filter) # ascending ordinal rank per bar |
| 177 | +factor.top(n) # mask: top-n per bar by descending value |
| 178 | +factor.bottom(n) # mask: bottom-n per bar by ascending value |
| 179 | +``` |
| 180 | + |
| 181 | +`rank` returns ordinal ranks (1, 2, 3, …) within each `datetime`. With |
| 182 | +a `mask`, symbols outside the mask receive `null`. |
| 183 | + |
| 184 | +## Reading the result |
| 185 | + |
| 186 | +```python |
| 187 | +def run_strategy(self, context, data): |
| 188 | + screen: pl.DataFrame = data["MomentumScreener"] |
| 189 | + if screen.is_empty(): |
| 190 | + return # universe drained or warmup not yet satisfied |
| 191 | + |
| 192 | + top = screen.sort("alpha", descending=True).head(5) |
| 193 | + symbols = top["symbol"].to_list() |
| 194 | +``` |
| 195 | + |
| 196 | +Common patterns: |
| 197 | + |
| 198 | +```python |
| 199 | +# Symbol → row dict |
| 200 | +rows = {row["symbol"]: row for row in screen.iter_rows(named=True)} |
| 201 | + |
| 202 | +# To pandas if you prefer: |
| 203 | +pdf = screen.to_pandas() |
| 204 | +``` |
| 205 | + |
| 206 | +## Performance notes |
| 207 | + |
| 208 | +Phase 1 is **eager**: the panel is rebuilt on every iteration. That is |
| 209 | +fine for daily/hourly backtests with up to a few hundred symbols. If |
| 210 | +you need to push further, Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502)) |
| 211 | +introduces a vector-mode pipeline executor that materialises factors |
| 212 | +once over the full backtest window. |
| 213 | + |
| 214 | +## Limitations (Phase 1) |
| 215 | + |
| 216 | +- No factor arithmetic (`a + b`, `a / b`, `(a - b).zscore()`); use a |
| 217 | + `CustomFactor` for now. |
| 218 | +- No cached results between bars (rebuilt each iteration). |
| 219 | +- Only OHLCV inputs. External data joining is on the roadmap. |
| 220 | + |
| 221 | +These are intentional — the goal of Phase 1 is to nail down the public |
| 222 | +declarative surface (`Pipeline`, `Factor`, `Filter`, `top` / `bottom` / |
| 223 | +`rank`) before scaling the executor. |
| 224 | + |
| 225 | +## Troubleshooting |
| 226 | + |
| 227 | +- **`TypeError: <Pipeline>.universe must be a Filter`** — assign a |
| 228 | + `Filter` (e.g. `factor.top(100)`), not a raw `Factor`. |
| 229 | +- **`KeyError: ... missing required column 'volume'`** — your data |
| 230 | + source is not OHLCV; pipelines need full OHLCV frames. |
| 231 | +- **Empty result frame** — either your warmup hasn't yet satisfied the |
| 232 | + largest `window`, or your universe filtered every symbol out. |
| 233 | + |
| 234 | +## See also |
| 235 | + |
| 236 | +- [Pipelines](pipelines.md) — concept page. |
| 237 | +- [Pipelines: Vector backtest](pipelines-vector-backtest.md) — Phase 2 roadmap. |
| 238 | +- [Pipelines: Live trading](pipelines-live.md) — Phase 3 roadmap. |
| 239 | +- Design doc: [`docs/design/pipeline-api.md`](https://github.com/coding-kitties/investing-algorithm-framework/blob/dev/docs/design/pipeline-api.md). |
0 commit comments