Commit abd4895

Merge pull request #513 from coding-kitties/dev
release: v8.6.1 (dev → main)
2 parents bdccc47 + 355c4ec commit abd4895

26 files changed

Lines changed: 1913 additions & 360 deletions

.github/workflows/test.yml

Lines changed: 0 additions & 9 deletions
@@ -58,15 +58,6 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
       #----------------------------------------------
-      # ----- install distutils if needed -----
-      #----------------------------------------------
-      - name: Install distutils on Ubuntu
-        if: matrix.os == 'ubuntu-latest'
-        run: |
-          sudo add-apt-repository ppa:deadsnakes/ppa
-          sudo apt-get update
-          sudo apt install python${{ matrix.python-version }}-distutils
-      #----------------------------------------------
       # ----- install & configure poetry -----
       #----------------------------------------------
       - name: Install Poetry

docs/design/pipeline-api.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Pipeline API: Design Doc

> Status: **DRAFT for review**.
> Tracking issue: [#438](https://github.com/coding-kitties/investing-algorithm-framework/issues/438).
> Phase issues: [#501](https://github.com/coding-kitties/investing-algorithm-framework/issues/501) (event), [#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502) (vector), [#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503) (live).

## 1. Goals & non-goals

### Goals

1. Declarative cross-sectional factor / filter / classifier computation across an asset universe at each bar.
2. Look-ahead-safe by construction: a factor evaluated at bar `t` sees only data with timestamp `≤ t`.
3. Strict opt-in: strategies without `pipelines = [...]` see **zero** behavioural or performance change.
4. Three execution backends (event backtest, vector backtest, live) sharing the same `Pipeline` definition.

### Non-goals (v1)

- Full Zipline-Pipeline parity (no classifier hierarchies, no winsorization, no OLS factors).
- Live pipelines on sub-daily timeframes.
- Universes outside the supported envelope (see §6).
- Cross-market order routing (separate issue).

## 2. Public API

```python
from investing_algorithm_framework import (
    TradingStrategy, Pipeline, Returns, AverageDollarVolume, TimeUnit,
)

class MomentumScreener(Pipeline):
    dollar_volume = AverageDollarVolume(window=30)
    momentum = Returns(window=60)

    universe = dollar_volume.top(100)
    alpha = momentum.rank(mask=universe)


class MyStrategy(TradingStrategy):
    time_unit = TimeUnit.DAY
    interval = 1
    pipelines = [MomentumScreener]
    universe = ["BTC/EUR", "ETH/EUR", ...]  # candidate symbols

    def run_strategy(self, context, data):
        out = data["MomentumScreener"]  # pl.DataFrame, one row per surviving symbol
        ...
```

### Class attributes added to `TradingStrategy`

| Attribute | Type | Default | Meaning |
|---|---|---|---|
| `pipelines` | `list[type[Pipeline]]` | `[]` | Pipelines to run before each `run_strategy` call. |
| `universe` | `list[str] \| list[DataSource]` | `[]` | Candidate symbols. Folded into `data_sources` at app startup; pipelines filter down. |

### `Pipeline` class

- Class attributes that are `Factor` / `Filter` instances are introspected via `__init_subclass__`.
- A class attribute named `universe` is treated as the **root mask**: if present, every other column is computed on the masked subset.
- All other attributes become columns of the output frame.
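
The introspection rules above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: `Term`, the bare `Factor` / `Filter` stand-ins, and the `root_mask` / `columns` attribute names are hypothetical, and the sketch assumes `universe` itself is excluded from the output columns.

```python
class Term:
    """Hypothetical common base for pipeline terms (stand-in only)."""


class Factor(Term):
    pass


class Filter(Term):
    pass


class Pipeline:
    # Both attributes are filled in per subclass by __init_subclass__.
    root_mask = None
    columns = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect only the terms declared directly on this subclass,
        # in definition order.
        terms = {
            name: value
            for name, value in vars(cls).items()
            if isinstance(value, Term)
        }
        # `universe` is the root mask; every other term is an output column.
        cls.root_mask = terms.pop("universe", None)
        cls.columns = terms


class Screener(Pipeline):
    dollar_volume = Factor()
    momentum = Factor()
    universe = Filter()
    label = "not a term"  # ignored: not a Factor/Filter instance


print(list(Screener.columns))  # ['dollar_volume', 'momentum']
```

Doing the collection in `__init_subclass__` means the column set is fixed once at class-definition time, so an engine can inspect a `Pipeline` subclass without instantiating it.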

## 3. Panel shape

The engine's internal representation is a **long-form Polars DataFrame**:

```
schema = {
    "datetime": pl.Datetime,
    "symbol": pl.Utf8,
    "open": pl.Float32,
    "high": pl.Float32,
    "low": pl.Float32,
    "close": pl.Float32,
    "volume": pl.Float32,
}
```

Long-form is chosen because:

- Polars rolling/group-by is faster on long form than on wide.
- Sparse symbols (delisted, late-listed) fit naturally, with no NaN-filled columns.
- Cache files are smaller (no per-symbol column duplication).

Per-bar pipeline output handed to the strategy is a **wide** frame keyed by symbol, with this schema:

```
out_schema = {
    "symbol": pl.Utf8,
    "<factor name>": pl.Float64,  # one column per Factor/Filter on the Pipeline
    # ... one further entry per remaining Factor/Filter
}
```

## 4. Engine API (internal)

```python
from datetime import datetime
from typing import Protocol

import polars as pl

from investing_algorithm_framework import Pipeline


class PipelineEngine(Protocol):
    def evaluate_at(
        self,
        pipeline: type[Pipeline],
        as_of: datetime,
    ) -> pl.DataFrame:
        """Event mode: return wide per-symbol frame for the given timestamp."""
        ...

    def evaluate_range(
        self,
        pipeline: type[Pipeline],
        start: datetime,
        end: datetime,
    ) -> pl.DataFrame:
        """Vector mode: return long frame keyed by (datetime, symbol) for the range."""
        ...
```

Two implementations:

- `LazyPolarsPipelineEngine` (event + vector). Compiles Factor expressions into a single `pl.LazyFrame` plan; `collect()` only at the boundary.
- `LiveBatchedPipelineEngine` (Phase 3). Adds async batched fetch + universe-refresh.

## 5. Cache key (Phase 2)

Cache lives under `<resource_dir>/pipeline_cache/`. Key:

```
hash(
    universe_hash: sha1(sorted(symbol_list)),
    daterange: (start.isoformat(), end.isoformat()),
    timeframe: e.g. "1d",
    expr_hash: sha1(canonical_repr(factor_expression_tree)),
    schema_version: int,  # bump on any cache-incompatible change
)
```

Hits return the cached panel/factor frame without recomputation.
Parameter sweeps over **non-pipeline** attributes (signal thresholds, position sizing) reuse the cache for free.
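
One way the key above might be computed with the standard library is sketched here. The helper names (`sha1_hex`, `cache_key`), the JSON serialisation, the `SCHEMA_VERSION` constant, and the example expression string are all assumptions made for illustration; the doc only specifies which components go into the hash.

```python
import hashlib
import json
from datetime import datetime

SCHEMA_VERSION = 1  # hypothetical value; bump on any cache-incompatible change


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def cache_key(symbols, start, end, timeframe, expr_repr,
              schema_version=SCHEMA_VERSION):
    """Combine the five key components into one hex digest."""
    payload = {
        # Sorting makes the universe hash independent of symbol order.
        "universe_hash": sha1_hex(json.dumps(sorted(symbols))),
        "daterange": [start.isoformat(), end.isoformat()],
        "timeframe": timeframe,
        "expr_hash": sha1_hex(expr_repr),
        "schema_version": schema_version,
    }
    # sort_keys keeps the serialisation, and so the digest, deterministic.
    return sha1_hex(json.dumps(payload, sort_keys=True))


start, end = datetime(2023, 1, 1), datetime(2024, 1, 1)
expr = "rank(returns(close, 60), mask=top(adv(30), 100))"  # made-up canonical repr

k1 = cache_key(["ETH/EUR", "BTC/EUR"], start, end, "1d", expr)
k2 = cache_key(["BTC/EUR", "ETH/EUR"], start, end, "1d", expr)
k3 = cache_key(["BTC/EUR", "ETH/EUR"], start, end, "1d", expr, schema_version=2)

print(k1 == k2)  # True: symbol order does not matter
print(k1 == k3)  # False: a schema bump invalidates old entries
```

Because strategy-level parameters such as thresholds never enter the payload, two runs that differ only in those parameters produce the same key, which is what makes the sweep reuse described above possible.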

## 6. Performance contract

| Mode | Timeframe | Max universe | Tested in CI |
|---|---|---|---|
| Event BT | daily | 5,000 | |
| Event BT | 4h / 1h | 1,000 / 500 | |
| Event BT | < 1h | | ❌ raises |
| Vector BT | daily | 5,000 | |
| Vector BT | 4h / 1h | 1,000 / 500 | |
| Vector BT | < 1h | | ❌ raises |
| Live | daily | 50 | smoke only |
| Live | < daily | | ❌ raises |

**Opt-in guarantee (CI-asserted):** vector backtest of the existing single-symbol example must run within ±10% of the pre-pipeline baseline wall-clock.
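
As a rough illustration of how the envelope could be enforced at startup, the sketch below encodes the table as a lookup and raises for anything outside it. Several things here are assumptions rather than statements from this doc: the `4h / 1h` rows are read as 4h → 1,000 and 1h → 500, timeframes not listed in the table are treated as unsupported, an oversized universe is assumed to raise rather than warn, and `UnsupportedPipelineConfig` and `check_envelope` are hypothetical names.

```python
# Limits taken from the table above: (mode, timeframe) -> max universe size.
MAX_UNIVERSE = {
    ("event", "1d"): 5000,
    ("event", "4h"): 1000,
    ("event", "1h"): 500,
    ("vector", "1d"): 5000,
    ("vector", "4h"): 1000,
    ("vector", "1h"): 500,
    ("live", "1d"): 50,
}


class UnsupportedPipelineConfig(ValueError):
    """Hypothetical error type for configurations outside the envelope."""


def check_envelope(mode: str, timeframe: str, universe_size: int) -> None:
    """Raise if the (mode, timeframe, universe size) combination is unsupported."""
    limit = MAX_UNIVERSE.get((mode, timeframe))
    if limit is None:
        raise UnsupportedPipelineConfig(
            f"{mode} pipelines do not support timeframe {timeframe!r}"
        )
    if universe_size > limit:
        raise UnsupportedPipelineConfig(
            f"{mode}/{timeframe} supports at most {limit} symbols, "
            f"got {universe_size}"
        )


check_envelope("event", "1d", 5000)  # inside the envelope: returns silently

try:
    check_envelope("live", "1h", 10)  # sub-daily live is outside the envelope
except UnsupportedPipelineConfig as exc:
    print(exc)
```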

## 7. Built-in factors (v1)

| Factor | Formula |
|---|---|
| `Returns(window=N)` | `close.pct_change(N)` |
| `AverageDollarVolume(window=N)` | `(close * volume).rolling_mean(N)` |
| `SMA(window=N)` | `close.rolling_mean(N)` |
| `RSI(window=N)` | standard Wilder RSI |
| `Volatility(window=N)` | `log_returns.rolling_std(N) * sqrt(periods_per_year)` |

All other factors mentioned in #438's original draft (`MACD`, `BollingerBands`, `EWMA`, `VWAP`, `MaxDrawdown`) are deferred. Users can subclass `CustomFactor`:

```python
class MACD(CustomFactor):
    inputs = ["close"]
    window = 26

    def compute(self, close: pl.Series) -> pl.Series:
        ...
```
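
To make the first two formulas in the table concrete, here is a small worked example in plain Python for a single symbol. It deliberately avoids Polars so it runs anywhere; the function names are illustrative only, and the choice to return `None` until a full window is available is an assumption mirroring the default null behaviour of `pct_change` and `rolling_mean`.

```python
def returns(close, window):
    """close.pct_change(window): None until `window` earlier bars exist."""
    return [
        None if i < window else close[i] / close[i - window] - 1.0
        for i in range(len(close))
    ]


def average_dollar_volume(close, volume, window):
    """(close * volume).rolling_mean(window), right-aligned."""
    dollar = [c * v for c, v in zip(close, volume)]
    return [
        None if i + 1 < window else sum(dollar[i + 1 - window : i + 1]) / window
        for i in range(len(dollar))
    ]


close = [100.0, 110.0, 121.0, 133.1]
volume = [10.0, 20.0, 30.0, 40.0]

# The first `window` returns are None; later values are roughly +21% each.
print(returns(close, 2))
# The first value is None; later values average the last two dollar volumes.
print(average_dollar_volume(close, volume, 2))
```

In the real engine the same computation would run per symbol over the long-form panel; the single-list version here only shows the arithmetic.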

## 8. Look-ahead safety

Factors operate on a Polars `LazyFrame` filtered to `datetime <= as_of` *before* any rolling op. Rolling windows are right-aligned (closed on the right). Tests must assert that injecting a future bar does not change a past factor value.
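
The required test can be sketched in a few lines. The version below uses plain Python lists instead of a `LazyFrame` so it stays self-contained; `sma_as_of` is a hypothetical helper, and the bars are assumed to be sorted by timestamp. The point it illustrates is the ordering: filter to `ts <= as_of` first, then take the right-aligned window.

```python
from datetime import datetime, timedelta


def sma_as_of(bars, as_of, window):
    """Right-aligned mean of the last `window` closes with timestamp <= as_of."""
    visible = [close for ts, close in bars if ts <= as_of]  # filter first
    if len(visible) < window:
        return None
    return sum(visible[-window:]) / window  # then roll


t0 = datetime(2024, 1, 1)
bars = [(t0 + timedelta(days=i), 100.0 + i) for i in range(5)]
as_of = bars[-1][0]

before = sma_as_of(bars, as_of, window=3)

# Inject a bar dated after `as_of` with an extreme price.
tampered = bars + [(as_of + timedelta(days=1), 1_000_000.0)]
after = sma_as_of(tampered, as_of, window=3)

# The future bar must not leak into the value computed at `as_of`.
assert before == after
print(before)  # 103.0
```

If the filter were applied after the rolling step, or the window were centred rather than right-aligned, the tampered bar would change the result and the assertion would fail.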

## 9. Open questions

1. **Universe declaration ergonomics.** Do we accept a callable `universe = lambda ctx: top_500_by_market_cap()` or only a static list in v1? (Proposed: static list in v1, callable in v2.)
2. **Pipeline scheduling.** Always run every bar, or honour a per-pipeline `time_unit`? (Proposed: same `time_unit` as the strategy in v1; per-pipeline scheduling in v2.)
3. **Multiple pipelines on one strategy.** Independent (each gets its own cache key) or composable (one pipeline can reference another's column)? (Proposed: independent in v1.)
4. **Float32 vs float64.** Default to float32 for memory; users opt into float64 per factor? (Proposed: yes, factor-level `dtype=` override.)

## 10. Order of work

1. ✅ This doc reviewed and merged.
2. Phase 1 ([#501](https://github.com/coding-kitties/investing-algorithm-framework/issues/501)): event backtest + 5 factors.
3. Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502)): vector + cache + benchmark.
4. Phase 3 ([#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503)): live, gated on async CCXT fetch.