Commit 0b2b939

Fix flake8 errors, docstrings, and add data provider docs
- Remove unused imports (F401) in alpha_vantage, polygon, yahoo, ohlcv_base
- Fix CSVOHLCVDataProvider __init__ docstring placement
- Fix PandasOHLCVDataProvider docstring: 'CSV file' -> 'pandas DataFrame'
- Add custom data providers documentation
- Update data sources docs with market info and links
1 parent 0521db7 commit 0b2b939

9 files changed

Lines changed: 595 additions & 170 deletions

Lines changed: 347 additions & 0 deletions
@@ -0,0 +1,347 @@
---
sidebar_position: 6
---

# Custom Data Providers

Learn how the data provider system works and how to build your own data provider to integrate any data source into the framework.

## How Data Providers Work

The framework uses a **priority-based provider resolution** system. When a strategy declares a `DataSource`, the framework automatically finds the right data provider to fulfill it:

1. **Registration**: All data providers are registered in a `DataProviderIndex`.
2. **Matching**: When a `DataSource` is declared, the framework calls `has_data()` on each registered provider.
3. **Priority**: If multiple providers match, the one with the lowest `priority` value wins.
4. **Instantiation**: The winning provider's `copy()` method creates a dedicated instance for that data source.
5. **Data retrieval**: The framework calls `get_data()` (live) or `get_backtest_data()` (backtesting) on the matched provider.

```
DataSource("AAPL", market="YAHOO", time_frame="1d")
             │
             ▼
  ┌─────────────────────┐
  │  DataProviderIndex  │
  │ ┌─────────────────┐ │
  │ │   has_data()?   │ │  ← loops all registered providers
  │ └─────────────────┘ │
  └──────────┬──────────┘
             │
┌────────────┼────────────┐
▼            ▼            ▼
CCXT       Yahoo       Polygon
✗            ✓            ✗
             │
             ▼
     copy(data_source)
             │
             ▼
     Dedicated instance
   for AAPL / 1d / YAHOO
```

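The resolution steps above can be sketched in plain Python. This is a minimal illustration, not the framework's implementation: `SketchDataSource`, `SketchProvider`, and `resolve()` are hypothetical stand-ins, and the matching rule is reduced to a market comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchDataSource:
    """Stand-in for the framework's DataSource (illustration only)."""
    symbol: str
    market: str
    time_frame: str


class SketchProvider:
    """Stand-in provider exposing has_data(), copy(), and a priority."""

    def __init__(self, name, market, priority=0, data_source=None):
        self.name = name
        self.market = market
        self.priority = priority
        self.data_source = data_source  # set only on dedicated copies

    def has_data(self, data_source):
        # Simplified: this sketch matches on the market string only.
        return data_source.market == self.market

    def copy(self, data_source):
        return SketchProvider(
            self.name, self.market, self.priority, data_source
        )


def resolve(providers, data_source) -> Optional[SketchProvider]:
    """Return a dedicated copy of the matching provider with the
    lowest priority value, or None if nothing matches."""
    matches = [p for p in providers if p.has_data(data_source)]
    if not matches:
        return None
    winner = min(matches, key=lambda p: p.priority)
    return winner.copy(data_source)


providers = [
    SketchProvider("ccxt", "BINANCE"),
    SketchProvider("yahoo_fallback", "YAHOO", priority=10),
    SketchProvider("yahoo", "YAHOO", priority=0),
]
source = SketchDataSource("AAPL", "YAHOO", "1d")
dedicated = resolve(providers, source)
print(dedicated.name, dedicated.data_source.symbol)
```

The registered provider objects stay untouched; only the copy returned by `resolve()` is bound to a data source.
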
## Built-in Data Providers

The framework ships with these OHLCV data providers:

| Provider | Market | API Key Required | Supported Assets |
|----------|--------|-----------------|------------------|
| `CCXTOHLCVDataProvider` | Any CCXT exchange (e.g. `BINANCE`, `BITVAVO`) | Depends on exchange | Crypto |
| `YahooOHLCVDataProvider` | `YAHOO` | No | Stocks, ETFs, indices, forex, crypto |
| `AlphaVantageOHLCVDataProvider` | `ALPHA_VANTAGE` | Yes | Stocks, forex, crypto |
| `PolygonOHLCVDataProvider` | `POLYGON` | Yes | US stocks, options, forex, crypto |
| `CSVOHLCVDataProvider` | N/A | No | Any (from local CSV files) |
| `PandasOHLCVDataProvider` | N/A | No | Any (from pandas DataFrames) |

All OHLCV providers return data as [Polars](https://pola.rs/) DataFrames with columns: `Datetime`, `Open`, `High`, `Low`, `Close`, `Volume`.

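As a rough illustration of that column contract, the sketch below maps one raw candle onto the expected column names using only the standard library. The raw field names (`timestamp` in epoch seconds, lowercase `open`, `high`, and so on) and the `normalize_candle()` helper are assumptions for the example, not part of the framework.

```python
from datetime import datetime, timezone

REQUIRED_COLUMNS = ["Datetime", "Open", "High", "Low", "Close", "Volume"]


def normalize_candle(raw: dict) -> dict:
    """Map one raw candle (hypothetical keys, epoch seconds) onto the
    column names listed above, with a timezone-aware UTC timestamp."""
    return {
        "Datetime": datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc),
        "Open": float(raw["open"]),
        "High": float(raw["high"]),
        "Low": float(raw["low"]),
        "Close": float(raw["close"]),
        "Volume": float(raw["volume"]),
    }


raw_candles = [
    {
        "timestamp": 1704067200,  # 2024-01-01T00:00:00Z
        "open": "187.1",
        "high": 188.4,
        "low": 183.9,
        "close": 185.6,
        "volume": 82_000_000,
    },
]
rows = [normalize_candle(candle) for candle in raw_candles]
# In a real provider, rows like these could be passed to pl.DataFrame(rows).
print(rows[0]["Datetime"].isoformat())
```
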
## Creating a Custom OHLCV Data Provider

The easiest way to add a new data source is to extend `OHLCVDataProviderBase`. This base class handles the boilerplate (storage caching, date range resolution, backtesting, and `copy()`), so you only need to implement the API-specific download logic.

### Minimal Example

```python
from datetime import datetime

import pandas as pd
import polars as pl
from investing_algorithm_framework import OHLCVDataProviderBase


class MyBrokerOHLCVDataProvider(OHLCVDataProviderBase):
    # The market string that DataSources will use
    market_name = "MY_BROKER"

    # Unique identifier for this provider
    data_provider_identifier = "my_broker_ohlcv"

    # Map framework timeframes to your API's format
    timeframe_map = {
        "1m": "1min",
        "5m": "5min",
        "1h": "60min",
        "1d": "daily",
    }

    def _download_ohlcv(
        self,
        symbol: str,
        time_frame,
        start_date: datetime,
        end_date: datetime,
    ) -> pl.DataFrame:
        """
        Download OHLCV data from your broker's API.

        Must return a Polars DataFrame with columns:
        Datetime (timezone-aware UTC), Open, High, Low, Close, Volume
        """
        # Placeholder for your broker's own client library
        import my_broker_sdk

        api_key = self._get_api_key()  # reads from MarketCredential
        client = my_broker_sdk.Client(api_key)

        interval = self._get_provider_interval()  # resolves from timeframe_map
        raw_data = client.get_candles(
            symbol=symbol,
            interval=interval,
            start=start_date.isoformat(),
            end=end_date.isoformat(),
        )

        # Convert to the required DataFrame format
        df = pd.DataFrame(raw_data)
        df["Datetime"] = pd.to_datetime(df["timestamp"], utc=True)
        df = df.rename(columns={
            "open": "Open",
            "high": "High",
            "low": "Low",
            "close": "Close",
            "volume": "Volume",
        })

        return pl.from_pandas(
            df[["Datetime", "Open", "High", "Low", "Close", "Volume"]]
        )
```

### Using It

```python
from investing_algorithm_framework import (
    create_app,
    DataSource,
    MarketCredential,
    TradingStrategy,
    TimeUnit,
)

app = create_app()

# Register API credentials
app.add_market_credential(
    MarketCredential(
        market="MY_BROKER",
        api_key="your_api_key",
    )
)

# Register the custom provider
app.add_data_provider(MyBrokerOHLCVDataProvider())


# Use it in a strategy
class MyStrategy(TradingStrategy):
    time_unit = TimeUnit.DAY
    interval = 1
    symbols = ["AAPL"]
    trading_symbol = "USD"

    data_sources = [
        DataSource(
            identifier="aapl_daily",
            market="MY_BROKER",  # matches market_name
            symbol="AAPL",
            data_type="OHLCV",
            time_frame="1d",  # must be in timeframe_map
            warmup_window=50,
        ),
    ]
```

## OHLCVDataProviderBase Reference

### Class Attributes

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `market_name` | `str` | Yes | The market identifier string (e.g. `"MY_BROKER"`). DataSources match against this. |
| `timeframe_map` | `dict` | Yes | Maps framework timeframe strings (`"1m"`, `"1d"`, etc.) to provider-specific values. |
| `data_provider_identifier` | `str` | Yes | Unique identifier for this provider type. |

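To make the role of `timeframe_map` concrete, here is a small sketch of the kind of lookup it supports. The `to_provider_interval()` helper and its error message are hypothetical; the framework's own `_get_provider_interval()` may behave differently for unsupported timeframes.

```python
TIMEFRAME_MAP = {
    "1m": "1min",
    "5m": "5min",
    "1h": "60min",
    "1d": "daily",
}


def to_provider_interval(time_frame: str, timeframe_map: dict) -> str:
    """Translate a framework timeframe string into the provider's
    own interval value, failing loudly when it is not mapped."""
    try:
        return timeframe_map[time_frame]
    except KeyError:
        supported = ", ".join(sorted(timeframe_map))
        raise ValueError(
            f"Unsupported time frame {time_frame!r}; supported: {supported}"
        ) from None


print(to_provider_interval("1h", TIMEFRAME_MAP))
```
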
### Methods to Override

#### `_download_ohlcv()` (required)

```python
def _download_ohlcv(
    self,
    symbol: str,
    time_frame,
    start_date: datetime,
    end_date: datetime,
) -> pl.DataFrame:
    ...
```

Downloads OHLCV data from your external API. Must return a Polars DataFrame with columns `Datetime`, `Open`, `High`, `Low`, `Close`, `Volume`. The `Datetime` column must be timezone-aware UTC.

Use `self._get_provider_interval()` to get the mapped interval value from `timeframe_map`.

Use `self._get_api_key()` to retrieve the API key from the configured `MarketCredential`.

#### `_validate_symbol()` (optional)

```python
def _validate_symbol(self, data_source: DataSource) -> bool:
    ...
```

Called during `has_data()` to check whether the provider supports the requested symbol. Defaults to returning `True`. Override this if your API provides a way to verify symbol availability.

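One possible shape for such an override is sketched below. To keep the example runnable without the framework, `BaseSketch` stands in for `OHLCVDataProviderBase` and reduces `has_data()` to a market check plus `_validate_symbol()`; the hard-coded `SUPPORTED` set is also an assumption, where a real provider might query its API instead.

```python
from types import SimpleNamespace


class BaseSketch:
    """Stand-in for OHLCVDataProviderBase (illustration only)."""

    market_name = ""

    def _validate_symbol(self, data_source) -> bool:
        return True  # documented default: accept every symbol

    def has_data(self, data_source) -> bool:
        # Simplified: market match plus symbol validation.
        return (
            data_source.market == self.market_name
            and self._validate_symbol(data_source)
        )


class MyBrokerSketch(BaseSketch):
    market_name = "MY_BROKER"

    # Hypothetical allow-list of tradable symbols.
    SUPPORTED = frozenset({"AAPL", "MSFT"})

    def _validate_symbol(self, data_source) -> bool:
        return data_source.symbol.upper() in self.SUPPORTED


provider = MyBrokerSketch()
known = SimpleNamespace(market="MY_BROKER", symbol="aapl")
unknown = SimpleNamespace(market="MY_BROKER", symbol="TSLA")
print(provider.has_data(known), provider.has_data(unknown))
```
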
#### `_storage_file_suffix()` (optional)

```python
def _storage_file_suffix(self) -> str:
    ...
```

Returns the suffix used for cached CSV file names. Defaults to `market_name.lower()`, so a `market_name` of `"ALPHA_VANTAGE"` yields `"alpha_vantage"`. Override it if you need a different naming convention.

### Inherited Methods (no override needed)

These are handled automatically by the base class:

- `has_data()`: checks market name, timeframe support, storage cache, and calls `_validate_symbol()`
- `get_data()`: resolves date ranges, checks cache, calls `_download_ohlcv()`, handles storage
- `prepare_backtest_data()`: downloads full range and caches for backtesting
- `get_backtest_data()`: slices cached data by backtest index date and window
- `copy()`: creates a dedicated provider instance for a matched DataSource
- `get_number_of_data_points()`: calculates expected data points for a date range
- `get_missing_data_dates()`: returns dates with missing data

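The "slice by index date and window" idea behind `get_backtest_data()` can be illustrated with the standard library alone. This is a sketch under assumptions: `slice_window()` is a hypothetical helper, it treats the index date as inclusive, and the base class may handle boundaries differently.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def slice_window(timestamps, index_date, window):
    """Return the last `window` timestamps at or before index_date.

    `timestamps` must be sorted ascending. Nothing after index_date is
    returned, so a backtest step never sees future candles.
    """
    end = bisect_right(timestamps, index_date)
    start = max(0, end - window)
    return timestamps[start:end]


first_day = datetime(2024, 1, 1, tzinfo=timezone.utc)
daily = [first_day + timedelta(days=i) for i in range(10)]

index_date = datetime(2024, 1, 5, 12, tzinfo=timezone.utc)
visible = slice_window(daily, index_date, window=3)
print([ts.date().isoformat() for ts in visible])
```
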
## Creating a Fully Custom Data Provider

If you need to provide non-OHLCV data or need complete control, extend `DataProvider` directly. You must implement all abstract methods:

```python
from investing_algorithm_framework import DataProvider, DataType, DataSource


class CustomSentimentDataProvider(DataProvider):
    data_type = DataType.CUSTOM_DATA
    data_provider_identifier = "sentiment_provider"

    def has_data(self, data_source, start_date=None, end_date=None):
        """Return True if this provider can serve the data source."""
        return (
            data_source.data_type == "CUSTOM_DATA"
            and data_source.market == "SENTIMENT_API"
        )

    def get_data(self, date=None, start_date=None, end_date=None, save=False):
        """Fetch live data."""
        # Your API call here
        return {"sentiment_score": 0.75, "volume_buzz": 1.2}

    def prepare_backtest_data(
        self, backtest_start_date, backtest_end_date,
        fill_missing_data=False, show_progress=False,
    ):
        """Download and cache historical data for backtesting."""
        # _fetch_historical is a helper you write yourself; it is expected
        # to return a dict keyed by backtest index date.
        self.data = self._fetch_historical(
            backtest_start_date, backtest_end_date
        )

    def get_backtest_data(
        self, backtest_index_date, backtest_start_date=None,
        backtest_end_date=None, data_source=None,
    ):
        """Return data for a specific backtest date."""
        return self.data.get(backtest_index_date)

    def copy(self, data_source):
        """Create a new instance configured for this data source."""
        provider = CustomSentimentDataProvider()
        provider.symbol = data_source.symbol
        provider.market = data_source.market
        return provider

    def get_number_of_data_points(self, start_date, end_date):
        return 0

    def get_missing_data_dates(self, start_date, end_date):
        return []

    def get_data_source_file_path(self):
        return None
```

## Provider Priority

When multiple providers can serve the same DataSource, the framework picks the one with the lowest `priority` value:

```python
class PrimaryProvider(OHLCVDataProviderBase):
    market_name = "STOCKS"
    priority = 0  # highest priority (default)
    ...


class FallbackProvider(OHLCVDataProviderBase):
    market_name = "STOCKS"
    priority = 10  # lower priority, used as fallback
    ...
```

Custom providers added via `app.add_data_provider()` receive a default priority of `3`. Built-in providers have a priority of `0`.

## API Key Configuration

Providers that require authentication use `MarketCredential`:

```python
from investing_algorithm_framework import MarketCredential

app.add_market_credential(
    MarketCredential(
        market="MY_BROKER",  # must match provider's market_name
        api_key="your_api_key",
        secret_key="your_secret",  # optional
    )
)
```

Inside your provider, call `self._get_api_key()` to retrieve the key. This reads from the `MarketCredential` whose `market` matches your provider's `market_name`.

API keys can also be configured via environment variables. `MarketCredential` automatically reads `{MARKET}_API_KEY` and `{MARKET}_SECRET_KEY`:

```bash
export MY_BROKER_API_KEY=your_api_key
export MY_BROKER_SECRET_KEY=your_secret
```

```python
# This will auto-read from MY_BROKER_API_KEY env var
app.add_market_credential(MarketCredential(market="MY_BROKER"))
```

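The `{MARKET}_API_KEY` naming convention can be illustrated with a small standalone lookup. This is only a sketch of the convention, not how `MarketCredential` is implemented; `read_market_credentials()` is a hypothetical helper, and upper-casing the market name is an assumption.

```python
import os


def read_market_credentials(market: str, environ=os.environ) -> dict:
    """Look up {MARKET}_API_KEY and {MARKET}_SECRET_KEY.

    Missing variables come back as None. `environ` is injectable so
    the lookup can be exercised without touching the real environment.
    """
    prefix = market.upper()  # assumption: variable names are upper case
    return {
        "api_key": environ.get(f"{prefix}_API_KEY"),
        "secret_key": environ.get(f"{prefix}_SECRET_KEY"),
    }


fake_env = {"MY_BROKER_API_KEY": "your_api_key"}
creds = read_market_credentials("MY_BROKER", fake_env)
print(creds)
```
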
## Storage and Caching

`OHLCVDataProviderBase` automatically caches downloaded data as CSV files. Files are named using the pattern:

```
{symbol}_{timeframe}_{suffix}.csv
```

For example: `AAPL_1d_my_broker.csv`

The storage directory is resolved in order:

1. `storage_directory` passed to the constructor
2. `storage_path` from the DataSource
3. `RESOURCE_DIRECTORY/data/` from the app config

To disable caching, don't configure a storage directory and don't set `save=True` on the DataSource.
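
The naming pattern and the resolution order can be expressed as two small functions. These are hypothetical helpers written for illustration, assuming the three locations are plain path strings; they are not the base class's actual code.

```python
import os
from typing import Optional


def cache_file_name(symbol: str, time_frame: str, suffix: str) -> str:
    """Build a file name following {symbol}_{timeframe}_{suffix}.csv."""
    return f"{symbol}_{time_frame}_{suffix}.csv"


def resolve_storage_dir(
    constructor_dir: Optional[str],
    data_source_path: Optional[str],
    resource_directory: Optional[str],
) -> Optional[str]:
    """Apply the documented order: constructor argument, then the
    DataSource path, then RESOURCE_DIRECTORY/data. Returns None when
    nothing is configured."""
    if constructor_dir:
        return constructor_dir
    if data_source_path:
        return data_source_path
    if resource_directory:
        return os.path.join(resource_directory, "data")
    return None


print(cache_file_name("AAPL", "1d", "my_broker"))
print(resolve_storage_dir(None, None, "resources"))
```
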
