| 1 | +--- |
| 2 | +sidebar_position: 6 |
| 3 | +--- |
| 4 | + |
| 5 | +# Custom Data Providers |
| 6 | + |
| 7 | +Learn how the data provider system works and how to build your own data provider to integrate any data source into the framework. |
| 8 | + |
| 9 | +## How Data Providers Work |
| 10 | + |
| 11 | +The framework uses a **priority-based provider resolution** system. When a strategy declares a `DataSource`, the framework automatically finds the right data provider to fulfill it: |
| 12 | + |
| 13 | +1. **Registration** — All data providers are registered in a `DataProviderIndex` |
| 14 | +2. **Matching** — When a `DataSource` is declared, the framework calls `has_data()` on each registered provider |
| 15 | +3. **Priority** — If multiple providers match, the one with the lowest `priority` value wins |
| 16 | +4. **Instantiation** — The winning provider's `copy()` method creates a dedicated instance for that data source |
| 17 | +5. **Data retrieval** — The framework calls `get_data()` (live) or `get_backtest_data()` (backtesting) on the matched provider |
| 18 | + |
| 19 | +``` |
| 20 | +DataSource("AAPL", market="YAHOO", time_frame="1d") |
| 21 | +                    │ |
| 22 | +                    ▼ |
| 23 | +         ┌─────────────────────┐ |
| 24 | +         │  DataProviderIndex  │ |
| 25 | +         │ ┌─────────────────┐ │ |
| 26 | +         │ │   has_data()?   │ │  ← loops all registered providers |
| 27 | +         │ └─────────────────┘ │ |
| 28 | +         └──────────┬──────────┘ |
| 29 | +                    │ |
| 30 | +       ┌────────────┼────────────┐ |
| 31 | +       ▼            ▼            ▼ |
| 32 | +     CCXT         Yahoo       Polygon |
| 33 | +       ✗            ✓            ✗ |
| 34 | +                    │ |
| 35 | +                    ▼ |
| 36 | +            copy(data_source) |
| 37 | +                    │ |
| 38 | +                    ▼ |
| 39 | +           Dedicated instance |
| 40 | +          for AAPL / 1d / YAHOO |
| 41 | +``` |
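
The resolution flow above can be sketched in plain Python. This is a toy model, not the framework's implementation: `Request`, `ToyProvider`, and `ToyIndex` are hypothetical stand-ins for `DataSource`, a data provider, and `DataProviderIndex`, and they model only matching, priority, and `copy()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a DataSource declaration."""
    symbol: str
    market: str
    time_frame: str


class ToyProvider:
    """Hypothetical provider exposing only what resolution needs."""

    def __init__(self, name, market, priority, request=None):
        self.name = name
        self.market = market
        self.priority = priority
        self.request = request  # set only on dedicated copies

    def has_data(self, request):
        # Matching step: this toy provider matches on market only.
        return request.market == self.market

    def copy(self, request):
        # Instantiation step: a fresh instance bound to one request.
        return ToyProvider(self.name, self.market, self.priority, request)


class ToyIndex:
    """Hypothetical registry that resolves a request to one provider."""

    def __init__(self):
        self._providers = []

    def register(self, provider):
        self._providers.append(provider)

    def resolve(self, request):
        matches = [p for p in self._providers if p.has_data(request)]
        if not matches:
            raise LookupError(f"no provider for {request}")
        # Priority step: the lowest priority value wins.
        winner = min(matches, key=lambda p: p.priority)
        return winner.copy(request)


index = ToyIndex()
index.register(ToyProvider("ccxt", "BINANCE", priority=0))
index.register(ToyProvider("yahoo_fallback", "YAHOO", priority=10))
index.register(ToyProvider("yahoo", "YAHOO", priority=0))

resolved = index.resolve(Request("AAPL", "YAHOO", "1d"))
print(resolved.name, resolved.request.time_frame)  # yahoo 1d
```

Returning a copy rather than the registered instance keeps per-data-source state (symbol, time frame) off the shared provider, which is the role the `copy()` step plays in the list above.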
| 42 | + |
| 43 | +## Built-in Data Providers |
| 44 | + |
| 45 | +The framework ships with these OHLCV data providers: |
| 46 | + |
| 47 | +| Provider | Market | API Key Required | Supported Assets | |
| 48 | +|----------|--------|-----------------|------------------| |
| 49 | +| `CCXTOHLCVDataProvider` | Any CCXT exchange (e.g. `BINANCE`, `BITVAVO`) | Depends on exchange | Crypto | |
| 50 | +| `YahooOHLCVDataProvider` | `YAHOO` | No | Stocks, ETFs, indices, forex, crypto | |
| 51 | +| `AlphaVantageOHLCVDataProvider` | `ALPHA_VANTAGE` | Yes | Stocks, forex, crypto | |
| 52 | +| `PolygonOHLCVDataProvider` | `POLYGON` | Yes | US stocks, options, forex, crypto | |
| 53 | +| `CSVOHLCVDataProvider` | N/A | No | Any (from local CSV files) | |
| 54 | +| `PandasOHLCVDataProvider` | N/A | No | Any (from pandas DataFrames) | |
| 55 | + |
| 56 | +All OHLCV providers return data as [Polars](https://pola.rs/) DataFrames with columns: `Datetime`, `Open`, `High`, `Low`, `Close`, `Volume`. |
| 57 | + |
| 58 | +## Creating a Custom OHLCV Data Provider |
| 59 | + |
| 60 | +The easiest way to add a new data source is to extend `OHLCVDataProviderBase`. This base class handles all the boilerplate — storage caching, date range resolution, backtesting, `copy()` — and you only need to implement the API-specific download logic. |
| 61 | + |
| 62 | +### Minimal Example |
| 63 | + |
| 64 | +```python |
| 65 | +import polars as pl |
| 66 | +from datetime import datetime |
| 67 | +from investing_algorithm_framework import OHLCVDataProviderBase |
| 68 | + |
| 69 | +class MyBrokerOHLCVDataProvider(OHLCVDataProviderBase): |
| 70 | +    # The market string that DataSources will use |
| 71 | +    market_name = "MY_BROKER" |
| 72 | + |
| 73 | +    # Unique identifier for this provider |
| 74 | +    data_provider_identifier = "my_broker_ohlcv" |
| 75 | + |
| 76 | +    # Map framework timeframes to your API's format |
| 77 | +    timeframe_map = { |
| 78 | +        "1m": "1min", |
| 79 | +        "5m": "5min", |
| 80 | +        "1h": "60min", |
| 81 | +        "1d": "daily", |
| 82 | +    } |
| 83 | + |
| 84 | +    def _download_ohlcv( |
| 85 | +        self, |
| 86 | +        symbol: str, |
| 87 | +        time_frame, |
| 88 | +        start_date: datetime, |
| 89 | +        end_date: datetime, |
| 90 | +    ) -> pl.DataFrame: |
| 91 | +        """ |
| 92 | +        Download OHLCV data from your broker's API. |
| 93 | + |
| 94 | +        Must return a Polars DataFrame with columns: |
| 95 | +        Datetime (timezone-aware UTC), Open, High, Low, Close, Volume |
| 96 | +        """ |
| 97 | +        import my_broker_sdk |
| 98 | + |
| 99 | +        api_key = self._get_api_key()  # reads from MarketCredential |
| 100 | +        client = my_broker_sdk.Client(api_key) |
| 101 | + |
| 102 | +        interval = self._get_provider_interval()  # resolves from timeframe_map |
| 103 | +        raw_data = client.get_candles( |
| 104 | +            symbol=symbol, |
| 105 | +            interval=interval, |
| 106 | +            start=start_date.isoformat(), |
| 107 | +            end=end_date.isoformat(), |
| 108 | +        ) |
| 109 | + |
| 110 | +        # Convert to the required DataFrame format |
| 111 | +        import pandas as pd |
| 112 | +        df = pd.DataFrame(raw_data) |
| 113 | +        df["Datetime"] = pd.to_datetime(df["timestamp"], utc=True) |
| 114 | +        df = df.rename(columns={ |
| 115 | +            "open": "Open", |
| 116 | +            "high": "High", |
| 117 | +            "low": "Low", |
| 118 | +            "close": "Close", |
| 119 | +            "volume": "Volume", |
| 120 | +        }) |
| 121 | + |
| 122 | +        return pl.from_pandas( |
| 123 | +            df[["Datetime", "Open", "High", "Low", "Close", "Volume"]] |
| 124 | +        ) |
| 125 | +``` |
| 126 | + |
| 127 | +### Using It |
| 128 | + |
| 129 | +```python |
| 130 | +from investing_algorithm_framework import ( |
| 131 | +    create_app, |
| 132 | +    DataSource, |
| 133 | +    MarketCredential, |
| 134 | +    TradingStrategy, |
| 135 | +    TimeUnit, |
| 136 | +) |
| 137 | + |
| 138 | +app = create_app() |
| 139 | + |
| 140 | +# Register API credentials |
| 141 | +app.add_market_credential( |
| 142 | +    MarketCredential( |
| 143 | +        market="MY_BROKER", |
| 144 | +        api_key="your_api_key", |
| 145 | +    ) |
| 146 | +) |
| 147 | + |
| 148 | +# Register the custom provider |
| 149 | +app.add_data_provider(MyBrokerOHLCVDataProvider()) |
| 150 | + |
| 151 | +# Use it in a strategy |
| 152 | +class MyStrategy(TradingStrategy): |
| 153 | +    time_unit = TimeUnit.DAY |
| 154 | +    interval = 1 |
| 155 | +    symbols = ["AAPL"] |
| 156 | +    trading_symbol = "USD" |
| 157 | + |
| 158 | +    data_sources = [ |
| 159 | +        DataSource( |
| 160 | +            identifier="aapl_daily", |
| 161 | +            market="MY_BROKER",       # matches market_name |
| 162 | +            symbol="AAPL", |
| 163 | +            data_type="OHLCV", |
| 164 | +            time_frame="1d",          # must be in timeframe_map |
| 165 | +            warmup_window=50, |
| 166 | +        ), |
| 167 | +    ] |
| 168 | +``` |
| 169 | + |
| 170 | +## OHLCVDataProviderBase Reference |
| 171 | + |
| 172 | +### Class Attributes |
| 173 | + |
| 174 | +| Attribute | Type | Required | Description | |
| 175 | +|-----------|------|----------|-------------| |
| 176 | +| `market_name` | `str` | Yes | The market identifier string (e.g. `"MY_BROKER"`). DataSources match against this. | |
| 177 | +| `timeframe_map` | `dict` | Yes | Maps framework timeframe strings (`"1m"`, `"1d"`, etc.) to provider-specific values. | |
| 178 | +| `data_provider_identifier` | `str` | Yes | Unique identifier for this provider type. | |
| 179 | + |
| 180 | +### Methods to Override |
| 181 | + |
| 182 | +#### `_download_ohlcv()` (required) |
| 183 | + |
| 184 | +```python |
| 185 | +def _download_ohlcv( |
| 186 | +    self, |
| 187 | +    symbol: str, |
| 188 | +    time_frame, |
| 189 | +    start_date: datetime, |
| 190 | +    end_date: datetime, |
| 191 | +) -> pl.DataFrame: |
| 192 | +``` |
| 193 | + |
| 194 | +Downloads OHLCV data from your external API. Must return a Polars DataFrame with columns `Datetime`, `Open`, `High`, `Low`, `Close`, `Volume`. The `Datetime` column must be timezone-aware UTC. |
| 195 | + |
| 196 | +Use `self._get_provider_interval()` to get the mapped interval value from `timeframe_map`. |
| 197 | + |
| 198 | +Use `self._get_api_key()` to retrieve the API key from the configured `MarketCredential`. |
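
As a rough illustration of the return contract, the sketch below reshapes a raw API payload into the six required columns with timezone-aware UTC timestamps, using only the standard library. The payload shape (epoch-second `timestamp` plus lowercase price keys) and the helper name `normalize_candles` are assumptions made for this example, not something the framework defines.

```python
from datetime import datetime, timezone

REQUIRED_COLUMNS = ["Datetime", "Open", "High", "Low", "Close", "Volume"]


def normalize_candles(raw_candles):
    """Turn a list of candle dicts into column lists keyed by the required names.

    Assumes each candle has an epoch-seconds "timestamp" and lowercase
    open/high/low/close/volume keys (a made-up payload shape).
    """
    columns = {name: [] for name in REQUIRED_COLUMNS}
    # Sort by time so the columns come out in ascending order.
    for candle in sorted(raw_candles, key=lambda c: c["timestamp"]):
        columns["Datetime"].append(
            datetime.fromtimestamp(candle["timestamp"], tz=timezone.utc)
        )
        for name in REQUIRED_COLUMNS[1:]:
            columns[name].append(float(candle[name.lower()]))
    return columns


raw = [
    {"timestamp": 86400, "open": 2, "high": 3, "low": 1, "close": 2.5, "volume": 10},
    {"timestamp": 0, "open": 1, "high": 2, "low": 0.5, "close": 1.5, "volume": 5},
]
columns = normalize_candles(raw)
print(columns["Close"])  # [1.5, 2.5]
```

A dict of equal-length column lists like this can be passed to `pl.DataFrame(...)` inside `_download_ohlcv()` to build the returned frame.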
| 199 | + |
| 200 | +#### `_validate_symbol()` (optional) |
| 201 | + |
| 202 | +```python |
| 203 | +def _validate_symbol(self, data_source: DataSource) -> bool: |
| 204 | +``` |
| 205 | + |
| 206 | +Called during `has_data()` to validate whether the provider supports the requested symbol. Defaults to returning `True`. Override this if your API provides a way to verify symbol availability. |
| 207 | + |
| 208 | +#### `_storage_file_suffix()` (optional) |
| 209 | + |
| 210 | +```python |
| 211 | +def _storage_file_suffix(self) -> str: |
| 212 | +``` |
| 213 | + |
| 214 | +Returns the suffix used for cached CSV file names. Defaults to `market_name.lower()`. Override if you need a different naming convention (e.g. `"alphavantage"` instead of the default `"alpha_vantage"`). |
| 215 | + |
| 216 | +### Inherited Methods (no override needed) |
| 217 | + |
| 218 | +These are handled automatically by the base class: |
| 219 | + |
| 220 | +- `has_data()` — checks market name, timeframe support, storage cache, and calls `_validate_symbol()` |
| 221 | +- `get_data()` — resolves date ranges, checks cache, calls `_download_ohlcv()`, handles storage |
| 222 | +- `prepare_backtest_data()` — downloads full range and caches for backtesting |
| 223 | +- `get_backtest_data()` — slices cached data by backtest index date and window |
| 224 | +- `copy()` — creates a dedicated provider instance for a matched DataSource |
| 225 | +- `get_number_of_data_points()` — calculates expected data points for a date range |
| 226 | +- `get_missing_data_dates()` — returns dates with missing data |
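
To make the `get_backtest_data()` description more concrete, here is a minimal sketch of slicing a cached series by index date and window size. It is an assumption about the general technique, not the base class's code: `slice_window` is a hypothetical helper, and treating the index date as inclusive is a choice made for this example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def slice_window(timestamps, rows, index_date, window_size):
    """Return the last `window_size` rows with timestamp <= index_date.

    `timestamps` must be sorted ascending and aligned with `rows`.
    """
    end = bisect_right(timestamps, index_date)  # first position after index_date
    start = max(0, end - window_size)
    return rows[start:end]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
timestamps = [start + timedelta(days=i) for i in range(10)]
closes = [100 + i for i in range(10)]

window = slice_window(timestamps, closes, timestamps[6], window_size=3)
print(window)  # [104, 105, 106]
```

Because the slice never reaches past the index date, a strategy running on the window cannot see candles from the future of the simulated clock.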
| 227 | + |
| 228 | +## Creating a Fully Custom Data Provider |
| 229 | + |
| 230 | +If you need to provide non-OHLCV data or need complete control, extend `DataProvider` directly. You must implement all abstract methods: |
| 231 | + |
| 232 | +```python |
| 233 | +from investing_algorithm_framework import DataProvider, DataType, DataSource |
| 234 | + |
| 235 | +class CustomSentimentDataProvider(DataProvider): |
| 236 | +    data_type = DataType.CUSTOM_DATA |
| 237 | +    data_provider_identifier = "sentiment_provider" |
| 238 | + |
| 239 | +    def has_data(self, data_source, start_date=None, end_date=None): |
| 240 | +        """Return True if this provider can serve the data source.""" |
| 241 | +        return ( |
| 242 | +            data_source.data_type == "CUSTOM_DATA" |
| 243 | +            and data_source.market == "SENTIMENT_API" |
| 244 | +        ) |
| 245 | + |
| 246 | +    def get_data(self, date=None, start_date=None, end_date=None, save=False): |
| 247 | +        """Fetch live data.""" |
| 248 | +        # Your API call here |
| 249 | +        return {"sentiment_score": 0.75, "volume_buzz": 1.2} |
| 250 | + |
| 251 | +    def prepare_backtest_data( |
| 252 | +        self, backtest_start_date, backtest_end_date, |
| 253 | +        fill_missing_data=False, show_progress=False, |
| 254 | +    ): |
| 255 | +        """Download and cache historical data for backtesting.""" |
| 256 | +        self.data = self._fetch_historical(  # your own helper (not shown); returns a dict keyed by date |
| 257 | +            backtest_start_date, backtest_end_date |
| 258 | +        ) |
| 259 | + |
| 260 | +    def get_backtest_data( |
| 261 | +        self, backtest_index_date, backtest_start_date=None, |
| 262 | +        backtest_end_date=None, data_source=None, |
| 263 | +    ): |
| 264 | +        """Return data for a specific backtest date.""" |
| 265 | +        return self.data.get(backtest_index_date) |
| 266 | + |
| 267 | +    def copy(self, data_source): |
| 268 | +        """Create a new instance configured for this data source.""" |
| 269 | +        provider = CustomSentimentDataProvider() |
| 270 | +        provider.symbol = data_source.symbol |
| 271 | +        provider.market = data_source.market |
| 272 | +        return provider |
| 273 | + |
| 274 | +    def get_number_of_data_points(self, start_date, end_date): |
| 275 | +        return 0 |
| 276 | + |
| 277 | +    def get_missing_data_dates(self, start_date, end_date): |
| 278 | +        return [] |
| 279 | + |
| 280 | +    def get_data_source_file_path(self): |
| 281 | +        return None |
| 282 | +``` |
| 283 | + |
| 284 | +## Provider Priority |
| 285 | + |
| 286 | +When multiple providers can serve the same DataSource, the framework picks the one with the lowest `priority` value: |
| 287 | + |
| 288 | +```python |
| 289 | +class PrimaryProvider(OHLCVDataProviderBase): |
| 290 | +    market_name = "STOCKS" |
| 291 | +    priority = 0  # highest priority (default) |
| 292 | +    ... |
| 293 | + |
| 294 | +class FallbackProvider(OHLCVDataProviderBase): |
| 295 | +    market_name = "STOCKS" |
| 296 | +    priority = 10  # lower priority, used as fallback |
| 297 | +    ... |
| 298 | +``` |
| 299 | + |
| 300 | +Custom providers added via `app.add_data_provider()` receive a default priority of `3`. Built-in providers have a priority of `0`. |
| 301 | + |
| 302 | +## API Key Configuration |
| 303 | + |
| 304 | +Providers that require authentication use `MarketCredential`: |
| 305 | + |
| 306 | +```python |
| 307 | +from investing_algorithm_framework import MarketCredential |
| 308 | + |
| 309 | +app.add_market_credential( |
| 310 | +    MarketCredential( |
| 311 | +        market="MY_BROKER",       # must match provider's market_name |
| 312 | +        api_key="your_api_key", |
| 313 | +        secret_key="your_secret",  # optional |
| 314 | +    ) |
| 315 | +) |
| 316 | +``` |
| 317 | + |
| 318 | +Inside your provider, call `self._get_api_key()` to retrieve the key. This reads from the `MarketCredential` whose `market` matches your provider's `market_name`. |
| 319 | + |
| 320 | +API keys can also be configured via environment variables. `MarketCredential` automatically reads `{MARKET}_API_KEY` and `{MARKET}_SECRET_KEY`: |
| 321 | + |
| 322 | +```bash |
| 323 | +export MY_BROKER_API_KEY=your_api_key |
| 324 | +export MY_BROKER_SECRET_KEY=your_secret |
| 325 | +``` |
| 326 | + |
| 327 | +```python |
| 328 | +# This will auto-read from MY_BROKER_API_KEY env var |
| 329 | +app.add_market_credential(MarketCredential(market="MY_BROKER")) |
| 330 | +``` |
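
The `{MARKET}_API_KEY` naming convention can be illustrated with a small standalone lookup. This is a sketch of the convention only, not `MarketCredential`'s actual code: `read_credential_from_env` is a hypothetical helper, and upper-casing the market name before building the variable name is an assumption.

```python
import os


def read_credential_from_env(market, environ=None):
    """Look up {MARKET}_API_KEY and {MARKET}_SECRET_KEY for a market name."""
    environ = os.environ if environ is None else environ
    prefix = market.upper()  # assumption: variable names use the upper-cased market
    return {
        "api_key": environ.get(f"{prefix}_API_KEY"),
        "secret_key": environ.get(f"{prefix}_SECRET_KEY"),
    }


# A plain dict stands in for the process environment here.
fake_env = {"MY_BROKER_API_KEY": "your_api_key"}
creds = read_credential_from_env("my_broker", fake_env)
print(creds)  # {'api_key': 'your_api_key', 'secret_key': None}
```

Passing the environment as a parameter keeps the lookup easy to test without touching real environment variables.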
| 331 | + |
| 332 | +## Storage and Caching |
| 333 | + |
| 334 | +`OHLCVDataProviderBase` automatically caches downloaded data as CSV files. Files are named using the pattern: |
| 335 | + |
| 336 | +``` |
| 337 | +{symbol}_{timeframe}_{suffix}.csv |
| 338 | +``` |
| 339 | + |
| 340 | +For example: `AAPL_1d_my_broker.csv` |
| 341 | + |
| 342 | +The storage directory is resolved in order: |
| 343 | +1. `storage_directory` passed to the constructor |
| 344 | +2. `storage_path` from the DataSource |
| 345 | +3. `RESOURCE_DIRECTORY/data/` from the app config |
| 346 | + |
| 347 | +To disable caching, don't configure a storage directory and don't set `save=True` on the DataSource. |
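
The file naming pattern and the directory lookup order described in this section can be sketched as two small functions. Both `cache_file_name` and `resolve_storage_directory` are hypothetical helpers written for illustration, and returning `None` when nothing is configured is an assumption about how "no caching" might be represented.

```python
from pathlib import Path


def cache_file_name(symbol, time_frame, suffix):
    """Build a name following the {symbol}_{timeframe}_{suffix}.csv pattern."""
    return f"{symbol}_{time_frame}_{suffix}.csv"


def resolve_storage_directory(constructor_dir=None, data_source_path=None,
                              resource_dir=None):
    """Pick the first configured location, in the documented order."""
    if constructor_dir:
        return Path(constructor_dir)
    if data_source_path:
        return Path(data_source_path)
    if resource_dir:
        return Path(resource_dir) / "data"
    return None  # nothing configured: no place to cache


directory = resolve_storage_directory(
    data_source_path="cache", resource_dir="resources"
)
target = directory / cache_file_name("AAPL", "1d", "my_broker")
print(target.as_posix())  # cache/AAPL_1d_my_broker.csv
```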