Full institutional pairs trading system: universe screening (Engle-Granger + Johansen) → Kalman Filter dynamic hedge ratio → OU parameter estimation → gradient-boosting (GBM) regime classifier → event-driven backtest → multi-pair portfolio with cross-correlation constraint.
Project Structure
project4_stat_arb/
├── config.py ← Screening thresholds, KF params, entry/exit rules
├── data_gen.py ← Cointegrated + non-cointegrated synthetic universe
├── screener.py ← Correlation → EG → ADF → Johansen → OU → entry-z
├── kalman_filter.py ← 2-state KF for dynamic hedge ratio β(t)
├── ou_model.py ← OU estimation, z-score, half-life, stationary σ
├── regime.py ← Hurst + autocorr features → GBM regime classifier
├── trader.py ← Event-driven pair backtest engine
├── portfolio.py ← Greedy pair selection + cross-correlation constraint
├── analytics.py ← Sharpe, profit factor, win rate, max drawdown
├── dashboard.py ← 8-panel stat-arb analytics dashboard
├── main.py ← Pipeline orchestrator
└── requirements.txt
How to Run
cd project4_stat_arb
pip install -r requirements.txt
python main.py
Expected terminal output:
Universe generated: 12 series × 3,000 bars
Screening 15 pairs...
4 cointegrated pairs found.
Pair             Corr    EG-p     β        HL(d)   entry-z
EURUSD/GBPUSD    0.891   0.0231   0.7412   18.4    2.100
AUDUSD/NZDUSD    0.924   0.0118   1.0981   9.2     1.850
...
Portfolio: 3 pairs selected (cross-corr constraint: max 0.65)
Backtesting EURUSD/GBPUSD...
n_trades=47 win_rate=0.621 sharpe=1.43 pf=1.82 dd=-1.24%
Dashboard saved → statarb_dashboard.png
Pipeline
Universe prices
│
Screener (screener.py)
├─ Pearson correlation filter (|ρ| > 0.55)
├─ Engle-Granger cointegration test (p < 0.10)
├─ ADF test on OLS residuals
├─ Johansen rank test
├─ OU parameter estimation (κ, σ, half-life via AR(1) OLS)
└─ Optimal entry z-score (Elliott 1994 approximation)
│
KalmanFilter (kalman_filter.py)
└─ 2-state KF: β(t) updated every bar from data up to t only (no look-ahead)
│
RegimeClassifier (regime.py)
└─ Features: Hurst (R/S), autocorr lag 1/5, vol ratio, |z-score|
└─ GBM classifier: regime 1 = mean-reverting, 0 = trending
│
PairTrader (trader.py)
└─ Entry: |z| > entry_z and regime == 1
Exit: |z| < 0.5 OR |z| > 4.0 (stop) OR t > 60 bars (time stop)
│
Portfolio (portfolio.py)
└─ Greedy selection: max 4 pairs, cross-spread corr < 0.65
│
Analytics + Dashboard
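The PairTrader entry and exit rules in the pipeline above can be sketched as a small state machine. This is a minimal illustration, not the code in trader.py: the function name `generate_trades`, the trade tuple layout, and the convention of shorting the spread when z is positive are assumptions made for the example. The thresholds (0.5, 4.0, 60 bars) are the ones listed in the pipeline.

```python
def generate_trades(z, regime, entry_z=2.0, exit_z=0.5, stop_z=4.0, max_bars=60):
    """Return a list of (entry_bar, exit_bar, side, reason) tuples.

    side is -1 (short the spread) when z is positive at entry and +1
    (long the spread) when z is negative. That sign convention is an
    assumption of this sketch.
    """
    trades = []
    side = 0            # 0 = flat, +1 = long spread, -1 = short spread
    entry_bar = None
    for t, (z_t, regime_t) in enumerate(zip(z, regime)):
        if side == 0:
            # Entry: |z| > entry_z and the classifier says mean-reverting.
            if abs(z_t) > entry_z and regime_t == 1:
                side = -1 if z_t > 0 else 1
                entry_bar = t
            continue
        # Exit checks run on every bar after the entry bar.
        if abs(z_t) < exit_z:
            reason = "target"       # spread reverted
        elif abs(z_t) > stop_z:
            reason = "stop"         # divergence stop
        elif t - entry_bar > max_bars:
            reason = "time"         # time stop
        else:
            continue
        trades.append((entry_bar, t, side, reason))
        side, entry_bar = 0, None
    return trades


z = [0.0, 2.5, 1.5, 0.3, 0.0, -2.2, -4.5, 0.0]
print(generate_trades(z, regime=[1] * len(z), entry_z=2.0))
# [(1, 3, -1, 'target'), (5, 6, 1, 'stop')]
```

In this sketch a position closed on bar t cannot be reopened on the same bar, which keeps each bar to a single action.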
Step  Test                             Threshold
1     Pearson correlation              |ρ| > 0.55
2     Engle-Granger ADF on residuals   p-value < 0.10
3     ADF on OLS spread                p-value < 0.10
4     OU half-life                     1 ≤ HL ≤ 120 days
5     Johansen rank                    computed, displayed
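The half-life filter in step 4 relies on the OU parameters estimated via AR(1) OLS. One way this estimation might look, using only NumPy, is sketched here. The names `estimate_ou` and `passes_half_life_filter` are hypothetical and are not taken from ou_model.py or screener.py. The sketch also assumes one bar equals one day, so that the half-life is in the same units as the threshold.

```python
import numpy as np


def estimate_ou(spread, dt=1.0):
    """Fit x[t+1] = a + b * x[t] + eps by OLS and map it to OU parameters."""
    x = np.asarray(spread, dtype=float)
    x_lag, x_next = x[:-1], x[1:]
    b, a = np.polyfit(x_lag, x_next, 1)      # slope first, then intercept
    if not 0.0 < b < 1.0:
        raise ValueError("AR(1) slope outside (0, 1): no mean reversion")
    resid = x_next - (a + b * x_lag)
    kappa = -np.log(b) / dt                  # mean-reversion speed
    mu = a / (1.0 - b)                       # long-run mean of the spread
    sigma_eq = resid.std(ddof=2) / np.sqrt(1.0 - b**2)   # stationary std dev
    sigma = sigma_eq * np.sqrt(2.0 * kappa)  # OU diffusion coefficient
    half_life = np.log(2.0) / kappa
    return {"kappa": kappa, "mu": mu, "sigma": sigma,
            "sigma_eq": sigma_eq, "half_life": half_life}


def passes_half_life_filter(half_life, lo=1.0, hi=120.0):
    """Screening step 4: keep pairs with 1 <= HL <= 120."""
    return lo <= half_life <= hi


def z_score(spread, mu, sigma_eq):
    """Distance of the spread from its long-run mean in stationary std devs."""
    return (np.asarray(spread, dtype=float) - mu) / sigma_eq
```

For a simulated AR(1) series with slope 0.95 the theoretical half-life is ln 2 / (−ln 0.95) ≈ 13.5 bars, which would pass the filter.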
A static OLS hedge ratio is estimated once and held fixed for the entire backtest. In practice, pair relationships drift with regime changes and carry shifts, so a fixed β gradually stops describing the pair. The 2-state Kalman Filter re-estimates β and the intercept on every bar, so the spread tracks these drifts instead of accumulating them as apparent divergence.
State: [β, intercept]
Observation: p_a(t) = β(t)·p_b(t) + intercept(t) + ε
Transition: state(t) = state(t−1) + w (random walk prior)
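The state-space model above maps onto a short predict/update loop. The following is a minimal NumPy sketch rather than the contents of kalman_filter.py: the function name `kalman_hedge_ratio` and the noise settings `q` and `r` are illustrative assumptions, not the values in config.py.

```python
import numpy as np


def kalman_hedge_ratio(p_a, p_b, q=1e-5, r=1e-3):
    """Track the state [beta, intercept] with a random-walk Kalman filter.

    q is the process-noise variance (how fast the state may drift) and
    r is the observation-noise variance. Both are illustrative values.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    n = len(p_a)
    x = np.zeros(2)              # state estimate [beta, intercept]
    P = np.eye(2)                # state covariance, loose initial prior
    Q = q * np.eye(2)            # random-walk process noise
    states = np.zeros((n, 2))
    spread = np.zeros(n)
    for t in range(n):
        # Predict: random-walk transition keeps x, inflates covariance.
        P = P + Q
        # Observation row for p_a(t) = beta * p_b(t) + intercept + eps.
        H = np.array([p_b[t], 1.0])
        e = p_a[t] - H @ x       # innovation, based on the state from bar t-1
        S = H @ P @ H + r        # innovation variance
        K = P @ H / S            # Kalman gain
        # Update the state and covariance with the bar-t observation.
        x = x + K * e
        P = P - np.outer(K, H) @ P
        states[t] = x
        spread[t] = e            # one-step-ahead prediction error
    return states, spread
```

Here `states[:, 0]` is the β(t) path and `spread` is the one-step-ahead prediction error, which only uses information available before bar t. A larger `q` lets β adapt faster at the cost of a noisier estimate.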
statarb_dashboard.png — 8-panel dashboard:
Price series (both legs, dual y-axis)
Kalman dynamic hedge ratio β(t)
Kalman-filtered spread
Z-score with trade entry/exit markers (● entry, × exit)
Regime filter overlay (mean-reverting blue / trending red + Hurst line)
Equity curve with green/red fill
Trade P&L distribution (winners vs losers)
Portfolio summary table (all pairs: Sharpe, win rate, profit factor, drawdown)
Engle & Granger (1987). Co-integration and Error Correction. Econometrica.
Johansen (1988). Statistical Analysis of Cointegration Vectors. J. Econ. Dyn. Control.
Kalman (1960). A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng.
Elliott (1994). Optimal Trading of Mean Reverting Processes. Stanford Technical Report.
Pole (2007). Statistical Arbitrage: Algorithmic Trading Insights. Wiley.
Requirements
numpy>=1.26
pandas>=2.1
scipy>=1.11
statsmodels>=0.14
scikit-learn>=1.4
matplotlib>=3.8