This project builds a machine learning pipeline to forecast enterprise Wi-Fi Access Point booking volumes for Cisco's product catalog. The core challenge is that COVID-era disruptions (FY2021–2022) create a structural break in the time series, making naive full-history models unreliable. The solution segments the data into pre- and post-COVID steady-state regimes, engineers lag and rolling-mean features, and applies a Bayesian-optimized Random Forest alongside SARIMA and XGBoost baselines. The best model achieves 13.6% MAPE on the High-End AP segment, outperforming all baselines on peak/trough handling while remaining interpretable.
- Built a multi-model demand forecasting pipeline (Random Forest, SARIMA, XGBoost) on Cisco Wi-Fi AP booking data, applying Bayesian hyperparameter optimization via Optuna across 50 trials to achieve 13.6% MAPE on the High-End AP segment
- Engineered a COVID regime-filtering strategy that isolates the pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) steady-state periods to remove structural-break noise and reduce model error, then constructed lag-[1,3,12] and rolling-mean features on the resulting 54 clean training months
- Benchmarked four forecasting strategies (Seasonal Naive, Random Forest, SARIMA, XGBoost) across total-volume and segment-level targets, surfacing the bias-variance trade-off between MAPE-optimal (Seasonal Naive) and RMSE-optimal (Random Forest) forecasts
The dataset contains monthly Cisco Wi-Fi AP booking units segmented by product family, Wi-Fi generation (Wi-Fi 5, Wi-Fi 6), and market segment (High End AP, Mid End AP, Premium End AP). The time series exhibits a structural break during COVID (FY2021–2022), so those years are excluded. The remaining data is split into two regime chunks, pre-COVID (FY2016–2020) and post-COVID (FY2023–2024), which are then concatenated; handling the regimes as separate chunks is intended to keep cross-regime lag artifacts out of the feature engineering. A RandomForestRegressor is first tuned via RandomizedSearchCV (15 iterations, 3-fold CV) and then further refined using Optuna's Bayesian sampler (50 trials, minimizing RMSE). A complementary XGBoost model (n_estimators=500, learning_rate=0.05, max_depth=4) is developed in a parallel notebook, adding generation-share forecasting with lifecycle features (months_since_launch, End-of-Support date filtering) and per-generation lag/rolling-mean features. The 6-month holdout test window (2024-07 through 2024-12) is used for all final evaluations.
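The regime filtering and lag features described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebooks' code: the column names (`month`, `units`), the helper `build_regime_features`, the calendar-year boundaries standing in for fiscal years, the 3-month rolling window, and the choice to compute lags inside each chunk before concatenating are all hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed regime boundaries. The project defines them by fiscal year;
# calendar dates are used here only for illustration.
REGIMES = {
    "pre_covid": (pd.Timestamp("2016-01-01"), pd.Timestamp("2020-12-31")),
    "post_covid": (pd.Timestamp("2023-01-01"), pd.Timestamp("2024-12-31")),
}

def build_regime_features(df, lags=(1, 3, 12), window=3):
    """Compute lag and rolling-mean features inside each regime, then concatenate."""
    chunks = []
    for name, (start, end) in REGIMES.items():
        chunk = df[(df["month"] >= start) & (df["month"] <= end)].copy()
        chunk = chunk.sort_values("month")
        for lag in lags:
            chunk[f"lag_{lag}"] = chunk["units"].shift(lag)
        # shift(1) keeps the current month's value out of its own feature
        chunk[f"roll_mean_{window}"] = chunk["units"].shift(1).rolling(window).mean()
        chunk["regime"] = name
        # Rows without a full lag history are dropped per chunk
        chunks.append(chunk.dropna())
    return pd.concat(chunks, ignore_index=True)

# Synthetic monthly series for demonstration only (not Cisco data)
months = pd.date_range("2016-01-01", "2024-12-01", freq="MS")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"month": months, "units": rng.integers(800, 1200, len(months))})
features = build_regime_features(demo)
```

Because the lags are computed per chunk in this sketch, each chunk gives up its first 12 months to the `lag_12` warm-up, and no lag value reaches across the excluded window.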
In this project, I forecasted enterprise Wi-Fi Access Point demand for Cisco using historical booking data spanning 2016 to 2024. The main technical challenge was COVID: the pandemic created a structural break in the time series that made standard models perform poorly. My solution was to exclude FY2021 and FY2022 entirely and stitch together the pre- and post-COVID steady-state periods into a single clean training set of 54 months. I then built lag and rolling-mean features, capturing short-term momentum (lag_1, lag_3) and seasonal patterns (lag_12), and compared four forecasting approaches: Seasonal Naive as a strong baseline, SARIMA for classical time-series modeling, and Random Forest tuned two ways, with RandomizedSearchCV and with Bayesian (Optuna) optimization. The Random Forest with Bayesian tuning achieved 13.6% MAPE on the High-End AP segment. A key insight was that Seasonal Naive had the lowest average MAPE (the safest forecast), while Random Forest had lower RMSE (better at capturing demand spikes). That is a bias-variance trade-off with real business implications for how conservative or aggressive a planner wants to be.
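To make the MAPE-versus-RMSE point concrete, here is a small sketch showing how the two metrics can rank the same pair of forecasts in opposite order. The numbers are toy values invented for illustration, not project results, and the names `conservative` and `spike_aware` are hypothetical stand-ins for a cautious forecast and one that tracks demand spikes.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, in the same units as the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Toy numbers invented for illustration; not the project's data
actual       = [100, 100, 100, 400]   # three quiet months, then a demand spike
conservative = [100, 100, 100, 200]   # exact in quiet months, misses half the spike
spike_aware  = [130,  70, 130, 400]   # noisy in quiet months, catches the spike

for name, forecast in [("conservative", conservative), ("spike_aware", spike_aware)]:
    print(f"{name}: MAPE={mape(actual, forecast):.1f}%  RMSE={rmse(actual, forecast):.1f}")
```

In this toy case the conservative forecast has the lower MAPE (12.5% against 22.5%), while the spike-aware forecast has the lower RMSE (about 26 against 100), because RMSE penalizes the single large miss far more heavily than MAPE does.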
- Domain realism: The COVID regime-filtering is not a textbook trick — it reflects how practitioners actually handle structural breaks in enterprise demand data
- Dual-model architecture: Two complementary approaches (total-volume Random Forest in WiFiModel (1).ipynb; generation-share XGBoost in Wireless Forecasting.ipynb) cover different forecasting horizons and business questions
- Interpretable trade-off analysis: The project explicitly surfaces the MAPE vs. RMSE trade-off rather than optimizing a single metric, which maps directly to business risk tolerance
- Bayesian optimization: Using Optuna for a 50-trial hyperparameter search goes beyond standard grid/random search and demonstrates familiarity with modern hyperparameter-optimization tooling
Statistical & ML
- Time series feature engineering (lag features, rolling averages, calendar features)
- Regime detection and structural break handling
- Bayesian hyperparameter optimization (Optuna)
- Seasonal decomposition and SARIMA modeling
- Gradient boosting (XGBoost)
Programming & Tools
- Python (pandas, numpy, scikit-learn, statsmodels, optuna, xgboost, matplotlib)
- Jupyter Notebooks / Google Colab
- Excel data ingestion with openpyxl
Analytical
- Model comparison and metric interpretation (MAPE vs. RMSE trade-off)
- Segment-level drill-down (High End AP, Mid End AP, Premium End AP)
- Generation lifecycle analysis (Wi-Fi 5 → Wi-Fi 6 share shift, EOS date filtering)
- 13.6% MAPE on High End AP segment (Bayesian-optimized Random Forest)
- 4 forecasting models benchmarked: Seasonal Naive, Moving Average, Random Forest, SARIMA
- 54 clean training months after COVID regime filtering (vs. ~96 raw months)
- 6-month holdout window used for all final evaluation (2024-07 to 2024-12)
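As an illustration of the holdout setup, the sketch below splits a monthly series at 2024-07 and builds a Seasonal Naive forecast for the six test months. The series is synthetic, and the variable names and the "same month one year earlier" definition of Seasonal Naive are assumptions for this example rather than the notebooks' exact code.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (trend plus yearly seasonality), for illustration only
months = pd.date_range("2023-01-01", "2024-12-01", freq="MS")
units = (
    1000
    + 10 * np.arange(len(months))
    + 100 * np.sin(2 * np.pi * months.month.to_numpy() / 12)
)
series = pd.Series(units, index=months)

# Time-ordered split: the last six months (2024-07 to 2024-12) are held out
is_test = series.index >= pd.Timestamp("2024-07-01")
train, test = series[~is_test], series[is_test]

# Seasonal Naive baseline: forecast each month with the value from 12 months earlier
seasonal_naive = series.shift(12)[is_test]

# Mean absolute percentage error of the baseline on the holdout window
baseline_mape = float(np.mean(np.abs((test - seasonal_naive) / test)) * 100)
```

Splitting by date rather than at random keeps every test month strictly after the training data, so the evaluation reflects how the forecast would be used in practice.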
Academic / collaborative project. Authors: Reid Sendroff and Jawad Hussein.