Project Summary — Cisco Wi-Fi Access Point Demand Forecasting

Concise Summary

This project builds a machine learning pipeline to forecast enterprise Wi-Fi Access Point booking volumes for Cisco's product catalog. The core challenge is that COVID-era disruptions (FY2021–2022) create a structural break in the time series, making naive full-history models unreliable. The solution segments the data into pre- and post-COVID steady-state regimes, engineers lag and rolling-mean features, and applies a Bayesian-optimized Random Forest alongside SARIMA and XGBoost baselines. The Bayesian-optimized Random Forest achieves 13.6% MAPE on the High-End AP segment and handles demand peaks and troughs better than the baselines while remaining interpretable.

Resume Bullets

  • Built a multi-model demand forecasting pipeline (Random Forest, SARIMA, XGBoost) on Cisco Wi-Fi AP booking data, applying Bayesian hyperparameter optimization via Optuna across 50 trials to achieve 13.6% MAPE on the High-End AP segment
  • Engineered a COVID regime-filtering strategy that isolates the pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) steady-state periods, removing structural-break noise to reduce model error, and built lag-[1,3,12] and rolling-mean features on the resulting 54 clean training months
  • Benchmarked four forecasting strategies (Seasonal Naive, Random Forest, SARIMA, XGBoost) across total-volume and segment-level targets, surfacing the trade-off between MAPE-optimal (Seasonal Naive) and RMSE-optimal (Random Forest) forecasts

Technical Explanation

The dataset contains monthly Cisco Wi-Fi AP booking units segmented by product family, Wi-Fi generation (Wi-Fi 5, Wi-Fi 6), and market segment (High End AP, Mid End AP, Premium End AP). The time series exhibits a structural break during COVID (FY2021–2022), so those years are excluded and the remaining data is split into two regime chunks: pre-COVID (FY2016–2020) and post-COVID (FY2023–2024). The chunks are treated as separate regimes and then concatenated, so that lag features do not reach across the excluded gap and pollute feature engineering. A RandomForestRegressor is first tuned via RandomizedSearchCV (15 iterations, 3-fold CV) and then further refined using Optuna's Bayesian sampler (50 trials, minimizing RMSE). A complementary XGBoost model (n_estimators=500, learning_rate=0.05, max_depth=4) is developed in a parallel notebook, adding generation-share forecasting with lifecycle features (months_since_launch, End-of-Support date filtering) and per-generation lag/rolling-mean features. The 6-month holdout test window (2024-07 through 2024-12) is used for all final evaluations.
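One way to keep lag features from crossing the excluded COVID gap is to build them inside each regime chunk and only then concatenate the rows. The sketch below is a dependency-free illustration of that idea, not the notebooks' actual code: the function name `build_features`, the toy series, and the 3-month rolling window are all assumptions made for the example.

```python
from statistics import mean


def build_features(values, lags=(1, 3, 12), window=3):
    """Return one feature row per month that has a full lag history.

    `values` is one regime's monthly bookings in chronological order.
    `window` (rolling-mean length) is an assumed value for illustration.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(values)):
        row = {f"lag_{k}": values[t - k] for k in lags}
        # Rolling mean over the previous `window` months only (month t is
        # excluded), so the feature never contains the target itself.
        row[f"roll_mean_{window}"] = mean(values[t - window:t])
        row["target"] = values[t]
        rows.append(row)
    return rows


# Hypothetical toy data: 60 pre-COVID months and 24 post-COVID months.
pre_covid = [100 + i for i in range(60)]
post_covid = [300 + i for i in range(24)]

# Build features inside each regime first, then concatenate the rows.
train_rows = build_features(pre_covid) + build_features(post_covid)

first_post = build_features(post_covid)[0]
print(len(train_rows))       # rows that survive the 12-month warm-up per chunk
print(first_post["lag_12"])  # taken from the post-COVID chunk, not pre-COVID
```

Because each chunk loses its first 12 months to the lag_12 warm-up, the number of usable rows is smaller than the number of raw months in the two regimes.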

Interview Version

In this project, I forecasted enterprise Wi-Fi Access Point demand for Cisco using historical booking data spanning 2016 to 2024. The main technical challenge was COVID: the pandemic created a structural break in the time series that made standard models perform poorly. My solution was to exclude FY2021 and FY2022 entirely and stitch together the pre- and post-COVID steady-state periods into a single clean training set of 54 months. I then built lag and rolling-mean features, capturing short-term momentum (lag_1, lag_3) and seasonal patterns (lag_12), and compared four forecasting approaches: Seasonal Naive as a strong baseline, SARIMA for classical time-series modeling, and Random Forest tuned two ways, with RandomizedSearchCV and with Bayesian (Optuna) optimization. The Random Forest with Bayesian tuning achieved 13.6% MAPE on the High-End AP segment. A key insight was that Seasonal Naive had the lowest average MAPE (safest forecast), while Random Forest had lower RMSE (better at capturing demand spikes). That MAPE-versus-RMSE trade-off has real business implications for how conservative or aggressive a planner wants to be.
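For reference, a Seasonal Naive baseline simply repeats the value observed in the same month one season earlier. The following is a minimal sketch assuming a 12-month season; the function name and the sample history are hypothetical and not taken from the project.

```python
def seasonal_naive(history, horizon, season_length=12):
    """Forecast each future month with the value seen one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    forecast = []
    for h in range(horizon):
        # Position of the matching month within the last observed season.
        idx = len(history) - season_length + (h % season_length)
        forecast.append(history[idx])
    return forecast


# Twelve hypothetical months of bookings.
history = [10, 12, 15, 11, 9, 14, 20, 18, 13, 12, 16, 22]
print(seasonal_naive(history, horizon=6))
```

With exactly one season of history, a 6-month forecast returns the first six historical values unchanged, which is why this baseline is hard to beat when seasonality is stable.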

Why This Project Stands Out

  • Domain realism: The COVID regime-filtering is not a textbook trick — it reflects how practitioners actually handle structural breaks in enterprise demand data
  • Dual-model architecture: Two complementary approaches (total-volume Random Forest in WiFiModel (1).ipynb; generation-share XGBoost in Wireless Forecasting.ipynb) cover different forecasting horizons and business questions
  • Interpretable trade-off analysis: The project explicitly surfaces the MAPE vs. RMSE trade-off rather than optimizing a single metric, which maps directly to business risk tolerance
  • Bayesian optimization: Using Optuna for 50-trial hyperparameter search goes beyond standard grid/random search and demonstrates familiarity with modern hyperparameter-optimization tooling
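The MAPE vs. RMSE trade-off mentioned above can be seen on a toy example: MAPE averages relative errors, while RMSE squares absolute errors and so punishes one large miss heavily. The numbers below are invented for illustration and are not project results.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical demand: three flat months followed by one spike.
actual = [100, 100, 100, 400]
flat_forecast = [100, 100, 100, 200]   # exact on flat months, misses the spike
spiky_forecast = [130, 130, 130, 380]  # noisier on flat months, close on the spike

for name, forecast in [("flat", flat_forecast), ("spiky", spiky_forecast)]:
    print(name, round(mape(actual, forecast), 2), round(rmse(actual, forecast), 2))
```

Here the flat forecast has the lower MAPE while the spiky forecast has the lower RMSE, so the choice of metric decides which forecast counts as better.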

Key Skills Demonstrated

Statistical & ML

  • Time series feature engineering (lag features, rolling averages, calendar features)
  • Regime detection and structural break handling
  • Bayesian hyperparameter optimization (Optuna)
  • Seasonal decomposition and SARIMA modeling
  • Gradient boosting (XGBoost)

Programming & Tools

  • Python (pandas, numpy, scikit-learn, statsmodels, optuna, xgboost, matplotlib)
  • Jupyter Notebooks / Google Colab
  • Excel data ingestion with openpyxl

Analytical

  • Model comparison and metric interpretation (MAPE vs. RMSE trade-off)
  • Segment-level drill-down (High End AP, Mid End AP, Premium End AP)
  • Generation lifecycle analysis (Wi-Fi 5 → Wi-Fi 6 share shift, End-of-Support (EOS) date filtering)
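As a rough illustration of the generation-share and lifecycle features, the sketch below computes each generation's share of monthly units and a months_since_launch counter. The helper names, unit counts, and launch months are hypothetical placeholders, not values from the dataset.

```python
def months_between(start, end):
    """Whole months from 'YYYY-MM' start to 'YYYY-MM' end."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def generation_features(month, units_by_gen, launch_by_gen):
    """Return share and months_since_launch for each generation in one month."""
    total = sum(units_by_gen.values())
    return {
        gen: {
            "share": units / total if total else 0.0,
            "months_since_launch": months_between(launch_by_gen[gen], month),
        }
        for gen, units in units_by_gen.items()
    }


# Hypothetical launch months and unit counts, for illustration only.
launch = {"Wi-Fi 5": "2014-01", "Wi-Fi 6": "2019-04"}
feats = generation_features("2024-07", {"Wi-Fi 5": 250, "Wi-Fi 6": 750}, launch)
print(feats["Wi-Fi 6"])
```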

Project Outcomes

  • 13.6% MAPE on High End AP segment (Bayesian-optimized Random Forest)
  • 4 forecasting models benchmarked: Seasonal Naive, Moving Average, Random Forest, SARIMA
  • 54 clean training months after COVID regime filtering (vs. ~96 raw months)
  • 6-month holdout window used for all final evaluation (2024-07 to 2024-12)
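A time-ordered holdout like the one above can be sketched by splitting on the month label rather than shuffling. This example assumes months are stored as zero-padded 'YYYY-MM' strings, which compare correctly as text; the helper names and the toy series are illustrative only.

```python
def month_range(start, end):
    """Yield 'YYYY-MM' labels from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def split_holdout(series, test_start="2024-07", test_end="2024-12"):
    """Split a {month: value} mapping into train (before) and test (window)."""
    train = {m: v for m, v in series.items() if m < test_start}
    test = {m: v for m, v in series.items() if test_start <= m <= test_end}
    return train, test


# Hypothetical post-COVID series covering 2023-01 through 2024-12.
series = {m: 1000 + i for i, m in enumerate(month_range("2023-01", "2024-12"))}
train, test = split_holdout(series)
print(len(train), len(test))
print(list(test))
```

Keeping every training month strictly earlier than the test window avoids leaking future information into the model.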

Context

Academic / collaborative project. Authors: Reid Sendroff and Jawad Hussein.