Cisco Wi-Fi Access Point Demand Forecasting

Machine learning pipeline for forecasting enterprise Wi-Fi AP booking volumes, with COVID-era regime filtering for steady-state dynamics.


Overview

Enterprise network hardware demand is notoriously hard to forecast. Booking cycles are long, product generations overlap, and macro disruptions — like the COVID-19 pandemic — can invalidate years of historical patterns overnight. This project builds a forecasting pipeline for Cisco Wi-Fi Access Point bookings that explicitly handles these challenges rather than papering over them.

The central methodological insight is regime filtering: fiscal years 2021 and 2022, when COVID caused structural breaks in both supply and demand, are excluded from the training set. The remaining data is split into two steady-state windows — pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) — and concatenated to produce 54 clean training months with no cross-regime lag artifacts. This produces a training distribution that reflects how the business actually behaves, rather than a distorted average across fundamentally different operating environments.

Four forecasting approaches are benchmarked: Seasonal Naive, Moving Average, SARIMA, and a Bayesian-optimized Random Forest. A complementary notebook develops an XGBoost model that adds generation-share forecasting — tracking the market-share transition from Wi-Fi 5 to Wi-Fi 6 — with generation lifecycle features including End-of-Support date filtering and months-since-launch. The best configuration achieves 13.6% MAPE on the High-End AP segment.


What This Project Demonstrates

  • Structural break detection and regime-based train/test splitting for enterprise time series
  • Multi-model forecasting benchmarking with interpretation of the MAPE vs. RMSE trade-off
  • Bayesian hyperparameter optimization with Optuna (50 trials, 3-fold cross-validated MSE objective)
  • Lag feature engineering with leak-free rolling statistics (shifted by 1 period)
  • Segment-level drill-down forecasting (High End AP, Mid End AP, Premium End AP)
  • Generation-share modeling with lifecycle features (months_since_launch, EOS date masking)
  • SARIMA modeling on pre-COVID regime with seasonal differencing

Methods Used

| Method | Type | Use Case |
| --- | --- | --- |
| Seasonal Naive | Baseline | Repeats prior-year month; strong MAPE benchmark |
| Moving Average (3-month) | Baseline | Smoothed trend extrapolation |
| SARIMA (1,1,1)(1,1,1,12) | Classical time series | Captures autocorrelation and seasonality on pre-COVID regime |
| Random Forest (baseline) | Ensemble ML | Feature-based regression with lag/rolling inputs |
| Random Forest + RandomizedSearchCV | Tuned ensemble ML | 15-iteration random search across depth/estimator space |
| Random Forest + Optuna (Bayesian) | Bayesian-optimized ensemble ML | 50-trial Bayesian search; lowest RMSE of the models compared |
| XGBoost | Gradient boosting | Total volume + generation-share forecasting in parallel notebook |
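
The two baselines in the table can be sketched in a few lines. This is a minimal illustration on synthetic numbers, not the notebook's code: the function names are hypothetical, and extending the 3-month mean as a flat line over the horizon is an assumption about how the Moving Average baseline is applied.

```python
import numpy as np
import pandas as pd


def seasonal_naive_forecast(history: pd.Series, horizon: int, season: int = 12) -> np.ndarray:
    """Forecast each future month with the value observed `season` months earlier."""
    values = history.to_numpy(dtype=float)
    if len(values) < season:
        raise ValueError("need at least one full season of history")
    last_season = values[-season:]
    # Step h (0-based) reuses the matching month from the last observed season.
    return np.array([last_season[h % season] for h in range(horizon)])


def moving_average_forecast(history: pd.Series, horizon: int, window: int = 3) -> np.ndarray:
    """Flat forecast equal to the mean of the last `window` observations (assumed form)."""
    values = history.to_numpy(dtype=float)
    return np.full(horizon, values[-window:].mean())


# Toy monthly series with values 1..24 (synthetic, not project data).
idx = pd.date_range("2022-01-01", periods=24, freq="MS")
toy = pd.Series(np.arange(1, 25, dtype=float), index=idx)

naive = seasonal_naive_forecast(toy, horizon=6)   # values 13..18, i.e. the same months one year back
ma3 = moving_average_forecast(toy, horizon=6)     # 23.0 repeated, the mean of 22, 23, 24
```

On a trending series like this toy one, both baselines lag the trend, which is why they serve as floors to beat rather than as candidate production models.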

Datasets / Inputs

Two transformed Excel files are included in this repository:

| File | Size | Contents |
| --- | --- | --- |
| WiFi transformed Data - Use First Tab.xlsx | ~156 KB | Monthly totals by WiFi generation and segment; used by Wireless Forecasting.ipynb |
| Wifi bookings - transformed.xlsx | ~222 KB | Full booking records by product family, generation, segment, and fiscal period; used by WiFiModel (1).ipynb |

Original source (not included): WiFiBookingsDB.xlsx — proprietary Cisco internal booking database. Cannot be distributed publicly.

Key columns:

  • Standard Bookings Units — target variable (float, booking units)
  • WiFi Generattion — generation label; note: misspelled in source data
  • Segment — market tier: "High End AP", "Mid End AP", "Premium End AP"
  • Fiscal Period ID — YYYYMM integer encoding of the Cisco fiscal period
  • Period_YearMonth — derived datetime index
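
One way Period_YearMonth could be derived from Fiscal Period ID is sketched below. It assumes the YYYYMM integer can be read directly as a year-month label; any offset between Cisco fiscal periods and calendar months is not handled, and the helper name `add_period_index` is hypothetical.

```python
import pandas as pd

# Synthetic rows using the column names listed above (values are made up).
raw = pd.DataFrame({
    "Fiscal Period ID": [202312, 202301, 202302],
    "Standard Bookings Units": [143.0, 120.0, 95.5],
})


def add_period_index(df: pd.DataFrame) -> pd.DataFrame:
    """Decode the YYYYMM integer into a month-start timestamp and index by it."""
    out = df.copy()
    out["Period_YearMonth"] = pd.to_datetime(
        out["Fiscal Period ID"].astype(str), format="%Y%m"
    )
    # Sorting matters: every later shift()/rolling() call relies on time order.
    return out.set_index("Period_YearMonth").sort_index()


decoded = add_period_index(raw)
```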

Preprocessing summary:

  • COVID years (FY2021, FY2022) and transition years (FY2025, FY2026) removed
  • Dataset split into pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) regimes
  • Regimes concatenated: 54 total training months after filtering
  • Lag features: lag_1, lag_3, lag_12 (built per-regime to avoid cross-regime contamination)
  • Rolling mean: rolling_mean_3 (3-month window, shifted by 1 to prevent leakage)
  • Calendar features: month_num, quarter, is_quarter_end
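
The calendar features can be sketched as follows, assuming the frame is indexed by a monthly DatetimeIndex. Treating months 3, 6, 9, and 12 of that index as quarter ends is an assumption; the notebooks may define quarter ends against the fiscal calendar instead.

```python
import pandas as pd


def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add month_num, quarter, and is_quarter_end from a DatetimeIndex."""
    out = df.copy()
    out["month_num"] = out.index.month
    out["quarter"] = out.index.quarter
    # Assumption: a quarter closes in months 3, 6, 9, 12 of the index calendar.
    out["is_quarter_end"] = (out.index.month % 3 == 0).astype(int)
    return out


idx = pd.date_range("2023-01-01", periods=6, freq="MS")
cal = add_calendar_features(
    pd.DataFrame({"Standard Bookings Units": range(6)}, index=idx)
)
```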

See data/README.md for full schema and data/DATA_SOURCES.md for download instructions.


Key Technical Steps

Raw Excel data
      |
      v
[1. Load & clean]
 Cast types, decode Fiscal Period ID to datetime, strip whitespace
      |
      v
[2. Regime filtering]
 Remove FY2021-2022 (COVID) and FY2025-2026 (future/transition)
 Split into Regime 1 (FY<=2020) and Regime 2 (FY>=2023)
      |
      v
[3. Feature engineering (per-regime)]
 lag_1, lag_3, lag_12
 rolling_mean_3 (shifted 1)
 month_num, quarter, is_quarter_end
      |
      v
[4. Concatenate regimes]
 54 clean training months; no cross-regime lag artifacts
      |
      v
[5. Baseline benchmarking]
 Seasonal Naive, Moving Average, SARIMA, RF (n_estimators=100)
      |
      v
[6. Hyperparameter tuning]
 RandomizedSearchCV (15 iter) then Optuna Bayesian (50 trials)
      |
      v
[7. Segment-level evaluation]
 High End AP drill-down; 13.6% MAPE
      |
      v
[8. Generation-share modeling (Wireless Forecasting.ipynb)]
 XGBoost; lifecycle features; EOS filtering; share renormalization

Step 2: Regime Filtering — Rationale

Fiscal years 2021 and 2022 are excluded because COVID disruptions created a distribution shift that does not reflect steady-state booking behavior. Training through this period would either inflate variance estimates or require explicit intervention modeling. The cleaner solution — used here — is to treat pre- and post-COVID as two samples of the same underlying process and concatenate them after independent feature engineering.
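
A minimal sketch of the filter and split, assuming the fiscal year is the leading four digits of Fiscal Period ID. The function name `split_regimes` and the constant are hypothetical; the excluded years and regime boundaries are the ones stated in this README.

```python
import pandas as pd

# COVID years and transition years listed in the preprocessing summary.
EXCLUDED_FISCAL_YEARS = {2021, 2022, 2025, 2026}


def split_regimes(df: pd.DataFrame, period_col: str = "Fiscal Period ID"):
    """Return (pre_covid, post_covid) frames with excluded fiscal years dropped."""
    fiscal_year = df[period_col] // 100  # YYYYMM -> YYYY (assumed encoding)
    kept = df[~fiscal_year.isin(EXCLUDED_FISCAL_YEARS)]
    kept_year = kept[period_col] // 100
    pre = kept[kept_year <= 2020].sort_values(period_col)
    post = kept[kept_year >= 2023].sort_values(period_col)
    return pre, post


# Synthetic example: one row per fiscal year 2016..2026.
demo = pd.DataFrame({
    "Fiscal Period ID": [y * 100 + 1 for y in range(2016, 2027)],
    "Standard Bookings Units": range(11),
})
pre_covid, post_covid = split_regimes(demo)
```

Keeping the two regimes as separate frames at this stage is what lets the lag features in Step 3 be computed without reaching across the gap.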

Step 3: Lag Feature Engineering

Lag features are built within each regime before concatenation to prevent the lag at the regime boundary from crossing the COVID gap:

# Applied independently to df_r1 (pre-COVID) and df_r2 (post-COVID)
df['lag_1']  = df['Standard Bookings Units'].shift(1)
df['lag_3']  = df['Standard Bookings Units'].shift(3)
df['lag_12'] = df['Standard Bookings Units'].shift(12)
df['rolling_mean_3'] = df['Standard Bookings Units'].shift(1).rolling(3).mean()
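
To make the per-regime application concrete, the sketch below wraps the same four features in a function, applies it to each regime, and only then concatenates. The data are synthetic, the function name is hypothetical, and dropping the warm-up rows with `dropna()` is an assumption about how the notebook treats the NaNs that shifting produces.

```python
import numpy as np
import pandas as pd

TARGET = "Standard Bookings Units"


def add_lag_features(regime: pd.DataFrame) -> pd.DataFrame:
    """Build lag and rolling features inside one regime, then drop warm-up rows."""
    out = regime.copy()
    for k in (1, 3, 12):
        out[f"lag_{k}"] = out[TARGET].shift(k)
    # shift(1) first so the window ends at t-1 and never sees the current target.
    out["rolling_mean_3"] = out[TARGET].shift(1).rolling(3).mean()
    return out.dropna()  # assumption: rows without a full lag history are discarded


# Synthetic regimes: 24 and 18 months of made-up values.
r1 = pd.DataFrame({TARGET: np.arange(24, dtype=float)},
                  index=pd.date_range("2018-01-01", periods=24, freq="MS"))
r2 = pd.DataFrame({TARGET: np.arange(100, 118, dtype=float)},
                  index=pd.date_range("2023-01-01", periods=18, freq="MS"))

# Features first, concatenation second: no lag in r2 can point back into r1.
train = pd.concat([add_lag_features(r1), add_lag_features(r2)])
```

Had the regimes been concatenated first, the earliest post-COVID rows would have carried lag_12 values taken from pre-COVID months several calendar years earlier.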

Step 6: Bayesian Optimization Search Space

| Parameter | Range / Choices |
| --- | --- |
| n_estimators | 100 – 1000 |
| max_depth | 3 – 30 |
| min_samples_split | 2 – 10 |
| min_samples_leaf | 1 – 5 |
| max_features | 'sqrt', 'log2', None |
| bootstrap | True, False |

Objective: maximize negative MSE (equivalently, minimize MSE) under 3-fold cross-validation, over 50 Optuna trials.
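
A sketch of how this search space might be expressed with Optuna and scikit-learn follows. The parameter ranges come from the table above; everything else is an assumption, including the use of plain `cv=3`, the fixed `random_state`, and the function names `objective` and `run_study`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def objective(trial, X, y):
    """Mean 3-fold negative MSE for one sampled Random Forest configuration."""
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 30),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 5),
        "max_features": trial.suggest_categorical("max_features", ["sqrt", "log2", None]),
        "bootstrap": trial.suggest_categorical("bootstrap", [True, False]),
    }
    model = RandomForestRegressor(random_state=42, **params)
    # cv=3 is an assumption about the fold scheme; TimeSeriesSplit(n_splits=3)
    # is a common alternative when temporal order must be respected.
    scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")
    return float(scores.mean())


def run_study(X, y, n_trials=50, seed=42):
    """Maximize negative MSE (i.e. minimize MSE) with Optuna's TPE sampler."""
    import optuna  # imported lazily so `objective` can be reused without Optuna

    study = optuna.create_study(
        direction="maximize", sampler=optuna.samplers.TPESampler(seed=seed)
    )
    study.optimize(lambda trial: objective(trial, X, y), n_trials=n_trials)
    return study
```

After `study = run_study(X_train, y_train)`, the winning configuration is available as `study.best_params` and can be passed straight back into `RandomForestRegressor(**study.best_params)` for a final fit.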

Step 8: Generation-Share Modeling (XGBoost)

After forecasting total volume, Wireless Forecasting.ipynb forecasts each generation's share of that total. Key features:

  • months_since_launch — time since first non-zero booking month per generation
  • share_lag1 — prior month's share, computed per generation
  • ma_share_3 — 3-month rolling share average, shifted by 1
  • End-of-Support filter: Wi-Fi 5 rows after 2023-04-01 masked as inactive
  • Predictions clipped to [0, 1] and renormalized to sum to 1.0 per month
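
Two of these steps lend themselves to a short sketch: the clip-and-renormalize post-processing and the months_since_launch counter. The code below uses synthetic predictions, and the function names are hypothetical. The counter is shown for a single generation; in practice it would be applied per generation (for example inside a groupby). Leaving all-zero rows as NaN is a design choice made here, not something the notebook is known to do.

```python
import numpy as np
import pandas as pd


def renormalize_shares(pred: pd.DataFrame, active: pd.DataFrame) -> pd.DataFrame:
    """Clip shares to [0, 1], zero out inactive generations, rescale rows to sum to 1."""
    clipped = pred.clip(lower=0.0, upper=1.0).where(active, 0.0)
    row_sum = clipped.sum(axis=1)
    # Rows whose shares all clip to zero are left as NaN rather than guessed.
    return clipped.div(row_sum.where(row_sum > 0), axis=0)


def months_since_launch(bookings: pd.Series) -> pd.Series:
    """Months elapsed since the first non-zero booking month (NaN before launch)."""
    launched = bookings.ne(0).astype(int).cummax()  # 1 from the launch month onward
    counter = launched.cumsum() - 1                 # 0 at launch, 1 the month after, ...
    return counter.where(launched.astype(bool))


months = pd.date_range("2023-03-01", periods=3, freq="MS")
raw_pred = pd.DataFrame({"Wi-Fi 5": [0.30, 0.10, -0.05],
                         "Wi-Fi 6": [0.80, 0.70, 1.10]}, index=months)
# End-of-Support mask: Wi-Fi 5 is inactive after 2023-04-01 (date from this README).
active = pd.DataFrame({"Wi-Fi 5": months <= pd.Timestamp("2023-04-01"),
                       "Wi-Fi 6": True}, index=months)

shares = renormalize_shares(raw_pred, active)
launch_age = months_since_launch(pd.Series([0, 0, 5, 7, 0, 9]))
```

In the toy data, the April row goes from raw shares of 0.10 and 0.70 to 0.125 and 0.875, and the May row assigns the full share to Wi-Fi 6 because Wi-Fi 5 is masked.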

Results and Interpretation

Test window: 2024-07-01 through 2024-12-01 (6 months)
Training months: 54 (post-regime filtering)

| Model | Segment | MAPE |
| --- | --- | --- |
| Bayesian-optimized Random Forest | High End AP | 13.6% |
| Seasonal Naive | Total volume | TODO: extract from Cell 25 metrics table |
| Random Forest (baseline) | Total volume | TODO: extract from Cell 9 metrics table |
| SARIMA (1,1,1)(1,1,1,12) | Total volume | TODO: extract from Cell 25 metrics table |

TODO: Run WiFiModel (1).ipynb to completion and read the styled metrics DataFrames in Cells 9, 25, and 26 to fill in exact MAE, RMSE, and MAPE values for all models.

Key interpretive finding: Seasonal Naive consistently achieves the lowest MAPE, making it the safest forecast in the sense of minimizing the typical percentage miss. The Bayesian-optimized Random Forest achieves lower RMSE, meaning it tracks demand spikes and troughs better, but at the cost of a higher average percentage error. The two metrics can disagree because MAPE weights every month's percentage miss equally, while RMSE squares errors and is therefore dominated by the largest absolute misses. The right model depends on whether planners need to be right on average (Naive) or right on the peaks (Random Forest).
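
The trade-off is easy to reproduce on a toy holdout. The numbers below are synthetic and chosen only to show how one forecast can win on MAPE while another wins on RMSE; they are not project results, and `model_a` / `model_b` are placeholders rather than the Naive and Random Forest outputs.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the target."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


# Synthetic six-month holdout with one demand spike in the final month.
actual = np.array([100, 100, 100, 100, 100, 400], dtype=float)
model_a = np.array([100, 100, 100, 100, 100, 250], dtype=float)  # exact on typical months, misses the spike
model_b = np.array([120, 80, 120, 80, 120, 380], dtype=float)    # noisier on typical months, close on the spike

print(f"A: MAPE={mape(actual, model_a):.2f}%  RMSE={rmse(actual, model_a):.2f}")  # 6.25%, 61.24
print(f"B: MAPE={mape(actual, model_b):.2f}%  RMSE={rmse(actual, model_b):.2f}")  # 17.50%, 20.00
```

Model A is right on most months and so has the lower MAPE, but its single 150-unit miss on the spike dominates the squared-error average; model B spreads smaller errors across every month and has the lower RMSE.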


Example Visualizations

The following plots are produced by the notebooks. Export instructions and plt.savefig(...) snippets are in images/README.md.

  • images/total_bookings_over_time.png — Full time series (2016–2024); COVID dip visible
  • images/baseline_forecast_comparison.png — Seasonal Naive vs. Random Forest vs. actuals
  • images/optimized_6month_forecast.png — Bayesian RF with COVID gap shading; 6-month holdout
  • images/high_end_ap_segment_forecast.png — High End AP drill-down; 13.6% MAPE result
  • images/total_and_generations_quarterly.png — Wi-Fi 5 to Wi-Fi 6 generational share shift
  • images/xgboost_total_forecast.png — XGBoost 6-month validation forecast

All images pending export. See images/README.md.


Repository Structure

WiFi-Forecasting-Project/
├── WiFiModel (1).ipynb                        # Main: RF + SARIMA + Bayesian optimization
├── WiFiModel (3).ipynb                        # Copy of WiFiModel (1).ipynb
├── Wireless Forecasting.ipynb                 # XGBoost total + generation-share model
├── WiFi transformed Data - Use First Tab.xlsx # Dataset for Wireless Forecasting.ipynb (~156 KB)
├── Wifi bookings - transformed.xlsx           # Dataset for WiFiModel (1).ipynb (~222 KB)
├── requirements.txt                           # Pinned library dependencies
├── .gitignore                                 # Python/Jupyter ignores
├── README.md                                  # This file
├── PROJECT_SUMMARY.md                         # Portfolio elevator pitch and resume bullets
├── data/
│   ├── README.md                              # Dataset schemas and preprocessing notes
│   └── DATA_SOURCES.md                        # Download instructions and file status
└── images/
    └── README.md                              # Visualization catalog and export instructions

How to Run

Prerequisites

  • Python 3.9+
  • Google Colab account (notebooks use Google Drive for data loading) or local Jupyter with data files in the project root

Installation

git clone https://github.com/reidsendroff/WiFi-Forecasting-Project.git
cd WiFi-Forecasting-Project
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Running locally (Jupyter)

  1. Update the DATA_PATH variable in each notebook to point to the local .xlsx file.
  2. Launch Jupyter:
    jupyter notebook
  3. Open notebooks in this order:
    • WiFiModel (1).ipynb — main pipeline (recommended starting point)
    • Wireless Forecasting.ipynb — XGBoost + generation-share model

Running on Google Colab

  1. Upload the .xlsx data files to your Google Drive under a shared folder.
  2. Update the DATA_PATH / drive.mount(...) paths in each notebook to match your Drive structure.
  3. Open each notebook in Colab and run all cells top-to-bottom.

Skills Demonstrated

Mathematical / Statistical

  • Time series stationarity, seasonal differencing (SARIMA)
  • Regime detection and structural break handling
  • Cross-validated hyperparameter search (MAPE, RMSE, MAE)
  • Bayesian optimization with tree-structured Parzen estimators (Optuna)
  • Ensemble methods: Random Forest bootstrap aggregation
  • Gradient boosting (XGBoost)

Programming & Tools

  • Python: pandas, numpy, matplotlib, scikit-learn, statsmodels, optuna, xgboost
  • Jupyter Notebooks / Google Colab
  • Excel ingestion with openpyxl

Workflow

  • Multi-notebook project organization
  • Reproducible, leak-free feature engineering
  • Metric-driven model selection with business-level interpretation
  • Git version control

Project Context

Collaborative academic project. Authors: Reid Sendroff and Jawad Hussein.


Why This Matters

Enterprise hardware demand forecasting directly affects inventory planning, sales target setting, and supply chain decisions. A model that underestimates demand during a product-generation transition can leave customers waiting weeks for backlogged hardware; one that overestimates leaves inventory sitting in warehouses. The regime-filtering approach developed here is a transferable pattern for any demand forecasting problem where a known macro-event creates a structural break — applicable to automotive, semiconductor, cloud infrastructure, and other long-cycle hardware markets.


Future Improvements

  • Export all six visualizations to images/ and embed them in this README
  • Fill in exact MAE, RMSE, and MAPE values for all models (see Results section TODOs)
  • Add Meraki vs. Cisco brand-level split to control for the mixed-product confound noted in the notebooks
  • Extend generation-share model to Wi-Fi 6E and Wi-Fi 7 as data becomes available
  • Add SHAP feature importance analysis to explain which lags drive the Random Forest predictions
  • Containerize with a Dockerfile so the pipeline runs without Google Drive
  • Rename notebooks with descriptive, version-safe filenames (no spaces, no parentheses)
  • Write unit tests for lag/rolling feature engineering and regime-split logic

Authors

Reid Sendroff and Jawad Hussein


Forecasting enterprise Wi-Fi demand by treating COVID not as noise to average through, but as a structural break to excise — leaving two clean samples of steady-state dynamics that a well-tuned Random Forest can actually learn from.
