Cisco Wi-Fi Access Point Demand Forecasting

Machine learning pipeline for forecasting enterprise Wi-Fi AP booking volumes, with COVID-era regime filtering for steady-state dynamics.

Overview

Enterprise network hardware demand is notoriously hard to forecast. Booking cycles are long, product generations overlap, and macro disruptions — like the COVID-19 pandemic — can invalidate years of historical patterns overnight. This project builds a forecasting pipeline for Cisco Wi-Fi Access Point bookings that explicitly handles these challenges rather than papering over them.

The central methodological insight is regime filtering: fiscal years 2021 and 2022, when COVID caused structural breaks in both supply and demand, are excluded from the training set. The remaining data is split into two steady-state windows — pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) — and concatenated to produce 54 clean training months with no cross-regime lag artifacts. This produces a training distribution that reflects how the business actually behaves, rather than a distorted average across fundamentally different operating environments.

Four forecasting approaches are benchmarked: Seasonal Naive, Moving Average, SARIMA, and a Bayesian-optimized Random Forest. A complementary notebook develops an XGBoost model that adds generation-share forecasting, tracking the market-share transition from Wi-Fi 5 to Wi-Fi 6, with generation lifecycle features including End-of-Support date filtering and months-since-launch. The best configuration, the Bayesian-optimized Random Forest, achieves 13.6% MAPE on the High End AP segment over a 6-month holdout.

What This Project Demonstrates

Structural break detection and regime-based train/test splitting for enterprise time series
Multi-model forecasting benchmarking with interpretation of the MAPE vs. RMSE trade-off
Bayesian hyperparameter optimization with Optuna (50 trials, 3-fold cross-validated MSE objective)
Lag feature engineering with leak-free rolling statistics (shifted by 1 period)
Segment-level drill-down forecasting (High End AP, Mid End AP, Premium End AP)
Generation-share modeling with lifecycle features (months_since_launch, EOS date masking)
SARIMA modeling on pre-COVID regime with seasonal differencing

Methods Used

Method	Type	Use Case
Seasonal Naive	Baseline	Repeats prior-year month; strong MAPE benchmark
Moving Average (3-month)	Baseline	Smoothed trend extrapolation
SARIMA (1,1,1)(1,1,1,12)	Classical time series	Captures autocorrelation and seasonality on pre-COVID regime
Random Forest (baseline)	Ensemble ML	Feature-based regression with lag/rolling inputs
Random Forest + RandomizedSearchCV	Tuned ensemble ML	15-iteration random search across depth/estimator space
Random Forest + Optuna (Bayesian)	Bayesian-optimized ensemble ML	50-trial search in which each trial is informed by earlier results; lower RMSE than Seasonal Naive, though not lower MAPE
XGBoost	Gradient boosting	Total volume + generation-share forecasting in parallel notebook
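The two baselines in the table can be sketched as follows. This is a minimal illustration, not the notebook code: the function names are hypothetical, the toy series is made up, and the moving-average forecast is assumed to be flat over the horizon (the notebooks may instead roll it forward month by month).

```python
import numpy as np
import pandas as pd


def seasonal_naive(history: pd.Series, horizon: int, season: int = 12) -> np.ndarray:
    """Forecast each future month with the value observed `season` months earlier."""
    last_season = history.to_numpy()[-season:]
    # Tile the last full season so horizons longer than one season still work.
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]


def moving_average(history: pd.Series, horizon: int, window: int = 3) -> np.ndarray:
    """Flat forecast equal to the mean of the last `window` observations (assumption)."""
    return np.full(horizon, history.to_numpy()[-window:].mean())


# Toy monthly series with values 1..24 (made-up data, two full years).
history = pd.Series(
    np.arange(1.0, 25.0),
    index=pd.date_range("2022-01-01", periods=24, freq="MS"),
)

naive_fc = seasonal_naive(history, horizon=6)  # repeats the same months of the prior year
ma_fc = moving_average(history, horizon=6)     # mean of the last three months, held flat
```

Both functions need only the target history, which is why they make cheap reference points for the feature-based models.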

Datasets / Inputs

Two transformed Excel files are included in this repository:

File	Size	Contents
`WiFi transformed Data - Use First Tab.xlsx`	~156 KB	Monthly totals by WiFi generation and segment; used by `Wireless Forecasting.ipynb`
`Wifi bookings - transformed.xlsx`	~222 KB	Full booking records by product family, generation, segment, and fiscal period; used by `WiFiModel (1).ipynb`

Original source (not included): WiFiBookingsDB.xlsx — proprietary Cisco internal booking database. Cannot be distributed publicly.

Key columns:

Standard Bookings Units — target variable (float, booking units)
WiFi Generattion — generation label; note: misspelled in source data
Segment — market tier: "High End AP", "Mid End AP", "Premium End AP"
Fiscal Period ID — YYYYMM integer encoding of the Cisco fiscal period
Period_YearMonth — derived datetime index
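As a rough sketch of how Period_YearMonth could be derived from Fiscal Period ID, the snippet below splits the YYYYMM integer and parses it as a date. The `fiscal_year` and `fiscal_month` column names are hypothetical, the values are made up, and mapping fiscal month MM directly onto calendar month MM is an assumption; the notebooks may apply a fiscal-calendar offset.

```python
import pandas as pd

# Made-up rows using the column names listed above.
df = pd.DataFrame({
    "Fiscal Period ID": [201908, 201909, 202301],
    "Standard Bookings Units": [10.0, 12.0, 15.0],
})

# Integer arithmetic splits YYYYMM into its year and month parts.
df["fiscal_year"] = df["Fiscal Period ID"] // 100   # hypothetical helper column
df["fiscal_month"] = df["Fiscal Period ID"] % 100   # hypothetical helper column

# Assumption: fiscal period MM is treated as calendar month MM.
df["Period_YearMonth"] = pd.to_datetime(
    df["Fiscal Period ID"].astype(str), format="%Y%m"
)
df = df.set_index("Period_YearMonth").sort_index()
```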

Preprocessing summary:

COVID years (FY2021, FY2022) and transition years (FY2025, FY2026) removed
Dataset split into pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) regimes
Regimes concatenated: 54 total training months after filtering
Lag features: lag_1, lag_3, lag_12 (built per-regime to avoid cross-regime contamination)
Rolling mean: rolling_mean_3 (3-month window, shifted by 1 to prevent leakage)
Calendar features: month_num, quarter, is_quarter_end
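The calendar features in the last item can be derived from the datetime index alone. The sketch below assumes calendar quarters, so `is_quarter_end` flags months 3, 6, 9, and 12; the notebooks may define quarter ends on the fiscal calendar instead, in which case only that one line changes.

```python
import pandas as pd

# Toy month-start index standing in for Period_YearMonth.
idx = pd.date_range("2023-01-01", periods=6, freq="MS")
features = pd.DataFrame(index=idx)

features["month_num"] = features.index.month
features["quarter"] = features.index.quarter
# Assumption: calendar quarters, so a quarter ends in months 3, 6, 9, 12.
features["is_quarter_end"] = (features["month_num"] % 3 == 0).astype(int)
```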

See data/README.md for full schema and data/DATA_SOURCES.md for download instructions.

Key Technical Steps

Raw Excel data
      |
      v
[1. Load & clean]
 Cast types, decode Fiscal Period ID to datetime, strip whitespace
      |
      v
[2. Regime filtering]
 Remove FY2021-2022 (COVID) and FY2025-2026 (future/transition)
 Split into Regime 1 (FY<=2020) and Regime 2 (FY>=2023)
      |
      v
[3. Feature engineering (per-regime)]
 lag_1, lag_3, lag_12
 rolling_mean_3 (shifted 1)
 month_num, quarter, is_quarter_end
      |
      v
[4. Concatenate regimes]
 54 clean training months; no cross-regime lag artifacts
      |
      v
[5. Baseline benchmarking]
 Seasonal Naive, Moving Average, SARIMA, RF (n_estimators=100)
      |
      v
[6. Hyperparameter tuning]
 RandomizedSearchCV (15 iter) then Optuna Bayesian (50 trials)
      |
      v
[7. Segment-level evaluation]
 High End AP drill-down; 13.6% MAPE
      |
      v
[8. Generation-share modeling (Wireless Forecasting.ipynb)]
 XGBoost; lifecycle features; EOS filtering; share renormalization

Step 2: Regime Filtering — Rationale

Fiscal years 2021 and 2022 are excluded because COVID disruptions created a distribution shift that does not reflect steady-state booking behavior. Training through this period would either inflate variance estimates or require explicit intervention modeling. The cleaner solution — used here — is to treat pre- and post-COVID as two samples of the same underlying process and concatenate them after independent feature engineering.
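A minimal sketch of the filter and split, assuming a `fiscal_year` column has already been derived from Fiscal Period ID. The function name, the column name, and the toy data are illustrative only; the year boundaries are the ones stated in the pipeline above.

```python
import pandas as pd

COVID_YEARS = {2021, 2022}        # structural-break years, excluded
TRANSITION_YEARS = {2025, 2026}   # future/transition years, excluded


def split_regimes(df: pd.DataFrame, fy_col: str = "fiscal_year"):
    """Drop excluded fiscal years and return (pre-COVID, post-COVID) frames."""
    kept = df[~df[fy_col].isin(COVID_YEARS | TRANSITION_YEARS)]
    pre = kept[kept[fy_col] <= 2020].copy()    # Regime 1
    post = kept[kept[fy_col] >= 2023].copy()   # Regime 2
    return pre, post


# One made-up row per fiscal year, just to show which years survive.
toy = pd.DataFrame({
    "fiscal_year": [2019, 2020, 2021, 2022, 2023, 2024, 2025],
    "Standard Bookings Units": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})
df_r1, df_r2 = split_regimes(toy)
```

Returning copies keeps later column assignments on each regime from touching the original frame.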

Step 3: Lag Feature Engineering

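Lag features are built within each regime before concatenation. If they were computed after concatenation, the first post-COVID rows would take their lags from the last pre-COVID months, roughly two years earlier, and the model would read those values as if they were recent history: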

# Applied independently to df_r1 (pre-COVID) and df_r2 (post-COVID)
df['lag_1']  = df['Standard Bookings Units'].shift(1)
df['lag_3']  = df['Standard Bookings Units'].shift(3)
df['lag_12'] = df['Standard Bookings Units'].shift(12)
df['rolling_mean_3'] = df['Standard Bookings Units'].shift(1).rolling(3).mean()
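To make the per-regime pattern concrete, here is a self-contained sketch that wraps the same four features in a function, applies it to two toy regimes, and concatenates the result. The function name, the toy values, and the toy dates are made up, and dropping the rows whose lags are undefined is an assumption about how the notebooks handle the warm-up period.

```python
import numpy as np
import pandas as pd

TARGET = "Standard Bookings Units"


def add_lag_features(regime: pd.DataFrame) -> pd.DataFrame:
    """Build lag and rolling features using only rows from one regime."""
    out = regime.sort_index().copy()
    out["lag_1"] = out[TARGET].shift(1)
    out["lag_3"] = out[TARGET].shift(3)
    out["lag_12"] = out[TARGET].shift(12)
    # shift(1) first, so the window covers t-3..t-1 and never includes month t.
    out["rolling_mean_3"] = out[TARGET].shift(1).rolling(3).mean()
    # Assumption: rows without a full lag history are dropped.
    return out.dropna()


# Toy regimes: values 1..24 before the gap, 101..124 after it.
df_r1 = pd.DataFrame(
    {TARGET: np.arange(1.0, 25.0)},
    index=pd.date_range("2018-08-01", periods=24, freq="MS"),
)
df_r2 = pd.DataFrame(
    {TARGET: np.arange(101.0, 125.0)},
    index=pd.date_range("2022-08-01", periods=24, freq="MS"),
)

train = pd.concat([add_lag_features(df_r1), add_lag_features(df_r2)])
```

In the toy data every lag in the second regime is at least 101, which confirms that no feature reaches back across the gap.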

Step 6: Bayesian Optimization Search Space

Parameter	Range / Choices
`n_estimators`	100 – 1000
`max_depth`	3 – 30
`min_samples_split`	2 – 10
`min_samples_leaf`	1 – 5
`max_features`	`'sqrt'`, `'log2'`, `None`
`bootstrap`	`True`, `False`

Objective: minimize cross-validated MSE (equivalently, maximize negative MSE, the sign convention scikit-learn scorers use) with 3-fold cross-validation over 50 Optuna trials.
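The search space above could be expressed as an Optuna objective along these lines. This is a sketch under stated assumptions rather than the notebook code: the function names are hypothetical, `random_state=42` is arbitrary, and `cv=3` uses scikit-learn's default unshuffled K-fold splitter because the source does not say which splitter was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def suggest_rf_params(trial):
    """Map the search-space table onto Optuna suggest_* calls."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 30),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 5),
        "max_features": trial.suggest_categorical("max_features", ["sqrt", "log2", None]),
        "bootstrap": trial.suggest_categorical("bootstrap", [True, False]),
    }


def make_objective(X, y):
    """Return an objective that scores one trial by 3-fold cross-validated MSE."""
    def objective(trial):
        model = RandomForestRegressor(random_state=42, **suggest_rf_params(trial))
        # scikit-learn returns negated MSE; flip the sign to get a loss.
        scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")
        return -scores.mean()
    return objective


def run_study(X, y, n_trials=50):
    """Run the search and return the best parameter set found."""
    import optuna  # imported here so the helpers above can be used without it

    study = optuna.create_study(direction="minimize")
    study.optimize(make_objective(X, y), n_trials=n_trials)
    return study.best_params
```

Calling `run_study(X_train, y_train)` with the engineered feature matrix and target would return the best parameter dictionary, which can be passed straight to `RandomForestRegressor(**best_params)`.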

Step 8: Generation-Share Modeling (XGBoost)

After forecasting total volume, Wireless Forecasting.ipynb forecasts each generation's share of that total. Key features:

months_since_launch — time since first non-zero booking month per generation
share_lag1 — prior month's share, computed per generation
ma_share_3 — 3-month rolling share average, shifted by 1
End-of-Support filter: Wi-Fi 5 rows after 2023-04-01 masked as inactive
Predictions clipped to [0, 1] and renormalized to sum to 1.0 per month
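The share features and the final renormalization can be sketched as below. The long-format layout (`month`, `generation`, `units`), the function names, and the sample numbers are assumptions for illustration; the End-of-Support mask is not shown.

```python
import pandas as pd


def add_share_features(long_df: pd.DataFrame) -> pd.DataFrame:
    """Expects columns: month (datetime), generation (str), units (numeric)."""
    out = long_df.sort_values(["generation", "month"]).copy()
    # Each generation's share of that month's total units.
    out["share"] = out["units"] / out.groupby("month")["units"].transform("sum")

    by_gen = out.groupby("generation")["share"]
    out["share_lag1"] = by_gen.shift(1)
    # Shift before rolling so the average never includes the current month.
    out["ma_share_3"] = by_gen.transform(lambda s: s.shift(1).rolling(3).mean())

    # Launch month = first month with non-zero units for that generation.
    launch = out[out["units"] > 0].groupby("generation")["month"].min()
    launch_month = out["generation"].map(launch)
    out["months_since_launch"] = (
        (out["month"].dt.year - launch_month.dt.year) * 12
        + (out["month"].dt.month - launch_month.dt.month)
    )
    return out


def renormalize_shares(pred: pd.DataFrame) -> pd.DataFrame:
    """Rows are months, columns are generations, values are raw predicted shares."""
    clipped = pred.clip(lower=0.0, upper=1.0)
    # Divide each row by its own sum so the shares add up to 1.0 per month.
    return clipped.div(clipped.sum(axis=1), axis=0)


# Made-up raw predictions: one row sums above 1, one contains a negative share.
raw = pd.DataFrame(
    {"Wi-Fi 5": [0.30, -0.05], "Wi-Fi 6": [0.90, 1.20]},
    index=pd.to_datetime(["2024-07-01", "2024-08-01"]),
)
shares = renormalize_shares(raw)
```

A month in which every clipped share is zero would divide by zero and yield NaN, so a real pipeline would need a fallback for that case. Rows before a generation's launch get a negative `months_since_launch` in this sketch.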

Results and Interpretation

Test window: 2024-07-01 through 2024-12-01 (6 months)
Training months: 54 (post-regime filtering)

Model	Segment	MAPE
Bayesian-optimized Random Forest	High End AP	13.6%
Seasonal Naive	Total volume	TODO: extract from Cell 25 metrics table
Random Forest (baseline)	Total volume	TODO: extract from Cell 9 metrics table
SARIMA (1,1,1)(1,1,1,12)	Total volume	TODO: extract from Cell 25 metrics table

TODO: Run WiFiModel (1).ipynb to completion and read the styled metrics DataFrames in Cells 9, 25, and 26 to fill in exact MAE, RMSE, and MAPE values for all models.

Key interpretive finding: Seasonal Naive consistently achieves the lowest MAPE, which makes it the safest forecast in the sense of minimizing the typical percentage deviation. The Bayesian-optimized Random Forest achieves lower RMSE. Because RMSE squares each error before averaging, a lower RMSE means the model misses less badly in the months with demand spikes and troughs, but it does so at the cost of a higher average percentage error. The right model depends on whether planners need to be right on average (Naive) or right on the peaks (Random Forest).
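The trade-off is easy to reproduce with made-up numbers. In the sketch below, all values are invented for illustration and are not project results: one forecast is exact in ordinary months but misses a spike by half, while the other is 20% off in ordinary months but close on the spike.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def rmse(actual, forecast) -> float:
    """Root mean squared error, in the units of the target."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


actual = [100, 100, 100, 400]    # three ordinary months and one spike
steady = [100, 100, 100, 200]    # exact in ordinary months, half the spike
tracker = [120, 120, 120, 380]   # 20% high in ordinary months, near the spike

steady_scores = (mape(actual, steady), rmse(actual, steady))
tracker_scores = (mape(actual, tracker), rmse(actual, tracker))
```

Here `steady` scores a MAPE of 12.5% against roughly 16.25% for `tracker`, yet its RMSE is 100 units against 20, because the single large miss dominates once errors are squared.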

Example Visualizations

The following plots are produced by the notebooks. Export instructions and plt.savefig(...) snippets are in images/README.md.

images/total_bookings_over_time.png — Full time series (2016–2024); COVID dip visible
images/baseline_forecast_comparison.png — Seasonal Naive vs. Random Forest vs. actuals
images/optimized_6month_forecast.png — Bayesian RF with COVID gap shading; 6-month holdout
images/high_end_ap_segment_forecast.png — High End AP drill-down; 13.6% MAPE result
images/total_and_generations_quarterly.png — Wi-Fi 5 to Wi-Fi 6 generational share shift
images/xgboost_total_forecast.png — XGBoost 6-month validation forecast

All images pending export. See images/README.md.

Repository Structure

WiFi-Forecasting-Project/
├── WiFiModel (1).ipynb                        # Main: RF + SARIMA + Bayesian optimization
├── WiFiModel (3).ipynb                        # Copy of WiFiModel (1).ipynb
├── Wireless Forecasting.ipynb                 # XGBoost total + generation-share model
├── WiFi transformed Data - Use First Tab.xlsx # Dataset for Wireless Forecasting.ipynb (~156 KB)
├── Wifi bookings - transformed.xlsx           # Dataset for WiFiModel (1).ipynb (~222 KB)
├── requirements.txt                           # Pinned library dependencies
├── .gitignore                                 # Python/Jupyter ignores
├── README.md                                  # This file
├── PROJECT_SUMMARY.md                         # Portfolio elevator pitch and resume bullets
├── data/
│   ├── README.md                              # Dataset schemas and preprocessing notes
│   └── DATA_SOURCES.md                        # Download instructions and file status
└── images/
    └── README.md                              # Visualization catalog and export instructions

How to Run

Prerequisites

Python 3.9+
Google Colab account (notebooks use Google Drive for data loading) or local Jupyter with data files in the project root

Installation

git clone https://github.com/reidsendroff/WiFi-Forecasting-Project.git
cd WiFi-Forecasting-Project
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Running locally (Jupyter)

Update the DATA_PATH variable in each notebook to point to the local .xlsx file.
Launch Jupyter:
```
jupyter notebook
```
Open notebooks in this order:
- WiFiModel (1).ipynb — main pipeline (recommended starting point)
- Wireless Forecasting.ipynb — XGBoost + generation-share model

Running on Google Colab

Upload the .xlsx data files to your Google Drive under a shared folder.
Update the DATA_PATH / drive.mount(...) paths in each notebook to match your Drive structure.
Open each notebook in Colab and run all cells top-to-bottom.

Skills Demonstrated

Mathematical / Statistical

Time series stationarity, seasonal differencing (SARIMA)
Regime detection and structural break handling
Cross-validated hyperparameter search (MAPE, RMSE, MAE)
Bayesian optimization with tree-structured Parzen estimators (Optuna)
Ensemble methods: Random Forest bootstrap aggregation
Gradient boosting (XGBoost)

Programming & Tools

Python: pandas, numpy, matplotlib, scikit-learn, statsmodels, optuna, xgboost
Jupyter Notebooks / Google Colab
Excel ingestion with openpyxl

Workflow

Multi-notebook project organization
Reproducible, leak-free feature engineering
Metric-driven model selection with business-level interpretation
Git version control

Project Context

Collaborative academic project. Authors: Reid Sendroff and Jawad Hussein.

Why This Matters

Enterprise hardware demand forecasting directly affects inventory planning, sales target setting, and supply chain decisions. A model that underestimates demand during a product-generation transition can leave customers waiting weeks for backlogged hardware; one that overestimates leaves inventory sitting in warehouses. The regime-filtering approach developed here is a transferable pattern for any demand forecasting problem where a known macro-event creates a structural break — applicable to automotive, semiconductor, cloud infrastructure, and other long-cycle hardware markets.

Future Improvements

Export all six visualizations to images/ and embed them in this README
Fill in exact MAE, RMSE, and MAPE values for all models (see Results section TODOs)
Add Meraki vs. Cisco brand-level split to control for the mixed-product confound noted in the notebooks
Extend generation-share model to Wi-Fi 6E and Wi-Fi 7 as data becomes available
Add SHAP feature importance analysis to explain which lags drive the Random Forest predictions
Containerize with a Dockerfile so the pipeline runs without Google Drive
Rename notebooks with descriptive, version-safe filenames (no spaces, no parentheses)
Write unit tests for lag/rolling feature engineering and regime-split logic

Authors

Reid Sendroff and Jawad Hussein

Forecasting enterprise Wi-Fi demand by treating COVID not as noise to average through, but as a structural break to excise — leaving two clean samples of steady-state dynamics that a well-tuned Random Forest can actually learn from.
