Machine learning pipeline for forecasting enterprise Wi-Fi AP booking volumes, with COVID-era regime filtering for steady-state dynamics.
Enterprise network hardware demand is notoriously hard to forecast. Booking cycles are long, product generations overlap, and macro disruptions — like the COVID-19 pandemic — can invalidate years of historical patterns overnight. This project builds a forecasting pipeline for Cisco Wi-Fi Access Point bookings that explicitly handles these challenges rather than papering over them.
The central methodological insight is regime filtering: fiscal years 2021 and 2022, when COVID caused structural breaks in both supply and demand, are excluded from the training set. The remaining data is split into two steady-state windows — pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) — and concatenated to produce 54 clean training months with no cross-regime lag artifacts. This produces a training distribution that reflects how the business actually behaves, rather than a distorted average across fundamentally different operating environments.
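A minimal sketch of the regime filter, assuming a `fiscal_year` column has already been derived from `Fiscal Period ID` (the column name and toy values here are illustrative, not the notebooks' exact code):

```python
import pandas as pd

def filter_regimes(df: pd.DataFrame, fy_col: str = "fiscal_year"):
    """Drop COVID-era fiscal years and split the series into two
    steady-state regimes. FY2021-2022 match neither mask and are
    excluded entirely."""
    regime1 = df[df[fy_col] <= 2020]  # pre-COVID: FY2016-2020
    regime2 = df[df[fy_col] >= 2023]  # post-COVID: FY2023-2024
    return regime1, regime2

# Toy monthly rows tagged with a fiscal year
df = pd.DataFrame({
    "fiscal_year": [2019, 2020, 2021, 2022, 2023, 2024],
    "bookings":    [100, 110, 60, 70, 120, 130],
})
r1, r2 = filter_regimes(df)
# Feature engineering would happen per-regime here, then:
clean = pd.concat([r1, r2], ignore_index=True)  # COVID rows are gone
```

Because lag features are built inside each regime before the `concat`, no lag ever spans the excised FY2021–2022 gap.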
Four forecasting approaches are benchmarked: Seasonal Naive, Moving Average, SARIMA, and a Bayesian-optimized Random Forest. A complementary notebook develops an XGBoost model that adds generation-share forecasting — tracking the market-share transition from Wi-Fi 5 to Wi-Fi 6 — with generation lifecycle features including End-of-Support date filtering and months-since-launch. The best configuration achieves 13.6% MAPE on the High-End AP segment.
- Structural break detection and regime-based train/test splitting for enterprise time series
- Multi-model forecasting benchmarking with interpretation of the MAPE vs. RMSE trade-off
- Bayesian hyperparameter optimization with Optuna (50 trials, cross-validated RMSE objective)
- Lag feature engineering with leak-free rolling statistics (shifted by 1 period)
- Segment-level drill-down forecasting (High End AP, Mid End AP, Premium End AP)
- Generation-share modeling with lifecycle features (months_since_launch, EOS date masking)
- SARIMA modeling on pre-COVID regime with seasonal differencing
| Method | Type | Use Case |
|---|---|---|
| Seasonal Naive | Baseline | Repeats prior-year month; strong MAPE benchmark |
| Moving Average (3-month) | Baseline | Smoothed trend extrapolation |
| SARIMA (1,1,1)(1,1,1,12) | Classical time series | Captures autocorrelation and seasonality on pre-COVID regime |
| Random Forest (baseline) | Ensemble ML | Feature-based regression with lag/rolling inputs |
| Random Forest + RandomizedSearchCV | Tuned ensemble ML | 15-iteration random search across depth/estimator space |
| Random Forest + Optuna (Bayesian) | Bayesian-optimized ensemble ML | 50-trial smart search; best overall performance |
| XGBoost | Gradient boosting | Total volume + generation-share forecasting in parallel notebook |
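The Seasonal Naive baseline in the table above is just a seasonal shift. A minimal sketch (using a 3-period season on toy data rather than the notebooks' 12-month series):

```python
import pandas as pd

def seasonal_naive(series: pd.Series, season: int = 12) -> pd.Series:
    """Forecast each period as the value observed one season earlier
    (e.g. the same month last year). The first `season` forecasts are NaN."""
    return series.shift(season)

# Toy series where year 2 exactly repeats year 1, so the forecast
# for the second "year" is perfect
s = pd.Series([10, 20, 30] * 2)
forecast = seasonal_naive(s, season=3)
```

Despite its simplicity, this baseline is hard to beat on average error for strongly seasonal booking data, which is exactly the MAPE-vs-RMSE tension discussed in the Results section.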
Two transformed Excel files are included in this repository:
| File | Size | Contents |
|---|---|---|
| `WiFi transformed Data - Use First Tab.xlsx` | ~156 KB | Monthly totals by WiFi generation and segment; used by `Wireless Forecasting.ipynb` |
| `Wifi bookings - transformed.xlsx` | ~222 KB | Full booking records by product family, generation, segment, and fiscal period; used by `WiFiModel (1).ipynb` |
Original source (not included): `WiFiBookingsDB.xlsx` — a proprietary Cisco internal booking database that cannot be distributed publicly.
Key columns:
- `Standard Bookings Units` — target variable (float, booking units)
- `WiFi Generattion` — generation label; note: misspelled in the source data
- `Segment` — market tier: "High End AP", "Mid End AP", "Premium End AP"
- `Fiscal Period ID` — YYYYMM integer encoding of the Cisco fiscal period
- `Period_YearMonth` — derived datetime index
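Decoding the YYYYMM integer into the derived datetime index could look like this (an illustrative sketch of the literal decode; any offset between Cisco's fiscal calendar and the calendar year is not modeled here):

```python
import pandas as pd

# Hypothetical Fiscal Period ID values in YYYYMM integer form
ids = pd.Series([201607, 201608, 202401])

# Parse the 6-digit integer as year + month; days default to the 1st
period = pd.to_datetime(ids.astype(str), format="%Y%m")
```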
Preprocessing summary:
- COVID years (FY2021, FY2022) and transition years (FY2025, FY2026) removed
- Dataset split into pre-COVID (FY2016–2020) and post-COVID (FY2023–2024) regimes
- Regimes concatenated: 54 total training months after filtering
- Lag features: `lag_1`, `lag_3`, `lag_12` (built per-regime to avoid cross-regime contamination)
- Rolling mean: `rolling_mean_3` (3-month window, shifted by 1 to prevent leakage)
- Calendar features: `month_num`, `quarter`, `is_quarter_end`
See `data/README.md` for the full schema and `data/DATA_SOURCES.md` for download instructions.
```
Raw Excel data
      |
      v
[1. Load & clean]
    Cast types, decode Fiscal Period ID to datetime, strip whitespace
      |
      v
[2. Regime filtering]
    Remove FY2021-2022 (COVID) and FY2025-2026 (future/transition)
    Split into Regime 1 (FY<=2020) and Regime 2 (FY>=2023)
      |
      v
[3. Feature engineering (per-regime)]
    lag_1, lag_3, lag_12
    rolling_mean_3 (shifted 1)
    month_num, quarter, is_quarter_end
      |
      v
[4. Concatenate regimes]
    54 clean training months; no cross-regime lag artifacts
      |
      v
[5. Baseline benchmarking]
    Seasonal Naive, Moving Average, SARIMA, RF (n_estimators=100)
      |
      v
[6. Hyperparameter tuning]
    RandomizedSearchCV (15 iter) then Optuna Bayesian (50 trials)
      |
      v
[7. Segment-level evaluation]
    High End AP drill-down; 13.6% MAPE
      |
      v
[8. Generation-share modeling (Wireless Forecasting.ipynb)]
    XGBoost; lifecycle features; EOS filtering; share renormalization
```
Fiscal years 2021 and 2022 are excluded because COVID disruptions created a distribution shift that does not reflect steady-state booking behavior. Training through this period would either inflate variance estimates or require explicit intervention modeling. The cleaner solution — used here — is to treat pre- and post-COVID as two samples of the same underlying process and concatenate them after independent feature engineering.
Lag features are built within each regime before concatenation to prevent the lag at the regime boundary from crossing the COVID gap:
```python
# Applied independently to df_r1 (pre-COVID) and df_r2 (post-COVID)
df['lag_1'] = df['Standard Bookings Units'].shift(1)
df['lag_3'] = df['Standard Bookings Units'].shift(3)
df['lag_12'] = df['Standard Bookings Units'].shift(12)
df['rolling_mean_3'] = df['Standard Bookings Units'].shift(1).rolling(3).mean()
```

Optuna search space:

| Parameter | Range / Choices |
|---|---|
| `n_estimators` | 100 – 1000 |
| `max_depth` | 3 – 30 |
| `min_samples_split` | 2 – 10 |
| `min_samples_leaf` | 1 – 5 |
| `max_features` | `'sqrt'`, `'log2'`, `None` |
| `bootstrap` | `True`, `False` |
Objective: minimize 3-fold cross-validated MSE (scored as scikit-learn's `neg_mean_squared_error`, which Optuna maximizes) over 50 trials.
After forecasting total volume, `Wireless Forecasting.ipynb` forecasts each generation's share of that total. Key features:

- `months_since_launch` — time since the first non-zero booking month for each generation
- `share_lag1` — prior month's share, computed per generation
- `ma_share_3` — 3-month rolling share average, shifted by 1
- End-of-Support filter: Wi-Fi 5 rows after 2023-04-01 masked as inactive
- Predictions clipped to [0, 1] and renormalized to sum to 1.0 per month
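The clip-and-renormalize post-processing step could be sketched as follows (an illustrative `normalize_shares` helper with hypothetical values, not the notebook's exact code):

```python
import numpy as np

def normalize_shares(raw: np.ndarray) -> np.ndarray:
    """Clip predicted generation shares to [0, 1] and renormalize each
    month (row) so the shares sum to exactly 1.0."""
    clipped = np.clip(raw, 0.0, 1.0)
    row_sums = clipped.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # guard against an all-zero month
    return clipped / row_sums

# One month: raw model outputs for Wi-Fi 5 / Wi-Fi 6 sum to 1.10
raw = np.array([[0.35, 0.75]])
shares = normalize_shares(raw)  # each row now sums to 1.0
```

This keeps the per-generation forecasts mutually consistent: whatever the share model predicts, the generations always partition the total volume.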
- Test window: 2024-07-01 through 2024-12-01 (6 months)
- Training months: 54 (after regime filtering)
| Model | Segment | MAPE |
|---|---|---|
| Bayesian-optimized Random Forest | High End AP | 13.6% |
| Seasonal Naive | Total volume | TODO: extract from Cell 25 metrics table |
| Random Forest (baseline) | Total volume | TODO: extract from Cell 9 metrics table |
| SARIMA (1,1,1)(1,1,1,12) | Total volume | TODO: extract from Cell 25 metrics table |
TODO: Run `WiFiModel (1).ipynb` to completion and read the styled metrics DataFrames in Cells 9, 25, and 26 to fill in exact MAE, RMSE, and MAPE values for all models.
Key interpretive finding: Seasonal Naive consistently achieves the lowest average MAPE — it is the safest forecast in the sense of minimizing typical deviation. The Bayesian-optimized Random Forest achieves lower RMSE, meaning it handles demand spikes and troughs better, but at the cost of higher average error. The right model depends on whether planners need to be right on average (Naive) or right on the peaks (Random Forest).
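A toy illustration of that trade-off, with hypothetical numbers rather than project data: a flat forecast that nails typical months wins on MAPE, while a forecast that tracks the spike wins on RMSE (which penalizes large misses quadratically).

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def rmse(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Root mean squared error, in original units."""
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

actual = np.array([100.0, 100.0, 100.0, 200.0])  # one demand spike
flat   = np.array([100.0, 100.0, 100.0, 100.0])  # perfect on typical months
spiky  = np.array([120.0, 120.0, 120.0, 190.0])  # worse typically, catches the spike

# flat has the lower MAPE; spiky has the lower RMSE
```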
The following plots are produced by the notebooks. Export instructions and `plt.savefig(...)` snippets are in `images/README.md`.
- `images/total_bookings_over_time.png` — full time series (2016–2024); COVID dip visible
- `images/baseline_forecast_comparison.png` — Seasonal Naive vs. Random Forest vs. actuals
- `images/optimized_6month_forecast.png` — Bayesian RF with COVID gap shading; 6-month holdout
- `images/high_end_ap_segment_forecast.png` — High End AP drill-down; 13.6% MAPE result
- `images/total_and_generations_quarterly.png` — Wi-Fi 5 to Wi-Fi 6 generational share shift
- `images/xgboost_total_forecast.png` — XGBoost 6-month validation forecast
All images pending export. See `images/README.md`.
```
WiFi-Forecasting-Project/
├── WiFiModel (1).ipynb                          # Main: RF + SARIMA + Bayesian optimization
├── WiFiModel (3).ipynb                          # Copy of WiFiModel (1).ipynb
├── Wireless Forecasting.ipynb                   # XGBoost total + generation-share model
├── WiFi transformed Data - Use First Tab.xlsx   # Dataset for Wireless Forecasting.ipynb (~156 KB)
├── Wifi bookings - transformed.xlsx             # Dataset for WiFiModel (1).ipynb (~222 KB)
├── requirements.txt                             # Pinned library dependencies
├── .gitignore                                   # Python/Jupyter ignores
├── README.md                                    # This file
├── PROJECT_SUMMARY.md                           # Portfolio elevator pitch and resume bullets
├── data/
│   ├── README.md                                # Dataset schemas and preprocessing notes
│   └── DATA_SOURCES.md                          # Download instructions and file status
└── images/
    └── README.md                                # Visualization catalog and export instructions
```
- Python 3.9+
- Google Colab account (notebooks use Google Drive for data loading) or local Jupyter with data files in the project root
```
git clone https://github.com/reidsendroff/WiFi-Forecasting-Project.git
cd WiFi-Forecasting-Project
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

- Update the `DATA_PATH` variable in each notebook to point to the local `.xlsx` file.
- Launch Jupyter: `jupyter notebook`
- Open the notebooks in this order:
  1. `WiFiModel (1).ipynb` — main pipeline (recommended starting point)
  2. `Wireless Forecasting.ipynb` — XGBoost + generation-share model
- Upload the `.xlsx` data files to your Google Drive under a shared folder.
- Update the `DATA_PATH` / `drive.mount(...)` paths in each notebook to match your Drive structure.
- Open each notebook in Colab and run all cells top to bottom.
Mathematical / Statistical
- Time series stationarity, seasonal differencing (SARIMA)
- Regime detection and structural break handling
- Cross-validated hyperparameter search (MAPE, RMSE, MAE)
- Bayesian optimization with tree-structured Parzen estimators (Optuna)
- Ensemble methods: Random Forest bootstrap aggregation
- Gradient boosting (XGBoost)
Programming & Tools
- Python: pandas, numpy, matplotlib, scikit-learn, statsmodels, optuna, xgboost
- Jupyter Notebooks / Google Colab
- Excel ingestion with openpyxl
Workflow
- Multi-notebook project organization
- Reproducible, leak-free feature engineering
- Metric-driven model selection with business-level interpretation
- Git version control
Collaborative academic project. Authors: Reid Sendroff and Jawad Hussein.
Enterprise hardware demand forecasting directly affects inventory planning, sales target setting, and supply chain decisions. A model that underestimates demand during a product-generation transition can leave customers waiting weeks for backlogged hardware; one that overestimates leaves inventory sitting in warehouses. The regime-filtering approach developed here is a transferable pattern for any demand forecasting problem where a known macro-event creates a structural break — applicable to automotive, semiconductor, cloud infrastructure, and other long-cycle hardware markets.
- Export all six visualizations to `images/` and embed them in this README
- Fill in exact MAE, RMSE, and MAPE values for all models (see the Results section TODOs)
- Add a Meraki vs. Cisco brand-level split to control for the mixed-product confound noted in the notebooks
- Extend the generation-share model to Wi-Fi 6E and Wi-Fi 7 as data becomes available
- Add SHAP feature-importance analysis to explain which lags drive the Random Forest predictions
- Containerize with a `Dockerfile` so the pipeline runs without Google Drive
- Rename notebooks with descriptive, version-safe filenames (no spaces, no parentheses)
- Write unit tests for lag/rolling feature engineering and regime-split logic
Reid Sendroff and Jawad Hussein
Forecasting enterprise Wi-Fi demand by treating COVID not as noise to average through, but as a structural break to excise — leaving two clean samples of steady-state dynamics that a well-tuned Random Forest can actually learn from.