This folder contains the reproducibility materials for the paper: all data, configurations, code, and the pinned environment needed to reproduce the published experimental results.
Package assembled on May 28, 2026, with the pinned package versions in experiments/requirements-experiments.txt.
First, clone the repository:

```bash
git clone https://github.com/StatMixedML/Hyper-Trees.git
cd Hyper-Trees
```

We use uv as the package manager. Install it first if you don't already have it:

```bash
pip install uv
```

Our paper runs used uv 0.8.12.
In the project's top-level folder (the Hyper-Trees/ folder you cloned into), create a Python 3.11.0 venv and install the pinned experiments environment:

```bash
# Create a Python 3.11.0 venv (must be exactly 3.11.0 to match the pinned environment)
uv venv --python 3.11.0

# Activate the venv
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate

# Install all dependencies (including transitive ones) at the exact pinned versions
uv pip install -r experiments/requirements-experiments.txt --index-strategy unsafe-best-match
```

experiments/requirements-experiments.txt pins every dependency, including transitive ones, to the exact versions used in our paper experiments, ensuring reproducibility. The --index-strategy unsafe-best-match flag lets uv search for each package on all configured indexes (PyPI and the PyTorch index declared in the requirements file) instead of stopping at the first index that provides it.
All experiments in the paper were run on a GPU using PyTorch 2.1.1 with CUDA 11.8. A GPU is needed in practice for the deep learning baselines (DeepAR, TFT) and was also used for training the Hyper-TreeNet models. The requirements-experiments.txt file pins torch==2.1.1+cu118 and declares --extra-index-url https://download.pytorch.org/whl/cu118, so the CUDA build is pulled in automatically.
CPU-only environments: the Hyper-Tree, Hyper-TreeNet, LightGBM, and classical baselines will still run on CPU. The deep-learning baselines (DeepAR, TFT) technically fall back to CPU via PyTorch but become impractically slow for the larger datasets; a CUDA-11.8-compatible GPU is strongly recommended for a full end-to-end reproducibility run.
⚠️ Important:
- Using any versions other than those pinned in requirements-experiments.txt (including Python, CUDA, and PyTorch) will produce different results.
- Do not install hypertrees-forecasting from PyPI or GitHub alongside this environment. The pinned requirements-experiments.txt already includes the exact version of the package and all its dependencies at fixed versions. Installing from PyPI or GitHub would pull in newer (unpinned) transitive dependencies, breaking version consistency and reproducibility.
- If you already have hypertrees-forecasting installed from PyPI or GitHub, uninstall it before setting up the experiments environment:

  ```bash
  uv pip uninstall hypertrees-forecasting
  ```

  Then proceed with the reproducible installation above.
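To confirm that an activated venv actually matches the pins before starting a multi-hour run, a small check along these lines can help. It is a hypothetical helper, not part of the repository: parse_pins and find_mismatches are illustrative names, and the parser assumes simple name==version lines (comments, blank lines, and pip options such as --extra-index-url are skipped).

```python
"""Hypothetical helper: compare installed packages with requirements-experiments.txt."""
import sys
from importlib import metadata


def parse_pins(lines):
    """Map package name -> pinned version for every 'name==version' line.

    Comments, blank lines and pip options (lines starting with '-', such as
    --extra-index-url) are skipped; environment markers after ';' are dropped.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (pinned, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    with open("experiments/requirements-experiments.txt", encoding="utf-8") as fh:
        pins = parse_pins(fh)
    print("Python is 3.11.0:", sys.version_info[:3] == (3, 11, 0))
    for name, (pinned, installed) in sorted(find_mismatches(pins).items()):
        print(f"{name}: pinned {pinned}, installed {installed}")
    try:
        import torch  # only meaningful once the pinned torch build is installed
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed")
```

Run from the repository root with the venv activated; an empty mismatch list means every pinned package is present at its exact version.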
All experiments in the paper were conducted with the following specifications:
- OS: Windows 11
- CPU: 13th Gen Intel(R) Core(TM) i9-13900H (14 cores)
- RAM: 64 GB
- GPU: NVIDIA RTX 3500 Ada Generation Laptop GPU (12 GB memory)
The single entry point is experiments/Reproduce.ipynb.
Open experiments/Reproduce.ipynb in Jupyter, VS Code, or PyCharm and run all cells. This reproduces every table, figure, and ablation in the paper (global, local, Rossmann A1-A11, embedding-dimension ablation, paper figures, and the final metrics tables). When the run finishes, every metrics table and every paper figure is rendered inline at the bottom of the notebook.
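The notebook can also be executed non-interactively, which keeps every rendered table and figure in a saved copy. The sketch below assumes jupyter and nbconvert are installed in the pinned environment and on PATH, and that Reproduce.ipynb resolves its paths relative to its own folder (nbconvert starts the kernel in the notebook's directory by default). The helper name and the output file name are hypothetical.

```python
"""Hypothetical helper: execute Reproduce.ipynb headlessly with nbconvert."""
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook="experiments/Reproduce.ipynb",
                            output_name="Reproduce.executed.ipynb"):
    """Return the argument list for an nbconvert run that keeps all outputs.

    --ExecutePreprocessor.timeout=-1 disables the per-cell timeout, since single
    stages of the pipeline run for many minutes.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",       # write an executed .ipynb (outputs included)
        "--execute",              # run every cell top to bottom
        "--ExecutePreprocessor.timeout=-1",
        "--output", output_name,  # written next to the input notebook by default
        str(Path(notebook)),
    ]


if __name__ == "__main__":
    # Run from the repository root with the pinned venv activated.
    subprocess.run(build_nbconvert_command(), check=True)
```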
A full run takes approximately 4 hours 21 minutes on the paper hardware, broken down as:
| Stage | Runtime |
|---|---|
| Global Hyper-Trees | 21.09 min |
| Rossmann ablations (A1-A11) | 45.87 min |
| Embedding-dimension ablation | 22.12 min |
| Local Hyper-Trees | 14.97 min |
| Global LightGBM | 4.24 min |
| Local LightGBM | 8.64 min |
| Global Deep Learning | 104.75 min |
| Global ETS | 2.47 min |
| Local Classical | 15.99 min |
| Figure creation | 20.33 min |
| Total | 261.00 min |
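As a quick arithmetic check on the table, the stage runtimes can be summed directly. They add up to 260.47 min (4 h 20.47 min), about 0.53 min below the reported 261.00 min total; that remainder is presumably time not attributed to any listed stage (an assumption). Global Deep Learning alone accounts for roughly 40% of the summed time.

```python
"""Worked arithmetic on the runtime table (values copied from the table)."""

STAGE_MINUTES = {
    "Global Hyper-Trees": 21.09,
    "Rossmann ablations (A1-A11)": 45.87,
    "Embedding-dimension ablation": 22.12,
    "Local Hyper-Trees": 14.97,
    "Global LightGBM": 4.24,
    "Local LightGBM": 8.64,
    "Global Deep Learning": 104.75,
    "Global ETS": 2.47,
    "Local Classical": 15.99,
    "Figure creation": 20.33,
}
REPORTED_TOTAL = 261.00


def summarize(stages):
    """Return (sum of stages in minutes, whole hours, remaining minutes)."""
    total = round(sum(stages.values()), 2)
    hours, minutes = divmod(total, 60)
    return total, int(hours), round(minutes, 2)


total, hours, minutes = summarize(STAGE_MINUTES)
slowest = max(STAGE_MINUTES, key=STAGE_MINUTES.get)
print(f"stages sum to {total} min = {hours} h {minutes} min")
print(f"gap to reported total: {round(REPORTED_TOTAL - total, 2)} min")
print(f"slowest stage: {slowest} ({STAGE_MINUTES[slowest] / total:.0%} of the summed time)")
```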
Outputs:
- forecast CSVs per dataset and model family in experiments/runs/results/{global,local}/ (e.g., rossmann_hypertrees_fcsts.csv, rossmann_lgbm_fcsts.csv, rossmann_deeplearning_fcsts.csv, rossmann_ets_fcsts.csv) and experiments/runs/results/ablation/{rossmann,embedding_evaluation}/
- metrics tables at experiments/runs/results/metrics/{global,local,ablation_rossmann,ablation_embeddings}_metrics.csv
- paper figure PDFs and PNGs in experiments/runs/results/plots/
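When inspecting results programmatically, the output locations can be built from a couple of path helpers. This is a hypothetical sketch: the {dataset}_{family}_fcsts.csv pattern is inferred from the example file names listed (family labels such as hypertrees, lgbm, deeplearning, ets), and all paths are relative to the repository root.

```python
"""Hypothetical path helpers for the outputs of a full run (repository-root relative)."""
from pathlib import Path

RESULTS = Path("experiments/runs/results")
METRIC_TABLES = ("global", "local", "ablation_rossmann", "ablation_embeddings")


def metrics_path(table, results=RESULTS):
    """Aggregated metrics table, e.g. .../metrics/global_metrics.csv."""
    if table not in METRIC_TABLES:
        raise ValueError(f"unknown metrics table {table!r}; expected one of {METRIC_TABLES}")
    return Path(results) / "metrics" / f"{table}_metrics.csv"


def forecast_path(dataset, family, scope="global", results=RESULTS):
    """Forecast CSV; the '{dataset}_{family}_fcsts.csv' pattern is inferred
    from example names such as rossmann_lgbm_fcsts.csv (an assumption)."""
    if scope not in ("global", "local"):
        raise ValueError("scope must be 'global' or 'local'")
    return Path(results) / scope / f"{dataset}_{family}_fcsts.csv"


if __name__ == "__main__":
    import pandas as pd

    print(pd.read_csv(metrics_path("global")))
    print(pd.read_csv(forecast_path("rossmann", "hypertrees")).head())
```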
The following table maps each paper element to the code that produces it and to its output location.
| Paper element | Produced by | Output location |
|---|---|---|
| Table 1 (Air Passengers Results) | Reproduce.ipynb -> local stage (airpassengers) | results/metrics/local_metrics.csv |
| Table 2 (Local Model Results) | Reproduce.ipynb -> local stage (auselectricity, ausretail, tourism_monthly) | results/metrics/local_metrics.csv |
| Table 3 (Global Model Results) | Reproduce.ipynb -> global stage (all datasets) | results/metrics/global_metrics.csv |
| Table 4 (Rossmann Ablation A1-A11) | Reproduce.ipynb -> Rossmann ablation (rossmann_A1.ipynb ... rossmann_A11.ipynb) | results/metrics/ablation_rossmann_metrics.csv |
| Table G1 (Embedding-Dimension Analysis) | Reproduce.ipynb -> embedding ablation (embedding_ablation.ipynb) | results/metrics/ablation_embeddings_metrics.csv |
| Figure 4 (Runtime Scaling) | runs/notebooks/scaling_comparison.ipynb | results/plots/runtime_scaling.{pdf,png} |
| Figure 6 (Hyper-Tree-STL Decomposition) | runs/notebooks/stl.ipynb | results/plots/STL_Trend.{pdf,png}, STL_Seasonality.{pdf,png} |
| Figure 7 (Estimated Parameters of Hyper-Tree-STL) | runs/notebooks/stl.ipynb | results/plots/STL_a0.{pdf,png}, STL_a1.{pdf,png}, STL_c1.{pdf,png}, STL_d1.{pdf,png} |
| Figure 8 (Feature Importance of Hyper-Tree-STL) | runs/notebooks/stl.ipynb | results/plots/shap_a0.{pdf,png}, shap_a1.{pdf,png}, shap_c1.{pdf,png}, shap_d1.{pdf,png} |
| Figure D1 (Global Model Forecasts) | runs/notebooks/example_forecasts.ipynb | results/plots/ |
| Figures F1, F2, F3 (Time-Varying AR Parameters) | runs/notebooks/time_varying_params.ipynb | results/plots/ |
| Figures G1, G2 (Tree Embeddings) | runs/notebooks/embedding_visualization.ipynb | results/plots/ |
The evaluation section of Reproduce.ipynb calls evaluate_fcsts(), which runs four evaluators in experiments/utils.py (global, local, Rossmann ablation, embedding-dim ablation) and returns a dict of metrics DataFrames rendered inline as tables. Each evaluator computes MAPE, sMAPE, WAPE, RMSE, MAE, and (where applicable) MASE per series, averages across series, and saves the aggregated table to experiments/runs/results/metrics/{global,local,ablation_rossmann,ablation_embeddings}_metrics.csv. For a lightweight standalone variant, experiments/runs/fcst_evaluation.ipynb loads the global or local result CSVs and displays them in a single cell.
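For orientation, the textbook definitions of these metrics, computed per series and then averaged, can be sketched with NumPy. This is not the code in experiments/utils.py: the function names are hypothetical, MASE here uses the usual in-sample seasonal-naive scale with period m, and the repository's exact conventions (percentage scaling, handling of zeros, or the MASE reference, which the baseline list gives as AutoETS) may differ.

```python
"""Textbook point-forecast metrics, computed per series and averaged (sketch)."""
import numpy as np


def series_metrics(y_true, y_pred, y_train=None, m=1):
    """Metrics for one series. MAPE, sMAPE and WAPE are in percent.

    MASE scales the test MAE by the in-sample MAE of a seasonal naive forecast
    with period m, and is only computed when the training series is given.
    Zeros in y_true make MAPE undefined; no special handling is done here.
    """
    y = np.asarray(y_true, dtype=float)
    f = np.asarray(y_pred, dtype=float)
    err = y - f
    abs_err = np.abs(err)
    out = {
        "MAPE": 100 * np.mean(abs_err / np.abs(y)),
        "sMAPE": 100 * np.mean(2 * abs_err / (np.abs(y) + np.abs(f))),
        "WAPE": 100 * abs_err.sum() / np.abs(y).sum(),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(abs_err),
    }
    if y_train is not None:
        h = np.asarray(y_train, dtype=float)
        out["MASE"] = out["MAE"] / np.mean(np.abs(h[m:] - h[:-m]))
    return {name: float(value) for name, value in out.items()}


def average_over_series(per_series):
    """Unweighted mean of each metric that every series reports."""
    shared = set.intersection(*(set(s) for s in per_series))
    return {name: float(np.mean([s[name] for s in per_series])) for name in sorted(shared)}


a = series_metrics([100, 200], [110, 190], y_train=[90, 100, 110, 120], m=1)
b = series_metrics([100], [100])
print(a)
print(average_over_series([a, b]))
```

Averaging only the metrics shared by all series mirrors the "(where applicable)" caveat for MASE: a series without a training history simply does not contribute a MASE value.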
```
experiments/
├── datasets/ # Datasets with configs
│ ├── airpassengers/ # Local only
│ ├── auselectricity/ # Global + Local (with ETS padding)
│ ├── ausretail/ # Global + Local
│ ├── m3_monthly/ # Global only
│ ├── m3_yearly/ # Global only (with ETS padding)
│ ├── m5_agg/ # Global only
│ ├── rossmann/ # Global only
│ └── tourism_monthly/ # Global + Local (with ETS padding)
├── models.py # Forecast model wrappers (Hyper-Tree*, baselines)
├── README.md # This file
├── repro.py # Papermill orchestration used by Reproduce.ipynb
├── Reproduce.ipynb # Single entry point: runs all experiments
├── requirements-experiments.txt # Pinned dependencies for paper reproduction
├── runs/
│ ├── fcst_evaluation.ipynb # Standalone metrics viewer (reads result CSVs)
│ ├── results/ # Outputs from experiment runs (auto-generated)
│ │ ├── ablation/
│ │ │ ├── embedding_evaluation/ # Embedding-dim ablation forecast CSVs (+ tree
│ │ │ │ # embeddings / AR parameters for airpassengers)
│ │ │ └── rossmann/ # Rossmann ablation forecast CSVs
│ │ ├── global/ # Global-stage forecast CSVs (+ AR parameters
│ │ │ # for rossmann / m5_agg)
│ │ ├── local/ # Local-stage forecast CSVs (+ AR parameters
│ │ │ # / tree embeddings for airpassengers)
│ │ ├── metrics/ # Aggregated metrics tables
│ │ │ # (global, local, ablation_rossmann,
│ │ │ # ablation_embeddings)_metrics.csv
│ │ └── plots/ # PDF + PNG outputs from figure + STL notebooks
│ └── notebooks/ # Parameterized templates executed by papermill
└── utils.py # Data loading, metrics, plotting + CSV-loading helpers
```
Each dataset folder contains:
- train.parquet / test.parquet: training and test splits
- meta.json: dataset metadata (series IDs, lags, features, forecast horizon, frequency)
- config_global.py and/or config_local.py: hyperparameters for each experiment type
- dataset_source.txt: original data source reference
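A quick way to verify that a dataset folder is complete, and to load it, is sketched below. The helper names are hypothetical, the keys inside meta.json are not assumed (it is returned as a plain dict), and reading parquet with pandas requires pyarrow or fastparquet to be installed.

```python
"""Hypothetical helpers to check and load one experiments/datasets/<name>/ folder."""
import json
from pathlib import Path

REQUIRED_FILES = ("train.parquet", "test.parquet", "meta.json", "dataset_source.txt")
CONFIG_FILES = ("config_global.py", "config_local.py")  # at least one must exist


def missing_files(folder):
    """List the expected files that are absent from a dataset folder."""
    folder = Path(folder)
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    if not any((folder / name).is_file() for name in CONFIG_FILES):
        missing.append(" or ".join(CONFIG_FILES))
    return missing


def load_dataset(folder):
    """Return (train, test, meta) with meta as a plain dict."""
    import pandas as pd  # parquet support needs pyarrow or fastparquet

    folder = Path(folder)
    meta = json.loads((folder / "meta.json").read_text(encoding="utf-8"))
    train = pd.read_parquet(folder / "train.parquet")
    test = pd.read_parquet(folder / "test.parquet")
    return train, test, meta


if __name__ == "__main__":
    folder = Path("experiments/datasets/airpassengers")
    print("missing:", missing_files(folder) or "nothing")
    train, test, meta = load_dataset(folder)
    print(train.shape, test.shape, sorted(meta))
```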
Datasets used for the global Hyper-Tree-ETS experiments (auselectricity, m3_yearly, tourism_monthly) additionally provide:
- train_padded.parquet / test_padded.parquet: back-appended series with uniform length
- meta_ets.json: ETS-specific metadata, including the mask column for valid observations
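The idea behind uniform-length series with a validity mask can be sketched as follows. This assumes that "back-appended" means filler values are added before the first observation of shorter series, so that all series end at the same time step; the pad value (0.0), the direction, and the function names are assumptions, and the repository's own preprocessing may differ.

```python
"""Sketch: pad variable-length series to a common length and keep a validity mask."""
import numpy as np


def back_pad(series_list, pad_value=0.0):
    """Prepend pad_value so all series share the longest length (ends aligned).

    Returns (values, mask): two (n_series, length) arrays; mask is True where
    values holds a real observation and False on padded positions.
    """
    length = max(len(s) for s in series_list)
    values = np.full((len(series_list), length), pad_value, dtype=float)
    mask = np.zeros((len(series_list), length), dtype=bool)
    for i, s in enumerate(series_list):
        start = length - len(s)
        values[i, start:] = np.asarray(s, dtype=float)
        mask[i, start:] = True
    return values, mask


def masked_mae(y_true, y_pred, mask):
    """Mean absolute error over valid (unpadded) positions only."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))[mask]))


values, mask = back_pad([[1, 2, 3], [4]])
print(values)
print(mask)
```

Keeping the mask separate from the values lets batched computations run on a rectangular array while any loss or metric ignores the padded cells.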
All datasets used in this paper are publicly available and are included directly in the replication kit under experiments/datasets/ as pre-processed train.parquet / test.parquet files. No dataset requires registration, payment, or NDA access. Upstream sources and citations are documented per dataset in dataset_source.txt. No additional data needs to be downloaded to run the reproducibility check. The {train,test}.parquet and {train,test}_padded.parquet files in each experiments/datasets/<name>/ folder were produced by one-time preprocessing from the upstream raw sources cited in each dataset_source.txt.
Hyper-Tree models:
- Hyper-Tree-AR: Autoregressive model with tree-learned, time-varying AR(p) parameters
- Hyper-TreeNet-AR: Hybrid GBDT encoder + MLP decoder for AR(p) parameters
- Hyper-Tree-ETS: Exponential smoothing with tree-learned parameters
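The core idea of a time-varying AR(p) forecast, yhat_t = phi_{t,1} y_{t-1} + ... + phi_{t,p} y_{t-p}, can be illustrated as follows. In this minimal sketch the coefficient matrix phi is simply given, whereas in Hyper-Tree-AR it would be produced by gradient-boosted trees from the features at each time step; there is no intercept term, and the function names are hypothetical.

```python
"""Sketch of AR(p) forecasting with time-varying coefficients (phi given, not learned)."""
import numpy as np


def tv_ar_fitted(y, phi):
    """One-step-ahead fitted values yhat[t] = sum_i phi[t, i-1] * y[t-i], i = 1..p.

    y has shape (T,), phi has shape (T, p). The first p entries are NaN because
    there are not enough lags to form a prediction.
    """
    y = np.asarray(y, dtype=float)
    phi = np.asarray(phi, dtype=float)
    T, p = phi.shape
    fitted = np.full(T, np.nan)
    for t in range(p, T):
        lags = y[t - p:t][::-1]  # [y[t-1], y[t-2], ..., y[t-p]]
        fitted[t] = phi[t] @ lags
    return fitted


def tv_ar_forecast(history, phi_future):
    """Recursive multi-step forecast: each prediction is fed back as the newest lag."""
    buffer = [float(v) for v in history]
    forecasts = []
    for row in np.asarray(phi_future, dtype=float):
        lags = np.array(buffer[::-1][: row.shape[0]])
        forecasts.append(float(row @ lags))
        buffer.append(forecasts[-1])
    return np.array(forecasts)
```

For example, tv_ar_forecast([1, 2], [[2.0, 0.0], [1.0, 1.0]]) first predicts 2.0 * 2 = 4.0 and then 1.0 * 4 + 1.0 * 2 = 6.0. A classical AR(p) fit corresponds to using the same coefficient row at every step.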
Tree-based baselines:
- LightGBM: Standard LightGBM
- LightGBM-AR: LightGBM with autoregressive lag features
- LightGBM-STL: LightGBM on STL residuals (local only)
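The autoregressive lag features used by a LightGBM-AR-style baseline can be built as shown here. This is a hypothetical sketch of the general technique, not the repository's feature pipeline; it handles a single series and ignores exogenous features.

```python
"""Sketch: autoregressive lag features for a single series."""
import numpy as np


def lag_matrix(y, p):
    """Return (X, target) where row j of X is [y[t-1], ..., y[t-p]] for t = p + j
    and target[j] = y[t]. The first p observations only serve as lags."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if not 1 <= p < T:
        raise ValueError("need 1 <= p < len(y)")
    X = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    return X, y[p:]


X, target = lag_matrix([1, 2, 3, 4, 5], p=2)
print(X)
print(target)
```

X and target could then be passed to a gradient-boosting regressor such as lightgbm.LGBMRegressor().fit(X, target), with hyperparameters taken from the dataset's config files.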
Deep learning baselines:
- DeepAR: Autoregressive RNN
- TFT: Temporal Fusion Transformer
- Chronos: Pre-trained foundation model (chronos-t5-base)
Classical baselines:
- AutoARIMA / AutoARIMA-X (local only): ARIMA with automatic (p, d, q) selection; the -X variant adds features.
- AR(p) / AR(p)-X (local only): fixed-order ARIMA(p, 0, 0); the -X variant adds features.
- AutoETS (local only): Automatic Exponential Smoothing (used as the MASE reference).
For questions about reproducing the results, please open an issue on the project's GitHub repository.
Code in this repository is released under the Apache License 2.0 with Commons Clause License Condition v1.0. See the LICENSE file at the repository root.