Hyper-Trees Experiments

This folder is the reproducibility section for the paper. It contains all data, configurations, code, and the pinned environment needed to reproduce the published experimental results.

Package assembled on May 28, 2026, using the pinned package versions in experiments/requirements-experiments.txt.

Installation

First, clone the repository:

git clone https://github.com/StatMixedML/Hyper-Trees.git
cd Hyper-Trees

We use uv as the package manager. Install it first if you don't already have it:

pip install uv

Our paper runs used uv 0.8.12.

In the project's top-level folder (the Hyper-Trees/ folder you cloned into), create a Python 3.11.0 venv and install the pinned experiments environment:

# Create a Python 3.11.0 venv (must be exactly 3.11.0 to match the pinned environment)
uv venv --python 3.11.0

# Activate the venv
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate

# Install all dependencies (including transitive) at exact pinned versions
uv pip install -r experiments/requirements-experiments.txt --index-strategy unsafe-best-match

experiments/requirements-experiments.txt pins every dependency, including transitive ones, to the exact versions used in our paper experiments, ensuring reproducibility.
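
After installation, it can be useful to confirm that the active environment really matches the pins. The following is a minimal sketch, not part of the repository: the helper names `parse_pins`, `find_mismatches`, and `python_matches` are hypothetical, and the parser assumes simple `name==version` lines (option lines such as `--extra-index-url` are skipped).

```python
import platform
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: version} for every `name==version` line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip/uv options
            continue
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" in line:
            name, version = line.split("==", 1)
            name = name.split("[", 1)[0]      # drop extras such as pkg[extra]
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None                  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


def python_matches(expected="3.11.0"):
    """True when the running interpreter is exactly the expected version."""
    return platform.python_version() == expected
```

Reading experiments/requirements-experiments.txt, passing its text to `parse_pins`, and then calling `find_mismatches` on the result should return an empty dict in a correctly installed environment.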

CUDA Installation

All experiments in the paper were run on GPU using PyTorch 2.1.1 with CUDA 11.8. GPU is required for the deep learning baselines (DeepAR, TFT) and was also used for training Hyper-TreeNet models. The requirements-experiments.txt file pins torch==2.1.1+cu118 and declares --extra-index-url https://download.pytorch.org/whl/cu118, so the CUDA build is pulled in automatically.

CPU-only environments: the Hyper-Tree, Hyper-TreeNet, LightGBM, and classical baselines will still run on CPU. The deep-learning baselines (DeepAR, TFT) technically fall back to CPU via PyTorch but become impractically slow for the larger datasets; a CUDA-11.8-compatible GPU is strongly recommended for a full end-to-end reproducibility run.
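
To see which of the two situations applies on a given machine, the installed PyTorch build can be inspected before starting a run. This is a small sketch using standard PyTorch attributes; the function names `describe_torch` and `pick_device` are hypothetical and not part of the repository.

```python
import importlib.util


def describe_torch():
    """Report the installed PyTorch build and whether a CUDA device is usable."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the check also runs without PyTorch

    info = {
        "installed": True,
        "torch_version": str(torch.__version__),  # pinned build: 2.1.1+cu118
        "cuda_build": torch.version.cuda,          # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


def pick_device(info):
    """Choose 'cuda' when a GPU is usable, otherwise fall back to 'cpu'."""
    return "cuda" if info.get("cuda_available") else "cpu"


if __name__ == "__main__":
    report = describe_torch()
    print(report, "->", pick_device(report))
```

If `cuda_build` is reported but `cuda_available` is False, the CUDA wheel is installed but no compatible GPU or driver was found, so the deep-learning baselines would run on CPU.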

Important Note

⚠️ Important:
Using any versions other than those pinned in requirements-experiments.txt (including Python, CUDA and PyTorch) will produce different results.

Do not install hypertrees-forecasting from PyPI or GitHub alongside this environment. The pinned requirements-experiments.txt already includes the exact version of the package and all its dependencies at fixed versions. Installing from PyPI or GitHub would pull in newer (unpinned) transitive dependencies, breaking version consistency and reproducibility.
If you already have hypertrees-forecasting installed from PyPI or GitHub, uninstall it before setting up the experiments environment:

uv pip uninstall hypertrees-forecasting

Then proceed with the reproducible installation above.

Hardware Specifications

All experiments in the paper were conducted with the following specifications:

OS: Windows 11
CPU: 13th Gen Intel(R) Core(TM) i9-13900H (14 cores)
RAM: 64 GB
GPU: NVIDIA RTX 3500 Ada Generation Laptop GPU (12 GB memory)

Running Experiments

The single entry point is experiments/Reproduce.ipynb.

Open experiments/Reproduce.ipynb in Jupyter, VS Code, or PyCharm and run all cells. This reproduces every table, figure, and ablation in the paper (global, local, Rossmann A1-A11, embedding-dimension ablation, paper figures, and the final metrics tables). When the run finishes, every metrics table and every paper figure is rendered inline at the bottom of the notebook.

A full run takes approximately 4 hours 21 minutes on the paper hardware, broken down as:

Stage	Runtime
Global Hyper-Trees	21.09 min
Rossmann ablations (A1-A11)	45.87 min
Embedding-dimension ablation	22.12 min
Local Hyper-Trees	14.97 min
Global LightGBM	4.24 min
Local LightGBM	8.64 min
Global Deep Learning	104.75 min
Global ETS	2.47 min
Local Classical	15.99 min
Figure creation	20.33 min
Total	261.00 min

Outputs:

forecast CSVs per dataset and model family in experiments/runs/results/{global,local}/ (e.g., rossmann_hypertrees_fcsts.csv, rossmann_lgbm_fcsts.csv, rossmann_deeplearning_fcsts.csv, rossmann_ets_fcsts.csv) and experiments/runs/results/ablation/{rossmann,embedding_evaluation}/
metrics tables at experiments/runs/results/metrics/{global,local,ablation_rossmann,ablation_embeddings}_metrics.csv
paper figure PDFs + PNGs in experiments/runs/results/plots/
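
Once a run has finished, the output locations listed above can be checked programmatically. The sketch below is an illustration only: the function names are hypothetical, and it only counts files rather than validating their contents.

```python
from pathlib import Path

# Metrics tables named in the list above.
METRIC_TABLES = ("global", "local", "ablation_rossmann", "ablation_embeddings")
# Sub-folders of the results directory that hold forecast CSVs.
FORECAST_DIRS = ("global", "local", "ablation/rossmann", "ablation/embedding_evaluation")


def expected_metric_files(results_dir):
    """Paths of the four aggregated metrics tables."""
    return [Path(results_dir) / "metrics" / f"{name}_metrics.csv" for name in METRIC_TABLES]


def summarize_outputs(results_dir="experiments/runs/results"):
    """Report missing metrics tables and count forecast CSVs and plot files."""
    results_dir = Path(results_dir)
    missing = [p for p in expected_metric_files(results_dir) if not p.is_file()]
    forecast_counts = {}
    for sub in FORECAST_DIRS:
        folder = results_dir / sub
        forecast_counts[sub] = len(list(folder.glob("*.csv"))) if folder.is_dir() else 0
    plots = results_dir / "plots"
    n_plots = 0
    if plots.is_dir():
        n_plots = sum(1 for p in plots.iterdir() if p.suffix in {".pdf", ".png"})
    return {"missing_metrics": missing, "forecast_csvs": forecast_counts, "plot_files": n_plots}
```

An empty `missing_metrics` list together with non-zero counts indicates that every stage wrote its results.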

Paper Artefact Map

The table below maps each paper element to the code that produces it and to the location of its output.

Paper element	Produced by	Output location
Table 1 (Air Passengers Results)	`Reproduce.ipynb` -> local stage (airpassengers)	`results/metrics/local_metrics.csv`
Table 2 (Local Model Results)	`Reproduce.ipynb` -> local stage (auselectricity, ausretail, tourism_monthly)	`results/metrics/local_metrics.csv`
Table 3 (Global Model Results)	`Reproduce.ipynb` -> global stage (all datasets)	`results/metrics/global_metrics.csv`
Table 4 (Rossmann Ablation A1-A11)	`Reproduce.ipynb` -> Rossmann ablation (`rossmann_A1.ipynb` ... `rossmann_A11.ipynb`)	`results/metrics/ablation_rossmann_metrics.csv`
Table G1 (Embedding-Dimension Analysis)	`Reproduce.ipynb` -> embedding ablation (`embedding_ablation.ipynb`)	`results/metrics/ablation_embeddings_metrics.csv`
Figure 4 (Runtime Scaling)	`runs/notebooks/scaling_comparison.ipynb`	`results/plots/runtime_scaling.{pdf,png}`
Figure 6 (Hyper-Tree-STL Decomposition)	`runs/notebooks/stl.ipynb`	`results/plots/STL_Trend.{pdf,png}`, `STL_Seasonality.{pdf,png}`
Figure 7 (Estimated Parameters of Hyper-Tree-STL)	`runs/notebooks/stl.ipynb`	`results/plots/STL_a0.{pdf,png}`, `STL_a1.{pdf,png}`, `STL_c1.{pdf,png}`, `STL_d1.{pdf,png}`
Figure 8 (Feature Importance of Hyper-Tree-STL)	`runs/notebooks/stl.ipynb`	`results/plots/shap_a0.{pdf,png}`, `shap_a1.{pdf,png}`, `shap_c1.{pdf,png}`, `shap_d1.{pdf,png}`
Figure D1 (Global Model Forecasts)	`runs/notebooks/example_forecasts.ipynb`	`results/plots/`
Figures F1, F2, F3 (Time-Varying AR Parameters)	`runs/notebooks/time_varying_params.ipynb`	`results/plots/`
Figures G1, G2 (Tree Embeddings)	`runs/notebooks/embedding_visualization.ipynb`	`results/plots/`

Evaluating Results

The evaluation section of Reproduce.ipynb calls evaluate_fcsts(), which runs four evaluators in experiments/utils.py (global, local, Rossmann ablation, embedding-dim ablation) and returns a dict of metrics DataFrames rendered inline as tables. Each evaluator computes MAPE, sMAPE, WAPE, RMSE, MAE, and (where applicable) MASE per series, averages across series, and saves the aggregated table to experiments/runs/results/metrics/{global,local,ablation_rossmann,ablation_embeddings}_metrics.csv. For a lightweight standalone variant, experiments/runs/fcst_evaluation.ipynb loads the global or local result CSVs and displays them in a single cell.
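
The "per series, then average across series" aggregation can be sketched as follows. This is not the code in experiments/utils.py: the formulas are common textbook definitions (sMAPE in particular has several variants), the function names are hypothetical, and MASE is left out because its scaling reference is not specified in enough detail here.

```python
import math


def mae(y, f):
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def rmse(y, f):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


def mape(y, f):
    # Assumes every actual value is non-zero.
    return sum(abs(a - b) / abs(a) for a, b in zip(y, f)) / len(y)


def smape(y, f):
    # Symmetric variant with |y| + |f| in the denominator (an assumption).
    return sum(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, f)) / len(y)


def wape(y, f):
    return sum(abs(a - b) for a, b in zip(y, f)) / sum(abs(a) for a in y)


METRICS = {"MAPE": mape, "sMAPE": smape, "WAPE": wape, "RMSE": rmse, "MAE": mae}


def evaluate_per_series(actuals, forecasts):
    """actuals / forecasts: {series_id: [values]}. Returns (per_series, averaged)."""
    per_series = {
        sid: {name: fn(actuals[sid], forecasts[sid]) for name, fn in METRICS.items()}
        for sid in actuals
    }
    averaged = {
        name: sum(scores[name] for scores in per_series.values()) / len(per_series)
        for name in METRICS
    }
    return per_series, averaged
```

Averaging per-series scores gives every series equal weight regardless of its scale, which differs from pooling all observations before computing a metric.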

Folder Structure

experiments/
├── datasets/                                   # Datasets with configs
│   ├── airpassengers/                          # Local only
│   ├── auselectricity/                         # Global + Local (with ETS padding)
│   ├── ausretail/                              # Global + Local
│   ├── m3_monthly/                             # Global only
│   ├── m3_yearly/                              # Global only (with ETS padding)
│   ├── m5_agg/                                 # Global only
│   ├── rossmann/                               # Global only
│   └── tourism_monthly/                        # Global + Local (with ETS padding)
├── models.py                                   # Forecast model wrappers (Hyper-Tree*, baselines)
├── README.md                                   # This file
├── repro.py                                    # Papermill orchestration used by Reproduce.ipynb
├── Reproduce.ipynb                             # Single entry point: runs all experiments
├── requirements-experiments.txt                # Pinned dependencies for paper reproduction
├── runs/
│   ├── fcst_evaluation.ipynb                   # Standalone metrics viewer (reads result CSVs)
│   ├── results/                                # Outputs from experiment runs (auto-generated)
│   │   ├── ablation/
│   │   │   ├── embedding_evaluation/           # Embedding-dim ablation forecast CSVs (+ tree
│   │   │   │                                   #   embeddings / AR parameters for airpassengers)
│   │   │   └── rossmann/                       # Rossmann ablation forecast CSVs
│   │   ├── global/                             # Global-stage forecast CSVs (+ AR parameters
│   │   │                                       #   for rossmann / m5_agg)
│   │   ├── local/                              # Local-stage forecast CSVs (+ AR parameters
│   │   │                                       #   / tree embeddings for airpassengers)
│   │   ├── metrics/                            # Aggregated metrics tables
│   │   │                                       #   (global, local, ablation_rossmann,
│   │   │                                       #    ablation_embeddings)_metrics.csv
│   │   └── plots/                              # PDF + PNG outputs from figure + STL notebooks
│   └── notebooks/                              # Parameterized templates executed by papermill
└── utils.py                                    # Data loading, metrics, plotting + CSV-loading helpers
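
The tree above notes that repro.py orchestrates the templates in runs/notebooks/ with papermill. The general pattern can be sketched as below. This is not the contents of repro.py: the function names and the `executed_dir` argument are hypothetical, the notebook names are the figure notebooks listed in the Paper Artefact Map, and any `parameters` passed in must match whatever a given template actually declares.

```python
from pathlib import Path

# Figure notebooks named in the Paper Artefact Map.
FIGURE_NOTEBOOKS = [
    "scaling_comparison.ipynb",
    "stl.ipynb",
    "example_forecasts.ipynb",
    "time_varying_params.ipynb",
    "embedding_visualization.ipynb",
]


def build_jobs(template_dir, executed_dir, notebooks=FIGURE_NOTEBOOKS):
    """Pair every template with the path its executed copy is written to."""
    template_dir, executed_dir = Path(template_dir), Path(executed_dir)
    return [(template_dir / name, executed_dir / name) for name in notebooks]


def run_jobs(jobs, parameters=None):
    """Execute each template with papermill, injecting `parameters` if given."""
    import papermill as pm  # imported lazily; assumed to be installed

    for template, executed in jobs:
        executed.parent.mkdir(parents=True, exist_ok=True)
        pm.execute_notebook(str(template), str(executed), parameters=parameters or {})
```

For example, `run_jobs(build_jobs("experiments/runs/notebooks", "executed"))` would run the five figure notebooks and keep the executed copies in a separate folder, leaving the templates untouched.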

Dataset Folder Contents

Each dataset folder contains:

train.parquet / test.parquet: training and test splits
meta.json: dataset metadata (series IDs, lags, features, forecast horizon, frequency)
config_global.py and/or config_local.py: hyperparameters for each experiment type
dataset_source.txt: original data source reference

Datasets used for the global Hyper-Tree-ETS experiments (auselectricity, m3_yearly, tourism_monthly) additionally provide:

train_padded.parquet / test_padded.parquet: back-appended series with uniform length
meta_ets.json: ETS-specific metadata including the mask column for valid observations
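
The idea behind the padded files, uniform-length series paired with a mask marking the valid observations, can be illustrated with a toy example. This is not the preprocessing that produced the shipped files: the function names are hypothetical, and which end of a series is padded, the fill value, and the mask encoding (1 = observed, 0 = padding) are assumptions made for illustration.

```python
def pad_with_mask(series, pad_value=0.0):
    """series: {series_id: [values]}. Returns {series_id: (padded_values, mask)}."""
    target = max(len(values) for values in series.values())
    out = {}
    for sid, values in series.items():
        n_pad = target - len(values)
        padded = list(values) + [pad_value] * n_pad
        mask = [1] * len(values) + [0] * n_pad  # 1 = real observation, 0 = padding
        out[sid] = (padded, mask)
    return out


def masked_mean_abs_error(actual, forecast, mask):
    """Average absolute error over valid positions only."""
    errors = [abs(a - f) for a, f, m in zip(actual, forecast, mask) if m]
    return sum(errors) / len(errors)
```

With the mask in place, padded positions can be excluded from any loss or metric, so the padding does not influence the result.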

Data availability

All datasets used in this paper are publicly available and are included directly in the replication kit under experiments/datasets/ as pre-processed train.parquet / test.parquet files. No dataset requires registration, payment, or NDA access, and no additional data needs to be downloaded to run the experiments. Upstream sources and citations are documented per dataset in dataset_source.txt. The {train,test}.parquet and {train,test}_padded.parquet files in each experiments/datasets/<name>/ folder are produced by one-time preprocessing of the upstream raw sources cited in each dataset_source.txt.

Models Compared

Hyper-Tree models:

Hyper-Tree-AR: Autoregressive model with tree-learned, time-varying AR(p) parameters
Hyper-TreeNet-AR: Hybrid GBDT encoder + MLP decoder for AR(p) parameters
Hyper-Tree-ETS: Exponential smoothing with tree-learned parameters
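
"Time-varying AR(p) parameters" can be read as a recursion y_t = phi_{t,1} * y_{t-1} + ... + phi_{t,p} * y_{t-p}, in which the coefficients change from step to step. The sketch below rolls such a recursion forward with the coefficients supplied as plain lists. It is a generic illustration rather than the package's implementation: how the models learn the coefficients is not shown, the intercept-free form is an assumption, and the function name is hypothetical.

```python
def forecast_time_varying_ar(history, coefficients):
    """Roll an AR(p) recursion forward with one coefficient vector per step.

    history:      observed values, oldest first (at least p of them)
    coefficients: one list [phi_1, ..., phi_p] per forecast step
    """
    values = list(history)
    forecasts = []
    for phi in coefficients:
        p = len(phi)
        lags = values[-1:-p - 1:-1]  # y_{t-1}, ..., y_{t-p}
        y_hat = sum(c * y for c, y in zip(phi, lags))
        forecasts.append(y_hat)
        values.append(y_hat)         # feed the forecast back as the next lag
    return forecasts


if __name__ == "__main__":
    # Two steps of an AR(2) whose weights shift between steps.
    print(forecast_time_varying_ar([1.0, 2.0], [[0.5, 0.5], [1.0, 0.0]]))
```

With constant coefficient vectors this reduces to an ordinary AR(p) forecast.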

Tree-based baselines:

LightGBM: Standard LightGBM
LightGBM-AR: LightGBM with autoregressive lag features
LightGBM-STL: LightGBM on STL residuals (local only)

Deep learning baselines:

DeepAR: Autoregressive RNN
TFT: Temporal Fusion Transformer
Chronos: Pre-trained foundation model (chronos-t5-base)

Classical baselines:

AutoARIMA / AutoARIMA-X (local only): ARIMA with automatic (p,d,q) selection; the -X variant adds features.
AR(p) / AR(p)-X (local only): fixed-order ARIMA(p, 0, 0); the -X variant adds features.
AutoETS (local only): Automatic Exponential Smoothing (used as MASE reference).

Contact

For questions about reproducing the results, please open an issue on the project's GitHub repository.

License

Code in this repository is released under the Apache License 2.0 with Commons Clause License Condition v1.0. See the LICENSE file at the repository root.
