Model behavior and training setup are fully defined through TOML configuration files (see configs/*.toml).
Each section controls a specific component of the pipeline, from data loading to optimization.
Below is an overview of the most relevant sections and what they control. Some default values are provided as examples and can be used as a good starting point.
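As a minimal sketch of how such a file can be read, the snippet below parses a small TOML fragment with Python's standard `tomllib` module (Python 3.11+, with the third-party `tomli` backport as a fallback). The fragment only uses section and key names that appear in this overview. `load_config` is a hypothetical helper, not part of the project.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

# Small illustrative fragment; real files live under configs/.
EXAMPLE_TOML = """
[model.loss]
loss = "nll"

[model.output_head]
arch = "gmm"
"""


def load_config(text: str) -> dict:
    """Parse TOML text into nested dictionaries (hypothetical helper)."""
    return tomllib.loads(text)


config = load_config(EXAMPLE_TOML)

# Dotted section names such as [model.loss] become nested dictionaries.
print(config["model"]["loss"]["loss"])
print(config["model"]["output_head"]["arch"])
```

To read a file from disk instead, open it in binary mode and pass the handle to `tomllib.load`.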
Defines how and where the model runs.
| Key | Description | Example |
|---|---|---|
| `device_kind` | Execution device (`cpu` or `cuda`). | `cuda` |
| `ddp` | Enable/disable distributed training (DDP). | false |

Reproducibility and randomness control.
| Key | Description | Example |
|---|---|---|
| `seed` | Random seed for reproducibility. | 42069 |
| `persist_to_wandb_project` | Name of WandB project to log to (requires `authentication.wandb_api_key`). | `"forecasting-s4"` |

| Key | Description | Value |
|---|---|---|
| `batch_size` | Batch size per step | 32 |
| `evaluation_interval` | Evaluate every N steps | 1000 |
| `checkpoint_interval` | Save checkpoint every N steps | 1000 |
| `benchmarking_interval` | Run benchmarking every N steps | 10_000 |
| `maximum_steps` | Total training steps | 500_000 |
| `task` | Training task | prediction |

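To make the interval settings concrete, the sketch below counts how often each periodic event would fire over a full run with the example values. It assumes an event fires on every step whose 1-based index is a multiple of its interval. The actual trigger rule is not specified here, and `events_per_run` is a hypothetical helper.

```python
def events_per_run(maximum_steps: int, interval: int) -> int:
    """Count steps in 1..maximum_steps that are multiples of interval.

    Assumption: an event fires exactly on those steps.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return maximum_steps // interval


# Example values from the table above.
maximum_steps = 500_000
schedule = {
    "evaluation_interval": 1000,
    "checkpoint_interval": 1000,
    "benchmarking_interval": 10_000,
}

counts = {name: events_per_run(maximum_steps, n) for name, n in schedule.items()}
for name, count in counts.items():
    print(f"{name}: {count} events")
```

Under this assumption the example values give 500 evaluations, 500 checkpoints, and 50 benchmarking runs.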
Core model architecture and input/output settings.
| Key | Description | Value |
|---|---|---|
| `context_window` | Length of the context window | 32 |
| `predict_width` | Forecast horizon | 2 |
| `base_sample_interval_minutes` | Base sampling interval (minutes) | 15 |
| `input_sample_intervals_minutes` | Input sampling intervals for multi-rate training (minutes). Each value must be a multiple of `base_sample_interval_minutes`. | `[15, 60]` |
| `output_sample_interval_minutes` | Output sampling intervals for predictions (minutes). Each value must be a multiple of `base_sample_interval_minutes`. | `[15]` |
| `alignment` | Temporal alignment window (minutes) | 1440 |
| `model` | Backbone architecture | ssm |

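The multiple-of-base rule for the sampling intervals can be checked before training starts. The following is a minimal sketch with a hypothetical helper, `invalid_sample_intervals`. It additionally assumes intervals must be positive, which is not stated above.

```python
def invalid_sample_intervals(base: int, intervals: list[int]) -> list[int]:
    """Return the intervals that are not positive multiples of base."""
    if base <= 0:
        raise ValueError("base interval must be positive")
    return [value for value in intervals if value <= 0 or value % base != 0]


base_sample_interval_minutes = 15

# The example values from the table pass the check.
ok_inputs = invalid_sample_intervals(base_sample_interval_minutes, [15, 60])
ok_outputs = invalid_sample_intervals(base_sample_interval_minutes, [15])

# 50 is not a multiple of 15, so it is reported.
bad_inputs = invalid_sample_intervals(base_sample_interval_minutes, [15, 50])

print(ok_inputs, ok_outputs, bad_inputs)
```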
| Key | Description | Choices / Examples |
|---|---|---|
| `[model.loss]` | Defines the training loss function used for optimization. | loss: "mse", "nll", "pinball" |
| `[model.output_head]` | Specifies the output layer or distribution type. | arch: "gmm", "quantile" |
| `[model.patch_encoder]` | Encodes temporal patches of input data before modeling. | arch: "linear", "conv", "gemma" |
| `[model.patch_decoder]` | Decodes or reconstructs temporal output patches. | arch: "linear", "conv", "none" |

The choice of `model.loss.loss` must match `model.output_head.arch`:

- `arch = "gmm"` → use `loss = "nll"` (likelihood over Gaussian mixture).
- `arch = "quantile"` → use `loss = "pinball"` (quantile regression).

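This pairing can be enforced with a small check on the parsed configuration. The sketch below assumes the config has already been parsed into nested dictionaries. `REQUIRED_LOSS` and `check_loss_matches_head` are hypothetical names, and only the two pairings listed above are enforced.

```python
# Pairings stated in the documentation: output head arch -> required loss.
REQUIRED_LOSS = {"gmm": "nll", "quantile": "pinball"}


def check_loss_matches_head(config: dict) -> None:
    """Raise ValueError if model.loss.loss does not fit model.output_head.arch."""
    arch = config["model"]["output_head"]["arch"]
    loss = config["model"]["loss"]["loss"]
    expected = REQUIRED_LOSS.get(arch)
    # Heads without a documented pairing are not checked.
    if expected is not None and loss != expected:
        raise ValueError(
            f'output_head.arch = "{arch}" requires loss = "{expected}", got "{loss}"'
        )


valid = {"model": {"output_head": {"arch": "gmm"}, "loss": {"loss": "nll"}}}
check_loss_matches_head(valid)  # passes silently

invalid = {"model": {"output_head": {"arch": "quantile"}, "loss": {"loss": "nll"}}}
try:
    check_loss_matches_head(invalid)
except ValueError as error:
    print(error)
```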
Standard optimizer configuration.
| Key | Description | Value |
|---|---|---|
| `learning_rate` | Base learning rate | 3e-5 |
| `weight_decay` | Optimizer weight decay | 0 |
| `gradient_clipping` | Gradient clipping threshold | 1 |

Defines evaluation targets and thresholds for monitoring. The benchmarks are split into local, stef, and gift groups.
| Key | Description | Example |
|---|---|---|
| `locations` | List of benchmark sites or time series. | ["Ameland"] |
| `thresholds` | Corresponding performance thresholds per location. | [-4, -15.9, ...] |
| `n_day_ahead` | Forecast horizon for benchmarking. | 1 |

| Key | Description | Example |
|---|---|---|
| `targets_file` | File listing the targets for stef beam. | liander2024_targets.yaml |
| `input_window_days` | Number of input days. | 30 |
| `n_day_ahead` | Forecast horizon for benchmarking. | 1 |

Configures data input and output.
| Key | Description | Example |
|---|---|---|
| `feature_order` | Features used for training. | ["measurements_cdb", "weather", "time"] |

Credentials and external service tokens (optional). Provide only what you need.
| Key | Description | Example |
|---|---|---|
| `wandb_api_key` | API key to enable Weights & Biases logging. | "...your key..." |

Without `wandb_api_key`, logging falls back to stdout/CSV only.
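The fallback can be pictured as a simple decision on two settings. The sketch below uses a hypothetical helper, `logging_backend`, and the returned labels are illustrative only. It assumes that WandB logging also needs `persist_to_wandb_project` to be set, which goes slightly beyond what is stated above.

```python
from typing import Optional


def logging_backend(project: Optional[str], api_key: Optional[str]) -> str:
    """Pick a logging target from the WandB project name and API key."""
    # Assumption: both a project name and an API key are needed for WandB.
    if project and api_key:
        return "wandb"
    # Documented fallback when no API key is available.
    return "stdout/csv"


print(logging_backend("forecasting-s4", None))
print(logging_backend("forecasting-s4", "...your key..."))
```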