Change default almost_fair_crps_alpha from 0.95 to 1.0#1139

Merged
mcgibbon merged 7 commits into main from feature/fair-crps-default on May 27, 2026

@mcgibbon
Contributor

@mcgibbon mcgibbon commented May 7, 2026

Changes the default almost_fair_crps_alpha in EnsembleLoss from 0.95 (almost-fair CRPS) to 1.0 (fair CRPS). The almost-fair modification was originally motivated by avoiding unconstrained ensemble members when one member exactly matches the target, but in practice the loss is smooth in this regime and the gradient signal is sufficient without the modification.
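For readers unfamiliar with the parameter, here is a minimal sketch of what `alpha` controls, assuming the usual almost-fair CRPS definition in which the spread term of the fair CRPS is scaled by `1 - epsilon` with `epsilon = (1 - alpha) / M`. The function name and its scalar, list-based interface are hypothetical and are not taken from `fme.core.loss`; the actual `EnsembleLoss` implementation may differ.

```python
from itertools import combinations


def almost_fair_crps(members, target, alpha=1.0):
    """Almost-fair CRPS for one scalar target and a list of ensemble members.

    Hypothetical helper for illustration only.
    alpha=1.0 gives the fair CRPS; alpha=0.0 gives the standard ensemble CRPS.
    """
    m = len(members)
    if m < 2:
        raise ValueError("need at least two ensemble members")
    # Mean absolute error of the members against the target.
    skill = sum(abs(x - target) for x in members) / m
    # Sum of |x_j - x_k| over ordered pairs j != k
    # (each unordered pair is counted twice).
    pair_sum = 2.0 * sum(abs(a - b) for a, b in combinations(members, 2))
    epsilon = (1.0 - alpha) / m
    spread = (1.0 - epsilon) * pair_sum / (2.0 * m * (m - 1))
    return skill - spread


# One member sits exactly on the target.
fair = almost_fair_crps([0.0, 2.0], 0.0, alpha=1.0)
almost_fair = almost_fair_crps([0.0, 2.0], 0.0, alpha=0.95)
```

With one member exactly on the target, the fair score in this two-member case is zero while the almost-fair score keeps a small positive penalty on the other member, which is the behavior the old default was meant to provide.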

Changes:

  • fme.core.loss.EnsembleLoss: change almost_fair_crps_alpha default from 0.95 to 1.0

  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Depends on #1138

Ran benchmark on 4-degree daily era5-only no-co2 training:

alpha=0.95: https://wandb.ai/ai2cm/ace/runs/injiirnf
alpha=1.0: https://wandb.ai/ai2cm/ace/runs/ozsyoxtz

mcgibbon and others added 5 commits May 7, 2026 17:02
Adds FiniteDifferenceCRPSLoss which computes CRPS on spatial finite
differences, with optional multi-level coarsening via avg_pool2d. Integrates
into EnsembleLoss via finite_difference_crps_weight and
finite_difference_crps_levels parameters.
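
The idea in this commit message can be sketched roughly as follows. This is not the `FiniteDifferenceCRPSLoss` code: it is a pure-Python illustration that assumes forward differences along each grid axis, non-overlapping 2x2 mean pooling as the coarsening step, fair CRPS at every grid point, and an unweighted mean over levels. All function names here are hypothetical.

```python
def fair_crps(members, target):
    """Fair CRPS for one scalar target and a list of ensemble members."""
    m = len(members)
    skill = sum(abs(x - target) for x in members) / m
    # Double sum over all ordered pairs; the j == k terms contribute zero.
    spread = sum(abs(a - b) for a in members for b in members) / (2.0 * m * (m - 1))
    return skill - spread


def diff_x(field):
    """Forward difference along the column axis of a 2D list."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in field]


def diff_y(field):
    """Forward difference along the row axis of a 2D list."""
    return [[field[i + 1][j] - field[i][j] for j in range(len(field[0]))]
            for i in range(len(field) - 1)]


def avg_pool_2x2(field):
    """Non-overlapping 2x2 mean pooling; trailing odd rows/columns are dropped."""
    rows, cols = len(field) // 2, len(field[0]) // 2
    return [[(field[2 * i][2 * j] + field[2 * i][2 * j + 1]
              + field[2 * i + 1][2 * j] + field[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(cols)] for i in range(rows)]


def mean_pointwise_crps(member_fields, target_field):
    """Average the fair CRPS over every grid point."""
    total, count = 0.0, 0
    for i, row in enumerate(target_field):
        for j, y in enumerate(row):
            total += fair_crps([f[i][j] for f in member_fields], y)
            count += 1
    return total / count


def finite_difference_crps(member_fields, target_field, levels=1):
    """CRPS of spatial differences, averaged over axes and coarsening levels."""
    losses = []
    for _ in range(levels):
        if len(target_field) < 2 or len(target_field[0]) < 2:
            raise ValueError("grid too small for the requested number of levels")
        for diff in (diff_x, diff_y):
            losses.append(mean_pointwise_crps(
                [diff(f) for f in member_fields], diff(target_field)))
        # Coarsen before moving to the next level.
        member_fields = [avg_pool_2x2(f) for f in member_fields]
        target_field = avg_pool_2x2(target_field)
    return sum(losses) / len(losses)


target = [[float(i + j) for j in range(4)] for i in range(4)]
shifted = [[v + 1.0 for v in row] for row in target]
offset_loss = finite_difference_crps([shifted, shifted], target, levels=2)
```

Because the score is computed on differences, a constant offset between the ensemble and the target leaves it unchanged, so in this sketch `offset_loss` is zero even though every member value is wrong by one.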

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mcgibbon
Contributor Author

mcgibbon commented May 7, 2026

This is motivated by three things: a) we have never shown that afCRPS is helpful for our use case; b) the theoretical motivation for afCRPS breaks down when we include energy score or finite difference CRPS as part of the loss; and c) the original theoretical motivation for afCRPS is not fully convincing, and the paper does not clearly say whether it was introduced to fix an issue that was actually encountered or chosen for theoretical reasons alone.

Regarding b), afCRPS exists for the case where one of the predictions exactly equals the target, in which case the CRPS doesn't constrain the other ensemble member at all. However, we never use CRPS alone in practice, and for the other loss terms (energy score and finite difference CRPS) the same failure would require many outputs to exactly equal the target at once, which is not a realistic risk.

Regarding c), the failure mode really just means that the sample won't contribute to gradient updates. The behavior for epsilon differences from the target is smooth, with small gradients/updates that reduce to zero as one of the two ensemble values approaches the target.
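
A small numeric check of the two-member scalar case may help here. It is only a sketch: it assumes the fair CRPS with M = 2 and a scalar target, uses a hypothetical helper name, and says nothing about the full loss with the energy score or finite difference terms included.

```python
def fair_crps_two_members(x1, x2, target):
    """Fair CRPS for a two-member ensemble and a scalar target."""
    return 0.5 * (abs(x1 - target) + abs(x2 - target)) - 0.5 * abs(x1 - x2)


target = 0.0

# One member exactly on the target: the loss is zero wherever the other sits,
# so the other member is unconstrained for this sample.
exact = [fair_crps_two_members(target, x2, target) for x2 in (-5.0, 0.5, 3.0)]

# One member a distance eps from the target, the other far away on the same
# side: the loss shrinks continuously with eps rather than jumping to zero.
eps_values = (1e-1, 1e-2, 1e-3)
near = [fair_crps_two_members(target + eps, 3.0, target) for eps in eps_values]
```

In this setup each entry of `exact` is zero and each entry of `near` matches its `eps` up to floating-point rounding, so the exact-match case is the limit of a continuous family rather than an isolated discontinuity.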

Finally, I'm a little worried that our SSR metrics are consistently a bit uncalibrated. It would be nice to remove this as a potential source of that mis-calibration.
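
For context on the calibration concern, a spread-skill ratio (SSR) can be computed as sketched below. This assumes one common definition, the root-mean ensemble variance divided by the RMSE of the ensemble mean; finite-ensemble correction factors are omitted, the function name is hypothetical, and the metric used in this repository may be defined differently.

```python
import math


def spread_skill_ratio(ensembles, targets):
    """SSR over samples.

    ensembles: list of per-sample member lists (at least two members each).
    targets: list of scalar targets, one per sample.
    """
    var_sum, sq_err_sum = 0.0, 0.0
    for members, y in zip(ensembles, targets):
        m = len(members)
        mean = sum(members) / m
        # Unbiased ensemble variance for this sample.
        var_sum += sum((x - mean) ** 2 for x in members) / (m - 1)
        # Squared error of the ensemble mean.
        sq_err_sum += (mean - y) ** 2
    n = len(targets)
    spread = math.sqrt(var_sum / n)
    skill = math.sqrt(sq_err_sum / n)
    return spread / skill


# A narrow ensemble around a poor mean gives a ratio below one.
under_dispersive = spread_skill_ratio([[-0.5, 0.5]], [1.0])
```

Under this definition a ratio below one indicates an ensemble whose spread is too small relative to its error, and a ratio near one indicates a better match between the two.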

Comment thread fme/core/loss.py
  finite_difference_crps_weight: float = 0.0,
  finite_difference_crps_levels: int = 1,
- almost_fair_crps_alpha: float = 0.95,
+ almost_fair_crps_alpha: float = 1.0,
Member


If we do decide this is the better approach, I lean toward making this "opt-in" and adding it to the baseline configs, rather than changing the pre-existing default behavior.

Contributor Author


My hesitance with that option is that in the past when Troy and I have had that scenario (we have a new config set we agree to use, and we update the baseline configs) we invariably forget to add it to several experiments. I think we're still missing affine_norms: true in a lot of our experimental configs.

@Arcomano1234
Contributor

It would be nice to just launch a quick experiment to verify this doesn't lead to any noticeable degradation.

@mcgibbon
Contributor Author

mcgibbon commented May 8, 2026

> It would be nice to just launch a quick experiment to verify this doesn't lead to any noticeable degradation.

Yeah we can hold off for this - it's a one-line PR that isn't likely to develop merge conflicts.

Base automatically changed from feature/finite-difference-crps to main May 11, 2026 14:36
@mcgibbon
Contributor Author

Ran benchmark on 4-degree daily era5-only no-co2 training:

alpha=0.95: https://wandb.ai/ai2cm/ace/runs/injiirnf
alpha=1.0: https://wandb.ai/ai2cm/ace/runs/ozsyoxtz

The runs show slightly lower CRPS across many variables and slightly higher RMSE, and SSRs are slightly higher (which is good, as most variables generally go low-biased on signal later in training).
image
image
image

A stronger sign that this is a good change is that long-inference power spectral biases are improved for many variables, despite this not being directly optimized by the change:
image

… default of 1.0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@Arcomano1234 Arcomano1234 left a comment


Yeah, I tend to agree that the "best" values for these types of parameters should be the default, because we will inevitably copy and paste old configs and forget to set this explicitly. It also seems not to hurt most metrics, and in some cases, as Jeremy pointed out, it improves them (marginally).

@mcgibbon mcgibbon enabled auto-merge (squash) May 27, 2026 17:26
@mcgibbon mcgibbon merged commit ddcb9d5 into main May 27, 2026
7 checks passed
@mcgibbon mcgibbon deleted the feature/fair-crps-default branch May 27, 2026 17:42
3 participants