Add documentation note about data standardization for ARD kernels by r69shabh · Pull Request #2751 · cornellius-gp/gpytorch

r69shabh · 2026-05-04T07:07:11Z

Addresses issue #724: Add warning about zero gradients with high-variance data

For ARD kernels (when ard_num_dims is not None), input data with very different scales across dimensions can cause the kernel matrix to numerically underflow to zero, resulting in zero gradients for lengthscale parameters during training.

This change adds a note to the RBFKernel and MaternKernel docstrings recommending data standardization for numerical stability.
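The underflow described above can be reproduced without any library. The following is a minimal sketch, not GPyTorch's implementation: `ard_rbf` and `lengthscale_grads` are hypothetical helpers, the sample points and the unit lengthscales are made-up values, and the kernel is the textbook ARD RBF form k(x1, x2) = exp(-0.5 * sum_d ((x1_d - x2_d) / l_d)^2). GPyTorch optimizes a constrained `raw_lengthscale` rather than the lengthscale directly, so the real gradients differ by a chain-rule factor, but they are still multiplied by the kernel value.

```python
import math


def ard_rbf(x1, x2, lengthscales):
    """Textbook ARD RBF kernel value for two points (one lengthscale per dimension)."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales))
    # math.exp silently underflows to 0.0 for very negative arguments.
    return math.exp(-0.5 * sq_dist)


def lengthscale_grads(x1, x2, lengthscales):
    """Analytic dk/dl_d = k * (x1_d - x2_d)^2 / l_d^3 for each dimension d."""
    k = ard_rbf(x1, x2, lengthscales)
    return [k * (a - b) ** 2 / l ** 3 for a, b, l in zip(x1, x2, lengthscales)]


lengthscales = [1.0, 1.0]  # assumed starting values, chosen for illustration

# Made-up points: the second feature is on a far larger scale than the first.
x1, x2 = [0.1, 1000.0], [0.3, 5000.0]
k_raw = ard_rbf(x1, x2, lengthscales)
grad_raw = lengthscale_grads(x1, x2, lengthscales)

# The same points with the second feature divided by 10000.
x1_s, x2_s = [0.1, 0.1], [0.3, 0.5]
k_scaled = ard_rbf(x1_s, x2_s, lengthscales)
grad_scaled = lengthscale_grads(x1_s, x2_s, lengthscales)

print("raw:   ", k_raw, grad_raw)
print("scaled:", k_scaled, grad_scaled)
```

Because every gradient term carries the kernel value as a factor, once the exponential underflows the optimizer receives no signal for any lengthscale, including the one belonging to the well-scaled dimension.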

Copilot

Pull request overview

Adds documentation guidance to help users avoid numerical underflow / zero-gradient issues when using ARD kernels with poorly scaled inputs.

Changes:

Add an ARD-specific documentation note to RBFKernel recommending input standardization for numerical stability.
Add the same ARD-specific documentation note to MaternKernel.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
gpytorch/kernels/rbf_kernel.py	Adds a docstring note warning about ARD + unstandardized inputs causing numerical underflow and zero gradients.
gpytorch/kernels/matern_kernel.py	Adds the same docstring note for Matern ARD usage.

+        For ARD kernels (when :attr:`ard_num_dims` is not None), it is highly recommended
+        to standardize the input data (e.g., subtract the mean and divide by the standard
+        deviation) before passing it to the kernel. With input data that has very different
+        scales across dimensions, the kernel matrix can numerically underflow to zero,
+        causing zero gradients for the lengthscale parameters. Standardizing the data
+        ensures numerical stability and proper gradient flow during training.


+        For ARD kernels (when :attr:`ard_num_dims` is not None), it is highly recommended
+        to standardize the input data (e.g., subtract the mean and divide by the standard
+        deviation) before passing it to the kernel. With input data that has very different
+        scales across dimensions, the kernel matrix can numerically underflow to zero,
+        causing zero gradients for the lengthscale parameters. Standardizing the data
+        ensures numerical stability and proper gradient flow during training.
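The standardization recommended in the note (subtract the mean, divide by the standard deviation) might look like the sketch below. It is written in plain Python so it runs without dependencies; `fit_standardizer` and `standardize` are hypothetical helpers, not GPyTorch functions, and the data is made up. The same per-column arithmetic applies to tensors. One assumption beyond the note: the statistics are computed on the training inputs only and reused for test inputs, so both sets share one coordinate system.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up inputs whose two features differ in scale by several orders of magnitude.
train_x = [[0.1, 1000.0], [0.3, 5000.0], [0.2, 3000.0]]
test_x = [[0.25, 4000.0]]

means, stds = fit_standardizer(train_x)
train_z = standardize(train_x, means, stds)
test_z = standardize(test_x, means, stds)  # reuse the training statistics

print(train_z)
print(test_z)
```

After this step each training column has zero mean and unit variance, so pairwise differences in every dimension are of order one and a lengthscale near 1 is a reasonable starting point for all dimensions.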


Copilot AI review requested due to automatic review settings May 4, 2026 07:07

Copilot started reviewing on behalf of r69shabh May 4, 2026 07:07

Copilot AI reviewed May 4, 2026

r69shabh mentioned this pull request May 4, 2026

[Examples] Standardize Predictors for ARD #724

Open

r69shabh wants to merge 1 commit into cornellius-gp:main from r69shabh:fix/issue-724-standardize-data-note
