9 changes: 9 additions & 0 deletions gpytorch/kernels/matern_kernel.py
@@ -39,6 +39,15 @@ class MaternKernel(Kernel):
This kernel does not have an `outputscale` parameter. To add a scaling parameter,
decorate this kernel with a :class:`gpytorch.kernels.ScaleKernel`.

.. note::

For ARD kernels (when :attr:`ard_num_dims` is not `None`), it is highly recommended
to standardize the input data (e.g., subtract the mean and divide by the standard
deviation of each dimension) before passing it to the kernel. When input dimensions
have very different scales, the off-diagonal entries of the kernel matrix can
underflow to zero, which yields zero gradients for the lengthscale parameters.
Standardizing the data helps avoid this underflow and keeps the gradients informative.

:param nu: (Default: 2.5) The smoothness parameter.
:type nu: float (0.5, 1.5, or 2.5)
:param ard_num_dims: (Default: `None`) Set this if you want a separate lengthscale for each
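The underflow described in the note can be reproduced without GPyTorch. The following is a minimal sketch, assuming the standard Matern-5/2 formula with one lengthscale per dimension; the function name `matern25_ard`, the sample points, and the rescaling factor of 1000 are hypothetical and chosen only for illustration. It is not GPyTorch's implementation.

```python
import math


def matern25_ard(x1, x2, lengthscales):
    """Matern-5/2 kernel value with one lengthscale per input dimension."""
    # Euclidean distance after dividing each dimension by its lengthscale.
    d = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales)))
    root5_d = math.sqrt(5.0) * d
    return (1.0 + root5_d + 5.0 / 3.0 * d * d) * math.exp(-root5_d)


# Hypothetical inputs: dimension 0 is O(1), dimension 1 is O(1000).
unit_lengthscales = (1.0, 1.0)
raw = matern25_ard((0.1, 1000.0), (0.2, 3000.0), unit_lengthscales)

# Same points with dimension 1 divided by an assumed spread of 1000.
rescaled = matern25_ard((0.1, 1.0), (0.2, 3.0), unit_lengthscales)

print(raw)       # exactly 0.0: the exponential underflows in float64
print(rescaled)  # nonzero once both dimensions are on a similar scale
```

Python floats are double precision; in lower-precision dtypes the exponential underflows at much smaller distances.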
9 changes: 9 additions & 0 deletions gpytorch/kernels/rbf_kernel.py
@@ -31,6 +31,15 @@ class RBFKernel(Kernel):
This kernel does not have an `outputscale` parameter. To add a scaling parameter,
decorate this kernel with a :class:`gpytorch.kernels.ScaleKernel`.

.. note::

For ARD kernels (when :attr:`ard_num_dims` is not `None`), it is highly recommended
to standardize the input data (e.g., subtract the mean and divide by the standard
deviation of each dimension) before passing it to the kernel. When input dimensions
have very different scales, the off-diagonal entries of the kernel matrix can
underflow to zero, which yields zero gradients for the lengthscale parameters.
Standardizing the data helps avoid this underflow and keeps the gradients informative.

:param ard_num_dims: Set this if you want a separate lengthscale for each input
dimension. It should be `d` if :math:`\mathbf{x_1}` is a `n x d` matrix. (Default: `None`.)
:param batch_shape: Set this if you want a separate lengthscale for each batch of input
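To illustrate why the RBF note ties underflow to zero lengthscale gradients, here is a rough sketch in plain Python. It assumes the usual ARD RBF form `exp(-0.5 * sum(((x1_d - x2_d) / l_d) ** 2))` and its analytic derivative with respect to one lengthscale. The helper names (`rbf_ard`, `rbf_ard_lengthscale_grad`, `standardize`) and the toy data are hypothetical, and the use of the population standard deviation is an arbitrary choice for this example.

```python
import math
import statistics


def rbf_ard(x1, x2, lengthscales):
    """RBF kernel value with one lengthscale per input dimension."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales))
    return math.exp(-0.5 * sq)


def rbf_ard_lengthscale_grad(x1, x2, lengthscales, dim):
    """Analytic derivative of the RBF value with respect to lengthscales[dim]."""
    diff = x1[dim] - x2[dim]
    return rbf_ard(x1, x2, lengthscales) * diff ** 2 / lengthscales[dim] ** 3


def standardize(rows):
    """Subtract the per-column mean and divide by the per-column standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]  # population std, chosen arbitrarily
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


# Hypothetical data set: column 0 is O(1), column 1 is O(1000).
data = [(0.1, 1000.0), (0.2, 3000.0), (0.3, 2000.0), (0.4, 5000.0)]
lengthscales = (1.0, 1.0)

raw_grad = rbf_ard_lengthscale_grad(data[0], data[1], lengthscales, dim=1)

scaled = standardize(data)
scaled_grad = rbf_ard_lengthscale_grad(scaled[0], scaled[1], lengthscales, dim=1)

print(raw_grad)     # 0.0: the kernel value underflowed, so the gradient vanishes
print(scaled_grad)  # nonzero after standardization
```

Because the derivative is the kernel value times a finite factor, a kernel value that has underflowed to zero gives an optimizer no signal about how to adjust that lengthscale.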