Covariance functions
Gaussian-process regression and optimisation model the spatial structure of data using a covariance function, which specifies the covariance between any two points in the input space.
The available covariance functions are implemented as classes within inference.gp, and can be passed to either GpRegressor or GpOptimiser via the kernel keyword argument as follows:
from inference.gp import GpRegressor, SquaredExponential
GP = GpRegressor(x, y, kernel=SquaredExponential())
SquaredExponential
- class inference.gp.SquaredExponential(hyperpar_bounds=None)
SquaredExponential is a covariance-function class which can be passed to GpRegressor via the kernel keyword argument. It uses the ‘squared-exponential’ covariance function given by:

\[K(\underline{u}, \underline{v}) = A^2 \exp \left( -\frac{1}{2} \sum_{i=1}^{n} \left(\frac{u_i - v_i}{l_i}\right)^2 \right)\]

The hyperparameter vector \(\underline{\theta}\) used by SquaredExponential to define the above function is structured as follows:

\[\underline{\theta} = [ \ln{A}, \ln{l_1}, \ldots, \ln{l_n}]\]

- Parameters
hyperpar_bounds – By default, SquaredExponential will automatically set sensible lower and upper bounds on the values of the hyperparameters based on the available data. However, this keyword allows the bounds to be specified manually as a list of length-2 tuples giving the lower/upper bounds for each parameter.
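To make the formula above concrete, here is a minimal numpy sketch which evaluates the squared-exponential covariance between two points directly. This is an illustration of the mathematics only, not the inference.gp implementation; the function name and defaults (unit scale-lengths) are chosen for this example.

```python
import numpy as np

def squared_exponential(u, v, A=1.0, lengths=None):
    """Evaluate the squared-exponential covariance K(u, v) defined above."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    if lengths is None:
        lengths = np.ones_like(u)  # unit scale-lengths by default
    # sum of squared, per-dimension scaled separations
    z2 = np.sum(((u - v) / np.asarray(lengths, dtype=float)) ** 2)
    return A**2 * np.exp(-0.5 * z2)
```

Note that at \(\underline{u} = \underline{v}\) the covariance is simply \(A^2\), and it decays smoothly as the scaled separation grows.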
RationalQuadratic
- class inference.gp.RationalQuadratic(hyperpar_bounds=None)
RationalQuadratic is a covariance-function class which can be passed to GpRegressor via the kernel keyword argument. It uses the ‘rational quadratic’ covariance function given by:

\[K(\underline{u}, \underline{v}) = A^2 \left( 1 + \frac{1}{2\alpha} \sum_{i=1}^{n} \left(\frac{u_i - v_i}{l_i}\right)^2 \right)^{-\alpha}\]

The hyperparameter vector \(\underline{\theta}\) used by RationalQuadratic to define the above function is structured as follows:

\[\underline{\theta} = [ \ln{A}, \ln{\alpha}, \ln{l_1}, \ldots, \ln{l_n}]\]

- Parameters
hyperpar_bounds – By default, RationalQuadratic will automatically set sensible lower and upper bounds on the values of the hyperparameters based on the available data. However, this keyword allows the bounds to be specified manually as a list of length-2 tuples giving the lower/upper bounds for each parameter.
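As with the squared exponential, the rational-quadratic formula can be sketched directly in numpy (an illustration of the mathematics, not the inference.gp implementation; the function name and defaults are chosen for this example):

```python
import numpy as np

def rational_quadratic(u, v, A=1.0, alpha=1.0, lengths=None):
    """Evaluate the rational-quadratic covariance K(u, v) defined above."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    if lengths is None:
        lengths = np.ones_like(u)  # unit scale-lengths by default
    z2 = np.sum(((u - v) / np.asarray(lengths, dtype=float)) ** 2)
    return A**2 * (1.0 + z2 / (2.0 * alpha)) ** (-alpha)
```

The rational quadratic can be viewed as a scale-mixture of squared exponentials: in the limit \(\alpha \to \infty\) it recovers the squared-exponential covariance.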
WhiteNoise
- class inference.gp.WhiteNoise(hyperpar_bounds=None)
WhiteNoise is a covariance-function class which models the presence of independent, identically-distributed Gaussian (i.e. white) noise on the input data. The covariance can be expressed as:

\[K(x_i, x_j) = \delta_{ij} \sigma_{n}^{2}\]

where \(\delta_{ij}\) is the Kronecker delta and \(\sigma_{n}\) is the Gaussian noise standard deviation. The natural log of the noise level, \(\ln{\sigma_{n}}\), is the only hyperparameter.
WhiteNoise should be used as part of a ‘composite’ covariance function, as it doesn’t model the underlying structure of the data by itself. Composite covariance functions can be constructed by addition, for example:

from inference.gp import SquaredExponential, WhiteNoise
composite_kernel = SquaredExponential() + WhiteNoise()
- Parameters
hyperpar_bounds – By default, WhiteNoise will automatically set sensible lower and upper bounds on the value of the log-noise-level based on the available data. However, this keyword allows the bounds to be specified manually as a length-2 tuple giving the lower/upper bound.
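The covariance matrix produced by the white-noise term is simply a scaled identity matrix, which a short numpy sketch makes explicit (illustrative only; the function name is chosen for this example):

```python
import numpy as np

def white_noise_covariance(n_points, sigma_n):
    # K(x_i, x_j) = delta_ij * sigma_n^2  ->  a diagonal covariance matrix
    return sigma_n**2 * np.eye(n_points)
```

Adding this matrix to that of a structural kernel (such as the squared exponential) is what the composite-kernel addition above achieves.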
HeteroscedasticNoise
- class inference.gp.HeteroscedasticNoise(hyperpar_bounds=None)
HeteroscedasticNoise is a covariance-function class which models the presence of heteroscedastic, independent Gaussian noise on the input data. ‘Heteroscedastic’ refers to the noise variance not being constant across the input data. The covariance can be expressed as:

\[K(x_i, x_j) = \delta_{ij} \sigma_i^{2}\]

where \(\delta_{ij}\) is the Kronecker delta and \(\sigma_{i}\) is the Gaussian noise standard deviation for the \(i\)’th data value. The hyperparameters \(\underline{\theta}\) are the natural logs of the standard deviations for each of the \(m\) data values:

\[\underline{\theta} = [ \ln{\sigma_1}, \ln{\sigma_2}, \ldots, \ln{\sigma_m}]\]

Note that because the number of hyperparameters is equal to the number of data values, hyperparameter optimisation can become very expensive for larger datasets.
HeteroscedasticNoise should be used as part of a ‘composite’ covariance function, as it doesn’t model the underlying structure of the data by itself. Composite covariance functions can be constructed by addition, for example:

from inference.gp import SquaredExponential, HeteroscedasticNoise
composite_kernel = SquaredExponential() + HeteroscedasticNoise()
- Parameters
hyperpar_bounds – By default, HeteroscedasticNoise will automatically set sensible lower and upper bounds on the values of the log standard deviations based on the available data. However, this keyword allows the bounds to be specified manually as a sequence of length-2 tuples giving the lower/upper bounds.
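By contrast with the white-noise case, the heteroscedastic covariance matrix has a different variance on each diagonal element. A minimal numpy sketch (illustrative only; the function name is chosen for this example):

```python
import numpy as np

def heteroscedastic_covariance(sigmas):
    # K(x_i, x_j) = delta_ij * sigma_i^2 : per-point noise variances on the diagonal
    sigmas = np.asarray(sigmas, dtype=float)
    return np.diag(sigmas**2)
```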
ChangePoint
- class inference.gp.ChangePoint(kernels: Sequence, axis: int = 0, location_bounds: Sequence = None, width_bounds: Sequence = None)
ChangePoint is a covariance-function class which divides the input space into multiple regions (at various points along a chosen input dimension), allowing each region to be modelled using a separate covariance function. The boundaries which define the extent of each region are referred to as ‘change-points’. The locations of the change-points, and the widths over which the transitions between regions occur, are hyperparameters determined from the data.

This is useful in cases where the properties of the data (e.g. the scale-lengths over which the data vary) change significantly along the input dimension which is used to divide the space.
The change-point kernel \(K_{\mathrm{cp}}\) is a weighted sum of the input kernels \(K_{1}, \, K_{2}, \dots , K_{n}\) which model each of the \(n\) regions:

\[K_{\mathrm{cp}}(u, v) = K_1 a_1 + \left(\sum_{i=2}^{n-1} K_i a_i b_{i-1}\right) + K_n b_{n-1}\]

where

\[a_{i}(u, v) = (1 - f_i (u)) (1 - f_i (v)), \quad b_{i}(u, v) = f_i (u) f_i (v)\]

and \(f_i\) is the logistic weighting function associated with the \(i\)’th change-point:

\[f_i(x) = \frac{1}{1 + e^{-(x - c_i) / w_i}}\]

where \(c_i\) and \(w_i\) are the location and width of the \(i\)’th change-point respectively. The \(c_i\) and \(w_i\) are hyperparameters which are determined automatically (alongside the hyperparameters for the kernels in each region).
- Parameters
kernels – A tuple of the kernel objects to be used: (K1, K2, K3, ...).

axis (int) – The spatial axis over which the transitions between kernels occur.

location_bounds – The bounds for the change-point location hyperparameters \(c_i\) as a tuple of the form ((lower_bound_0, upper_bound_0), (lower_bound_1, upper_bound_1), ...). There should always be \(n-1\) pairs of bounds, where \(n\) is the number of kernels specified.

width_bounds – The bounds for the change-point width hyperparameters \(w_i\) as a tuple of the form ((lower_bound_0, upper_bound_0), (lower_bound_1, upper_bound_1), ...). There should always be \(n-1\) pairs of bounds, where \(n\) is the number of kernels specified.
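The weighted-sum construction above is easiest to see in the two-region (\(n = 2\)) case, where the middle sum is empty and \(K_{\mathrm{cp}} = K_1 a_1 + K_2 b_1\). The following numpy sketch evaluates this case directly; it is an illustration of the mathematics only, not the inference.gp implementation, and the kernel functions K_left and K_right are hypothetical examples.

```python
import numpy as np

def logistic_weight(x, c, w):
    # f(x) = 1 / (1 + exp(-(x - c) / w)) : 0 far left of the change-point, 1 far right
    return 1.0 / (1.0 + np.exp(-(x - c) / w))

def change_point_covariance(u, v, K1, K2, c, w):
    # two-region (n = 2) case of the weighted sum above:
    # K_cp(u, v) = K1(u, v)(1 - f(u))(1 - f(v)) + K2(u, v) f(u) f(v)
    fu, fv = logistic_weight(u, c, w), logistic_weight(v, c, w)
    return K1(u, v) * (1.0 - fu) * (1.0 - fv) + K2(u, v) * fu * fv

# hypothetical per-region kernels: squared exponentials with different
# amplitudes and scale-lengths on either side of the change-point
K_left = lambda u, v: np.exp(-0.5 * (u - v) ** 2)
K_right = lambda u, v: 4.0 * np.exp(-0.5 * ((u - v) / 5.0) ** 2)
```

Far to the left of the change-point the weights vanish and \(K_{\mathrm{cp}}\) reduces to \(K_1\); far to the right it reduces to \(K_2\), with a smooth transition of width \(w\) in between.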