Covariance functions

Gaussian-process regression and optimisation model the spatial structure of data using a covariance function, which specifies the covariance between any two points in the space.

The available covariance functions are implemented as classes within inference.gp, and can be passed to either GpRegressor or GpOptimiser via the kernel keyword argument, as follows:

from inference.gp import GpRegressor, SquaredExponential

# x and y are the coordinates and values of the training data
GP = GpRegressor(x, y, kernel=SquaredExponential())
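
Covariance functions are passed to GpOptimiser in the same way. As a minimal sketch (assuming x, y are the initial evaluations, and that bounds holds the lower/upper bounds of the optimisation search region required by GpOptimiser):

from inference.gp import GpOptimiser, SquaredExponential
GPO = GpOptimiser(x, y, bounds=bounds, kernel=SquaredExponential())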

SquaredExponential

class inference.gp.SquaredExponential(hyperpar_bounds=None)

SquaredExponential is a covariance-function class which can be passed to GpRegressor via the kernel keyword argument. It uses the ‘squared-exponential’ covariance function given by:

\[K(\underline{u}, \underline{v}) = A^2 \exp \left( -\frac{1}{2} \sum_{i=1}^{n} \left(\frac{u_i - v_i}{l_i}\right)^2 \right)\]

The hyperparameter vector \(\underline{\theta}\) used by SquaredExponential to define the above function is structured as follows:

\[\underline{\theta} = [ \ln{A}, \ln{l_1}, \ldots, \ln{l_n}]\]
Parameters

hyperpar_bounds – By default, SquaredExponential will automatically set sensible lower and upper bounds on the values of the hyperparameters based on the available data. However, this keyword allows the bounds to be specified manually as a list of length-2 tuples giving the lower/upper bounds for each parameter.
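
For example, with one-dimensional input data the hyperparameter vector is \([\ln{A}, \ln{l_1}]\), so two pairs of bounds are required. A minimal sketch, in which the numerical bound values are arbitrary placeholders:

from numpy import log
from inference.gp import GpRegressor, SquaredExponential

# bounds on ln(A) and ln(l_1) respectively - the values are illustrative only
bounds = [(log(0.1), log(10.0)), (log(0.01), log(5.0))]
GP = GpRegressor(x, y, kernel=SquaredExponential(hyperpar_bounds=bounds))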

RationalQuadratic

class inference.gp.RationalQuadratic(hyperpar_bounds=None)

RationalQuadratic is a covariance-function class which can be passed to GpRegressor via the kernel keyword argument. It uses the ‘rational quadratic’ covariance function given by:

\[K(\underline{u}, \underline{v}) = A^2 \left( 1 + \frac{1}{2\alpha} \sum_{i=1}^{n} \left(\frac{u_i - v_i}{l_i}\right)^2 \right)^{-\alpha}\]

The hyperparameter vector \(\underline{\theta}\) used by RationalQuadratic to define the above function is structured as follows:

\[\underline{\theta} = [ \ln{A}, \ln{\alpha}, \ln{l_1}, \ldots, \ln{l_n}]\]
Parameters

hyperpar_bounds – By default, RationalQuadratic will automatically set sensible lower and upper bounds on the values of the hyperparameters based on the available data. However, this keyword allows the bounds to be specified manually as a list of length-2 tuples giving the lower/upper bounds for each parameter.
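
As with SquaredExponential, the bounds can be set manually. For one-dimensional input data the hyperparameter vector is \([\ln{A}, \ln{\alpha}, \ln{l_1}]\), so three pairs of bounds are required. A minimal sketch, with arbitrary placeholder values:

from numpy import log
from inference.gp import GpRegressor, RationalQuadratic

# bounds on ln(A), ln(alpha) and ln(l_1) respectively - values are illustrative only
bounds = [(log(0.1), log(10.0)), (log(0.5), log(100.0)), (log(0.01), log(5.0))]
GP = GpRegressor(x, y, kernel=RationalQuadratic(hyperpar_bounds=bounds))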

WhiteNoise

class inference.gp.WhiteNoise(hyperpar_bounds=None)

WhiteNoise is a covariance-function class which models the presence of independent, identically-distributed Gaussian (i.e. white) noise on the input data. The covariance can be expressed as:

\[K(x_i, x_j) = \delta_{ij} \sigma_{n}^{2}\]

where \(\delta_{ij}\) is the Kronecker delta and \(\sigma_{n}\) is the Gaussian noise standard-deviation. The natural log of the noise-level \(\ln{\sigma_{n}}\) is the only hyperparameter.

WhiteNoise should be used as part of a ‘composite’ covariance function, as it doesn’t model the underlying structure of the data by itself. Composite covariance functions can be constructed by addition, for example:

from inference.gp import SquaredExponential, WhiteNoise
composite_kernel = SquaredExponential() + WhiteNoise()
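
The resulting composite kernel is then passed to GpRegressor in the usual way:

GP = GpRegressor(x, y, kernel=composite_kernel)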
Parameters

hyperpar_bounds – By default, WhiteNoise will automatically set sensible lower and upper bounds on the value of the log-noise-level based on the available data. However, this keyword allows the bounds to be specified manually as a length-2 tuple giving the lower/upper bounds.
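
For example, the bounds on \(\ln{\sigma_n}\) could be set manually as follows (the numerical values are arbitrary placeholders):

from numpy import log
from inference.gp import WhiteNoise

# lower/upper bounds on ln(sigma_n) - the values are illustrative only
noise_kernel = WhiteNoise(hyperpar_bounds=(log(1e-3), log(1.0)))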

HeteroscedasticNoise

class inference.gp.HeteroscedasticNoise(hyperpar_bounds=None)

HeteroscedasticNoise is a covariance-function class which models the presence of heteroscedastic, independent Gaussian noise on the input data. ‘Heteroscedastic’ refers to the noise variance not being constant across the input data. The covariance can be expressed as:

\[K(x_i, x_j) = \delta_{ij} \sigma_i^{2}\]

where \(\delta_{ij}\) is the Kronecker delta and \(\sigma_{i}\) is the Gaussian noise standard-deviation for the \(i\)’th data value. The hyperparameters \(\underline{\theta}\) are the natural logs of the standard-deviations for each of the \(m\) data values:

\[\underline{\theta} = [ \ln{\sigma_1}, \ln{\sigma_2}, \ldots, \ln{\sigma_m}]\]

Note that because the number of hyperparameters is equal to the number of data values, hyperparameter optimisation can become very expensive for larger datasets.

HeteroscedasticNoise should be used as part of a ‘composite’ covariance function, as it doesn’t model the underlying structure of the data by itself. Composite covariance functions can be constructed by addition, for example:

from inference.gp import SquaredExponential, HeteroscedasticNoise
composite_kernel = SquaredExponential() + HeteroscedasticNoise()
Parameters

hyperpar_bounds – By default, HeteroscedasticNoise will automatically set sensible lower and upper bounds on the values of the log-standard-deviations based on the available data. However, this keyword allows the bounds to be specified manually as a sequence of length-2 tuples giving the lower/upper bounds.
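
For example, assuming one pair of bounds is given per data value, the bounds could be constructed as follows (the numerical values are arbitrary placeholders):

from numpy import log
from inference.gp import HeteroscedasticNoise

# one (lower, upper) pair on ln(sigma_i) per data value - values are illustrative only
bounds = [(log(1e-3), log(1.0)) for _ in y]
noise_kernel = HeteroscedasticNoise(hyperpar_bounds=bounds)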

ChangePoint

class inference.gp.ChangePoint(kernels: Sequence, axis: int = 0, location_bounds: Sequence = None, width_bounds: Sequence = None)

ChangePoint is a covariance-function class which divides the input space into multiple regions (at various points along a chosen input dimension), allowing each region to be modelled using a separate covariance function. The boundaries which define the extent of each region are referred to as ‘change-points’. The locations of the change-points, and the widths over which the transitions between regions occur, are hyperparameters determined from the data.

This is useful in cases where properties of the data (e.g. the scale-lengths over which the data vary) change significantly over the input dimension which is used to divide the space.

The change-point kernel \(K_{\mathrm{cp}}\) is a weighted sum of the input kernels \(K_{1}, \, K_{2}, \dots , K_{n}\) which model each of the \(n\) regions:

\[K_{\mathrm{cp}}(u, v) = K_1 a_1 + \left(\sum_{i=2}^{n-1} K_i a_i b_{i-1}\right) + K_n b_{n-1}\]

where

\[a_{i}(u, v) = (1 - f_i (u)) (1 - f_i (v)), \quad b_{i}(u, v) = f_i (u) f_i (v)\]

and \(f_i\) is the logistic weighting function associated with the \(i\)’th change-point:

\[f_i(x) = \frac{1}{1 + e^{-(x - c_i) / w_i}}\]

where \(c_i\) and \(w_i\) are the location and width of the \(i\)’th change-point respectively. The \(c_i\) and \(w_i\) are hyperparameters which are determined automatically (alongside the hyperparameters for the kernels in each region).

Parameters
  • kernels – A tuple of the kernel objects to be used (K1, K2, K3, ...).

  • axis (int) – The spatial axis over which the transitions between kernels occur.

  • location_bounds – The bounds for the change-point location hyperparameters \(c_i\) as a tuple of the form ((lower_bound_0, upper_bound_0), (lower_bound_1, upper_bound_1), ...). There should always be \(n-1\) pairs of bounds, where \(n\) is the number of kernels specified.

  • width_bounds – The bounds for the change-point width hyperparameters \(w_i\) as a tuple of the form ((lower_bound_0, upper_bound_0), (lower_bound_1, upper_bound_1), ...). There should always be \(n-1\) pairs of bounds, where \(n\) is the number of kernels specified.
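
As a minimal sketch, a two-region ChangePoint kernel could be constructed and passed to GpRegressor as follows (the bound values below are arbitrary placeholders):

from inference.gp import GpRegressor, SquaredExponential, RationalQuadratic, ChangePoint

# two kernels implies a single change-point, so one pair of bounds is given
# for each of the location and width hyperparameters - values are illustrative only
cp_kernel = ChangePoint(
    kernels=(SquaredExponential(), RationalQuadratic()),
    axis=0,
    location_bounds=((2.0, 8.0),),
    width_bounds=((0.1, 2.0),),
)
GP = GpRegressor(x, y, kernel=cp_kernel)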