PcaChain

class inference.mcmc.PcaChain(*args, bounds=None, **kwargs)

A class which performs Gibbs sampling over the eigenvectors of the covariance matrix.

The PcaChain sampler uses ‘principal component analysis’ (PCA) to improve the performance of Gibbs sampling in cases where strong linear correlation exists between two or more variables in a problem.

For an N-parameter problem, PcaChain produces a new sample by making N sequential 1D Metropolis-Hastings steps in the direction of each of the N eigenvectors of the NxN covariance matrix.

As an initial guess the covariance matrix is taken to be diagonal, which results in standard gibbs sampling for the first samples in the chain. As the chain advances, the covariance matrix is periodically updated with an estimate derived from the sample itself, and the eigenvectors are re-calculated.

Parameters
  • posterior (func) – A function which takes the vector of model parameters as a numpy.ndarray, and returns the posterior log-probability.

  • start – Vector of model parameters which correspond to the parameter-space coordinates at which the chain will start.

  • widths – Vector of standard deviations which serve as initial guesses for the widths of the proposal distribution for each model parameter. If not specified, the starting widths will be approximated as 5% of the values in ‘start’.

  • bounds – An instance of the inference.mcmc.Bounds class, or a sequence of two numpy.ndarray specifying the lower and upper bounds for the parameters in the form (lower_bounds, upper_bounds).

  • display_progress (bool) – If set as True, a message is displayed during sampling showing the current progress and an estimated time until completion.

advance(m: int)

Advances the chain by taking m new steps.

Parameters

m (int) – Number of steps the chain will advance.

get_interval(interval=0.95, burn: int = 1, thin: int = 1, samples=None)

Return the samples from the chain which lie inside a chosen highest-density interval.

Parameters
  • interval (float) – Total probability of the desired interval. For example, if interval = 0.95, then the samples corresponding to the top 95% of posterior probability values are returned.

  • burn (int) – Number of samples to discard from the start of the chain.

  • thin (int) – Instead of returning every sample which is not discarded as part of the burn-in, every m’th sample is returned for a specified integer m.

  • samples (int) – The number of samples that should be returned from the requested interval. Note that specifying samples overrides the value of thin.

Returns

List containing sample points stored as tuples, and a corresponding list of log-probability values.

get_marginal(index: int, burn: int = 1, thin: int = 1, unimodal=False) DensityEstimator

Estimate the 1D marginal distribution of a chosen parameter.

Parameters
  • index (int) – Index of the parameter for which the marginal distribution is to be estimated.

  • burn (int) – Number of samples to discard from the start of the chain.

  • thin (int) – Rather than using every sample which is not discarded as part of the burn-in, every m’th sample is used for a specified integer m.

  • unimodal (bool) – Selects the type of density estimation to be used. The default value is False, which causes a GaussianKDE object to be returned. If however the marginal distribution being estimated is known to be unimodal, setting unimodal = True will result in the UnimodalPdf class being used to estimate the density.

Returns

An instance of a DensityEstimator from the inference.pdf module which represents the 1D marginal distribution of the chosen parameter. If the unimodal argument is False a GaussianKDE instance is returned, else if True a UnimodalPdf instance is returned.

get_parameter(index: int, burn: int = 1, thin: int = 1) ndarray

Return sample values for a chosen parameter.

Parameters
  • index (int) – Index of the parameter for which samples are to be returned.

  • burn (int) – Number of samples to discard from the start of the chain.

  • thin (int) – Instead of returning every sample which is not discarded as part of the burn-in, every m’th sample is returned for a specified integer m.

Returns

Samples for the parameter specified by index as a numpy.ndarray.

get_sample(burn: int = 1, thin: int = 1) ndarray

Return the sample as a 2D numpy.ndarray.

Parameters
  • burn (int) – Number of steps from the start of the chain which are ignored.

  • thin (int) – Sets the factor by which the sample is ‘thinned’ before being returned. If thin is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.

Returns

The sample as a numpy.ndarray of shape (n_samples, n_parameters).

matrix_plot(params: list[int] = None, burn: int = 0, thin: int = 1, **kwargs)

Construct a ‘matrix plot’ of the parameters (or a subset) which displays all 1D and 2D marginal distributions. See the documentation of inference.plotting.matrix_plot for a description of other allowed keyword arguments.

Parameters
  • params – A list of integers specifying the indices of parameters which are to be plotted. If not specified, all parameters are plotted.

  • burn (int) – Sets the number of samples from the beginning of the chain which are ignored when generating the plot.

  • thin (int) – Sets the factor by which the sample is ‘thinned’ before generating the plot. If thin is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.

mode() ndarray

Return the sample with the current highest posterior probability.

Returns

Parameter values as a numpy.ndarray.

plot_diagnostics(show=True, filename=None)

Plot diagnostic traces that give information on how the chain is progressing.

Currently this method plots:

  • The posterior log-probability as a function of step number, which is useful for checking if the chain has reached a maximum. Any early parts of the chain where the probability is rising rapidly should be removed as burn-in.

  • The history of changes to the proposal widths for each parameter. Ideally, the proposal widths should converge, and the point in the chain where this occurs is often a good choice for the end of the burn-in. For highly-correlated pdfs, the proposal widths may never fully converge, but in these cases small fluctuations in the width values are acceptable.

Parameters
  • show (bool) – If set to True, the plot is displayed.

  • filename (str) – File path to which the diagnostics plot will be saved. If left unspecified the plot won’t be saved.

run_for(minutes=0, hours=0, days=0)

Advances the chain for a chosen amount of computation time.

Parameters
  • minutes (int) – number of minutes for which to run the chain.

  • hours (int) – number of hours for which to run the chain.

  • days (int) – number of days for which to run the chain.

trace_plot(params: list[int] = None, burn: int = 0, thin: int = 1, **kwargs)

Construct a ‘trace plot’ of the parameters (or a subset) which displays the value of the parameters as a function of step number in the chain. See the documentation of inference.plotting.trace_plot for a description of other allowed keyword arguments.

Parameters
  • params – A list of integers specifying the indices of parameters which are to be plotted. If not specified, all parameters are plotted.

  • burn (int) – Sets the number of samples from the beginning of the chain which are ignored when generating the plot.

  • thin (int) – Sets the factor by which the sample is ‘thinned’ before generating the plot. If thin is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.