PcaChain
- class inference.mcmc.PcaChain(*args, bounds=None, **kwargs)
A class which performs Gibbs sampling over the eigenvectors of the covariance matrix.
The PcaChain sampler uses ‘principal component analysis’ (PCA) to improve the performance of Gibbs sampling in cases where strong linear correlation exists between two or more variables in a problem.
For an N-parameter problem, PcaChain produces a new sample by making N sequential 1D Metropolis-Hastings steps in the direction of each of the N eigenvectors of the NxN covariance matrix.
As an initial guess the covariance matrix is taken to be diagonal, which results in standard gibbs sampling for the first samples in the chain. As the chain advances, the covariance matrix is periodically updated with an estimate derived from the sample itself, and the eigenvectors are re-calculated.
- Parameters
posterior (func) – A function which takes the vector of model parameters as a
numpy.ndarray
, and returns the posterior log-probability.start – Vector of model parameters which correspond to the parameter-space coordinates at which the chain will start.
widths – Vector of standard deviations which serve as initial guesses for the widths of the proposal distribution for each model parameter. If not specified, the starting widths will be approximated as 5% of the values in ‘start’.
bounds – An instance of the
inference.mcmc.Bounds
class, or a sequence of twonumpy.ndarray
specifying the lower and upper bounds for the parameters in the form(lower_bounds, upper_bounds)
.display_progress (bool) – If set as
True
, a message is displayed during sampling showing the current progress and an estimated time until completion.
- advance(m: int)
Advances the chain by taking
m
new steps.- Parameters
m (int) – Number of steps the chain will advance.
- get_interval(interval=0.95, burn: int = 1, thin: int = 1, samples=None)
Return the samples from the chain which lie inside a chosen highest-density interval.
- Parameters
interval (float) – Total probability of the desired interval. For example, if interval = 0.95, then the samples corresponding to the top 95% of posterior probability values are returned.
burn (int) – Number of samples to discard from the start of the chain.
thin (int) – Instead of returning every sample which is not discarded as part of the burn-in, every m’th sample is returned for a specified integer m.
samples (int) – The number of samples that should be returned from the requested interval. Note that specifying
samples
overrides the value ofthin
.
- Returns
List containing sample points stored as tuples, and a corresponding list of log-probability values.
- get_marginal(index: int, burn: int = 1, thin: int = 1, unimodal=False) DensityEstimator
Estimate the 1D marginal distribution of a chosen parameter.
- Parameters
index (int) – Index of the parameter for which the marginal distribution is to be estimated.
burn (int) – Number of samples to discard from the start of the chain.
thin (int) – Rather than using every sample which is not discarded as part of the burn-in, every m’th sample is used for a specified integer m.
unimodal (bool) – Selects the type of density estimation to be used. The default value is False, which causes a GaussianKDE object to be returned. If however the marginal distribution being estimated is known to be unimodal, setting
unimodal = True
will result in theUnimodalPdf
class being used to estimate the density.
- Returns
An instance of a
DensityEstimator
from theinference.pdf
module which represents the 1D marginal distribution of the chosen parameter. If theunimodal
argument isFalse
aGaussianKDE
instance is returned, else ifTrue
aUnimodalPdf
instance is returned.
- get_parameter(index: int, burn: int = 1, thin: int = 1) ndarray
Return sample values for a chosen parameter.
- Parameters
index (int) – Index of the parameter for which samples are to be returned.
burn (int) – Number of samples to discard from the start of the chain.
thin (int) – Instead of returning every sample which is not discarded as part of the burn-in, every m’th sample is returned for a specified integer m.
- Returns
Samples for the parameter specified by
index
as anumpy.ndarray
.
- get_sample(burn: int = 1, thin: int = 1) ndarray
Return the sample as a 2D
numpy.ndarray
.- Parameters
burn (int) – Number of steps from the start of the chain which are ignored.
thin (int) – Sets the factor by which the sample is ‘thinned’ before being returned. If
thin
is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.
- Returns
The sample as a
numpy.ndarray
of shape(n_samples, n_parameters)
.
- matrix_plot(params: list[int] = None, burn: int = 0, thin: int = 1, **kwargs)
Construct a ‘matrix plot’ of the parameters (or a subset) which displays all 1D and 2D marginal distributions. See the documentation of
inference.plotting.matrix_plot
for a description of other allowed keyword arguments.- Parameters
params – A list of integers specifying the indices of parameters which are to be plotted. If not specified, all parameters are plotted.
burn (int) – Sets the number of samples from the beginning of the chain which are ignored when generating the plot.
thin (int) – Sets the factor by which the sample is ‘thinned’ before generating the plot. If
thin
is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.
- mode() ndarray
Return the sample with the current highest posterior probability.
- Returns
Parameter values as a
numpy.ndarray
.
- plot_diagnostics(show=True, filename=None)
Plot diagnostic traces that give information on how the chain is progressing.
Currently this method plots:
The posterior log-probability as a function of step number, which is useful for checking if the chain has reached a maximum. Any early parts of the chain where the probability is rising rapidly should be removed as burn-in.
The history of changes to the proposal widths for each parameter. Ideally, the proposal widths should converge, and the point in the chain where this occurs is often a good choice for the end of the burn-in. For highly-correlated pdfs, the proposal widths may never fully converge, but in these cases small fluctuations in the width values are acceptable.
- Parameters
show (bool) – If set to True, the plot is displayed.
filename (str) – File path to which the diagnostics plot will be saved. If left unspecified the plot won’t be saved.
- run_for(minutes=0, hours=0, days=0)
Advances the chain for a chosen amount of computation time.
- Parameters
minutes (int) – number of minutes for which to run the chain.
hours (int) – number of hours for which to run the chain.
days (int) – number of days for which to run the chain.
- trace_plot(params: list[int] = None, burn: int = 0, thin: int = 1, **kwargs)
Construct a ‘trace plot’ of the parameters (or a subset) which displays the value of the parameters as a function of step number in the chain. See the documentation of
inference.plotting.trace_plot
for a description of other allowed keyword arguments.- Parameters
params – A list of integers specifying the indices of parameters which are to be plotted. If not specified, all parameters are plotted.
burn (int) – Sets the number of samples from the beginning of the chain which are ignored when generating the plot.
thin (int) – Sets the factor by which the sample is ‘thinned’ before generating the plot. If
thin
is set to some integer value m, then only every m’th sample is used, and the remainder are ignored.