Density estimation and sample analysis
The inference.pdf module provides tools for analysing sample data, including density estimation and highest-density interval calculation. Example code for GaussianKDE and UnimodalPdf can be found in the density estimation Jupyter notebook demo.
GaussianKDE
- class inference.pdf.GaussianKDE(sample: ndarray, bandwidth=None, cross_validation=False, max_cv_samples=5000)
Construct a GaussianKDE object, which can be called as a function to return the estimated PDF of the given sample.
GaussianKDE uses Gaussian kernel-density estimation to estimate the PDF associated with a given sample.
- Parameters
sample – 1D array of samples from which to estimate the probability distribution.
bandwidth (float) – Width of the Gaussian kernels used for the estimate. If not specified, an appropriate width is estimated from the sample data.
cross_validation (bool) – Whether cross-validation should be used to estimate the bandwidth in place of the simple ‘rule of thumb’ estimate used by default.
max_cv_samples (int) – The maximum number of samples to be used when estimating the bandwidth via cross-validation. The computational cost scales roughly quadratically with the number of samples used, and can become prohibitive for samples of size in the tens of thousands and up. If the sample size is greater than max_cv_samples, the cross-validation is instead performed on a sub-sample of this size.
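The two bandwidth strategies described above can be illustrated with a small standard-library sketch. It is not the library's implementation: the source does not say which rule of thumb or which cross-validation score GaussianKDE uses, so Silverman's rule and a leave-one-out log-likelihood are assumed here, and the names rule_of_thumb_bandwidth, loo_log_likelihood and cv_bandwidth are hypothetical. The default max_cv_samples is also reduced so that the pure-Python loops stay fast.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(sample):
    """Silverman's rule of thumb (an assumed choice, not confirmed by the docs)."""
    n = len(sample)
    sigma = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def loo_log_likelihood(sample, bandwidth):
    """Leave-one-out log-likelihood of a Gaussian KDE.

    Every point is scored against all the others, so the cost is O(n^2).
    """
    n = len(sample)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        dens = sum(
            math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
            for j, xj in enumerate(sample)
            if j != i
        )
        # The tiny offset guards against log(0) if the density underflows.
        total += math.log(norm * dens + 1e-300)
    return total


def cv_bandwidth(sample, max_cv_samples=500, seed=0):
    """Pick the candidate bandwidth with the best leave-one-out score."""
    sample = list(sample)
    if len(sample) > max_cv_samples:
        # Cap the quadratic cost by scoring a sub-sample only.
        sample = random.Random(seed).sample(sample, max_cv_samples)
    h0 = rule_of_thumb_bandwidth(sample)
    # Simple geometric grid of candidates around the rule-of-thumb value.
    candidates = [h0 * 1.25 ** k for k in range(-6, 7)]
    return max(candidates, key=lambda h: loo_log_likelihood(sample, h))
```

Doubling the number of samples scored roughly quadruples the work in loo_log_likelihood, which is why a cap such as max_cv_samples is useful.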
- __call__(x: ndarray) → ndarray
Evaluate the estimate of the probability density function (PDF) at the given parameter values.
- Parameters
x – axis location(s) at which to evaluate the estimate.
- Returns
values of the PDF estimate at the specified locations.
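As a rough illustration of what evaluating a Gaussian kernel-density estimate involves, the sketch below averages one Gaussian kernel per sample point at each requested location. The function gaussian_kde_pdf is hypothetical and uses plain lists instead of ndarray; it shows the general technique only and says nothing about how the library computes it.

```python
import math


def gaussian_kde_pdf(sample, bandwidth, x):
    """Evaluate a Gaussian KDE built from `sample` at each location in `x`."""
    # Each kernel is a normal density of width `bandwidth`; dividing by the
    # number of samples makes the whole estimate integrate to one.
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((xi - s) / bandwidth) ** 2) for s in sample)
        for xi in x
    ]


# Usage: three sample points, evaluated at five axis locations.
estimate = gaussian_kde_pdf([-1.0, 0.5, 2.0], 0.7, [-2.0, -1.0, 0.0, 1.0, 2.0])
```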
- interval(fraction: float) → tuple[float, float]
Calculates the ‘highest-density interval’, the shortest single interval which contains a chosen fraction of the total probability.
- Parameters
fraction – Fraction of the total probability contained by the interval. The given value must be between 0 and 1.
- Returns
A tuple of the lower and upper limits of the highest-density interval in the form (lower_limit, upper_limit).
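One way to find the shortest single interval holding a given fraction of the probability is to tabulate the PDF on a grid, integrate it, and scan candidate lower limits. The sketch below does this under stated assumptions: hdi_from_pdf, its grid limits and n_grid are all hypothetical, and the approach is not claimed to match what interval does internally.

```python
import math


def hdi_from_pdf(pdf, lower, upper, fraction, n_grid=2001):
    """Shortest grid interval on [lower, upper] containing `fraction` of the mass."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + i * step for i in range(n_grid)]
    dens = [pdf(x) for x in grid]

    # Cumulative probability via the trapezoid rule, normalised to end at 1.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    best = (grid[0], grid[-1])
    j = 0
    for i in range(n_grid):
        # Advance the upper limit until the interval holds enough probability.
        while j < n_grid and cdf[j] - cdf[i] < fraction:
            j += 1
        if j == n_grid:
            break  # no interval starting at or beyond grid[i] is wide enough
        if grid[j] - grid[i] < best[1] - best[0]:
            best = (grid[i], grid[j])
    return best


def std_normal(x):
    """Standard normal density, used only as a test case."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Usage: the 95% interval of a standard normal is close to (-1.96, 1.96).
limits = hdi_from_pdf(std_normal, -6.0, 6.0, 0.95)
```

The result is only as precise as the grid spacing, and near the optimum the interval length changes slowly, so the limits can shift by a few grid steps.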
- plot_summary(filename=None, show=True, label=None)
Plot the estimated PDF along with summary statistics.
- Parameters
filename (str) – Filename to which the plot will be saved. If unspecified, the plot will not be saved.
show (bool) – Whether the plot should be displayed in a window (default is True).
label (str) – The label to be used for the x-axis of the plot.
UnimodalPdf
- class inference.pdf.UnimodalPdf(sample: ndarray)
Construct a UnimodalPdf object, which can be called as a function to return the estimated PDF of the given sample.
The UnimodalPdf class is designed to robustly estimate univariate, unimodal probability distributions given a sample drawn from that distribution. This is a parametric method based on a heavily modified, highly flexible Student-t distribution.
- Parameters
sample – 1D array of samples from which to estimate the probability distribution.
- __call__(x: ndarray) → ndarray
Evaluate the PDF estimate at a set of given axis positions.
- Parameters
x – axis location(s) at which to evaluate the estimate.
- Returns
values of the PDF estimate at the specified locations.
- interval(fraction: float) → tuple[float, float]
Calculates the ‘highest-density interval’, the shortest single interval which contains a chosen fraction of the total probability.
- Parameters
fraction – Fraction of the total probability contained by the interval. The given value must be between 0 and 1.
- Returns
A tuple of the lower and upper limits of the highest-density interval in the form (lower_limit, upper_limit).
- plot_summary(filename=None, show=True, label=None)
Plot the estimated PDF along with summary statistics.
- Parameters
filename (str) – Filename to which the plot will be saved. If unspecified, the plot will not be saved.
show (bool) – Whether the plot should be displayed in a window (default is True).
label (str) – The label to be used for the x-axis of the plot.
sample_hdi
- inference.pdf.sample_hdi(sample: ndarray, fraction: float, allow_double=False)
Estimate the highest-density interval(s) for a given sample.
This function computes the shortest possible interval which contains a chosen fraction of the elements in the given sample.
- Parameters
sample – A sample for which the interval will be determined.
fraction (float) – The fraction of the total probability to be contained by the interval.
allow_double (bool) – When set to True, a double-interval is returned instead if one exists whose total length is meaningfully shorter than the optimal single interval.
- Returns
Tuple(s) specifying the lower and upper bounds of the highest-density interval(s).
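The single-interval case can be sketched with a sliding window over the sorted sample: every window of k consecutive order statistics contains the requested fraction of the elements, and the narrowest one is kept. The function shortest_interval below is a hypothetical illustration of that idea, not the code behind sample_hdi, and it does not attempt the allow_double behaviour.

```python
import math


def shortest_interval(sample, fraction):
    """Shortest interval containing at least `fraction` of the sample elements."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Number of points the interval must contain (at least two where possible).
    k = min(n, max(2, math.ceil(fraction * n)))
    # Slide a window of k consecutive sorted values and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Usage: half of these ten values sit in the tight cluster from 10 to 14.
limits = shortest_interval([0, 1, 2, 3, 10, 11, 12, 13, 14, 100], 0.5)
```

Sorting dominates the cost, so this runs in O(n log n) time for a sample of n elements.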