Density estimation and sample analysis
The inference.pdf module provides tools for analysing sample data, including density estimation and highest-density interval calculation. Example code for GaussianKDE and UnimodalPdf can be found in the density estimation Jupyter notebook demo.
GaussianKDE
- class inference.pdf.GaussianKDE(sample, bandwidth=None, cross_validation=False, max_cv_samples=5000)
Construct a GaussianKDE object, which can be called as a function to return the estimated PDF of the given sample.
GaussianKDE uses Gaussian kernel-density estimation to estimate the PDF associated with a given sample.
- Parameters
sample – 1D array of samples from which to estimate the probability distribution
bandwidth (float) – Width of the Gaussian kernels used for the estimate. If not specified, an appropriate width is estimated based on sample data.
cross_validation (bool) – If True, the bandwidth is estimated by cross-validation instead of the simple ‘rule of thumb’ estimate used by default.
max_cv_samples (int) – The maximum number of samples to be used when estimating the bandwidth via cross-validation. The computational cost scales roughly quadratically with the number of samples used, and can become prohibitive for samples of size in the tens of thousands and up. If the sample size is greater than max_cv_samples, the cross-validation is instead performed on a sub-sample of this size.
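To illustrate why cross-validation cost grows quadratically and why sub-sampling helps, here is a minimal NumPy sketch of bandwidth selection by leave-one-out likelihood. It is not the implementation used by GaussianKDE: the function names, the candidate grid, and the choice of leave-one-out likelihood as the score are all assumptions made for this example.

```python
import numpy as np


def loo_log_likelihood(sample, bandwidth):
    """Mean leave-one-out log-likelihood of a Gaussian KDE (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Pairwise scaled differences form an (n, n) array, hence the quadratic cost.
    z = (x[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Exclude each point from its own density estimate.
    np.fill_diagonal(kernels, 0.0)
    loo_density = kernels.sum(axis=1) / (n - 1)
    # The small offset guards against log(0) for very narrow kernels.
    return float(np.mean(np.log(loo_density + 1e-300)))


def cv_bandwidth(sample, max_cv_samples=5000, n_candidates=40, seed=0):
    """Pick the candidate bandwidth with the best leave-one-out score (illustrative only)."""
    x = np.asarray(sample, dtype=float)
    if x.size > max_cv_samples:
        # Work on a random sub-sample to keep the pairwise array manageable.
        rng = np.random.default_rng(seed)
        x = rng.choice(x, size=max_cv_samples, replace=False)
    # Candidate widths spanning two decades around a simple rule-of-thumb value.
    h0 = 1.06 * x.std(ddof=1) * x.size ** (-0.2)
    candidates = h0 * np.logspace(-1, 1, n_candidates)
    scores = [loo_log_likelihood(x, h) for h in candidates]
    return float(candidates[int(np.argmax(scores))])
```

Each score evaluation builds an n-by-n array, so doubling the sample size roughly quadruples both the time and the memory required.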
- __call__(x_vals)
Evaluate the PDF estimate at a set of given axis positions.
- Parameters
x_vals – axis location(s) at which to evaluate the estimate.
- Returns
values of the PDF estimate at the specified locations.
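As a rough picture of what evaluating a Gaussian kernel-density estimate involves, the sketch below averages one Gaussian kernel per sample point at each requested location. The function name is hypothetical, and the default bandwidth uses Silverman's rule of thumb, which is an assumption: the documentation above does not say which rule GaussianKDE applies.

```python
import numpy as np


def gaussian_kde_pdf(sample, x_vals, bandwidth=None):
    """Evaluate a Gaussian KDE of `sample` at `x_vals` (illustrative sketch)."""
    x = np.asarray(sample, dtype=float)
    pts = np.atleast_1d(np.asarray(x_vals, dtype=float))
    if bandwidth is None:
        # Silverman's rule of thumb (assumed here, not taken from the library).
        q75, q25 = np.percentile(x, [75, 25])
        sigma = min(x.std(ddof=1), (q75 - q25) / 1.34)
        bandwidth = 0.9 * sigma * x.size ** (-0.2)
    # Scaled distance from every evaluation point to every sample point.
    z = (pts[:, None] - x[None, :]) / bandwidth
    # Average of normalised Gaussian kernels centred on the sample points.
    norm = x.size * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm
```

A scalar location is promoted to a one-element array, so the return value is always a 1D array with one density value per location.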
- interval(frac=0.95)
Calculate the highest-density interval(s) which contain a given fraction of total probability.
- Parameters
frac (float) – Fraction of total probability contained by the desired interval(s).
- Returns
A list of tuples which specify the intervals.
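One common way to obtain highest-density intervals from a density estimate is to evaluate it on a uniform grid and keep the highest-density cells until the requested fraction of probability is enclosed. The sketch below follows that approach. It is an assumption about how such a calculation can be done, not a description of the interval method itself, and the function name is hypothetical.

```python
import numpy as np


def hdi_from_grid(x, pdf, frac=0.95):
    """Highest-density interval(s) of a PDF tabulated on a uniform grid (sketch)."""
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Probability per grid cell, renormalised so the cells sum to one.
    prob = pdf * (x[1] - x[0])
    prob = prob / prob.sum()
    # Add cells in order of decreasing density until `frac` is enclosed.
    order = np.argsort(pdf)[::-1]
    n_keep = int(np.searchsorted(np.cumsum(prob[order]), frac)) + 1
    inside = np.zeros(x.size, dtype=bool)
    inside[order[:n_keep]] = True
    # Turn each contiguous run of selected cells into a (lower, upper) tuple.
    padded = np.concatenate(([False], inside, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(float(x[a]), float(x[b - 1])) for a, b in zip(edges[::2], edges[1::2])]
```

For a unimodal density this yields a single tuple, while a multimodal density can yield several, which is why the result is a list.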
- mode
The mode of the PDF, calculated automatically when an instance of GaussianKDE is created.
- plot_summary(filename=None, show=True, label=None)
Plot the estimated PDF along with summary statistics.
- Parameters
filename (str) – Filename to which the plot will be saved. If unspecified, the plot will not be saved.
show (bool) – Boolean value indicating whether the plot should be displayed in a window. (Default is True)
label (str) – Label to be used for the x-axis of the plot.
UnimodalPdf
- class inference.pdf.UnimodalPdf(sample)
Construct a UnimodalPdf object, which can be called as a function to return the estimated PDF of the given sample.
The UnimodalPdf class is designed to robustly estimate univariate, unimodal probability distributions given a sample drawn from that distribution. This is a parametric method based on a heavily modified Student-t distribution, which is extremely flexible.
- Parameters
sample – 1D array of samples from which to estimate the probability distribution
- __call__(x)
Evaluate the PDF estimate at a set of given axis positions.
- Parameters
x – axis location(s) at which to evaluate the estimate.
- Returns
values of the PDF estimate at the specified locations.
- plot_summary(filename=None, show=True, label=None)
Plot the estimated PDF along with summary statistics.
- Parameters
filename (str) – Filename to which the plot will be saved. If unspecified, the plot will not be saved.
show (bool) – Boolean value indicating whether the plot should be displayed in a window. (Default is True)
label (str) – Label to be used for the x-axis of the plot.
sample_hdi
- inference.pdf.sample_hdi(sample: ndarray, fraction: float, allow_double=False)
Estimate the highest-density interval(s) for a given sample.
This function computes the shortest possible interval which contains a chosen fraction of the elements in the given sample.
- Parameters
sample – A sample for which the interval will be determined.
fraction (float) – The fraction of the total probability to be contained by the interval.
allow_double (bool) – When set to True, a double-interval is returned instead if one exists whose total length is meaningfully shorter than the optimal single interval.
- Returns
Tuple(s) specifying the lower and upper bounds of the highest-density interval(s).
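The single-interval case described above, the shortest interval containing a chosen fraction of the sample, can be sketched by sorting the sample and sliding a fixed-size window across it. This is an illustration of the idea rather than the code behind sample_hdi: the function name is hypothetical, and the double-interval search enabled by allow_double is not covered.

```python
import numpy as np


def shortest_interval(sample, fraction):
    """Shortest interval containing `fraction` of the sample elements (sketch)."""
    s = np.sort(np.asarray(sample, dtype=float))
    n = s.size
    # Number of consecutive sorted elements each candidate interval must span.
    m = min(max(int(np.ceil(fraction * n)), 2), n)
    # Width of every window of m consecutive sorted elements.
    widths = s[m - 1:] - s[: n - m + 1]
    # The narrowest window gives the interval bounds.
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + m - 1])
```

Because only the sorted sample is used, no density estimate or bandwidth choice is needed, and the cost is dominated by the sort.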