GaussianMixtureModel

lumicks.pylake.GaussianMixtureModel

class GaussianMixtureModel(data, n_states, init_method, n_init, tol, max_iter)

A wrapper around sklearn.mixture.GaussianMixture.

This model accepts a 1D array as training data. The state parameters are sorted by state mean to facilitate comparison of models with different numbers of states or models trained on different datasets. As the current implementation is designed specifically to handle 1D data, model parameters are also returned as 1D arrays (numpy.squeeze() is applied to the results), so users do not have to be concerned with the shape of the output.
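The sorting and squeezing described above can be sketched in plain Python. This is an illustration under assumptions, not the pylake implementation: the hypothetical helper `sort_and_squeeze` takes nested lists shaped like sklearn's `means_` (n_states, 1) and `covariances_` (n_states, 1, 1) for 1D data with full covariances, and returns flat lists ordered by ascending mean.

```python
def sort_and_squeeze(means, covariances, weights):
    """Hypothetical helper: order 1D mixture parameters by state mean and flatten them.

    means:       nested like [[m0], [m1], ...]      (shape (n_states, 1))
    covariances: nested like [[[v0]], [[v1]], ...]  (shape (n_states, 1, 1))
    weights:     flat like [w0, w1, ...]            (shape (n_states,))
    """
    # "Squeeze" away the singleton feature dimensions of 1D data
    flat_means = [m[0] for m in means]
    flat_variances = [c[0][0] for c in covariances]
    # Indices of the states in order of ascending mean
    order = sorted(range(len(flat_means)), key=lambda k: flat_means[k])
    return (
        [flat_means[k] for k in order],
        [flat_variances[k] for k in order],
        [weights[k] for k in order],
    )


# A fitted mixture may report its states in arbitrary order
means, variances, weights = sort_and_squeeze(
    [[5.0], [1.0], [3.0]], [[[0.5]], [[0.1]], [[0.2]]], [0.2, 0.5, 0.3]
)
print(means, variances, weights)  # [1.0, 3.0, 5.0] [0.1, 0.2, 0.5] [0.5, 0.3, 0.2]
```

Because state 0 always has the lowest mean after sorting, states from two models can be compared index by index.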

Warning

This is early-access alpha functionality. While usable, it has not yet been tested in a large number of different scenarios. The API can still be subject to change without any prior deprecation notice! If you use this functionality, keep a close eye on the changelog for any changes that may affect your analysis.

Parameters

data (numpy.ndarray) – Data array used for model training.
n_states (int) – The number of Gaussian components in the model.
init_method ({'kmeans', 'random'}) –
- “kmeans”: parameters are initialized via the k-means algorithm
- “random”: parameters are initialized randomly
n_init (int) – The number of initializations to perform.
tol (float) – The tolerance for training convergence.
max_iter (int) – The maximum number of iterations to perform.
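To make n_states, tol and max_iter concrete, here is a minimal expectation-maximization (EM) fit of a 1D Gaussian mixture in plain Python. It is a sketch, not what sklearn or pylake run internally. Initialization uses evenly spaced quantiles of the data instead of the 'kmeans' or 'random' methods, only a single initialization is performed (so n_init has no counterpart), and convergence is declared when the mean per-sample log-likelihood changes by less than tol, which is similar in spirit to sklearn's criterion. The function name `fit_gmm_1d` and the `reg_var` floor are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_states, tol=1e-3, max_iter=100, reg_var=1e-6):
    """Hypothetical EM fit of a 1D Gaussian mixture (sketch, single initialization)."""
    n = len(data)
    ordered = sorted(data)
    # Simplified initialization: means at evenly spaced quantiles of the data,
    # every variance equal to the overall data variance, equal weights.
    means = [ordered[int((k + 0.5) * n / n_states)] for k in range(n_states)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * n_states
    weights = [1.0 / n_states] * n_states

    prev_ll, converged, n_iter = -math.inf, False, 0
    for n_iter in range(1, max_iter + 1):
        # E-step: responsibility of each state for each point, plus the
        # mean log-likelihood per sample under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        ll /= n
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(n_states):
            r_k = [r[k] for r in resp]
            n_k = sum(r_k)
            weights[k] = n_k / n
            means[k] = sum(r * x for r, x in zip(r_k, data)) / n_k
            # reg_var (assumed, in the spirit of sklearn's reg_covar) keeps variances positive
            variances[k] = sum(r * (x - means[k]) ** 2 for r, x in zip(r_k, data)) / n_k + reg_var
        # Stop once the log-likelihood change falls below tol
        if abs(ll - prev_ll) < tol:
            converged = True
            break
        prev_ll = ll

    order = sorted(range(n_states), key=lambda k: means[k])  # sort states by mean
    return {
        "means": [means[k] for k in order],
        "variances": [variances[k] for k in order],
        "weights": [weights[k] for k in order],
        "converged": converged,
        "n_iter": n_iter,
    }


rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
result = fit_gmm_1d(data, n_states=2)
print(result["means"], result["converged"], result["n_iter"])
```

A poor starting point can trap EM in a local optimum, which is why the documented class offers several initialization methods and repeated initializations (n_init).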

extract_dwell_times(trace, *, exclude_ambiguous_dwells=True)

Calculate lists of dwell times for each state in the time-ordered state path of the channel data.

Parameters

trace (Slice) – Channel data to be analyzed.
exclude_ambiguous_dwells (bool) – Determines whether to exclude dwell times that are not exactly determined. If True, the first and last dwells are not used in the analysis, since the exact start/stop times of these events are not definitively known.

Returns

Dictionary of all dwell times (in seconds) for each state. Keys are state labels.

Return type

dict
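The bookkeeping behind dwell-time extraction can be sketched as run-length encoding of a state path. This is an illustration under assumptions rather than pylake's code: the hypothetical `dwell_times_from_path` works on a plain list of state labels with an assumed constant sample interval `dt` (in seconds) and measures each dwell as its number of samples times `dt`, whereas the documented method takes a Slice.

```python
def dwell_times_from_path(states, dt, exclude_ambiguous_dwells=True):
    """Hypothetical: durations of consecutive runs of each state in a state path.

    states: time-ordered state labels, one per sample
    dt:     assumed constant sample interval in seconds
    """
    # Run-length encode the path into [state, number_of_samples] pairs
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    # The first and last dwells may extend beyond the recorded data
    if exclude_ambiguous_dwells:
        runs = runs[1:-1]
    dwells = {s: [] for s in sorted(set(states))}
    for s, n_samples in runs:
        dwells[s].append(n_samples * dt)
    return dwells


path = [0, 0, 1, 1, 1, 0, 1, 1]
print(dwell_times_from_path(path, dt=0.5))  # {0: [0.5], 1: [1.5]}
print(dwell_times_from_path(path, dt=0.5, exclude_ambiguous_dwells=False))
# {0: [1.0, 0.5], 1: [1.5, 1.0]}
```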

classmethod from_channel(slc, n_states, init_method='kmeans', n_init=1, tol=0.001, max_iter=100)

Initialize a model from channel data.

Parameters

slc (Slice) – Channel data used for model training.
n_states (int) – The number of Gaussian components in the model.
init_method ({'kmeans', 'random'}) –
- “kmeans”: parameters are initialized via the k-means algorithm
- “random”: parameters are initialized randomly
n_init (int) – The number of initializations to perform.
tol (float) – The tolerance for training convergence.
max_iter (int) – The maximum number of iterations to perform.

hist(trace, n_bins=100, plot_kwargs=None, hist_kwargs=None)

Plot a histogram of the data overlaid with the model PDF.

Parameters

trace (Slice) – Data object to histogram.
n_bins (int) – Number of histogram bins.
plot_kwargs (Optional[dict]) – Plotting keyword arguments passed to the PDF line plot.
hist_kwargs (Optional[dict]) – Plotting keyword arguments passed to the histogram plot.

label(trace)

Label channel data as states.

Parameters: trace (Slice) – Channel data to label.
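A common labeling rule for a fitted mixture, and the one used by sklearn's GaussianMixture.predict, is to assign each point to the state with the highest posterior probability, i.e. the largest weighted log-density. The sketch below assumes that rule, takes the state parameters as plain lists, and uses the hypothetical name `label_points`.

```python
import math


def label_points(data, means, variances, weights):
    """Hypothetical: assign each point to the state with the largest weighted log-density."""
    labels = []
    for x in data:
        # log(w_k) + log N(x; mu_k, var_k) for every state k
        scores = [
            math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
            for m, v, w in zip(means, variances, weights)
        ]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


labels = label_points([-1.0, 0.2, 4.9, 5.1, 12.0], means=[0.0, 10.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(labels)  # [0, 0, 0, 1, 1]
```

With equal weights and variances the decision boundary is the midpoint between the two means; unequal weights or variances shift it.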

pdf(x)

Calculate the Probability Density Function (PDF) given the independent data array x.

Parameters: x (numpy.ndarray) – Array of independent variable values at which to calculate the PDF.
Returns: PDF array split into components for each state, with shape (n_states, x.size). The full normalized PDF is obtained by summing over the states, i.e. adding the rows together (axis 0).
Return type: numpy.ndarray
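Each row of the returned array can be understood as a state weight times the normal density with that state's mean and variance, so adding the rows gives the full mixture density, which integrates to 1. A plain-Python sketch of that layout, using the hypothetical name `component_pdf` and parameters passed as lists:

```python
import math


def component_pdf(x, means, variances, weights):
    """Hypothetical: per-state weighted normal densities, one row per state."""
    return [
        [w * math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v) for xi in x]
        for m, v, w in zip(means, variances, weights)
    ]


dx = 0.01
x = [-10.0 + dx * i for i in range(3001)]  # grid from -10 to 20
rows = component_pdf(x, means=[0.0, 10.0], variances=[1.0, 1.0], weights=[0.3, 0.7])
full_pdf = [sum(col) for col in zip(*rows)]  # sum over states (axis 0)
print(len(rows), len(rows[0]))  # 2 3001
print(round(sum(full_pdf) * dx, 3))  # 1.0
```

Integrating a single row instead returns that state's weight (0.3 for the first row here).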

plot(trace, trace_kwargs=None, label_kwargs=None)

Plot a time trace with each data point labeled with the state assignment.

Parameters

trace (Slice) – Data object to plot.
trace_kwargs (Optional[dict]) – Plotting keyword arguments passed to the data line plot.
label_kwargs (Optional[dict]) – Plotting keyword arguments passed to the state labels plot.

property aic: float

Calculates the Akaike Information Criterion:

\[AIC = 2 k - 2 \ln{(L)}\]

Where k refers to the number of model parameters and L to the maximized value of the likelihood function.
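A plain-Python sketch of the formula. It assumes that k for a 1D mixture counts n_states means, n_states variances and n_states - 1 independent weights (the weights must sum to 1), which matches how sklearn counts parameters for full covariances in one dimension. The helpers `n_parameters` and `aic` are hypothetical, and `log_likelihood` is the total (not per-sample) maximized log-likelihood.

```python
def n_parameters(n_states):
    """Free parameters of a 1D Gaussian mixture: means, variances and weights minus one."""
    return n_states + n_states + (n_states - 1)


def aic(log_likelihood, n_states):
    """Akaike Information Criterion from the total maximized log-likelihood."""
    return 2 * n_parameters(n_states) - 2 * log_likelihood


print(aic(log_likelihood=-100.0, n_states=2))  # 210.0
```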

property bic: float

Calculates the Bayesian Information Criterion:

\[BIC = k \ln{(n)} - 2 \ln{(L)}\]

Where k refers to the number of model parameters, n to the number of observations (or data points) and L to the maximized value of the likelihood function.
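The BIC can be sketched the same way, under the same assumption about k (hypothetical helpers `n_parameters` and `bic`). Lower values are better, so a typical use is to fit models with several numbers of states and keep the one with the smallest BIC. The log-likelihoods in the example are made-up numbers for illustration only.

```python
import math


def n_parameters(n_states):
    """Free parameters of a 1D Gaussian mixture: means, variances and weights minus one."""
    return 3 * n_states - 1


def bic(log_likelihood, n_states, n_observations):
    """Bayesian Information Criterion from the total maximized log-likelihood."""
    return n_parameters(n_states) * math.log(n_observations) - 2 * log_likelihood


# Illustrative (made-up) total log-likelihoods for models with 1, 2 and 3 states
log_likelihoods = {1: -2500.0, 2: -2100.0, 3: -2095.0}
scores = {k: bic(ll, k, n_observations=1000) for k, ll in log_likelihoods.items()}
best = min(scores, key=scores.get)
print(best)  # 2
```

Because the penalty term grows with ln(n), the BIC penalizes extra states more heavily than the AIC once n exceeds e² (about 7.4 observations). Here the third state improves the log-likelihood by only 5, which does not pay for its 3 extra parameters.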

property exit_flag: dict

Model optimization information.

property means: numpy.ndarray

Model state means.

property std: numpy.ndarray

Model state standard deviations.

property variances: numpy.ndarray

Model state variances.

property weights: numpy.ndarray

Model state weights.