GaussianMixtureModel
lumicks.pylake.GaussianMixtureModel
- class GaussianMixtureModel(data, n_states, init_method, n_init, tol, max_iter)
A wrapper around scikit-learn’s Gaussian mixture model (GMM).
This model accepts a 1D array as training data. State parameters are sorted by state mean, which makes it easier to compare models that have different numbers of states or were trained on different datasets. Because the current implementation is designed specifically for 1D data, model parameters are also returned as 1D arrays (np.squeeze() is applied to the results), so users do not need to handle the shape of the output.
- Parameters
data (array_like) – Data object used for model training.
n_states (int) – The number of Gaussian components in the model.
init_method ('kmeans' or 'random') – The method used to initialize parameters.
n_init (int) – The number of initializations to perform.
tol (float) – The tolerance for training convergence.
max_iter (int) – The maximum number of iterations to perform.
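The ordering by state mean described above can be illustrated with a small standalone sketch. It does not use Pylake or scikit-learn; the function name `sort_states_by_mean` and the raw parameter values are hypothetical, and plain lists stand in for the arrays the class returns.

```python
def sort_states_by_mean(means, variances, weights):
    """Reorder per-state parameters so that state 0 has the lowest mean."""
    # Indices of the states, ordered by increasing mean.
    order = sorted(range(len(means)), key=lambda i: means[i])
    return (
        [means[i] for i in order],
        [variances[i] for i in order],
        [weights[i] for i in order],
    )


# Hypothetical raw fit output: one row per state, one column per feature.
raw_means = [[2.0], [0.5], [1.2]]
raw_variances = [[0.04], [0.01], [0.09]]
raw_weights = [0.3, 0.5, 0.2]

# Flatten the single-feature columns (the role np.squeeze() plays for arrays).
flat_means = [row[0] for row in raw_means]
flat_variances = [row[0] for row in raw_variances]

means, variances, weights = sort_states_by_mean(
    flat_means, flat_variances, raw_weights
)
```

With a consistent ordering, state 0 always refers to the lowest-mean state, so parameters from two separately trained models can be compared index by index.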
- extract_dwell_times(trace, *, exclude_ambiguous_dwells=True)
Calculate lists of dwell times for each state in a time-ordered state path array.
- Parameters
trace (lumicks.pylake.channel.Slice) – Channel data to be analyzed.
exclude_ambiguous_dwells (bool) – Determines whether to exclude dwell times that are not exactly determined. If True, the first and last dwells are not used in the analysis, since the exact start/stop times of these events are not definitively known.
- Returns
dict – Dictionary of all dwell times (in seconds) for each state. Keys are state labels.
dict – Dictionary of slicing indices for all dwell ranges for each state. Keys are state labels.
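As a rough illustration of what dwell-time extraction involves, the sketch below collapses a state path into runs of identical labels and optionally drops the first and last run. It is not the library's implementation: the function `dwell_times`, its `dt` argument (sampling interval in seconds), and the example path are all assumptions made for this example.

```python
from itertools import groupby


def dwell_times(state_path, dt, exclude_ambiguous=True):
    """Return ({state: [durations in s]}, {state: [(start, stop), ...]})."""
    # Collapse the path into runs of identical labels: (state, start, stop).
    runs = []
    start = 0
    for state, group in groupby(state_path):
        length = sum(1 for _ in group)
        runs.append((state, start, start + length))
        start += length

    # The first and last runs are cut off by the edges of the recording,
    # so their true durations are unknown.
    if exclude_ambiguous:
        runs = runs[1:-1]

    times, ranges = {}, {}
    for state, begin, end in runs:
        times.setdefault(state, []).append((end - begin) * dt)
        ranges.setdefault(state, []).append((begin, end))
    return times, ranges


# Hypothetical state path sampled every 0.5 s.
path = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
times, ranges = dwell_times(path, dt=0.5)
all_times, _ = dwell_times(path, dt=0.5, exclude_ambiguous=False)
```

Here the stop index is exclusive, so each `(start, stop)` pair can be used directly as a Python slice into the original data.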
- classmethod from_channel(slc, n_states, init_method='kmeans', n_init=1, tol=0.001, max_iter=100)
Initialize a model from channel data.
- Parameters
slc (Slice) – Channel data used for model training.
n_states (int) – The number of Gaussian components in the model.
init_method ('kmeans' or 'random') – The method used to initialize parameters.
n_init (int) – The number of initializations to perform.
tol (float) – The tolerance for training convergence.
max_iter (int) – The maximum number of iterations to perform.
- hist(trace, n_bins=100, plot_kwargs=None, hist_kwargs=None)
Plot a histogram of the data overlaid with the model PDF.
- label(trace)
Label channel trace data as states.
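One common way to label points with a fitted mixture is to assign each point to the state with the highest weighted density. The sketch below shows that rule with plain Python; it is an assumption about how labeling can work in general, not a description of this method's internals, and `label_points` and its inputs are hypothetical.

```python
import math


def label_points(values, means, stds, weights):
    """Assign each value to the state with the highest weighted density."""
    labels = []
    for v in values:
        # Log of weight * Gaussian density, dropping the constant
        # -0.5 * log(2 * pi) term that is shared by every state.
        scores = [
            math.log(w) - math.log(s) - 0.5 * ((v - m) / s) ** 2
            for m, s, w in zip(means, stds, weights)
        ]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Hypothetical two-state model with equal widths and weights.
labels = label_points(
    [0.1, 0.9, 1.1, -0.2, 0.45],
    means=[0.0, 1.0],
    stds=[0.2, 0.2],
    weights=[0.5, 0.5],
)
```

Because the states are ordered by mean, label 0 corresponds to the lowest-mean state in this sketch.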
- pdf(x)
Probability density function (states as rows).
- Parameters
x (np.array) – Array of independent variable values at which to calculate the PDF.
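The "states as rows" layout can be pictured with the following sketch, which evaluates one Gaussian component per row. It assumes each row is the weighted density of a single state; whether the method applies the weights is not stated here, and `mixture_pdf` and the parameter values are hypothetical.

```python
import math


def mixture_pdf(x, means, stds, weights):
    """Return one row per state; each row holds that state's weighted density."""
    rows = []
    for m, s, w in zip(means, stds, weights):
        scale = w / (s * math.sqrt(2.0 * math.pi))
        rows.append([scale * math.exp(-0.5 * ((xi - m) / s) ** 2) for xi in x])
    return rows


x = [-1.0, 0.0, 1.0, 2.0]
rows = mixture_pdf(x, means=[0.0, 1.0], stds=[1.0, 0.5], weights=[0.4, 0.6])

# Summing over the rows (states) gives the density of the full mixture.
total = [sum(column) for column in zip(*rows)]
```

Keeping the components in separate rows makes it straightforward to plot each state's contribution on its own as well as their sum.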
- plot(trace, trace_kwargs=None, label_kwargs=None)
Plot a time trace with each data point labeled with the state assignment.
- property aic
Akaike Information Criterion.
- property bic
Bayesian Information Criterion.
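Both criteria trade goodness of fit against model complexity, and lower values are preferred. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and assumes a 1D mixture with K states has k = 3K - 1 free parameters (K means, K variances, K - 1 independent weights). The log-likelihood values are made up purely to show the comparison.

```python
import math


def n_free_parameters(n_states):
    """K means + K variances + (K - 1) independent weights for 1D data."""
    return 3 * n_states - 1


def aic(log_likelihood, n_states):
    return 2 * n_free_parameters(n_states) - 2 * log_likelihood


def bic(log_likelihood, n_states, n_samples):
    return n_free_parameters(n_states) * math.log(n_samples) - 2 * log_likelihood


# Hypothetical total log-likelihoods for models with 1, 2, and 3 states,
# all fitted to the same 1000 data points.
candidates = {1: -1500.0, 2: -1380.0, 3: -1378.0}
scores = {k: bic(ll, k, n_samples=1000) for k, ll in candidates.items()}

# The preferred model is the one with the lowest score.
best = min(scores, key=scores.get)
```

In this made-up case the third state improves the likelihood only slightly, so the BIC penalty for the extra parameters outweighs the gain and the two-state model is selected.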
- property exit_flag
Model optimization information.
- property means
Model state means.
- property std
Model state standard deviations.
- property variances
Model state variances.
- property weights
Model state weights.