GaussianMixtureModel

lumicks.pylake.GaussianMixtureModel

class GaussianMixtureModel(data, n_states, init_method, n_init, tol, max_iter)

A wrapper around scikit-learn’s Gaussian mixture model (GMM), sklearn.mixture.GaussianMixture.

This model accepts a 1D array as training data. The state parameters are sorted by state mean to make it easier to compare models that have different numbers of states or were trained on different datasets. Because the current implementation is designed specifically for 1D data, model parameters are also returned as 1D arrays (np.squeeze() is applied to the results), so users do not need to handle the shape of the output themselves.

Parameters
  • data (array_like) – Data object used for model training.

  • n_states (int) – The number of Gaussian components in the model.

  • init_method ('kmeans' or 'random') – The method used to initialize parameters.

  • n_init (int) – The number of initializations to perform.

  • tol (float) – The tolerance for training convergence.

  • max_iter (int) – The maximum number of iterations to perform.
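The sorting and squeezing described above can be sketched with NumPy alone. The arrays below are hypothetical stand-ins for fitted parameters in scikit-learn's layout for 1D data (means of shape (n_states, 1), covariances of shape (n_states, 1, 1)); this is an illustration of the idea, not the library's actual code.

```python
import numpy as np

# Hypothetical fitted parameters in scikit-learn's layout for 1D data.
raw_means = np.array([[10.2], [1.5], [5.1]])              # shape (3, 1)
raw_variances = np.array([[[0.30]], [[0.10]], [[0.20]]])  # shape (3, 1, 1)
raw_weights = np.array([0.2, 0.5, 0.3])                   # shape (3,)

# Order the states by ascending mean so that state 0 is always the lowest level.
order = np.argsort(raw_means.squeeze())

# Squeeze away the singleton feature axes and apply the same ordering everywhere.
means = raw_means.squeeze()[order]
variances = raw_variances.squeeze()[order]
weights = raw_weights[order]

print(means)      # [ 1.5  5.1 10.2]
print(variances)  # [0.1 0.2 0.3]
print(weights)    # [0.5 0.3 0.2]
```

Applying one shared ordering to every parameter array keeps the means, variances, and weights of each state aligned after sorting.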

extract_dwell_times(trace, *, exclude_ambiguous_dwells=True)

Calculate lists of dwell times for each state in a time-ordered state path array.

Parameters
  • trace (lumicks.pylake.channel.Slice) – Channel data to be analyzed.

  • exclude_ambiguous_dwells (bool) – Determines whether to exclude dwell times that are not exactly determined. If True, the first and last dwells are not used in the analysis, since the exact start/stop times of these events are not definitively known.

Returns

  • dict – Dictionary of all dwell times (in seconds) for each state. Keys are state labels.

  • dict – Dictionary of slicing indices for all dwell ranges for each state. Keys are state labels.
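To illustrate the dwell-time idea, here is a minimal sketch that works on a plain, non-empty state path array rather than a Slice. The function name dwell_times_from_state_path and the dt argument (sampling interval in seconds) are hypothetical and not part of the library; the real method may differ in details such as how states without dwells are reported.

```python
import numpy as np

def dwell_times_from_state_path(state_path, dt, exclude_ambiguous_dwells=True):
    """Return ({state: [dwell times in s]}, {state: [(start, stop), ...]})."""
    state_path = np.asarray(state_path)

    # A dwell boundary sits wherever two consecutive labels differ.
    change_points = np.flatnonzero(np.diff(state_path)) + 1
    starts = np.concatenate(([0], change_points))
    stops = np.concatenate((change_points, [len(state_path)]))
    ranges = list(zip(starts.tolist(), stops.tolist()))

    # The first and last dwells are cut off by the edges of the recording,
    # so their true durations are unknown.
    if exclude_ambiguous_dwells:
        ranges = ranges[1:-1]

    dwell_times, dwell_ranges = {}, {}
    for start, stop in ranges:
        state = int(state_path[start])
        dwell_times.setdefault(state, []).append((stop - start) * dt)
        dwell_ranges.setdefault(state, []).append((start, stop))
    return dwell_times, dwell_ranges

path = [0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 1]
times, ranges = dwell_times_from_state_path(path, dt=0.5)
print(times)   # {1: [1.5], 0: [1.0], 2: [2.0]}
print(ranges)  # {1: [(2, 5)], 0: [(5, 7)], 2: [(7, 11)]}
```

Each (start, stop) pair is a half-open slicing range, so path[start:stop] returns exactly the samples belonging to that dwell.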

classmethod from_channel(slc, n_states, init_method='kmeans', n_init=1, tol=0.001, max_iter=100)

Initialize a model from channel data.

Parameters
  • slc (Slice) – Channel data used for model training.

  • n_states (int) – The number of Gaussian components in the model.

  • init_method ('kmeans' or 'random') – The method used to initialize parameters.

  • n_init (int) – The number of initializations to perform.

  • tol (float) – The tolerance for training convergence.

  • max_iter (int) – The maximum number of iterations to perform.

hist(trace, n_bins=100, plot_kwargs=None, hist_kwargs=None)

Plot a histogram of the data overlaid with the model PDF.

Parameters
  • trace (Slice-like) – Data object to histogram.

  • n_bins (int) – Number of histogram bins.

  • plot_kwargs (Optional[dict]) – Plotting keyword arguments passed to the PDF line plot.

  • hist_kwargs (Optional[dict]) – Plotting keyword arguments passed to the histogram plot.

label(trace)

Assign a state label to each data point in a channel trace.
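As a rough sketch of what state labeling means for a Gaussian mixture, the snippet below assigns each point to the state with the largest weighted log-density, which is the rule scikit-learn's GaussianMixture.predict uses. The function label_states and its arguments are hypothetical; it takes plain arrays instead of a Slice and is not the library's implementation.

```python
import numpy as np

def label_states(values, weights, means, variances):
    """Return the index of the most probable state for each value."""
    values = np.asarray(values, dtype=float)
    # Column vectors so that broadcasting yields shape (n_states, n_points).
    w = np.asarray(weights, dtype=float)[:, np.newaxis]
    m = np.asarray(means, dtype=float)[:, np.newaxis]
    v = np.asarray(variances, dtype=float)[:, np.newaxis]

    # Weighted Gaussian log-density of every point under every state.
    log_prob = np.log(w) - 0.5 * np.log(2.0 * np.pi * v) - 0.5 * (values - m) ** 2 / v
    return np.argmax(log_prob, axis=0)

labels = label_states(
    [0.1, 9.7, 4.8, -0.3, 10.4],
    weights=[0.5, 0.5],
    means=[0.0, 10.0],
    variances=[1.0, 1.0],
)
print(labels)  # [0 1 0 0 1]
```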

pdf(x)

Probability density function (states as rows).

Parameters

x (np.array) – Array of independent variable values at which to calculate the PDF.
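The "states as rows" layout can be sketched as follows. This assumes that each row holds one component's weighted Gaussian density, so that summing over rows gives the full mixture PDF; that weighting is an assumption about the method, and mixture_pdf is a hypothetical helper, not a library function.

```python
import numpy as np

def mixture_pdf(x, weights, means, variances):
    """Return an array of shape (n_states, len(x)): one weighted Gaussian per row."""
    x = np.asarray(x, dtype=float)
    # Column vectors so that each state broadcasts along its own row.
    w = np.asarray(weights, dtype=float)[:, np.newaxis]
    m = np.asarray(means, dtype=float)[:, np.newaxis]
    v = np.asarray(variances, dtype=float)[:, np.newaxis]

    norm = 1.0 / np.sqrt(2.0 * np.pi * v)
    return w * norm * np.exp(-0.5 * (x - m) ** 2 / v)

x = np.linspace(-5.0, 15.0, 2001)
components = mixture_pdf(x, weights=[0.4, 0.6], means=[0.0, 10.0], variances=[1.0, 1.0])
total = components.sum(axis=0)  # full mixture density

print(components.shape)  # (2, 2001)
```

Keeping the components on separate rows makes it straightforward to plot each state's contribution individually or to sum them for the overall density.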

plot(trace, trace_kwargs=None, label_kwargs=None)

Plot a time trace with each data point labeled with the state assignment.

Parameters
  • trace (Slice-like) – Data object to plot.

  • trace_kwargs (dict) – Plotting keyword arguments passed to the data line plot.

  • label_kwargs (dict) – Plotting keyword arguments passed to the state labels plot.

property aic

Akaike Information Criterion.

property bic

Bayesian Information Criterion.
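For reference, the standard definitions of the two criteria are sketched below for a 1D mixture. The parameter count (one mean and one variance per state, plus n_states - 1 free weights) is an assumption about how the free parameters are counted, and information_criteria is a hypothetical helper. The log_likelihood argument is the total log-likelihood of the data, not a per-sample average.

```python
import math

def information_criteria(log_likelihood, n_states, n_samples):
    """Return (AIC, BIC) for a 1D Gaussian mixture with n_states components."""
    # Assumed parameter count: n_states means + n_states variances
    # + (n_states - 1) independent weights, since the weights sum to 1.
    n_params = 3 * n_states - 1
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_samples) - 2 * log_likelihood
    return aic, bic

aic, bic = information_criteria(log_likelihood=-1234.5, n_states=2, n_samples=1000)
print(aic)  # 2479.0
```

Lower values indicate a better trade-off between goodness of fit and model complexity, so both criteria can be compared across models with different numbers of states fitted to the same data. BIC penalizes extra parameters more strongly than AIC once n_samples exceeds about 7.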

property exit_flag

Model optimization information.

property means

Model state means.

property std

Model state standard deviations.

property variances

Model state variances.

property weights

Model state weights.