GaussianMixtureModel

lumicks.pylake.GaussianMixtureModel

class GaussianMixtureModel(data, n_states, init_method, n_init, tol, max_iter)

A wrapper around sklearn.mixture.GaussianMixture.

This model accepts a 1D array as training data. The state parameters are sorted by state mean, which makes it easier to compare models with different numbers of states or models trained on different datasets. Because the current implementation is designed specifically for 1D data, model parameters are also returned as 1D arrays (numpy.squeeze() is applied to the results), so users do not have to be concerned with the shape of the output.
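The sorting and squeezing described above can be illustrated with plain NumPy. This is a minimal sketch and not pylake's actual implementation: the parameter values and variable names are made up, and the 2D and 3D input shapes only mimic what a mixture model fitted to single-feature data might return.

```python
import numpy as np

# Hypothetical fitted parameters for a 3-state model on 1D data,
# in the arbitrary order an optimizer might return them.
raw_means = np.array([[10.2], [1.5], [5.1]])           # shape (n_states, 1)
raw_variances = np.array([[[0.4]], [[0.1]], [[0.9]]])  # shape (n_states, 1, 1)
raw_weights = np.array([0.2, 0.5, 0.3])                # shape (n_states,)

# Order the states by ascending mean so that state 0 is always the lowest level.
order = np.argsort(raw_means.squeeze())

# Squeeze away the singleton dimensions and apply the same ordering to every parameter.
means = raw_means.squeeze()[order]
variances = raw_variances.squeeze()[order]
weights = raw_weights[order]

print(means, variances, weights)
```

Applying one shared ordering to all parameters keeps each state's mean, variance, and weight aligned at the same index.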

Warning

This is early access alpha functionality. While usable, it has not yet been tested in a large number of different scenarios. The API can still be subject to change without any prior deprecation notice! If you use this functionality, keep a close eye on the changelog for any changes that may affect your analysis.

Parameters
  • data (numpy.ndarray) – Data array used for model training.

  • n_states (int) – The number of Gaussian components in the model.

  • init_method ({'kmeans', 'random'}) –

    • “kmeans” : parameters are initialized via the k-means algorithm

    • “random” : parameters are initialized randomly

  • n_init (int) – The number of initializations to perform.

  • tol (float) – The tolerance for training convergence.

  • max_iter (int) – The maximum number of iterations to perform.

extract_dwell_times(trace, *, exclude_ambiguous_dwells=True)

Calculate lists of dwell times for each state in a time-ordered state path array.

Parameters
  • trace (Slice) – Channel data to be analyzed.

  • exclude_ambiguous_dwells (bool) – Determines whether to exclude dwell times that are not exactly determined. If True, the first and last dwells are not used in the analysis, since the exact start/stop times of these events are not definitively known.

Returns

Dictionary of all dwell times (in seconds) for each state. Keys are state labels.

Return type

dict
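To make the dwell-time extraction concrete, here is a minimal NumPy sketch that works on a plain array of state labels. The helper `dwell_times_from_statepath` and its `dt` argument (the sample interval in seconds) are hypothetical names for illustration, not part of pylake. How pylake reports states that have no remaining dwells is not specified here; in this sketch they are simply absent from the dictionary.

```python
import numpy as np

def dwell_times_from_statepath(statepath, dt, exclude_ambiguous_dwells=True):
    """Hypothetical helper: group dwell durations (in seconds) by state label."""
    statepath = np.asarray(statepath)
    if statepath.size == 0:
        return {}

    # Indices where the label changes mark the start of a new dwell.
    change_points = np.flatnonzero(np.diff(statepath)) + 1
    starts = np.concatenate(([0], change_points))
    stops = np.concatenate((change_points, [statepath.size]))

    dwells = list(zip(statepath[starts], (stops - starts) * dt))
    if exclude_ambiguous_dwells:
        # The first and last dwells are cut off by the observation window,
        # so their true durations are unknown.
        dwells = dwells[1:-1]

    result = {}
    for state, duration in dwells:
        result.setdefault(int(state), []).append(float(duration))
    return result

statepath = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0]
dwell_times = dwell_times_from_statepath(statepath, dt=0.5)
all_dwell_times = dwell_times_from_statepath(
    statepath, dt=0.5, exclude_ambiguous_dwells=False
)
print(dwell_times)
print(all_dwell_times)
```

With `exclude_ambiguous_dwells=True`, the leading two-sample dwell and the trailing three-sample dwell in state 0 are dropped, leaving only the dwells whose start and end were both observed.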

classmethod from_channel(slc, n_states, init_method='kmeans', n_init=1, tol=0.001, max_iter=100)

Initialize a model from channel data.

Parameters
  • slc (Slice) – Channel data used for model training.

  • n_states (int) – The number of Gaussian components in the model.

  • init_method ({'kmeans', 'random'}) –

    • “kmeans” : parameters are initialized via the k-means algorithm

    • “random” : parameters are initialized randomly

  • n_init (int) – The number of initializations to perform.

  • tol (float) – The tolerance for training convergence.

  • max_iter (int) – The maximum number of iterations to perform.

hist(trace, n_bins=100, plot_kwargs=None, hist_kwargs=None)

Plot a histogram of the data overlaid with the model PDF.

Parameters
  • trace (Slice) – Data object to histogram.

  • n_bins (int) – Number of histogram bins.

  • plot_kwargs (Optional[dict]) – Plotting keyword arguments passed to the PDF line plot.

  • hist_kwargs (Optional[dict]) – Plotting keyword arguments passed to the histogram plot.

label(trace)

Label channel data as states.

Parameters

trace (Slice) – Channel data to label.
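As an illustration of what state labeling means for a Gaussian mixture, the sketch below assigns each data point to the state with the highest weighted density. This is the usual assignment rule for mixture models and is assumed here for illustration; it is not taken from pylake's source. All parameter values and data points are made up.

```python
import numpy as np

# Hypothetical parameters of a fitted 2-state model.
weights = np.array([0.4, 0.6])
means = np.array([1.0, 8.0])
stds = np.array([0.5, 1.0])

data = np.array([0.8, 1.3, 7.5, 9.1, 1.1, 8.2])

# Log of the weighted Gaussian density for each state and data point,
# shape (n_states, n_points).
log_density = (
    np.log(weights)[:, np.newaxis]
    - np.log(stds * np.sqrt(2.0 * np.pi))[:, np.newaxis]
    - 0.5 * ((data[np.newaxis, :] - means[:, np.newaxis]) / stds[:, np.newaxis]) ** 2
)

# Each point is labeled with the index of its most probable state.
labels = np.argmax(log_density, axis=0)
print(labels)
```

Working in log space avoids numerical underflow for points that lie far from a state's mean.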

pdf(x)

Calculate the probability density function (PDF) at the values in the independent data array x.

Parameters

x (numpy.ndarray) – Array of independent variable values at which to calculate the PDF.

Returns

PDF array split into components for each state with shape (n_states, x.size). The full normalized PDF can be calculated by summing over the state axis (axis 0).

Return type

numpy.ndarray
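The relationship between the per-state components and the full PDF can be sketched with NumPy. The function `component_pdfs` and all parameter values are hypothetical; the sketch assumes each component is already scaled by its state weight, which is what allows the components to sum to a normalized PDF.

```python
import numpy as np

def component_pdfs(x, weights, means, stds):
    """Hypothetical helper: weighted Gaussian density per state, shape (n_states, x.size)."""
    x = np.asarray(x, dtype=float)[np.newaxis, :]            # (1, n_points)
    w = np.asarray(weights, dtype=float)[:, np.newaxis]      # (n_states, 1)
    mu = np.asarray(means, dtype=float)[:, np.newaxis]
    sigma = np.asarray(stds, dtype=float)[:, np.newaxis]
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return w * norm * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

x = np.linspace(-5.0, 15.0, 2001)
components = component_pdfs(x, weights=[0.4, 0.6], means=[1.0, 8.0], stds=[0.5, 1.0])

# Summing over the state axis gives the full mixture PDF.
full_pdf = components.sum(axis=0)

# A simple Riemann sum confirms that the full PDF integrates to roughly 1.
area = np.sum(full_pdf) * (x[1] - x[0])
print(components.shape, area)
```

Keeping the components separate is convenient for plotting each state's contribution on top of a histogram.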

plot(trace, trace_kwargs=None, label_kwargs=None)

Plot a time trace with each data point labeled with the state assignment.

Parameters
  • trace (Slice) – Data object to plot.

  • trace_kwargs (Optional[dict]) – Plotting keyword arguments passed to the data line plot.

  • label_kwargs (Optional[dict]) – Plotting keyword arguments passed to the state labels plot.

property aic: float

Calculates the Akaike Information Criterion:

\[AIC = 2 k - 2 \ln{(L)}\]

Where k refers to the number of parameters and L to the maximized value of the likelihood function.

property bic: float

Calculates the Bayesian Information Criterion:

\[BIC = k \ln{(n)} - 2 \ln{(L)}\]

Where k refers to the number of parameters, n to the number of observations (or data points) and L to the maximized value of the likelihood function.
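A worked example may help when comparing models with the AIC and BIC. The sketch below evaluates both formulas for made-up inputs. The function name and the numbers are hypothetical, and the parameter count is an assumption: a 1D mixture with m states has m means, m variances, and m - 1 free weights, giving k = 3m - 1. How pylake counts parameters internally is not stated here.

```python
import numpy as np

def information_criteria(log_likelihood, n_params, n_obs):
    """Hypothetical helper: return (AIC, BIC) from the total log-likelihood ln(L)."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * np.log(n_obs) - 2 * log_likelihood
    return aic, bic

# Made-up example: a 2-state model fitted to 1000 data points.
n_states = 2
n_params = 3 * n_states - 1  # assumed count: means + variances + free weights

aic, bic = information_criteria(log_likelihood=-1200.0, n_params=n_params, n_obs=1000)
print(aic, bic)
```

For both criteria, lower values indicate a better trade-off between goodness of fit and model complexity. The BIC penalizes extra parameters more strongly than the AIC whenever ln(n) exceeds 2, that is, for n of 8 or more.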

property exit_flag: dict

Model optimization information.

property means: numpy.ndarray

Model state means.

property std: numpy.ndarray

Model state standard deviations.

property variances: numpy.ndarray

Model state variances.

property weights: numpy.ndarray

Model state weights.