FdFit

lumicks.pylake.FdFit

class FdFit(*models)

Object used for fitting: a collection of models and their associated data. Once data is loaded, the fit object contains the model parameters, which can be fitted by calling fit().

Examples

from lumicks import pylake

dna_model = pylake.ewlc_odijk_force("DNA")
fit = pylake.FdFit(dna_model)
# force and distance are placeholders for your own array_like force and distance data
data = fit.add_data("Dataset 1", force, distance)

fit["DNA/Lp"].lower_bound = 35  # Set lower bound for DNA Lp
fit["DNA/Lp"].upper_bound = 80  # Set upper bound for DNA Lp
fit.fit()

fit.plot("Dataset 1", "k--")  # Plot the fitted model

add_data(name, f, d, params=None)

Adds a data set to this fit.

Parameters
  • name (str) – Name of this data set.

  • f (array_like) – An array_like containing force data.

  • d (array_like) – An array_like containing distance data.

  • params (Optional[dict of {str : str or int}]) – Dictionary of parameter transformations for this specific data set. Each entry maps a model parameter either to a new parameter name or to a constant (see the examples).

Examples

dna_model = pylake.ewlc_odijk_force("DNA")  # Use an inverted Odijk eWLC model.
fit = pylake.FdFit(dna_model)

fit.add_data("Data1", force1, distance1)  # Load the first data set
fit.add_data("Data2", force2, distance2, params={"DNA/Lc": "DNA/Lc_RecA"})  # Fit a separate contour length for this data set
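One way to picture the params transformations is as a per-data-set lookup: every model parameter either refers to a shared fit parameter (possibly under a new name) or is replaced by a constant. The sketch below is a hypothetical plain-Python illustration of that idea, not pylake's implementation; the helper resolve_params and all parameter values are made up for the example.

```python
# Hypothetical sketch: resolve the parameter values a model sees for one data set.
# resolve_params and the numbers below are illustrative only, not pylake API.


def resolve_params(model_param_names, global_params, transformations=None):
    """Return {model parameter name: value} for a single data set.

    transformations maps a model parameter name either to the name of another
    (global) fit parameter, or to a numeric constant.
    """
    transformations = transformations or {}
    resolved = {}
    for name in model_param_names:
        target = transformations.get(name, name)  # default: the shared parameter
        if isinstance(target, str):
            resolved[name] = global_params[target]  # shared or renamed parameter
        else:
            resolved[name] = float(target)  # constant for this data set only
    return resolved


model_param_names = ["DNA/Lp", "DNA/Lc", "DNA/St"]
# Arbitrary illustrative values for the fit's global parameters.
global_params = {"DNA/Lp": 50.0, "DNA/Lc": 16.5, "DNA/Lc_RecA": 24.0, "DNA/St": 1500.0}

data1 = resolve_params(model_param_names, global_params)
data2 = resolve_params(model_param_names, global_params, {"DNA/Lc": "DNA/Lc_RecA"})
data3 = resolve_params(model_param_names, global_params, {"DNA/St": 1200})
```

In this picture, data2 shares DNA/Lp and DNA/St with data1 but uses its own contour length, and data3 holds the stretch modulus constant.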

fit(show_fit=False, **kwargs)

Fit the model(s) to the loaded data by optimizing the free parameters.

Parameters

show_fit (bool) – Show the fitting procedure as it is progressing.

Raises
  • ValueError – If the initial parameters are outside the parameter bounds.

  • RuntimeError – If this Fit has no data associated with it.

  • RuntimeError – If this Fit has no free parameters.
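The three error conditions above amount to simple pre-flight checks. The sketch below is a hypothetical illustration using plain dicts; the data layout, the check order, and the choice to check bounds only for free parameters are assumptions, not pylake's internals.

```python
# Hypothetical sketch of the checks listed under "Raises" for fit().
# The data layout (plain dicts) and the check order are assumptions for
# illustration only; this is not pylake's internal representation.


def check_fit_preconditions(params, datasets):
    """Raise the documented errors before any optimization is attempted.

    params maps a parameter name to a dict with the keys "value",
    "lower_bound", "upper_bound" and "fixed"; datasets is a list of loaded
    data sets.
    """
    if not datasets:
        raise RuntimeError("This Fit has no data associated with it.")

    free = {name: p for name, p in params.items() if not p["fixed"]}
    if not free:
        raise RuntimeError("This Fit has no free parameters.")

    # Assumption: only free parameters need to start inside their bounds.
    for name, p in free.items():
        if not p["lower_bound"] <= p["value"] <= p["upper_bound"]:
            raise ValueError(f"Initial value of {name} is outside its bounds.")
```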

log_likelihood(params=None, sigma=None)

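Returns the log-likelihood ln(L) of the fit. The model residual, chi squared, is related to it by

\[\chi^2 = -2 \ln{(L)}\]

up to an additive constant that does not depend on the model parameters.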
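For independent Gaussian errors with known standard deviations sigma_i, ln(L) = -(n/2) ln(2π) - Σ ln(sigma_i) - χ²/2, so -2 ln(L) and χ² differ only by a term that depends on n and sigma. The minimal numpy sketch below checks this numerically; it assumes a Gaussian noise model, and the function names and residual values are illustrative, not pylake's.

```python
import numpy as np


def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of independent Gaussian residuals with standard deviations sigma."""
    residuals = np.asarray(residuals, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), residuals.shape)
    n = residuals.size
    return (
        -0.5 * n * np.log(2 * np.pi)
        - np.sum(np.log(sigma))
        - 0.5 * np.sum((residuals / sigma) ** 2)
    )


def chi_squared(residuals, sigma):
    """Sum of squared, sigma-weighted residuals."""
    return np.sum((np.asarray(residuals, dtype=float) / sigma) ** 2)


sigma = 0.1
res_a = np.array([0.05, -0.12, 0.08, 0.01])  # illustrative residuals
res_b = np.array([0.20, -0.15, 0.30, -0.05])

# The offset between -2 ln(L) and chi squared depends only on n and sigma,
# so minimizing chi squared maximizes the likelihood.
offset_a = -2 * gaussian_log_likelihood(res_a, sigma) - chi_squared(res_a, sigma)
offset_b = -2 * gaussian_log_likelihood(res_b, sigma) - chi_squared(res_b, sigma)
```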

plot(data=None, fmt='', independent=None, legend=True, plot_data=True, overrides=None, **kwargs)

Plot model and data

Parameters
  • data (str) – Name of the data set to plot (optional; if omitted, all data sets for the selected model are plotted).

  • fmt (str) – Format string, forwarded to matplotlib.pyplot.plot().

  • independent (array_like) – Array with values for the independent variable (used when plotting the model).

  • legend (bool) – Show legend.

  • plot_data (bool) – Show data.

  • overrides (dict) – Parameter/value pairs that override parameter values of the current fit when plotting, given as a dict of {str: float}.

  • **kwargs – Forwarded to matplotlib.pyplot.plot().

Raises
  • KeyError – If the argument data is supplied but no dataset with that name is found in the Fit.

  • RuntimeError – If the Fit uses multiple models, but no Model is selected beforehand.

Examples

import numpy as np
from lumicks import pylake

model = pylake.ewlc_odijk_force("DNA")
fit = pylake.FdFit(model)
fit.add_data("Control", force, distance)  # force, distance: placeholder array_like data
fit.fit()

# Basic plotting of one data set over a custom range can be done by just invoking plot.
fit.plot("Control", "k--", independent=np.arange(2.0, 5.0, 0.01))

# Have a quick look at what a stiffness of 5 would do to the fit.
fit.plot("Control", overrides={"DNA/St": 5})

# When dealing with multiple models in one fit, the model has to be selected first
# (with fit[model]) before adding data or plotting.
model1 = pylake.ewlc_odijk_distance("DNA")
model2 = pylake.ewlc_odijk_distance("DNA") + pylake.ewlc_odijk_distance("protein")
fit = pylake.FdFit(model1, model2)
fit[model1].add_data("Control", force1, distance1)
fit[model2].add_data("Control", force1, distance1)
fit.fit()

fit[model1].plot("Control")  # Plots data set Control for model 1
fit[model2].plot("Control")  # Plots data set Control for model 2

profile_likelihood(parameter_name, min_step=0.0001, max_step=1.0, num_steps=100, step_factor=2.0, min_chi2_step=0.05, max_chi2_step=0.25, termination_significance=0.99, confidence_level=0.95, verbose=False)

Calculate a profile likelihood. This method traces an optimal path through parameter space in order to estimate parameter confidence intervals. It iteratively takes a step in the profiled parameter, fixes that parameter, and re-optimizes all other parameters [4, 5].
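The core idea can be shown on a problem where re-optimization is trivial. The sketch below is a hypothetical toy example, not pylake's implementation: it profiles the slope of a straight-line fit on a fixed grid (pylake takes adaptive steps instead), re-optimizes the intercept in closed form for each slope, and keeps the region where the chi squared increase stays below the 0.95 quantile of a chi squared distribution with one degree of freedom. The data are simulated and all names are made up.

```python
from statistics import NormalDist

import numpy as np

# Toy problem (assumption): straight line y = a * x + b with known noise sigma.
# For a fixed slope a, the best intercept b has a closed form, so the
# "re-optimize all other parameters" step needs no numerical optimizer here.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, x.size)


def chi2_profile(a):
    """Chi squared with the slope fixed at a and the intercept re-optimized."""
    b_opt = np.mean(y - a * x)  # least-squares intercept for this fixed slope
    return np.sum(((y - a * x - b_opt) / sigma) ** 2)


# Scan the profiled parameter on a fixed grid.
a_grid = np.linspace(1.8, 2.2, 401)
profile = np.array([chi2_profile(a) for a in a_grid])
delta_chi2 = profile - profile.min()

# The chi squared quantile with 1 degree of freedom equals the squared
# two-sided standard normal quantile.
confidence_level = 0.95
threshold = NormalDist().inv_cdf((1 + confidence_level) / 2) ** 2  # about 3.84
inside = a_grid[delta_chi2 <= threshold]
interval = (inside.min(), inside.max())
```

For this linear model the profile is exactly quadratic, so the interval matches the familiar estimate plus or minus 1.96 standard errors; the profile approach becomes useful when the model is nonlinear and the profile is not quadratic.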

Parameters
  • parameter_name (str) – Which parameter to evaluate a profile likelihood for.

  • min_step (float) – Minimum step size. This is multiplied by the current parameter value to obtain the minimum step size used in the step-size estimation procedure.

  • max_step (float) – Maximum step size.

  • num_steps (integer) – Number of steps to take.

  • step_factor (float) – Factor by which the step size is changed when it is too large or too small.

  • min_chi2_step (float) – Minimal desired step in terms of chi squared change prior to re-optimization. When a step results in a chi squared change smaller than this threshold, the step size is increased.

  • max_chi2_step (float) – Maximal desired step in terms of chi squared change prior to re-optimization. When a step results in a chi squared change larger than this threshold, the step size is reduced.

  • termination_significance (float) – Significance level for terminating the parameter scan. Scanning stops once the change in fit quality exceeds the termination_significance confidence level.

  • confidence_level (float) – Confidence level for the chi squared test used to determine the confidence interval.

  • verbose (bool) – Controls the verbosity of the output.
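The step-size parameters describe a feedback rule: grow the step when chi squared barely changes and shrink it when chi squared changes too much. The function below is a hypothetical sketch of such a rule using the documented defaults; pylake's actual update logic is not shown in this section and may differ. Treating max_step as relative to the parameter value, and falling back to a scale of 1 when the parameter is zero, are assumptions.

```python
# Hypothetical sketch of step-size control driven by min_step, max_step,
# step_factor, min_chi2_step and max_chi2_step. Not pylake's implementation.


def adapt_step(step, delta_chi2, param_value, min_step=1e-4, max_step=1.0,
               step_factor=2.0, min_chi2_step=0.05, max_chi2_step=0.25):
    """Return the next step size, given the chi squared change of the last step.

    delta_chi2 is the change in chi squared caused by the trial step, measured
    before the other parameters are re-optimized.
    """
    if delta_chi2 < min_chi2_step:
        step *= step_factor  # fit barely changed: take bigger steps
    elif delta_chi2 > max_chi2_step:
        step /= step_factor  # fit changed too much: take smaller steps

    # Clamp to the allowed range. min_step is relative to the parameter value,
    # as documented; treating max_step as relative too is an assumption.
    scale = abs(param_value) or 1.0  # assumption: absolute steps when value is 0
    return min(max(step, min_step * scale), max_step * scale)
```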

Raises
  • KeyError – If parameter_name is not present in the Fit.

  • RuntimeError – If parameter_name is a fixed parameter in the Fit.

  • ValueError – If max_step < min_step.

  • ValueError – If max_chi2_step < min_chi2_step.

References

[4] Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U., & Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15), 1923-1929.

[5] Maiwald, T., Hass, H., Steiert, B., Vanlier, J., Engesser, R., Raue, A., Kipkeew, F., Bock, H. H., Kaschek, D., Kreutz, C., & Timmer, J. (2016). Driving the model to its limit: profile likelihood based model reduction. PLoS ONE, 11(9).

update_params(other)

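Sets the parameters of this fit from other, for every parameter of other that is also present in this fit.

Parameters

other (Fit or Params) – Fit or parameter set to take the parameters from.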
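The "only if present in the target" rule can be illustrated with plain dicts. This is a hypothetical sketch of the semantics, not pylake's code: it copies plain values only, and whether pylake also transfers bounds or fixed flags is not covered by this section.

```python
# Hypothetical sketch of update_params semantics using plain dicts:
# only parameters that already exist in the target are overwritten; extra
# parameters in the source are ignored.


def update_params(target, other):
    """Copy values from other into target for names present in both."""
    for name, value in other.items():
        if name in target:
            target[name] = value
    return target


fit_params = {"DNA/Lp": 40.0, "DNA/Lc": 16.0}  # illustrative values
previous_fit = {"DNA/Lp": 48.5, "DNA/Lc": 16.4, "protein/Lc": 0.1}
update_params(fit_params, previous_fit)
```

A typical use of this pattern is seeding a new fit with the results of an earlier, related fit.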

property aic

Calculates the Akaike Information Criterion:

\[AIC = 2 k - 2 \ln{(L)}\]

Here, k is the number of parameters and L is the maximized value of the likelihood function [6].

The emphasis of this criterion is future prediction. It does not lead to consistent model selection and is more prone to over-fitting than the Bayesian Information Criterion.
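As a worked example of the formula, the sketch below compares two candidate models from their maximized log-likelihoods; the lower AIC is preferred. The log-likelihood values are made up for illustration.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


# Illustrative (made-up) maximized log-likelihoods for two nested models.
simple = aic(log_likelihood=-120.0, k=3)    # 6 + 240 = 246
complex_ = aic(log_likelihood=-118.5, k=5)  # 10 + 237 = 247
best = "simple" if simple < complex_ else "complex"
```

Here the two extra parameters improve ln(L) by 1.5, which lowers -2 ln(L) by 3 but adds 4 to the penalty, so the simpler model wins.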

References

[6] Cavanaugh, J. E. (1997). Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters, 33(2), 201-208.

property aicc

Calculates the Corrected Akaike Information Criterion:

\[AICc = AIC + \frac{2 k^2 + 2 k}{n - k - 1}\]

Here, k is the number of parameters, n is the number of observations (data points), and AIC = 2k - 2 ln(L), where L is the maximized value of the likelihood function [7].

The emphasis of this criterion is future prediction. Compared to the AIC, it penalizes additional parameters more strongly for small sample sizes and is therefore less prone to over-fitting. Like the AIC, it does not lead to a consistent model selection procedure.
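The correction term matters mostly when n is not much larger than k. The sketch below evaluates it for the same (made-up) log-likelihood at two sample sizes; the guard against n <= k + 1 is an added assumption, since the formula is undefined there.

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC: AIC + (2k^2 + 2k) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1 observations.")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k**2 + 2 * k) / (n - k - 1)


plain_aic = 2 * 5 - 2 * -118.5  # 247.0, illustrative value
small = aicc(-118.5, k=5, n=20) - plain_aic    # 60 / 14, about 4.29
large = aicc(-118.5, k=5, n=2000) - plain_aic  # 60 / 1994, about 0.03
```

As n grows, the correction vanishes and the AICc approaches the AIC.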

References

[7] Cavanaugh, J. E. (1997). Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters, 33(2), 201-208.

property bic

Calculates the Bayesian Information Criterion:

\[BIC = k \ln{(n)} - 2 \ln{(L)}\]

Here, k is the number of parameters, n is the number of observations (data points), and L is the maximized value of the likelihood function.

The emphasis of the BIC is on parsimonious models, so it is less prone to over-fitting. Selection via BIC is consistent: as the number of data points tends to infinity, BIC selects the true model, provided the true model is among the candidate models.
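The difference from the AIC lies in the penalty per parameter: ln(n) for the BIC versus 2 for the AIC, so the BIC penalizes extra parameters more strongly once n exceeds e² (about 7.4). The sketch below reuses made-up log-likelihoods for two nested models at n = 100.

```python
import math


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


n = 100
bic_simple = bic(-120.0, k=3, n=n)   # illustrative log-likelihoods
bic_complex = bic(-118.5, k=5, n=n)

# Same comparison with the AIC penalty (2 per parameter) for reference.
delta_aic = (2 * 5 - 2 * -118.5) - (2 * 3 - 2 * -120.0)  # 1.0
delta_bic = bic_complex - bic_simple  # 2 ln(100) - 3, about 6.2
```

Both criteria prefer the simpler model here, but the BIC does so by a much wider margin.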

property cov

Returns the parameter covariance matrix, estimated as the inverse of the approximate Hessian. This approximation is valid when the model fits well (small residuals) and there is sufficient data for the asymptotic regime to apply.

It uses the Gauss-Newton approximation of the Hessian, which requires only first-order sensitivity information (the Jacobian). This approximation is valid for linear problems and for problems near the optimum, provided the model fits [8, 9].
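The Gauss-Newton step can be illustrated on a linear model, where the result is known in closed form. In this minimal numpy sketch (a toy problem with an assumed constant noise level, not pylake's code), J is the Jacobian of the sigma-weighted model predictions, J^T J approximates the Hessian of -ln(L), and its inverse gives the covariance.

```python
import numpy as np

# Toy problem (assumption): linear model f(x) = a * x + b with known noise
# standard deviation sigma for every data point.
x = np.linspace(0.0, 10.0, 30)
sigma = np.full(x.size, 0.3)

# Jacobian of the sigma-weighted model predictions f(x) / sigma with respect
# to (a, b). For a linear model it does not depend on the parameter values;
# for a nonlinear model it would be evaluated at the optimum.
J = np.column_stack([x, np.ones_like(x)]) / sigma[:, None]

# Gauss-Newton: approximate the Hessian of -ln(L) (chi squared / 2 plus a
# constant) by J^T J, using first-order sensitivities only.
hessian_approx = J.T @ J
cov = np.linalg.inv(hessian_approx)
std_errors = np.sqrt(np.diag(cov))
```

For this linear case the result reproduces the textbook regression variances, for example sigma² / Σ(x - mean(x))² for the slope.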

References

[8] Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1988). Numerical Recipes in C.

[9] Maiwald, T., Hass, H., Steiert, B., Vanlier, J., Engesser, R., Raue, A., Kipkeew, F., Bock, H. H., Kaschek, D., Kreutz, C., & Timmer, J. (2016). Driving the model to its limit: profile likelihood based model reduction. PLoS ONE, 11(9).

property dirty

Validate that all the Datasets that we are about to fit were actually linked.

property has_jacobian

Returns True if it is possible to evaluate the Jacobian of the fit.

property n_params

Number of parameters in the Fit.

property n_residuals

Number of data points.

property params

Fit parameters. See also pylake.fitting.parameters.Params.

property sigma

Error variance of the data points.