SciPy curve fitting

In this example we start from a model function and generate artificial data with the help of the NumPy random number generator. We then fit the data to the same model function. Our model function is

\begin{displaymath}
N(t) = N_0 \exp(-t/\tau)   .
\end{displaymath} (1)

The Python model function is then defined this way:

import numpy as np
def f(t,N0,tau):
    return N0*np.exp(-t/tau)

The function arguments must give the independent variable first (in this case $t$), followed by the parameters that will be adjusted for the best fit.

Next we want to create an artificial data set based on this function with, let's say, $N_0 = 1000$ and $\tau = 100$. Let's make the dataset cover five lifetimes $T = 500$ starting at $t = 0$ with “observations” every $dt = 10$. So first we create a set of observation times:

T = 500
dt = 10
nobs = int(T/dt + 1.1)
t = dt*np.arange(nobs)

Although $T$ and $dt$ happen to be integers here, we want to allow any real values, so we must convert the ratio $T/dt$ to an integer number of observations. The function int does that by truncating everything after the decimal point. With our chosen values of $T$ and $dt$ we should get 51 points, counting $t = 0$. If we simply converted $T/dt$ to an integer, floating-point rounding might give us 49.999 instead of 50, which int would truncate to 49; and even an exact 50 would still be one point short of the 51 we want. Adding 1.1 before converting handles both problems: the 1 accounts for the point at $t = 0$ and the extra 0.1 guards against rounding error, so we surely cover the desired time range. The last command then creates exactly 51 observation times.
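To see why the guard matters, here is a small illustration with made-up values $T = 0.3$ and $dt = 0.1$ (not the ones used above), where floating-point division lands just below the intended integer:

T, dt = 0.3, 0.1
print(T/dt)            # 2.9999999999999996, not exactly 3
print(int(T/dt) + 1)   # 3 -- one observation short of the 4 we want
print(int(T/dt + 1.1)) # 4, as intended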

Next we need a set of “observations”. These are the function values at the observation times plus a random Gaussian fluctuation. First the function values:

N0 = 1000
tau = 100
N = f(t,N0,tau)

Then we simulate Gaussian fluctuations in the function values. Let's take stdev = 10 to be the standard deviation of these fluctuations and then add them to $N$. Here is how we do it.

stdev = 10
Nfluct = stdev*np.random.normal(size=nobs)
N = N + Nfluct

The function call np.random.normal(size=nobs) returns nobs random numbers drawn from a Gaussian distribution with mean zero and standard deviation 1. If we multiply them by stdev = 10, the standard deviation of the result becomes 10. When we add the result to $N$, the mean is shifted to $N$, which is exactly what we want.
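As a quick sanity check (optional, and not part of the fit), we can confirm the scaling numerically; with only 51 samples the estimates will of course scatter around the nominal values:

print(Nfluct.mean(), Nfluct.std())   # roughly 0 and 10, up to statistical fluctuations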

Next, we need an array with the standard deviation values (errors) for each observation. Since our artificial dataset draws its deviates from a distribution with standard deviation 10 for each point, our array should be all 10s.

sig = np.zeros(nobs) + stdev
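If you prefer, an equivalent way to build this array is NumPy's full:

sig = np.full(nobs, float(stdev))   # nobs entries, all equal to stdev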

Now we have almost all the ingredients needed to fit the model function to our artificial dataset. The last thing we need is a set of starting values for the two fit parameters, $N_0$ and $\tau$. Let's pick values that are not too far from the ones used to generate the data, namely

start = (1100, 90)

The order of the starting parameter values is important. It must match the order of the arguments of the model function. Finally, we can call the curve_fit procedure:

from scipy.optimize import curve_fit
popt, pcov = curve_fit(f, t, N, sigma=sig, p0=start, absolute_sigma=True)

The argument absolute_sigma=True is necessary. It says the values in sig are all literally the standard deviations and not just relative weights for the data points. With this option the resulting chi square can be used to determine goodness of fit.

The return value popt contains the best-fit values of the parameters. The return value pcov contains the covariance (error) matrix for the fit parameters. From them we can determine the standard deviations of the parameters, just as we did for the linear least chi square fit. (The standard deviations are the square roots of the diagonal values.) We can also determine the correlation between the fit parameters.
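Continuing with the names defined above, a short sketch of how one might extract these quantities is (the correlation is the off-diagonal covariance divided by the product of the two standard deviations):

perr = np.sqrt(np.diag(pcov))        # standard deviations of N0 and tau
corr = pcov[0,1]/(perr[0]*perr[1])   # correlation between the two fit parameters
print("N0  =", popt[0], "+/-", perr[0])
print("tau =", popt[1], "+/-", perr[1])
print("correlation =", corr)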

Finally, we would like to know the chi square of the fit. This is

\begin{displaymath}
\chi^2 = \sum_i \frac{[N(t_i) - N_{exp}(t_i)]^2}{\sigma_i^2}
\end{displaymath} (2)

where
\begin{displaymath}
N_{exp}(t) = f(t,{\rm popt})
\end{displaymath} (3)

are the model function values obtained with the best fit values popt of the parameters. Here is the Python script:

Nexp = f(t, *popt)
r = N - Nexp
chisq = np.sum((r/stdev)**2)
df = nobs - 2
print("chisq =",chisq,"df =",df)

The star in *popt unpacks the popt array so the two optimized parameter values become the second and third arguments to the function.
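If you also want to convert the chi square into a goodness-of-fit probability, one optional way (using scipy.stats, which is not used in the script below) is the survival function of the chi-square distribution with df degrees of freedom:

from scipy.stats import chi2
pvalue = chi2.sf(chisq, df)   # probability of a chi square this large or larger
print("p-value =", pvalue)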

Here is the complete code, including Pyplot code for plotting the data with error bars alongside the fit curve.



#! /usr/bin/python3

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

###########################################
def f(t,N0,tau):
    """The model function"""
    return N0*np.exp(-t/tau)

###########################################
def main():

    # Model parameters
    T = 500
    dt = 10
    N0 = 1000
    tau = 100
    stdev = 20

    # Create the artificial dataset
    nobs = int(T/dt + 1.5)
    t = dt*np.arange(nobs)
    N = f(t,N0,tau)
    Nfluct = stdev*np.random.normal(size=nobs)
    N = N + Nfluct
    sig = np.zeros(nobs) + stdev

    # Fit the curve
    start = (1100, 90)
    popt, pcov = curve_fit(f, t, N, sigma=sig, p0=start, absolute_sigma=True)
    print(popt)
    print(pcov)

    # Compute chi square
    Nexp = f(t, *popt)
    r = N - Nexp
    chisq = np.sum((r/stdev)**2)
    df = nobs - 2
    print("chisq =",chisq,"df =",df)

    # Plot the data with error bars along with the fit result
    plt.errorbar(t, N, yerr=sig, fmt='o', label='"data"')
    plt.plot(t, Nexp, label='fit')
    plt.legend()
    plt.show()

###########################################
main()