next up previous
Next: Goodness of Fit Up: curve_fit Previous: Linear Least Squares

Maximum Likelihood and Chi Square

Although the least squares method gives us the best estimate of the parameters $m$ and $b$, it is also very important to know how well determined these best values are. In other words, if we repeated the experiment many times with the same conditions, what range of values of these parameters would we get? To answer this question, we use a maximum likelihood method.

We start by assuming a probability distribution for the entire set of measurements ${y_1, y_2, \ldots, y_N}$. We assume that

\begin{displaymath}
P(y_1,y_2,\ldots,y_N) = \prod_{i=1}^{N}
\frac{e^{-(y_i-\bar y_i)^2/2\sigma_i^2}}{\sqrt{2\pi}\sigma_i}.
\end{displaymath} (18)

is the probability that one experiment results in the set of values ${y_1, y_2, \ldots, y_N}$. Notice that this probability is just the product of $N$ Gaussian normal distributions for each value of $y_i$, with mean value $\bar y_i$ and error $\sigma_i$. In writing the joint probability in this way, we are assuming that the outcome $y_i$ is not correlated with the outcome $y_j$ for $i \neq j$. A more complicated expression would be needed if correlations were present. For now, we take this expression as the simplest choice.

The idea of maximum likelihood is to replace the ideal mean values $\bar y_i$ with the theoretically ``expected'' values $y_{i,exp} =
mx_i + b$ predicted by the linear function. The probability distribution then becomes a model conditional probability $P(y_1,y_2,\ldots,y_N\vert m,b)$. That is, assuming that the slope and intercept are $m$ and $b$, it gives the probability for getting the result $(y_1,y_2,\ldots,y_N)$ in a single measurement. But then we use the power of Bayes's theorem to turn it around and reinterpret it as the probability $P(m,b\vert y_1,y_2,\ldots,y_N)$ that, given the experimental result, the linear relationship is given by the parameters $m$ and $b$. Dropping the list of data points, we write this probability as

\begin{displaymath}
P(m,b) \propto \exp[-\chi^2(m,b)/2]
\end{displaymath} (19)

where
\begin{displaymath}
\chi^2(m,b) = \sum_{i=1}^{N}\frac{(y_i-\bar y_{i,exp})^2}{\sigma_i^2}
\end{displaymath} (20)

(The ``proportional to'' symbol $\propto$ is there because our reinterpretation of the probability in terms of $m$ and $b$ requires us to reevaluate the normalization.)

The probability $P(m,b)$ is called the likelihood function for the parameter values $m$ and $b$. We want to find the values $m^*$ and $b^*$ that are most probable, i.e. maximize the likelihood function. Clearly this condition is equivalent to requiring that we minimize $\chi ^2$, and leads to the result discussed in the first section. But now we also have a way to estimate the reliability of our determination of the best values $m^*$ and $b^*$.

From the expressions (17) we see that $P(m,b)
\propto \exp[-\chi^2(m,b)/2]$ is similar to a normal distribution in the variables $m$ and $b$, except that instead of one variable, we have two. Instead of a simple quadratic in the exponent we have a quadratic form in the exponent. Once we have realized this, we can use standard results to estimate the error in the best fit values.

The standard deviation in the parameter $m$ is determined from the formula

\begin{displaymath}
\sigma_m^2 = \int_{-\infty}^\infty dm   db   (m - m^*)^2 P(m,b)     .
\end{displaymath} (21)

This expression is a generalization of the one we used when we were dealing with a probability distribution in a single variable. It takes some not-so-difficult calculus to do the integral, but we skip it here, and just quote the result:
\begin{displaymath}
\sigma_m^2 = (M^{-1})_{11}.
\end{displaymath} (22)

Likewise the error in $b$ is just
\begin{displaymath}
\sigma_b^2 = (M^{-1})_{22}.
\end{displaymath} (23)

The inverse $M^{-1}$ is called the ``error matrix'' for this reason. Thus the diagonal matrix elements of $M^{-1}$ give the variance of the best fit parameters. The off-diagonal elements contain information about correlations between the best fit parameter values, as we discuss in Sec. 5.

Written explicitly, we have

$\displaystyle \sigma_m^2$ $\textstyle =$ $\displaystyle \frac{\sum 1 }{\sum x^2 \sum 1 - \left(\sum x\right)^2}$ (24)
$\displaystyle \sigma_b^2$ $\textstyle =$ $\displaystyle \frac{\sum x^2}{\sum x^2 \sum 1 - \left(\sum x\right)^2}.$ (25)

This is an important result, since it allows us to assign confidence ranges for the best fit parameters.


next up previous
Next: Goodness of Fit Up: curve_fit Previous: Linear Least Squares
Carleton DeTar 2009-11-23