next up previous
Next: Maximum Likelihood and Chi Up: curve_fit Previous: curve_fit

Linear Least Squares

This section describes the theory and practice of linear least squares, also called linear regression. This technique is very often used to extract the best estimates of slope and intercept parameters from a set of $(x,y)$ data points with error $\sigma$.

Suppose we have made a series of measurements, giving $N$ data points, each consisting of a triple $(x_i,y_i,\sigma_i)$, where $\sigma_i$ is the standard deviation of the measurement $y_i$. Suppose further that we expect that the relationship between $x$ and $y$ is given by the expression

\begin{displaymath}
y = mx + b
\end{displaymath} (1)

where $m$ is the slope and $b$ is the intercept. Suppose, finally, that we want to determine the values of $m$ and $b$ from the data. We do this by adjusting $b$ and $m$ to minimize the difference between the measurement and the model. The difference is expressed through
\begin{displaymath}
\chi^2(m,b) = \sum_{i=1}^{N}\frac{(y_i - mx_i - b)^2}{\sigma_i^2}.
\end{displaymath} (2)

We will give some justification for this formula in the next section. Notice that this procedure tends to make the differences between the $y_i$ values, i.e. the ``observed'' values, and the $mx_i + b$ values, i.e. the ``predicted'' values, as small as possible. Note also, that the weighting of the points is larger if the standard deviation is smaller.

The minimum of $\chi^2(m,b)$ occurs when

\begin{displaymath}
\partial \chi^2/\partial m = 0     \partial \chi^2/\partial b = 0
\end{displaymath} (3)

The derivatives can be evaluated easily if we rewrite Eq (2) in a compact notation as
\begin{displaymath}
\chi^2(m,b) = m^2 \sum x^2 + 2mb
\sum x + b^2 \sum 1 - 2m\sum xy - 2b\sum y + \sum y^2,
\end{displaymath} (4)

where we have defined
\begin{displaymath}
\sum x^2 = \sum_{i=1}^{N} x_i^2/\sigma_i^2,
     \sum 1 = \sum_{i=1}^N 1/\sigma_i^2,     \mbox{etc.}
\end{displaymath} (5)

In this compact notation the weighted mean value of $x$ is $\bar x$ = $\sum x/\sum 1$, for example.

Equation (4) is called a quadratic form. It is a generalization of a quadratic to more than one variable. It happens to be a ``positive-definite'' quadratic form, which means that it has a minimum. Call this point $m = m^*$ and $b = b^*$. Then it increases in all directions as we vary $m$ and $b$ away from this point. Think of the two variables $m$ and $b$ as defining a plane and think of $\chi ^2$ as defining the height of a surface above the plane. The surface then looks like a bowl. The contours of constant elevation are ellipses in $m$ and $b$.

Our next task is to locate the bottom of the bowl. We do this by setting both partial derivatives of $\chi ^2$ to zero:

$\displaystyle m^*\sum x^2 + b^*\sum x$ $\textstyle =$ $\displaystyle \sum xy$ (6)
$\displaystyle m^*\sum x + b^*\sum 1$ $\textstyle =$ $\displaystyle \sum y.$ (7)

This is a simple linear system of the form
\begin{displaymath}
M \vec {a^*} = \vec c.
\end{displaymath} (8)

where the vector $\vec {a^*}$ has components $m^*$ and $b^*$, the vector $\vec c$ is
\begin{displaymath}
\vec c = \left(\begin{array}{c}\sum xy  \sum y\end{array}\right)
\end{displaymath} (9)

and the symmetric matrix $M$ is
\begin{displaymath}
M = \left(\begin{array}{cc} \sum x^2 & \sum x  \sum x & \sum 1 \end{array}\right)
\end{displaymath} (10)

The matrix $M$ is twice the ``Hessian'' matrix for the function $\chi(a_1 = m,a_2 = b)$. (The Hessian matrix is the matrix of second partial derivatives, $\partial^{2}\chi^2/\partial a_{i} \partial
a_{j}$.)

The solutions are, in compact matrix form

\begin{displaymath}
\vec{a^*} = M^{-1}\vec c
\end{displaymath} (11)

or, more explicitly,
\begin{displaymath}
\left(\begin{array}{l} m^* b^*\end{array} \right) =
M^{-1}\left(\begin{array}{l} c_1  c_2 \end{array} \right).
\end{displaymath} (12)

where $M^{-1}$ is the inverse of the matrix $M$, namely,
\begin{displaymath}
M^{-1} = \frac{1}{\sum x^2 \sum 1 - \left(\sum x\right)^2}
...
...{cc} \sum 1 & -\sum x  -\sum x & \sum x^2 \end{array}\right)
\end{displaymath} (13)

In terms of components, the solution is
$\displaystyle m^*$ $\textstyle =$ $\displaystyle \frac{\sum xy \sum 1 - \sum x \sum y}
{\sum x^2 \sum 1 - \left(\sum x\right)^2}$ (14)
$\displaystyle b^*$ $\textstyle =$ $\displaystyle \frac{\sum x^2 \sum y - \sum x \sum xy}
{\sum x^2 \sum 1 - \left(\sum x\right)^2}.$ (15)

This is the main result.

At the minimum the value of $\chi ^2$ is easily found to be

\begin{displaymath}
\chi^2_0 = \chi^2(m^*,b^*) = \sum y^2 - \vec{a^*}\cdot \vec c
\end{displaymath} (16)

If we write $\Delta m = m - m^*$ and $\Delta b = b - b^*$, it is easy to show that (2) can be written simply as
\begin{displaymath}
\chi^2(m,b) = \chi^2_0 + (\Delta m)^2\sum x^2 + 2 (\Delta m)
(\Delta b)\sum x + (\Delta b)^2\sum 1 .
\end{displaymath} (17)


next up previous
Next: Maximum Likelihood and Chi Up: curve_fit Previous: curve_fit
Carleton DeTar 2009-11-23