Linear Least Squares

This section describes the theory and practice of linear least squares, also called linear regression. This technique is very often used to extract the best estimates of slope and intercept parameters from a set of $(x,y)$ data points, each with a measurement uncertainty $\sigma$.

Suppose we have made a series of measurements, giving $N$ data points, each consisting of a triple $(x_i,y_i,\sigma_i)$, where $\sigma_i$ is the standard deviation of the measurement $y_i$. Suppose further that we expect that the relationship between $x$ and $y$ is given by the expression

\begin{displaymath}
y = mx + b
\end{displaymath} (1)

where $m$ is the slope and $b$ is the intercept. Suppose, finally, that we want to determine the values of $m$ and $b$ from the data. We do this by adjusting $b$ and $m$ to minimize the difference between the measurement and the model. The difference is expressed through
\begin{displaymath}
\chi^2(m,b) = \sum_{i=1}^{N}\frac{(y_i - mx_i - b)^2}{\sigma_i^2}.
\end{displaymath} (2)

We will give some justification for this formula in the next section. Notice that this procedure tends to make the differences between the $y_i$ values, i.e. the “observed” values, and the $mx_i + b$ values, i.e. the “predicted” values, as small as possible. Note also that a point carries more weight if its standard deviation is smaller. This makes sense: we want the more certain measurements to have a bigger influence on the agreement between the model and the data.
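
To make Eq. (2) concrete, here is a minimal sketch in Python/NumPy of how $\chi^2$ could be evaluated for a trial slope and intercept. The data arrays and the trial values below are invented purely for illustration and are not from the text.
\begin{verbatim}
import numpy as np

def chi2(m, b, x, y, sigma):
    """Weighted sum of squared residuals, Eq. (2)."""
    residuals = (y - (m * x + b)) / sigma
    return np.sum(residuals**2)

# Hypothetical data: three points with different uncertainties.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
sigma = np.array([0.1, 0.2, 0.1])

print(chi2(2.0, 0.0, x, y, sigma))   # chi^2 for the trial line y = 2x + 0
\end{verbatim}
Points with small $\sigma_i$ dominate the sum, which is exactly the weighting discussed above.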

The minimum of $\chi^2(m,b)$ occurs when

\begin{displaymath}
\partial \chi^2/\partial m = 0 \ \ \ \ \partial \chi^2/\partial b = 0
\end{displaymath} (3)

The derivatives can be evaluated easily if we expand the square in Eq (2) and rearrange the terms to expose the dependence on the fit parameters $m$ and $b$. The result can be written in a compact notation as
\begin{displaymath}
\chi^2(m,b) = m^2 \sum x^2 + 2mb
\sum x + b^2 \sum 1 - 2m\sum xy - 2b\sum y + \sum y^2,
\end{displaymath} (4)

where we have defined
\begin{displaymath}
\begin{array}{ccc}
\sum x^2 = \sum_{i=1}^{N} x_i^2/\sigma_i^2 &
\sum x = \sum_{i=1}^{N} x_i/\sigma_i^2 &
\sum 1 = \sum_{i=1}^{N} 1/\sigma_i^2 \\
\sum xy = \sum_{i=1}^{N} x_i y_i/\sigma_i^2 &
\sum y = \sum_{i=1}^{N} y_i/\sigma_i^2 &
\sum y^2 = \sum_{i=1}^{N} y_i^2/\sigma_i^2
\end{array}\end{displaymath} (5)

Please note that in all cases the terms in the sums have $\sigma_i^2$ in the denominator.
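
As a sketch of how the sums in Eq. (5) might be accumulated numerically (same Python/NumPy setting as the previous sketch; the dictionary keys are ad hoc names chosen only for this example):
\begin{verbatim}
import numpy as np

def weighted_sums(x, y, sigma):
    """The six weighted sums of Eq. (5); every term carries a 1/sigma_i^2 weight."""
    w = 1.0 / sigma**2
    return {
        "Sxx": np.sum(w * x * x),   # sum x^2
        "Sx":  np.sum(w * x),       # sum x
        "S1":  np.sum(w),           # sum 1
        "Sxy": np.sum(w * x * y),   # sum xy
        "Sy":  np.sum(w * y),       # sum y
        "Syy": np.sum(w * y * y),   # sum y^2
    }
\end{verbatim}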

Equation (4) is called a quadratic form: a generalization of a quadratic to more than one variable, here $m$ and $b$. It happens to be a “positive-definite” quadratic form, which means that it has a minimum. The minimum gives our best-fit values of the slope $m$ and intercept $b$; call this point $m = m^*$ and $b = b^*$. As we vary $m$ and $b$ away from this point, $\chi^2$ increases in every direction. Think of the two variables $m$ and $b$ as defining a plane and of $\chi^2$ as the height of a surface above that plane. The surface then looks like a bowl, and its contours of constant elevation are ellipses in $m$ and $b$.

Our next task is to locate the bottom of the bowl. We do this by setting both partial derivatives of $\chi ^2$ to zero:

\begin{displaymath}
m^*\sum x^2 + b^*\sum x = \sum xy
\end{displaymath} (6)
\begin{displaymath}
m^*\sum x + b^*\sum 1 = \sum y.
\end{displaymath} (7)

This is a simple linear system of the form
\begin{displaymath}
\left(
\begin{array}{cc}
\sum x^2 & \sum x \\
\sum x & \sum 1 \\
\end{array}\right)
\left(
\begin{array}{c}
m^* \\
b^* \\
\end{array}\right)
=
\left(
\begin{array}{c}
\sum xy \\
\sum y \\
\end{array}\right)
\end{displaymath} (8)

or, in compact matrix times vector notation
\begin{displaymath}
M \vec {a^*} = \vec c.
\end{displaymath} (9)

where the symmetric matrix $M$ is
\begin{displaymath}
M = \left(\begin{array}{cc} \sum x^2 & \sum x \\ \sum x & \sum 1 \end{array}\right) \, .
\end{displaymath} (10)

and the vector $\vec c$ is
\begin{displaymath}
\vec c = \left(\begin{array}{c} \sum xy \\ \sum y
\end{array}\right)
\end{displaymath} (11)

The matrix $M$ is one half of the so-called “Hessian” matrix for the function $\chi^2(a_1 = m,a_2 = b)$. (The Hessian matrix is the matrix of second partial derivatives, $\partial^{2}\chi^2/\partial a_{i}
\partial a_{j}$.)
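
The normal equations are easy to assemble in code. The sketch below (Python/NumPy, with invented names) builds $M$ of Eq. (10) and $\vec c$ of Eq. (11) directly from the data; the system $M \vec{a^*} = \vec c$ can then be handed to a standard routine such as numpy.linalg.solve.
\begin{verbatim}
import numpy as np

def normal_matrix_and_vector(x, y, sigma):
    """Assemble M (Eq. 10) and c (Eq. 11) from the weighted sums of Eq. (5)."""
    w = 1.0 / sigma**2
    M = np.array([[np.sum(w * x * x), np.sum(w * x)],
                  [np.sum(w * x),     np.sum(w)]])
    c = np.array([np.sum(w * x * y), np.sum(w * y)])
    return M, c

# Example use:  m_star, b_star = np.linalg.solve(*normal_matrix_and_vector(x, y, sigma))
\end{verbatim}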

The solution is, in compact matrix form,

\begin{displaymath}
\vec{a^*} = M^{-1}\vec c
\end{displaymath} (12)

or, more explicitly,
\begin{displaymath}
\left(\begin{array}{l} m^*\\ b^*\end{array} \right) =
M^{-...
...ft(\begin{array}{l} \sigma xy \\ \sigma y \end{array} \right).
\end{displaymath} (13)

where $M^{-1}$ is the inverse of the matrix $M$, namely,
\begin{displaymath}
M^{-1} = \frac{1}{\sum x^2 \sum 1 - \left(\sum x\right)^2}
\left(\begin{array}{cc} \sum 1 & -\sum x \\ -\sum x & \sum x^2 \end{array}\right)
\end{displaymath} (14)

In terms of components, the solution is
\begin{displaymath}
m^* = \frac{\sum xy \sum 1 - \sum x \sum y}
{\sum x^2 \sum 1 - \left(\sum x\right)^2}
\end{displaymath} (15)
\begin{displaymath}
b^* = \frac{\sum x^2 \sum y - \sum x \sum xy}
{\sum x^2 \sum 1 - \left(\sum x\right)^2}.
\end{displaymath} (16)

This is the main result.
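
As a minimal end-to-end sketch of the main result (Python/NumPy; the data and function names are invented for illustration), Eqs. (15) and (16) translate almost line for line into code:
\begin{verbatim}
import numpy as np

def fit_line(x, y, sigma):
    """Weighted least-squares slope and intercept, Eqs. (15)-(16)."""
    w = 1.0 / sigma**2
    Sxx, Sx, S1 = np.sum(w * x * x), np.sum(w * x), np.sum(w)
    Sxy, Sy = np.sum(w * x * y), np.sum(w * y)
    det = Sxx * S1 - Sx**2            # common denominator in Eqs. (15)-(16)
    m_star = (Sxy * S1 - Sx * Sy) / det
    b_star = (Sxx * Sy - Sx * Sxy) / det
    return m_star, b_star

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
sigma = np.array([0.1, 0.2, 0.1])
print(fit_line(x, y, sigma))
\end{verbatim}
Solving Eq. (12) with numpy.linalg.solve on the $M$ and $\vec c$ of the previous sketch gives the same pair, and an independent routine such as numpy.polyfit with weights $1/\sigma_i$ should agree as well, since it minimizes the same $\chi^2$.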

A quadratic

\begin{displaymath}
y = f(x) = Ax^2 + Bx + C
\end{displaymath} (17)

with positive $A$ has a minimum at $x^* = -B/(2A)$. It can be written as
\begin{displaymath}
y = A (\Delta x)^2 + y_0
\end{displaymath} (18)

where $\Delta x = x - x^*$ and $y_0$ is the minimum value, $y_0 =
f(x^*)$.
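
As a quick check of this rewriting with arbitrarily chosen coefficients, take $A = 2$, $B = -8$, $C = 3$. Then $x^* = -B/(2A) = 2$ and $y_0 = f(2) = -5$, and indeed
\begin{displaymath}
2x^2 - 8x + 3 = 2(x - 2)^2 - 5 .
\end{displaymath}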

Likewise, the quadratic form in Eq. (4) can be written as

\begin{displaymath}
\chi^2(m,b) = \chi^2_0 + (\Delta m)^2\sum x^2 + 2 (\Delta m)
(\Delta b)\sum x + (\Delta b)^2\sum 1 .
\end{displaymath} (19)

where $\Delta m = m - m^*$, $\Delta b = b - b^*$, and $\chi^2_0
= \chi^2(m^*,b^*)$ is the minimum value of $\chi^2$. If we think of this as a function of the coordinates $\Delta m$ and $\Delta b$, then each contour of constant $\chi^2 > \chi^2_0$ is an ellipse centered at the origin of those coordinates; the major and minor axes of the ellipse are tilted if $\sum x$ is nonzero (so that the cross term $(\Delta m)(\Delta b)$ is present).
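
Equation (19) is easy to verify numerically. The following self-contained sketch (Python/NumPy, using the same invented data as the earlier sketches and arbitrary offsets dm, db) checks that $\chi^2$ evaluated away from the minimum matches the right-hand side of Eq. (19):
\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
sigma = np.array([0.1, 0.2, 0.1])
w = 1.0 / sigma**2

def chi2(m, b):
    """Eq. (2) for this data set."""
    return np.sum(w * (y - m * x - b)**2)

# Best-fit point from Eqs. (15)-(16).
Sxx, Sx, S1 = np.sum(w * x * x), np.sum(w * x), np.sum(w)
Sxy, Sy = np.sum(w * x * y), np.sum(w * y)
det = Sxx * S1 - Sx**2
m_star = (Sxy * S1 - Sx * Sy) / det
b_star = (Sxx * Sy - Sx * Sxy) / det
chi2_0 = chi2(m_star, b_star)

# Compare chi^2 at an arbitrary offset against the quadratic form of Eq. (19).
dm, db = 0.03, -0.05
lhs = chi2(m_star + dm, b_star + db)
rhs = chi2_0 + dm**2 * Sxx + 2 * dm * db * Sx + db**2 * S1
assert np.isclose(lhs, rhs)
\end{verbatim}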