Linear Least Squares

This section describes the theory and practice of linear least squares, also called linear regression. The technique is widely used to extract the best estimates of the slope and intercept of a straight line from a set of data points $(x_i, y_i)$ with errors $\sigma_i$.

Suppose we have made a series of $N$ measurements, giving $N$ data points, each consisting of a triple $(x_i, y_i, \sigma_i)$, where $\sigma_i$ is the standard deviation of the measurement $y_i$. Suppose further that we expect the relationship between $x$ and $y$ to be given by the expression

 $$ y = a x + b \tag{1} $$

where $a$ is the slope and $b$ is the intercept. Suppose, finally, that we want to determine the values of $a$ and $b$ from the data. We do this by adjusting $a$ and $b$ to minimize the difference between the measurements and the model. The difference is expressed through

 $$ \chi^2 = \sum_{i=1}^{N} \frac{(y_i - a x_i - b)^2}{\sigma_i^2} \tag{2} $$

We will give some justification for this formula in the next section. Notice that minimizing $\chi^2$ tends to make the differences between the $y_i$ values, i.e. the “observed” values, and the $a x_i + b$ values, i.e. the “predicted” values, as small as possible. Note also that the weight of a point is larger if its standard deviation $\sigma_i$ is smaller. This makes sense: we want the more certain measurements to have a bigger influence on the agreement between the model and the data.
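As a minimal sketch of this weighted sum of squared residuals, the following Python function evaluates it for a trial slope and intercept. The function name `chi_squared` and the data values are made up for illustration and are not part of the original notes.

```python
import numpy as np

def chi_squared(a, b, x, y, sigma):
    """Sum over points of ((y_i - a*x_i - b) / sigma_i)**2."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    residuals = y - (a * x + b)          # observed minus predicted
    return float(np.sum((residuals / sigma) ** 2))

# Hypothetical data lying exactly on the line y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.5, 0.5, 1.0, 1.0]

on_line = chi_squared(2.0, 1.0, x, y, sigma)    # 0.0: the model matches every point
off_line = chi_squared(2.0, 0.0, x, y, sigma)   # 10.0: every residual is 1
print(on_line, off_line)
```

In the second call each residual equals 1, so the two points with the smaller standard deviation 0.5 contribute 4 each while the two points with standard deviation 1.0 contribute only 1 each. This is the weighting effect described above.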

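The minimum of $\chi^2$ occurs when

 $$ \frac{\partial \chi^2}{\partial a} = 0, \qquad \frac{\partial \chi^2}{\partial b} = 0 \tag{3} $$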

The derivatives can be evaluated easily if we expand the square in Eq (2) and rearrange the terms to expose the dependence on the fit parameters $a$ and $b$. The result can be written in a compact notation as

 $$ \chi^2 = S_{yy} - 2 a S_{xy} - 2 b S_y + a^2 S_{xx} + 2 a b S_x + b^2 S \tag{4} $$

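where we have defined

 $$ S = \sum_i \frac{1}{\sigma_i^2}, \quad S_x = \sum_i \frac{x_i}{\sigma_i^2}, \quad S_y = \sum_i \frac{y_i}{\sigma_i^2}, \quad S_{xx} = \sum_i \frac{x_i^2}{\sigma_i^2}, \quad S_{xy} = \sum_i \frac{x_i y_i}{\sigma_i^2}, \quad S_{yy} = \sum_i \frac{y_i^2}{\sigma_i^2} \tag{5} $$

Please note that in all cases the terms in the sum have $\sigma_i^2$ in the denominator.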
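The expansion can be checked numerically. The sketch below assumes the six weighted sums are $S = \sum 1/\sigma_i^2$, $S_x = \sum x_i/\sigma_i^2$, $S_y = \sum y_i/\sigma_i^2$, $S_{xx} = \sum x_i^2/\sigma_i^2$, $S_{xy} = \sum x_i y_i/\sigma_i^2$ and $S_{yy} = \sum y_i^2/\sigma_i^2$, and compares the quadratic form built from them with the direct sum over data points. The helper names and the data are hypothetical.

```python
import numpy as np

def weighted_sums(x, y, sigma):
    """Return the six sums; every term is divided by sigma_i**2."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    w = 1.0 / sigma**2
    return {
        "S": w.sum(), "Sx": (w * x).sum(), "Sy": (w * y).sum(),
        "Sxx": (w * x * x).sum(), "Sxy": (w * x * y).sum(),
        "Syy": (w * y * y).sum(),
    }

def chi_squared_quadratic(a, b, s):
    """chi^2 written as a quadratic form in a and b."""
    return (s["Syy"] - 2 * a * s["Sxy"] - 2 * b * s["Sy"]
            + a * a * s["Sxx"] + 2 * a * b * s["Sx"] + b * b * s["S"])

# Made-up data for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.5, 0.5, 1.0, 1.0])

a, b = 1.7, 0.9                                    # arbitrary trial parameters
direct = np.sum(((y - a * x - b) / sigma) ** 2)    # sum over data points
quadratic = chi_squared_quadratic(a, b, weighted_sums(x, y, sigma))
print(np.isclose(direct, quadratic))               # the two forms should agree
```

Once the six sums are computed, $\chi^2$ can be evaluated for any trial $a$ and $b$ without going back to the individual data points.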

Equation (4) is called a quadratic form. It is a generalization of a quadratic to more than one variable. Here the variables are $a$ and $b$. It happens to be a “positive-definite” quadratic form, which means that it has a minimum. The minimum gives our best fit values of the slope $a$ and intercept $b$. Call this point $a^*$ and $b^*$. The value of $\chi^2$ increases in all directions as we vary $a$ and $b$ away from this point. Think of the two variables $a$ and $b$ as defining a plane and think of $\chi^2$ as defining the height of a surface above the plane. The surface then looks like a bowl. The contours of constant elevation are ellipses in $a$ and $b$.

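Our next task is to locate the bottom of the bowl. We do this by setting both partial derivatives of $\chi^2$ to zero:

 $$ \frac{\partial \chi^2}{\partial a} = -2 S_{xy} + 2 a S_{xx} + 2 b S_x = 0 \tag{6} $$
 $$ \frac{\partial \chi^2}{\partial b} = -2 S_y + 2 a S_x + 2 b S = 0 \tag{7} $$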

This is a simple linear system of the form

 $$ a S_{xx} + b S_x = S_{xy}, \qquad a S_x + b S = S_y \tag{8} $$

or, in compact matrix times vector notation, with $v = (a, b)^T$,

 $$ M v = c \tag{9} $$

where the symmetric matrix $M$ is

 $$ M = \begin{pmatrix} S_{xx} & S_x \\ S_x & S \end{pmatrix} \tag{10} $$

and the vector $c$ is

 $$ c = \begin{pmatrix} S_{xy} \\ S_y \end{pmatrix} \tag{11} $$
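The linear system can also be solved numerically. The following sketch assumes the matrix has rows $(S_{xx}, S_x)$ and $(S_x, S)$ and the right-hand side is $(S_{xy}, S_y)$, and hands them to NumPy's `np.linalg.solve`. The data are made up and lie exactly on $y = 2x + 1$, so the solver should recover a slope near 2 and an intercept near 1.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
sigma = np.array([0.5, 0.5, 1.0, 1.0])

w = 1.0 / sigma**2                      # weights 1/sigma_i^2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()

M = np.array([[Sxx, Sx],
              [Sx,  S]])                # symmetric 2x2 matrix
c = np.array([Sxy, Sy])                 # right-hand side

a_fit, b_fit = np.linalg.solve(M, c)    # solves M @ (a, b) = c
print(a_fit, b_fit)
```

Calling a linear solver rather than forming the inverse explicitly is a common numerical habit. For a 2x2 system the closed-form inverse works just as well.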

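The matrix $M$ is one half of the so-called “Hessian” matrix $H$ for the function $\chi^2$. (The Hessian matrix is the matrix of second partial derivatives, $H_{jk} = \partial^2 \chi^2 / \partial v_j \partial v_k$ with $v = (a, b)$. Differentiating Eqs (6) and (7) once more gives $H = 2M$.)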

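The solution is, in compact matrix form, with $v^* = (a^*, b^*)^T$ the best-fit point,

 $$ v^* = M^{-1} c \tag{12} $$

or, more explicitly,

 $$ \begin{pmatrix} a^* \\ b^* \end{pmatrix} = M^{-1} \begin{pmatrix} S_{xy} \\ S_y \end{pmatrix} \tag{13} $$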

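where $M^{-1}$ is the inverse of the matrix $M$, namely,

 $$ M^{-1} = \frac{1}{\Delta} \begin{pmatrix} S & -S_x \\ -S_x & S_{xx} \end{pmatrix}, \qquad \Delta = S\, S_{xx} - S_x^2 \tag{14} $$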

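In terms of components, with $\Delta = S\, S_{xx} - S_x^2$, the best-fit slope $a^*$ and intercept $b^*$ are

 $$ a^* = \frac{S\, S_{xy} - S_x S_y}{\Delta} \tag{15} $$
 $$ b^* = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta} \tag{16} $$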

This is the main result.
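The component formulas translate directly into a few lines of code. The sketch below assumes slope $= (S\,S_{xy} - S_x S_y)/\Delta$ and intercept $= (S_{xx} S_y - S_x S_{xy})/\Delta$ with $\Delta = S\,S_{xx} - S_x^2$, and cross-checks the result against NumPy's `np.polyfit`. The function name `fit_line` and the data are hypothetical. Note that `np.polyfit` takes weights that multiply the residuals, so it is given `1/sigma` rather than `1/sigma**2`.

```python
import numpy as np

def fit_line(x, y, sigma):
    """Weighted least-squares slope and intercept for y = a*x + b."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2              # determinant of the 2x2 matrix
    if delta == 0.0:
        raise ValueError("need at least two distinct x values")
    a = (S * Sxy - Sx * Sy) / delta      # slope
    b = (Sxx * Sy - Sx * Sxy) / delta    # intercept
    return a, b

# Made-up data for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.5, 0.5, 1.0, 1.0])

a_fit, b_fit = fit_line(x, y, sigma)
a_ref, b_ref = np.polyfit(x, y, 1, w=1.0 / sigma)   # returns [slope, intercept]
print(np.allclose([a_fit, b_fit], [a_ref, b_ref]))  # the two fits should agree
```

The check on `delta` guards against the degenerate case in which all $x_i$ coincide, where the slope cannot be determined.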