This section describes the theory and practice of linear least
squares, also called linear regression. This technique is very often
used to extract the best estimates of the slope and intercept
parameters from a set of $(x_i, y_i)$ data points with errors
$\sigma_i$.
Suppose we have made a series of measurements, giving $N$ data points,
each consisting of a triple $(x_i, y_i, \sigma_i)$, where $\sigma_i$ is
the standard deviation of the measurement $y_i$. Suppose further that
we expect that the relationship between $x$ and $y$ is given by the
expression

$$ y = a x + b \tag{1} $$

where $a$ is the slope and $b$ is the intercept. Suppose, finally,
that we want to determine the values of $a$ and $b$ from the data. We
do this by adjusting $a$ and $b$ to minimize the difference between
the measurement and the model. The difference is expressed through

$$ \chi^2 = \sum_{i=1}^{N} \frac{(y_i - a x_i - b)^2}{\sigma_i^2} \tag{2} $$

We will give some justification for this formula in the next section.
Notice that this procedure tends to make the differences between the
$y_i$ values, i.e. the "observed" values, and the $a x_i + b$ values,
i.e. the "predicted" values, as small as possible. Note also that the
weight given to a point is larger if its standard deviation is
smaller.
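As a minimal sketch of Eq. (2), the following Python function evaluates
$\chi^2$ for a trial slope and intercept. The function name `chi2` and
the data values are made up for illustration; they do not come from
these notes.

```python
def chi2(a, b, x, y, sigma):
    """Weighted sum of squared residuals for the line y = a*x + b, Eq. (2)."""
    return sum(((yi - a * xi - b) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))

# Made-up measurements (x_i, y_i, sigma_i), for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma = [0.1, 0.2, 0.1, 0.3]

print(chi2(2.0, 1.0, x, y, sigma))  # trial line close to the data
print(chi2(0.0, 0.0, x, y, sigma))  # poor trial line, much larger chi2
```

Because each residual is divided by its own $\sigma_i$, the points with
small errors dominate the sum.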
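The minimum of $\chi^2$ occurs when

$$ \frac{\partial \chi^2}{\partial a} = 0, \qquad \frac{\partial \chi^2}{\partial b} = 0 \tag{3} $$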
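The derivatives can be evaluated easily if we rewrite Eq. (2) in a
compact notation as

$$ \chi^2 = [y^2] - 2a[xy] - 2b[y] + a^2[x^2] + 2ab[x] + b^2[1] \tag{4} $$

where we have defined, for any quantity $f_i$ built from the data,

$$ [f] = \sum_{i=1}^{N} \frac{f_i}{\sigma_i^2} \tag{5} $$

so that, for instance, $[1] = \sum_i 1/\sigma_i^2$ and
$[xy] = \sum_i x_i y_i/\sigma_i^2$. In this compact notation the
weighted mean value of $y$ is $\bar{y} = [y]/[1]$, for example.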
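The bracket notation maps directly onto code. The sketch below, with a
hypothetical helper `bracket` and made-up data, computes the six
weighted sums and checks numerically that the compact form (4) agrees
with the direct sum (2) for an arbitrary trial $a$ and $b$.

```python
def bracket(f, sigma):
    """Weighted sum [f] = sum_i f_i / sigma_i**2, as in Eq. (5)."""
    return sum(fi / (si * si) for fi, si in zip(f, sigma))

# Made-up measurements, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma = [0.1, 0.2, 0.1, 0.3]

S1 = bracket([1.0] * len(x), sigma)                      # [1]
Sx = bracket(x, sigma)                                   # [x]
Sy = bracket(y, sigma)                                   # [y]
Sxx = bracket([xi * xi for xi in x], sigma)              # [x^2]
Sxy = bracket([xi * yi for xi, yi in zip(x, y)], sigma)  # [xy]
Syy = bracket([yi * yi for yi in y], sigma)              # [y^2]

def chi2_direct(a, b):
    """Eq. (2): explicit sum over the data points."""
    return sum(((yi - a * xi - b) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))

def chi2_compact(a, b):
    """Eq. (4): the same quantity written with the bracket sums."""
    return (Syy - 2 * a * Sxy - 2 * b * Sy
            + a * a * Sxx + 2 * a * b * Sx + b * b * S1)

y_mean = Sy / S1  # weighted mean of y

print(chi2_direct(1.7, 0.4), chi2_compact(1.7, 0.4))
print(y_mean)
```

The practical advantage of (4) is that the sums are computed once;
after that, $\chi^2$ for any $a$ and $b$ costs only a few
multiplications.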
Equation (4) is called a quadratic form. It is a generalization of a
quadratic to more than one variable. It happens to be a
"positive-definite" quadratic form (provided the $x_i$ are not all
equal), which means that it has a unique minimum. Call this point
$a^*$ and $b^*$. Then $\chi^2$ increases in all directions as we vary
$a$ and $b$ away from this point. Think of the two variables $a$ and
$b$ as defining a plane and think of $\chi^2$ as defining the height of
a surface above the plane. The surface then looks like a bowl. The
contours of constant elevation are ellipses in $a$ and $b$.
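Our next task is to locate the bottom of the bowl. We do this by
setting both partial derivatives of $\chi^2$ to zero:

$$ \frac{\partial \chi^2}{\partial a} = -2[xy] + 2a[x^2] + 2b[x] = 0 \tag{6} $$

$$ \frac{\partial \chi^2}{\partial b} = -2[y] + 2a[x] + 2b[1] = 0 \tag{7} $$

This is a simple linear system of the form

$$ M p = c \tag{8} $$

where the vector $p$ has components $a$ and $b$, the vector $c$ is

$$ c = \begin{pmatrix} [xy] \\ [y] \end{pmatrix} \tag{9} $$

and the symmetric matrix $M$ is

$$ M = \begin{pmatrix} [x^2] & [x] \\ [x] & [1] \end{pmatrix} \tag{10} $$

The matrix $M$ is half the "Hessian" matrix for the function
$\chi^2(a, b)$. (The Hessian matrix is the matrix of second partial
derivatives, $H_{jk} = \partial^2 \chi^2 / \partial p_j \, \partial p_k$,
so here $H = 2M$.)

The solutions are, in compact matrix form,

$$ p = M^{-1} c \tag{11} $$

or, more explicitly,

$$ \begin{pmatrix} a \\ b \end{pmatrix} = M^{-1} \begin{pmatrix} [xy] \\ [y] \end{pmatrix} \tag{12} $$

where $M^{-1}$ is the inverse of the matrix $M$, namely,

$$ M^{-1} = \frac{1}{D} \begin{pmatrix} [1] & -[x] \\ -[x] & [x^2] \end{pmatrix}, \qquad D = [x^2][1] - [x]^2 \tag{13} $$

with $D$ the determinant of $M$. In terms of components, the solution
is

$$ a = \frac{[1][xy] - [x][y]}{D} \tag{14} $$

$$ b = \frac{[x^2][y] - [x][xy]}{D} \tag{15} $$

This is the main result.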
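The closed-form solution can be sketched in a few lines of Python. The
function name `linear_fit` and the test data are assumptions made for
illustration. The formulas used are the standard weighted
least-squares expressions for slope and intercept, written in terms of
the weighted sums $[1]$, $[x]$, $[y]$, $[x^2]$, $[xy]$, where
$[f] = \sum_i f_i/\sigma_i^2$.

```python
def linear_fit(x, y, sigma):
    """Weighted least-squares slope a and intercept b for y = a*x + b."""
    w = [1.0 / (s * s) for s in sigma]                      # weights 1/sigma_i^2
    S1 = sum(w)                                             # [1]
    Sx = sum(wi * xi for wi, xi in zip(w, x))               # [x]
    Sy = sum(wi * yi for wi, yi in zip(w, y))               # [y]
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))         # [x^2]
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # [xy]

    D = Sxx * S1 - Sx * Sx                                  # determinant
    if D == 0.0:
        # Happens when all x_i are equal: the slope is undetermined.
        raise ValueError("all x values are equal; cannot fit a line")

    a = (S1 * Sxy - Sx * Sy) / D
    b = (Sxx * Sy - Sx * Sxy) / D
    return a, b

# Made-up data lying exactly on y = 2x + 1, with unequal errors.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
sigma = [0.1, 0.2, 0.1, 0.3, 0.2]

a, b = linear_fit(x, y, sigma)
print(a, b)
```

For data that lie exactly on a line, the fit recovers the slope and
intercept to rounding error regardless of the weights. Comparing the
determinant with exactly zero is a simplification; nearly degenerate
data would call for a tolerance instead.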
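At the minimum the value of $\chi^2$ is easily found to be

$$ \chi^2_{\min} = [y^2] - a^*[xy] - b^*[y] \tag{16} $$

where $a^*$ and $b^*$ are the values of $a$ and $b$ at the minimum. If
we write $a = a^* + \delta a$ and $b = b^* + \delta b$, it is easy to
show that (2) can be written simply as

$$ \chi^2 = \chi^2_{\min} + \delta a^2 [x^2] + 2\,\delta a\,\delta b\,[x] + \delta b^2 [1] \tag{17} $$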
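The last two results can be checked numerically. The sketch below uses
made-up data and illustrative variable names. It computes the best-fit
slope and intercept from the weighted sums
$[f] = \sum_i f_i/\sigma_i^2$, then compares the direct sum (2) with
the expression $[y^2] - a^*[xy] - b^*[y]$ at the minimum and with the
quadratic expansion about the minimum for an arbitrary displacement.

```python
def chi2(a, b, x, y, sigma):
    """Direct evaluation of Eq. (2)."""
    return sum(((yi - a * xi - b) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))

# Made-up measurements, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma = [0.1, 0.2, 0.1, 0.3]

w = [1.0 / (s * s) for s in sigma]
S1 = sum(w)                                             # [1]
Sx = sum(wi * xi for wi, xi in zip(w, x))               # [x]
Sy = sum(wi * yi for wi, yi in zip(w, y))               # [y]
Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))         # [x^2]
Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # [xy]
Syy = sum(wi * yi * yi for wi, yi in zip(w, y))         # [y^2]

# Best-fit slope and intercept.
D = Sxx * S1 - Sx * Sx
a_star = (S1 * Sxy - Sx * Sy) / D
b_star = (Sxx * Sy - Sx * Sxy) / D

# Value at the minimum: closed form versus direct sum.
chi2_min = Syy - a_star * Sxy - b_star * Sy
chi2_min_direct = chi2(a_star, b_star, x, y, sigma)

# Quadratic expansion about the minimum for an arbitrary displacement.
da, db = 0.3, -0.2
lhs = chi2(a_star + da, b_star + db, x, y, sigma)
rhs = chi2_min + da * da * Sxx + 2 * da * db * Sx + db * db * S1

print(chi2_min, chi2_min_direct)
print(lhs, rhs)
```

The two numbers in each printed pair should agree to rounding error.
Since the expansion has no term linear in the displacement, any step
away from the best fit can only increase $\chi^2$.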
Carleton DeTar
2009-11-23