The value of $\chi^2$ at the minimum gives a measure of how well the
straight line actually fits the data. To see this, consider the
meaning of each term in Eq. (2). The numerator is the
square of the difference between the measured value, given by $y_i$,
and the ideal straight-line value, given by $a + b x_i$. If the
discrepancy between these two values is due only to statistical
scatter according to the assumed Gaussian distribution, then we would
expect that most of the time the numerator would be roughly the same
size as the denominator $\sigma_i^2$. In other words, we would expect the
average term in the expression for $\chi^2$ to be just $1$, and the
total to be roughly $N$. Well, actually, we would expect to do
just a bit better than an average of $1$, since the act of optimizing
the values of $a$ and $b$ gives us an advantage. Clearly, if there
were only two points, then we could always arrange for the straight
line to go through them exactly, giving $\chi^2 = 0$. In this case
the advantage is everything. But the more points there are, the
harder it will be to fit a straight line, unless the data really
suggest a straight line, and the more we expect the value of $\chi^2$
to come close to $N$. The key concept here is the number
of “degrees of freedom” (df). It is defined as the number of independent
data points minus the number of fitting parameters. For a straight-line
fit with two parameters:

$$ d = N - 2 \qquad (32) $$
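The claim that fitting two parameters reduces the expected minimum $\chi^2$ from $N$ to $N - 2$ is easy to check by simulation. A minimal sketch, assuming Python with NumPy (the true line, error size, and number of trials are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 10, 0.5
x = np.linspace(0.0, 1.0, N)

chi2_values = []
for _ in range(20000):
    # Simulate data scattered about a true line y = 1 + 2x
    y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, N)
    # Least-squares straight-line fit; with uniform errors this
    # coincides with minimizing chi-squared over a and b
    b, a = np.polyfit(x, y, 1)  # polyfit returns slope first
    chi2_values.append(np.sum((y - (a + b * x)) ** 2) / sigma ** 2)

print(np.mean(chi2_values))  # close to N - 2 = 8, not N = 10
```

The average minimized $\chi^2$ comes out near $8$, not $10$, illustrating the two degrees of freedom "spent" on the fit parameters.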

The concepts introduced in the previous paragraphs are formalized in
the analysis of goodness of fit. The question we are asking can be
phrased in probabilistic terms: Given a set of variables $y_i$ that are
distributed according to the multivariate Gaussian distribution of
Eq. (20), what is the probability
$P(\chi^2)\,d\chi^2$ that $\chi^2$, computed according to
Eq. (2), has a value in the range
$[\chi^2, \chi^2 + d\chi^2]$? The answer is just the integral:

$$ P(\chi^2) = \int \prod_{i=1}^{N} \frac{dy_i}{\sqrt{2\pi}\,\sigma_i}\, \exp\!\left[-\sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{2\sigma_i^2}\right] \delta\!\left(\chi^2 - \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{\sigma_i^2}\right) \qquad (33) $$

A change of variables to

$$ t_i = \frac{y_i - a - b x_i}{\sigma_i} $$

gives

$$ P(\chi^2) = (2\pi)^{-N/2} \int \prod_{i=1}^{N} dt_i\, \exp\!\left(-\frac{1}{2}\sum_{i=1}^{N} t_i^2\right) \delta\!\left(\chi^2 - \sum_{i=1}^{N} t_i^2\right) \qquad (34) $$

We can think of the integration variables $t_i$ as defining the components of a vector $\vec{t}$. The delta function requires that the square of the length of the vector be just $\chi^2$. In fact, in view of the delta function, the exponential can be replaced by $e^{-\chi^2/2}$, and the remaining integration just gives the surface area of a sphere of radius $\sqrt{\chi^2}$ in an $N$-dimensional space. The final result is

$$ P_N(\chi^2) = \frac{(\chi^2)^{N/2 - 1}\, e^{-\chi^2/2}}{2^{N/2}\, \Gamma(N/2)} \qquad (35) $$

This is called the $\chi^2$ distribution for $N$ degrees of freedom (df). When we are fitting $N$ points to a line with two adjustable parameters, we must substitute $d = N - 2$ degrees of freedom in place of $N$ in this formula to correct for the bias we discussed above.
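As a quick numerical check, the $\chi^2$ density for $N$ degrees of freedom should integrate to one and have mean $N$. A sketch, assuming Python with NumPy (`chi2_pdf`, `ndf`, and the integration grid are names and choices introduced here):

```python
import math
import numpy as np

def chi2_pdf(x, ndf):
    """Chi-squared density: x^(ndf/2 - 1) exp(-x/2) / (2^(ndf/2) Gamma(ndf/2))."""
    return x ** (ndf / 2 - 1) * np.exp(-x / 2) / (2 ** (ndf / 2) * math.gamma(ndf / 2))

ndf = 8  # e.g. N = 10 data points minus two fit parameters
x = np.linspace(1e-6, 80.0, 200001)
dx = x[1] - x[0]

total = np.sum(chi2_pdf(x, ndf)) * dx      # normalization: should be close to 1
mean = np.sum(x * chi2_pdf(x, ndf)) * dx   # mean: should be close to ndf
```

The mean equal to the number of degrees of freedom is exactly the "expect $\chi^2 \approx N$" rule of thumb from the discussion above.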

We now return to the question we asked at the beginning of this
section. Suppose we minimized $\chi^2$ and found a value
$\chi^2_{\min}$. Is it a good fit? Stated in more precise statistical
language, we ask what is the probability that we could have gotten a
value *as large or larger than* $\chi^2_{\min}$ as a result of
chance, based on the probability Eq. (35). If
this probability is too small, i.e. if such a large value is unlikely,
we might suspect that the fit is bad. This probability is given by
the integral:

$$ P(\chi^2 \ge \chi^2_{\min}) = \int_{\chi^2_{\min}}^{\infty} P_d(\chi^2)\, d\chi^2 \qquad (36) $$

The confidence level graph is based on the assumption that the probability distribution is Gaussian as stated. If the probability distribution is different or there are correlations among the measurements, we can't use this graph.