Goodness of Fit

The value of $\chi^2$ at the minimum gives a measure of how well the straight line actually fits the data. To see this, consider the meaning of each term in Eq. (2). The numerator is the square of the difference between the measured value, given by $y_i$, and the ideal straight-line value, given by $a + b x_i$. If the discrepancy between these two values is due only to statistical scatter according to the assumed Gaussian distribution, then we would expect that most of the time the numerator would be roughly the same size as the denominator, $\sigma_i^2$. In other words, we would expect the average term in the expression for $\chi^2$ to be just 1, and the total to be roughly $N$. Well, actually, we would expect to do just a bit better than an average of 1, since the act of optimizing the values of $a$ and $b$ gives us an advantage. Clearly, if there were only two points, then we could always arrange for the straight line to go through them exactly, giving $\chi^2 = 0$. In this case the advantage is everything. But the more points there are, the harder it will be to get a small $\chi^2$ unless the data really do suggest a straight line, and the more we expect the value of $\chi^2$ to come close to $N$. The key concept here is the number of "degrees of freedom" (df). It is defined as the number of independent data points minus the number of fitting parameters:

$$ d = N - (\text{number of fitting parameters}) \qquad (32) $$

In our case it is $d = N - 2$. So suppose we evaluate $\chi^2$ and get an answer that is not $d$. What does it mean? If the answer is much bigger than $d$, we might be tempted to say that the measured points differ from a straight line by much more than would be expected from the known errors $\sigma_i$. Therefore the data would not justify the assumption that the ideal values lie on a straight line. If the value of $\chi^2$ is much less than $d$, we might be tempted to say we overestimated our errors $\sigma_i$. But even though the average value ought to come out around $d$, we must remember that we are dealing with statistics here, so statistical fluctuations could well be the cause of the discrepancy between the actual $\chi^2$ and $d$.
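The claim that the minimized $\chi^2$ averages $d = N - 2$ rather than $N$ can be illustrated with a small Monte Carlo sketch (assuming NumPy is available; the true line, error size, and number of points are made-up values chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                       # number of data points (made-up)
sigma = 0.5                  # known Gaussian error on each y_i (made-up)
x = np.linspace(0.0, 9.0, N)
true_y = 1.0 + 2.0 * x       # the "ideal" straight line a + b x

chi2_values = []
for _ in range(2000):
    y = true_y + rng.normal(0.0, sigma, N)           # scattered measurements
    # weighted least-squares straight-line fit (weights = 1/sigma)
    b, a = np.polyfit(x, y, 1, w=np.full(N, 1.0 / sigma))
    chi2 = np.sum(((y - (a + b * x)) / sigma) ** 2)  # Eq. (2) at the minimum
    chi2_values.append(chi2)

mean_chi2 = float(np.mean(chi2_values))
print(mean_chi2)   # close to d = N - 2 = 8, not N = 10
```

Averaged over many simulated experiments, the minimized $\chi^2$ comes out near 8, showing concretely how fitting two parameters "uses up" two of the ten degrees of freedom.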

The concepts introduced in the previous paragraphs are formalized in the analysis of goodness of fit. The question we are asking can be phrased in probabilistic terms: given a set of $N$ variables $y_i$ that are distributed according to the multivariate Gaussian distribution of Eq. (20), what is the probability that $\chi^2$, computed according to Eq. (2), has a value in the range $\chi^2$ to $\chi^2 + d\chi^2$? The answer is just the integral:

$$ P(\chi^2)\, d\chi^2 = \int \cdots \int \prod_{i=1}^{N} \frac{dy_i}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y_i - a - b x_i)^2 / 2\sigma_i^2}\; \delta\!\left( \chi^2 - \sum_{j=1}^{N} \frac{(y_j - a - b x_j)^2}{\sigma_j^2} \right) d\chi^2 \qquad (33) $$

A change of variables to

$$ z_i = \frac{y_i - a - b x_i}{\sigma_i} $$

gives

$$ P(\chi^2) = \frac{1}{(2\pi)^{N/2}} \int d^N z\; e^{-\frac{1}{2}\sum_i z_i^2}\; \delta\!\left( \chi^2 - \sum_i z_i^2 \right) \qquad (34) $$

We can think of the integration variables $z_i$ as defining the components of a vector $\vec{z}$. The delta function requires that the square of the length of the vector be just $\chi^2$. In fact, in view of the delta function, the exponential can be replaced by $e^{-\chi^2/2}$, and the remaining integration just gives the surface area of a sphere of radius $\sqrt{\chi^2}$ in an $N$-dimensional space. The final result is
$$ P_N(\chi^2) = \frac{(\chi^2)^{N/2 - 1}\, e^{-\chi^2/2}}{2^{N/2}\, \Gamma(N/2)} \qquad (35) $$

This is called the $\chi^2$ distribution for $N$ degrees of freedom (df). When we are fitting $N$ points to a line with two adjustable parameters, we must substitute $d = N - 2$ degrees of freedom in place of $N$ in this formula to correct for the bias we discussed above.
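Eq. (35) can be checked numerically against SciPy's implementation of the same distribution (a sketch assuming NumPy and SciPy are available; the helper name `chi2_pdf` and the choice $d = 8$ are ours):

```python
import math

import numpy as np
from scipy import stats

def chi2_pdf(chi2, d):
    """Eq. (35): the chi-squared density for d degrees of freedom."""
    return chi2 ** (d / 2 - 1) * math.exp(-chi2 / 2) / (2 ** (d / 2) * math.gamma(d / 2))

d = 8                                    # e.g. a 10-point straight-line fit
grid = np.linspace(0.5, 30.0, 100)
ours = np.array([chi2_pdf(c, d) for c in grid])
ref = stats.chi2.pdf(grid, df=d)         # SciPy's chi-squared density
max_err = float(np.max(np.abs(ours - ref)))
print(max_err)                           # agreement at round-off level
```

The two curves agree to machine precision, confirming that Eq. (35) is the standard $\chi^2$ density.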

We now return to the question we asked at the beginning of this section. Suppose we minimized $\chi^2$ and found a value $\chi^2_0$. Is it a good fit? Stated in more precise statistical language, we ask: what is the probability that we could have gotten a value as large as or larger than $\chi^2_0$ as a result of chance, based on the probability distribution of Eq. (35)? If this probability is too small, i.e. such a large value is unlikely, we might suspect that the fit is bad. This probability is given by the integral:

$$ CL(\chi^2_0) = \int_{\chi^2_0}^{\infty} P_d(\chi^2)\, d\chi^2 \qquad (36) $$

The integral gives the chance of exceeding $\chi^2_0$ with $d$ degrees of freedom. This probability is sometimes called the "confidence level" or "$p$-value". It is plotted in the graph in Fig. 1. The graph is based on two numbers, namely $\chi^2_0$ and the number of degrees of freedom $d$. To read the graph, select the curve that corresponds to $d$. Then locate the value of $\chi^2_0$ on the top or bottom, and find where the curve crosses the vertical line corresponding to $\chi^2_0$. Read the confidence level from the axis on the left. The confidence level is the probability that the observed value of $\chi^2$ could be equaled or exceeded by merely random fluctuations. If this probability is low, then we can reject the straight-line theory with confidence. For example, suppose our fit gave a value of $\chi^2_0$ much bigger than we would have expected, for which the graph gives a confidence level of 0.01. That means that such a large value of $\chi^2$ would be expected to occur as a result of random fluctuations only 1% of the time. On the other hand, for a smaller value of $\chi^2_0$ the confidence level read from the graph would be correspondingly larger, and such a fluctuation would be a common occurrence.
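Eq. (36) is what statistics libraries expose as the survival function of the $\chi^2$ distribution, so the confidence level can be computed directly rather than read off a graph. A sketch using SciPy (the values $d = 8$ and $\chi^2_0 = 20.1$ are hypothetical, chosen only for illustration):

```python
from scipy import stats

d = 8             # degrees of freedom: N - 2 for a straight-line fit (hypothetical)
chi2_min = 20.1   # chi-squared found at the minimum (hypothetical)

# Eq. (36): the probability of getting chi^2 >= chi2_min by chance alone
confidence_level = float(stats.chi2.sf(chi2_min, df=d))
print(confidence_level)   # roughly 0.01
```

Here the fit would be suspect: a $\chi^2$ this large arises from random fluctuations only about 1% of the time.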

The confidence level graph is based on the assumption that the probability distribution is Gaussian as stated. If the probability distribution is different, or if there are correlations among the measurements, we cannot use this graph.