Fall Semester 2000
This problem is an introduction to linear fitting.
File ~p5720/data/f16.dat gives airspeed "v" versus the force of air resistance "f" for a single-engine jet flying at 40,000 ft above sea level. (The data are very crudely derived from the performance specifications of an F-16.) The airspeed is known precisely while the fractional errors in the force are at the 2.5% level (sigma_f/f=0.025).
A model for the air resistance force as a function of airspeed is
f = C v^beta.
Use "sm" to perform a linear least squares fit of this model to
the flight data. Obtain from sm's fit the following information:
# best-fit beta: XXX +/- XXX # best-fit C: XXX # fractional error in C: XXX # chi-square: XXX # Are the data and model consistent? Y/NWrite an sm (text) script file f16.sm to get your answers. Put them as comments at the end of your script using the format given above. Note that comments in sm are preceded by the hash (#) symbol. Hints: Linearize the model. You then may translate the fractional errors in the force to absolute errors in the linear fit. (You are just "propagating errors here. Think of sigma_x of some variate x as being a small finite change in x. With this in mind, a helpful relation might then be that delta_x/x is approximately dx/x=dln(x). This holds very well when delta_x is small compared to x.) Finally, as mentioned in lecture, you can get a good sense of whether a model fits some data by looking at chi^2
chi^2 = Sum_{i=1 to N) ((y_i - model(x_i))/sigma_i)^2
where N is the number of data points (x_i,y_i), and model(x) is the
fitting function (in this case a line). If chi^2 is about equal to
number of "degrees of freedom" nu=N-M (where M is the number of model
parameters -- it is 2 here) then the fit is good. If chi^2 is
much larger, worry about a bad fit.
Submit only your script f16.sm. You need not generate any plots for this exercise (although it is often a good idea to check your fit graphically as well as numerically).
The Cosmic Microwave Background is a thermal bath of photons originating from the hot plasma which permeated the Universe at early times (less than roughly 10^5 years after the Big Bang). The photon bath cooled with the expansion of the Universe and is observed today to have a blackbody temperature of only a few Kelvin.
There are three data points, obtained by NASA's Cosmic Background Explorer satellite, in the file /u/course/p5720/data/cmb.dat. These points show photon frequency versus the observed flux of the microwave radiation -- there are three columns in the datafile, frequency, flux and flux errors. Note that these points represent a very small sample of the spectrum measured by COBE.
On the basis of these data points, what is the blackbody temperature of the cosmic photon bath?
To answer this question, use "gnuplot" to perform a nonlinear fit to the data with a Planck function (which describes a blackbody spectrum),
flux(x) = C * (x)**3 / (exp(h*x/(k*T)) - 1)
where "x" is the frequency, "h" is the Planck constant, and "k" is the
Boltzmann constant; the temperature T is a fitting parameter --
you are after its best-fit value -- and C is a normalization
constant. Although you needn't care about the value of C, you must
still use it as a fitting parameter.
Create a gnuplot (text) script in file cmb.gpl which contains instructions to perform the fit. At the end of that file, please add comments with your answers, using this format:
best-fit BB temperature (K): XXX +/- XXX chi-square: XXX Does the Planck function fit the data? Y/NBe sure to incorporate the flux errors in the fitting process. Your final answer should be in units of Kelvin.
Hints: This exercise can be answered with two lines of gnuplot instructions. Start up gnuplot and type ?fit to see how. Be careful with units. If you convert everything to cgs, you might need to multiply by an overall factor (thereby rescaling C) so that the fitting routine does not explode.
More Hints: As mentioned in the previous exercise, an "Ok" fit to data gives a chi-square value around nu, the number of degrees of freedom. You can assess this directly from the gnuplot output by noting that "WSSR" is the error-weighted sum of square residuals -- this identically chi-square as defined above.
Submit your gnuplot script cmb.gpl. As in the previous exercise, do not worry about producing plots.
Here we deal with generalized linear least squares fitting.
A file called ~p5720/data/grtest.dat has fictitious data from a pair of black holes in orbit around each other. Einstein's theory of General Relativity (GR) predicts a fit to this data according to the model
y = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3
where x and y are given in the datafile (along with errors in y)
and the coefficients a_i depend on Newton's gravitational constant,
black hole masses, and orbital parameters (among other things, possibly).
Note: this a linear model -- it is linear in the coefficients, a_i. Each a_i is multiplied by a "basis function", fn(x). Here, fn_i(x) [multiplying the coefficient a_i] is just x^i. In principle, the fn_i could be any function of x.
There are other competing descriptions of gravity such as the Brans-Dicke theory (see Misner, Thorne & Wheeler, Gravitation). Often these are extensions of GR which incorporate one or more extra parameters; in the limit of these parameters "going to zero", such theories become identical to GR.
Suppose an alternative (non-GR) model "GR++" predicts that the orbital data for the black-hole pair are dscribed by
y = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3 + a_4 * x^4.
The problem here is to find the
best-fit parameters a_i for both (GR and non-GR) models
and provide the following:
best-fit value of a_0 from GR: XXX+/-XXX best-fit value of a_0 from GR++: XXX+/-XXX chi2 from GR: XXX chi2 from GR++: XXX Goodness-of-fit q from GR: XXX Goodness-of-fit q from GR++: XXX best-pick model: GR/GR++ Reason for this choice: ...Thus, you are to give the best-fit value of the constant term in both models, chi2 values, Goodness-of-fit values (see NR's Sections 14.3 (follow discussion on the "chi-square probability function") and 15.1 (subheading: "Chi-Square Fitting"). On the basis of the Goodness of Fit choose one model or the other as your "best-pick", the one that you think best describes the data. Give a few sentences justifying your answer.
In coming up with the best-pick model, apply a sharp "Occam's razor": Cut out all but the simplest of models, unless otherwise warranted by the data.
The fit you are being asked to perform here is a generalize linear least squares (GLLS) procedure ("linear" because the models are linear in the fitting parameters, and "general" becuase there can be an arbitrary number of parameters). Unfortunately, neither gnuplot nor sm can do GLLS fits (although maple or matlab can). In a pinch, you could use gnuplot's nonlinear fit utility, but this is not a good idea. (Nonlinear fits can be slow and do not handle cases where the data do not contain enough information to perform a reasonable fit; try fitting y=a*x^2+b*x+c to 2 points.) To perform the fits with the GR and GR++ models, you may take advantage of available high-quality codes written in Fortran, C, or C++. Here use a code called linfit.cc, based on NR's svdfit() function (Sect. 15.4). (Linfit.cc is roughly a "front-end" for svdfit(), hopefully being a little easier to work with.) It is stand-alone C++ code so that you can copy it to your working directory -- you need not (indeed, should not) modify the code for this excerise. In the file linfit.cc there are two function definition, linfit(), and lingof(). Linfit() does the fitting, calculates the best-fit coefficients plus their errors/uncertainties, given data (x,y,sig) [sig is error in y values, e.g., y+/-sig]. Lingof() takes a chi^2 value and the number of degrees of freedom (data points minus number of fitting parameters) and determines Q, the "goodness of fit". Formally, Q is the likelihood that chi^2 could have a worse (larger) value given the hypothesis that the fitting model is valid. If Q=1 (chi^2=0), the fit is perfect -- all values of chi^2 likely to be worse than this. If Q=some very small number, e.g., 10^-15, then virtually all chi^2 values are expected to be better (smaller) than this, and the data are highly unlikely (1 in 10^15 chance) to have come from the model. NR recommends using a threshold of Q=0.001; for larger Q, the model is an acceptable fit to the data; for smaller Q reject the model as being consistent with the data.
To use linfit() and lingof(), you should write a code grtest.cc which calls them. The method of calling these functions is described in some detail in the comments at the top of linfit.cc. Use your local copy of linfit.cc to link with grtest.cc -- do not link to the p5720 library. One last detail: have your code read in the data grtest.dat from standard input (use "cin"). When you run the code, redirect either your own copy of the datafile to stdin, or use the version in ~p5720/data.
Note that the fitting method is very sturdy, employing an SVD-based algorithm as described in Section 15.4 of Numerical Recipes. Remember that you can fit anything you want with this type of code so long as the fitting function is linear in the coefficients a[i]. Otherwise, the basis functions fx[i] can be anything you want!
Submit your code grtest.cc.